Machine Learning Pipeline

Social Media Engagement Prediction

Developed an end-to-end machine learning pipeline that predicts engagement on social media posts from structured (tabular) data, framed as a classification task. The project covers data cleaning, feature preparation, preprocessing pipelines, model training, and evaluation in scikit-learn.

Key Contributions:

  • Dataset: Social media posts dataset with features such as platform, post type, post time, likes, comments, shares, and sentiment.
  • Exploratory Data Analysis (EDA): Inspected distributions, correlations, and engagement patterns by platform and post type.
  • Data Preprocessing:
    • Scaled numeric features (likes, comments, shares) with StandardScaler.
    • Encoded categorical features (platform, post_type, post_day, sentiment_score) using OneHotEncoder.
  • Pipeline Construction: Combined the scaler and encoder in a ColumnTransformer and chained it with the classifier in a single scikit-learn Pipeline, so preprocessing and model training run as one step.
  • Model Training:
    • Compared multiple classifiers, including Logistic Regression and a Decision Tree Classifier.
    • Evaluated each model by its accuracy on a held-out test set.
  • Reflection: Documented the reasoning behind feature preparation, imputation (none was required), and model selection.
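The preprocessing and model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes a pandas DataFrame with the column names given in the preprocessing bullets. The target column `engagement_level` and the generator `make_toy_data` are hypothetical stand-ins, because the real dataset and label definition are not shown here. The toy data is random, so the printed scores say nothing about the project's results. It also treats `sentiment_score` as a small set of discrete labels, since the write-up one-hot encodes it.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Feature names follow the write-up; "engagement_level" is a HYPOTHETICAL target column.
NUMERIC = ["likes", "comments", "shares"]
CATEGORICAL = ["platform", "post_type", "post_day", "sentiment_score"]
TARGET = "engagement_level"


def make_toy_data(n=300, seed=0):
    """Hypothetical synthetic stand-in for the real dataset (random values only)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "platform": rng.choice(["Instagram", "Twitter", "Facebook"], n),
        "post_type": rng.choice(["image", "video", "text"], n),
        "post_day": rng.choice(["Mon", "Wed", "Sat"], n),
        # Assumed to be discrete labels, since the write-up one-hot encodes it.
        "sentiment_score": rng.choice(["negative", "neutral", "positive"], n),
        "likes": rng.poisson(100, n),
        "comments": rng.poisson(20, n),
        "shares": rng.poisson(10, n),
        # Random placeholder label; the real engagement label is not described.
        TARGET: rng.choice(["high", "low"], n),
    })


def build_pipeline(model):
    """Scale numeric columns, one-hot encode categorical ones, then fit `model`."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # handle_unknown="ignore" avoids errors on categories seen only at predict time.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


def compare_models(df, seed=42):
    """Fit each candidate pipeline on a training split and report test accuracy."""
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        pipe = build_pipeline(model).fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return scores


if __name__ == "__main__":
    data = make_toy_data()
    # EDA-style view: mean interactions by platform and post type.
    print(data.groupby(["platform", "post_type"])[NUMERIC].mean().round(1))
    print(compare_models(data))
```

Keeping the ColumnTransformer inside the Pipeline means the scaler and encoder are fit only on the training split, so test-set statistics never leak into preprocessing. Swapping in another classifier only requires adding it to the `models` dictionary.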

Skills Demonstrated:

  • Machine Learning (classification pipelines)
  • Feature scaling, encoding, and preprocessing with ColumnTransformer
  • Pipeline design & evaluation in scikit-learn
  • EDA and visualization with pandas, Matplotlib, and Seaborn
  • Structured documentation of methodology & reasoning
