Machine Learning Pipeline
Social Media Engagement Prediction
Developed a complete machine learning pipeline to predict engagement on social media posts using structured data. This project demonstrates data cleaning, feature preparation, preprocessing pipelines, model training, and evaluation in scikit-learn.
Key Contributions:
- Dataset: Social media posts dataset with features such as platform, post type, post time, likes, comments, shares, and sentiment.
- Exploratory Data Analysis (EDA): Inspected distributions, correlations, and engagement patterns by platform and post type.
- Data Preprocessing:
  - Scaled numeric features (likes, comments, shares) with StandardScaler.
  - Encoded categorical features (platform, post_type, post_day, sentiment_score) using OneHotEncoder.
- Pipeline Construction: Built a full scikit-learn Pipeline combining preprocessing with model training.
- Model Training:
  - Trained and compared multiple models, including Logistic Regression and Decision Tree Classifier.
  - Evaluated each model by its accuracy score on the test data.
- Reflection: Explained the design choices behind feature preparation, imputation (none was required), and model selection.
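The preprocessing and training steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the category values are made up, and the binary target `high_engagement`, the 75/25 split, and the helper `make_preprocessor` are all assumptions introduced for the example. Only the column names, the two transformers, the two model types, and the accuracy metric come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the social media dataset (values are made up).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "likes": rng.integers(0, 1000, n),
    "comments": rng.integers(0, 200, n),
    "shares": rng.integers(0, 100, n),
    "platform": rng.choice(["Instagram", "Twitter", "Facebook"], n),
    "post_type": rng.choice(["image", "video", "text"], n),
    "post_day": rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], n),
    "sentiment_score": rng.choice(["positive", "neutral", "negative"], n),
})

# Hypothetical binary target: noisy activity above the median counts as "high".
activity = df["likes"] + df["comments"] + df["shares"] + rng.normal(0, 200, n)
df["high_engagement"] = (activity > activity.median()).astype(int)

numeric_cols = ["likes", "comments", "shares"]
categorical_cols = ["platform", "post_type", "post_day", "sentiment_score"]

X = df[numeric_cols + categorical_cols]
y = df["high_engagement"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def make_preprocessor():
    """Scale numeric columns and one-hot encode categorical columns."""
    return ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])


models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Fit one full pipeline per model and record its test-set accuracy.
scores = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", make_preprocessor()), ("model", model)])
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.3f}")
```

Wrapping the preprocessing and the estimator in a single Pipeline means the scaler and encoder are fit only on the training split, so no statistics from the test data leak into training.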
Skills Demonstrated:
- Machine Learning (classification pipelines)
- Feature scaling, encoding, and preprocessing with ColumnTransformer
- Pipeline design & evaluation in scikit-learn
- EDA and visualization with Pandas, Matplotlib, Seaborn
- Structured documentation of methodology & reasoning
Links:
