ML Pipelines for Supervised Learning

Regression & Classification Project

Developed supervised learning pipelines in Python to solve both regression and classification problems using real datasets. This project demonstrates the ability to preprocess structured data, build scikit-learn pipelines, train multiple models, and evaluate predictive performance with appropriate metrics.

Key Contributions:

  • Regression Task (Customer Attributes Dataset):
    • Target: Customer age prediction from demographic and behavioral features.
    • Implemented preprocessing with SimpleImputer, StandardScaler, and OneHotEncoder.
    • Trained Linear Regression and Random Forest Regressor models and compared their performance using MSE and R².
  • Classification Task (Sports Betting Dataset):
    • Target: Match outcome prediction (Actual Winner vs Predicted Winner).
    • Features: Team names, betting odds, draw odds, etc.
    • Built pipelines with categorical encodings and scaling.
    • Trained Decision Tree and Logistic Regression classifiers and evaluated them with accuracy, precision, and recall.
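
The regression workflow described in the first task could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the column names (income, visits_per_month, segment), the imputation strategy, and the model settings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the customer dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 300
customers = pd.DataFrame({
    "income": rng.normal(50_000, 12_000, n),
    "visits_per_month": rng.poisson(4, n).astype(float),
    "segment": rng.choice(["basic", "plus", "premium"], n),
})
customers["age"] = 20 + customers["income"] / 2_500 + rng.normal(0, 3, n)
# Blank out some values so the imputer has something to do.
customers.loc[rng.choice(n, 20, replace=False), "income"] = np.nan

numeric_cols = ["income", "visits_per_month"]
categorical_cols = ["segment"]


def build_pipeline(model):
    """Return a fresh preprocessing + model pipeline."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing numbers
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


X = customers[numeric_cols + categorical_cols]
y = customers["age"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    pipe = build_pipeline(model).fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: MSE={results[name]['mse']:.2f}, R2={results[name]['r2']:.3f}")
```

Keeping the preprocessing inside the pipeline means the imputer, scaler, and encoder are fit on the training split only, so no information from the test split leaks into the reported MSE and R².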

Skills Demonstrated:

  • Machine Learning (supervised regression & classification)
  • Data preprocessing (handling missing values, categorical encoding, scaling)
  • Pipeline construction with ColumnTransformer and Pipeline
  • Model evaluation (MSE, R², accuracy, precision, recall)
  • Python ML stack: scikit-learn, pandas, numpy, matplotlib, seaborn
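
As a rough illustration of how the classification pipeline and the listed metrics fit together, the sketch below trains both classifiers on synthetic data. Everything specific in it is an assumption for the example: the column names (home_team, away_team, home_odds, draw_odds), the binary home_win label, and the hyperparameters are not taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the betting dataset (hypothetical columns).
rng = np.random.default_rng(1)
n = 400
teams = ["Team A", "Team B", "Team C", "Team D"]
matches = pd.DataFrame({
    "home_team": rng.choice(teams, n),
    "away_team": rng.choice(teams, n),
    "home_odds": rng.uniform(1.2, 5.0, n),
    "draw_odds": rng.uniform(2.5, 4.5, n),
})
# Hypothetical label: a home win is more likely when the home odds are low.
home_win_prob = 1.0 / matches["home_odds"]
matches["home_win"] = (rng.uniform(0, 1, n) < home_win_prob).astype(int)

categorical_cols = ["home_team", "away_team"]
numeric_cols = ["home_odds", "draw_odds"]


def build_classifier(model):
    """Return a fresh pipeline: encode team names, scale odds, then classify."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("num", StandardScaler(), numeric_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


X = matches[categorical_cols + numeric_cols]
y = matches["home_win"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifiers = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in classifiers.items():
    clf = build_classifier(model).fit(X_train, y_train)
    pred = clf.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # zero_division=0 avoids a warning if a model never predicts class 1
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }
    print(name, {k: round(v, 3) for k, v in scores[name].items()})
```

Reporting precision and recall alongside accuracy matters when outcomes are imbalanced, since a model that always predicts the majority class can still score a high accuracy.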
