ML & Cybersecurity Project

Anomaly Detection in IoT Network Traffic

Developed an anomaly detection pipeline on the CTU-IoT Malware dataset to distinguish malicious from benign network connections. The project applies unsupervised learning to cybersecurity data and covers feature engineering, preprocessing, and model evaluation.

Key Contributions:

  • Exploratory Data Analysis: Inspected 23 features across 23k+ network traffic entries. Explored data types, distributions, and correlations.
  • Data Preprocessing:
    • Handled missing values and categorical encodings.
    • Converted IPs and ports into categorical features.
    • Engineered new features (e.g., rolling connection counts over time windows).
  • Pipeline Construction: Built preprocessing pipelines with ColumnTransformer and Pipeline to standardize numeric features and encode categorical ones.
  • Anomaly Detection Models: Experimented with clustering and other unsupervised methods to flag unusual traffic patterns that may indicate malware activity.
  • Cybersecurity Application: Interpreted anomalies in the context of malicious traffic detection.
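
Illustrative sketch: the steps above (a rolling connection count, a ColumnTransformer/Pipeline preprocessor, and an unsupervised detector) could be wired together roughly as follows. This is a minimal example under stated assumptions, not the project's actual code. The column names (ts, src_ip, dst_port, proto, duration, orig_bytes), the 60-second window, the synthetic data, and the choice of IsolationForest with contamination=0.05 are all hypothetical; the summary does not name the specific models or parameters used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_connections(n=200, seed=0):
    """Synthetic stand-in for connection logs (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    offsets = pd.to_timedelta(np.sort(rng.uniform(0, 3600, n)), unit="s")
    return pd.DataFrame({
        "ts": pd.Timestamp("2024-01-01") + offsets,
        "src_ip": rng.choice(["10.0.0.1", "10.0.0.2", "10.0.0.3"], n),
        # Ports are cast to strings so they are treated as categories.
        "dst_port": rng.choice([23, 80, 443], n).astype(str),
        "proto": rng.choice(["tcp", "udp"], n),
        "duration": rng.exponential(1.0, n),
        "orig_bytes": rng.exponential(500.0, n),
    })


def add_rolling_conn_count(df, window="60s"):
    """Count connections per source IP within a trailing time window."""
    df = df.sort_values("ts", kind="stable").reset_index(drop=True)
    counts = pd.Series(0.0, index=df.index)
    for _, group in df.groupby("src_ip"):
        # A series of ones indexed by time: its rolling sum is a count.
        ones = pd.Series(1.0, index=pd.DatetimeIndex(group["ts"]))
        counts.loc[group.index] = ones.rolling(window).sum().to_numpy()
    df["conn_count"] = counts
    return df


def build_detector():
    """Scale numeric columns, one-hot encode categorical ones, then fit
    an IsolationForest (an assumed model choice for this sketch)."""
    numeric = ["duration", "orig_bytes", "conn_count"]
    categorical = ["proto", "dst_port"]
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    detector = IsolationForest(
        n_estimators=100, contamination=0.05, random_state=0
    )
    return Pipeline([("preprocess", preprocess), ("detector", detector)])


if __name__ == "__main__":
    data = add_rolling_conn_count(make_toy_connections())
    model = build_detector().fit(data)
    # predict returns -1 for flagged anomalies and 1 for inliers.
    data["anomaly"] = model.predict(data)
    print(data["anomaly"].value_counts())
```

Keeping the preprocessing and the detector in one Pipeline means the same scaling and encoding are applied at fit and predict time, which avoids mismatches when scoring new traffic.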

Skills Demonstrated:

  • Machine Learning (unsupervised learning, clustering, anomaly detection)
  • Feature engineering for network traffic data
  • Python ML stack: pandas, scikit-learn, numpy, matplotlib, seaborn
  • Cybersecurity analytics (malware/attack traffic detection)
  • Model pipeline design & evaluation
