MITRE ATT&CK Techniques

Machine Learning Pipeline to Predict MITRE ATT&CK Techniques

Developed a machine learning pipeline to predict MITRE ATT&CK techniques based on categorical features from a cybersecurity dataset. This project combined cybersecurity domain knowledge with supervised ML to explore whether attack attributes (tools, targets, tags) could predict underlying adversarial techniques.

Key Contributions:

Preprocessed a noisy, multi-label dataset from Kaggle, fixing inconsistent labels, handling missing targets, and applying dimensionality reduction through binning.
Engineered a custom preprocessing pipeline with ColumnTransformer, OneHotEncoder, and a MultiLabelBinarizer-based transformer to encode categorical and list-type features.
Trained and compared multiple models:
- Naive Bayes (baseline, poor fit)
- Linear SGD (one-vs-rest) for high-dimensional classification
- LightGBM for gradient boosting on large, sparse features
Conducted exploratory data analysis, revealing skewed distributions and unexpected co-occurrences (e.g., tools like Burp Suite mapping to specific techniques).
Documented challenges such as RAM overload, training inefficiencies, and imbalance issues, with reflections on future improvements.
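
As a rough illustration of the preprocessing and one-vs-rest steps listed above, here is a minimal sketch. It is not the project's actual code: the column names (tool, target, tags, techniques), the ListBinarizer wrapper, and the toy rows are all assumptions made up for the example. The wrapper is there because scikit-learn's MultiLabelBinarizer is designed for labels and does not accept the (X, y) calls that ColumnTransformer makes.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder


class ListBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper so MultiLabelBinarizer can encode a list-valued feature."""

    def fit(self, X, y=None):
        self.mlb_ = MultiLabelBinarizer(sparse_output=True)
        self.mlb_.fit(self._as_lists(X))
        return self

    def transform(self, X):
        return self.mlb_.transform(self._as_lists(X))

    @staticmethod
    def _as_lists(X):
        # ColumnTransformer hands over a one-column frame; pull out its lists.
        return pd.DataFrame(X).iloc[:, 0].tolist()


# Toy rows invented for illustration; not taken from the Kaggle dataset.
df = pd.DataFrame({
    "tool": ["Burp Suite", "Nmap", "Burp Suite", "Metasploit", "Nmap", "Metasploit"],
    "target": ["web", "network", "web", "host", "network", "host"],
    "tags": [["scan", "proxy"], ["scan"], ["proxy"],
             ["exploit"], ["scan", "recon"], ["exploit", "shell"]],
    "techniques": [["T1190"], ["T1046"], ["T1190", "T1046"],
                   ["T1203"], ["T1046"], ["T1203"]],
})

# Multi-label target: one indicator column per technique ID.
label_encoder = MultiLabelBinarizer()
Y = label_encoder.fit_transform(df["techniques"])

preprocess = ColumnTransformer([
    # One-hot encode plain categorical columns; ignore unseen categories later.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["tool", "target"]),
    # Multi-hot encode the list-valued column through the wrapper.
    ("tags", ListBinarizer(), ["tags"]),
])

model = Pipeline([
    ("prep", preprocess),
    # One linear SGD classifier per technique (one-vs-rest).
    ("clf", OneVsRestClassifier(SGDClassifier(random_state=0))),
])

features = df[["tool", "target", "tags"]]
model.fit(features, Y)

encoded = model.named_steps["prep"].transform(features)
predictions = model.predict(features)
```

Keeping the encoders inside one Pipeline means the same fitted vocabulary is reused at prediction time, and the sparse output stays sparse all the way into the linear model, which matters for the memory limits mentioned above.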

Skills Demonstrated:

Cybersecurity analytics (MITRE ATT&CK framework)
Machine learning (multi-label classification, high-dimensional sparse features)
Python ML stack: scikit-learn, LightGBM, pandas, numpy
Feature engineering, label encoding, and handling imbalanced datasets
Critical reflection and iteration on preprocessing and modeling

Other Projects

Web App Backend

A platform that lets CS:GO players find, compare, and purchase in-game skins through secure transactions.

Algorithms in Python

Projects demonstrating coding fundamentals, algorithms, probability, reinforcement learning, and machine learning in Python.