Machine Learning + NLP Project

Resume Selector using Naive Bayes

Coursera Guided Project: Built and trained a Naive Bayes classifier that automatically labels resumes as flagged or not flagged. The project used natural language processing (NLP) techniques to preprocess resume text and convert it into numerical features, then evaluated the model's performance on held-out data.

Key Contributions:

Data preprocessing: Cleaned raw resume text, removed stopwords, lemmatized/stemmed tokens, and applied custom preprocessing with gensim and nltk.
Exploratory Data Analysis: Examined class imbalance (92 not-flagged vs. 33 flagged resumes), visualized distributions, and generated word clouds for each category.
Feature engineering: Converted resume text into numerical features using CountVectorizer (bag-of-words).
Model training: Trained a Multinomial Naive Bayes classifier on 125 labeled resumes.
Evaluation: Assessed the model with classification reports and confusion matrices (visualized as seaborn heatmaps), achieving near-perfect results:
- Precision, recall, and F1-score of 1.00 on a 20% test split
- Accuracy of 97% on a 30% test split
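A minimal sketch of the preprocessing step, written in plain Python so it runs without gensim or nltk installed. The `tokenize` function approximates `gensim.utils.simple_preprocess` (lowercase, alphabetic tokens of 2 to 15 characters); the small `STOPWORDS` set and the crude suffix-stripping `stem` function are hypothetical stand-ins for nltk's English stopword list and its stemmers/lemmatizers, not the project's actual code.

```python
import re

# Hypothetical stand-in for nltk.corpus.stopwords.words("english");
# the real list is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
             "with", "is", "was", "at", "by"}

def tokenize(text, min_len=2, max_len=15):
    """Lowercase and split into alphabetic tokens, roughly like
    gensim.utils.simple_preprocess with its default length limits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if min_len <= len(t) <= max_len]

def stem(token):
    """Crude suffix stripping for illustration only (not Porter or WordNet)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, then stem each remaining token."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]

print(preprocess("Managed the deployment of Python services on AWS, 5 years."))
# ['manag', 'deployment', 'python', 'service', 'aws', 'year']
```

Digits and punctuation are discarded by the tokenizer, and short tokens such as "aws" survive stemming because of the minimum-length guard.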
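The class counts from the EDA step give a useful reference point: a classifier that always predicts "not flagged" would already be right about 74% of the time, which is why per-class precision and recall are worth reporting alongside accuracy. A short sketch of that arithmetic, using the reported counts (the label strings are chosen for illustration):

```python
from collections import Counter

# Label counts reported in the EDA step (92 not flagged, 33 flagged).
labels = ["not_flagged"] * 92 + ["flagged"] * 33
counts = Counter(labels)
n = sum(counts.values())

for label, count in counts.most_common():
    print(f"{label}: {count} ({count / n:.1%})")

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = counts.most_common(1)[0][1] / n
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
# not_flagged: 92 (73.6%)
# flagged: 33 (26.4%)
# majority-class baseline accuracy: 73.6%
```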
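A plain-Python sketch of what `CountVectorizer` does with its default settings (lowercasing, the token pattern `(?u)\b\w\w+\b`, and an alphabetically sorted vocabulary), to show how each resume becomes a row of word counts. The function names `fit_vocabulary` and `transform` and the toy documents are illustrative; scikit-learn's `fit_transform` and `transform` perform these steps and return a sparse matrix rather than dense lists.

```python
import re

# Same default token pattern CountVectorizer uses: runs of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def fit_vocabulary(docs):
    """Map each distinct lowercase token to a column index, in sorted order."""
    terms = {tok for doc in docs for tok in TOKEN_PATTERN.findall(doc.lower())}
    return {term: i for i, term in enumerate(sorted(terms))}

def transform(docs, vocabulary):
    """Return one dense count vector per document; unseen tokens are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok in TOKEN_PATTERN.findall(doc.lower()):
            if tok in vocabulary:
                row[vocabulary[tok]] += 1
        rows.append(row)
    return rows

train_docs = ["python data analysis", "sales manager sales targets"]
vocab = fit_vocabulary(train_docs)
X = transform(train_docs, vocab)
print(vocab)  # {'analysis': 0, 'data': 1, 'manager': 2, 'python': 3, 'sales': 4, 'targets': 5}
print(X)      # [[1, 1, 0, 1, 0, 0], [0, 0, 1, 0, 2, 1]]
```

Word order is discarded (hence "bag of words"), and a token seen only at prediction time, such as "java" in `transform(["python python java"], vocab)`, simply contributes nothing.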
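Multinomial Naive Bayes scores each class c as log P(c) + Σ_j x_j · log P(w_j | c), where x_j is the count of word j in the resume and the word probabilities are estimated from training counts with add-α (Laplace) smoothing. The sketch below implements that from scratch with α = 1, matching the default `alpha=1.0` of scikit-learn's `MultinomialNB`. It illustrates the technique on made-up count vectors (the vocabulary and labels are hypothetical) and is not the project's code, which used scikit-learn directly.

```python
import math

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log class priors and Laplace-smoothed log word likelihoods."""
    classes = sorted(set(y))
    n_features = len(X[0])
    log_prior, log_likelihood = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        # Total count of each word across all training documents of class c.
        word_counts = [sum(col) for col in zip(*rows)]
        total = sum(word_counts)
        log_likelihood[c] = [
            math.log((count + alpha) / (total + alpha * n_features))
            for count in word_counts
        ]
    return classes, log_prior, log_likelihood

def predict(model, x):
    """Pick the class maximizing log P(c) + sum_j x_j * log P(word_j | c)."""
    classes, log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(n * lp for n, lp in zip(x, log_likelihood[c]))
        for c in classes
    }
    return max(scores, key=scores.get)

# Toy count vectors over a hypothetical vocabulary [python, sql, sales, retail].
X = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 2, 3], [0, 1, 3, 1]]
y = ["flagged", "flagged", "not_flagged", "not_flagged"]
model = fit_multinomial_nb(X, y)
print(predict(model, [1, 1, 0, 0]))  # flagged
print(predict(model, [0, 0, 1, 2]))  # not_flagged
```

Working in log space avoids numerical underflow when multiplying many small word probabilities, and smoothing keeps a word unseen in one class from forcing that class's score to zero.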
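The reported scores all derive from the confusion matrix. A minimal sketch of those definitions, assuming "flagged" is the positive class and using made-up predictions; scikit-learn's `classification_report` and `confusion_matrix` compute the same quantities, and a 2×2 matrix like the one returned here is what a seaborn heatmap visualizes.

```python
def classification_metrics(y_true, y_pred, positive="flagged"):
    """Compute accuracy plus precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        # Rows: true class (negative, positive); columns: predicted class.
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical predictions: one of three flagged resumes is missed.
y_true = ["flagged"] * 3 + ["not_flagged"] * 7
y_pred = ["flagged"] * 2 + ["not_flagged"] * 8
print(classification_metrics(y_true, y_pred))
```

Here precision stays at 1.0 (no false alarms) while recall drops to 2/3 and F1 to 0.8, even though accuracy is still 90%, which shows why the per-class scores are reported separately from accuracy.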

Skills Demonstrated:

Machine Learning (Naive Bayes classification)
Natural Language Processing (text preprocessing, tokenization, stopword removal)
Feature extraction (bag-of-words, CountVectorizer)
Model evaluation (classification reports, confusion matrices)
Python ML stack: scikit-learn, nltk, gensim, pandas, matplotlib, seaborn

Project Report PDF: Resume Selector with Naive Bayes