Skip to content

A Machine Learning project for classifying resumes into categories to streamline resume sorting and enhance recruitment processes

License

Notifications You must be signed in to change notification settings

DeepraMazumder/Resume-Management-System

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Resume Classification using Naive Bayes Classifier

Overview

This repository documents a machine learning project for the classification of resumes into different categories using the UpdatedResumeDataset.csv dataset. The project follows a systematic approach, including pre-processing, exploratory data analysis (EDA), text cleaning, stopwords removal, feature extraction, and model evaluation.

Steps

1. Read the UpdatedResumeDataset.csv dataset

  • Loaded the dataset using the pandas library to initiate the project.

2. Displayed Categories and Counts

  • Examined the distribution of resume categories within the dataset using the value_counts() method. This step provides an initial understanding of the dataset's class distribution.

3. Created a Count Plot

  • Visualized the count of resumes for each category using a horizontal bar plot. This plot provides a clear representation of the number of resumes in each category, aiding in identifying any class imbalances.

4. Created a Pie Plot

  • Generated a pie chart illustrating the percentage distribution of resumes across different categories. This visualization helps in grasping the proportional contribution of each category to the overall dataset.

5. Converted all the Resume text to lower case

  • Standardized the text data by converting all resume text to lowercase.

6. Defined a function to clean the resume text

  • Developed a cleaning function to remove special characters, URLs, RT, punctuations, and extra whitespace.
  • Stored the cleaned text in a new column for further analysis.

7. Used nltk package to find the most common words and Generate Word Cloud

  • Utilized the nltk library to tokenize the cleaned resume text and identify the most common words.
  • Created a Word Cloud to visually represent the most frequently occurring words in the resume text.

8. Converted the categorical variable Category to a numerical feature

  • Encoded the categorical variable 'Category' into numerical values using label encoding.

9. Converted Text to Feature Vectors (TF-IDF)

  • Utilized the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer to convert the cleaned resume text into numerical feature vectors. This process involves tokenizing documents, learning vocabulary, and calculating inverse document frequency weightings.

10. Applied Naive Bayes Classifier

  • Splitted the data into training and testing sets.
  • Implemented a Naive Bayes Classifier, specifically the MultinomialNB model, to train on the feature vectors and make predictions.
  • Evaluated the model's performance by calculating accuracy and providing a detailed classification report.

Project Link

Explore the complete project, including the Jupyter notebook with code and visualizations, at Project Link.

Feel free to adapt and use the provided code for your own resume classification tasks. For any questions or suggestions, please open an issue or reach out. Happy exploring!

About

A Machine Learning project for classifying resumes into categories to streamline resume sorting and enhance recruitment processes

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages