This project is a course recommendation system built in Python using bag-of-words text vectorization and cosine similarity. Given a dataset of course descriptions, it recommends similar courses based on content similarity.
The dataset used in this project is the 'Coursera.csv' file, which contains information about various courses offered on the Coursera platform. The dataset includes columns such as 'Course Name', 'Difficulty Level', 'Course Description', and 'Skills'.
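
As a rough illustration of the dataset's layout, the sketch below loads a tiny in-memory sample with the same column names and checks that they are present. The two sample rows and the `EXPECTED_COLUMNS` name are made up for this example; in the project itself the file would presumably be read from the 'Dataset' directory, e.g. `pd.read_csv("Dataset/Coursera.csv")`.

```python
import io

import pandas as pd

# Hypothetical two-row sample that mirrors the column names listed above.
# The real data would come from the Coursera.csv file instead.
SAMPLE_CSV = io.StringIO(
    "Course Name,Difficulty Level,Course Description,Skills\n"
    "Intro to Python,Beginner,Learn Python basics,python programming\n"
    "Deep Learning,Advanced,Neural networks in depth,python tensorflow\n"
)

courses = pd.read_csv(SAMPLE_CSV)

# Columns the recommendation steps rely on (names taken from the README).
EXPECTED_COLUMNS = ["Course Name", "Difficulty Level", "Course Description", "Skills"]
missing = [col for col in EXPECTED_COLUMNS if col not in courses.columns]

print(courses.head())
print("Missing columns:", missing)
```

Checking the columns up front makes a renamed or missing field in the CSV fail early rather than partway through preprocessing.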
The recommendation system is based on the following steps:
- Data Preprocessing: The text data in the dataset is cleaned and preprocessed to remove unnecessary characters, convert to lowercase, and perform stemming.
- Text Vectorization: The preprocessed text data is converted into numerical vectors using the CountVectorizer from the scikit-learn library.
- Similarity Calculation: The cosine similarity between the vectorized course descriptions is calculated to measure the similarity between courses.
- Recommendation: Given a course name, the system finds the most similar courses based on the cosine similarity scores and recommends the top-ranked courses.
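
The four steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the `COURSES` sample data and the `preprocess` and `recommend` helpers are hypothetical names chosen for this example, and the use of NLTK's `PorterStemmer` is an assumption about which stemmer the project uses.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample data standing in for rows of Coursera.csv.
COURSES = {
    "Python Basics": "Learn Python programming: variables, loops, and functions.",
    "Advanced Python": "Python programming with decorators, generators, and functions.",
    "Intro to Painting": "Explore watercolor painting, brushes, and color theory.",
}

stemmer = PorterStemmer()  # assumed stemmer; needs no extra NLTK downloads


def preprocess(text):
    """Lowercase, replace non-letter characters with spaces, and stem each word."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(stemmer.stem(word) for word in text.split())


names = list(COURSES)
cleaned = [preprocess(COURSES[name]) for name in names]

# Turn each cleaned description into a vector of word counts.
vectors = CountVectorizer().fit_transform(cleaned)

# similarity[i][j] is the cosine similarity between course i and course j.
similarity = cosine_similarity(vectors)


def recommend(course_name, top_n=2):
    """Return the names of the top_n courses most similar to course_name."""
    index = names.index(course_name)
    ranked = sorted(
        range(len(names)), key=lambda i: similarity[index][i], reverse=True
    )
    # Skip the query course itself, which always has similarity 1.0.
    return [names[i] for i in ranked if i != index][:top_n]


print(recommend("Python Basics"))
```

Because cosine similarity depends only on the angle between count vectors, a long description and a short one that share the same vocabulary can still score as similar.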
The following Python libraries are required to run the project:
- NumPy
- Pandas
- Matplotlib
- Seaborn
- scikit-learn
- NLTK

To set up and run the project:
- Clone the repository or download the project files.
- Install the required dependencies by running `pip install -r requirements.txt` (if you have a `requirements.txt` file).
- Place the 'Coursera.csv' dataset file in the 'Dataset' directory.
- Run the 'recommendation.ipynb' notebook to preprocess the data, train the recommendation system, and get course recommendations.
Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.