Portfolio with Data Science projects | Machine Learning | Python

aziz-ullah-khan/data-science


Aziz Ullah Khan

Linkedin

Projects:

This repo contains the notebooks for my data science and machine learning projects.

  • Automate your Stuff: This project automates routine tasks on files, mainly Excel files, using AutoGPT. It aims to develop an approach that automates repetitive work such as data entry, Google searches for given keywords, and keyword analysis.

  • Football Player Tracking: This project focuses on the task of football player tracking using the Sports Videos in the Wild dataset from the University of Michigan and the Ultralytics YOLOv5 algorithm. The project aims to develop a functioning model that can accurately track players in football matches and provide insights into player performance and team dynamics.

  • Explanation Notebook: This is an explanation notebook that provides a detailed overview of logistic regression, a popular algorithm for binary classification. The notebook uses the Iris plant dataset from Scikit-learn and the Vowpal Wabbit library to demonstrate logistic regression in action. It serves as a valuable resource for anyone looking to understand the underlying concepts of logistic regression and its implementation.

  • Autoplay CartPole: Explore the power of reinforcement learning with CartPole! In this project, we use the Proximal Policy Optimization (PPO) algorithm to train an agent to balance a pole on top of a cart. With the Autoplay feature, we can watch the agent's performance in real time as it masters the game. The OpenAI Gym CartPole environment provides the game, while the Ray library offers the tools to build and train the agent. By the end of this project, you will have a better understanding of how reinforcement learning can be applied to control problems such as CartPole.

  • Celebrity Identification: This project uses the Face API to detect and identify celebrities in images, leveraging its high accuracy and advanced features. Our Celebrity Face Recognition Dataset contains images of famous individuals from various fields, making it a valuable tool for security, social media, and marketing applications. Join us as we explore the exciting world of Celebrity Identification with the Face API!

  • News Recommendation: This project builds a news recommendation system that combines NLP with recommender-system techniques. Using the Microsoft MIND dataset and the Microsoft Recommenders library, we implement the LSTUR algorithm to provide a personalized and accurate experience for users.

  • Paper Summarization: This project demonstrates the use of the Bidirectional and Auto-Regressive Transformer (BART) model provided by the Huggingface Transformers library to generate summaries for academic papers in the arXiv Summarization Dataset. By the end of this project, you will have a better understanding of how to use BART to generate summaries for academic papers, and how to evaluate the quality of the generated summaries.

  • Cartoon Classification: This project uses deep learning to identify characters from the TV show "The Simpsons." A ResNet18 model is trained on the Kaggle Simpsons Characters Data, with the aim of providing an intuitive, hands-on approach to image classification with deep learning.

  • Movie Recommendation: In this project, I use the Movielens dataset and the Microsoft Recommenders library to build a movie recommendation system with the SAR algorithm. The system suggests movies based on a user's past movie preferences. The project aims to provide accurate recommendations for both popular and niche items.

  • Forecasting of COVID: In this project, I used time-series forecasting and the Meta AI Prophet library to predict the spread of COVID-19 across countries using the Kaggle COVID-19 dataset. I explored and preprocessed the data, built baseline and Prophet models, and evaluated their performance using various metrics. My goal was to provide insights to inform public health policies and interventions in the fight against COVID-19.

  • Hate Speech Detection Using Transformer: In this project, I implemented a hate speech detection model using the DeBERTa algorithm and the Huggingface Transformers library. The model is capable of classifying tweets into one of three categories: hate speech, offensive language, or neutral. I used the Kaggle Hate Speech and Offensive Language Dataset to train and test the model.

  • Microbusiness Density Forecasting: In this project, I participated in a forecasting competition to predict the density of microbusinesses in counties across the United States. I used various regression models to make predictions, and the best performing model was chosen for final submission.

  • Swift Code Information Extractor Using Transformer NER: This project is a prototype for explaining Swift code to a layperson in plain natural language. The trained model detects the variable type, variable name, object type, object name, etc. in Swift code.

  • Firefox Bugs Classification Using Classical Machine Learning and BERT: This research develops a tool for classifying bug reports, which are gathered from user feedback or filed directly with the developers. The tool first classifies each bug by type and then assigns the report to its component. Machine learning techniques such as BERT performed well on this classification task. The study also suggests that machine learning can help bugs get fixed faster by identifying the bug type quickly, which in turn improves the user experience. The proposed model can be enhanced further in several areas.

  • Quora Question Pairs: Measuring the similarity of sentences or questions is important in many applications, and many researchers have applied machine learning approaches to the problem. In this project, we experimented with several algorithms and compared multiple models to find the best solution. The dataset comes from the Kaggle competition “Quora Question Pairs” and is openly available on Kaggle.com. We first implemented classical models such as logistic regression and decision trees, and then configured BERT transformers with different model types.

  • Entities Identification in Healthcare Data: This project applies Named Entity Recognition (NER) to a medical dataset to classify each word as Disease, Treatment, or Other.

  • Chatbot: A chatbot built specifically for Japanese that can be adapted to any language with slight tuning!

  • Microsoft Research Sentence Completion: Sentence completion is a challenging and time-consuming task. Many researchers have worked on sentence autocompletion, but the complexity grows with the volume of text to process. N-grams combined with word2vec and WordNet play a key role in solving the problem: word2vec turns words into vectors, so the distance between words can serve as a similarity measure. WordNet is a lexical database of semantic relationships between words in more than 200 languages and is part of the Natural Language Toolkit (NLTK) corpus collection. In this study, n-grams (unigram, bigram, trigram) are implemented on their own and also in combination with WordNet and word2vec. The dataset is the Microsoft Research Sentence Completion Challenge. The models are first trained on the training text, and performance is then evaluated on multiple-choice questions with five options each. The evaluation metrics reported are accuracy, precision, recall, and F1 score. The approach produced satisfactory results.
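
The n-gram part of the Microsoft Research Sentence Completion project can be illustrated with a minimal sketch. This is not the notebook's implementation: the function names, the toy corpus, the `_____` blank marker, and the add-one (Laplace) smoothing are all assumptions made for illustration, and the word2vec and WordNet components are left out. It only shows the general idea of training a bigram model and picking, out of five options, the word that makes the sentence most probable.

```python
from collections import Counter
import math


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under add-one smoothed bigram estimates."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing (an assumption here) keeps unseen bigrams above zero.
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        log_prob += math.log(numerator / denominator)
    return log_prob


def complete_sentence(template, options, unigrams, bigrams, blank="_____"):
    """Return the option whose filled-in sentence has the highest log-probability."""
    return max(
        options,
        key=lambda option: sentence_log_prob(
            template.replace(blank, option), unigrams, bigrams
        ),
    )


# Toy training corpus (illustrative only, not the challenge data).
corpus = [
    "the dog barked at the stranger",
    "the dog chased the cat",
    "the cat sat on the mat",
    "a stranger knocked on the door",
]
unigrams, bigrams = train_bigram_model(corpus)

question = "the dog _____ at the stranger"
options = ["barked", "sat", "door", "mat", "knocked"]
best = complete_sentence(question, options, unigrams, bigrams)
print(best)
```

Scoring the whole filled-in sentence, rather than only the blank word, lets both the bigram ending at the candidate and the bigram starting from it contribute to the choice. Accuracy on a question set would then be the fraction of questions where the selected option matches the labeled answer.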


Made with 💖 by Aziz Ullah Khan
