Technolvers launched the Data Explainability Challenge (DEC) to emphasize the importance of data science and data analytics in tackling real-world problems such as the current COVID-19 pandemic. Participants were asked to rank the top five regions of Pakistan, from Most Successful to Most Struggling, based on the provided COVID-19 dataset. Our three-member team (Farhana Shafi, Rahat Ul Ain, and team lead Bia Chaudhry) performed the analysis in four major steps. First, we ran a time-series analysis to examine region-wise trends in several factors. Second, we compared all the regions using parameter ratios. Third, we drew insights from these comparisons. Finally, we ranked the top five regions. Alhamdulillah! We placed second runner-up in this all-Pakistan DEC competition. Special thanks to Dr. Semaab Latif for mentoring us. The details of our analysis can be found here: Source Code
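The time-series, ratio, and ranking steps above can be illustrated with a small standard-library sketch. It is not the team's actual code, and several things in it are hypothetical:

- the dataset layout `(region, date, confirmed, recovered, deaths)`;
- the placeholder region names and numbers;
- the scoring rule, which is recovery ratio minus fatality ratio.

```python
from collections import defaultdict
from datetime import date

# Toy daily records: (region, date, confirmed, recovered, deaths).
# Region names, column layout, and numbers are hypothetical placeholders.
RECORDS = [
    ("Region A", date(2020, 6, 1), 1000, 400, 20),
    ("Region A", date(2020, 6, 2), 1100, 450, 22),
    ("Region B", date(2020, 6, 1), 900, 500, 15),
    ("Region B", date(2020, 6, 2), 950, 560, 16),
    ("Region C", date(2020, 6, 1), 300, 100, 12),
    ("Region C", date(2020, 6, 2), 320, 110, 14),
]


def region_series(records):
    """Group records into one date-sorted time series per region."""
    series = defaultdict(list)
    for region, day, confirmed, recovered, deaths in records:
        series[region].append((day, confirmed, recovered, deaths))
    for rows in series.values():
        rows.sort()  # tuples sort by their first field, the date
    return dict(series)


def daily_new_cases(rows):
    """Day-over-day change in cumulative confirmed cases (a simple trend)."""
    return [cur[1] - prev[1] for prev, cur in zip(rows, rows[1:])]


def latest_ratios(series):
    """Recovery and fatality ratios from each region's most recent row."""
    ratios = {}
    for region, rows in series.items():
        _, confirmed, recovered, deaths = rows[-1]
        ratios[region] = {
            "recovery": recovered / confirmed,
            "fatality": deaths / confirmed,
        }
    return ratios


def rank_regions(ratios, top=5):
    """Order regions from most successful to most struggling.

    Hypothetical score: recovery ratio minus fatality ratio.
    """
    def score(region):
        return ratios[region]["recovery"] - ratios[region]["fatality"]

    return sorted(ratios, key=score, reverse=True)[:top]


if __name__ == "__main__":
    series = region_series(RECORDS)
    print(daily_new_cases(series["Region A"]))  # [100]
    print(rank_regions(latest_ratios(series)))  # ['Region B', 'Region A', 'Region C']
```

A real ranking would combine more factors than two ratios. The point here is only the shape of the pipeline: series, then ratios, then ranking.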
I collected and analysed data to answer the question: "What motivates you to read?" I applied clustering and a decision tree approach to generate the results. The details can be found in my Medium blog: Reading Habit Analysis - Medium Blog Post. I ran the analysis in a Kaggle kernel, and the source code is available on Kaggle: Source Code. You are very welcome to use the dataset and improve these results. It could also be used to build an end-to-end system that motivates users to read.
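The clustering step can be sketched with plain k-means (Lloyd's algorithm) in the standard library. The project's exact algorithm and features are not stated, so treat this as an illustration only:

- The encoding is hypothetical: each respondent rates three motivations on a 1 to 5 scale.
- The six responses are toy values, not survey data.
- The decision tree part is not shown.

```python
import math
import random

# Hypothetical encoded answers: each respondent rates three motivations
# (knowledge, entertainment, relaxation) on a 1-5 scale. Toy values only.
RESPONSES = [
    (5, 1, 2), (4, 2, 1), (5, 2, 2),  # mostly knowledge-driven
    (1, 5, 4), (2, 4, 5), (1, 5, 5),  # mostly entertainment-driven
]


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen respondents
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(RESPONSES, k=2)
    print(labels)
    print(centroids)
```

The cluster labels could then serve as targets for a classifier such as a decision tree. That pairing is one plausible design, not a confirmed description of the original kernel.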
Using NLP techniques, I measured the similarity between several documents. The techniques I used include:
- TF-IDF
- Cosine similarity
- Soft cosine similarity
- fastText word embeddings
- Clustering: dendrograms
- Visualization: network graph
- Evaluation metric: accuracy
The detailed code with further explanation can be found here: Code & Description. The presentation for this project can be found here: Presentation
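The first three techniques in the list can be sketched in the standard library: TF-IDF, cosine similarity, and soft cosine similarity. The sketch makes these assumptions:

- Whitespace tokenization.
- A common IDF variant, `log(n / df) + 1`.
- Toy documents.
- A hand-set term-similarity matrix `S`. In the original project these term similarities came from fastText embeddings, which are not reproduced here.

Soft cosine is computed as `aᵀSb / (sqrt(aᵀSa) · sqrt(bᵀSb))`, which reduces to ordinary cosine when `S` is the identity.

```python
import math
from collections import Counter

# Toy documents (hypothetical); two share meaning via a synonym pair.
DOCS = ["cheap flight tickets", "inexpensive flight tickets", "football match tickets"]


def tokenize(text):
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # one common IDF variant
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def term_similarity(vocab, pairs):
    """Identity matrix plus symmetric similarities for the given term pairs."""
    index = {t: i for i, t in enumerate(vocab)}
    s = [[1.0 if i == j else 0.0 for j in range(len(vocab))] for i in range(len(vocab))]
    for (t1, t2), sim in pairs.items():
        s[index[t1]][index[t2]] = s[index[t2]][index[t1]] = sim
    return s


def soft_cosine(a, b, s):
    """Soft cosine similarity; s[i][j] is the similarity of terms i and j."""
    def inner(u, v):
        return sum(u[i] * s[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(inner(a, a)) * math.sqrt(inner(b, b))
    return inner(a, b) / denom if denom else 0.0


if __name__ == "__main__":
    vocab, vecs = tf_idf_vectors(DOCS)
    s = term_similarity(vocab, {("cheap", "inexpensive"): 0.9})  # hypothetical value
    print(cosine(vecs[0], vecs[1]), soft_cosine(vecs[0], vecs[1], s))
```

Soft cosine rewards "cheap" and "inexpensive" for being related even though they never co-occur, which plain cosine cannot do. The resulting pairwise scores could then feed hierarchical clustering (dendrograms) or a network graph.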
This is primarily a data analysis project, though you could always frame a data science problem around it. I collected data from final-year university students (mainly from my university, NUST) while I was still in my final year. The survey aimed to capture the graduating class's sentiments about several aspects of their university and other experiences. The details of this project can be found in my blog post: Survey of Final Year Students
This study provides a brief overview of how the productivity of different sectors was affected by the current pandemic.
The data was collected using Google Forms and exported as a CSV file of structured rows and columns, which was used to generate the visualizations. Around 150 people participated in the data collection phase. The report is five pages long.
The report and visualization charts were created using Google Data Studio, which makes it easy to create and manage analytical reports.
The report presents the charts along with descriptions of the results and data insights.
From the current analysis, we can draw several insights such as:
- Male participants outnumbered female participants.
- Most of the participants were students.
- These two skews reflect the audience that the Google Form reached most.
- Overall productivity decreased.
- The reduction in productivity is attributed to unfamiliarity with distance learning and working from home.
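The report itself was built in Google Data Studio. Tallies like the ones behind these insights can also be reproduced from the CSV export with Python's `csv` module. In this sketch:

- The column names `gender`, `occupation` and `productivity` are hypothetical, since the form's actual headers are not shown.
- The five rows are toy data, not the roughly 150 real responses.

```python
import csv
import io
from collections import Counter

# Toy CSV standing in for a Google Forms export; headers and rows are hypothetical.
SAMPLE_CSV = """gender,occupation,productivity
Male,Student,Decreased
Female,Student,Decreased
Male,Employee,Increased
Male,Student,Decreased
Female,Employee,No change
"""


def tally(csv_text, column):
    """Count how often each answer appears in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip() for row in reader)


def share(counter, key):
    """Fraction of responses equal to `key`."""
    total = sum(counter.values())
    return counter[key] / total if total else 0.0


if __name__ == "__main__":
    print(tally(SAMPLE_CSV, "gender"))
    print(share(tally(SAMPLE_CSV, "productivity"), "Decreased"))
```

With a real export you would read the file instead, for example `open("responses.csv", newline="")`, where the file name is hypothetical. Pass the handle to `csv.DictReader` in place of the `StringIO` wrapper.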
In conclusion, such analysis helps us tackle situations like the current pandemic more effectively and lets us use technology to move forward with maximum productivity. FULL REPORT
I have worked on data-driven web applications, which provide an interactive view of data in the form of a dashboard. I found Streamlit (a Python framework) very useful for this, so I built and deployed an interactive web application using Python and Streamlit. The application can:
1. Show the dataset of the user's choice.
2. Show the products of the selected sellers.
3. Show the per-year MRP, discount %, and sale price.
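The data logic behind features 2 and 3 can be sketched as plain Python functions that a Streamlit page could call. The Streamlit widgets are left out so the sketch runs without Streamlit installed. Several parts are assumptions, not the app's actual code:

- the record fields (`seller`, `product`, `year`, `mrp`, `sale_price`);
- the toy values;
- computing discount % as `(MRP - sale price) / MRP × 100`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical product records; the field names are assumptions.
PRODUCTS = [
    {"seller": "Seller A", "product": "Product 1", "year": 2019, "mrp": 2000.0, "sale_price": 1500.0},
    {"seller": "Seller A", "product": "Product 2", "year": 2020, "mrp": 3000.0, "sale_price": 2700.0},
    {"seller": "Seller B", "product": "Product 3", "year": 2020, "mrp": 1000.0, "sale_price": 600.0},
]


def products_for_sellers(rows, sellers):
    """Feature 2: keep only the products of the selected sellers."""
    wanted = set(sellers)
    return [r for r in rows if r["seller"] in wanted]


def discount_percent(mrp, sale_price):
    """Discount as a percentage of MRP (assumed definition)."""
    return (mrp - sale_price) / mrp * 100


def yearly_summary(rows):
    """Feature 3: mean MRP, discount %, and sale price per year."""
    by_year = defaultdict(list)
    for r in rows:
        by_year[r["year"]].append(r)
    summary = {}
    for year in sorted(by_year):
        group = by_year[year]
        summary[year] = {
            "mrp": mean(r["mrp"] for r in group),
            "discount_pct": mean(discount_percent(r["mrp"], r["sale_price"]) for r in group),
            "sale_price": mean(r["sale_price"] for r in group),
        }
    return summary


if __name__ == "__main__":
    print(products_for_sellers(PRODUCTS, ["Seller B"]))
    print(yearly_summary(PRODUCTS))
```

In a Streamlit script these functions would sit behind widgets such as `st.multiselect` (to pick sellers). Results could be displayed with `st.dataframe` or `st.line_chart`. Keeping the logic in plain functions makes it testable independently of the UI.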
You can find the complete source code and explanation in my Medium post here
Happy Data Sciencing :)