These questions are for data practitioners (data analysts and data scientists) and are grouped into three levels: Beginner, Intermediate, and Advanced.
The goal of this competition is to provide a platform for individuals to learn, build, and collaborate on innovative, impactful projects that address real-world challenges. Through the competition, participants will have the opportunity to develop their skills in areas such as programming, data analysis/science, and machine learning, while collaborating with others to create solutions that can make a positive difference in the world.
By the end of the competition, participants will have gained valuable experience, expanded their network, and contributed to innovative projects that can help address important societal issues.
- Excel
- SQL
- Python
- How do you use Excel to clean and transform messy data, including techniques like filtering, sorting, and text manipulation?
- Can you explain the concept of data validation in Excel and describe some techniques for enforcing data integrity and preventing errors?
- How would you use Excel to build and analyze pivot tables and charts, including techniques for summarizing and aggregating data?
- Can you explain the basic concepts of relational databases and SQL and describe how they are used to store and retrieve data?
- How do you use SQL to create and modify database objects like tables, views, indexes, and stored procedures?
- How do you write basic SQL queries to extract data from a database, including techniques like filtering, sorting, and grouping data?
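As one possible illustration for the SQL questions above (not a model answer), the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, the index and view names, and the sample rows are all hypothetical. It shows DDL for a table, an index, and a view, an `ALTER TABLE` modification, and a query that filters with `WHERE`, groups with `GROUP BY`/`HAVING`, and sorts with `ORDER BY`. SQLite does not support stored procedures, so that part of the question would need a server database such as PostgreSQL or SQL Server.

```python
import sqlite3

# In-memory database; every table, column, index, and view name is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: a table with a simple integrity constraint (CHECK).
cur.execute("""
    CREATE TABLE sales (
        id      INTEGER PRIMARY KEY,
        region  TEXT NOT NULL,
        product TEXT NOT NULL,
        amount  REAL NOT NULL CHECK (amount >= 0)
    )
""")

# MODIFY: add a column with a default value to the existing table.
cur.execute("ALTER TABLE sales ADD COLUMN channel TEXT DEFAULT 'online'")

# An index to speed up lookups by region, and a view over large orders.
cur.execute("CREATE INDEX idx_sales_region ON sales (region)")
cur.execute("CREATE VIEW large_sales AS SELECT * FROM sales WHERE amount >= 100")

rows = [
    ("North", "Laptop", 1200.0),
    ("North", "Mouse", 25.0),
    ("South", "Laptop", 1100.0),
    ("South", "Monitor", 300.0),
    ("West", "Mouse", 20.0),
]
# Parameterised inserts (?) avoid building SQL strings by hand.
cur.executemany("INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)", rows)
conn.commit()

# Filter rows (WHERE), aggregate per group (GROUP BY), filter groups (HAVING), sort (ORDER BY).
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM sales
    WHERE amount > 10
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY total DESC
"""
summary = cur.execute(query).fetchall()
for region, n_orders, total in summary:
    print(region, n_orders, total)
```

With these sample rows the query returns South (2 orders, 1400.0) followed by North (2 orders, 1225.0); West is removed by the `HAVING` clause because its total is only 20.0.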
- How do you use Python libraries like NumPy and Pandas to preprocess and manipulate data for machine learning applications?
- How would you load and preprocess a large dataset in Python using libraries like Pandas and NumPy, and what techniques would you use to clean and transform the data?
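For the Python questions, here is a minimal cleaning sketch with pandas and NumPy, assuming the data arrives as a CSV file. A small in-memory string (`RAW_CSV`, made-up data) stands in for a large file; with a real file you would pass its path to `pd.read_csv` instead. Reading with `chunksize` lets each piece be cleaned by the same function (`clean_chunk`, a hypothetical helper) before the pieces are combined. The plausible age range, median imputation, and log transform are illustrative assumptions, not the only reasonable choices.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw data standing in for a large CSV on disk.
RAW_CSV = """customer_id,signup_date,country,age,spend
1,2023-01-05, nigeria ,34,120.5
2,2023-02-11,Ghana,,80
2,2023-02-11,Ghana,,80
3,not a date,KENYA,29,
4,2023-03-20,Nigeria,230,45.0
"""


def clean_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Normalise text, coerce types, and blank out implausible values in one chunk."""
    chunk = chunk.copy()
    # Text manipulation: trim whitespace and standardise capitalisation.
    chunk["country"] = chunk["country"].str.strip().str.title()
    # Unparseable dates become NaT instead of raising an error.
    chunk["signup_date"] = pd.to_datetime(chunk["signup_date"], errors="coerce")
    # Ages outside an assumed plausible range (0 to 120) are treated as missing.
    chunk["age"] = chunk["age"].where(chunk["age"].between(0, 120))
    return chunk


# Read and clean the file a few rows at a time, then combine the cleaned chunks.
reader = pd.read_csv(io.StringIO(RAW_CSV), chunksize=2)
df = pd.concat((clean_chunk(c) for c in reader), ignore_index=True)

# Drop duplicates after combining, because a duplicate can sit in a different chunk.
df = df.drop_duplicates().reset_index(drop=True)

# Impute missing numeric values with the column median (one simple strategy).
for col in ["age", "spend"]:
    df[col] = df[col].fillna(df[col].median())

# NumPy transform: log1p compresses a right-skewed spend distribution.
df["log_spend"] = np.log1p(df["spend"])

print(df)
```

With `chunksize=2`, the two `customer_id` 2 rows land in different chunks, which is why deduplication happens after `pd.concat`. Concatenating still holds the full cleaned result in memory; for data that genuinely does not fit, you would aggregate each chunk or write it out as it is cleaned instead.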
- Exploratory Data Analysis for Customer Sales: Conduct exploratory data analysis on a customer sales dataset, identifying trends, patterns, and insights to inform business decision-making. This could involve using statistical analysis techniques, data visualization, and hypothesis testing.
- Predicting House Prices: Develop a machine learning model to predict house prices based on features like location, square footage, and number of bedrooms/bathrooms. This could involve data cleaning, feature engineering, model selection, and evaluation metrics.
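For the house-price project, a reasonable first step is a simple baseline that later models can be compared against. The sketch below uses only NumPy and fits ordinary least squares on synthetic data; the feature names, the three-level location encoding, and the price formula are assumptions made up for illustration, not real housing data. It one-hot encodes the categorical location feature, makes a random 80/20 train/test split, and reports RMSE and R² alongside a predict-the-mean baseline. A real submission would more likely use a library such as scikit-learn and compare several models with cross-validation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data for illustration only; the "true" price formula is an assumption.
n = 500
sqft = rng.uniform(500, 3500, n)
bedrooms = rng.integers(1, 6, n)        # 1 to 5 bedrooms
bathrooms = rng.integers(1, 4, n)       # 1 to 3 bathrooms
location = rng.integers(0, 3, n)        # hypothetical codes: 0=rural, 1=suburban, 2=urban
noise = rng.normal(0, 20_000, n)
price = (50_000 + 150 * sqft + 10_000 * bedrooms
         + 15_000 * bathrooms + 40_000 * location + noise)

# Feature engineering: one-hot encode location, dropping one level so the
# dummy columns are not collinear with the intercept column.
loc_onehot = np.eye(3)[location][:, 1:]
X = np.column_stack([np.ones(n), sqft, bedrooms, bathrooms, loc_onehot])

# Random 80/20 train/test split.
idx = rng.permutation(n)
split = int(0.8 * n)
train, test = idx[:split], idx[split:]

# Ordinary least squares fit on the training rows only.
coef, *_ = np.linalg.lstsq(X[train], price[train], rcond=None)

# Evaluate on held-out rows.
pred = X[test] @ coef
rmse = np.sqrt(np.mean((price[test] - pred) ** 2))
ss_res = np.sum((price[test] - pred) ** 2)
ss_tot = np.sum((price[test] - price[test].mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Baseline: always predict the training-set mean price.
baseline_rmse = np.sqrt(np.mean((price[test] - price[train].mean()) ** 2))

print(f"RMSE: {rmse:,.0f}  R^2: {r2:.3f}  baseline RMSE: {baseline_rmse:,.0f}")
```

Because these synthetic prices are generated from a linear formula plus noise, a linear model fits them well by construction; real prices usually need more feature engineering, outlier handling, and residual checks before a linear baseline looks this good.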
- Create a new branch or fork this repo. Put your answers to the theory questions in a README or `.txt` file.
- Solutions to the hands-on projects can also be added to the repository.
- Create a pull request when you are ready to submit.