
Exploratory data analysis using Python, Numpy, Pandas, Seaborn for Carsale Advertisement Dataset


rohinegi548/EDA_Carsale_Dataset


EDA - Carsale Advertisement Dataset

Exploratory data analysis using Python, Numpy, Pandas, Seaborn

Table of Contents

  • Problem Statement
  • Data Loading and Description
  • Data Profiling
  • Understanding the Dataset
  • Profiling_1
  • Preprocessing_1
  • Profiling_2
  • Preprocessing_2
  • Post Profiling
  • Conclusions

Problem Statement

This notebook explores the basic use of Pandas and covers the basic commands of Exploratory Data Analysis (EDA), which include cleaning, munging, combining, reshaping, slicing, dicing, and transforming data for analysis.

    Exploratory Data Analysis

    Understand the data through EDA and derive simple models with Pandas as a baseline. EDA is a critical first step in analyzing data, and we do it for the following reasons:

  • Finding patterns in Data
  • Determining relationships in Data
  • Checking of assumptions
  • Preliminary selection of appropriate models
  • Detection of mistakes
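
As a rough illustration of the reasons listed above, the sketch below runs three common first-look Pandas calls on a small made-up frame. The column names are borrowed from this dataset's description, but the values are hypothetical and only there to make the example runnable.

```python
import pandas as pd

# Hypothetical sample; the values are invented for illustration only.
df = pd.DataFrame(
    {
        "price": [15500.0, 20500.0, 35000.0, 17800.0, 9000.0],
        "mileage": [68, 173, 135, 162, 210],
        "year": [2010, 2011, 2008, 2012, 2005],
        "body": ["crossover", "sedan", "other", "van", "sedan"],
    }
)

# Summary statistics of the numeric columns: a quick way to spot
# implausible values (detection of mistakes, checking of assumptions).
summary = df.describe()

# Pairwise linear correlation between numeric columns
# (determining relationships in the data).
corr = df[["price", "mileage", "year"]].corr()

# Frequency of each category (finding patterns in the data).
body_counts = df["body"].value_counts()

print(summary)
print(corr)
print(body_counts)
```

On the real data these calls only give a first impression; any relationship they suggest still needs to be checked with plots and domain knowledge.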

Data Loading and Description

  • The dataset consists of information collected from car sale advertisements, intended for study and practice purposes; most of the cars are used.
  • The dataset comprises 9576 observations of 10 columns. The table below lists the column names and their descriptions.


Column Name     Description
-----------     -----------
car             Manufacturer brand
price           Seller’s price in the advertisement (in USD)
body            Car body type
mileage         Mileage as mentioned in the advertisement ('000 km)
engV            Rounded engine volume ('000 cubic cm)
engType         Type of fuel (“Other” in this case should be treated as NA)
registration    Whether the car is registered in Ukraine or not
year            Year of production
model           Specific model name
drive           Drive type
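
The column layout above can be checked right after loading. The following is a minimal sketch, assuming the data is available as a CSV with these ten column names; the three rows in `sample_csv` are hypothetical stand-ins, not records from the dataset, and the real file path is not given here. It also shows one possible way to apply the note that “Other” in `engType` should be treated as NA.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the column layout described above;
# the values are made up for illustration and are not taken from the dataset.
sample_csv = io.StringIO(
    "car,price,body,mileage,engV,engType,registration,year,model,drive\n"
    "Ford,15500.0,crossover,68,2.5,Gas,yes,2010,Kuga,full\n"
    "Mercedes-Benz,20500.0,sedan,173,1.8,Other,yes,2011,E-Class,rear\n"
    "Nissan,16600.0,crossover,83,2.0,Petrol,yes,2013,X-Trail,full\n"
)

# For the real data, pass the path of the CSV file instead of sample_csv.
df = pd.read_csv(sample_csv)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # inferred type of each column

# Treat the "Other" fuel type as missing, as the column description suggests.
# Series.mask replaces the values where the condition is True with NaN.
df["engType"] = df["engType"].mask(df["engType"] == "Other")

print(df["engType"].isna().sum())  # number of rows now marked as missing
```

Using `mask` keeps the recoding explicit; `replace` would be a reasonable alternative.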

Some Background Information

  • This data was collected from private car sale advertisements in Ukraine and provided by the INSAID team for Exploratory Data Analysis.
  • This is real, raw data with all the usual inconveniences (NA values, for example).
  • The dataset covers more than 9.5K cars for sale in Ukraine. Most of them are used cars, which makes it possible to analyze features related to car operation.
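
Because the raw data contains NA values, a natural first check is how many values are missing in each column. The sketch below is one way to do that, shown on a small hypothetical frame; the gaps are invented for illustration and do not reflect the actual missing-value counts in the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few gaps; the values are illustrative only.
df = pd.DataFrame(
    {
        "car": ["Ford", "Toyota", "BMW", "Audi"],
        "price": [15500.0, 8700.0, 9000.0, 12000.0],
        "engV": [2.5, np.nan, 2.0, np.nan],
        "drive": ["full", None, "rear", "front"],
    }
)

# Number of missing values per column.
missing_count = df.isna().sum()

# Share of missing values per column, as a percentage.
missing_share = df.isna().mean().mul(100).round(1)

# Combine both into one table, with the most incomplete columns first.
summary = pd.DataFrame({"missing": missing_count, "percent": missing_share})
print(summary.sort_values("missing", ascending=False))
```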