This analysis was motivated by the company’s request to explore its book sales data and draw useful insights from it. The data contains a quantitative field (price) and a qualitative field (reviews), along with the name of each book and the state where the sale was made. The main question we’re trying to answer is how profitable the book sales are.
During data preparation, several steps were needed to make the data more usable:
- Converting the review column from strings to numeric scores from 1 to 5, with 5 being excellent.
- Filtering out missing data and reviewing it closely.
- Standardizing state names to their abbreviated codes, e.g. from California to CA.
- Aggregating the data using group-by and count functions, among others.
- Sorting the data to find the highest and lowest values of each factor of interest, e.g. the book with the most copies sold, the revenue earned from each book in order, and the book with the most favorable reviews.
- Using control flow and logical conditions to categorize the data.
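The preparation steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used in the analysis: the sample rows, the book titles, the review labels other than "Excellent", the partial state lookup, and the "score of 4 or higher counts as favorable" rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Assumed label-to-score mapping; only "Excellent" = 5 is stated in the text.
REVIEW_SCORES = {"Poor": 1, "Fair": 2, "Good": 3, "Great": 4, "Excellent": 5}

# Partial, illustrative lookup from full state names to abbreviated codes.
STATE_CODES = {"California": "CA", "New York": "NY", "Texas": "TX", "Florida": "FL"}

# Made-up sample rows: (book, review, state, price). Not the real dataset.
SALES = [
    ("Book A", "Excellent", "California", 20.0),
    ("Book A", "Good", "CA", 20.0),
    ("Book B", None, "Texas", 70.0),  # missing review
    ("Book B", "Great", "New York", 70.0),
    ("Book A", "Poor", "FL", 20.0),
]


def prepare(rows):
    """Drop rows with a missing review, score reviews, abbreviate states."""
    cleaned = []
    for book, review, state, price in rows:
        if review is None:
            continue  # filter out missing data
        score = REVIEW_SCORES[review]
        # Control flow used to categorize; the threshold of 4 is an assumption.
        if score >= 4:
            favorable = True
        else:
            favorable = False
        cleaned.append({
            "book": book,
            "score": score,
            # Already-abbreviated codes pass through unchanged.
            "state": STATE_CODES.get(state, state),
            "price": price,
            "favorable": favorable,
        })
    return cleaned


def summarize(cleaned):
    """Count copies sold per book and rank books by total revenue."""
    copies = Counter(row["book"] for row in cleaned)
    revenue = defaultdict(float)
    for row in cleaned:
        revenue[row["book"]] += row["price"]
    by_revenue = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)
    return copies, by_revenue


cleaned = prepare(SALES)
copies, by_revenue = summarize(cleaned)
print("Most copies sold:", copies.most_common(1))
print("Revenue ranking:", by_revenue)
```

In this toy data the book with the most copies sold is not the one with the highest revenue, which is why the two rankings are computed separately.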
In conclusion, the most profitable book in terms of revenue earned was Secrets of R For Advanced Students, and the book with the highest number of copies sold was Fundamentals of R For Beginners. Additional data could show what time of year most sales are made, which would help in deciding whether to run a sales and marketing campaign and when best to run it. These findings could be useful to the stores and to the procurement or publishing section of the company, which now have a clearer picture of which books to stock more of.