Sales Analysis

Aadil Aftab Shaik
4 min readAug 12, 2020

This is my first Pandas project, a basic data analysis of sales in the year 2019. I used Pandas, OS, Matplotlib, Itertools, and Collections libraries to analyse the data.

I set the max. columns and display width first for all my Pandas work to easily access the data.

I made a list of all the file names that are used in the data analysis with the help of OS library, and then I made an empty data frame to dump all the data into it.

Following I made a for loop to help concatenate all different data sets into one big data frame, and then I merged all the data into the same data frame and saved it to an CSV file.

Now comes the cleaning of data, as I removed all the rows which contain NaN values and removed rows which contains the ‘Or’ string.

To get the highest sales per month, I created a new column called ‘Month’ using the first two digits of the column called ‘Order Date’ and converted them to ‘int32’ data type by using ‘astype’ function.

Then I created a ‘Sales’ column by multiplying the ‘Quantity Ordered’ and ‘Price Each’ columns. Thereafter, I grouped the data by ‘Month’ and added the data and plotted the bar graph of sales per month.

To get the highest sales per city analysis, I created ‘get_city’ and ‘get_state’ function which I used to create the city column by the help of ‘Purchase Address’ column in the data and plotted bar graph of sales per city.

To get the highest orders per hour, I first converted the data in ‘Order Date’ column to date and time format.

Then I created hour and minute columns by the help of ‘Order Date’ column, ‘dt.hour’, and ‘dt.minute’ functions. Thereafter I plotted a line graph of orders per hour.

To find the products most sold together, I made a new data frame called duplicate with data of duplicate Order ID, and then I created a new column called ‘Grouped’ in duplicate data frame by grouping the same Order ID products separated by ‘,’ .

Then I used combinations in Itertools library and Counter in Collections library to iterate through duplicate data frame’s Grouped column and identified the most commonly sold together items.

To find the most sold product and why, I made a new data frame called product_group, by grouping the data of ‘Product’ column from main data frame, ‘data’, and I summed the product_group and entered the data from ‘Quantity Ordered’ into a variable called quantity_ordered.

I then plotted graph of product per quantity ordered, and I also need to show why it’s sold the most. On the first glance it looks like the cheap items sold the most. So for calculating the prices, I grouped ‘Products’ and calculated the mean of ‘Price Each’ and plotted a subplot for product, quantity_ordered, and prices.

CONCLUSION

I did all this to practice Pandas library, and following are all the things I learned while doing this project:

Learned how to concatenate many data sets into one big data frame.

Learned how to create a function to use in apply function.

Learned how to separate the data by hours and minutes.

Learned how to use Itertools and Collections libraries.

Learned how to use sub-plots.

Project Github: Sales-Analysis

--

--