Car Price Prediction

Aadil Aftab Shaik
Dec 15, 2020

This is my linear regression project: analysing a car dataset and predicting car prices from it. I used the pandas, NumPy, statsmodels, Seaborn, and scikit-learn libraries to analyse the data and build the model.

I imported the dataset into a DataFrame called ‘data’ and inspected it with the ‘head’ and ‘transpose’ methods.
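A minimal sketch of this step, assuming pandas. The post doesn't give the file name, so the sketch reads a few made-up rows from an in-memory CSV (the column names and values are hypothetical). With the real dataset you would pass its file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: made-up rows that use '?'
# for missing values, as the raw data in the post does.
raw_csv = io.StringIO(
    "make,num_of_cylinders,horsepower,price\n"
    "make_a,four,111,13495\n"
    "make_b,five,110,?\n"
    "make_c,six,?,16430\n"
)
data = pd.read_csv(raw_csv)

print(data.head())              # first (up to) five rows
print(data.head().transpose())  # same rows with columns shown as rows
```

Transposing is handy when there are many columns: each column becomes a row, so nothing is cut off in the printout.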

After inspecting the data, I realised that many columns were not useful for the model, so I removed them with the ‘drop’ method.
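A minimal sketch of dropping columns with pandas. The post doesn't list which columns were removed, so ‘make’ and ‘body_style’ below are hypothetical choices on a made-up frame.

```python
import pandas as pd

# Made-up example frame; 'make' and 'body_style' are hypothetical
# examples of columns judged not useful for the model.
data = pd.DataFrame({
    "make": ["make_a", "make_b", "make_c"],
    "body_style": ["sedan", "wagon", "sedan"],
    "horsepower": ["111", "110", "?"],
    "price": ["13495", "?", "16430"],
})

# drop() returns a new DataFrame, so reassign it to keep the change.
data = data.drop(columns=["make", "body_style"])
print(data.columns.tolist())  # ['horsepower', 'price']
```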

Next comes data cleaning. Many entries in the DataFrame are ‘?’ placeholders for missing values, so I replaced them with NaN.
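A minimal sketch of the replacement, assuming pandas and NumPy and a small made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up values; '?' marks a missing entry.
data = pd.DataFrame({
    "horsepower": ["111", "110", "?"],
    "price": ["13495", "?", "16430"],
})

# Turn every '?' cell into NaN so pandas treats it as missing.
data = data.replace("?", np.nan)
print(data.isna().sum())  # one missing value in each column
```

Alternatively, `pd.read_csv(path, na_values="?")` converts the placeholders while the file is being loaded.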

I then created a new column, ‘cylinders’, that holds the values of the ‘num_of_cylinders’ column (stored as words such as ‘four’) as numbers, and converted the remaining string-valued columns to floats with the ‘astype’ method.
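One way to do this, sketched with a hypothetical word-to-number dictionary and made-up rows. Dropping the original text column afterwards is an assumption, made so that every remaining column can be cast to float.

```python
import numpy as np
import pandas as pd

# Made-up rows after the '?' -> NaN step.
data = pd.DataFrame({
    "num_of_cylinders": ["four", "six", "four", "eight"],
    "horsepower": ["111", "154", np.nan, "184"],
    "price": ["13495", "16500", "13950", np.nan],
})

# Hypothetical mapping; it must cover every label present in the data,
# otherwise map() leaves NaN for the unmatched ones.
word_to_number = {"two": 2, "three": 3, "four": 4, "five": 5,
                  "six": 6, "eight": 8, "twelve": 12}
data["cylinders"] = data["num_of_cylinders"].map(word_to_number)

# Drop the text column (an assumption) and cast the rest to float;
# numeric strings become numbers and NaN stays NaN.
data = data.drop(columns="num_of_cylinders").astype(float)
print(data.dtypes)
```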

I replaced the NaN values in every column with that column's median, and visualised the data with a pair plot using Seaborn's ‘pairplot’ function.

Pair Plot of all the data
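The median imputation can be sketched in pandas as follows, on made-up numeric data:

```python
import numpy as np
import pandas as pd

# Made-up numeric data with one gap per column.
data = pd.DataFrame({
    "horsepower": [111.0, 154.0, np.nan, 184.0],
    "price": [13495.0, 16500.0, 13950.0, np.nan],
})

# data.median() returns one median per column (NaN is skipped);
# passing that Series to fillna fills each column with its own median.
data = data.fillna(data.median())
print(data)
```

Once the data is clean, `seaborn.pairplot(data)` draws a scatter plot for every pair of numeric columns, with each column's distribution on the diagonal.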

I split the data into the independent variables (x) and the target variable (y). Since the goal is to predict price, I used the price column as ‘y’ and the remaining columns as ‘x’.
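A minimal sketch of the split, assuming the target column is named ‘price’ (the column names and values here are made up):

```python
import pandas as pd

# Made-up cleaned data.
data = pd.DataFrame({
    "horsepower": [111.0, 154.0, 102.0, 115.0],
    "cylinders": [4.0, 6.0, 4.0, 5.0],
    "price": [13495.0, 16500.0, 13950.0, 17450.0],
})

# Target: the price column. Features: every other column.
y = data["price"]
x = data.drop(columns="price")
```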

I further split ‘x’ and ‘y’ into training and test sets with scikit-learn's ‘train_test_split’ function, using 25% of the data as the test set and the remaining 75% as the training set.
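A sketch of the 75/25 split with scikit-learn on 20 placeholder rows. The `random_state` value is an assumption (the post doesn't mention one); fixing it simply makes the shuffle reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with 20 rows, only to show the split sizes.
x = pd.DataFrame({"horsepower": range(60, 260, 10)})
y = pd.Series(range(20), name="price")

# test_size=0.25 keeps 25% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)
print(len(x_train), len(x_test))  # 15 5
```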

I then found the best-fit line for the model by calling the ‘fit’ method of scikit-learn's LinearRegression on the training sets of ‘x’ and ‘y’.
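For reference, with n feature columns the model predicts the price of row i as an intercept c plus one slope m per feature, and "best fit" means choosing c and the m values that minimise the sum of squared errors over the N training rows (ordinary least squares, which is what LinearRegression fits). With more than one feature this is strictly a hyperplane rather than a line.

```latex
\hat{y}_i = c + m_1 x_{i1} + m_2 x_{i2} + \dots + m_n x_{in}

\min_{c,\, m_1, \dots, m_n} \; \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```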

I printed the coefficients (m) and the intercept (c) of the model from its ‘coef_’ and ‘intercept_’ attributes, respectively.

M and C
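A minimal sketch of fitting and reading the parameters with scikit-learn. The toy training data is made up and generated from a known rule (price = 1000 + 50 × horsepower + 200 × cylinders), so the fitted values can be checked against it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: columns are [horsepower, cylinders].
x_train = np.array([[100, 4], [150, 6], [120, 4], [200, 8], [90, 3]], dtype=float)
y_train = 1000 + 50 * x_train[:, 0] + 200 * x_train[:, 1]

model = LinearRegression()
model.fit(x_train, y_train)  # ordinary least-squares fit

print(model.coef_)       # one slope (m) per feature, about [50, 200]
print(model.intercept_)  # the intercept (c), about 1000
```

Because the toy target has no noise, the fit recovers the generating rule; on real car data the coefficients only approximate the relationship.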

You can check the model's score with the ‘score’ method of LinearRegression, which returns the coefficient of determination (R²) on the data you pass to it.

Score
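R² is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below, on made-up noisy data, scores a fitted model on held-out rows and checks the result against that formula and scikit-learn's `r2_score`. Scoring on the test set is an assumption; the post doesn't say which set was scored.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
x = rng.uniform(50, 250, size=(40, 1))              # made-up horsepower values
y = 3000 + 60 * x[:, 0] + rng.normal(0, 500, 40)    # made-up noisy prices

# First 30 rows for training, last 10 held out for testing.
x_train, x_test = x[:30], x[30:]
y_train, y_test = y[:30], y[30:]

model = LinearRegression().fit(x_train, y_train)

# score() returns R^2 = 1 - SS_res / SS_tot on the data passed in.
test_r2 = model.score(x_test, y_test)
ss_res = np.sum((y_test - model.predict(x_test)) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot
print(test_r2, manual_r2)
```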

I used the statsmodels library to analyse and improve the model further. Its formula interface, which I used here, reads both the target variable (y) and the independent variables (x) from a single DataFrame, so I concatenated x and y into one DataFrame.

I found the best-fit line by passing a formula to the ‘ols’ function and calling ‘fit’ on the result.
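A minimal sketch of these two steps with statsmodels' formula API, on made-up data. Building the formula from the column names is an assumption about how the formula was written; it works as long as the column names are valid Python identifiers.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up features and target, already split into x and y.
x = pd.DataFrame({
    "horsepower": [100.0, 150.0, 120.0, 200.0, 90.0, 160.0, 110.0, 180.0, 130.0, 140.0],
    "cylinders": [4.0, 6.0, 4.0, 8.0, 3.0, 6.0, 4.0, 6.0, 5.0, 5.0],
})
y = pd.Series([9200.0, 13300.0, 10800.0, 16900.0, 8100.0,
               14100.0, 9900.0, 15200.0, 11800.0, 12500.0], name="price")

# The formula interface looks variables up by name in one DataFrame,
# so place the features and the target side by side.
df = pd.concat([x, y], axis=1)

# Builds "price ~ horsepower + cylinders"; an intercept is added automatically.
formula = "price ~ " + " + ".join(x.columns)
results = smf.ols(formula, data=df).fit()
```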

We can get the coefficients (m) and the intercept (c) of this best-fit line from the ‘params’ attribute.

M and C

For more detailed information, such as standard errors, p-values, and R², we can call the ‘summary’ method.

Summary
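A sketch of reading the statsmodels results, refitting the same made-up data so the block runs on its own:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data in a single DataFrame, as the formula interface expects.
df = pd.DataFrame({
    "horsepower": [100.0, 150.0, 120.0, 200.0, 90.0, 160.0, 110.0, 180.0, 130.0, 140.0],
    "cylinders": [4.0, 6.0, 4.0, 8.0, 3.0, 6.0, 4.0, 6.0, 5.0, 5.0],
    "price": [9200.0, 13300.0, 10800.0, 16900.0, 8100.0,
              14100.0, 9900.0, 15200.0, 11800.0, 12500.0],
})
results = smf.ols("price ~ horsepower + cylinders", data=df).fit()

# params is a pandas Series: 'Intercept' is c, the other entries are the slopes m.
print(results.params)

# summary() builds a report with coefficients, standard errors, p-values, R^2, and more.
print(results.summary())
```

With very few rows, statsmodels may warn that some of the normality tests in the summary are unreliable; that is expected for toy data.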

CONCLUSION

I did this project to practise linear regression. Here is what I learned while doing it:

Learned how to identify columns that are not useful for the model so that they can be removed.

Learned how to analyse data from a pair plot.

Learned how to use the statsmodels library to analyse and improve the model.

Project GitHub: Car Price Prediction
