Property Cost (Analysis and Cleaning)
This is a data cleaning challenge a friend gave me: analysing and cleaning a dataset of property costs in Dubai. I used the Pandas, NumPy, Regular Expressions, Seaborn, and scikit-learn libraries to clean and analyse the data.
I loaded the CSV file of Dubai property costs into a variable named ‘data’ and started exploring it.
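A minimal sketch of this loading step could look like the following. The rows are made up purely for illustration (the real file and its exact value formats are not shown here), and an in-memory string stands in for the CSV file; only the column names come from this write-up.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the real CSV file; only the column
# names are taken from the post, the values are invented for illustration.
raw_csv = io.StringIO(
    'type,Location,cost,area,no. of bed,Bathrooms\n'
    'Apartment,Dubai Marina,"AED 1,200,000","1,250 sqft",2,2\n'
    'Villa,Palm Jumeirah,"AED 9,500,000",,7+,7+\n'
    'Apartment,JVC,"AED 450,000",400 sqft,studio,1\n'
)

# With a real file this would be pd.read_csv("<path to the csv>")
data = pd.read_csv(raw_csv)

print(data.head())        # first rows, to eyeball the raw strings
print(data.isna().sum())  # number of missing values per column
```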
As you can see, the data is messy. I started by cleaning the ‘cost’ and ‘area’ features with the help of the regular expressions library.
I wrote a loop that goes over the ‘cost’ feature and extracts the numeric value from each string. Then I built a Series from these numbers and used it to replace the original ‘cost’ feature.
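The loop described above can be sketched roughly like this. The sample strings and the pattern are assumptions, since the exact format of the ‘cost’ values is not shown; the real data may need a different pattern.

```python
import re
import pandas as pd

# Hypothetical sample; the real 'cost' strings may be formatted differently.
data = pd.DataFrame({"cost": ["AED 1,200,000", "850,000 AED", "AED 2,500,000"]})

numbers = []
for value in data["cost"]:
    # Remove thousands separators, then take the first run of digits
    # (with an optional decimal part).
    match = re.search(r"\d+(?:\.\d+)?", value.replace(",", ""))
    numbers.append(float(match.group()))

# Replace the string column with the numeric Series.
data["cost"] = pd.Series(numbers, index=data.index)
print(data["cost"])
```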
I did the same with the ‘area’ feature, but wrapped the extraction in a try-except block so that NaN values would not raise an error.
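A sketch of the same idea with try-except, again on made-up values. NaN is a float, so calling a string method on it raises an AttributeError, which is the error caught here; the unit suffix "sqft" is an assumption.

```python
import re
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value.
data = pd.DataFrame({"area": ["1,250 sqft", np.nan, "740 sqft"]})

areas = []
for value in data["area"]:
    try:
        # NaN is a float and has no .replace(), so this raises AttributeError.
        cleaned = value.replace(",", "")
        # re.search returns None when no digits are found; .group() on None
        # also raises AttributeError and is handled the same way.
        areas.append(float(re.search(r"\d+(?:\.\d+)?", cleaned).group()))
    except AttributeError:
        areas.append(np.nan)  # keep the gap; it is filled in a later step

data["area"] = pd.Series(areas, index=data.index)
print(data["area"])
```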
Next comes the cleaning of the ‘no. of bed’ and ‘Bathrooms’ features. Here, I simply replaced ‘studio’ with 0 and ‘7+’ with 8.
Then I converted both columns to a numeric type.
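These two steps could look something like the sketch below. The sample values are hypothetical, and the replacements are written as the strings "0" and "8" so the column stays text until the numeric conversion.

```python
import pandas as pd

# Hypothetical sample values for the two columns.
data = pd.DataFrame({
    "no. of bed": ["studio", "2", "7+", "3"],
    "Bathrooms": ["1", "7+", "2", "3"],
})

for column in ["no. of bed", "Bathrooms"]:
    # Exact-match replacement (not a regex), so "7+" is treated literally.
    data[column] = data[column].replace({"studio": "0", "7+": "8"})
    # Anything that still cannot be parsed becomes NaN instead of raising.
    data[column] = pd.to_numeric(data[column], errors="coerce")

print(data)
```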
There are two categorical features, ‘type’ and ‘Location’. I replaced each category with its rank number, with the categories ranked by cost in ascending order.
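One way to do this kind of cost-based ranking is sketched below. Ranking by the mean cost of each category is an assumption (the median would work just as well), and the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample; 'cost' is assumed to be numeric already.
data = pd.DataFrame({
    "type": ["Apartment", "Villa", "Apartment", "Townhouse", "Villa"],
    "cost": [900000.0, 4000000.0, 1100000.0, 2000000.0, 5000000.0],
})

# Mean cost per category (assumption: the mean is the ranking statistic).
mean_cost = data.groupby("type")["cost"].mean()

# Ascending rank: the cheapest category gets 1, the next one 2, and so on.
ranks = mean_cost.rank(method="dense").astype(int)

# Replace each category label with its rank number.
data["type"] = data["type"].map(ranks)
print(data)
```

The same three lines can be repeated for ‘Location’.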
As you can see, there are many NaN values in the data. I fixed that by replacing them with the median of the respective feature.
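A minimal sketch of median imputation with Pandas, on hypothetical values and assuming each gap is filled with the median of its own column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one gap per column.
data = pd.DataFrame({
    "area": [700.0, np.nan, 1200.0, 900.0],
    "Bathrooms": [1.0, 2.0, np.nan, 3.0],
})

# data.median() returns one median per numeric column (NaN values are
# skipped); fillna then fills each column with its own median.
data = data.fillna(data.median(numeric_only=True))
print(data)
```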
As you can see, there is no strong relationship between the features, because the dataset is small and random.
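A quick way to check for relationships like this is a correlation matrix. The sketch below uses made-up numbers, not the real dataset, so the values it prints say nothing about the actual result.

```python
import pandas as pd

# Hypothetical, already-cleaned sample; not the real data.
data = pd.DataFrame({
    "cost": [900000, 4000000, 1100000, 2000000, 5000000],
    "area": [1200, 800, 3000, 950, 1500],
    "no. of bed": [2, 0, 3, 1, 8],
    "Bathrooms": [1, 2, 2, 3, 1],
})

# Pairwise Pearson correlation between all numeric columns.
corr = data.corr()

# How strongly each feature moves with 'cost', strongest first.
print(corr["cost"].sort_values(ascending=False))
```

The same matrix could be passed to Seaborn's heatmap function for a visual check.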
CONCLUSION
I did all this to practice cleaning data. The scraped data was too small and random to build a model from.
Project GitHub: Property Cost