Property Cost (Analysis and Cleaning)

Aadil Aftab Shaik
3 min read · Dec 24, 2020

This is a data cleaning challenge set by a friend: analysing and cleaning a dataset of property costs in Dubai. I used Pandas, NumPy, regular expressions, Seaborn, and scikit-learn to clean and analyse the data.

I imported the CSV file of Dubai property costs into a variable named ‘data’ and took a first look at it.
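A minimal sketch of that first look, assuming a CSV with the column names mentioned in this post. The file name is not given here, so an in-memory string with made-up rows stands in for the real file.

```python
import io
import pandas as pd

# Stand-in for the real CSV file; the rows below are invented for illustration.
csv_text = io.StringIO(
    'type,Location,cost,area,no.of bed,Bathrooms\n'
    'Apartment,Dubai Marina,"AED 1,200,000","1,250 sqft",2,2\n'
    'Villa,Palm Jumeirah,"AED 9,500,000",,7+,7+\n'
)
data = pd.read_csv(csv_text)

# Column dtypes and non-null counts give a quick picture of how messy the data is.
data.info()
# Number of missing values per column.
print(data.isna().sum())
```

With the real file, the `io.StringIO` object would be replaced by the path to the CSV.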


As you can see, the data is messy. First, I cleaned the ‘cost’ and ‘area’ features with the help of the regular expressions library.


I wrote a loop that goes over the ‘cost’ feature and extracts the numeric value from each string. Then I built a Series from these numbers and used it to replace the ‘cost’ feature.
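The loop can be sketched roughly as follows. The sample strings are hypothetical, since the exact format of the raw ‘cost’ values is not shown here, and the pattern simply keeps the digits.

```python
import re
import pandas as pd

# Hypothetical sample; the real 'cost' strings may be formatted differently.
data = pd.DataFrame({"cost": ["AED 1,200,000", "850,000 AED", "AED 75,500"]})

costs = []
for raw in data["cost"]:
    # Drop everything that is not a digit (currency label, spaces, thousands separators).
    digits = re.sub(r"[^\d]", "", str(raw))
    costs.append(int(digits))

# Replace the original string column with the numeric Series.
data["cost"] = pd.Series(costs, index=data.index)
```

Stripping all non-digits only works if the costs have no decimal part; a value such as "1,200.50" would need a different pattern.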

I did the same with the ‘area’ feature, but wrapped the extraction in try-except so that NaN values would not raise an error.
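A sketch of the same idea with try-except, again on hypothetical sample values (the "sqft" unit is an assumption). A NaN is a float rather than a string, so `re.search` raises a `TypeError` on it, which the `except` branch turns back into NaN.

```python
import re
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value.
data = pd.DataFrame({"area": ["1,250 sqft", np.nan, "680 sqft"]})

areas = []
for raw in data["area"]:
    try:
        # Raises TypeError when raw is NaN, because NaN is not a string.
        match = re.search(r"[\d,]+", raw)
        areas.append(float(match.group().replace(",", "")))
    except (TypeError, AttributeError, ValueError):
        # AttributeError: no digits found (match is None).
        # ValueError: the matched text was not a valid number.
        areas.append(np.nan)

data["area"] = pd.Series(areas, index=data.index)
```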

Next comes the cleaning of the ‘no. of bed’ and ‘Bathrooms’ features. Here, I simply replaced ‘studio’ with 0 and ‘7+’ with 8.

data['no.of bed'].unique()

Then, I converted these values into numeric values.
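The replacement and conversion could look something like this. The sample rows are made up; the column names are the ones used in this post.

```python
import pandas as pd

# Hypothetical sample values for the two columns.
data = pd.DataFrame({
    "no.of bed": ["studio", "1", "3", "7+"],
    "Bathrooms": ["1", "2", "7+", "4"],
})

for col in ["no.of bed", "Bathrooms"]:
    # Exact-match replacement of the two non-numeric labels.
    data[col] = data[col].replace({"studio": "0", "7+": "8"})
    # Convert the strings to numbers; anything unparseable becomes NaN.
    data[col] = pd.to_numeric(data[col], errors="coerce")

print(data["no.of bed"].unique())
```

Mapping ‘7+’ to 8 is a simplification: it treats every property with more than seven rooms as having exactly eight.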


There are two categorical features, ‘type’ and ‘Location’. I replaced each category with its rank when the categories are ordered by cost, ascending.
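One way to sketch this ranking, assuming that "based on the cost" means the mean cost of each category (the median would work the same way). The sample rows are hypothetical, and only ‘type’ is shown; ‘Location’ would be handled identically.

```python
import pandas as pd

# Hypothetical sample; costs are already numeric at this point.
data = pd.DataFrame({
    "type": ["Apartment", "Villa", "Apartment", "Townhouse", "Villa"],
    "cost": [900_000, 4_000_000, 1_100_000, 2_000_000, 5_000_000],
})

# Mean cost per category, sorted from cheapest to most expensive.
mean_cost = data.groupby("type")["cost"].mean().sort_values()

# Map each category to its 1-based position in that ordering.
ranks = {category: rank for rank, category in enumerate(mean_cost.index, start=1)}
data["type"] = data["type"].map(ranks)
```

Note that this encoding uses the target (‘cost’) to build a feature, which is fine for exploration but would leak information if the ranks were computed on the same rows a model is later evaluated on.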


As you can see, there are many NaN values in the data. I fixed that by replacing them with the median.
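A minimal sketch of median imputation, assuming each NaN is filled with the median of its own column. The sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns with gaps.
data = pd.DataFrame({
    "area": [700.0, np.nan, 1200.0, 900.0],
    "Bathrooms": [1.0, 2.0, np.nan, 3.0],
})

# data.median() returns one median per column (NaNs are skipped);
# fillna then uses the matching median for each column.
data = data.fillna(data.median(numeric_only=True))
```

The median is less sensitive than the mean to a few very expensive or very large properties, which makes it a reasonable default for skewed data like this.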


As you can see, there is no strong relationship between the features, because the dataset is small and random.
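One common way to check for such relationships is a correlation matrix; this is a sketch and not necessarily the plot used here. The toy values below are invented and say nothing about the real dataset.

```python
import pandas as pd

# Invented numeric data, only to show the mechanics.
data = pd.DataFrame({
    "cost": [900_000, 4_000_000, 1_100_000, 2_000_000, 5_000_000],
    "area": [800, 3500, 950, 1800, 4200],
    "no.of bed": [1, 4, 2, 3, 5],
})

# Pairwise Pearson correlation between all numeric columns.
corr = data.corr()

# How strongly each feature correlates with cost, strongest first.
print(corr["cost"].sort_values(ascending=False))
```

The resulting matrix can be passed to `seaborn.heatmap(corr, annot=True)` for a visual overview.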



I did all this to practice cleaning data; the scraped dataset was too small and random to build a model from.

Project GitHub: Property Cost

