Property Cost (Analysis and Cleaning)

3 min read · Dec 24, 2020

This is a data cleaning challenge set by a friend: analysing and cleaning a dataset of property costs in Dubai. I used the Pandas, NumPy, Seaborn, and scikit-learn libraries, along with Python’s regular expressions module, to clean and analyse the data.

I imported the CSV file of Dubai property costs into a variable named ‘data’ and started exploring it.

As you can see, the data is messy. First, I cleaned the ‘cost’ and ‘Area’ features with the help of the regular expressions library.

data['cost'].unique()

data['Area'].unique()

I wrote a loop that iterates over the ‘cost’ feature and extracts the numeric value from each string. I then built a Series from these numbers and used it to replace the ‘cost’ feature.
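A minimal sketch of that loop, assuming the raw ‘cost’ values are strings such as 'AED 1,200,000'. The sample values, the regular expression, and the helper name extract_number are made up for illustration and are not taken from the original notebook:

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample; the real 'cost' strings may be formatted differently.
data = pd.DataFrame({"cost": ["AED 1,200,000", "850,000 AED", "AED 2,500,000"]})


def extract_number(text):
    """Return the first number found in text as a float, or NaN if none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    if match is None:
        return np.nan
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


# Loop over the column, collect the numbers, then replace the column.
costs = []
for value in data["cost"]:
    costs.append(extract_number(value))
data["cost"] = pd.Series(costs, index=data.index)
```

Building the new Series with the same index as the DataFrame keeps each number aligned with its original row.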

I did the same for the ‘Area’ feature, but wrapped the extraction in a try-except block so that NaN values would not raise an error.
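One way the try-except version might look, again as a sketch under assumptions: the 'sqft' sample strings are invented, and the exceptions caught here are the ones this particular pattern can raise, which may differ from the original code:

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample; the real 'Area' strings may look different.
data = pd.DataFrame({"Area": ["1,250 sqft", np.nan, "780 sqft"]})

areas = []
for value in data["Area"]:
    try:
        # re.search raises TypeError when value is NaN (a float, not a string).
        match = re.search(r"\d[\d,]*(?:\.\d+)?", value)
        areas.append(float(match.group().replace(",", "")))
    except (TypeError, AttributeError):
        # TypeError: value was not a string; AttributeError: no digits found.
        areas.append(np.nan)
data["Area"] = pd.Series(areas, index=data.index)
```

Catching only these two exception types keeps missing values as NaN without hiding unrelated errors.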

Next comes the cleaning of the ‘no.of bed’ and ‘Bathrooms’ features. Here, I simply replaced ‘studio’ with 0 and ‘7+’ with 8.

data['no.of bed'].unique()

data['Bathrooms'].unique()

Then, I converted these values into numeric values.
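The replacement and conversion steps can be sketched as follows. The sample rows are hypothetical, and using replace followed by pd.to_numeric is one reasonable way to do it, not necessarily the exact calls used in the original:

```python
import pandas as pd

# Hypothetical sample; column names follow the ones used above.
data = pd.DataFrame({
    "no.of bed": ["studio", "1", "3", "7+"],
    "Bathrooms": ["1", "2", "7+", "4"],
})

# 'studio' has no separate bedroom; '7+' is capped at 8.
replacements = {"studio": 0, "7+": 8}
for column in ["no.of bed", "Bathrooms"]:
    data[column] = data[column].replace(replacements)
    # Convert the remaining digit strings to numbers.
    data[column] = pd.to_numeric(data[column])
```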

There are two categorical features, ‘type’ and ‘Location’. I replaced each category with its rank, ordering the categories by cost in ascending order.
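A sketch of this rank encoding, assuming that "based on the cost" means ordering categories by their mean cost (the original does not say which statistic was used). The sample rows are invented for illustration:

```python
import pandas as pd

# Hypothetical sample with an already-cleaned numeric 'cost'.
data = pd.DataFrame({
    "type": ["Apartment", "Villa", "Apartment", "Townhouse", "Villa"],
    "Location": ["Marina", "Palm", "JVC", "JVC", "Palm"],
    "cost": [900000, 5000000, 700000, 2000000, 6000000],
})

for column in ["type", "Location"]:
    # Assumption: categories are ordered by their mean cost, cheapest first.
    mean_cost = data.groupby(column)["cost"].mean().sort_values()
    # Rank 1 goes to the cheapest category, rank 2 to the next, and so on.
    ranks = {category: rank
             for rank, category in enumerate(mean_cost.index, start=1)}
    data[column] = data[column].map(ranks)
```

Encoding by cost rank gives the numbers an order that means something for the target, unlike arbitrary label codes.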