Disease Predictor

Aadil Aftab Shaik
Published in Analytics Vidhya
3 min read · Dec 24, 2020


This is my deep learning project: analyzing a symptoms dataset and predicting 41 diseases from the symptoms it records. I used the Pandas, NumPy, Seaborn, TensorFlow, Keras, and scikit-learn libraries to analyze the data and build the model.

I loaded the CSV into a variable named ‘data’ with the pandas library and inspected it.

data.head().transpose()
data.describe().transpose()
data.isna().sum()

The analysis shows that the features after ‘symptom_5’ are mostly NaN, so I removed those features and replaced the remaining NaN values with 0.
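The cleaning step above can be sketched roughly as follows. The toy DataFrame, its column names (‘Disease’, ‘symptom_1’ to ‘symptom_7’) and its values are assumptions made for illustration, not the real dataset, and ‘cleaned’ is a hypothetical variable name.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real CSV; column names and values are assumptions.
data = pd.DataFrame({
    "Disease": ["Fungal infection", "Allergy", "GERD", "Allergy"],
    "symptom_1": ["itching", "sneezing", "acidity", "sneezing"],
    "symptom_2": ["skin_rash", "shivering", "vomiting", "chills"],
    "symptom_3": ["nodal_skin_eruptions", "chills", "cough", np.nan],
    "symptom_4": [np.nan, "watering_from_eyes", "chest_pain", np.nan],
    "symptom_5": [np.nan, np.nan, "heartburn", np.nan],
    "symptom_6": [np.nan, np.nan, np.nan, np.nan],
    "symptom_7": [np.nan, np.nan, np.nan, np.nan],
})

# Keep the target plus the first five symptom columns.
keep = ["Disease"] + [f"symptom_{i}" for i in range(1, 6)]

# Cast to object so the integer 0 can sit next to symptom strings,
# then replace every remaining NaN with 0.
cleaned = data[keep].astype(object).fillna(0)

print(cleaned.isna().sum())
```

On the real data, the list of columns to keep would follow from the output of data.isna().sum().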

Next comes feature engineering: I replaced each symptom with its value count and replaced the diseases with numbers (0 to 40).
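One possible reading of this encoding is sketched below. It assumes the counts are taken per column and that the 0 placeholder stays 0; the toy data and column names are also assumptions, so the original notebook may differ in these details.

```python
import pandas as pd

# Toy data after cleaning; 0 marks a missing symptom (values are assumptions).
data = pd.DataFrame({
    "Disease": ["Allergy", "GERD", "Allergy", "Fungal infection"],
    "symptom_1": ["sneezing", "acidity", "sneezing", "itching"],
    "symptom_2": ["chills", "vomiting", "chills", 0],
}, dtype=object)

symptom_cols = [c for c in data.columns if c.startswith("symptom_")]
for col in symptom_cols:
    # Count how often each real symptom occurs in this column.
    counts = data.loc[data[col] != 0, col].value_counts()
    # Replace each symptom by its count; the 0 placeholder stays 0.
    data[col] = data[col].map(counts).fillna(0).astype(int)

# Replace disease names with integer codes (alphabetical order).
codes, classes = pd.factorize(data["Disease"], sort=True)
data["Disease"] = codes

print(data)
print(list(classes))
```

Keeping ‘classes’ around makes it possible to map a predicted number back to the disease name later.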

data.head(15)
sns.countplot(x=data['Disease'])

I divided the data into x and y variables and split them into training and test sets, i.e., ‘x_train’, ‘x_test’, ‘y_train’, and ‘y_test’. I scaled the inputs to normalize them before sending them through the artificial neural network, and I one-hot encoded ‘y_train’ and ‘y_test’.
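A minimal sketch of these three steps on synthetic data follows. The choice of MinMaxScaler, the 70/30 split, the stratification, and the random seed are assumptions, since the post does not state them. The one-hot step uses a NumPy identity matrix here; tensorflow.keras.utils.to_categorical would give an equivalent result.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 5 count-encoded symptom features, 41 disease classes.
rng = np.random.default_rng(0)
n_classes = 41
x = rng.integers(0, 500, size=(410, 5)).astype(float)
y = np.repeat(np.arange(n_classes), 10)  # 10 rows per disease

# Split into train and test sets (ratio and seed are assumptions).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training features only, then apply it to both sets.
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# One-hot encode the integer labels: row i of the identity matrix is class i.
y_train = np.eye(n_classes)[y_train]
y_test = np.eye(n_classes)[y_test]

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Fitting the scaler on the training set alone keeps information from the test set out of the preprocessing.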

I built an artificial neural network to classify these diseases and generated a classification report for it. The model reached 91% accuracy on average across the 41 diseases.
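The post does not give the network architecture, so the sketch below is only an illustration of a small Keras classifier with a softmax output and a scikit-learn classification report. The layer sizes, optimizer, epoch count, and the random synthetic data are all assumptions, and the report it prints says nothing about the 91% figure.

```python
import numpy as np
from sklearn.metrics import classification_report
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

n_features, n_classes = 5, 41

# Random synthetic data, used only so the sketch runs end to end.
rng = np.random.default_rng(0)
x_train = rng.random((410, n_features)).astype("float32")
y_labels = rng.integers(0, n_classes, size=410)
y_train = np.eye(n_classes, dtype="float32")[y_labels]  # one-hot targets

# Hypothetical architecture: two hidden layers and a softmax output layer.
model = Sequential([
    Input(shape=(n_features,)),
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(n_classes, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",  # matches one-hot encoded targets
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)

# Turn predicted probabilities into class indices and build the report.
# The training data is reused here for brevity; a real report would use x_test.
probs = model.predict(x_train, verbose=0)
pred = probs.argmax(axis=1)
report = classification_report(y_labels, pred, zero_division=0)
print(report)
```

The report lists precision, recall, and F1-score per disease, which is more informative than a single accuracy number when there are 41 classes.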

CONCLUSION

I did all this to practice building artificial neural networks. These are the things I learned from the project:

I learned how and when to use an artificial neural network.

I learned the advantages and disadvantages of deep learning compared with basic machine learning algorithms.

Project GitHub: Disease Prediction
