Book Review Model

Aadil Aftab Shaik
3 min read · Sep 9, 2020


This is a sentiment analysis project that predicts whether a book review is positive or negative using a bag-of-words representation. I used Support Vector Classification, a Decision Tree classifier, Gaussian Naive Bayes, and Logistic Regression as the classifiers.

First, I opened the file and read it line by line, extracting the ‘review text’ and ‘overall’ fields from each line and appending them to a list as a tuple.
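A minimal sketch of this loading step, assuming each line of the file is one JSON object. The key names "reviewText" and "overall" and the helper name load_reviews are my assumptions for illustration; adjust them to match the actual file.

```python
import io
import json

def load_reviews(lines):
    """Parse one JSON object per line into (text, rating) tuples."""
    reviews = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "reviewText" and "overall" are assumed key names
        reviews.append((record["reviewText"], record["overall"]))
    return reviews

# Stand-in for an open file handle; with real data, pass open(path) instead.
sample = io.StringIO(
    '{"reviewText": "Loved every chapter", "overall": 5.0}\n'
    '{"reviewText": "Dull and far too long", "overall": 2.0}\n'
)
reviews = load_reviews(sample)
```

Reading line by line keeps memory use low, since only one record is parsed at a time.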

I then wrapped each entry in a class with a ‘get_sentiment’ function that labels a review as positive or negative based on ‘overall’. If ‘overall’ is greater than or equal to 3, the review is positive; otherwise, it is negative.
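The class could look roughly like this. The class name Review, the attribute names, and the label strings "POSITIVE" and "NEGATIVE" are assumptions; only ‘get_sentiment’ and the threshold of 3 come from the description above.

```python
class Review:
    """Holds one review and its derived sentiment label."""

    def __init__(self, text, overall):
        self.text = text
        self.overall = overall
        self.sentiment = self.get_sentiment()

    def get_sentiment(self):
        # A rating of 3 or more counts as positive, anything lower as negative
        return "POSITIVE" if self.overall >= 3 else "NEGATIVE"

good = Review("Loved every chapter", 5.0)
borderline = Review("It was fine", 3.0)
bad = Review("Dull and far too long", 2.0)
```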

Then I split the list into training and test sets using ‘train_test_split’ from ‘sklearn.model_selection’, one for training the model and one for testing it (I chose a test size of one quarter of the data set, with the rest used for training).
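As a sketch, the split looks like this on placeholder data. The random_state value is an assumption I added so the split is reproducible; it is not something the project is known to use.

```python
from sklearn.model_selection import train_test_split

# Placeholder data: 20 (text, rating) tuples
reviews = [("review %d" % i, i % 5 + 1) for i in range(20)]

# test_size=0.25 keeps a quarter of the data for testing
training, test = train_test_split(reviews, test_size=0.25, random_state=42)
```

With 20 reviews this leaves 15 for training and 5 for testing.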

I created the ‘evenly_distribute’ function to balance the number of positive and negative reviews in the training and test data, so that the model is not biased toward the more common class.
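One way such a function could work is to downsample the larger class to the size of the smaller one. This is my guess at the idea, not necessarily how the project implements it; the (text, sentiment) tuple layout and the seed parameter are assumptions.

```python
import random

def evenly_distribute(reviews, seed=0):
    """Return a shuffled list with equal numbers of positive and negative reviews.

    `reviews` is a list of (text, sentiment) pairs (assumed layout).
    """
    positive = [r for r in reviews if r[1] == "POSITIVE"]
    negative = [r for r in reviews if r[1] == "NEGATIVE"]
    n = min(len(positive), len(negative))  # size of the smaller class
    rng = random.Random(seed)
    balanced = rng.sample(positive, n) + rng.sample(negative, n)
    rng.shuffle(balanced)
    return balanced

# 8 positive and 3 negative reviews become 3 of each
reviews = [("good %d" % i, "POSITIVE") for i in range(8)]
reviews += [("bad %d" % i, "NEGATIVE") for i in range(3)]
balanced = evenly_distribute(reviews)
```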

The text has to be converted into numeric vectors before a model can be trained on it, so I used TfidfVectorizer to vectorize the review text (you can use CountVectorizer too if you want).

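I then used the ‘fit_transform’ function to fit the vectorizer to the training data and transform it into a matrix of TF-IDF weights (floating-point values; CountVectorizer would give integer counts instead).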
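Here is a small sketch of the vectorizing step on made-up sentences. The variable names are assumptions. The test text goes through ‘transform’ only, so it reuses the vocabulary learned from the training text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["great book, great plot", "boring plot", "a great read"]
test_texts = ["boring book"]

vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights from the training text, then transform it
train_x = vectorizer.fit_transform(train_texts)
# Reuse the learned vocabulary for the test text; do not fit again
test_x = vectorizer.transform(test_texts)

print(train_x.shape, test_x.shape)
```

Each row is one review and each column is one word from the training vocabulary, so both matrices end up with the same number of columns.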

Next, I trained Linear SVM, Decision Tree, Naïve Bayes, and Logistic Regression classifiers on the vectorized training data.
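Training the four classifiers might look like the sketch below. The toy sentences, labels, and dictionary keys are made up for illustration, and the hyperparameters are scikit-learn defaults apart from the linear kernel. One detail worth knowing: GaussianNB does not accept sparse matrices, so the TF-IDF output is converted to a dense array for it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy training data (assumed, for illustration only)
train_texts = [
    "a wonderful, moving story",
    "great characters and a great plot",
    "boring and predictable",
    "a dull, lifeless book",
]
train_y = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]
train_x = TfidfVectorizer().fit_transform(train_texts)

classifiers = {
    "svm": SVC(kernel="linear"),
    "tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "logistic": LogisticRegression(),
}
for clf in classifiers.values():
    # GaussianNB needs a dense array; the others accept sparse input
    features = train_x.toarray() if isinstance(clf, GaussianNB) else train_x
    clf.fit(features, train_y)
```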

Next comes evaluation, to check how often the model predicts correctly. Both the ‘score’ function and the ‘f1_score’ function can be used for this. For a classifier, ‘score’ returns the mean accuracy over all test samples, while ‘f1_score’ is the harmonic mean of precision and recall and can be reported separately for each class. So I prefer ‘f1_score’ for judging the model.
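The difference between the two metrics is easier to see on a small hand-made example. The labels below are invented, not project results. For a classifier, ‘score’ gives the same number as accuracy_score applied to its predictions, so accuracy_score stands in for it here.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented labels: 6 positive and 2 negative reviews
y_true = ["POSITIVE"] * 6 + ["NEGATIVE"] * 2
# Predictions that miss one of the two negative reviews
y_pred = ["POSITIVE"] * 7 + ["NEGATIVE"]

# Fraction of correct predictions: 7 out of 8
accuracy = accuracy_score(y_true, y_pred)

# One F1 value per class, in the order given by `labels`
per_class = f1_score(y_true, y_pred, average=None, labels=["POSITIVE", "NEGATIVE"])

print(accuracy, per_class)
```

Accuracy comes out at 0.875, which looks good, but the per-class F1 is about 0.92 for positive and only about 0.67 for negative, which exposes the weaker class.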

We can also tune the model with GridSearchCV from ‘sklearn.model_selection’, which tries each combination of parameters from a given grid using cross-validation and reports which one works best for the selected classifier.
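A sketch of such a search for the SVM is shown below. The parameter grid, the 3-fold setting, the "f1_macro" scoring choice, and the toy data are all assumptions chosen to keep the example small; they are not the values used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data (assumed): 6 positive and 6 negative reviews
texts = [
    "a wonderful story", "great plot and characters", "loved this book",
    "beautifully written", "an excellent read", "great and moving",
    "boring and slow", "a dull book", "terrible plot",
    "badly written", "a waste of time", "slow and predictable",
]
labels = ["POSITIVE"] * 6 + ["NEGATIVE"] * 6
train_x = TfidfVectorizer().fit_transform(texts)

# Every combination of kernel and C is tried with 3-fold cross-validation
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 4, 8]}
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="f1_macro")
search.fit(train_x, labels)

print(search.best_params_)
```

After fitting, search.best_estimator_ holds a model refit on all the training data with the winning parameters.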

CONCLUSION

I did all this to practice with the scikit-learn library. Here is what I learned from this project:

I learned how to load data line by line from a JSON-format file.

I learned what the bag-of-words method is.

I learned how to optimize the model for better results.

Project Github: Book Review Model
