How to perform sentiment analysis using NLTK

Key takeaways

  • Sentiment analysis is a natural language processing task that classifies text into positive, negative, or neutral categories.

  • NLTK is a powerful Python library that provides tools for sentiment analysis, including text preprocessing, feature extraction, and model building.

  • The code examples demonstrate different approaches to sentiment analysis using NLTK, such as TF-IDF and bag of words, and can be adapted for various real-world applications.

Sentiment analysis, an application of natural language processing (NLP), determines the sentiment expressed in a text, such as identifying whether a movie review is positive, negative, or neutral. With the advancements in NLP, there are now several libraries that can perform sentiment analysis, including TextBlob, VADER, and NLTK. Among these, NLTK (Natural Language Toolkit) is a particularly powerful Python library that provides tools for various NLP tasks, including sentiment analysis, making it a popular choice for this purpose.
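Before building a custom model, note that NLTK also bundles VADER, a prebuilt, rule-based analyzer that can score text without any training. A minimal sketch (the vader_lexicon resource must be downloaded once):

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download of VADER's lexicon

sia = SentimentIntensityAnalyzer()
# Returns neg/neu/pos proportions and a compound score in [-1, 1]
print(sia.polarity_scores("I absolutely loved this movie!"))

The compound score is commonly thresholded (for example, above 0.05 as positive, below -0.05 as negative) to produce a label. The rest of this answer focuses on training your own classifier.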

Steps of sentiment analysis

To perform sentiment analysis using NLTK, follow the steps below:

  1. Data preparation: Obtain or create a dataset containing text and corresponding sentiment labels (positive, negative, or neutral).

  2. Preprocessing: Clean the text data by removing elements like non-alphanumeric characters, punctuation, and any language-specific stopwords. Additionally, use stemming or lemmatization to convert words to their root forms; a short comparison of the two appears after this list.

  3. Feature extraction: Transform the preprocessed text into numerical feature vectors that can be fed into a machine learning model. For this, use techniques like bag of words (BoW), TF-IDF (Term Frequency-Inverse Document Frequency), or word embeddings.

  4. Model building: Train a classification model on the feature vectors and sentiment labels. You can use various algorithms such as Naive Bayes, support vector machines (SVM), or deep learning models like recurrent neural networks (RNNs) or transformers.

  5. Evaluation: Assess the performance of the trained model using metrics like accuracy, precision, recall, and F1 score on the validation or test dataset.
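As a quick illustration of the stemming vs. lemmatization choice in step 2, here is a minimal sketch using NLTK's Porter stemmer and WordNet lemmatizer (the example words are arbitrary, and the wordnet resource must be downloaded once):

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('wordnet')  # one-time download needed by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                   # studi (crude suffix stripping)
print(lemmatizer.lemmatize("studies"))           # study (dictionary-based lookup)
print(lemmatizer.lemmatize("running", pos="v"))  # run (lemmatization benefits from a POS hint)

Lemmatization returns real dictionary words but is slower and benefits from part-of-speech tags; stemming is faster but can produce non-words.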

Code example with the TF-IDF approach

Here’s a Python code example demonstrating sentiment analysis using NLTK with a simple TF-IDF approach:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
# Sample dataset
documents = [
("I absolutely loved this movie! The acting was superb.", "positive"),
("Terrible experience, would not recommend to anyone.", "negative"),
("This book is fantastic, highly recommend it to everyone.", "positive"),
("I hated every minute of it, complete waste of time.", "negative"),
("The restaurant was amazing, best food I've had in a while.", "positive"),
("The service was terrible, rude staff and slow service.", "negative"),
("I enjoyed reading this book, couldn't put it down.", "positive"),
("Worst product ever, broke after one use.", "negative"),
("The concert was phenomenal, an unforgettable experience.", "positive"),
("Disappointed with the quality of service, will not return.", "negative"),
("The movie was okay, nothing special.", "neutral"),
("Decent experience, but nothing remarkable.", "neutral"),
("The book was average, didn't leave much of an impression.", "neutral"),
("The food at the restaurant was mediocre, not worth the hype.", "neutral"),
("The product met my expectations, neither good nor bad.", "neutral")
]
# Preprocessing
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))
cleaned_documents = []
for doc, sentiment in documents:
    words = word_tokenize(doc.lower())
    words = [lemmatizer.lemmatize(word) for word in words if word.isalnum() and word not in stop_words]
    cleaned_documents.append((" ".join(words), sentiment))
# Feature extraction
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([doc for doc, _ in cleaned_documents])
y = [sentiment for _, sentiment in cleaned_documents]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model training
classifier = LinearSVC(dual=False)
classifier.fit(X_train, y_train)
# Prediction
y_pred = classifier.predict(X_test)
# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

The model’s accuracy of about 0.66 is low largely because the dataset is tiny: an 80/20 split of 15 examples leaves only three test instances, so a single misclassification costs a third of the score. The small, possibly imbalanced dataset can also bias the model toward certain sentiments, and LinearSVC, being a relatively simple linear model, may miss sentiment subtleties. To improve accuracy, we could use a larger, balanced dataset, explore more advanced models, or fine-tune the features to help the model capture sentiment more effectively.
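Accuracy alone can also hide per-class behavior on such a small split. Appending scikit-learn's classification_report to the script prints precision, recall, and F1 score per label (zero_division=0 suppresses warnings for labels absent from the tiny test set):

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, zero_division=0))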

Code explanation

  • Lines 1–8: We import the necessary libraries and modules for the sentiment analysis task.

  • Lines 11–27: We initialize a sample dataset containing text documents along with their corresponding sentiment labels.

  • Lines 30–37: We initialize a WordNet lemmatizer and retrieve English stopwords from NLTK. Then, we preprocess each document: we lowercase and tokenize it, lemmatize the tokens, drop stopwords and non-alphanumeric tokens, and finally join the remaining words back into a string.

  • Lines 40–43: We create a TF-IDF vectorizer, transform the cleaned documents into a TF-IDF matrix X, and collect the corresponding sentiment labels in a list y. We then split the data into training and testing sets.

  • Lines 46–47: We initialize a linear support vector classifier and train it on the training data X_train and y_train.

  • Line 50: We use the trained classifier to predict the sentiment labels for the test data X_test.

  • Lines 53–54: We evaluate the model by comparing the predicted labels y_pred with the actual labels y_test and print the accuracy score.

This code snippet demonstrates a basic sentiment analysis pipeline using NLTK. While the TF-IDF approach works well, other feature representations are worth trying.

The following bag of words (BoW) approach takes a dataset, preprocesses it, and then creates a bag of words for each example. Each instance of the final dataset will have a bag of words as features and a corresponding sentiment as the target.
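To make the representation concrete, here is a tiny, self-contained sketch of the bag-of-words encoding on two toy sentences (chosen only for illustration):

from sklearn.feature_extraction.text import CountVectorizer

toy = ["the movie was great", "the movie was terrible"]
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(toy)

print(vectorizer.get_feature_names_out())  # ['great' 'movie' 'terrible' 'the' 'was']
print(bow.toarray())                       # [[1 1 0 1 1]
                                           #  [0 1 1 1 1]]

Each row simply counts word occurrences, so word order is discarded; that is the key simplification of the BoW model.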

Code example with the bag of words approach

You can replace the sentence assigned to new_sentence on line 85 of the code below to predict the sentiment of any sentence you choose:

Since the dataset is small, the predicted sentiment may be incorrect at times for more complex sentences that mix positive and negative words. This sample code will therefore work better on larger datasets.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
# Sample dataset
documents = [
("I absolutely loved this movie! The acting was superb.", "positive"),
("Terrible experience, would not recommend to anyone.", "negative"),
("This book is fantastic, highly recommend it to everyone.", "positive"),
("I hated every minute of it, complete waste of time.", "negative"),
("The restaurant was amazing, best food I've had in a while.", "positive"),
("The service was terrible, rude staff and slow service.", "negative"),
("I enjoyed reading this book, couldn't put it down.", "positive"),
("Worst product ever, broke after one use.", "negative"),
("The concert was phenomenal, an unforgettable experience.", "positive"),
("Disappointed with the quality of service, will not return.", "negative"),
("The movie was okay, nothing special.", "neutral"),
("Decent experience, but nothing remarkable.", "neutral"),
("The book was average, didn't leave much of an impression.", "neutral"),
("The food at the restaurant was mediocre, not worth the hype.", "neutral"),
("The product met my expectations, neither good nor bad.", "neutral"),
("This app is incredible! It makes everything so much easier.", "positive"),
("I was thoroughly impressed with the quality of this product.", "positive"),
("The customer service was outstanding, they solved my issue in no time.", "positive"),
("What an amazing movie, I would definitely watch it again.", "positive"),
("The book kept me engaged from start to finish, a must-read!", "positive"),
("I had a wonderful experience at the hotel, everything was perfect.", "positive"),
("The new software update has significantly improved performance.", "positive"),
("I love how user-friendly and intuitive this platform is.", "positive"),
("The concert was a blast, the band played all my favorite songs.", "positive"),
("The course was very informative and well-structured, highly recommended.", "positive"),
("The product was faulty and stopped working within a week.", "negative"),
("I had a terrible experience with their customer support, very disappointing.", "negative"),
("The movie was boring and way too long, a complete waste of time.", "negative"),
("I regret buying this, it did not meet my expectations at all.", "negative"),
("The food was cold and tasteless, I will not be returning to this restaurant.", "negative"),
("This software is full of bugs, constantly crashing and freezing.", "negative"),
("The delivery was delayed, and the package arrived damaged.", "negative"),
("The book was poorly written and lacked depth, not worth the read.", "negative"),
("The hotel was noisy and uncomfortable, not a pleasant stay.", "negative"),
("The service was slow and unprofessional, I was very disappointed.", "negative"),
("The product is okay, it does the job but nothing exceptional.", "neutral"),
("The movie was decent, but I wouldn’t go out of my way to watch it again.", "neutral"),
("The book was average, with some good parts and some dull sections.", "neutral"),
("The service was fine, but nothing that stood out.", "neutral"),
("The event was well-organized, but I didn’t find it particularly exciting.", "neutral"),
("The update was neither good nor bad, it didn't change much for me.", "neutral"),
("The restaurant was clean, but the food was just alright.", "neutral"),
("The new feature is useful, but I haven’t used it much yet.", "neutral"),
("The class was informative, but the content was mostly what I already knew.", "neutral"),
("The performance was okay, nothing spectacular but not bad either.", "neutral")
]
# Preprocessing
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))
cleaned_documents = []
for doc, sentiment in documents:
    words = word_tokenize(doc.lower())
    words = [lemmatizer.lemmatize(word) for word in words if word.isalnum() and word not in stop_words]
    cleaned_documents.append((" ".join(words), sentiment))
# Feature extraction using Bag of Words
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([doc for doc, _ in cleaned_documents])
y = [sentiment for _, sentiment in cleaned_documents]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model training
classifier = LinearSVC(dual=False)
classifier.fit(X_train, y_train)
# Prediction
y_pred = classifier.predict(X_test)
new_sentence = "Disappointed with the product, I didn't like it at all."
# Preprocessing the new sentence
words = word_tokenize(new_sentence.lower())
words = [lemmatizer.lemmatize(word) for word in words if word.isalnum() and word not in stop_words]
cleaned_sentence = " ".join(words)
# Feature extraction for the new sentence
X_new = vectorizer.transform([cleaned_sentence])
# Predict the sentiment
prediction = classifier.predict(X_new)
print("Predicted sentiment:", prediction[0])

Code explanation

  • Lines 1–8: We import the necessary libraries and modules for natural language processing (NLP) and machine learning.

  • Lines 11–56: We create a sample dataset, documents, as a list of tuples. Each tuple contains a text document and its corresponding sentiment label.

  • Line 61: We initialize the lemmatizer and load the list of stopwords for text preprocessing.

  • Lines 64–68: We clean and preprocess the text through lowercasing, tokenization, lemmatization, and stopword removal, and store the result in cleaned_documents.

  • Lines 71–74: We convert the text into bag-of-words vectors, store the features and labels in the X and y matrices, and split the dataset into 80% training and 20% testing data.

  • Lines 77–78: We initialize the linear support vector classifier and train it on X_train and y_train.

  • Line 81: We use the trained classifier to predict sentiments for the test data X_test.

  • Lines 85–93: We initialize a new sentence, preprocess it through tokenization, lemmatization, and stopword removal, and then convert it into a numerical feature vector using the fitted vectorizer.

  • Lines 96–97: We predict the sentiment of the new sentence and print the predicted label.

You can replace the sample dataset with your own for various real-world applications, such as social media monitoring, customer feedback assessment, market research, brand sentiment tracking, risk management, healthcare, political analysis, and election forecasting. Experimenting with different preprocessing techniques, feature extraction methods, and classification algorithms can further enhance the model’s performance. Another approach is to represent the text with word embeddings such as Word2Vec.
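As a rough sketch of the embedding route, assuming the gensim library (4.x API): one common choice, though not the only one, is to average a document's token vectors into a single feature vector:

import numpy as np
from gensim.models import Word2Vec

# Toy tokenized corpus for illustration only
sentences = [["loved", "the", "movie"], ["terrible", "experience"], ["great", "food"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=42)

def document_vector(tokens, model):
    # Average the vectors of in-vocabulary tokens; zeros if none are known
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

X_embedded = np.vstack([document_vector(s, model) for s in sentences])
print(X_embedded.shape)  # (3, 50) -- one 50-dimensional vector per document

The resulting rows can then replace the TF-IDF or BoW features fed to the classifier.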

Frequently asked questions



How do you do sentiment analysis step by step?

  • Collect text data.
  • Clean and organize the data.
  • Analyze your dataset with a sentiment model.
  • Iterate: test your modifications and validate your insights.

How accurate is NLTK sentiment analysis?

The accuracy of NLTK’s sentiment analysis depends on several factors, including the model being used, the quality of the training data, and the type of text you’re analyzing. NLTK primarily provides tools to build your own models; its one prebuilt analyzer, VADER, is lexicon-based and suited to short, informal text rather than a state-of-the-art pretrained model.


What are the three types of sentiment analysis?

The three widely known types of sentiment analysis are:

  • Emotion-based
  • Fine-grained
  • Aspect-based

