How to perform tokenization of text using TextBlob in Python

Natural Language Processing (NLP) is a fast-growing field that deals with text data to power applications such as chatbots, sentiment analysis, semantic analysis, and more.

TextBlob is one of the simplest and most widely used Python libraries for such tasks, including computing sentiment scores, filtering text, and tokenization.

Before we move on, we need to install TextBlob and download its corpora. To do so, run the following commands on the command line:

pip install -U textblob 
python -m textblob.download_corpora 
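
To quickly verify the installation, we can optionally print the installed version (this assumes the __version__ attribute is available, which recent TextBlob releases provide):

python -c "import textblob; print(textblob.__version__)"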

Tokenization

Before we proceed, it is important to understand the following terms:

  • Corpus: A collection of text data (in any language) that can be used for tasks such as semantic analysis and classification.
  • Token: A smaller string (such as a word or character) obtained by splitting the input text.

Tokenization is the process of dividing the text (corpus) into smaller units, such as sentences or words.

Example

Suppose that the input text is “I love to eat fast food.”

After applying tokenization to this input text, the output contains all the words separated from the sentence: [“I”, “love”, “to”, “eat”, “fast”, “food”].

We can also divide a single word into tokens. For instance, banana can be tokenized into the characters b-a-n-a-n-a.
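
As a quick sketch of both ideas, the snippet below tokenizes the sample sentence into words with TextBlob and splits a single word into characters using plain Python (TextBlob itself does not provide a character-level tokenizer):

from textblob import TextBlob

# Word-level tokens of the sample sentence; TextBlob drops the trailing period
print(TextBlob("I love to eat fast food.").words)

# Character-level tokens of a single word, using plain Python
print(list("banana"))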

Code

Let’s look at the code for tokenizing text using TextBlob.

from textblob import TextBlob

corpus = "And all the men and women merely players. \
They have their exits and their entrances. \
And one man in his time plays many parts."

txt_obj = TextBlob(corpus)

print("Word tokenization : ", txt_obj.words)

print("Sentence tokenization: ", txt_obj.sentences)

Explanation

  • In line 1, we import the required package.

  • From lines 3 to 5, we create a sample corpus of text.

  • In line 7, we create a TextBlob object and pass the corpus we want to tokenize.

  • In line 9, we print the word-level tokenization of the corpus.

  • In line 11, we print the sentence-level tokenization of the corpus.
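
Running the snippet should print output roughly like the following (the exact formatting may differ slightly between TextBlob versions):

Word tokenization :  ['And', 'all', 'the', 'men', 'and', 'women', 'merely', 'players', 'They', 'have', 'their', 'exits', 'and', 'their', 'entrances', 'And', 'one', 'man', 'in', 'his', 'time', 'plays', 'many', 'parts']
Sentence tokenization:  [Sentence("And all the men and women merely players."), Sentence("They have their exits and their entrances."), Sentence("And one man in his time plays many parts.")]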

Tokenization plays an important role when we preprocess text data. It divides the corpus into sentences, words, or even characters.

TextBlob is one of the most important libraries in NLP. It offers a simple API that helps us perform NLP tasks faster.
