Natural Language Processing (NLP) sequencing converts a sequence of words into a sequence of integers. Once the text is sequenced, we can apply other data processing techniques to it.
The following example explains how NLP sequencing works:
input_text = ['This is Educative', 'We love Educative', 'This is an Educative Answer']
word_index: {'educative': 1, 'this': 2, 'is': 3, 'we': 4, 'love': 5, 'an': 6, 'answer': 7}
sequences: [[2, 3, 1], [4, 5, 1], [2, 3, 6, 1, 7]]
Each unique word is assigned an integer according to its frequency in the input text, with the most frequent word receiving 1. Each input sentence is then rewritten as the corresponding sequence of integers.
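The frequency-based numbering described above can be sketched in plain Python. This is only an illustration under simple assumptions (words are lowercased and split on whitespace, and ties in frequency are broken by first appearance); the helper names build_word_index and to_sequences are hypothetical and are not part of any library.

```python
from collections import Counter

def build_word_index(texts):
    """Rank lowercased words by frequency; the most frequent word gets 1."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() sorts by count (descending); equal counts keep first-seen order
    ranked = [word for word, _ in counts.most_common()]
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def to_sequences(texts, word_index):
    """Replace every known word in each text with its integer index."""
    return [
        [word_index[w] for w in text.lower().split() if w in word_index]
        for text in texts
    ]

input_text = ['This is Educative', 'We love Educative', 'This is an Educative Answer']
word_index = build_word_index(input_text)
sequences = to_sequences(input_text, word_index)

print(word_index)
# {'educative': 1, 'this': 2, 'is': 3, 'we': 4, 'love': 5, 'an': 6, 'answer': 7}
print(sequences)
# [[2, 3, 1], [4, 5, 1], [2, 3, 6, 1, 7]]
```

The word "educative" appears three times, so it receives index 1, and the resulting integer sequences match the ones in the example.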
We can do the sequencing by using the Tokenizer class from TensorFlow's Keras API. The following code demonstrates NLP sequencing:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer

input_text = [
    'I love Educative',
    'I am reading an Educative Answer',
    'Educative Answer on NLP sequencing'
]
tokenizer = Tokenizer(num_words = 50)
tokenizer.fit_on_texts(input_text)

word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(input_text)

print(word_index)
print('\n')
print(sequences)
Lines 1–2: We set the TF_CPP_MIN_LOG_LEVEL environment variable to suppress TensorFlow's log messages, including warnings.
Lines 4–6: We import the necessary libraries.
Line 13: We create a Tokenizer object. The num_words argument limits how many of the most frequent words are kept when texts are converted to sequences.
Line 14: The fit_on_texts() method updates the tokenizer's internal vocabulary based on the list of texts.
Line 16: We read the word index from the tokenizer's word_index attribute, a dictionary that maps each word to its integer.
Line 17: The texts_to_sequences() method converts each sentence in the input list input_text into a sequence of integers.
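To make the effect of num_words more concrete, here is a rough plain-Python sketch of how a vocabulary cap changes the sequences. It assumes the behavior of the Keras Tokenizer when no out-of-vocabulary token is configured: words that are unknown, or whose index is num_words or higher, are silently dropped. The function texts_to_sequences_sketch and the small word_index below are hypothetical illustrations, not the library's implementation.

```python
def texts_to_sequences_sketch(texts, word_index, num_words=None):
    """Convert texts to integer sequences, dropping unknown or capped words."""
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            idx = word_index.get(word)
            if idx is None:
                continue  # word was never seen while building the vocabulary
            if num_words is not None and idx >= num_words:
                continue  # word is too rare to fit under the num_words cap
            seq.append(idx)
        sequences.append(seq)
    return sequences

# Hypothetical vocabulary, ordered from most to least frequent
word_index = {'educative': 1, 'i': 2, 'answer': 3, 'love': 4}

full = texts_to_sequences_sketch(['I love Educative Answer'], word_index)
capped = texts_to_sequences_sketch(['I love Educative Answer'], word_index, num_words=3)
unseen = texts_to_sequences_sketch(['We love Educative'], word_index)

print(full)    # [[2, 4, 1, 3]]
print(capped)  # [[2, 1]]
print(unseen)  # [[4, 1]]
```

With num_words=3, only indices 1 and 2 survive, so "love" and "answer" disappear from the sequence. In the article's code, num_words = 50 is far larger than the vocabulary, so no words are dropped.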
Hence, NLP sequencing is used for machine translation or