Text normalization (i.e., preparing text, words, and documents for further processing) is one of the most fundamental tasks in the field of Natural Language Processing (NLP). Two common text normalization techniques are stemming and lemmatization. nltk.stem is one of the most widely used Python packages for stemming and lemmatization.
Examples of Stemming and Lemmatization
If you look at the words cars, car’s, CAR, Car, and cars’, you will see that all of them are derived from the word car. After normalization with the tools in nltk.stem, all of these words can be mapped to car.
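To make the idea concrete, here is a toy sketch in plain Python. It does not use any NLTK stemmer or lemmatizer; normalize_word is a hypothetical helper written only for this illustration, and its rules (lowercasing, dropping apostrophes, stripping one trailing "s") are assumptions that happen to work for this word list.

```python
def normalize_word(word: str) -> str:
    """Toy normalizer (hypothetical helper, not an NLTK function)."""
    # Lowercase so that CAR, Car, and car compare equal.
    word = word.lower()
    # Drop straight (') and curly (\u2019) apostrophes used in possessives.
    word = word.replace("'", "").replace("\u2019", "")
    # Strip a single trailing "s" from words longer than three letters.
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word


# The five surface forms from the example, with curly apostrophes escaped.
words = ["cars", "car\u2019s", "CAR", "Car", "cars\u2019"]
normalized = {normalize_word(w) for w in words}
print(normalized)  # {'car'}
```

A rule this crude would mangle many real words (for example, it turns "glass" into "glas"), which is why dedicated stemmers and lemmatizers exist.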
nltk.stem.util is one of the modules in the nltk.stem package. It contains two helper functions, nltk.stem.util.prefix_replace and nltk.stem.util.suffix_replace.
nltk.stem.util.prefix_replace(original, old, new)
This function replaces the old prefix of the original string with a new prefix. All three values are passed as parameters, and the function returns a string.
Parameters
original – string
old – string
new – string
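A short usage sketch of prefix_replace follows. The fallback definition is a stand-in for environments where NLTK is not installed; it assumes the library function works by slicing off len(old) characters and prepending new, without checking that original really starts with old. safe_prefix_replace is a hypothetical wrapper, not part of NLTK, that adds that check.

```python
try:
    from nltk.stem.util import prefix_replace
except ImportError:
    # Stand-in used only when NLTK is missing. Assumption: the library
    # function slices off len(old) characters and prepends new.
    def prefix_replace(original, old, new):
        return new + original[len(old):]


def safe_prefix_replace(original, old, new):
    """Hypothetical wrapper: replace the prefix only if it is present."""
    if original.startswith(old):
        return prefix_replace(original, old, new)
    return original


print(prefix_replace("unhappy", "un", ""))         # happy
print(prefix_replace("rewrite", "re", "over"))     # overwrite
print(safe_prefix_replace("happy", "un", "not "))  # happy (no "un" prefix)
```

Checking with startswith before calling the function is a cautious choice: the caller stays in control of what happens when the prefix is absent.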
nltk.stem.util.suffix_replace(original, old, new)
This function replaces the old suffix of the original string with a new suffix. All three values are passed as parameters, and the function returns a string.
Parameters
original – string
old – string
new – string
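The same kind of sketch works for suffix_replace. As before, the fallback definition is a stand-in for environments without NLTK and assumes the library function simply drops the last len(old) characters and appends new. safe_suffix_replace is a hypothetical wrapper, not part of NLTK, that leaves the word unchanged when the suffix is empty or absent.

```python
try:
    from nltk.stem.util import suffix_replace
except ImportError:
    # Stand-in used only when NLTK is missing. Assumption: the library
    # function drops the last len(old) characters and appends new.
    def suffix_replace(original, old, new):
        return original[: -len(old)] + new


def safe_suffix_replace(original, old, new):
    """Hypothetical wrapper: replace the suffix only if it is present."""
    # An empty old suffix is rejected too, since slicing with -0
    # would discard the whole word under the assumption above.
    if old and original.endswith(old):
        return suffix_replace(original, old, new)
    return original


print(suffix_replace("studies", "ies", "y"))   # study
print(suffix_replace("running", "ning", ""))   # run
print(safe_suffix_replace("walk", "ing", ""))  # walk (no "ing" suffix)
```

Rule-based stemmers typically test for a suffix before rewriting it, so a guard like endswith mirrors how these helpers tend to be called in practice.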