Natural Language Processing (NLP) is the ability of a machine to read, write, understand and derive meaning from a human language.
Let’s look at the main steps involved in more detail.
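1.Tokenization: We split the text into individual tokens. Check the example below to see how it’s done.
Text: The cat sat on the bed.
Tokens: The, cat, sat, on, the, bed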
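The split shown above can be sketched with a single regular expression. This is a simplified stand-in for a full tokenizer such as NLTK's word_tokenize; the helper name simple_tokenize and its pattern are assumptions made for illustration.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("The cat sat on the bed.")
print(tokens)  # ['The', 'cat', 'sat', 'on', 'the', 'bed', '.']
```

Unlike the token list above, this sketch keeps the final period as its own token, which is also how word_tokenize treats sentence-ending punctuation.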
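2.Stemming: We reduce inflected or derived words to a common root word by stripping their endings. Check the example below to see how it’s done.
List of words: Affection, Affects, Affecting, Affected
Root word: Affect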
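A crude suffix-stripping routine is enough to reproduce this example. The sketch below is not the Porter algorithm; the function name crude_stem and its suffix list are assumptions chosen only to cover the words shown.

```python
def crude_stem(word, suffixes=("ing", "ion", "ed", "s")):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word  # No suffix matched, so the word is returned unchanged.

words = ["Affection", "Affects", "Affecting", "Affected"]
stems = [crude_stem(w) for w in words]
print(stems)  # ['Affect', 'Affect', 'Affect', 'Affect']
```

Because the routine has no dictionary, it leaves an irregular form such as "went" untouched.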
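3.Lemmatization: We convert the different forms of a word into its dictionary form, called the lemma. Check the example below to see how it’s done.
List of words: going, gone, went
Lemma: go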
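Irregular forms such as "went" cannot be reached by stripping endings, so this step relies on a dictionary. A minimal sketch, assuming a tiny hand-written lookup table (a hypothetical stand-in for a full lexical database such as WordNet):

```python
# Hypothetical mini-dictionary mapping inflected forms to their lemma.
LEMMA_TABLE = {"going": "go", "gone": "go", "went": "go", "goes": "go"}

def toy_lemmatize(word):
    """Return the dictionary form of a word, or the word itself if unknown."""
    return LEMMA_TABLE.get(word.lower(), word)

lemmas = [toy_lemmatize(w) for w in ["going", "gone", "went"]]
print(lemmas)  # ['go', 'go', 'go']
```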
4.POS tagging: We identify the parts of speech for different tokens. Check the example below to see how it’s done.
Sentence: The dog killed the bat.
Parts of speech: Definite article, noun, verb, definite article, noun.
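The simplest possible tagger is a lookup table from words to tags. The sketch below reproduces the example sentence; TAG_TABLE and lookup_tag are hypothetical names, and the table covers only these four words.

```python
# Hypothetical word-to-tag table covering only the example sentence.
TAG_TABLE = {
    "the": "definite article",
    "dog": "noun",
    "killed": "verb",
    "bat": "noun",
}

def lookup_tag(tokens, default="unknown"):
    """Pair each token with its part of speech from the lookup table."""
    return [(tok, TAG_TABLE.get(tok.lower(), default)) for tok in tokens]

tagged = lookup_tag(["The", "dog", "killed", "the", "bat"])
for token, tag in tagged:
    print(f"{token}: {tag}")
```

A lookup table ignores context, so it cannot tell the noun "bat" from the verb "bat"; trained taggers use the surrounding words to decide.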
5.Named entity recognition: We classify named entities mentioned in the text into categories such as “People,” “Locations,” “Organizations,” and so on. Check the example below to see how it’s done.
Text: Google CEO Sundar Pichai resides in New York.
Named entity recognition:
Google — Organization
Sundar Pichai — Person
New York — Location
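One basic way to get this result is to match the text against a list of known names, often called a gazetteer. A minimal sketch, assuming a hand-written gazetteer that contains just the three entities from the example (GAZETTEER and find_entities are hypothetical names):

```python
# Hypothetical gazetteer: known entity names and their categories.
GAZETTEER = {
    "Google": "Organization",
    "Sundar Pichai": "Person",
    "New York": "Location",
}

def find_entities(text):
    """Return (entity, category) pairs in order of appearance in the text."""
    found = [
        (text.find(name), name, label)
        for name, label in GAZETTEER.items()
        if name in text
    ]
    # Sorting by start position restores the order used in the sentence.
    return [(name, label) for _, name, label in sorted(found)]

entities = find_entities("Google CEO Sundar Pichai resides in New York.")
print(entities)
# [('Google', 'Organization'), ('Sundar Pichai', 'Person'), ('New York', 'Location')]
```

A gazetteer only finds names it already lists, whereas a trained recognizer can also label names it has never seen.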
6.Chunking: We take individual tokens and group them into larger units, such as noun phrases.
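As a rough illustration, the sketch below groups already tagged tokens into noun phrases. The tags (DT, JJ, NN, VBD) follow the Penn Treebank convention and are written by hand here; the function chunk_noun_phrases and its grouping rule are assumptions, not a general chunker.

```python
def chunk_noun_phrases(tagged_tokens):
    """Collect determiners and adjectives, closing a chunk at each noun."""
    np_tags = {"DT", "JJ", "NN"}
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in np_tags:
            current.append(word)
            if tag == "NN":
                # A noun completes the current noun phrase.
                chunks.append(" ".join(current))
                current = []
        else:
            current = []  # Any other tag breaks the phrase.
    return chunks

tagged = [("The", "DT"), ("dog", "NN"), ("killed", "VBD"),
          ("the", "DT"), ("bat", "NN")]
print(chunk_noun_phrases(tagged))  # ['The dog', 'the bat']
```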
import nltk
nltk.download('all-nltk')
print("\n")

# Creating token of words
print("Creating token of words:")
from nltk.tokenize import word_tokenize
text = "My name is Adithya Challa I wrote this shot!"
tokenize_word = word_tokenize(text)
print(tokenize_word)
print("\n")

# Stemming
print("Stemming:")
from nltk.stem import PorterStemmer
words = ["light", "lighting", "lights"]
ps = PorterStemmer()
for w in words:
    rootword = ps.stem(w)
    print(rootword)
print("\n")

# Lemmatization: Converts all verb forms into root word
print("Lemmatization: Converts all verb forms into root word:")
from nltk.stem import WordNetLemmatizer
lem = WordNetLemmatizer()
print(lem.lemmatize("playing", pos="v"))  # pos="v" treats the word as a verb
print("\n")

# POS Tag
print("POS Tag:")
from nltk import word_tokenize, pos_tag
text = "My name is Adithya Challa I wrote this shot!"
print(pos_tag(word_tokenize(text)))
Lines 1 and 2: We import the nltk module and download its data packages.
Lines 7–9: We import word_tokenize from nltk.tokenize and divide the string of words into tokens.
Lines 15–20: We import PorterStemmer from nltk.stem and strip the suffixes from each word to obtain its root word.
Lines 25 and 26: We import WordNetLemmatizer and create a lemmatizer, which converts verb forms into their root word.
Lines 32 and 33: We import word_tokenize and pos_tag, which we use to find the parts of speech, and define the text to tag.