Text summarization in spaCy and NLTK

Text summarization, a subfield of NLP, condenses an enormous set of documents into their most important points. Among the many popular NLP libraries, two widely used ones are spaCy and NLTK. Before writing any code, let's first understand the general approach of a text summarizer and the logic we will follow throughout.

While performing text summarization, the first step is text cleaning, more generally called text preprocessing. In this step, we perform tasks like removing punctuation, converting the text to lowercase, and handling special characters. Next, we split the text into sentences and then further into words. We then compute a word-frequency count, rank the sentences by the frequencies of the words they contain, and include the highest-ranked sentences in the final summary.
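To make this concrete before bringing in any library, here is a minimal sketch of the same pipeline in plain Python. The summarize helper, its regular expressions, and its scoring are simplifying assumptions for illustration, not what spaCy or NLTK do internally.

from collections import Counter
import re

def summarize(text, num_sentences=2):
    # Split into sentences on terminal punctuation (a crude assumption)
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # Word-frequency count over lowercased words
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Score each sentence by the summed frequency of its words
    scores = {s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())) for s in sentences}
    # Keep the highest-scoring sentences as the summary
    return " ".join(sorted(sentences, key=scores.get, reverse=True)[:num_sentences])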

Now, let's learn to code a text summarizer in spaCy and NLTK.

Text summarizer in spaCy

Importing libraries

Firstly, we will import spaCy, a popular Python library for natural language processing, along with other necessary modules.

import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

Code explanation

  • Line 1: Imports the spaCy library.

  • Line 2–3: Imports the stop words from spaCy and the punctuation characters from the string module, respectively.

Preprocessing the corpus

In the second step, we prepare the corpus. With spaCy, this means lowercasing the text and tokenizing it; the stop words and punctuation are filtered out in the next step, when we count word frequencies.

# Sample input text
text = """In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills. Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services. As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses. The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning. According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, With AI being the defining technology of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset. This will require more collaborations and training and working with AI. That’s why it has become more critical than ever for educational institutions to integrate new cloud and AI technologies. The program is an attempt to ramp up the institutional set-up and build capabilities among the educators to educate the workforce of tomorrow. The program aims to build up the cognitive skills and in-depth understanding of developing intelligent cloud connected solutions for applications across industry. Earlier in April this year, the company announced Microsoft Professional Program In AI as a learning track open to the public. The program was developed to provide job ready skills to programmers who wanted to hone their skills in AI and data science with a series of online courses which featured hands-on labs and expert instructors as well. This program also included developer-focused AI school that provided a bunch of assets to help build AI skills."""
# Create a blank English pipeline and add a sentencizer for sentence boundaries
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
# Convert the text to lowercase and tokenize it
doc = nlp(text.lower())

Code explanation

  • Line 1–2: This is our sample text for summarization.

  • Line 4–5: Creates a blank English pipeline with spacy.blank("en") and adds a sentencizer component. A blank pipeline contains no pre-trained components, so without the sentencizer, iterating over doc.sents later would raise an error because no sentence boundaries are set.

  • Line 7: Processes the text variable with the nlp pipeline. The text.lower() call converts the input text to lowercase before tokenization.
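As a quick sanity check, you can inspect the first few tokens of the resulting Doc object; the slice size here is an arbitrary choice.

# Peek at the first ten tokens produced by the pipeline
print([token.text for token in doc[:10]])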

Word-frequency count

# Calculate word frequencies
word_frequencies = {}
for token in doc:
    if token.text not in STOP_WORDS and token.text not in punctuation:
        if token.text not in word_frequencies:
            word_frequencies[token.text] = 1
        else:
            word_frequencies[token.text] += 1

Code explanation

  • Line 2: Initializes an empty dictionary to store word frequencies.

  • Line 3–8: The loop iterates through each token in the doc object. The condition if token.text not in STOP_WORDS and token.text not in punctuation: checks that the token is neither a stop word (a common word like "the" or "and") nor a punctuation mark (e.g., a period or comma). For each token that passes this check, the code looks up the token's text in the word_frequencies dictionary: if it is not there yet, its count is set to 1; otherwise, its existing count is incremented.
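As an aside, the same count can be written more compactly with collections.Counter from the standard library; this is an equivalent sketch, not part of the original snippet.

from collections import Counter

# Count every token that is neither a stop word nor punctuation
word_frequencies = Counter(
    token.text for token in doc
    if token.text not in STOP_WORDS and token.text not in punctuation
)

Since Counter is a dict subclass, the rest of the code works unchanged.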

Summarized text

# Rank sentences by the summed frequencies of their words
sorted_sentences = sorted(
    doc.sents,
    key=lambda sent: sum(word_frequencies[token.text]
                         for token in sent
                         if token.text in word_frequencies),
    reverse=True
)
# Join the three highest-ranked sentences into the summary
summary = " ".join(sent.text for sent in sorted_sentences[:3])

Code explanation

The code ranks the sentences by summing the frequencies of the significant words each one contains, then selects the three highest-scoring sentences to form the summary, providing a concise representation of the original text that highlights its key points.
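One caveat: sorting by score reorders the sentences. If you would rather keep the selected sentences in their original document order, a small variation (assuming the sorted_sentences list from above) does the trick:

# Pick the three highest-scoring sentences, then restore document order
top_sents = sorted(sorted_sentences[:3], key=lambda sent: sent.start)
summary = " ".join(sent.text for sent in top_sents)
print(summary)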

Text summarizer in NLTK

Importing libraries

Firstly, we will import NLTK, a popular Python library for natural language processing, along with other necessary modules.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from string import punctuation
from heapq import nlargest

# Download the required tokenizer models and stop word list (only needed once)
nltk.download("punkt")
nltk.download("stopwords")

Code explanation

  • Line 1–5: Imports the NLTK library, the stop words, the sentence and word tokenizers, the punctuation characters from the string module, and nlargest from the heapq module.

  • Line 8–9: Downloads the tokenizer models and the stop word list that NLTK needs at runtime; this only has to be done once per environment.

Preprocessing the corpus

In the second step, we process the corpus, that is, we remove the stopwords and punctuation from the text.

text = """In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills. Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services. As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses. The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning. According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, With AI being the defining technology of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset. This will require more collaborations and training and working with AI. That’s why it has become more critical than ever for educational institutions to integrate new cloud and AI technologies. The program is an attempt to ramp up the institutional set-up and build capabilities among the educators to educate the workforce of tomorrow. The program aims to build up the cognitive skills and in-depth understanding of developing intelligent cloud connected solutions for applications across industry. Earlier in April this year, the company announced Microsoft Professional Program In AI as a learning track open to the public. The program was developed to provide job ready skills to programmers who wanted to hone their skills in AI and data science with a series of online courses which featured hands-on labs and expert instructors as well. This program also included developer-focused AI school that provided a bunch of assets to help build AI skills."""
# Convert text to lowercase
text = text.lower()
# Tokenize the text into sentences
sentences = sent_tokenize(text)
# Tokenize the sentences into words
words = [word_tokenize(sentence) for sentence in sentences]
# Flatten the list of words
words = [word for sublist in words for word in sublist]
# Remove stopwords and punctuation
stop_words = set(stopwords.words("english") + list(punctuation))
words = [word for word in words if word not in stop_words]

Code explanation

  • Line 3: The code uses text.lower() to convert the entire text to lowercase, so that all further operations are case-insensitive.

  • Line 5–9: The code uses sent_tokenize(text) from NLTK to split the text into individual sentences, producing the sentences list. A list comprehension then tokenizes each sentence into words with word_tokenize(sentence), giving a list of lists, which is flattened so that words holds every word of the text in a single flat list.

  • Line 11–12: The code builds the filter set with set(stopwords.words("english") + list(punctuation)), combining the English stop words with the punctuation characters. A list comprehension then removes these from words, leaving only meaningful words.

Word-frequency count

# Calculate word frequencies
word_frequencies = nltk.FreqDist(words)
# Calculate sentence scores based on word frequencies
sentence_scores = {}
for sentence in sentences:
    for word in word_tokenize(sentence):
        if word in word_frequencies:
            if sentence not in sentence_scores:
                sentence_scores[sentence] = word_frequencies[word]
            else:
                sentence_scores[sentence] += word_frequencies[word]

Code explanation

The code computes word frequencies with nltk.FreqDist() and then scores each sentence based on the presence and frequency of significant words it contains. The resulting sentence_scores dictionary is used to rank the sentences and select the most important ones for the summary.
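For reference, the same scoring can be condensed into a single dictionary comprehension; this is an equivalent sketch rather than the tutorial's version.

# Score each sentence by summing the frequencies of its significant words
sentence_scores = {
    sentence: sum(word_frequencies[word]
                  for word in word_tokenize(sentence)
                  if word in word_frequencies)
    for sentence in sentences
}

Unlike the loop, this also assigns a score of 0 to sentences that contain no significant words, which does not change the top-three selection here.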

Summarized text

# Get the top 3 sentences with highest scores as the summary
summary_sentences = nlargest(3, sentence_scores, key=sentence_scores.get)
# Generate summary
summary = " ".join(summary_sentences)

Code explanation

The code extracts the three sentences with the highest scores from the sentence_scores dictionary and joins them to create the final summary, a concise and informative representation of the text's key points.
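Putting the NLTK steps together, here is a hypothetical summarize_text helper that wraps the whole flow into one reusable function. The function name and the num_sentences parameter are illustrative choices, and tokenizing sentences before lowercasing keeps the original casing in the output.

def summarize_text(text, num_sentences=3):
    # Split into sentences first so the summary keeps its original casing
    sentences = sent_tokenize(text)
    # Count significant words across the lowercased text
    stop_words = set(stopwords.words("english") + list(punctuation))
    word_frequencies = nltk.FreqDist(
        w for w in word_tokenize(text.lower()) if w not in stop_words
    )
    # Score each sentence by the summed frequency of its significant words
    sentence_scores = {
        s: sum(word_frequencies[w] for w in word_tokenize(s.lower()) if w in word_frequencies)
        for s in sentences
    }
    # Join the highest-scoring sentences into the summary
    return " ".join(nlargest(num_sentences, sentence_scores, key=sentence_scores.get))

print(summarize_text(text))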

Conclusion

In conclusion, extractive text summarization with spaCy and NLTK follows the same simple recipe: preprocess the text, count word frequencies, score each sentence by the words it contains, and keep the highest-scoring sentences. Both libraries make every step straightforward, so a handful of Python lines is enough to condense a long document into its key points, and the scoring can easily be adapted to your own needs.
