Key takeaways
Text summarization condenses large amounts of information into concise summaries, helping users quickly grasp main points.
Sumy is a Python library that provides various algorithms for automatic text summarization, including LSA, LexRank, and Luhn.
Sumy can be installed via pip and also requires the NLTK library as a dependency.
Summarizing with LSA analyzes relationships between terms and documents; good for capturing topic distribution but computationally intensive.
Summarizing with LexRank is a graph-based approach similar to PageRank; effective for identifying central sentences but may vary in performance based on text structure.
Summarizing with Luhn is simple and fast; focuses on word frequency but may miss broader context in longer texts.
Each algorithm has its strengths and weaknesses, which should be considered based on the specific text and summarization needs.
In today’s world, we’re overwhelmed by more text than we can read—from news articles and research papers to blog posts and social media updates. Text summarization helps by condensing large amounts of text into short, relevant summaries, making it easier to quickly understand the main points. Sumy is a helpful Python library that simplifies this process, offering an easy and effective way to automatically summarize text.
In this Answer, we will explore the capabilities of sumy, demonstrate how to install and use it, provide executable examples, and compare different summarization algorithms to help us choose the right one for our needs.
Sumy is a Python library that provides various algorithms for text summarization. It supports several methods, including:
LSA (latent semantic analysis)
LexRank
Luhn
Edmundson
TextRank
KL-Sum
Reduction
Each algorithm has its own approach to extracting the most important sentences from a given text. We’ll demonstrate the first three widely used algorithms.
Before we can use sumy, we need to install it. Sumy can be installed using pip:
pip install sumy
Additionally, you may need to install the NLTK library and its datasets, which sumy relies on:
pip install nltk
After installing the necessary packages, open a Python interpreter or script and download the NLTK data:
```python
import nltk
nltk.download('punkt')
```
Let’s start by summarizing a sample text using different algorithms provided by sumy. We’ll use a simple text for demonstration purposes.
We need to import the necessary modules. After that, we define a text variable that contains the example text and set up the parser.
```python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

# Example text for better demonstration
text = """The quick brown fox jumps over the lazy dog. The dog, being very lazy, doesn't move.
The fox, enjoying its agility, continues to jump over the dog. Meanwhile, the dog just yawns and stretches.
This pattern continues for a while, making the fox quite happy and the dog remain lazy.

On the other side of the forest, a squirrel and a rabbit play together. They chase each other around the trees,
occasionally stopping to nibble on nuts and grass. The squirrel shows off its acrobatic skills, jumping from branch to branch,
while the rabbit bounds gracefully across the clearing.

As evening falls, all the animals settle down. The fox curls up under a tree, the squirrel finds a cozy spot in the branches,
and the rabbit digs a shallow burrow for the night. The lazy dog simply stretches out on the cool grass, content and peaceful."""

# Create a parser
parser = PlaintextParser.from_string(text, Tokenizer("english"))
```
Latent semantic analysis (LSA) is a widely used NLP technique. It examines the relationships between a collection of documents and the terms they contain, using the frequency and co-occurrence of words to uncover underlying meaning and patterns. It is one of the algorithms sumy supports. Here’s how you can use it:
```python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

# Example text for better demonstration
text = """The quick brown fox jumps over the lazy dog. The dog, being very lazy, doesn't move.
The fox, enjoying its agility, continues to jump over the dog. Meanwhile, the dog just yawns and stretches.
This pattern continues for a while, making the fox quite happy and the dog remain lazy.

On the other side of the forest, a squirrel and a rabbit play together. They chase each other around the trees,
occasionally stopping to nibble on nuts and grass. The squirrel shows off its acrobatic skills, jumping from branch to branch,
while the rabbit bounds gracefully across the clearing.

As evening falls, all the animals settle down. The fox curls up under a tree, the squirrel finds a cozy spot in the branches,
and the rabbit digs a shallow burrow for the night. The lazy dog simply stretches out on the cool grass, content and peaceful."""

# Create a parser
parser = PlaintextParser.from_string(text, Tokenizer("english"))

# Create an LSA Summarizer
lsa_summarizer = LsaSummarizer()

# Summarize the text
lsa_summary = lsa_summarizer(parser.document, 4)  # Summarize to 4 sentences

# Print the summary
print("Summary:")
for sentence in lsa_summary:
    print(sentence)
```

Here is the explanation of the highlighted code.
Line 3: This line imports the LSA summarizer class for performing text summarization.
Line 21: This line creates an instance of the LSA summarizer.
Line 24: This line summarizes the parsed text into four sentences using the LSA summarizer.
Lines 27–29: These lines print each sentence of the generated summary.
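To build some intuition for what LSA does under the hood, here is a minimal, self-contained sketch (illustrative only, not sumy’s implementation): it builds a term-sentence frequency matrix and uses power iteration as a stand-in for the SVD step to find the sentence most aligned with the dominant topic. All names and the tiny example text are our own.

```python
# Conceptual sketch of LSA's first step: build a term-sentence matrix and
# approximate its dominant "topic" direction via power iteration on A^T A.
import math
import re

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog, being very lazy, doesn't move.",
    "The fox continues to jump over the dog.",
    "A squirrel and a rabbit play together.",
]

# Term-sentence matrix A: rows = terms, columns = sentences
tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
terms = sorted({w for toks in tokenized for w in toks})
A = [[toks.count(t) for toks in tokenized] for t in terms]

# Power iteration approximates the top right-singular vector of A,
# which weights each sentence by how strongly it expresses the main topic.
n = len(sentences)
v = [1.0] * n
for _ in range(100):
    # w = A v (project sentence weights into term space)
    w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(len(terms))]
    # v = A^T w (project back into sentence space), then normalize
    v = [sum(A[i][j] * w[i] for i in range(len(terms))) for j in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

best = max(range(n), key=lambda j: abs(v[j]))
print("Sentence most aligned with the main topic:", sentences[best])
```

The three fox-and-dog sentences share many terms, so the dominant direction concentrates on them; the unrelated squirrel sentence gets almost no weight. Real LSA keeps several singular vectors, not just the first.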
LexRank is a graph-based algorithm for text summarization. It represents each sentence as a node in a graph, scores pairs of sentences by their similarity, and ranks the most central sentences as the most important, much like PageRank ranks web pages. Here’s how to use it:
```python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

# Example text for better demonstration
text = """The quick brown fox jumps over the lazy dog. The dog, being very lazy, doesn't move.
The fox, enjoying its agility, continues to jump over the dog. Meanwhile, the dog just yawns and stretches.
This pattern continues for a while, making the fox quite happy and the dog remain lazy.

On the other side of the forest, a squirrel and a rabbit play together. They chase each other around the trees,
occasionally stopping to nibble on nuts and grass. The squirrel shows off its acrobatic skills, jumping from branch to branch,
while the rabbit bounds gracefully across the clearing.

As evening falls, all the animals settle down. The fox curls up under a tree, the squirrel finds a cozy spot in the branches,
and the rabbit digs a shallow burrow for the night. The lazy dog simply stretches out on the cool grass, content and peaceful."""

# Create a parser
parser = PlaintextParser.from_string(text, Tokenizer("english"))

# Create a LexRank Summarizer
lex_rank_summarizer = LexRankSummarizer()

# Summarize the text
lex_rank_summary = lex_rank_summarizer(parser.document, 4)  # Summarize to 4 sentences

# Print the summary
print("Summary:")
for sentence in lex_rank_summary:
    print(sentence)
```

Here is the explanation of the highlighted code.
Line 3: This line imports the LexRank summarizer class for performing text summarization.
Line 21: This line creates an instance of the LexRank summarizer.
Line 24: This line summarizes the parsed text into four sentences using the LexRank summarizer.
Lines 27–29: These lines print each sentence of the generated summary.
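To see the graph idea concretely, here is a toy, self-contained sketch of LexRank-style ranking (illustrative only, not sumy’s implementation): sentences become nodes, cosine similarity between word-count vectors gives edge weights, and a PageRank-style power iteration finds the most central sentence. The tiny sentence list is our own example.

```python
# Toy LexRank: rank sentences by centrality in a similarity graph.
import math

sentences = [
    "the quick brown fox jumps over the lazy dog",
    "the dog being very lazy does not move",
    "the fox continues to jump over the dog",
    "a squirrel and a rabbit play together",
]

def cosine_similarity(a, b):
    """Cosine similarity between two sentences using raw word counts."""
    words_a, words_b = a.split(), b.split()
    vocab = set(words_a) | set(words_b)
    va = [words_a.count(w) for w in vocab]
    vb = [words_b.count(w) for w in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    return dot / norm if norm else 0.0

# Edge weights of the sentence graph
n = len(sentences)
sim = [[cosine_similarity(s, t) for t in sentences] for s in sentences]

# PageRank-style power iteration over the row-normalized similarity matrix
scores = [1.0 / n] * n
for _ in range(50):
    new_scores = [0.0] * n
    for i in range(n):
        row_sum = sum(sim[i])  # >= 1 because every sentence matches itself
        for j in range(n):
            new_scores[j] += scores[i] * sim[i][j] / row_sum
    scores = new_scores

best = max(range(n), key=lambda i: scores[i])
print("Most central sentence:", sentences[best])
```

The three fox-and-dog sentences reinforce each other through shared vocabulary, so one of them ends up most central, while the unrelated squirrel sentence receives no inflow from the rest of the graph.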
Luhn is one of the earliest algorithms for automatic text summarization. It identifies the most important sentences by analyzing how often significant words occur in the text and how closely those words cluster together within each sentence. Here’s how you can use it:
```python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.luhn import LuhnSummarizer

# Example text for better demonstration
text = """The quick brown fox jumps over the lazy dog. The dog, being very lazy, doesn't move.
The fox, enjoying its agility, continues to jump over the dog. Meanwhile, the dog just yawns and stretches.
This pattern continues for a while, making the fox quite happy and the dog remain lazy.

On the other side of the forest, a squirrel and a rabbit play together. They chase each other around the trees,
occasionally stopping to nibble on nuts and grass. The squirrel shows off its acrobatic skills, jumping from branch to branch,
while the rabbit bounds gracefully across the clearing.

As evening falls, all the animals settle down. The fox curls up under a tree, the squirrel finds a cozy spot in the branches,
and the rabbit digs a shallow burrow for the night. The lazy dog simply stretches out on the cool grass, content and peaceful."""

# Create a parser
parser = PlaintextParser.from_string(text, Tokenizer("english"))

# Create a Luhn Summarizer
luhn_summarizer = LuhnSummarizer()

# Summarize the text
luhn_summary = luhn_summarizer(parser.document, 4)  # Summarize to 4 sentences

# Print the summary
print("Summary:")
for sentence in luhn_summary:
    print(sentence)
```

Here is the explanation of the highlighted code.
Line 3: This line imports the Luhn summarizer class for performing text summarization.
Line 21: This line creates an instance of the Luhn summarizer.
Line 24: This line summarizes the parsed text into four sentences using the Luhn summarizer.
Lines 27–29: These lines print each sentence of the generated summary.
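Luhn’s scoring rule is simple enough to sketch directly. Below is a toy, self-contained version of the idea (illustrative only, not sumy’s implementation): words that occur often are marked significant, and each sentence scores (significant words)² divided by the span of text containing them. The stop-word set and threshold are our own simplifications.

```python
# Toy version of Luhn's idea: score sentences by clusters of significant words.
import re
from collections import Counter

text = ("The quick brown fox jumps over the lazy dog. "
        "The dog, being very lazy, doesn't move. "
        "The fox, enjoying its agility, continues to jump over the dog.")

stop_words = {"the", "a", "over", "to", "its", "being", "very"}

sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

# Count how often each non-stop word appears across the whole text
words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop_words]
freq = Counter(words)

# Words at or above a frequency threshold count as "significant"
significant = {w for w, c in freq.items() if c >= 2}

def luhn_score(sentence):
    """Luhn's measure: (significant words)^2 / span containing them."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    indices = [i for i, w in enumerate(tokens) if w in significant]
    if not indices:
        return 0.0
    span = indices[-1] - indices[0] + 1
    return len(indices) ** 2 / span

scores = {s: luhn_score(s) for s in sentences}
best = max(sentences, key=lambda s: scores[s])
print("Top sentence:", best)
```

Here "fox", "lazy", and "dog" repeat and become significant, and the first sentence wins because it packs three significant words into a short span; the third sentence mentions two of them but far apart, so its score is diluted.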
Now that we’ve seen how to use different algorithms, let’s compare them.
| Algorithm | Pros | Cons |
|-----------|------|------|
| LSA | Captures the underlying structure of the text; good for understanding the general topic distribution | Can be computationally intensive; may not always pick the most representative sentences |
| LexRank | Uses a graph-based approach similar to Google's PageRank; effective for identifying central sentences in the text | May require more computation due to graph construction; performance can vary based on the text structure |
| Luhn | Simple and fast; effective for texts where significant words are indicative of the main content | Can be less effective for longer texts with varied vocabulary; might miss the broader context |