Sentence segmentation in different languages using spaCy

Sentence segmentation is the process of dividing a chunk of text or a paragraph into individual sentences. This task requires us to identify the boundaries that separate one sentence from another. It is a fundamental task in natural language processing (NLP) and is often an essential preprocessing step for NLP applications as it makes parsing and analysis easier.

Sentence segmentation in spaCy

The spaCy library offers a simple way to perform sentence segmentation through the sents property of the built-in Doc class. By default, spaCy's trained pipelines determine sentence boundaries with the dependency parser, which is a more sophisticated approach than simply splitting on punctuation. spaCy also allows us to perform sentence segmentation in different languages by loading different language models.
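
To see the sents property in action, here is a minimal English sketch (it assumes the small English model, en_core_web_sm, is installed); each item yielded by doc.sents is a Span covering one detected sentence:

import spacy
nlp = spacy.load("en_core_web_sm")

text = "spaCy detects sentence boundaries for us. Each sentence is returned as a Span."
doc = nlp(text)

for sent in doc.sents:
    print(sent.text)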

For our example, we will be using the Spanish and French language models. Let's start with the Spanish example.

import spacy
nlp = spacy.load("es_core_news_sm")

text = "¿Querías saber cuánto durará esto? Hasta la muerte"
doc = nlp(text)

for sent in doc.sents:
    print(sent.text)

Let's go over the code:

  • Line 1: We import the spacy library.

  • Line 2: We load the Spanish language model.

  • Lines 4–5: We store the Spanish text in a variable called text and pass it to the nlp object to create a doc object.

  • Lines 7–8: We use the sents property of the doc object to loop through the detected sentences and print each one.
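
Note that the language models are installed separately from spaCy itself; if loading es_core_news_sm raises an error, the model can first be downloaded with python -m spacy download es_core_news_sm. With the model in place, we can expect the loop to print two sentences, split at the question mark: "¿Querías saber cuánto durará esto?" and "Hasta la muerte" (the exact boundary depends on the model's parser).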

Now let's look at the French example.

import spacy
nlp = spacy.load("fr_core_news_sm")

text = "Frère Jacques Frère Jacques Dormez vous? Dormez vous? Sonnez les matines Sonnez les matines Ding ding dong Ding ding dong"
doc = nlp(text)

for sent in doc.sents:
    print(sent.text)

The code is largely the same except for two differences:

  • Line 2: We load a French language model.

  • Line 4: We store the French text that we want to segment.
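
As a side note, the dependency parser is not the only way spaCy can find sentence boundaries. If no trained pipeline is available for a language, the rule-based sentencizer component can be added to a blank pipeline instead. The sketch below (using a blank French pipeline and an example string of our own) splits on punctuation rather than parse structure:

import spacy

# Start from a blank French pipeline (no trained components).
nlp = spacy.blank("fr")

# The sentencizer is a rule-based component that splits on punctuation.
nlp.add_pipe("sentencizer")

text = "Frère Jacques, dormez-vous? Sonnez les matines!"
doc = nlp(text)

for sent in doc.sents:
    print(sent.text)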
