There are many ways to perform extractive summarization. This example uses the TextRank algorithm, a graph-based ranking method, as implemented by the pytextrank library.
https://www.youtube.com/watch?v=PNHB6OuFv7I
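To give a feel for the idea behind TextRank before we reach for the library, here is a minimal pure-Python sketch. It is a simplified illustration and not how pytextrank works internally. It assumes a naive regex sentence splitter, a word-overlap similarity between sentences, and a fixed number of PageRank-style iterations. The names split_sentences and textrank_sentences are made up for this example.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def textrank_sentences(text, top_n=2, damping=0.85, iterations=50):
    """Rank sentences with a PageRank-style iteration over a similarity graph."""
    sentences = split_sentences(text)
    n = len(sentences)
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]

    # Edge weight: shared words, normalized by the log of both sentence lengths.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if not words[i] or not words[j]:
                continue
            norm = math.log(len(words[i])) + math.log(len(words[j]))
            if norm > 0:
                w = len(words[i] & words[j]) / norm
                weights[i][j] = weights[j][i] = w

    # Each sentence passes its score to its neighbours in proportion to edge weight.
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    # Keep the top_n highest-scoring sentences, returned in document order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


sample = (
    "The cat sat on the mat. The cat chased the mouse on the mat. "
    "The mouse ran from the cat. Stocks fell sharply today."
)
print(textrank_sentences(sample, top_n=2))
```

Sentences that share vocabulary with many other sentences end up with higher scores, so the unrelated sentence about stocks is left out of the summary. A real implementation would also filter stop words and use a proper tokenizer, which is what spaCy gives us in the steps that follow.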
First, install pytextrank (https://github.com/DerwenAI/pytextrank):
pip install pytextrank
We will also need to download en_core_web_sm, spaCy's small English model:
spacy download en_core_web_sm
Let’s start by loading the model with spaCy:
import spacy
import pytextrank  # required: importing it registers the "textrank" pipeline component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")
We load the language model and add pytextrank to the spaCy pipeline.
Now we can process our text (here, text is a string holding the document we want to summarize):
doc = nlp(text)
We can access the TextRank object through the doc._.textrank extension attribute:
tr = doc._.textrank
To generate a summary, we call the summary method on tr, which yields the top-ranked sentences:
summaries = [
    str(sent) for sent in tr.summary()
]
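The steps above can be pulled together into two small helpers. This is a sketch rather than the only way to do it. The names build_pipeline and summarize are made up for this example. The keyword arguments limit_phrases, limit_sentences and preserve_order are assumed to be the ones accepted by summary in pytextrank 3.x, so check them against the version you have installed.

```python
def build_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy model and append the textrank component (hypothetical helper)."""
    # Imported lazily so that defining these helpers does not require the packages.
    import spacy
    import pytextrank  # noqa: F401  (importing registers the "textrank" component)

    nlp = spacy.load(model_name)
    nlp.add_pipe("textrank")
    return nlp


def summarize(nlp, text, limit_phrases=10, limit_sentences=4, preserve_order=True):
    """Return the summary sentences of text as a list of plain strings."""
    doc = nlp(text)
    tr = doc._.textrank
    sentences = tr.summary(
        limit_phrases=limit_phrases,      # how many top-ranked phrases to consider
        limit_sentences=limit_sentences,  # maximum number of sentences to return
        preserve_order=preserve_order,    # keep sentences in document order
    )
    # Each item is a spaCy Span; convert to str and trim surrounding whitespace.
    return [str(sent).strip() for sent in sentences]
```

With the packages and model from the earlier steps installed, calling nlp = build_pipeline() once and then summarize(nlp, text, limit_sentences=2) should return at most two sentences. Building the pipeline once and reusing it avoids reloading the model for every document.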