Extractive Summarization

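There are many ways to perform extractive summarization, which builds a summary by selecting the most important sentences from the source text instead of writing new ones. In this example we will use the TextRank algorithm through the pytextrank package for spaCy.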
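To make the idea concrete, here is a minimal pure-Python sketch of sentence-level TextRank. It is not how pytextrank works internally (pytextrank ranks phrases first and then picks sentences from them); it is only an illustration of the graph-ranking idea. The function names, the naive regex sentence splitter, and the default values for damping and iterations are all assumptions made for this sketch.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap between two sentences, normalized by their log lengths."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids log(1) == 0 in the denominator
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []

    # Weighted graph: weights[i][j] is the similarity of sentences i and j.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style iteration: a sentence scores highly when it is
    # similar to other sentences that also score highly.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many other sentences end up with the highest scores, while a sentence that shares nothing with the rest stays at the baseline score and is left out of the summary.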

Getting started

pip install pytextrank

We will also need to download en_core_web_sm, spaCy's small English model.

python -m spacy download en_core_web_sm

Let’s start by loading the model with spaCy.

import spacy
import pytextrank  # required: importing it registers the "textrank" component

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")

Here we load the language model and add pytextrank to the spaCy pipeline.

Now we can process our text, where text is a string containing the document to summarize.

doc = nlp(text)

We can access the TextRank object through the doc._.textrank extension attribute.

tr = doc._.textrank

To generate a summary, we call the summary method on tr. It yields the top-ranked sentences as spaCy spans, so we convert each one to a string.

summaries = [
    str(sent) for sent in tr.summary()
]