There are many ways to perform extractive summarization. In this example, we will use the TextRank algorithm.

https://www.youtube.com/watch?v=PNHB6OuFv7I
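
To give a feel for what TextRank does, here is a minimal, dependency-free sketch of the classic sentence-level variant. Each sentence becomes a node, edges are weighted by word overlap (the similarity measure from the original TextRank paper), and a PageRank-style iteration with damping factor d = 0.85 scores the nodes. The function names, the naive regex sentence splitter and the sample text are illustrative assumptions. pytextrank does not work exactly this way internally: it ranks words and phrases first and then selects sentences based on those phrases.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def similarity(a, b):
    # TextRank word-overlap similarity: shared words / (log|A| + log|B|),
    # computed here on sets of lowercased words.
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:
        return 0.0
    return len(words_a & words_b) / denom


def textrank_scores(sentences, d=0.85, max_iter=100, tol=1e-6):
    n = len(sentences)
    if n == 0:
        return []
    # Symmetric weighted adjacency matrix without self-loops.
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            incoming = sum(w[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if out_weight[j] > 0)
            new_scores.append((1 - d) + d * incoming)
        converged = max(abs(x - y) for x, y in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def summarize(text, k=2):
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = textrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(best)]


# Hypothetical sample text, for illustration only.
text = (
    "TextRank builds a graph of sentences. "
    "Sentences that share many words with other sentences get a high score. "
    "The highest scoring sentences form the summary. "
    "Bananas are yellow."
)
print(summarize(text, k=2))
```

The unrelated last sentence shares no words with the others, so it receives no incoming weight and ends with the minimum score of 1 - d.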

Getting started


First, install pytextrank (https://github.com/DerwenAI/pytextrank):

pip install pytextrank

We will also need to download spaCy's small English model, en_core_web_sm:

python -m spacy download en_core_web_sm

Let’s start by setting up spaCy:

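import spacy
import pytextrank  # required: registers the "textrank" pipeline component
nlp = spacy.load("en_core_web_sm")  # small English language model
nlp.add_pipe("textrank")  # append TextRank to the end of the pipeline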

This loads the language model and adds pytextrank's "textrank" component to the spaCy pipeline.

Now we can process our text (a Python string stored in text):

doc = nlp(text)

We can then access the TextRank object through the doc._.textrank extension attribute:

tr = doc._.textrank

To generate a summary, call the summary method on tr. It yields the top-ranked sentences as spaCy spans, which we convert to plain strings:

summaries = [str(sent) for sent in tr.summary()]