the new pope

analysing the script of Sorrentino's new TV series

The New Pope, the TV series created and directed by Paolo Sorrentino, is astonishing, provocative, intense and profound. It makes people think of something different and mysterious.

This is the motivation behind this text analysis: to understand the secret of the script's strength.

the new pope

The New Pope is a drama television series created and directed by Paolo Sorrentino for Sky Atlantic, HBO and Canal+. It is a continuation of the 2016 series The Young Pope, originally announced as its second season.

According to Collider: “It’s less a story than a sermon with too many subjects, taking on greed, and sex, and faith, and corruption but only in general, arms-length terms.”

“The opening credits of the entire show is a crew of cloistered nuns throwing the hell down to Sofi Tukker’s ‘Good Time Girl’, a pulsating neon crucifix illuminating them in reds, yellows, and blues. It’s wild, but it’s also a pretty clear summary of The New Pope’s thesis. After all, what’s the difference between faith and abandon?”

“To quote a sermon from Pius XIII himself, deep into The New Pope: ‘You know what is so beautiful about questions? It’s that we don’t have the answers. In the end, only God has the answers.’”

most common words

The script of each episode can be found at springfieldspringfield, a web site that hosts a database containing thousands of TV show episode scripts and movie scripts.

Before starting the analysis, the names of characters and papal appellations such as "holy father" were excluded.

It is then possible to visualize the most common words in the whole series script, ranking each word by the number of times it appears.

The bar chart below shows the words appearing more than 45 times in the whole script.
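The counting step can be sketched in a few lines. The original analysis was run in R; the Python below is only a minimal illustration. The excluded names, the stop-word list and the sample text are all assumptions made up for the example, not the lists or the script used in the post.

```python
import re
from collections import Counter

# Hypothetical exclusion lists: character names and papal appellations.
EXCLUDED_WORDS = {"brannox", "voiello"}
EXCLUDED_PHRASES = ["holy father"]
# Tiny illustrative stop-word list (an assumption, not the one used in the post).
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in", "it"}


def word_counts(script: str) -> Counter:
    """Count words after dropping excluded phrases, names and stop words."""
    text = script.lower()
    for phrase in EXCLUDED_PHRASES:
        text = text.replace(phrase, " ")
    words = re.findall(r"[a-z']+", text)
    return Counter(
        w for w in words if w not in EXCLUDED_WORDS and w not in STOP_WORDS
    )


def words_above(counts: Counter, threshold: int) -> list:
    """Return (word, count) pairs appearing more than `threshold` times."""
    return [(w, n) for w, n in counts.most_common() if n > threshold]


# Invented sample text, not a quote from the script.
sample = "Holy Father, love is the answer. Voiello, God is love. Love the church."
counts = word_counts(sample)
print(words_above(counts, 1))
```

With the real script, a threshold of 45 would select the words for the bar chart and a threshold of 10 those for the wordcloud.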

The wordcloud below visualizes words appearing more than ten times.

Both visualizations show the main topics Sorrentino reflects on: love, God, life, time, church and world.

sentiment analysis

In order to get the general mood of the series, a sentiment analysis is performed on the overall script. It considers unigrams (single words) and uses both a polarity lexicon (negative or positive), “bing”, and an emotion-related lexicon, “nrc”.

sentiment polarization

The overall sentiment in the script is negative (valued -222 by the algorithm). The sentiment polarization across the episodes highlights this negativity, except for episode 3, in which the new Pope, John Paul III, is elected.
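A lexicon-based score of this kind is usually the number of positive words minus the number of negative words. The sketch below assumes that definition; the small lexicon and the two episode texts are invented for illustration and are not taken from “bing” or from the script.

```python
import re

# Toy polarity lexicon in the spirit of "bing": +1 positive, -1 negative.
# The entries are illustrative assumptions.
POLARITY = {
    "love": 1, "beautiful": 1, "miracle": 1,
    "death": -1, "evil": -1, "sin": -1, "fear": -1,
}


def net_sentiment(text: str) -> int:
    """Sum the polarity of every unigram; words not in the lexicon count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(POLARITY.get(w, 0) for w in words)


# Invented sample lines keyed by episode number.
episodes = {
    1: "Death and fear fill the church.",
    2: "Love is a beautiful miracle.",
}
scores = {ep: net_sentiment(text) for ep, text in episodes.items()}
overall = sum(scores.values())
print(scores, overall)
```

Summing the per-episode scores gives the overall value, so a single strongly negative episode can outweigh several mildly positive ones.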

Checking against the episode storylines confirms the polarization found by the sentiment analysis.

episode | storyline | polarization
1 | election and death of Pope Francis II | negative
2 | attempt to convince Brannox to become Pope | negative
3 | Brannox is elected Pope John Paul III | positive
4 | Pope John Paul III visits Pope Pius XIII in Venice | negative
5 | John Paul III says NO in Lourdes after a terror attack | negative
6 | John Paul III leaves a worldwide television interview displaying symptoms of withdrawal | negative
7 | Pope Pius XIII finally awakens from his coma and prays desperately for the disabled boy of the people hosting him | negative
8 | Voiello’s friend Girolamo dies and receives an elaborate Vatican funeral | negative
9 | the perpetrators of a terror act that killed a priest reveal themselves as idolaters of Pius XIII; Pius XIII dies peacefully | negative

The contribution of individual words to the script's polarization is shown in the following graph.

emotions

Evaluating the emotions shown within the script through a sentiment analysis with the “nrc” lexicon, the predominant emotions are trust (or maybe faith), followed by fear.

The words contributing most to each emotion are:

anger | anticipation | disgust | fear | joy | sadness | surprise | trust
bad | church | abortion | abortion | church | abortion | death | church
death | god | death | coma | god | coma | gift | god
evil | holiness | evil | death | holiness | death | holiness | holiness
money | mother | sin | god | love | leave | leave | lord
sin | saint | suffering | holiness | mother | mother | miracle | mother
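Unlike a polarity lexicon, an emotion lexicon can assign one word to several emotions, which is why words such as "death" and "god" appear in more than one column above. A minimal sketch of that tallying follows; the three lexicon entries mirror the table above, while the word list is invented for the example.

```python
from collections import Counter, defaultdict

# Toy word-to-emotions mapping in the spirit of "nrc".
# Only three entries, taken from the columns of the table above.
EMOTIONS = {
    "death": {"anger", "disgust", "fear", "sadness", "surprise"},
    "god": {"anticipation", "fear", "joy", "trust"},
    "church": {"anticipation", "joy", "trust"},
}


def top_words_by_emotion(words, n=5):
    """For each emotion, list the n words that contribute to it most often."""
    per_emotion = defaultdict(Counter)
    for w in words:
        for emotion in EMOTIONS.get(w, ()):
            per_emotion[emotion][w] += 1
    return {e: [w for w, _ in c.most_common(n)] for e, c in per_emotion.items()}


# Invented token list, not taken from the script.
tokens = ["god", "death", "god", "church", "death", "god"]
top = top_words_by_emotion(tokens)
print(top["trust"], top["fear"])
```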

relationship between words

Another element of quantitative text analysis is the relationship between words. The graph below visualizes the most common bigrams, that is, pairs of words that occur together. Some occurrences are trivial, such as tomorrow morning, Sistine Chapel and reverend mother. But there are also some insight-revealing associations.

  • Fragile as porcelain is how Sir John Brannox describes himself.

  • Sexual abuse is one of the themes of the series and one of the church's issues.

  • The association of “financial” with “structure” or “organization” seems to point to another church issue.

  • Christian charity and how hell is filled up are the main theological topics.
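Bigram counting can be sketched as pairing each word with the one that follows it. In this minimal version a pair is dropped when either word is a stop word; that filtering rule, the stop-word list and the sample sentences are assumptions for the example, and sentence boundaries are ignored for simplicity.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "in", "is", "you", "a", "of", "and", "as"}


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs, skipping pairs that contain a stop word."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = zip(words, words[1:])
    return Counter(
        (a, b) for a, b in pairs
        if a not in STOP_WORDS and b not in STOP_WORDS
    )


# Invented sample text, not a quote from the script.
sample = (
    "See you tomorrow morning in the Sistine Chapel. "
    "The Sistine Chapel is closed tomorrow morning."
)
bigrams = bigram_counts(sample)
print(bigrams.most_common(2))
```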

The network graph below visualizes the correlation of words across episodes (keeping only pairs with a correlation above 0.7). The graph highlights the presence of 6 clusters. Although they are hard to interpret, each cluster may represent a theme or a narrative line present throughout the series. The first cluster on the right (mother, woman, abortion) could represent the discourse related to maternity. The two larger clusters could represent important themes within The New Pope:

  • the cluster with words like “friend”, “love” and “children” seems to be linked to the theme of human relationships;

  • the cluster with words like “lord”, “heaven” and “dead” seems to be linked to the theme of human destiny.

The cluster on the left probably represents the storyline of the Vatican sisters' strike, which runs from the beginning to the end of the series.
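One way to obtain such word correlations is to record, for every word, whether it appears in each episode, and then to correlate those presence vectors pairwise. The sketch below assumes this presence/absence definition (a Pearson correlation on 0/1 vectors); the post does not state whether presence or counts were used, and the episode vocabularies here are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a word present in all or no episodes carries no signal
    return cov / sqrt(vx * vy)


def correlated_pairs(episode_words, threshold=0.7):
    """Return word pairs whose presence across episodes correlates above threshold."""
    episodes = sorted(episode_words)
    vocab = sorted(set().union(*episode_words.values()))
    presence = {
        w: [1 if w in episode_words[e] else 0 for e in episodes] for w in vocab
    }
    result = {}
    for a, b in combinations(vocab, 2):
        r = pearson(presence[a], presence[b])
        if r > threshold:
            result[(a, b)] = r
    return result


# Invented episode vocabularies, keyed by episode number.
episode_words = {
    1: {"mother", "abortion", "love"},
    2: {"mother", "abortion"},
    3: {"love", "friend"},
    4: {"friend", "love"},
}
pairs = correlated_pairs(episode_words)
print(pairs)
```

The surviving pairs are the edges of the network graph, and groups of mutually connected words form the clusters.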

topic analysis

The analysis in this section uses topic modeling in order to explore the hypothesis that each episode in the series is focused on a particular theme.

Topic modeling is a method for unsupervised classification of documents, similar to clustering on numeric data, which finds natural groups of items even when we’re not sure what we’re looking for. Latent Dirichlet allocation (LDA) is a particularly popular method for fitting a topic model. It treats each episode as a mixture of topics, and each topic as a mixture of words. This allows episodes to “overlap” each other in terms of content, rather than being separated into discrete groups, in a way that mirrors typical use of natural language.
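To make the "mixture of topics, mixture of words" idea concrete, here is a small LDA sketch fitted with collapsed Gibbs sampling. Gibbs sampling is one common way to fit LDA and is swapped in here for brevity; it is not necessarily the estimation method used for the results in this post. The hyperparameters `alpha` and `beta`, the iteration count and the toy documents are all assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (gamma, topic_word): gamma[d][k] is the share of topic k in
    document d, topic_word[k][w] is how often word w is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    gamma = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return gamma, topic_word


# Invented toy "episodes", each a list of words.
toy_episodes = [
    ["love", "church", "god", "love"],
    ["porcelain", "weather", "die", "porcelain"],
    ["love", "god", "porcelain", "church"],
]
gamma, topic_word = lda_gibbs(toy_episodes, n_topics=2, n_iter=50)
dominant = [max(range(2), key=row.__getitem__) for row in gamma]
print(gamma, dominant)
```

Each row of `gamma` sums to 1, so an episode dominated by one topic shows a share close to 1 for it, while an episode that blends themes shows a more even split.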

Running the LDA algorithm with 9 topics, one per episode, does not support the idea that each episode is a reflection on one particular subject.

In fact, it is not possible to identify a single prevailing topic in every episode.

Episodes 3 and 6 share the same topic, topic 6, which is characterized by the following words: love, church, life, God, secretary, Rome, mother. Episodes 3 and 6 are also linked by storylines or narrative arcs:

  • in episode 3 Ester's experience as a prostitute for kids with problems begins, and in episode 6 this experience comes to a dramatic end;

  • in episode 3 Voiello, the secretary of state, scores his victory by getting Brannox elected Pope, while in episode 6 Voiello is forced to resign;

  • in episode 3 Pope John Paul III is elected, while episode 6 marks his defeat because of his drug addiction.

Episode 2 instead embeds 2 topics (8 and 9). Topic 8 is characterized by the following words: God, church, time, love, true, imagine and speak. Topic 9 is characterized by: church, life, beautiful, die, thinking, porcelain and weather. In fact, two kinds of discourse are in place in episode 2:

  • a reflection on what the church is or should be (topic 8)

  • and a more intimate reflection on the man himself, Sir John Brannox (topic 9).

considerations on quantitative text analysis

Sorrentino's work is too rich to be understood in terms of quantitative analysis, and the application of quantitative methods to what people might call art is questionable.

Nonetheless, thinking quantitatively can make some aspects of the text more evident and lead to insights.

“Understand” is one of the most common words in the overall script (it appears 25 times, almost 3 times per episode), because understanding is part of the burden of being human.

Should anyone be afraid to use maths and algorithms to make the burden lighter?

Feel free to email me if you would like to go deeper in the analysis, thanks for reading!


The analyses shown in this post were run using R as the main computation tool, together with its gorgeous ecosystem. In particular, the text analysis relied on the tidytext, tm, topicmodels, widyr and wordcloud packages.