Three, six or 36: how many basic plots are there in all stories ever written? | Books | The Guardian
“There is no reason why the simple shapes of stories can’t be fed into computers; they are beautiful shapes,” said Vonnegut.

A new academic study has done exactly this, and gives us yet another reason to wish the great man were still with us to share his thoughts on it (and perhaps resubmit that thesis). Researchers from the University of Vermont’s Computational Story Lab fed 1,737 stories from Project Gutenberg – all English-language texts, all fiction – through a program that analysed their language for its emotional content.

Putting – maybe – an end to a debate that has been ongoing for millennia, the researchers found there are “six core trajectories which form the building blocks of complex narratives”. These are: “rags to riches” (a story that follows a rise in happiness), “tragedy”, or “riches to rags” (one that follows a fall in happiness), “man in a hole” (fall–rise), “Icarus” (rise–fall), “Cinderella” (rise–fall–rise), and “Oedipus” (fall–rise–fall). The most successful – here defined as the most downloaded – types of story, they find, are Cinderella, Oedipus, two sequential man in a hole arcs, and Cinderella with a tragic ending.

Their analysis (and the “simple shapes of stories” as theorised by Vonnegut) is provided online, and it’s fascinating to pick through. I liked the rise–fall–rise shape of Gulliver’s Travels, where words such as “destroy”, “enemy” and “ignorance” drag down the happiness rating, and the plunging “Icarus” graph of Romeo and Juliet, plagued by words such as “tears”, “die”, “weep” and “poison”.
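
The general idea of scoring a text's words for happiness and reading off a rise/fall shape can be sketched roughly as below. This is a toy simplification, not the researchers' actual method: the word scores in `TOY_LEXICON`, the window sizes and the helper names `happiness_arc` and `arc_shape` are all made up for illustration.

```python
# Toy word scores on a 1-9 happiness scale (hypothetical values,
# not the lexicon used in the study).
TOY_LEXICON = {
    "love": 8.0, "happy": 8.0, "joy": 8.0,
    "destroy": 2.0, "enemy": 2.0, "ignorance": 3.0,
    "tears": 2.5, "die": 1.5, "weep": 2.0, "poison": 1.5,
}


def happiness_arc(text, lexicon=TOY_LEXICON, window=4, step=2):
    """Average the scores of known words inside a sliding window over the text."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    arc = []
    for start in range(0, max(len(words) - window, 0) + 1, step):
        scores = [lexicon[w] for w in words[start:start + window] if w in lexicon]
        if scores:  # skip windows that contain no scored words
            arc.append(sum(scores) / len(scores))
    return arc


def arc_shape(arc):
    """Describe an arc as 'rise'/'fall' segments, merging consecutive repeats."""
    shape = []
    for prev, cur in zip(arc, arc[1:]):
        if cur == prev:
            continue  # flat stretches do not change the shape
        direction = "rise" if cur > prev else "fall"
        if not shape or shape[-1] != direction:
            shape.append(direction)
    return "-".join(shape)


if __name__ == "__main__":
    story = "love joy happy love tears die weep poison"
    arc = happiness_arc(story, window=4, step=4)
    print(arc, arc_shape(arc))
```

A real analysis would need a much larger lexicon, far longer windows and some smoothing before the six named shapes could be told apart; the sketch only shows how word-level scores turn into a curve with a nameable shape.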
Sentiment lexicon for Portuguese
Receptiviti is a technology company with deep roots in academia that uses AI, NLP, machine learning and proprietary language-psychology science to reinvent the way organizations understand and engage their most important assets: people.
Machine learning bias
The sentiment classifier is bigoted. That's not a surprise; the problem comes when it's applied in the wrong way.
Idioms in sentiment analysis
Some nice small datasets for idioms.
