Sonification of Sentiment

For this project, I combined OpenAI’s Whisper, the VADER sentiment analyzer, and MIDIUtil to transform TV show episodes into short songs that capture the evolution of sentiment line by line.

Introduction

The inspiration for this project struck me in the middle of a stint living in an isolated (and gloriously Internet-deprived) national park in West Africa. On my next recce into town, I discovered two wonderfully comprehensive guides to representing (indeed experiencing) data by converting it into sound. The name of this process — sonification — tickled me. It sounded like a piece of jargon some fictional, sleek science genius would say on TV. With the help of the resources I found, though, I soon discovered that it was a mercifully straightforward and exciting process.

The Project

The most comprehensive and user-friendly of these resources were Shawn Graham’s Programming Historian guide, “The Sound of Data,” and Matt Russo’s series of sonification in Python tutorials. These two guides were immeasurably helpful as I ventured into the realm of sonification.

I’d cobbled together a simple transcription program using OpenAI’s Whisper some months before, and I had a handful of my favorite TV shows at my disposal. I quickly discovered that running episodes from the first season of Our Flag Means Death through the transcription script produced a text file compatible with an earlier NLTK project of mine, which in turn output a dataset compatible with Russo’s sonification script. I got to work. Ultimately, my workflow went something like this:

Whisper transcribes the TV episode.
NLTK tokenizes the sentences in the episode and assigns each sentence a score based on how positive, negative, or neutral its wording is.
Russo’s sonification script (modified to take the columns of my dataframe as input) maps the sentiment scores from the previous step to musical notes and outputs a MIDI file containing a song built from those notes.
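
To make the last step a little more concrete, here is a minimal sketch of how a list of sentiment scores might be turned into notes. It is not Russo’s script or the code in my repository. It assumes each score is a number between -1 and 1, and the note set (a C major pentatonic scale), the function names, and the one-note-per-beat timing are all hypothetical choices made for illustration. Writing the file relies on MIDIUtil, which is imported only inside write_midi.

```python
# Hypothetical note set: C major pentatonic over two octaves, as MIDI note numbers.
NOTE_NUMBERS = [48, 50, 52, 55, 57, 60, 62, 64, 67, 69, 72]


def map_value(value, in_min, in_max, out_min, out_max):
    """Linearly rescale value from [in_min, in_max] to [out_min, out_max]."""
    if in_max == in_min:
        raise ValueError("input range must not be empty")
    return out_min + (value - in_min) * (out_max - out_min) / (in_max - in_min)


def scores_to_pitches(scores, notes=NOTE_NUMBERS):
    """Map sentiment scores (assumed to lie in [-1, 1]) to MIDI pitches.

    Lower scores land on lower notes, higher scores on higher notes.
    """
    pitches = []
    for score in scores:
        clamped = max(-1.0, min(1.0, score))  # guard against out-of-range input
        index = round(map_value(clamped, -1.0, 1.0, 0, len(notes) - 1))
        pitches.append(notes[index])
    return pitches


def write_midi(pitches, path, tempo=120):
    """Write one note per beat to a MIDI file (requires: pip install MIDIUtil)."""
    from midiutil import MIDIFile  # third-party import kept local to this function

    midi = MIDIFile(1)          # a single track
    midi.addTempo(0, 0, tempo)  # track, time (in beats), tempo (BPM)
    for beat, pitch in enumerate(pitches):
        # track, channel, pitch, time, duration, volume
        midi.addNote(0, 0, pitch, beat, 1, 100)
    with open(path, "wb") as handle:
        midi.writeFile(handle)


if __name__ == "__main__":
    example_scores = [-1.0, 0.0, 0.6, 1.0]  # made-up scores, not real episode data
    print(scores_to_pitches(example_scores))
```

Snapping every score to a fixed scale keeps the output sounding musical even when the scores jump around. The cost is that nearby scores can collapse onto the same note.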

The Python scripts I used for this project are available in this GitHub repository.

Results: Season 1 Highlights

A compilation of my favorite parts of the output files from all episodes of Season 1, arranged in chronological order using Audacity.

Discussion

The sound files produced by this project capture a wide range of nuance in the detected sentiment of each sentence in the Our Flag Means Death episodes I put through the pipeline. However, each episode’s corresponding sound file reaches a point where it crescendos and the frequency of notes becomes overwhelming. I’ve tentatively identified this as an erroneous cumulative effect in the data processing pipeline, but I admit I haven’t investigated why this occurs.

There are several additional sources of error earlier in the sonification process that must be addressed. First, the Whisper transcription of each episode is not perfect. Some words are erroneously transcribed and the occasional sentence is broken up incorrectly, resulting in an output of fewer or more sentences than are actually spoken. Second, the NLP tools I used assign words a positive, negative, or neutral score according to a predetermined lexicon, meaning they don’t take context into account. This resulted in a humorous bit of output: Izzy Hands, a character not particularly well known for being openly affectionate, spoke a sentence that produced one of the highest positive scores in the project. He was mocking another character at the time, but his use of the word “love” several times during this mockery artificially increased the positive sentiment score his words received.
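
The effect is easy to reproduce with a toy scorer. The sketch below is not VADER and does not use its lexicon. The word weights, the function name, and both example sentences are invented for illustration (neither is a line from the show). It simply adds up the weight of every word it recognizes, which is enough to show why repeating a positive word drives the score up regardless of tone.

```python
# Hypothetical word weights; a real lexicon is far larger and more carefully tuned.
TOY_LEXICON = {"love": 3.0, "great": 2.0, "hate": -3.0, "awful": -2.5}


def toy_score(sentence, lexicon=TOY_LEXICON):
    """Sum the weights of known words, ignoring word order and context."""
    words = [word.strip(".,!?;:\"'").lower() for word in sentence.split()]
    return sum(lexicon.get(word, 0.0) for word in words)


sincere = "I love this ship."
mocking = "Oh, you love him, you just love him, you love him so much."

# The mocking sentence contains "love" three times, so it scores three times
# higher than the sincere one even though a human would hear the sarcasm.
print(toy_score(sincere), toy_score(mocking))
```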

Conclusion

I had a great deal of fun exploring the capabilities of Whisper and NLTK. This foray into sonification was an absolute blast. Please consider checking out Matt Russo’s and Shawn Graham’s fantastic work!