
Realtime Audio Visualization

Simulator used showing the cascading color changing LEDs.

Team Members

Andrew Hahn
Micaela Holmes
Chunliang Tao

Supervisors

Jack Mottley, PhD, Electrical and Computer Engineering and Biomedical Engineering, UR;

Daniel Phinney, MS, Electrical and Computer Engineering and Audio Music Engineering, UR

Goal

The goal of this project is to create a small system containing a Raspberry Pi, LED strip, and associated wiring that is able to convey the mood of music through color. The choice of music in movies, video games, and other forms of visual art is often a significant factor in fully grasping the creator's intention. When the music cannot be heard, the cues it provides are missed (think of a scary movie without the build-up). This project is designed to supplement the soundtrack, using the LED strip to showcase the mood through color.

Description

Using a rope of LEDs connected to a Raspberry Pi running our software, we read the audio signal from an HDMI cable and display colors on the LED strip that match the tone of the music. The goal of our project is to create a visualization that conveys a similar feeling to people who are unable to hear the music. A successful implementation would generate color choices that look like they match the music being played.

Process

We started by creating a simulator of the LED strip in Python using graphics.py, which let us work on the software that controls the color and brightness of the LEDs without first building the physical system to test on. The simulator needed to behave as similarly to the actual LED strip as possible, which meant it had to update the LEDs in series rather than in parallel.
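A minimal sketch of that kind of simulator, assuming Zelle's graphics.py and a hypothetical strip length and delay (the real simulator's structure and names may differ), might look like this:

from graphics import GraphWin, Rectangle, Point, color_rgb
import time

NUM_LEDS = 30        # hypothetical strip length
UPDATE_DELAY = 0.05  # exaggerated per-LED delay so the serial update is visible
LED_SIZE = 20        # pixels per simulated LED

def make_strip(win, num_leds):
    """Draw one rectangle per LED and return them in strip order."""
    cells = []
    for i in range(num_leds):
        cell = Rectangle(Point(i * LED_SIZE, 0), Point((i + 1) * LED_SIZE, LED_SIZE))
        cell.setFill(color_rgb(0, 0, 0))  # start with every LED off
        cell.draw(win)
        cells.append(cell)
    return cells

def update_strip(cells, colors):
    """Recolor the LEDs one at a time, mimicking the serial update of a real strip."""
    for cell, (r, g, b) in zip(cells, colors):
        cell.setFill(color_rgb(r, g, b))
        time.sleep(UPDATE_DELAY)  # the real system uses a far smaller delay

win = GraphWin("LED strip simulator", NUM_LEDS * LED_SIZE, LED_SIZE)
cells = make_strip(win, NUM_LEDS)
update_strip(cells, [(255, 0, 0)] * NUM_LEDS)  # cascade the whole strip to red
win.getMouse()  # wait for a click before closing the window

Recoloring each rectangle inside the loop, with a small sleep between LEDs, is what produces the cascading effect described in the caption below.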

Simulator used showing the cascading color changing LEDs. This is with a high delay in order to show that the colors are not updated in parallel. The actual system uses a far lower delay, so they will appear to change at the same time.

With that completed, we were able to start working on reading input. In our actual hardware system we would have taken the audio input directly from HDMI, but due to the circumstances surrounding the COVID-19 outbreak we were unable to put together our hardware, so our project remained a software-only implementation.

Because we would like our project to remain easily adaptable to a hardware implementation, we take our input from the computer's default audio device. With this approach, switching to a different audio device only requires changing one line in our code.

We used PyAudio for capturing input. Although it is typically used to record input and write it out to a file, we configured it to track a rolling window of time, with the duration configurable by the user through a global variable (MAX_FRAMES) in our code. At any given time, our program only keeps track of the past MAX_FRAMES frames, which prevents our memory footprint and computational complexity from growing without bound.
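As a rough sketch of this kind of capture loop (assuming a 1024-sample chunk size and a 44.1 kHz sample rate, neither of which is stated above, with purely illustrative variable names), the input_device_index argument is the one line that would change to point at a different audio device:

import pyaudio

MAX_FRAMES = 256     # rolling window length, as described above
CHUNK = 1024         # samples per frame; assumed value
RATE = 44100         # sample rate in Hz; assumed value
DEVICE_INDEX = None  # None selects the default input device; change this one
                     # line to capture from a different device (e.g. HDMI audio)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16,
                 channels=1,
                 rate=RATE,
                 input=True,
                 input_device_index=DEVICE_INDEX,
                 frames_per_buffer=CHUNK)

frames = []
while True:
    frames.append(stream.read(CHUNK))
    if len(frames) > MAX_FRAMES:
        frames.pop(0)  # drop the oldest frame to keep the window bounded

The deque described further below replaces the plain list used here, so the trimming step becomes constant time.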

The result of this memory-conscious approach is a program whose performance will not degrade over time if the system is left on. When tracking 256 frames, roughly the past 6 seconds of audio, the program used a constant 57.7 MB of RAM, which fits very comfortably within the constraints of the Raspberry Pi 4 that we were designing for.
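For reference, under the chunk size and sample rate assumed in the sketch above, 256 frames works out to just under six seconds of audio:

MAX_FRAMES = 256
CHUNK = 1024   # assumed samples per frame
RATE = 44100   # assumed sample rate in Hz

window_seconds = MAX_FRAMES * CHUNK / RATE   # 262144 / 44100 ≈ 5.9 seconds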

In addition to keeping the memory of the Raspberry Pi in mind, we also had to consider its computational power. To keep our program running smoothly, we set a time budget for performing all of our calculations for each update of the LEDs. Our goal was to update the LED display smoothly in real time, so keeping computation time to a minimum allows us to maximize our refresh rate.

To facilitate this, we record our frames into a deque, a data type from Python's collections module that implements a double-ended queue. It allows O(1) access to the head and the tail, which is ideal given our decision to track a rolling window: we can add new elements to the frame buffer and access them, as well as remove elements that we no longer need to track, in constant time.
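One way to get this behavior is to use the deque's maxlen argument so the oldest frame is discarded automatically (the real buffer management may differ in its details):

from collections import deque

MAX_FRAMES = 256

# With maxlen set, appending to a full deque silently drops the oldest
# element, so both the append and the trim are O(1).
frames = deque(maxlen=MAX_FRAMES)

def on_new_frame(raw_bytes):
    """Hypothetical helper: push the newest frame and peek at the oldest."""
    frames.append(raw_bytes)   # O(1) push at the tail
    return frames[0]           # O(1) access at the head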

We use NumPy to perform most of our calculations to turn the direct data stream from PyAudio into something that we can work with. In addition to tracking the frames, we also track a few pieces of metadata relating to each frame in order to minimize the frequency of operations we need to do on the whole data set.
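As an illustration (the specific metadata we track is not listed here), converting one raw PyAudio frame into a NumPy array and caching a couple of per-frame values might look like the following; the RMS and peak values are only examples of the kind of metadata that avoids reprocessing the whole window:

import numpy as np

def frame_to_samples(raw_bytes):
    """Interpret one PyAudio frame of 16-bit PCM as a NumPy array."""
    return np.frombuffer(raw_bytes, dtype=np.int16)

def frame_metadata(samples):
    """Illustrative per-frame values stored alongside the frame so the whole
    rolling window does not have to be recomputed on every update."""
    as_float = samples.astype(np.float64)
    return {
        "rms": float(np.sqrt(np.mean(as_float ** 2))),  # rough loudness
        "peak": float(np.max(np.abs(as_float))),        # largest sample magnitude
    }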

We believe that what we have written will work well on a Raspberry Pi, but since we were unable to test on the hardware, we have made the refresh rate for the colors configurable using a global variable (REFRESH_TIME) whose value is given as a number of frames.
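A small sketch of how a frame-counted refresh could be wired up (update_leds here is only a placeholder for the real color-update routine, which is not shown):

from collections import deque

MAX_FRAMES = 256
REFRESH_TIME = 1   # recompute the LED colors every REFRESH_TIME captured frames

frames = deque(maxlen=MAX_FRAMES)
frame_count = 0

def update_leds():
    """Placeholder for the routine that recomputes and pushes the colors."""
    pass

def handle_frame(raw_bytes):
    """Hypothetical per-frame hook: buffer the frame, refresh on schedule."""
    global frame_count
    frames.append(raw_bytes)
    frame_count += 1
    if frame_count % REFRESH_TIME == 0:
        update_leds()

Raising REFRESH_TIME trades responsiveness for a lighter computational load, which is the knob that matters on the Raspberry Pi.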

The color selections were based upon multiple research studies correlating music and mood. The idea that sounds can be conveyed as colors dates back to the 18th century, with the majority of proposed color scales agreeing on C as red. The challenge came in expressing not only individual notes but the mood of the entire piece, meaning our program had to recognize individual notes as well as the overall mood of the music.
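Purely to illustrate the note-to-color half of that problem, a pitch-class lookup might look like the sketch below. Only C = red comes from the historical color scales mentioned above; every other hue assignment here is a placeholder, not the mapping we actually use.

import colorsys

# Hypothetical pitch-class-to-hue table (degrees around an HSV color wheel).
NOTE_HUES = {
    "C": 0, "C#": 30, "D": 60, "D#": 90, "E": 120, "F": 150,
    "F#": 180, "G": 210, "G#": 240, "A": 270, "A#": 300, "B": 330,
}

def note_to_rgb(note, brightness=1.0):
    """Map a pitch class to an (r, g, b) triple with 0-255 channels."""
    hue = NOTE_HUES[note] / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, brightness)
    return int(r * 255), int(g * 255), int(b * 255)

print(note_to_rgb("C"))   # (255, 0, 0): C maps to red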

Results

Overall, we believe that our completed system works very well given the setbacks that we dealt with during development. Particularly with classical music, it is able to pick out individual notes extremely well. With more modern music that has radically different parts for different instruments, it does not perform quite as well, but the results are still good.

The following are examples of its performance. They were captured with a Razer Kraken 7.1 V2 headset as the audio input device, using MAX_FRAMES = 256, REFRESH_TIME = 1, and FADE_TIME = 50 in the code. Due to the less-than-ideal recording setup, the audio track does not sound very good and peaks at times; our apologies for that.

These first three examples illustrate the performance on classical music with a wide range.

Eine Kleine Nachtmusik, composed by Mozart. This example was captured with FADE_TIME = 100, instead of the value of 50 that was used for the rest of the examples.
Canon in D, composed by Pachelbel. The exact version used is a piano/cello duet, rather than the full orchestral rendition.
In the Hall of the Mountain King, composed by Grieg.

Clair de Lune displays how well the system is able to manage for slower, quieter music.

Clair de Lune, by Debussy.

The next two examples are taken directly from cinema and give a good example of how the system will work on more modern orchestral pieces.

He’s a Pirate, from Pirates of the Caribbean: The Curse of the Black Pearl, composed by Hans Zimmer, Klaus Badelt, and Ian Honeyman.
Duel of the Fates, from Star Wars: Episode I – The Phantom Menace, composed by John Williams.

The following four videos represent more contemporary musical examples.

Handlebars, by the Flobots.
Enter Sandman, by Metallica.
Oh Glory, by Panic! at the Disco.
The F.U.N. Song, from SpongeBob SquarePants.

References


Nuzzolo, Michael. “Music Mood Classification.” Electrical and Computer Engineering Design Handbook, 2015, sites.tufts.edu/eeseniordesignhandbook/2015/music-mood-classification/.


Sumare, Sonal P., and D.G. Bhalke. “Automatic Mood Detection of Music Audio Signals: An Overview.” IOSR Journals, 2015, www.iosrjournals.org/iosr-jece/papers/NCIEST/Volume%201/17.%2083-87.pdf.

“Affective Musical Key Characteristics.” Translated by Rita Steblin, Musical Key Characteristics, 1993, www.wmich.edu/mus-theo/courses/keys.html.
