The Revolutions Blog has a fascinating post regarding the mathematical genius of Shazam, an iPhone app that listens to music and identifies it after about 10 seconds of listening. The app actually works too!
I was in Baltimore for a meeting with the people at Johns Hopkins when I asked Glen Coppersmith how this could be done. We were at an art festival where Cake happened to be playing, so I pulled out my iPhone and showed him how the app worked. I turned it on, not knowing if it'd actually identify Cake's live performance, but sure enough it did. Glen and I were both impressed and immediately consumed with questions about how they pull this off so well. We started by identifying the problems in designing such a tool.
It would be too painful to simply send along the ten-second window, so some compression would have to occur. Not musical compression (à la WAV to MP3) but some more abstract kind of reduction in information. I first considered counting beats per minute, but beats are not always so clearly defined, especially in small windows of time. Glen then suggested that if we could capture the melody somehow and send that, we could iterate over a database. This seemed quite reasonable.
But what can we capture? The melody itself won't be enough. The melody over a section of time might do it, but that seems like a lot of information to store on the backend. And the query has to be quick or users will remove the app thinking it doesn't work. We felt we needed to reduce the melody down to just its key points but had no idea how this could be done.
Turns out we were on the right track. The team at Shazam had done almost exactly that. They built a spectrogram of the loudest notes over time. It's precisely a reduction of the melody into its core identifying characteristics! Fascinating!
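To make the idea concrete for myself, here is a toy sketch in Python of "the loudest note over time". To be clear, this is not Shazam's actual algorithm, just my own illustration: it chops a signal into fixed windows, runs a naive Fourier transform on each, and keeps only the strongest frequency per window. The function names (`loudest_bins`, `tone`), the 8000 Hz sample rate, and the 200-sample window are all assumptions I made up for the example, and the "song" is just two synthetic tones.

```python
import cmath
import math


def loudest_bins(samples, window_size):
    """Return the index of the strongest frequency bin in each window."""
    peaks = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        best_bin, best_mag = 0, 0.0
        # Naive DFT over the positive-frequency bins (bin 0, the DC offset, is skipped).
        for k in range(1, window_size // 2):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / window_size)
                for n, x in enumerate(window)
            )
            if abs(acc) > best_mag:
                best_bin, best_mag = k, abs(acc)
        peaks.append(best_bin)
    return peaks


def tone(freq_hz, num_samples, sample_rate):
    """Generate a pure sine tone (a stand-in for real audio)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


SAMPLE_RATE = 8000   # assumed sample rate in Hz
WINDOW = 200         # assumed window size: 8000 / 200 = 40 Hz per bin

# A fake "melody": 440 Hz followed by 880 Hz, 800 samples each.
signal = tone(440, 800, SAMPLE_RATE) + tone(880, 800, SAMPLE_RATE)

fingerprint = loudest_bins(signal, WINDOW)
freqs = [b * SAMPLE_RATE / WINDOW for b in fingerprint]
print(freqs)  # one dominant frequency per window
```

The 1600 samples collapse to eight numbers, which is the kind of reduction Glen and I were groping toward. A real system would need to cope with noise, use a proper FFT, and match against a database, none of which this sketch attempts.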
Here is the article from Revolutions: Shazam not magic after all