In order to determine how ‘good’ a song is, we must first define what a ‘good’ song is. I will also define what is average, bad, and excellent. Let's start with some broad human definitions that I believe are representative of music:
Bad music: Bad music’s main feature is that it disrupts the listening experience. It may or may not be forgettable.
Average music: Average music is generic, but it does not disrupt our listening. It is often forgettable.
Good music: Good music is approximately generic. To a limited extent, it may be memorable.
Excellent music: Excellent music goes beyond good, in that it is approachable to many audiences while keeping complexity.
So, how do we approach this algorithmically? Complexity in music is practically impossible to define directly. Fortunately, instead of developing an ultra-complex algorithm of some sort, we can build a music prediction AI. At each position in the song, we rank the AI’s predicted probabilities for the next note and see where the note the human artist actually played falls in that ranking.
Then, doing this for each note in the song, we now have a decent approximation of the complexity of the song.
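To make the per-note idea concrete, here is a minimal sketch of one way it could be scored. It assumes the model hands back a probability for each candidate note, and it uses the average rank of the actual note as the complexity number. The function names, the dictionary format, and the toy probabilities are all made up for illustration; they are not taken from any real model.

```python
def actual_note_rank(predicted, actual):
    """Rank (1 = most likely) of the actual note in the model's prediction.

    `predicted` is a hypothetical dict mapping note name -> probability.
    """
    ordered = sorted(predicted, key=predicted.get, reverse=True)
    return ordered.index(actual) + 1


def mean_rank(steps):
    """Average rank over (predicted_distribution, actual_note) pairs."""
    ranks = [actual_note_rank(pred, actual) for pred, actual in steps]
    return sum(ranks) / len(ranks)


# Toy data: what an imaginary model predicted, and what the artist played.
steps = [
    ({"C": 0.6, "E": 0.3, "G": 0.1}, "C"),  # model's top guess was right
    ({"C": 0.2, "E": 0.5, "G": 0.3}, "G"),  # actual note was the second guess
    ({"C": 0.7, "E": 0.2, "G": 0.1}, "G"),  # actual note was the last guess
]

complexity = mean_rank(steps)
print(complexity)
```

A higher average rank means the artist kept choosing notes the model considered unlikely, which is the rough sense of "complex" used here.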
This works great if you want to know which song is the most random, but obviously, a cat walking on a keyboard probably doesn’t constitute the greatest song of all time.
My idea to filter out these cases is to have multiple AI models run predictions on the music. The two (or more) models would differ in size, and with this, we can estimate how their predictions diverge as the number of model parameters goes to infinity. If the smaller model predicts with approximately the same accuracy as the larger model, then we know the song is either very generic or just random notes.
In other words, since neither AI has a significant advantage, it is either so generic that anyone can predict it, or it's so random that no one can predict it, both fitting the definition of bad or average music.
On the other hand, if the larger model predicts with significantly more accuracy than the smaller model, only the larger model can deduce what is going on in the song. A larger gap is better, since it indicates that the music is approachable to many audiences while keeping complexity. Therefore, this would be good or excellent music.
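The comparison between the two models can be sketched in a few lines. This is only an illustration under my own assumptions: each model is reduced to a list of its top guesses for every note, accuracy is the fraction of correct guesses, and the note lists are invented toy data rather than output from real models.

```python
def accuracy(predictions, actual_notes):
    """Fraction of notes where the model's top guess matched the actual note."""
    hits = sum(p == a for p, a in zip(predictions, actual_notes))
    return hits / len(actual_notes)


def quality_gap(small_preds, large_preds, actual_notes):
    """Large-model accuracy minus small-model accuracy."""
    return accuracy(large_preds, actual_notes) - accuracy(small_preds, actual_notes)


# Toy song and hypothetical top guesses from a small and a large model.
actual = ["C", "E", "G", "E", "C", "G", "A", "F"]
small = ["C", "C", "G", "C", "C", "C", "C", "C"]   # gets 3 of 8 right
large = ["C", "E", "G", "E", "C", "G", "A", "C"]   # gets 7 of 8 right

gap = quality_gap(small, large, actual)
print(gap)
```

A gap near zero corresponds to the generic or random case, where neither model has an advantage, while a large positive gap is the signal for good or excellent music. Where to draw the line between those is not something this sketch decides.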
In my next article, I will share my results from using this algorithm in practice! I already have the models ready for this purpose (they are open source), so I just need to write the script for this algorithm and run it. Stay tuned!