Music is the oldest and most primal form of art. Before the Venus figures, before the cave paintings, we were carving flutes of wood and bone. Musical instruments have existed for 30% of our time as a species, and singing has certainly existed for even longer, although how much longer is unknowable. Music has, in other words, existed for long enough to exert its influence on our evolution. We are, after all, a pack animal. To be human is to communicate, and for much of our time on this Earth, to communicate has meant to produce music. After all, is there any other method of communication so able to clearly convey raw emotion? Communicating emotion, pure emotion, without merely stating the cause of it or the symptoms of it, is a fundamentally necessary ability to have. This is why every culture has a musical tradition. A society can exist without visual art, without the wheel, without even written language, but it cannot exist without music. Music is simply too essential. It is ingrained into our DNA in a way no other form of art is. It is a creation of pure emotion, expressed in a way others can experience.
And yet, music has, somehow, lost its magic. Songwriters cling to arbitrary rules of the trade, the purpose of which is lost to them. Popular music sounds increasingly the same, to the point that entire songs can be confused for each other. Sitting in someone's basement lies a flash drive filled with every melody that has been written or ever will be. Music has become ordinary, filled with no more emotion than can be sold. It has been cut up into pieces that can be dispassionately reassembled into a product, and listeners have become numb.
How is this possible? How has a 60,000-year tradition of empathetic magic become so lifeless? Why does every song sound so similar, when no two feelings are the same? And even when writing music, why is it so challenging to avoid falling into the same tropes we try to escape? The answer is not the music, nor the music theory. The answer is what lies underneath, a single assumption that made this mess, that has gone unquestioned for centuries:
What are the notes?
If you're like most people with a passing knowledge of music, the answer seems incredibly easy. There are twelve notes, spaced equally around the octave. But why? Why are there twelve notes, and not eleven, or thirteen, or fifty? Buckle in, we have a long ride ahead of us.
Sound is made of pressure waves in the atmosphere. Notes, or tones, are sound waves that repeat regularly. How often these waves repeat is the frequency of the note, but the frequency of a note is only one piece of the story. If frequency were the only part of a wave that mattered, a piano, violin, and flute would all sound the same, as would every vowel in language. Just as important as pitch is the waveform, the shape of the wave, which controls the timbre of the sound. This shape, however, is not a black box. Every naturally occurring tone is made up of a series of harmonics, occurring at whole-number multiples of the fundamental frequency, which add up to the original tone. For instance, the note A440 has its fundamental frequency at 440Hz, and then an overtone at 880Hz, another at 1320Hz, another at 1760Hz, and so on. It is the relative strength of these overtones that gives each sound its own characteristic color.
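As a rough illustration of the paragraph above, the sketch below lists the first few harmonics of A440 and sums sine waves at those frequencies. The function names and the two amplitude lists are invented for this example; they are not measurements of any real instrument.

```python
import math

def harmonic_frequencies(fundamental_hz, count):
    """Return the first `count` harmonics: f, 2f, 3f, ..."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def sample_tone(fundamental_hz, amplitudes, t):
    """Sum sine waves at the harmonics, weighted by `amplitudes`, at time t (seconds)."""
    return sum(
        amp * math.sin(2 * math.pi * fundamental_hz * n * t)
        for n, amp in enumerate(amplitudes, start=1)
    )

print(harmonic_frequencies(440, 4))  # [440, 880, 1320, 1760]

# Two hypothetical timbres: same pitch, different overtone strengths.
mellow = [1.0, 0.2, 0.05, 0.0]
bright = [1.0, 0.8, 0.6, 0.4]
t = 1 / 3520  # one eighth of a 440 Hz cycle
print(sample_tone(440, mellow, t), sample_tone(440, bright, t))
```

Both tones repeat 440 times a second, so they share a pitch, but the waveform takes different values at the same instant, which is what the ear hears as a difference in timbre.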
Our modern tuning is, to an extent, based on this harmonic series. The first overtone, at a ratio of 2:1 above the fundamental, is the same note an octave higher. A tone played one octave above the original shares so many overtones with it that the two can be hard to distinguish when played together. It is for this reason that most cultures consider distinct notes only within the octave, with the same notes repeating every octave above or below. Accordingly, all intervals more complex than the octave are generally adjusted by factors of 2:1 until they lie within the octave. This does not change what the interval is. For instance, the ratios 7:1 and 7:4 name the same interval, but the second notation is preferred.
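The folding of an interval into a single octave can be sketched in a few lines. The helper name `octave_reduce` is an assumption made for this example, and the range from 1 up to (but not including) 2 is one common convention for "within the octave".

```python
from fractions import Fraction

def octave_reduce(ratio):
    """Multiply or divide by 2 until the ratio lies in [1, 2)."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

print(octave_reduce(7))               # 7/4
print(octave_reduce(Fraction(3, 1)))  # 3/2
print(octave_reduce(Fraction(1, 3)))  # 4/3
```

Exact fractions are used instead of floating-point numbers so that the ratios stay readable and no rounding creeps in.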
The second overtone, at 3:1, reduces to 3:2 within the octave and is the basis of the second most stable interval, the perfect fifth. As with the octave, most cultures incorporate the fifth into their music. Unlike the octave, however, the fifth we use in our music is not a perfect perfect fifth. In our system of twelve-tone equal temperament (12TET), twelve fifths add up to exactly seven octaves. In a system of ideal intervals, known as Just Intonation, no number of fifths will ever add up to a whole number of octaves, because no nontrivial power of three equals a power of two. While this may sound inconvenient, functional systems of tuning have been based on stacking perfect fifths. Most notable of these is Pythagorean Tuning, a system of twelve notes made by selecting a root frequency and stacking fifths, five below the root and six above. This results in a system that works in the root key of the tuning and in keys close by in the circle of fifths, but further away it risks including a wolf interval, the badly mistuned fifth where the endpoints of the "circle" are joined. Compared with 12TET, this system has a more accurate fifth, at the cost of not being a universal tuning.
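The construction of Pythagorean tuning can be worked through directly. This is a minimal sketch, assuming a chain of pure 3:2 fifths running from five below the root to six above; the helper names are made up for the example, and the "wolf" here means the leftover fifth that joins the two ends of the chain.

```python
from fractions import Fraction
import math

def octave_reduce(ratio):
    """Multiply or divide by 2 until the ratio lies in [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))

FIFTH = Fraction(3, 2)

# Stack fifths from five below the root to six above, folding each into one octave.
chain = [octave_reduce(FIFTH ** k) for k in range(-5, 7)]
scale = sorted(chain)
print([str(r) for r in scale])

# The interval that closes the "circle": from the top of the chain up to the bottom.
wolf = octave_reduce(chain[0] / chain[-1])
print(wolf)                                 # 262144/177147
print(round(cents(wolf), 1))                # about 678.5 cents
print(round(cents(FIFTH), 1))               # about 702.0 cents
```

Eleven of the twelve fifths are pure; the twelfth comes out roughly a quarter of a semitone narrow, which is why music that crosses that seam sounds sour.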
After the fifth, the next new note in the harmonic series is 5:4, the major third. You may be confused at this point. After all, if the fifth is 3:2, wouldn't stacking four fifths and dropping two octaves make the major third 81:64? The answer is that there is more than one interval that sounds like a major third. Because the two intervals are so close, differing only by a factor of 81:80, they are treated by 12TET as the same interval. Despite their similarity, however, they do each have their own character, and differentiating between them offers interesting opportunities.
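The arithmetic behind the two major thirds is short enough to check by hand or in code. The variable names below are chosen for this example only.

```python
from fractions import Fraction

FIFTH = Fraction(3, 2)

# Four fifths up, then two octaves down, lands on the Pythagorean major third.
pythagorean_third = FIFTH ** 4 / 4
just_third = Fraction(5, 4)

print(pythagorean_third)               # 81/64
print(pythagorean_third / just_third)  # 81/80
```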
The major third brings in 5, the last prime factor supported by 12-tone equal temperament, but it is far from the last member of the harmonic series. The harmonic seventh, with a ratio of 7:4, is a very flat minor seventh, which can be used to great effect in seventh chords, forming the powerful ratio 4:5:6:7. Despite not being offered a spot on the keyboard, the harmonic seventh still appears in Western music through the adaptive tuning of a cappella voices, strings, or winds. The harmonic eleventh, 11:8, lies almost exactly between the perfect fourth and the tritone. It is the first element of the harmonic series that is not a part of Western tonality, although it is approximated in other cultures, for instance by the quarter tones of many Middle Eastern traditions. The harmonic series continues forever beyond this point. However, each additional overtone is more unfamiliar, and each addition to a tonal system greatly increases the complexity of that system. For this reason, musicians tend to stick to low factors to make their music easier to understand. It is important to remember that while we can break the rules to great effect, the purpose of sharing our art is not to outsmart anyone, but to be understood.
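To make the 4:5:6:7 chord concrete, the sketch below compares it with the nearest 12TET chord (root, major third, fifth, minor seventh). The root of 220 Hz and the function names are arbitrary choices for this example.

```python
def just_chord(root_hz, proportions=(4, 5, 6, 7)):
    """Frequencies whose ratios to one another are exactly 4:5:6:7."""
    base = proportions[0]
    return [root_hz * p / base for p in proportions]

def tet_chord(root_hz, semitones=(0, 4, 7, 10)):
    """The same chord spelled in 12TET: each semitone is a factor of 2**(1/12)."""
    return [root_hz * 2 ** (s / 12) for s in semitones]

print(just_chord(220))  # [220.0, 275.0, 330.0, 385.0]
for j, e in zip(just_chord(220), tet_chord(220)):
    print(f"{j:7.2f} Hz just, {e:7.2f} Hz in 12TET")
```

The fifth barely moves, the third shifts by a couple of hertz, and the seventh is where the keyboard version drifts furthest from the harmonic series.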
The gold standard of tonality is just intonation. However, writing even a simple piece in just intonation may require an extremely large number of notes. Every chord in a justly tuned song may need to borrow its own set of notes. Even when the number of notes is kept manageable, the system is not reusable. A keyboard tuned to just intonation only works in the key it was tuned to, and even then it may not be enough, since one song may require the Pythagorean third, and another may require the diatonic 5:4 third. For these reasons, a keyboard cannot be tuned to just intonation in any general-purpose way.
The largest problem in just intonation is the comma. A comma is a small interval, barely large enough to be perceptible, that is the difference between two ways of reaching the "same" note. The 81:80 separating the two different major thirds is an example of a comma. Another comma is the ratio 3^12:2^19, which results from connecting the ends in Pythagorean tuning. These commas result in an explosion in the number of potentially usable notes in a tuning system.
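To get a feel for how small these two commas are, they can be converted to cents, where an octave is 1200 cents and a 12TET semitone is 100. The `cents` helper and variable names are assumptions made for this sketch.

```python
from fractions import Fraction
import math

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))

syntonic = Fraction(81, 80)               # gap between the two major thirds
pythagorean = Fraction(3 ** 12, 2 ** 19)  # gap left by twelve pure fifths

print(pythagorean)                                    # 531441/524288
print(Fraction(3, 2) ** 12 / 2 ** 7 == pythagorean)   # True
print(round(cents(syntonic), 1))                      # 21.5
print(round(cents(pythagorean), 1))                   # 23.5
```

Both come to a little under a quarter of a semitone, small enough to pass for the "same" note in passing and large enough to sound wrong when held.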
Early on in the development of keyboards, a solution was developed. When an unwanted comma appeared, the smaller interval could be made slightly larger, and the larger interval could be made slightly smaller, and so the comma could be tempered out, and the number of notes could be made reasonable again. Several temperaments were suggested. In most of them, the twelve notes in widespread use at the time were adjusted such that the fifth was made slightly flatter, to varying degrees, and the third slightly sharper, so that both major thirds would be the same. In many of these, the circle of fifths did not close, just as it did not in Pythagorean tuning, and the musician would simply be expected not to wrap around. The difference between these temperaments and Pythagorean tuning is that the temperaments were 5-limit, rather than 3-limit, tunings. That is, they fit the major third into their system. The temperament that became standard, 12TET, was selected because it solved the same problems as other temperaments while also being an equal temperament. That is, it worked equally well with any note used as the root, because the notes were spaced equally around the octave.
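In 12TET the tempering can be checked by counting equal steps. This is a sketch under the usual assumption that the tempered fifth is 7 steps and the tempered major third is 4 steps of 100 cents each; the constant names are chosen for this example.

```python
import math

STEPS_PER_OCTAVE = 12
STEP_CENTS = 1200 / STEPS_PER_OCTAVE  # 100 cents per semitone

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

FIFTH_STEPS = 7  # tempered fifth
THIRD_STEPS = 4  # tempered major third

# Four fifths up and two octaves down lands exactly on the major third,
# so the 81:80 comma has been tempered out.
print(4 * FIFTH_STEPS - 2 * STEPS_PER_OCTAVE == THIRD_STEPS)  # True

# Twelve fifths equal seven octaves, so the circle of fifths closes.
print(12 * FIFTH_STEPS == 7 * STEPS_PER_OCTAVE)  # True

# The price: how far each tempered interval sits from its just version, in cents.
print(round(FIFTH_STEPS * STEP_CENTS - cents(3 / 2), 2))  # about -1.96
print(round(THIRD_STEPS * STEP_CENTS - cents(5 / 4), 2))  # about 13.69
```

The fifth gives up about two cents, while the major third ends up nearly fourteen cents sharp of 5:4.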
So what's the problem? Why isn't 12 good enough?
12TET is a 5-limit system. It does not accommodate the harmonic seventh, the harmonic eleventh, or any larger harmonic. Its steps are quite large, roughly 50 times the smallest perceptible difference, so subtle differences may be tempered out when we don't want them to be. While there may be almost unlimited songs we can still write, it matters very little whether we can write new music in 12TET, when what we care about most is whether we can write our ideas faithfully. Throughout history, music, as with all art, has trended towards becoming more complex. 3-limit harmony became standard in ancient times. 5-limit harmony, by way of temperament, became standard around the 16th century. Is now the time for 7-limit harmony? I and many others would say yes.
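One way to see where 12TET stops being faithful is to measure how far its nearest step lies from each harmonic interval. The function below is a hypothetical helper written for this comparison, and its `divisions` parameter is an assumption added so the same measurement could be repeated for other equal divisions of the octave.

```python
import math

def nearest_step_error(ratio, divisions=12):
    """Return (nearest step index, error in cents) for `ratio` in an equal division of the octave."""
    step = 1200 / divisions
    target = 1200 * math.log2(ratio)
    index = round(target / step)
    return index, index * step - target

for name, ratio in [("3:2", 3 / 2), ("5:4", 5 / 4), ("7:4", 7 / 4), ("11:8", 11 / 8)]:
    index, error = nearest_step_error(ratio)
    print(f"{name:>5}: step {index:2d}, error {error:+6.1f} cents")
```

The errors grow quickly: about 2 cents for the fifth, about 14 for the major third, about 31 for the harmonic seventh, and nearly 49 for the harmonic eleventh, which is almost as far from a 12TET note as it is possible to be.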