An Attempt at Music Recognition

Matlab Example: An attempt at music recognition, Rolf Brigola

Supplement to my lectures on Fourier analysis
A first step into audio spectral analysis

Here we consider a first example of generating simple spectral fingerprints for a series of audio files,
which serve as the database for a first audio recognition test. The purpose of this example is
to find out whether we can identify a music track by a very simple “spectral fingerprint” of a short
section of about 30 seconds duration, taken from an arbitrary time interval of the track.

In the file fingerprint.m, spectral fingerprints are computed for a series of tracks and saved as a database.
For an unknown track section, the corresponding fingerprint has to be computed and compared with the database.

As on the previous page, we test the following type of spectral fingerprint:

The bandwidth is partitioned into a series of subbands. For each subband we compute the so-called
“spectral flatness”, i.e. the quotient of the geometric mean and the arithmetic mean of all amplitudes in the subband.
The vector of these values, each of which lies in the interval [0,1], is taken as the spectral fingerprint.
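The fingerprint described above can be sketched in a few lines. The following is a minimal Python illustration, not the code from fingerprint.m: the function names, the equal-width split into subbands, and the small floor `eps` (which avoids taking the logarithm of zero) are all assumptions made for this sketch. The input is assumed to be a list of non-negative amplitudes, e.g. the absolute values of FFT coefficients.

```python
import math


def spectral_flatness(amplitudes, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the given amplitudes."""
    if not amplitudes:
        raise ValueError("subband must contain at least one amplitude")
    # Assumption of this sketch: clamp amplitudes to eps so that log() is defined.
    vals = [max(a, eps) for a in amplitudes]
    # Geometric mean computed via the mean of logarithms to avoid overflow/underflow.
    geometric = math.exp(sum(math.log(v) for v in vals) / len(vals))
    arithmetic = sum(vals) / len(vals)
    return geometric / arithmetic


def fingerprint(amplitudes, n_subbands):
    """Flatness per subband; subbands are contiguous and of nearly equal width."""
    n = len(amplitudes)
    if not 1 <= n_subbands <= n:
        raise ValueError("need 1 <= n_subbands <= number of amplitudes")
    # Integer subband boundaries; every subband holds at least one amplitude.
    edges = [i * n // n_subbands for i in range(n_subbands + 1)]
    return [
        spectral_flatness(amplitudes[edges[k]:edges[k + 1]])
        for k in range(n_subbands)
    ]


# A flat subband followed by a subband containing a single peak.
example = fingerprint([1, 1, 1, 1, 4, 1, 1, 1], n_subbands=2)
```

A perfectly flat subband gives the value 1, while a subband dominated by a few strong peaks gives a value close to 0. This follows from the inequality between the geometric and arithmetic means, which is also why all values lie in [0,1].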

To recognize a piece, we compute the fingerprint of the test section in the same way and compare it with the database:
the orthogonal projections of the test fingerprint onto the directions of the database fingerprints are computed.
For each channel, the track with the largest component is chosen as the “identification”. This is done in fingerprint_test.m.
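The comparison step can be illustrated as follows. This is a hedged Python sketch for a single channel, not the code from fingerprint_test.m: the function names and the example vectors are hypothetical, and the component is taken here as the inner product of the test fingerprint with the unit vector in the direction of each database fingerprint.

```python
import math


def component_along(test_fp, db_fp):
    """Scalar component of test_fp along the direction of db_fp."""
    if len(test_fp) != len(db_fp):
        raise ValueError("fingerprints must have the same length")
    norm = math.sqrt(sum(d * d for d in db_fp))
    if norm == 0.0:
        raise ValueError("database fingerprint must not be the zero vector")
    # <test, db> / ||db|| is the length of the orthogonal projection (with sign).
    return sum(t * d for t, d in zip(test_fp, db_fp)) / norm


def identify(test_fp, database):
    """Return (index, component) of the database entry with the largest component."""
    components = [component_along(test_fp, fp) for fp in database]
    best = max(range(len(components)), key=components.__getitem__)
    return best, components[best]


# Hypothetical three-track database with three subbands per fingerprint.
database = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
]
track_index, component = identify([0.2, 0.8, 0.15], database)
```

Because every component is divided only by the norm of the database fingerprint, rescaling the test fingerprint changes all components by the same factor and does not change which track is chosen.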

For more details, please read the text and comments in the m-files. The m-files and all test tracks used
can be downloaded in the compressed file archive music-recognition.tar.gz (149 MB). If you also want all the tracks
whose spectral fingerprints are stored in the file fingerprint_db.mat, you can download them from music-tracks.tar.gz (613 MB).

The output of the first m-file is the database for the test. On my old notebook with a dual-core Pentium in 2010, Matlab’s FFT
needed about 170 s of computation time for 20 tracks of about 3-4 minutes duration each (FFT of about 350 million samples in the 2 channels).
Somewhat later, an Intel Core™ i5-3320M CPU needed 43 s. Nowadays, in 2023, an FFT on a modern high-performance DSP processes up to
350 million samples per second (e.g. the MSC8156 High-Performance Multicore DSP).

The output of the second m-file is displayed in Matlab’s Command Window. It shows the computed track number per channel
and the components of the test fingerprint along the fingerprint directions of the displayed track numbers.
You can check whether a test succeeded or failed by comparing with the track list provided with the files, which also includes a page on the results.
You can also display or print the correlations generated per test and compare them with your own listening impressions of the tracks under test.
If you like, you can generate your own database from music tracks of your choice, together with matching test tracks.

If you are interested in professional music recognition systems, a starting point could be the article by A. Wang, Shazam Entertainment.