Supplement to my lectures on Fourier analysis
A Simple Attempt at Speaker Recognition
We all know the experience of recognizing a friend on the phone within a few seconds, just by her or his voice.
Here we consider an example in which a vector of mean amplitudes is generated for each of a series of test speakers.
These vectors form the database for a speaker recognition attempt.
We test whether a speaker from that database population can be identified from a 6 s speech sample.
To test text-independent speaker recognition, the text passages used to build the database differ from
those used for the speech tests (both were taken arbitrarily from a newspaper or another text).
Characterizing vectors are computed for a series of speaker recordings in the file generate_speaker_db.m.
For an “unknown speaker” from that population, the analogous vector is computed from a test track and compared with the database.
For the example, I have already stored spectral data of 19 speakers in the file speaker_fingerprint_db.mat, which is among the download files.
Test tracks for these speakers are also among the downloads.
We test the following very simple type of spectral characterization:
For the frequency band 0 – 4000 Hz a vector is computed whose components are the mean amplitudes in a number of subbands.
Per speaker, 6 time frames of 3 s duration are used, i.e. an 18 s speech recording stored as a wav-file with a sampling rate of 44100 Hz.
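The subband characterization described above can be sketched in a few lines. The following is a minimal Python illustration, not the code of generate_speaker_db.m: the function names, the choice of 8 equal-width subbands, and the short synthetic test tone are all assumptions made for the example, and a naive DFT stands in for the FFT to keep the sketch free of external libraries.

```python
import cmath
import math


def amplitude_spectrum(samples):
    """One-sided amplitude spectrum via a naive DFT (O(n^2), short demos only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        # DC and (for even n) the Nyquist bin appear once, all other bins twice.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        spectrum.append((1.0 if is_edge else 2.0) * abs(acc) / n)
    return spectrum


def subband_means(spectrum, df, f_max=4000.0, n_bands=8):
    """Mean amplitude in n_bands equal-width subbands covering 0 .. f_max Hz.

    df is the frequency spacing of the spectrum bins in Hz.
    n_bands=8 is an assumption; the source does not state the number of subbands.
    """
    band_width = f_max / n_bands
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for k, amplitude in enumerate(spectrum):
        f = k * df
        if f >= f_max:
            break
        band = min(int(f / band_width), n_bands - 1)
        sums[band] += amplitude
        counts[band] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Synthetic demo signal (assumption): a 1000 Hz sine of amplitude 1,
# sampled at 8000 Hz for 0.05 s, so the bin spacing is 20 Hz.
sample_rate = 8000
n_samples = 400
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n_samples)]

spectrum = amplitude_spectrum(tone)
vector = subband_means(spectrum, df=sample_rate / n_samples)
print([round(v, 4) for v in vector])
```

For a real 3 s frame at 44100 Hz there are 132300 samples per frame and the bin spacing is 1/3 Hz; at that size the naive DFT is far too slow and an FFT routine should be used instead.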
For recognition, an analogous vector is computed for a test speaker and compared with the database vectors.
The “identification” is the database person whose vector has the largest correlation with the test vector. This is done in recognize_speaker.m.
You can also test the procedure with your own recordings of friends, if you like.
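The matching step can be illustrated as follows. This is a hedged Python sketch, not the content of recognize_speaker.m: it assumes that "correlation" means the Pearson correlation coefficient, and the speaker names and the four-component vectors are made-up illustration data.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den else 0.0


def identify(test_vector, database):
    """Return the database entry with the largest correlation, plus all scores."""
    scores = {name: pearson(test_vector, vec) for name, vec in database.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical database of subband-mean vectors (illustration data only).
database = {
    "speaker_1": [0.9, 0.5, 0.2, 0.1],
    "speaker_2": [0.3, 0.8, 0.6, 0.2],
    "speaker_3": [0.1, 0.2, 0.7, 0.9],
}
test_vector = [0.35, 0.75, 0.55, 0.25]

best, scores = identify(test_vector, database)
print(best)
```

Because the largest correlation always names some database entry, a speaker who is not in the database is still assigned to the closest match; the procedure only makes sense for test speakers from the database population.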
For more details, please read the text and comments in the m-files. The files are:
m-file generate_speaker_db.m for generating a test database of characterizing speaker spectra, download here
m-file recognize_speaker.m for the recognition test, download here
My test database generated with 19 test speakers, download here
List of the speakers participating in my tests, with their identity numbers and test_track numbers, download here
m-file plot_speaker_spectrum.m to plot the computed amplitude spectrum of a test speaker, download here
The computed one-sided amplitude spectrum of my friend Chris was
Tests
To see how it works and to compare the results with the speaker list, you can download all the tracks
used in the example, with the filenames test_trackN.wav (N is the number of the track, each about 6 s long),
from the zip-file here (33 MB).
If you like, you can also download the tracks that I used to build the database against which a test speaker is matched,
as a zip-archive here (68 MB).
The m-files, the extracted track archives and the database speaker_fingerprint_db.mat must be
located in the same MATLAB working folder.
I made this site on a rainy day when I was in Italy with musician friends. We had fun with the curious test tracks,
especially with that of our friend Chris and his really funny reading of an Italian restaurant menu.
Maybe you will have fun too: listen to Chris.
To learn more about the subject, you could start with the article Wikipedia: Speaker Recognition and the references given there.