Abstract
With the increasing number of music pieces in recent years, much research has been conducted in the field of Music Information Retrieval. One of the best-known approaches in this field is based on audio fingerprints. An audio fingerprint is a content-based feature that represents a music piece so that a music identification system can describe and recognize it. The Philips method is one of the fundamental methods for identifying music. In the Philips method, the audio signal is divided into overlapping frames and each frame is split into 33 frequency sub-bands using a filter bank; a 32-bit fingerprint is then generated from the energies of these frequency bands. One drawback of the basic Philips method is its weakness at identifying music under noisy conditions.
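For illustration only, the minimal Python/NumPy sketch below shows how a Philips-style 32-bit fingerprint can be derived from sub-band energies. The frame length, overlap, and 300-2000 Hz band range follow the parameters commonly cited for the Philips (Haitsma-Kalker) scheme and are assumptions here, not necessarily the exact settings used in this thesis.

```python
# Sketch of Philips-style fingerprint extraction (assumed standard parameters:
# ~0.37 s frames, 31/32 overlap, 33 log-spaced bands between 300 Hz and 2000 Hz).
import numpy as np


def philips_fingerprint(signal, sr, frame_len=0.37, overlap=31 / 32,
                        n_bands=33, f_min=300.0, f_max=2000.0):
    """Return an array of 32-bit fingerprints, one row of bits per frame."""
    frame_size = int(frame_len * sr)
    hop = max(1, int(frame_size * (1 - overlap)))
    window = np.hanning(frame_size)

    # Logarithmically spaced band edges mapped to FFT bin indices.
    edges = np.geomspace(f_min, f_max, n_bands + 1)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sr)
    bins = np.searchsorted(freqs, edges)

    # Energy of each of the 33 bands in every overlapping frame.
    energies = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame_size] * window)) ** 2
        energies.append([spectrum[bins[m]:bins[m + 1]].sum() for m in range(n_bands)])
    E = np.asarray(energies)

    # Bit F(n, m) = 1 iff the energy difference between adjacent bands
    # increases from the previous frame to the current one.
    diff = (E[1:, :-1] - E[1:, 1:]) - (E[:-1, :-1] - E[:-1, 1:])
    return (diff > 0).astype(np.uint8)  # shape: (n_frames - 1, 32)
```

Each bit thus encodes the sign of the change, across consecutive frames, of the energy difference between two adjacent bands, which is why 33 bands yield a 32-bit fingerprint per frame.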
In this thesis, two approaches are proposed to overcome this drawback. In the first approach, the fingerprint bits are generated using three frequency bands. Furthermore, a power mask is used: a weight matrix over the fingerprint bits that weights each bit according to the probability of its being contaminated by noise, so that clean bits have a greater influence on identifying music pieces. In the second proposed approach, the wavelet transform is used to extract features in the time-frequency domain; in this way, the signal can be analyzed at different levels of detail.
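As a rough illustration of how a power mask could enter the matching stage, the Python/NumPy sketch below computes a weighted bit error rate in which each fingerprint bit contributes in proportion to a reliability weight. The random data, the mask values, and the decision threshold are illustrative assumptions, not the weights or thresholds derived in this thesis.

```python
# Sketch: matching with a power mask as a weighted Hamming distance.
import numpy as np


def weighted_bit_error_rate(query_bits, ref_bits, power_mask):
    """Weighted bit error rate: bits judged more reliable (higher mask weight)
    contribute more to the score, so noise-prone bits matter less."""
    mismatches = (query_bits != ref_bits).astype(float)
    return float((mismatches * power_mask).sum() / power_mask.sum())


# Usage with synthetic data: accept a match when the weighted bit error rate
# falls below a hypothetical threshold (0.35 is commonly cited for
# Philips-style matching).
rng = np.random.default_rng(0)
query = rng.integers(0, 2, size=(256, 32), dtype=np.uint8)
ref = query.copy()
ref[rng.random(ref.shape) < 0.05] ^= 1          # simulate noise-induced bit flips
mask = rng.uniform(0.5, 1.0, size=(256, 32))    # per-bit reliability weights
print(weighted_bit_error_rate(query, ref, mask) < 0.35)
```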
The experiments are performed on 250 audio tracks of 15 seconds in length selected from the GTZAN dataset. The results show that, for identifying music pieces under noisy conditions, the first approach increases the average accuracy from 86.06% to 96.06% without increasing the identification time. The second approach increases the average accuracy from 86.06% to 99.6%, at the cost of more computation.