Using the fractional-order Fourier transform to compute mel-frequency cepstral coefficients for speaker verification
Keywords:
Mel-frequency cepstral coefficients, fractional-order Fourier transform, speech feature extraction, text-dependent speaker verification, signal processing
Abstract
The voice is a natural biometric attribute for speaker verification and recognition. The fractional-order Fourier transform yields features of the speech signal in the time-frequency plane with one additional degree of freedom, the transform order, so speech is represented differently than with the traditional Fourier transform. This paper evaluates text-dependent speaker verification when the mel-frequency cepstral coefficients are computed with the fractional Fourier transform instead of the standard Fourier transform, and compares the two representations. The results show that, for an appropriate choice of the order of the fractional Fourier transform, speaker verification improves.
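The idea of replacing the standard transform in the cepstral pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a weighted-type discrete fractional Fourier transform (a linear combination of the signal, its DFT, its index reversal and its inverse DFT), which is only one of several possible discretizations and differs from chirp-based ones. The mel formula, frame length, hop, filter count and the function names `weighted_frft`, `mel_filterbank` and `frft_mfcc` are likewise assumptions chosen for illustration.

```python
import numpy as np

def weighted_frft(x, a):
    """Weighted-type discrete fractional Fourier transform of order a.

    Builds a fractional power of the unitary DFT matrix F from the
    identity F^4 = I; a = 0 returns x and a = 1 returns the unitary DFT.
    """
    x = np.asarray(x, dtype=complex)
    n = x.shape[-1]
    powers = [
        x,                                     # F^0 x
        np.fft.fft(x, axis=-1) / np.sqrt(n),   # F^1 x (unitary DFT)
        np.roll(x[..., ::-1], 1, axis=-1),     # F^2 x = x[-n mod N]
        np.fft.ifft(x, axis=-1) * np.sqrt(n),  # F^3 x (unitary inverse DFT)
    ]
    k = np.arange(4)
    out = np.zeros_like(x)
    for m, term in enumerate(powers):
        # Weight of F^m in the spectral expansion of F^a.
        c = np.exp(1j * np.pi * k * (m - a) / 2).sum() / 4
        out = out + c * term
    return out

def hz_to_mel(f):
    # Common mel approximation (an assumption, not taken from the paper).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale."""
    edges = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(edges) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for j in range(lo, mid):
            fb[i, j] = (j - lo) / max(mid - lo, 1)
        for j in range(mid, hi):
            fb[i, j] = (hi - j) / max(hi - mid, 1)
    return fb

def frft_mfcc(signal, sample_rate, order=1.0, frame_len=256, hop=128,
              n_filters=20, n_ceps=12):
    """Cepstral coefficients with the DFT replaced by weighted_frft."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)]) * np.hamming(frame_len)
    # Simplification: keep the first half of the bins, as in the DFT case,
    # although for non-integer orders the output is not conjugate-symmetric.
    spectrum = weighted_frft(frames, order)[:, :frame_len // 2 + 1]
    power = np.abs(spectrum) ** 2
    fb = mel_filterbank(n_filters, frame_len, sample_rate)
    log_energy = np.log(power @ fb.T + 1e-10)
    # DCT-II of the log filterbank energies, written out explicitly.
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    np.arange(n_filters) + 0.5) / n_filters)
    return log_energy @ basis.T
```

With `order=1.0` the sketch reduces to a conventional DFT-based cepstral front end, so the order acts as the extra degree of freedom mentioned in the abstract. For example, `frft_mfcc(signal, 8000, order=0.9)` returns one row of 12 coefficients per frame.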