On the optimal selection of Mel-Frequency Cepstral Coefficients for voice deepfake detection

Falcón López, Sergio A.; Tobarra Abad, María de los Llanos; Robles Gómez, Antonio; Pastor Vargas, Rafael

Date

2026-03-24

Access rights

info:eu-repo/semantics/openAccess

Publisher

Wiley

Abstract

The continuous evolution of techniques for generating manipulated audio, known as voice deepfakes, and the widespread availability of tools that produce convincing forgeries have created an urgent need for reliable detection methods. This work considers the dimensionality of Mel-Frequency Cepstral Coefficients (MFCCs) as a core design variable for practical, deployable systems. The aim is to identify the smallest number of coefficients that preserves detection performance across heterogeneous models while reducing computational cost, a critical factor for mobile and edge deployment. This study evaluates a hybrid setting on the ASVspoof 2019 Logical Access dataset, in which the same feature family serves as input to five traditional machine learning algorithms (Random Forest, k-Nearest Neighbors, Linear Support Vector Classification, Extreme Gradient Boosting and Support Vector Machine with radial basis function kernel) and five deep learning models (Convolutional Neural Network, Recurrent Neural Network, Convolutional Recurrent Neural Network, Xception and ResNet). Results indicate that deep models reach near-peak performance with a small number of coefficients, whereas classical methods require a larger number to achieve stable performance (except Linear Support Vector Classification, which consistently underperforms). Accordingly, 32 coefficients are considered an effective operating point for hybrid deployments. Overall, the results provide evidence to guide the selection of the number of MFCC coefficients in voice deepfake detection, aiming for efficient, reproducible and explainable systems.
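To make the design variable concrete, the following is a minimal sketch of the step where the number of MFCCs is chosen: a DCT-II is applied to the log-mel energies of a frame and only the first n_mfcc outputs are kept. It is not the authors' pipeline. The orthonormal DCT-II scaling, the 64 mel bands, the synthetic log-mel values, the function names and the sweep values other than 32 are all assumptions made for illustration.

```python
import math


def dct2_ortho(x, n_out):
    """Orthonormal DCT-II of x, keeping only the first n_out coefficients."""
    n = len(x)
    if not 1 <= n_out <= n:
        raise ValueError("n_out must be between 1 and len(x)")
    coeffs = []
    for k in range(n_out):
        s = sum(
            x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i in range(n)
        )
        # Orthonormal scaling (an assumed convention, not taken from the paper).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def mfcc_frame(log_mel_energies, n_mfcc):
    """MFCCs of one frame: truncated DCT-II of its log-mel energies."""
    return dct2_ortho(log_mel_energies, n_mfcc)


# Synthetic log-mel energies for a single frame (hypothetical values).
N_MELS = 64
log_mel = [math.log(1.0 + 0.5 * (1.0 + math.sin(0.3 * i))) for i in range(N_MELS)]

# Hypothetical sweep over the number of coefficients; only 32 is named in the abstract.
for n_mfcc in (13, 20, 32, 40):
    features = mfcc_frame(log_mel, n_mfcc)
    print(n_mfcc, len(features))
```

Under these assumptions, a smaller coefficient set is a prefix of a larger one computed from the same log-mel energies, so a dimensionality sweep of this kind can reuse a single extraction pass and slice the result.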

Description

This is the Accepted Manuscript of an article published in Expert Systems, Wiley; available online at the publisher's website: https://doi.org/10.1111/exsy.70245

Keywords

Deepfake, Forensic Analysis, Audio deepfake detection

Citation

Falcón-López, S.A., Tobarra, L., Robles-Gómez, A., Pastor-Vargas, R. (2026). On the optimal selection of Mel-Frequency Cepstral Coefficients for voice deepfake detection. Expert Systems, Wiley, pp. 1-33. https://doi.org/10.1111/exsy.70245

Center

Escuela Técnica Superior de Ingeniería Informática

Department

Sistemas de Comunicación y Control
