Date
2026-03-24
Access rights
info:eu-repo/semantics/openAccess
Publisher
Wiley
Abstract
The continuous evolution of techniques for generating manipulated audio, known as voice deepfakes, and the widespread availability of tools that produce convincing forgeries have created an urgent need for reliable detection methods. This work considers the dimensionality of Mel-Frequency Cepstral Coefficients (MFCCs) as a core design variable for practical, deployable systems. The aim is to identify the smallest number of coefficients that preserves detection performance across heterogeneous models while reducing computational cost, a critical factor for mobile and edge deployment. This study evaluates a hybrid setting on the ASVspoof 2019 Logical Access dataset, in which the same feature family serves as input to five traditional machine learning algorithms (Random Forest, k-Nearest Neighbors, Linear Support Vector Classification, Extreme Gradient Boosting and Support Vector Machine with radial basis function kernel) and five deep learning models (Convolutional Neural Network, Recurrent Neural Network, Convolutional Recurrent Neural Network, Xception and ResNet). Results indicate that deep models reach near-peak performance with a small number of coefficients, whereas classical methods require a larger number to achieve stable performance (except Linear Support Vector Classification, which consistently underperforms). Accordingly, 32 coefficients are considered an effective operating point for hybrid deployments. Overall, the results provide evidence to guide the selection of the number of MFCC coefficients in voice deepfake detection, aiming for efficient, reproducible and explainable systems.
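As a minimal illustration of what "number of coefficients" means here: MFCCs are conventionally obtained by applying a discrete cosine transform (DCT) to the log-mel band energies of each audio frame and keeping only the first few outputs. The sketch below shows that truncation step alone, using only the Python standard library. The function names, the 40-band frame and its synthetic values are assumptions for illustration; they are not taken from the article, whose actual feature-extraction pipeline is not described in this record.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II of a sequence of real numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(log_mel_energies, n_coeffs):
    """Keep the first n_coeffs DCT coefficients of one frame of log-mel energies."""
    if not 1 <= n_coeffs <= len(log_mel_energies):
        raise ValueError("n_coeffs must be between 1 and the number of mel bands")
    return dct_ii(log_mel_energies)[:n_coeffs]


# Hypothetical frame of 40 log-mel band energies (synthetic values, not real audio).
frame = [math.log(1.0 + (band % 7) + 0.1 * band) for band in range(40)]

# Sweep the feature dimensionality, as one might when searching for an operating point.
for n_coeffs in (13, 20, 32, 40):
    features = cepstral_coefficients(frame, n_coeffs)
    print(n_coeffs, len(features))
```

Because the coefficients are a prefix of the same transform, a lower-dimensional feature vector is simply the start of a higher-dimensional one, which is why the dimensionality can be swept without recomputing the earlier stages.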
Description
This is the Accepted Manuscript of an article published in Expert Systems (Wiley), available online at the publisher's website: https://doi.org/10.1111/exsy.70245
Keywords
Deepfake, Forensic Analysis, Audio deepfake detection
Citation
Falcón-López, S.A., Tobarra, L., Robles-Gómez, A., Pastor-Vargas, R. (2026). On the optimal selection of Mel-Frequency Cepstral Coefficients for voice deepfake detection. Expert Systems, Wiley, pp. 1-33. https://doi.org/10.1111/exsy.70245
Center
Escuela Técnica Superior de Ingeniería Informática
Department
Sistemas de Comunicación y Control

