Date
2025-09
Access rights
info:eu-repo/semantics/openAccess
Abstract
This work aims to contribute to the field of Automatic Speech Recognition (ASR), whose history and progress are closely tied to those of Artificial Intelligence. One of its subfields is diarization, which consists of identifying the voices of the different speakers in an audio recording. The task is more complex than it might first appear and has motivated a number of studies in recent years.
The work focuses on obtaining diarizations from subtitles and evaluating them against reference diarizations. To that end, it studies both the diarization error metrics obtained and the processing time. In addition, to help produce high-quality subtitles, it aims to comply with the diarization-related sections of the UNE 153010:2012 standard, "Subtitulado para personas sordas y personas con discapacidad auditiva" (Subtitling for deaf and hearing-impaired people), and implements a process that contributes to improving the subtitles.
Development was carried out using container technology, which makes it easier to run isolated tests under the desired conditions, a useful property for the experiments and comparisons performed.
Several conclusions were reached. The general one is that under ideal conditions (clear, uninterrupted speech, no overlapping speech, and a quiet environment) diarization results can be acceptable. Another is that obtaining reliable metrics requires perfectly hand-labeled subtitle files, which are time-consuming to produce. Regarding the results, the Pyannote pipeline requires GPU processing to make full use of its capabilities; without a GPU, it is no better than NeMo.
Keywords
ASR, Diarization, Subtitles
Citation
Sáenz De Cosca Lacalle, Daniel. Master's thesis: "Comparativa de procesos de diarización automática". Universidad Nacional de Educación a Distancia (UNED), 2025
School
E.T.S. de Ingeniería Informática