Publication:
Test-driving information theory-based compositional distributional semantics: A case study on Spanish song lyrics

dc.contributor.author: Ghajari Espinosa, Adrián
dc.contributor.author: Benito Santos, Alejandro
dc.contributor.author: Ros Muñoz, Salvador
dc.contributor.author: Fresno Fernández, Víctor Diego
dc.contributor.author: González Blanco, Elena
dc.date.accessioned: 2025-05-13T07:10:29Z
dc.date.available: 2025-05-13T07:10:29Z
dc.date.issued: 2025-06-15
dc.description: The registered version of this article, first published in "Knowledge-Based Systems, vol. 319, 2025", is available online at the publisher's website: Elsevier, https://doi.org/10.1016/j.knosys.2025.113549
dc.description.abstract: Song lyrics pose unique challenges for semantic similarity assessment due to their metaphorical language, structural patterns, and cultural nuances - characteristics that often challenge standard natural language processing (NLP) approaches. These challenges stem from a tension between compositional and distributional semantics: while lyrics follow compositional structures, their meaning depends heavily on context and interpretation. The Information Theory-based Compositional Distributional Semantics framework offers a principled approach by integrating information theory with compositional rules and distributional representations. We evaluate eight embedding models on Spanish song lyrics, including multilingual, monolingual contextual, and static embeddings. Results show that multilingual models consistently outperform monolingual alternatives, with the domain-adapted ALBERTI achieving the highest F1 macro scores (78.92 ± 10.86). Our analysis reveals that monolingual models generate highly anisotropic embedding spaces, significantly impacting performance with traditional metrics. The Information Contrast Model metric proves particularly effective, providing improvements up to 18.04 percentage points over cosine similarity. Additionally, composition functions maintaining longer accumulated vector norms consistently outperform standard averaging approaches. Our findings have important implications for NLP applications and challenge standard practices in similarity calculation, showing that effectiveness varies with both task nature and model characteristics. [en]
dc.description.version: published version
dc.identifier.citation: Adrián Ghajari, Alejandro Benito-Santos, Salvador Ros, Víctor Fresno, Elena González-Blanco, Test-driving information theory-based compositional distributional semantics: A case study on Spanish song lyrics, Knowledge-Based Systems, Volume 319, 2025, 113549, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2025.113549
dc.identifier.doi: https://doi.org/10.1016/j.knosys.2025.113549
dc.identifier.issn: 0950-7051
dc.identifier.uri: https://hdl.handle.net/20.500.14468/26536
dc.journal.title: Knowledge-Based Systems
dc.journal.volume: 319
dc.language.iso: en
dc.page.initial: 113549
dc.publisher: ELSEVIER
dc.relation.center: E.T.S. de Ingeniería Informática
dc.relation.department: Lenguajes y Sistemas Informáticos
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.es
dc.subject: 33 Technological Sciences
dc.subject.keywords: compositional distributional semantics [en]
dc.subject.keywords: semantic textual similarity [en]
dc.subject.keywords: word embeddings [en]
dc.subject.keywords: song lyrics [en]
dc.title: Test-driving information theory-based compositional distributional semantics: A case study on Spanish song lyrics [en]
dc.type: artículo [es]
dc.type: journal article [en]
dspace.entity.type: Publication
person.familyName: Ghajari Espinosa
person.familyName: Benito Santos
person.familyName: Ros Muñoz
person.familyName: Fresno Fernández
person.givenName: Adrián
person.givenName: Alejandro
person.givenName: Salvador
person.givenName: Víctor Diego
person.identifier.orcid: 0000-0001-5317-6390
person.identifier.orcid: 0000-0001-6330-4958
person.identifier.orcid: 0000-0003-4270-2628
relation.isAuthorOfPublication: db5da577-2d78-45c3-9733-47368503a59c
relation.isAuthorOfPublication: c2a07fe0-c0d7-4a21-bdb8-e7d547e5b78b
relation.isAuthorOfPublication: d25ad74f-42fc-47ac-911d-1e5515319a58
relation.isAuthorOfPublication: 80cd3492-0ff8-4c8e-a904-2858623c7fc1
relation.isAuthorOfPublication.latestForDiscovery: db5da577-2d78-45c3-9733-47368503a59c
Files
Original bundle
Name: Benito Santos_ Alejandro_Test-drivingInformati_ALEJANDRO BENITO SAN.pdf
Size: 4.53 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.62 KB
Format: Item-specific license agreed to upon submission