Person: Amigo Cabrera, Enrique
ORCID: 0000-0003-1482-824X
Last name: Amigo Cabrera
First name: Enrique
Search results
Showing 1 - 9 of 9
Publication: MT Evaluation: human-like vs. human acceptable (2006-07-17)
Giménez, Jesús; Màrquez, Lluís; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
We present a comparative study on Machine Translation Evaluation according to two different criteria: Human Likeness and Human Acceptability. We provide empirical evidence that there is a relationship between these two kinds of evaluation: Human Likeness implies Human Acceptability, but the reverse is not true. From the point of view of automatic evaluation, this implies that metrics based on Human Likeness are more reliable for system tuning. Our results also show that current evaluation metrics are not always able to distinguish between automatic and human translations. In order to improve the descriptive power of current metrics, we propose the use of additional syntax-based metrics and metric combinations inside the QARLA Framework.

Publication: Automatic Generation of Entity-Oriented Summaries for Reputation Management (Springer, 2020-04-01)
Rodríguez Vidal, Javier; Verdejo, Julia; Carrillo de Albornoz Cuadrado, Jorge Amando; Amigo Cabrera, Enrique; Plaza Morales, Laura; Gonzalo Arroyo, Julio Antonio
Producing online reputation summaries for an entity (company, brand, etc.) is a focused summarization task with a distinctive feature: issues that may affect the reputation of the entity take priority in the summary. In this paper we (i) present a new test collection of manually created (abstractive and extractive) reputation reports which summarize tweet streams for 31 companies in the banking and automobile domains; (ii) propose a novel methodology to evaluate summaries in the context of online reputation monitoring, which profits from an analogy between reputation reports and the problem of diversity in search; and (iii) provide empirical evidence that producing reputation reports is different from a standard summarization problem, and that incorporating priority signals is essential to address the task effectively.

Publication: The contribution of linguistic features to automatic machine translation evaluation (2009-08-02)
Giménez, Jesús; Verdejo, M. Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
A number of approaches to Automatic MT Evaluation based on deep linguistic knowledge have been suggested. However, n-gram-based metrics remain the dominant approach today (see the n-gram sketch after this list). The main reason is that the advantages of employing deeper linguistic information have not yet been clarified. In this work, we propose a novel approach for the meta-evaluation of MT evaluation metrics, since correlation coefficients against human judgments do not reveal details about the advantages and disadvantages of particular metrics. We then use this approach to investigate the benefits of introducing linguistic features into evaluation metrics. Overall, our experiments show that (i) both lexical and linguistic metrics present complementary advantages and (ii) combining both kinds of metrics yields the most robust meta-evaluation performance.

Publication: EvALL: Open Access Evaluation for Information Access Systems (Association for Computing Machinery (ACM), 2017)
Almagro Cádiz, Mario; Rodríguez Vidal, Javier; Verdejo, M. Felisa; Amigo Cabrera, Enrique; Carrillo de Albornoz Cuadrado, Jorge Amando; Gonzalo Arroyo, Julio Antonio
The EvALL online evaluation service aims to provide a unified evaluation framework for Information Access systems that makes results completely comparable and publicly available for the whole research community. For researchers working on a given test collection, the framework makes it possible to: (i) evaluate results in a way that complies with measurement theory and with state-of-the-art evaluation practices in the field; (ii) quantitatively and qualitatively compare their results with the state of the art; (iii) provide their results as reusable data to the scientific community; and (iv) automatically generate evaluation figures and (low-level) interpretations of the results, both as a PDF report and as LaTeX source. For researchers running a challenge (a comparative evaluation campaign on shared data), the framework helps them manage, store and evaluate submissions, and preserve ground truth and system output data for future use by the research community. EvALL can be tested at http://evall.uned.es.

Publication: Combining evaluation metrics via the unanimous improvement ratio and its application in the WePS clustering task (Association for the Advancement of Artificial Intelligence, 2011-12-01)
Artiles Picón, Javier; Verdejo, M. Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion, and some sort of weighted combination is needed to provide system rankings. A problem of weighted combination measures is that slight changes in the relative weights may produce substantial changes in the system rankings. This paper introduces the Unanimous Improvement Ratio (UIR), a measure that complements standard metric combination criteria (such as van Rijsbergen's F-measure) and indicates how robust the measured differences are to changes in the relative weights of the individual metrics (see the UIR sketch after this list). UIR is meant to elucidate whether a perceived difference between two systems is an artifact of how individual metrics are weighted. Besides discussing the theoretical foundations of UIR, this paper presents empirical results that confirm the validity and usefulness of the metric for the Text Clustering problem, where there is a trade-off between precision- and recall-based metrics and results are particularly sensitive to the weighting scheme used to combine them. Remarkably, our experiments show that UIR can be used as a predictor of how well differences between systems measured on a given test bed will hold in a different test bed.

Publication: A comparison of extrinsic clustering evaluation metrics based on formal constraints (Springer, 2009-05-11)
Artiles Picón, Javier; Verdejo, M. Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints (see the BCubed sketch after this list). We also extend the analysis to the problem of overlapping clustering, where items can simultaneously belong to more than one cluster. As BCubed cannot be directly applied to this task, we propose a modified version of BCubed that avoids the problems found with other metrics.

Publication: An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results (Association for Computational Linguistics, 2020-07-01)
Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio; Mizzaro, Stefano; Carrillo de Albornoz Cuadrado, Jorge Amando
In Ordinal Classification tasks, items have to be assigned to classes that have a relative ordering, such as positive, neutral, negative in sentiment analysis. Remarkably, the most popular evaluation metrics for ordinal classification tasks either ignore relevant information (for instance, precision/recall on each of the classes ignores their relative ordering) or assume additional information (for instance, Mean Average Error assumes absolute distances between classes); see the ordinal-metrics sketch after this list. In this paper we propose a new metric for Ordinal Classification, the Closeness Evaluation Measure, which is rooted in Measurement Theory and Information Theory. Our theoretical analysis and experimental results over both synthetic data and data from NLP shared tasks indicate that the proposed metric captures quality aspects from different traditional tasks simultaneously. In addition, it generalizes some popular classification (nominal scale) and error minimization (interval scale) metrics, depending on the measurement scale in which it is instantiated.

Publication: Information Theory–based Compositional Distributional Semantics (Massachusetts Institute of Technology Press, 2022-12-01)
Amigo Cabrera, Enrique; Ariza Casabona, Alejandro; Fresno Fernández, Víctor Diego; Martí, M. Antònia; Agencia Estatal de Investigación (España); European Commission
In the context of text representation, Compositional Distributional Semantics models aim to fuse the Distributional Hypothesis and the Principle of Compositionality. Text embedding is based on co-occurrence distributions, and the representations are in turn combined by compositional functions that take the text structure into account. However, the theoretical basis of compositional functions is still an open issue. In this article we define and study the notion of Information Theory–based Compositional Distributional Semantics (ICDS): (i) we first establish formal properties for embedding, composition, and similarity functions based on Shannon's Information Theory; (ii) we analyze the existing approaches under this prism, checking whether or not they comply with the established desirable properties; (iii) we propose two parameterizable composition and similarity functions that generalize traditional approaches while fulfilling the formal properties; and finally (iv) we perform an empirical study on several textual similarity datasets that include sentences with high and low lexical overlap, and on the similarity between words and their descriptions. Our theoretical analysis and empirical results show that fulfilling formal properties positively affects the accuracy of text representation models in terms of correspondence (isometry) between the embedding and meaning spaces.

Publication: Evaluating Sequence Labeling on the basis of Information Theory (Association for Computational Linguistics, 2025-07-01)
Amigo Cabrera, Enrique; Álvarez Mellado, Elena; Carrillo de Albornoz Cuadrado, Jorge Amando; European Commission; Agencia Estatal de Investigación (España)
Various metrics exist for evaluating sequence labeling problems (strict span matching, token-oriented metrics, token concurrence in sequences, etc.), each of them focusing on certain aspects of the task (see the span-vs-token sketch after this list). In this paper, we define a comprehensive set of formal properties that captures the strengths and weaknesses of the existing metric families and prove that none of them is able to satisfy all properties simultaneously. We argue that it is necessary to measure how much information (correct or noisy) each token in the sequence contributes, depending on different aspects such as sequence length, number of tokens annotated by the system, token specificity, etc. On this basis, we introduce the Sequence Labelling Information Contrast Model (SL-ICM), a novel metric based on information theory for evaluating sequence labeling tasks. Our formal analysis and experimentation show that the proposed metric satisfies all properties simultaneously.
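
The n-gram sketch, for "The contribution of linguistic features to automatic machine translation evaluation": the abstract notes that n-gram-based metrics dominate automatic MT evaluation. The toy Python function below computes plain n-gram precision, the core ingredient of BLEU-style metrics; it omits refinements such as clipped counts and brevity penalties, and it is not the paper's proposal. The example sentences are ours.

def ngram_precision(candidate, reference, n=2):
    # Fraction of the candidate's n-grams that also appear in the reference.
    grams = lambda text: [tuple(text.split()[i:i + n])
                          for i in range(len(text.split()) - n + 1)]
    cand, ref = grams(candidate), grams(reference)
    if not cand:
        return 0.0
    return sum(1 for g in cand if g in ref) / len(cand)

print(ngram_precision("the cat sat on the mat", "the cat is on the mat"))
# 0.6: three of the five candidate bigrams occur in the reference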
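The UIR sketch, for "Combining evaluation metrics via the unanimous improvement ratio...": a minimal reading of UIR, assuming UIR(A, B) = (N_AB - N_BA) / |T|, where N_AB counts the test cases in which system A scores at least as high as system B on every metric and strictly higher on at least one. Function names and toy scores are ours; consult the paper for the exact definition.

def unanimously_improves(scores_a, scores_b):
    # True if A >= B on all metrics and A > B on at least one.
    return (all(a >= b for a, b in zip(scores_a, scores_b))
            and any(a > b for a, b in zip(scores_a, scores_b)))

def uir(cases_a, cases_b):
    # cases_a[i] and cases_b[i] hold the metric scores (e.g. precision,
    # recall) of systems A and B on test case i.
    n_ab = sum(unanimously_improves(a, b) for a, b in zip(cases_a, cases_b))
    n_ba = sum(unanimously_improves(b, a) for a, b in zip(cases_a, cases_b))
    return (n_ab - n_ba) / len(cases_a)

system_a = [(0.9, 0.8), (0.7, 0.6), (0.5, 0.9)]
system_b = [(0.8, 0.7), (0.6, 0.5), (0.6, 0.4)]
print(uir(system_a, system_b))
# ~0.67: A unanimously wins cases 1 and 2; case 3 is non-comparable
# (each system wins on one metric), so it contributes to neither count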
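The BCubed sketch, for "A comparison of extrinsic clustering evaluation metrics...": the standard, non-overlapping BCubed precision and recall, computed as averages of per-item local precision and recall. The paper's modified version for overlapping clustering is not reproduced here; the toy data is ours.

def bcubed(cluster, gold):
    # cluster[i]: system cluster of item i; gold[i]: gold category of item i.
    n = len(cluster)
    precision = recall = 0.0
    for i in range(n):
        same_cluster = [j for j in range(n) if cluster[j] == cluster[i]]
        same_gold = [j for j in range(n) if gold[j] == gold[i]]
        # Local precision: fraction of i's cluster sharing i's gold category.
        precision += sum(gold[j] == gold[i] for j in same_cluster) / len(same_cluster)
        # Local recall: fraction of i's gold category found in i's cluster.
        recall += sum(cluster[j] == cluster[i] for j in same_gold) / len(same_gold)
    return precision / n, recall / n

print(bcubed(cluster=[0, 0, 0, 1, 1], gold=['a', 'a', 'b', 'b', 'b']))
# (0.733..., 0.733...): averages of the per-item local scores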
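The ordinal-metrics sketch, for "An Effectiveness Metric for Ordinal Classification...": a toy demonstration of the gap the abstract describes. Accuracy treats all confusions alike, ignoring class order, while Mean Average Error distinguishes them only by assuming fixed unit distances between classes. The sketch does not implement the Closeness Evaluation Measure itself; the data is ours.

# Classes on an ordinal scale: negative=0, neutral=1, positive=2.
gold     = [2, 2, 1, 0]
system_x = [1, 2, 1, 0]  # confuses one "positive" with "neutral"
system_y = [0, 2, 1, 0]  # confuses one "positive" with "negative"

def accuracy(sys, ref):
    return sum(s == r for s, r in zip(sys, ref)) / len(ref)

def mae(sys, ref):
    return sum(abs(s - r) for s, r in zip(sys, ref)) / len(ref)

# Accuracy ignores the ordering: both systems score 0.75.
print(accuracy(system_x, gold), accuracy(system_y, gold))
# MAE penalizes the farther confusion more (0.25 vs 0.50), but only by
# assuming neighboring classes are exactly one unit apart.
print(mae(system_x, gold), mae(system_y, gold))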
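The span-vs-token sketch, for "Evaluating Sequence Labeling on the basis of Information Theory": a toy contrast between two of the existing metric families the abstract names, strict span matching versus token-oriented F1. SL-ICM itself is not reproduced here; the spans and helper names are ours.

# Spans are (start, end, label) with an exclusive end index.
gold_spans = [(0, 3, 'PER'), (5, 6, 'LOC')]
sys_spans  = [(0, 2, 'PER'), (5, 6, 'LOC')]  # PER span truncated by one token

def f1(tp, n_sys, n_gold):
    p, r = tp / n_sys, tp / n_gold
    return 2 * p * r / (p + r) if p + r else 0.0

def strict_f1(sys, gold):
    # Credit only for spans whose boundaries and label match exactly.
    return f1(len(set(sys) & set(gold)), len(sys), len(gold))

def token_f1(sys, gold):
    # Credit for every correctly labeled token, regardless of boundaries.
    tokens = lambda spans: {(i, lab) for s, e, lab in spans for i in range(s, e)}
    s, g = tokens(sys), tokens(gold)
    return f1(len(s & g), len(s), len(g))

print(strict_f1(sys_spans, gold_spans))  # 0.5: the truncated span is a miss
print(token_f1(sys_spans, gold_spans))   # ~0.86: partial overlap earns credit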