Conference proceedings and communications
Browsing conference proceedings and communications by department "Lenguajes y Sistemas Informáticos"
Showing 1 - 19 of 19
Publication: A comparison of extrinsic clustering evaluation metrics based on formal constraints (Springer, 2009-05-11)
Artiles, Javier; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio

There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints. We also extend the analysis to the problem of overlapping clustering, where items can simultaneously belong to more than one cluster. As BCubed cannot be directly applied to this task, we propose a modified version of BCubed that avoids the problems found with other metrics.

Publication: A data driven approach for person name disambiguation in web search results (2014-08-23)
Fresno, Víctor; Montalvo, Soto; Delgado Muñoz, Agustín Daniel; Martínez Unanue, Raquel

This paper presents an unsupervised approach for the task of clustering the results of a search engine when the query is a person name shared by different individuals. We propose an algorithm that calculates the number of clusters and establishes the groups of web pages according to the different individuals without the need for any training data or predefined thresholds, as the successful state-of-the-art systems do. In addition, most of those systems do not deal with social media web pages, and their performance could fail in a real scenario. In this paper we also propose a heuristic method for the treatment of social networking profiles.
Our approach is evaluated on four gold-standard collections for this task, obtaining highly competitive results, comparable to those obtained by some approaches with supervision.

Publication: A simple measure to assess non-response (2011-06-19)
Peñas Padilla, Anselmo; Rodrigo Yuste, Álvaro

There are several tasks where not responding is preferable to responding incorrectly. This idea is not new, but despite several previous attempts there is no commonly accepted measure to assess non-response. We study here an extension of the accuracy measure with this feature and a very easy-to-understand interpretation. The proposed measure (c@1) has a good balance of discrimination power, stability and sensitivity properties. We also show how this measure is able to reward systems that maintain the same number of correct answers and at the same time decrease the number of incorrect ones by leaving some questions unanswered. This measure is well suited for tasks such as Reading Comprehension tests, where multiple choices per question are given, but only one is correct.

Publication: Analyzing information retrieval methods to recover broken web links (2011-06-19)
Martínez Romo, Juan; Araujo Serna, M. Lourdes

In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the page containing the link, and the cached page in some digital library. The selected information is processed and submitted to a search engine. We have compared different information retrieval methods for both the selection of terms used to construct the queries submitted to the search engine, and the ranking of the candidate pages that it provides, in order to help the user find the best replacement. In particular, we have used term frequencies and a language model approach for the selection of terms, and co-occurrence measures and a language model approach for ranking the final results.
To test the different methods, we have also defined a methodology which does not require user judgments, which increases the objectivity of the results.

Publication: Automatic detection of trends in time-stamped sequences: an evolutionary approach (Springer-Verlag, 2009-01-14)
Merelo, Juan Julián; Araujo Serna, M. Lourdes

This paper presents an evolutionary algorithm for modeling the arrival dates in time-stamped data sequences such as newscasts, e-mails, IRC conversations, scientific journal articles or weblog postings. These models are applied to the detection of buzz (i.e. terms that occur with a higher-than-normal frequency) in them, which has attracted a lot of interest in the online world with the increasing number of periodic content producers. That is why in this paper we have used this kind of online sequences to test our system, though it is also valid for other types of event sequences. The algorithm assigns frequencies (number of events per time unit) to time intervals so that it produces an optimal fit to the data. The optimization procedure is a trade-off between accurately fitting the data and avoiding too many frequency changes, thus overcoming the noise inherent in these sequences. This process has traditionally been performed using dynamic programming algorithms, which are limited by memory and efficiency requirements. This limitation can be a problem when dealing with long sequences, and suggests the application of alternative search methods with some degree of uncertainty to achieve tractability, such as the evolutionary algorithm proposed in this paper. This algorithm is able to reach the same solution quality as those classical dynamic programming algorithms, but in a shorter time. We also test different cost functions and propose a new one that yields better fits than the one originally proposed by Kleinberg on real-world data.
Finally, several distributions of states for the finite state automata are tested, with the result that a uniform distribution produces much better fits than the geometric distribution also proposed by Kleinberg. We also present a variant of the evolutionary algorithm which achieves a fast fit of a sequence extended with new data by taking advantage of the fit obtained for the original subsequence.

Publication: Combining evaluation metrics via the unanimous improvement ratio and its application in WePS clustering task (Association for the Advancement of Artificial Intelligence, 2011-12-01)
Artiles, Javier; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio

Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion, and some sort of weighted combination is needed to provide system rankings. A problem of weighted combination measures is that slight changes in the relative weights may produce substantial changes in the system rankings. This paper introduces the Unanimous Improvement Ratio (UIR), a measure that complements standard metric combination criteria (such as van Rijsbergen's F-measure) and indicates how robust the measured differences are to changes in the relative weights of the individual metrics. UIR is meant to elucidate whether a perceived difference between two systems is an artifact of how individual metrics are weighted. Besides discussing the theoretical foundations of UIR, this paper presents empirical results that confirm the validity and usefulness of the metric for the Text Clustering problem, where there is a trade-off between precision- and recall-based metrics and results are particularly sensitive to the weighting scheme used to combine them.
Remarkably, our experiments show that UIR can be used as a predictor of how well differences between systems measured on a given test bed will also hold in a different test bed.

Publication: Detecting malicious tweets in trending topics using a statistical analysis of language (Elsevier, 2013-06-01)
Martínez Romo, Juan; Araujo Serna, M. Lourdes

Twitter spam detection is a recent area of research in which most previous work has focused on the identification of malicious user accounts and on honeypot-based approaches. However, in this paper we present a methodology based on two new aspects: the detection of spam tweets in isolation and without previous information about the user, and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. In this paper we present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34K trending topics and 20 million tweets. We then propose a reduced set of features that are hard for spammers to manipulate. In addition, we have developed a machine learning system with some orthogonal features that can be combined with other sets of features with the aim of analyzing emergent characteristics of spam in social networks. We have also conducted an extensive evaluation process that has allowed us to show how our system is able to obtain an F-measure at the same level as the best state-of-the-art systems based on the detection of spam accounts.
Thus, our system can be applied to Twitter spam detection in trending topics in real time, mainly due to the analysis of tweets instead of user accounts.

Publication: Disentangling categorical relationships through a graph of co-occurrences (American Physical Society, 2011-10-19)
Borge Holthoefer, Javier; Arenas, Alex; Capitán, José A.; Cuesta, José A.; Martínez Romo, Juan; Araujo Serna, M. Lourdes

The mesoscopic structure of complex networks has proven a powerful level of description to understand the linchpins of the system represented by the network. Nevertheless, the mapping of a series of relationships between elements, in terms of a graph, is sometimes not straightforward. Given that all the information we would extract using complex network tools depends on this initial graph, it is mandatory to preprocess the data in order to build it in the most accurate manner. Here we propose a procedure to build a network attending only to statistically significant relations between constituents. We use a paradigmatic example of word associations to show the development of our approach. Analyzing the modular structure of the obtained network, we are able to disentangle categorical relations, disambiguating words with a success that is comparable to the best algorithms designed for the same end.

Publication: Evaluating Multilingual Question Answering Systems at CLEF (2010-05-17)
Forner, Pamela; Giampiccolo, Danilo; Magnini, Bernardo; Sutcliffe, Richard; Peñas Padilla, Anselmo; Rodrigo Yuste, Álvaro

The paper offers an overview of the key issues raised during the seven years' activity of the Multilingual Question Answering Track at the Cross Language Evaluation Forum (CLEF). The general aim of the Multilingual Question Answering Track has been to test both monolingual and cross-language Question Answering (QA) systems that process queries and documents in several European languages, also drawing attention to a number of challenging issues for research in multilingual QA.
The paper gives a brief description of how the task has evolved over the years and of the way in which the data sets have been created, also presenting a brief summary of the different types of questions developed. The document collections adopted in the competitions are sketched as well, and some data about participation are provided. Moreover, the main evaluation measures used to evaluate system performance are explained, and an overall analysis of the results achieved is presented.

Publication: Filling knowledge gaps in text for machine reading (2010-08-22)
Hovy, Eduard H.; Peñas Padilla, Anselmo

Texts are replete with gaps: information omitted because authors assume a certain amount of background knowledge. We define the process of enrichment that fills these gaps. We describe how enrichment can be performed using a Background Knowledge Base built from a large corpus. We evaluate the effectiveness of various openly available background knowledge bases, and we identify the kind of information necessary for enrichment.

Publication: Identifying patterns for unsupervised grammar induction (2010-07-15)
Santamaría, Jesús; Araujo Serna, M. Lourdes

This paper describes a new method for unsupervised grammar induction based on the automatic extraction of certain patterns in the texts. Our starting hypothesis is that there exist some classes of words that function as separators, marking the beginning or the end of new constituents. Among these separators, we distinguish those which trigger new levels in the parse tree. If we are able to detect these separators, we can follow a very simple procedure to identify the constituents of a sentence by taking the classes of words between separators. This paper is devoted to describing the process that we have followed to automatically identify the set of separators from a corpus annotated only with Part-of-Speech (POS) tags.
The proposed approach has allowed us to improve on the results of previous proposals when parsing sentences from the Wall Street Journal corpus.

Publication: Linked data-based conceptual modelling for recommendation: a FCA-based approach (2014-09-01)
Castellanos, A.; Cigarrán Recuero, Juan Manuel; García Serrano, Ana Mª

In a recommendation task it is crucial to have an accurate content-based description of the users and the items consumed by them. Linked Open Data (LOD) has been demonstrated to be one of the best ways of obtaining this kind of content, given its huge amount of structured information. The main question is to know how useful the LOD information is in inferring user preferences, and how to obtain it. In this context, we propose a novel approach for content modelling and recommendation based on Formal Concept Analysis (FCA). The approach is based on the modelling of the user- and content-related information, enriched with Linked Open Data, and on a new algorithm to analyze the models and recommend new content. The framework provided by the ESWC 2014 Recommendation Challenge is used for the evaluation of the proposal. The results are within the average range of the other participants, which suggests the suitability of FCA for this scenario. Nevertheless, further work has to be carried out in order to propose a refined approach for the management of LOD information.

Publication: MT Evaluation: human-like vs. human acceptable (2006-07-17)
Giménez, Jesús; Màrquez, Lluís; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio

We present a comparative study on Machine Translation Evaluation according to two different criteria: Human Likeness and Human Acceptability. We provide empirical evidence that there is a relationship between these two kinds of evaluation: Human Likeness implies Human Acceptability, but the reverse is not true. From the point of view of automatic evaluation, this implies that metrics based on Human Likeness are more reliable for system tuning.
Our results also show that current evaluation metrics are not always able to distinguish between automatic and human translations. In order to improve the descriptive power of current metrics, we propose the use of additional syntax-based metrics and metric combinations inside the QARLA framework.

Publication: Structure of morphologically expanded queries: a genetic algorithm approach (Elsevier, 2009-10-13)
Zaragoza, Hugo; Pérez Agüera, José R.; Pérez Iglesias, Joaquín; Araujo Serna, M. Lourdes

In this paper we deal with two issues. First, we discuss the negative effects of term correlation in query expansion algorithms, and we propose a novel and simple method (query clauses) to represent expanded queries which may alleviate some of these negative effects. Second, we discuss a method to optimize local query-expansion methods using genetic algorithms, and we apply this method to improve stemming. We evaluate this method with the novel query representation method and show very significant improvements for the problem of stemming optimization.

Publication: Temporally anchored relation extraction (2012-12-08)
Garrido, Guillermo; Cabaleiro, Bernardo; Peñas Padilla, Anselmo; Rodrigo Yuste, Álvaro

Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from the documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper-bound performance imposed by the relation extraction step.
Compared to the state of the art, the overall system achieves the highest precision reported.

Publication: The contribution of linguistic features to automatic machine translation evaluation (2009-08-02)
Giménez, Jesús; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio

A number of approaches to Automatic MT Evaluation based on deep linguistic knowledge have been suggested. However, n-gram based metrics are still today the dominant approach. The main reason is that the advantages of employing deeper linguistic information have not yet been clarified. In this work, we propose a novel approach for the meta-evaluation of MT evaluation metrics, since correlation coefficients against human judges do not reveal details about the advantages and disadvantages of particular metrics. We then use this approach to investigate the benefits of introducing linguistic features into evaluation metrics. Overall, our experiments show that (i) lexical and linguistic metrics present complementary advantages, and (ii) combining both kinds of metrics yields the most robust meta-evaluation performance.

Publication: Unsupervised discovery of domain-specific knowledge from text (2011-06-19)
Hovy, Dirk; Zhang, Chunliang; Hovy, Eduard H.; Peñas Padilla, Anselmo

Learning by Reading (LbR) aims at enabling machines to acquire knowledge from and reason about textual input. This requires knowledge about the domain structure (such as entities, classes, and actions) in order to do inference. We present a method to infer this implicit knowledge from unlabeled text. Unlike previous approaches, we use automatically extracted classes with a probability distribution over entities to allow for context-sensitive labeling. From a corpus of 1.4m sentences, we learn about 250k simple propositions about American football in the form of predicate-argument structures like "quarterbacks throw passes to receivers".
Using several statistical measures, we show that our model is able to generalize and explain the data statistically significantly better than various baseline approaches. Human subjects judged up to 96.6% of the resulting propositions to be sensible. The classes and probabilistic model can be used in textual enrichment to improve the performance of LbR end-to-end systems.

Publication: Web spam detection: new classification features based on qualified link analysis and language models (Institute of Electrical and Electronics Engineers (IEEE), 2010-09-01)
Araujo Serna, M. Lourdes; Martínez Romo, Juan

Web spam is a serious problem for search engines because the quality of their results can be severely degraded by the presence of this kind of page. In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. These features are related not only to quantitative data extracted from the Web pages, but also to qualitative properties, mainly of the page links. We consider, for instance, the ability of a search engine to find, using the information provided by the page for a given link, the page that the link actually points at. This can be regarded as indicative of the link's reliability. We also check the coherence between a page and another one pointed at by any of its links: two pages linked by a hyperlink should be semantically related, by at least a weak contextual relation. Thus, we apply an LM approach to different sources of information from a Web page that belong to the context of a link, in order to provide high-quality indicators of Web spam. We have specifically applied the Kullback–Leibler divergence on different combinations of these sources of information in order to characterize the relationship between two linked pages.
The result is a system that significantly improves the detection of Web spam using fewer features, on two large public datasets, WEBSPAM-UK2006 and WEBSPAM-UK2007.

Publication: WHAD: Wikipedia historical attributes data. Historical structured data extraction and vandalism detection from the Wikipedia edit history (Springer Verlag (Germany), 2013-05-28)
Alfonseca, Enrique; Garrido, Guillermo; Delort, Jean Yves; Peñas Padilla, Anselmo

This paper describes the generation of temporally anchored infobox attribute data from the Wikipedia history of revisions. By mining (attribute, value) pairs from the revision history of the English Wikipedia, we are able to collect a comprehensive knowledge base that contains data on how attributes change over time. When dealing with the Wikipedia edit history, vandalic and erroneous edits are a concern for data quality. We present a study of vandalism identification in Wikipedia edits that uses only features from the infoboxes, and show that we can obtain, on this dataset, an accuracy comparable to a state-of-the-art vandalism identification method that is based on the whole article. Finally, we discuss different characteristics of the extracted dataset, which we make available for further study.
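As an illustration of the Kullback–Leibler divergence mentioned in the Web spam detection entry above, the following is a minimal Python sketch of how the divergence between two smoothed term distributions (for example, the context of a link and the page it points at) can be computed. The add-epsilon smoothing and the toy term counts are assumptions for illustration only, not the exact configuration used in that paper.

```python
import math
from collections import Counter

def kl_divergence(p_counts, q_counts, vocab, eps=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two term
    distributions estimated from raw counts, with add-epsilon
    smoothing so that unseen terms do not yield zero probabilities.
    A low divergence suggests the two texts are topically related."""
    p_total = sum(p_counts.values()) + eps * len(vocab)
    q_total = sum(q_counts.values()) + eps * len(vocab)
    d = 0.0
    for term in vocab:
        p = (p_counts.get(term, 0) + eps) / p_total
        q = (q_counts.get(term, 0) + eps) / q_total
        d += p * math.log(p / q)
    return d

# Toy example: terms from a link's context vs. terms from the target page
source = Counter("search engine ranking web pages".split())
target = Counter("web search engine index pages".split())
vocab = set(source) | set(target)
score = kl_divergence(source, target, vocab)
```

In a spam-detection setting, this score would be one feature among others: pairs of linked pages with an unusually high divergence are candidates for being unrelated, which is characteristic of spam links.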