Person: Martínez Romo, Juan
ORCID
0000-0002-6905-7051
Surname
Martínez Romo
Given name
Juan
23 results
Search results
Publication: Negation-based transfer learning for improving biomedical Named Entity Recognition and Relation Extraction (Elsevier, 2023-02)
Fabregat Marcos, Hermenegildo; Duque Fernández, Andrés; Martínez Romo, Juan; Araujo Serna, M. Lourdes
Background and Objectives: Named Entity Recognition (NER) and Relation Extraction (RE) are two of the most studied tasks in biomedical Natural Language Processing (NLP). The detection of specific terms and entities and the relationships between them are key aspects for the development of more complex automatic systems in the biomedical field. In this work, we explore transfer learning techniques for incorporating information about negation into systems performing NER and RE. The main purpose of this research is to analyse to what extent the successful detection of negated entities in separate tasks helps in the detection of biomedical entities and their relationships. Methods: Three neural architectures are proposed in this work, all of them mainly based on Bidirectional Long Short-Term Memory (Bi-LSTM) networks and Conditional Random Fields (CRFs). While the first architecture is devoted to detecting triggers and scopes of negated entities in any domain, two specific models are developed for performing isolated NER tasks and joint NER and RE tasks in the biomedical domain. Then, weights related to negation detection learned by the first architecture are incorporated into the latter two models. Two different languages, Spanish and English, are taken into account in the experiments. Results: Performance of the biomedical models is analysed both when the weights of the neural networks are randomly initialized, and when weights from the negation detection model are incorporated into them.
Improvements of around 3.5% in F-Measure in the English language and more than 7% in the Spanish language are achieved in the NER task, while the NER+RE task increases F-Measure scores by more than 13% for the NER submodel and around 2% for the RE submodel. Conclusions: The obtained results allow us to conclude that negation-based transfer learning techniques are appropriate for performing biomedical NER and RE tasks. These results highlight the importance of detecting negation for improving the identification of biomedical entities and their relationships. The explored techniques show robustness by maintaining consistent results and improvements across different tasks and languages.
Publication: Can deep learning techniques improve classification performance of vandalism detection in Wikipedia? (Elsevier, 2019)
Martinez-Rico, Juan R.; Martínez Romo, Juan; Araujo Serna, M. Lourdes
Wikipedia is a free encyclopedia created as an international collaborative project. One of its peculiarities is that any user can edit its contents almost without restrictions, which has given rise to a phenomenon known as vandalism. Vandalism is any attempt that seeks to damage the integrity of the encyclopedia deliberately. To address this problem, in recent years several automatic detection systems and associated features have been developed. This work implements one of these systems, which uses three sets of new features based on different techniques. Specifically, we study the applicability of a leading technology such as deep learning to the problem of vandalism detection. The first set is obtained by expanding a list of vandal terms taking advantage of the existing semantic-similarity relations in word embeddings and deep neural networks. Deep learning techniques are applied to the second set of features, specifically Stacked Denoising Autoencoders (SDA), in order to reduce the dimensionality of a bag of words model obtained from a set of edits taken from Wikipedia.
The last set uses graph-based ranking algorithms to generate a list of vandal terms from a vandalism corpus extracted from Wikipedia. These three sets of new features are evaluated separately as well as together to study their complementarity, improving on state-of-the-art results. The system evaluation has been carried out on a corpus extracted from Wikipedia (WP_Vandal) as well as on another called PAN-WVC-2010 that was used in a vandalism detection competition held at the CLEF conference.
Publication: Detección de Indicios de Autolesiones No Suicidas en Informes Médicos de Psiquiatría Mediante el Análisis del Lenguaje (Sociedad Española para el Procesamiento del Lenguaje Natural, 2022)
Reneses, Blanca; Sevilla-Llewellyn-Jones, Julia; Martínez-Capella, Ignacio; Seara-Aguilar, Germán; Martínez Romo, Juan; Araujo Serna, M. Lourdes
Non-suicidal self-injury, often simply called self-harm, is the act of deliberately harming one's own body, for example by cutting or burning oneself. It is usually not intended as a suicide attempt. This work presents a system for detecting signs of non-suicidal self-injury, based on language analysis, over an annotated set of medical reports obtained from the psychiatry service of a public hospital in Madrid. Explainability and precision in predicting positive cases are the two main objectives of this work. To this end, two supervised systems of different nature have been developed. On the one hand, natural language processing techniques were used to extract various features centred on the domain of self-harm, which are then fed to a traditional classifier. On the other hand, a deep learning system based on several layers of convolutional neural networks was implemented, given their strong performance in text classification tasks. The result is two high-performing supervised systems, among which we highlight the one based on a traditional classifier because of its better prediction of the positive class and the greater ease of explaining its results to healthcare professionals.
Publication: Deep-Learning Approach to Educational Text Mining and Application to the Analysis of Topics’ Difficulty (Institute of Electrical and Electronics Engineers, 2020-12-02)
Araujo Serna, M. Lourdes; López Ostenero, Fernando; Martínez Romo, Juan; Plaza Morales, Laura
Learning analytics has emerged as a promising tool for optimizing the learning experience and results, especially in online educational environments. An important challenge in this area is identifying the most difficult topics for students in a subject, which is of great use to improve the quality of teaching by devoting more effort to those topics of greater difficulty, assigning them more time, resources and materials. We have approached the problem by means of natural language processing techniques. In particular, we propose a solution based on a deep learning model that automatically extracts the main topics that are covered in educational documents. This model is next applied to the problem of identifying the most difficult topics for students in a subject related to the study of algorithms and data structures in a Computer Science degree.
Our results show that our topic identification model presents very high accuracy (around 90 percent) and may be efficiently used in learning analytics applications, such as the identification and understanding of what makes the learning of a subject difficult. An exhaustive analysis of the case study has also revealed that there are indeed topics that are consistently more difficult for most students, and also that the perception of difficulty in students and teachers does not always coincide with the actual difficulty indicated by the data, which prevents adequate attention from being paid to the most challenging topics.
Publication: A keyphrase-based approach for interpretable ICD-10 code classification of Spanish medical reports (Elsevier, 2021)
Fabregat Marcos, Hermenegildo; Duque Fernández, Andrés; Araujo Serna, M. Lourdes; Martínez Romo, Juan
Background and objectives: The 10th version of the International Classification of Diseases (ICD-10) codification system has been widely adopted by the health systems of many countries, including Spain. However, manual code assignment of Electronic Health Records (EHR) is a complex and time-consuming task that requires a great amount of specialised human resources. Therefore, several machine learning approaches are being proposed to assist in the assignment task. In this work we present an alternative system for automatically recommending ICD-10 codes to be assigned to EHRs. Methods: Our proposal is based on characterising ICD-10 codes by a set of keyphrases that represent them. These keyphrases include not only those that have literally appeared in some EHR with the considered ICD-10 codes assigned, but also others that have been obtained by a statistical process able to capture expressions that have led the annotators to assign the code. Results: The result is an information model that allows codes to be recommended efficiently for a new EHR based on its textual content.
We explore an approach that proves to be competitive with other state-of-the-art approaches and can be combined with them to optimise results. Conclusions: In addition to its effectiveness, the recommendations of this method are easily interpretable since the phrases in an EHR that lead to the recommendation of an ICD-10 code are known. Moreover, the keyphrases associated with each ICD-10 code can be a valuable additional source of information for other approaches, such as machine learning techniques.
Publication: Generation of social network user profiles and their relationship with suicidal behaviour (Sociedad Española para el Procesamiento del Lenguaje Natural, 2024)
Fernández Hernández, Jorge; Araujo Serna, M. Lourdes; Martínez Romo, Juan
Suicide is currently one of the leading causes of death worldwide, so being able to characterise people with this tendency can help prevent possible suicide attempts. In this work, a Spanish-language corpus called SuicidAttempt has been compiled, made up of users with or without explicit mentions of suicide attempts, using the messaging application Telegram. For each user, several demographic traits were annotated semi-automatically using different systems, some supervised and others unsupervised. Finally, these traits were analysed together with linguistic features extracted from the users' messages, in an attempt to characterise different groups according to their relationship with suicidal behaviour. The results suggest that detecting these demographic and psycholinguistic traits makes it possible to characterise certain risk groups and to gain in-depth knowledge of the profiles of those who carry out such acts.
Publication: Semi‑supervised incremental learning with few examples for discovering medical association rules (BioMed Central, 2022)
Sánchez‑de‑Madariaga, Ricardo; Cantero Escribano, José Miguel; Martínez Romo, Juan; Araujo Serna, M. Lourdes
Background: Association Rules are one of the main ways to represent structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in the predictive health field. Classic algorithms for association rule mining give rise to huge amounts of possible rules that should be filtered in order to select those most likely to be true. Most of the proposed techniques for these tasks are unsupervised. However, the accuracy provided by unsupervised systems is limited. Conversely, resorting to annotated data for training supervised systems is expensive and time‑consuming. The purpose of this research is to design a new semi‑supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data. Methods: In this work we propose a new semi‑supervised data mining model that combines unsupervised techniques (Fisher’s exact test) with limited supervision. Starting with a small seed of annotated data, the model improves on the results (F‑measure) obtained using a fully supervised system (standard supervised ML algorithms). The idea is based on utilising the agreement between the predictions of the supervised system and those of the unsupervised techniques in a series of iterative steps. Results: The new semi‑supervised ML algorithm improves the results of supervised algorithms, measured by F‑measure, in the task of mining medical association rules, while training with an affordable amount of manually annotated data. Conclusions: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system.
The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge.
Publication: Discovering related scientific literature beyond semantic similarity: a new co-citation approach (Springer, 2019-05-17)
Rodríguez Prieto, Oscar; Araujo Serna, M. Lourdes; Martínez Romo, Juan
We propose a new approach to recommend scientific literature, a domain in which the efficient organization and search of information is crucial. The proposed system relies on the hypothesis that two scientific articles are semantically related if they are co-cited more frequently than they would be by pure chance. This relationship can be quantified by the probability of co-citation, obtained from a null model that statistically defines what we consider pure chance. Looking for article pairs that minimize this probability, the system is able to recommend a ranking of articles in response to a given article. This system is included in the co-occurrence paradigm of the field. More specifically, it is based on co-citations, so it can produce recommendations more focused on relatedness than on similarity. Evaluation has been performed on the ACL Anthology collection and on the DBLP dataset, and a new corpus has been compiled to evaluate the capacity of the proposal to find relationships beyond similarity. Results show that the system is able to provide not only articles similar to the submitted one, but also articles presenting other kinds of relations, thus providing diversity, i.e. connections to new topics.