Nassimi, Sahar: Entity Linking for the Biomedical Domain. Hannover : Gottfried Wilhelm Leibniz Universität, Master Thesis, 2023, X, 86 pp. DOI: https://doi.org/10.15488/13414
Abstract: | |
Entity linking is the process of detecting mentions of concepts in text documents and linking them to canonical entities in a target lexicon. One of its biggest challenges is ambiguity in entity names: different names can refer to the same entity, and the same name can refer to different entities depending on the mention. Many text mining tools have yet to address this. For instance, search engines that rely on heuristic string matching frequently return irrelevant results because they cannot satisfactorily resolve ambiguity. Resolving named entity ambiguity is therefore a crucial step in entity linking. To address it, this work proposes a heuristic method for entity recognition and entity linking over a biomedical knowledge graph that takes into account the semantic similarity of entities in the graph. A conventional entity linking (EL) pipeline consists of named entity recognition (NER), relation extraction (RE), and relationship linking (RL). For each identified relation or entity, the solution therefore comprises identifying the correct one and matching it to its corresponding Concept Unique Identifier (CUI) in the knowledge base. Because KBs contain a substantial number of relations and entities, each with only one natural language label, the second phase is directly dependent on the accuracy of the first. The framework developed in this thesis enables the extraction of relations and entities from text and their mapping to the associated CUI in the UMLS knowledge base. This approach derives a new representation of the knowledge base that lends itself to easy comparison. Accuracy is the evaluation metric used in this thesis.
To select the best candidates, we build a graph of relations and rank the candidates by shortest path distance. We test the approach on two well-known benchmarks in the biomedical field and show that it outperforms the search engine's top result by around 4% in accuracy. With regard to fine-tuning, we observe that entity linking has subjective characteristics, and modifications may be required depending on the task at hand. The performance of the framework is evaluated using a Python implementation. | |
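The candidate selection idea in the abstract (a graph of relations, with candidates ranked by shortest path distance) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the toy graph, the placeholder identifiers such as `CUI_A`, and the function names `shortest_path_length` and `rank_candidates` are all assumptions made for illustration, and the scoring rule (sum of hop distances to the context entities) is only one plausible reading of the ranking step.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Hop count between two nodes via breadth-first search, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def rank_candidates(graph, candidates, context_entities):
    """Order candidates by their total hop distance to the context entities.

    Assumption: unreachable pairs receive a penalty larger than any real path.
    """
    penalty = len(graph) + 1

    def score(candidate):
        total = 0
        for ctx in context_entities:
            dist = shortest_path_length(graph, candidate, ctx)
            total += penalty if dist is None else dist
        return total

    return sorted(candidates, key=score)


# Hypothetical undirected relation graph; the identifiers are placeholders,
# not real UMLS CUIs.
toy_graph = {
    "CUI_A": ["CUI_X"],
    "CUI_B": ["CUI_Y"],
    "CUI_X": ["CUI_A", "CUI_Y"],
    "CUI_Y": ["CUI_X", "CUI_B", "CUI_Z"],
    "CUI_Z": ["CUI_Y"],
}

# Two candidates for one ambiguous mention, with one already linked context entity.
ranked = rank_candidates(toy_graph, ["CUI_A", "CUI_B"], ["CUI_Z"])
print(ranked)
```

In this toy graph `CUI_B` is two hops from the context entity and `CUI_A` is three, so `CUI_B` is ranked first. A real system would build the graph from knowledge base relations and would likely combine the distance with other signals.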
License of this version: | German copyright law applies. The document may be used free of charge for personal use, but may not be made available on the Internet or passed on to third parties. |
Document Type: | MasterThesis |
Publishing status: | publishedVersion |
Issue Date: | 2023-02-10 |
Appears in Collections: | Fakultät für Elektrotechnik und Informatik |
pos. | country | downloads (total) | downloads (perc.)
---|---|---|---
1 | Germany | 104 | 31.04%
2 | United States | 73 | 21.79%
3 | France | 19 | 5.67%
4 | Canada | 14 | 4.18%
5 | Ireland | 12 | 3.58%
6 | No geo information available | 11 | 3.28%
7 | India | 11 | 3.28%
8 | Netherlands | 7 | 2.09%
9 | Italy | 6 | 1.79%
10 | Croatia | 6 | 1.79%
 | other countries | 72 | 21.49%
Note
Download statistics are collected in accordance with the "COUNTER Code of Practice for e-Resources", applying internationally recognized rules and standards. COUNTER is an international non-profit organization in which library associations, database providers, and publishers work together on standards for collecting, storing, and processing usage data for electronic resources, with the aim of ensuring objectivity and comparability. Only accesses to the corresponding full texts are counted, not visits to the website itself.