Entity Linking for the Biomedical Domain

Download statistics - Document (COUNTER):

Nassimi, Sahar: Entity Linking for the Biomedical Domain. Hannover : Gottfried Wilhelm Leibniz Universität, Master Thesis, 2023, X, 86 S. DOI: https://.doi.org/10.15488/13414

Selected time period:


Sum total of downloads: 335

Entity linking is the process of detecting mentions of different concepts in text documents and linking them to canonical entities in a target lexicon. However, one of the biggest issues in entity linking is the ambiguity in entity names. The ambiguity is an issue that many text mining tools have yet to address since different names can represent the same thing and every mention could indicate a different thing. For instance, search engines that rely on heuristic string matches frequently return irrelevant results, because they are unable to satisfactorily resolve ambiguity. Thus, resolving named entity ambiguity is a crucial step in entity linking. To solve the problem of ambiguity, this work proposes a heuristic method for entity recognition and entity linking over the biomedical knowledge graph concerning the semantic similarity of entities in the knowledge graph. Named entity recognition (NER), relation extraction (RE), and relationship linking make up a conventional entity linking (EL) system pipeline (RL). We have used the accuracy metric in this thesis.Therefore, for each identified relation or entity, the solution comprises identifying the correct one and matching it to its corresponding unique CUI in the knowledge base. Because KBs contain a substantial number of relations and entities, each with only one natural language label, the second phase is directly dependent on the accuracy of the first. The framework developed in this thesis enables the extraction of relations and entities from the text and their mapping to the associated CUI in the UMLS knowledge base. This approach derives a new representation of the knowledge base that lends it to the easy comparison. Our idea to select the best candidates is to build a graph of relations and determine the shortest path distance using a ranking approach.We test our suggested approach on two well-known benchmarks in the biomedical field and show that our method exceeds the search engine's top result and provides us with around 4% more accuracy. In general, when it comes to fine-tuning, we notice that entity linking contains subjective characteristics and modifications may be required depending on the task at hand. The performance of the framework is evaluated based on a Python implementation.
License of this version: Es gilt deutsches Urheberrecht. Das Dokument darf zum eigenen Gebrauch kostenfrei genutzt, aber nicht im Internet bereitgestellt oder an Außenstehende weitergegeben werden.
Document Type: MasterThesis
Publishing status: publishedVersion
Issue Date: 2023-02-10
Appears in Collections:Fakultät für Elektrotechnik und Informatik

distribution of downloads over the selected time period:

downloads by country:

pos. country downloads
total perc.
1 image of flag of Germany Germany 104 31.04%
2 image of flag of United States United States 73 21.79%
3 image of flag of France France 19 5.67%
4 image of flag of Canada Canada 14 4.18%
5 image of flag of Ireland Ireland 12 3.58%
6 image of flag of No geo information available No geo information available 11 3.28%
7 image of flag of India India 11 3.28%
8 image of flag of Netherlands Netherlands 7 2.09%
9 image of flag of Italy Italy 6 1.79%
10 image of flag of Croatia Croatia 6 1.79%
    other countries 72 21.49%

Further download figures and rankings:


Zur Erhebung der Downloadstatistiken kommen entsprechend dem „COUNTER Code of Practice for e-Resources“ international anerkannte Regeln und Normen zur Anwendung. COUNTER ist eine internationale Non-Profit-Organisation, in der Bibliotheksverbände, Datenbankanbieter und Verlage gemeinsam an Standards zur Erhebung, Speicherung und Verarbeitung von Nutzungsdaten elektronischer Ressourcen arbeiten, welche so Objektivität und Vergleichbarkeit gewährleisten sollen. Es werden hierbei ausschließlich Zugriffe auf die entsprechenden Volltexte ausgewertet, keine Aufrufe der Website an sich.

Search the repository