Normalization Techniques For Improving The Performance Of Knowledge Graph Creation Pipelines

Download statistics - Document (COUNTER):

Torabinejad, Mohammad: Normalization Techniques For Improving The Performance Of Knowledge Graph Creation Pipelines. Hannover : Gottfried Wilhelm Leibniz Universität Hannover, Master-Thesis, 2020, X, 61 S. DOI:


Sum total of downloads: 284

With the rapid growth of data on the web, the demand for discovering information within data and subsequently exploiting knowledge graphs keeps rising. Data integration systems can help meet this demand because they transform data from various sources and of different volumes. To this end, a data integration system uses mapping rules, specified in a language such as RML, to integrate data collected from various data sources into a knowledge graph. However, large data sources may suffer from various data quality issues, redundancy being one of them. In this regard, the Semantic Web community contributes to Knowledge Engineering with techniques for creating knowledge graphs efficiently. This thesis tackles the creation of knowledge graphs in the presence of data sources with redundant data and proposes a novel normalization theory to solve this problem. The theory covers not only the characteristics of the data sources but also the mapping rules used to integrate them into a knowledge graph. Based on this theory, three normal forms are proposed, together with an algorithm for transforming mapping rules and data sources into these normal forms. The performance of the proposed approach is evaluated on different testbeds composed of real-world and synthetic data. The observed results suggest that the proposed techniques can dramatically reduce the execution time of knowledge graph creation. This thesis's normalization theory therefore contributes to the repertoire of tools that facilitate the creation of knowledge graphs at scale.
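The following minimal sketch illustrates why redundancy in a data source can inflate the work done during knowledge graph creation. It is not the thesis's normal forms or its transformation algorithm: the rule format, the column names, and the helper functions `project_and_deduplicate` and `generate_triples` are all hypothetical, and the rule is a plain Python dictionary rather than RML. The assumption is only that a mapping rule reads a subset of the source's columns, so projecting onto those columns and dropping duplicate rows first yields the same set of triples from fewer rows.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Triple = Tuple[str, str, str]

# Hypothetical mapping rule (not RML syntax): one subject column and
# a predicate -> object-column table.
RULE = {
    "subject": "gene_id",
    "predicate_objects": {"ex:symbol": "symbol"},
}


def project_and_deduplicate(rows: List[Row], columns: List[str]) -> List[Row]:
    """Keep only the given columns and drop rows that repeat after projection."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def generate_triples(rows: List[Row], rule: dict) -> List[Triple]:
    """Apply the rule to every row, emitting one triple per predicate."""
    triples = []
    for row in rows:
        subject = f"ex:{row[rule['subject']]}"
        for predicate, column in rule["predicate_objects"].items():
            triples.append((subject, predicate, row[column]))
    return triples


# Illustrative source: the "sample" column is not used by the rule, so the
# first two rows become duplicates once it is projected away.
SOURCE = [
    {"gene_id": "g1", "symbol": "BRCA1", "sample": "s1"},
    {"gene_id": "g1", "symbol": "BRCA1", "sample": "s2"},
    {"gene_id": "g2", "symbol": "TP53", "sample": "s3"},
]

used_columns = [RULE["subject"], *RULE["predicate_objects"].values()]
reduced = project_and_deduplicate(SOURCE, used_columns)

naive = generate_triples(SOURCE, RULE)        # one triple per source row
normalized = generate_triples(reduced, RULE)  # one triple per distinct row
```

In this toy setting both runs describe the same graph, but the second one processes fewer rows and emits no duplicate triples that would have to be removed afterwards.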
License of this version: German copyright law applies. The document may be used free of charge for personal use, but it may not be made available on the Internet or passed on to third parties.
Document Type: masterThesis
Publishing status: publishedVersion
Issue Date: 2020
Appears in Collections: Fakultät für Elektrotechnik und Informatik

downloads by country:

Pos.  Country                       Downloads (total)  Downloads (%)
1     Germany                       156                54.93%
2     United States                 45                 15.85%
3     Spain                         13                 4.58%
4     No geo information available  8                  2.82%
5     Norway                        7                  2.46%
6     Korea, Republic of            6                  2.11%
7     India                         5                  1.76%
8     China                         5                  1.76%
9     France                        4                  1.41%
10    Austria                       4                  1.41%
      Other countries               31                 10.92%


The download statistics are compiled according to internationally recognized rules and standards, in line with the "COUNTER Code of Practice for e-Resources". COUNTER is an international non-profit organization in which library associations, database providers, and publishers work together on standards for collecting, storing, and processing usage data for electronic resources, with the aim of ensuring objectivity and comparability. Only accesses to the corresponding full texts are counted, not visits to the website itself.
