Enriching and validating geographic information on the web

Tempelmeier, Nicolas

dc.identifier.uri	http://dx.doi.org/10.15488/12283
dc.identifier.uri	https://www.repo.uni-hannover.de/handle/123456789/12381
dc.contributor.author	Tempelmeier, Nicolas	eng
dc.date.accessioned	2022-06-23T10:33:05Z
dc.date.available	2022-06-23T10:33:05Z
dc.date.issued	2022
dc.identifier.citation	Tempelmeier, Nicolas: Enriching and validating geographic information on the web. Hannover : Gottfried Wilhelm Leibniz Universität. Diss., 2022, xvii, 185 S. DOI: https://doi.org/10.15488/12283	eng
dc.description.abstract	The continuous growth of available data on the World Wide Web has led to an unprecedented amount of available information. However, the enormous variance in data quality and trustworthiness of information sources impairs the great potential of the large amount of vacant information. This observation especially applies to geographic information on the Web, i.e., information describing entities that are located on the Earth’s surface. With the advent of mobile devices, the impact of geographic Web information on our everyday life has substantially grown. The mobile devices have also enabled the creation of novel data sources such as OpenStreetMap (OSM), a collaborative crowd-sourced map providing open cartographic information. Today, we use geographic information in many applications, including routing, location recommendation, or geographic question answering. The processing of geographic Web information yields unique challenges. First, the descriptions of geographic entities on the Web are typically not validated. Since not all Web information sources are trustworthy, the correctness of some geographic Web entities is questionable. Second, geographic information sources on the Web are typically isolated from each other. The missing integration of information sources hinders the efficient use of geographic Web information for many applications. Third, the description of geographic entities is typically incomplete. Depending on the application, missing information is a decisive criterion for (not) using a particular data source. Due to the large scale of the Web, the manual correction of these problems is usually not feasible such that automated approaches are required. In this thesis, we tackle these challenges from three different angles. (i) Validation of geographic Web information: We validate geographic Web information by detecting vandalism in OpenStreetMap, for instance, the replacement of a street name with advertisement. To this end, we present the OVID model for automated vandalism detection in OpenStreetMap. (ii) Enrichment of geographic Web information through integration: We integrate OpenStreetMap with other geographic Web information sources, namely knowledge graphs, by identifying entries corresponding to the same world real-world entities in both data sources. We present the OSM2KG model for automated identity link discovery between OSM and knowledge graphs. (iii) Enrichment of missing information in geographic Web information: We consider semantic annotations of geographic entities on Web pages as an additional data source. We exploit existing annotations of categorical properties of Web entities as training data to enrich missing categorical properties in geographic Web information. For all of the proposed models, we conduct extensive evaluations on real-world datasets. Our experimental results confirm that the proposed solutions reliably outperform existing baselines. Furthermore, we demonstrate the utility of geographic Web Information in two application scenarios. (i) Corpus of geographic entity embeddings: We introduce the GeoVectors corpus, a linked open dataset of ready-to-use embeddings of geographic entities. With GeoVectors, we substantially lower the burden to use geographic data in machine learning applications. (ii) Application to event impact prediction: We employ several geographic Web information sources to predict the impact of public events on road traffic. To this end, we use cartographic, event, and event venue information from the Web.	eng
dc.description.abstract	Durch die kontinuierliche Zunahme verfügbarer Daten im World Wide Web, besteht heute eine noch nie da gewesene Menge verfügbarer Informationen. Das große Potential dieser Daten wird jedoch durch hohe Schwankungen in der Datenqualität und in der Vertrauenswürdigkeit der Datenquellen geschmälert. Dies kann vor allem am Beispiel von geografischen Web-Informationen beobachtet werden. Geografische Web-Informationen sind Informationen über Entitäten, die über Koordinaten auf der Erdoberfläche verfügen. Die Relevanz von geografischen Web-Informationen für den Alltag ist durch die Verbreitung von internetfähigen, mobilen Endgeräten, zum Beispiel Smartphones, extrem gestiegen. Weiterhin hat die Verfügbarkeit der mobilen Endgeräte auch zur Erstellung neuartiger Datenquellen wie OpenStreetMap (OSM) geführt. OSM ist eine offene, kollaborative Webkarte, die von Freiwilligen dezentral erstellt wird. Mittlerweile ist die Nutzung geografischer Informationen die Grundlage für eine Vielzahl von Anwendungen, wie zum Beispiel Navigation, Reiseempfehlungen oder geografische Frage-Antwort-Systeme. Bei der Verarbeitung geografischer Web-Informationen müssen einzigartige Herausforderungen berücksichtigt werden. Erstens werden die Beschreibungen geografischer Web-Entitäten typischerweise nicht validiert. Da nicht alle Informationsquellen im Web vertrauenswürdig sind, ist die Korrektheit der Darstellung mancher Web-Entitäten fragwürdig. Zweitens sind Informationsquellen im Web oft voneinander isoliert. Die fehlende Integration von Informationsquellen erschwert die effektive Nutzung von geografischen Web-Information in vielen Anwendungsfällen. Drittens sind die Beschreibungen von geografischen Entitäten typischerweise unvollständig. Je nach Anwendung kann das Fehlen von bestimmten Informationen ein entscheidendes Kriterium für die Nutzung einer Datenquelle sein. Da die Größe des Webs eine manuelle Behebung dieser Probleme nicht zulässt, sind automatisierte Verfahren notwendig. In dieser Arbeit nähern wir uns diesen Herausforderungen von drei verschiedenen Richtungen. (i) Validierung von geografischen Web-Informationen: Wir validieren geografische Web-Informationen, indem wir Vandalismus in OpenStreetMap identifizieren, zum Beispiel das Ersetzen von Straßennamen mit Werbetexten. (ii) Anreicherung von geografischen Web-Information durch Integration: Wir integrieren OpenStreetMap mit anderen Informationsquellen im Web (Wissensgraphen), indem wir Einträge in beiden Informationsquellen identifizieren, die den gleichen Echtwelt-Entitäten entsprechen. (iii) Anreicherung von fehlenden geografischen Informationen: Wir nutzen semantische Annotationen von geografischen Entitäten auf Webseiten als weitere Datenquelle. Wir nutzen existierende Annotationen kategorischer Attribute von Web-Entitäten als Trainingsdaten, um fehlende kategorische Attribute in geografischen Web-Informationen zu ergänzen. Wir führen ausführliche Evaluationen für alle beschriebenen Modelle durch. Die vorgestellten Lösungsansätze erzielen verlässlich bessere Ergebnisse als existierende Ansätze. Weiterhin demonstrieren wir den Nutzen von geografischen Web-Informationen in zwei Anwendungsszenarien. (i) Korpus mit Embeddings von geografischen Entitäten: Wir stellen den GeoVectors-Korpus vor, einen verlinkten, offenen Datensatz mit direkt nutzbaren Embeddings von geografischen Web-Entitäten. Der GeoVectors-Korpus erleichtert die Nutzung von geografischen Daten in Anwendungen von maschinellen Lernen erheblich. (ii) Anwendung zur Prognose von Veranstaltungsauswirkungen: Wir nutzen Karten-, Veranstaltungs- und Veranstaltungsstätten-Daten aus dem Web, um die Auswirkungen von Veranstaltungen auf den Straßenverkehr zu prognostizieren.	ger
dc.language.iso	ger	eng
dc.publisher	Hannover : Institutionelles Repositorium der Leibniz Universität Hannover
dc.rights	CC BY 3.0 DE	eng
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/de/	eng
dc.subject	geographic data	eng
dc.subject	Web data	eng
dc.subject	Linked Open Data	eng
dc.subject	spatio-temporal machine learning	eng
dc.subject	Geografische Daten	ger
dc.subject	Web Daten	ger
dc.subject	Linked Open Data	ger
dc.subject	räumlich-zeitliches maschinelles Lernen	ger
dc.subject.ddc	004 \| Informatik	eng
dc.title	Enriching and validating geographic information on the web	eng
dc.type	DoctoralThesis	eng
dc.type	Text	eng
dcterms.extent	xvii, 185 S.
dc.description.version	publishedVersion	eng
tib.accessRights	frei zug�nglich	eng

Name: Tempelmeier_Nicol ...

Größe: 16.26Mb

Format: PDF

Öffnen

Die Publikation erscheint in Sammlung(en):

Fakultät für Elektrotechnik und Informatik
Frei zugängliche Publikationen aus der Fakultät für Elektrotechnik und Informatik
Dissertationen
Dissertationsschriften der Leibniz Universität Hannover

Enriching and validating geographic information on the web

Die Publikation erscheint in Sammlung(en):

Suche im Repositorium

Durchblättern

Gesamter Bestand

Diese Sammlung

Mein Nutzer/innenkonto

Nutzungsstatistiken