Representation and contextualization for document understanding

Tran, Nam Khanh

dc.identifier.uri	http://dx.doi.org/10.15488/4440
dc.identifier.uri	https://www.repo.uni-hannover.de/handle/123456789/4480
dc.contributor.author	Tran, Nam Khanh	ger
dc.date.accessioned	2019-02-13T10:37:53Z
dc.date.available	2019-02-13T10:37:53Z
dc.date.issued	2019
dc.identifier.citation	Tran, Nam Khanh: Representation and contextualization for document understanding. Hannover : Gottfried Wilhelm Leibniz Universität, Diss., 2019, xviii, 130 S. DOI: https://doi.org/10.15488/4440	ger
dc.description.abstract	Document understanding requires discovering meaningful patterns in text, which in turn involves analyzing documents and extracting information useful for a given purpose. Solving this task involves a multitude of problems. With the goal of improving document understanding, we identify three main problems to study within the scope of this thesis. The first problem is learning text representations, which we consider the starting point for understanding documents. A good representation enables us to build applications around the semantics or meaning of documents, rather than just around the keywords present in the texts. The second problem is acquiring document context. A document cannot be fully understood in isolation, since it may refer to knowledge that is not explicitly included in its textual content. To fully understand the meaning of the document, that prior knowledge has to be retrieved to supplement the text. The last problem we address is recommending related information for textual documents. When consuming text, especially in applications such as e-readers and Web browsers, users are often attracted by the topics or entities that appear in the text. Gaining comprehension of these aspects can therefore help users not only explore those topics further but also better understand the text. In this thesis, we tackle these problems and propose automated approaches that improve document representation and suggest relevant as well as missing information to support the interpretation of documents. To this end, we make the following contributions: Representation learning - the first contribution is to improve document representation, which serves as input to document understanding algorithms.
Firstly, we adopt probabilistic methods to represent documents as mixtures of topics and propose a generalizable framework for improving the quality of topics learned from small collections. The proposed method adapts well to different application domains. Secondly, we focus on learning distributed representations of documents. We introduce multiplicative tree-structured Long Short-Term Memory (LSTM) networks, which are capable of integrating syntactic and semantic information from text into the standard LSTM architecture for improved representation learning. Finally, we investigate the usefulness of attention mechanisms for enhancing distributed representations. In particular, we propose Multihop Attention Networks, which can learn effective representations, and illustrate their usefulness in the application of question answering. Time-aware contextualization - the second contribution is to formalize the novel and challenging task of time-aware contextualization, where explicit context information is required to bridge the gap between the situation at the time of content creation and the situation at the time of content digestion. To solve this task, we propose a novel approach that automatically formulates queries for retrieving adequate contextualization candidates from an underlying knowledge source such as Wikipedia, and then ranks the candidates using learning-to-rank algorithms. Context-aware entity recommendation - the third contribution is to assist document exploration by recommending entities related to those mentioned in the documents. For this purpose, we first introduce the idea of contextual relatedness of entities and formalize the problem of context-aware entity recommendation. Then, we approach the problem with a statistically sound probabilistic model that incorporates temporal and topical context via embedding methods.	ger
dc.language.iso	eng	ger
dc.publisher	Hannover : Institutionelles Repositorium der Leibniz Universität Hannover
dc.rights	German copyright law applies. The document may be used free of charge for personal use, but may not be made available on the Internet or passed on to third parties.	ger
dc.subject	document understanding	eng
dc.subject	representation learning	eng
dc.subject	time-aware contextualization	eng
dc.subject	context-aware entity recommendation	eng
dc.subject	Dokumentverständnis	ger
dc.subject	Lernen von Textrepräsentation	ger
dc.subject	zeitbewusste Kontextualisierung	ger
dc.subject	kontextbewusste Entitätsempfehlung	ger
dc.subject.ddc	004 | Informatik	ger
dc.title	Representation and contextualization for document understanding	eng
dc.type	DoctoralThesis	ger
dc.type	Text	ger
dcterms.extent	xviii, 130 S.
dc.description.version	publishedVersion	ger
tib.accessRights	frei zugänglich	ger

Name: NamKhanhTran_Thes ...

Size: 2.253Mb

Format: PDF

This publication appears in the following collection(s):

Fakultät für Elektrotechnik und Informatik
Frei zugängliche Publikationen aus der Fakultät für Elektrotechnik und Informatik
Dissertationen
Dissertationsschriften der Leibniz Universität Hannover
