Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion

Date
2022
Publisher
Hannover : Gottfried Wilhelm Leibniz Universität Hannover
Abstract

Every year, approximately 2.5 million new scientific papers are published. As publication volume grows rapidly, it becomes increasingly difficult to manually sort through and keep track of relevant research, a problem that is even more acute in multidisciplinary settings. The Open Research Knowledge Graph (ORKG) is a next-generation scholarly communication platform that aims to address this issue by making knowledge about scholarly contributions machine-actionable, thus enabling completely new ways of human-machine assistance in comprehending research progress. To this end, the ORKG is powered by a diverse spectrum of NLP services that assist expert users in structuring scholarly contributions and searching for the most relevant ones. For a prospective recommendation service, this thesis examines automated ORKG completion as an object extraction task: given a paper abstract and a query ORKG predicate, extract the corresponding object. As its main contribution, the thesis formulates automated ORKG completion as an extractive Question Answering (QA) machine learning objective under an open world assumption. Specifically, the task attempted in this work is fixed-prompt Language Model (LM) tuning (LMT) for few-shot ORKG object prediction, formulated as the well-known SQuAD extractive QA objective. Three variants of BERT-based transformer LMs are evaluated. To support this novel LMT task, the thesis introduces a scholarly QA dataset, similar in characteristics to the SQuAD QA dataset, that is generated semi-automatically from the ORKG knowledge base. When the BERT model variants are tested in a vanilla setting and again after LMT, they show a positive, significant performance uplift for automated ORKG completion as an object completion task. This thesis offers a strong empirical basis for future research aiming at a production-ready automated ORKG completion model.
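
To make the formulation above concrete, the following is a minimal sketch of how one (predicate, abstract, object) triple might be turned into a SQuAD-style extractive QA record. It is not the thesis's actual dataset pipeline: the function name `to_squad_record`, the question template, and the choice to represent a missing object as an empty answer list (in the style of SQuAD 2.0) are all assumptions made for illustration.

```python
from typing import Optional


def to_squad_record(record_id: str, predicate: str, abstract: str,
                    obj: Optional[str]) -> dict:
    """Build one SQuAD-style example from a hypothetical ORKG triple."""
    # Assumed fixed prompt; the real question wording is not given in the abstract.
    question = f"What is the {predicate}?"
    # Extractive QA needs the answer as a character span of the context.
    # str.find returns -1 when the object does not occur verbatim.
    start = abstract.find(obj) if obj else -1
    if start == -1:
        # Assumption: an unanswerable query is encoded with empty answer lists.
        answers = {"text": [], "answer_start": []}
    else:
        answers = {"text": [obj], "answer_start": [start]}
    return {
        "id": record_id,
        "question": question,
        "context": abstract,
        "answers": answers,
    }


if __name__ == "__main__":
    example = to_squad_record(
        "ex-1",
        "method",
        "We fine-tune BERT on SQuAD for scholarly question answering.",
        "BERT",
    )
    print(example)
```

Records of this shape carry the same fields as the SQuAD dataset (`id`, `question`, `context`, `answers`), which is what allows an off-the-shelf extractive QA model to be tuned on them without changing the model's input format.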

License
CC BY 3.0 DE