Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

dc.identifier.uri http://dx.doi.org/10.15488/16524
dc.identifier.uri https://www.repo.uni-hannover.de/handle/123456789/16651
dc.contributor.author Rabby, Gollam
dc.contributor.author D’Souza, Jennifer
dc.contributor.author Oelen, Allard
dc.contributor.author Dvorackova, Lucie
dc.contributor.author Svátek, Vojtěch
dc.contributor.author Auer, Sören
dc.date.accessioned 2024-03-08T08:49:21Z
dc.date.available 2024-03-08T08:49:21Z
dc.date.issued 2023
dc.identifier.citation Rabby, G.; D’Souza, J.; Oelen, A.; Dvorackova, L.; Svátek, V. et al.: Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph. In: Journal of Biomedical Semantics 14 (2023), 18. DOI: https://doi.org/10.1186/s13326-023-00298-4
dc.description.abstract Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach to enhance the document representation method with a domain-independent knowledge graph to find the influential scholarly document using categorized scholarly content. As the input collection, we use the WHO corpus with scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than others. From various machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction, with the help of a domain-independent knowledge graph, specifically DBpedia, to enhance the document representation method for predicting influential scholarly documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. From this experiment, we did not find any impact of the enhanced document representation for the scholarly document category classification. We found an effect in the influential scholarly document prediction with categorical data. eng
dc.language.iso eng
dc.publisher London : BioMed Central
dc.relation.ispartofseries Journal of Biomedical Semantics 14 (2023)
dc.rights CC BY 4.0 Unported
dc.rights.uri https://creativecommons.org/licenses/by/4.0
dc.subject COVID-19 eng
dc.subject Domain-independent knowledge graph eng
dc.subject Influential scholarly document prediction eng
dc.subject Machine learning algorithms eng
dc.subject Text mining eng
dc.subject World health organization eng
dc.subject.ddc 570 | Life sciences, biology
dc.subject.ddc 610 | Medicine, health
dc.title Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph eng
dc.type Article
dc.type Text
dc.relation.essn 2041-1480
dc.relation.doi https://doi.org/10.1186/s13326-023-00298-4
dc.bibliographicCitation.volume 14
dc.bibliographicCitation.firstPage 18
dc.description.version publishedVersion
tib.accessRights freely accessible

