Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability

Redi, Miriam; Morgan, Jonathan; Fetahu, Besnik; Taraborelli, Dario

dc.identifier.uri	http://dx.doi.org/10.15488/5061
dc.identifier.uri	https://www.repo.uni-hannover.de/handle/123456789/5105
dc.contributor.author	Redi, Miriam
dc.contributor.author	Morgan, Jonathan
dc.contributor.author	Fetahu, Besnik
dc.contributor.author	Taraborelli, Dario
dc.date.accessioned	2019-07-02T07:58:23Z
dc.date.available	2019-07-02T07:58:23Z
dc.date.issued	2019
dc.identifier.citation	Redi, M.; Morgan, J.; Fetahu, B.; Taraborelli, D.: Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability. In: The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019, pp. 1567-1578. DOI: https://doi.org/10.1145/3308558.3313618
dc.description.abstract	Wikipedia is playing an increasingly central role on the web, and the policies its contributors follow when sourcing and fact-checking content affect millions of readers. Among these core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate Wikipedia at scale, however, its contents do not always comply evenly with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required, by collecting labeled data from editors of multiple Wikipedia language editions. We then crowdsource a large-scale dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design algorithmic models to determine whether a statement requires a citation and to predict the citation reason. We evaluate the accuracy of these models across different classes of Wikipedia articles of varying quality, and on external datasets of claims annotated for fact-checking purposes.	eng
dc.language.iso	eng
dc.publisher	New York, NY : Association for Computing Machinery, Inc
dc.relation.ispartofseries	The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019
dc.rights	CC BY 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Citations	eng
dc.subject	Crowdsourcing	eng
dc.subject	Neural Networks	eng
dc.subject	Wikipedia	eng
dc.subject	Large dataset	eng
dc.subject	Taxonomies	eng
dc.subject	Algorithmic model	eng
dc.subject	External sources	eng
dc.subject	Guiding principles	eng
dc.subject	Large-scale dataset	eng
dc.subject	Secondary sources	eng
dc.subject	Wikipedia articles	eng
dc.subject	Websites	eng
dc.subject.classification	Conference paper	eng
dc.subject.ddc	004 | Computer science	eng
dc.title	Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability
dc.type	BookPart
dc.type	Text
dc.relation.isbn	978-1-4503-6674-8
dc.relation.doi	https://doi.org/10.1145/3308558.3313618
dc.bibliographicCitation.firstPage	1567
dc.bibliographicCitation.lastPage	1578
dc.description.version	publishedVersion
tib.accessRights	freely accessible