Crowdsourcing for web genre annotation

Download statistics - Document (COUNTER):

Asheghi, N.R.; Sharoff, S.; Markert, K.: Crowdsourcing for web genre annotation. In: Language Resources and Evaluation 50 (2016), No. 3, pp. 603-641. DOI:

Sum total of downloads: 617

Recently, genre collection and automatic genre identification for the web have attracted much attention. However, there is currently no genre-annotated corpus of web pages for which inter-annotator reliability has been established: existing corpora are either not tested for inter-annotator reliability or exhibit low inter-coder agreement. Annotation has also mostly been carried out by a small number of experts, raising concerns about the scalability of these annotation efforts and the transferability of the schemes to annotators outside these small expert groups. In this paper, we tackle these problems by using crowdsourcing for genre annotation, leading to the Leeds Web Genre Corpus, the first web corpus that is demonstrably reliably annotated for genre and that can be easily and cost-effectively expanded using naive annotators. We also show that the corpus is source and topic diverse. © 2016, The Author(s).
License of this version: CC BY 4.0 Unported
Document Type: article
Publishing status: publishedVersion
Issue Date: 2016
Appears in Collections: Forschungszentren

downloads by country:

Pos.  Country                        Downloads  Share of total
1     Germany                        136        22.04%
2     France                         99         16.05%
3     United States                  83         13.45%
4     No geo information available   52         8.43%
5     Russian Federation             35         5.67%
6     United Kingdom                 32         5.19%
7     Ukraine                        29         4.70%
8     China                          22         3.57%
9     Czech Republic                 15         2.43%
10    Netherlands                    14         2.27%
      Other countries                100        16.21%

The download statistics are collected in accordance with the "COUNTER Code of Practice for e-Resources", applying internationally recognized rules and standards. COUNTER is an international non-profit organization in which library associations, database providers, and publishers work together on standards for collecting, storing, and processing usage data for electronic resources, with the aim of ensuring objectivity and comparability. Only accesses to the full texts themselves are counted, not visits to the website itself.
