Downloadstatistik des Dokuments (Auswertung nach COUNTER):

Leonhardt, Lutz Jurek: Efficient and Explainable Neural Ranking. Hannover : Gottfried Wilhelm Leibniz Universität, Diss., 2023, xii, 165 S., DOI: https://doi.org/10.15488/15769

Zeitraum, für den die Download-Zahlen angezeigt werden:

Jahr: 
Monat: 

Summe der Downloads: 349




Kleine Vorschau
Zusammenfassung: 
The recent availability of increasingly powerful hardware has caused a shift from traditional information retrieval (IR) approaches based on term matching, which remained the state of the art for several decades, to large pre-trained neural language models. These neural rankers achieve substantial improvements in performance, as their complexity and extensive pre-training give them the ability of understanding natural language in a way. As a result, neural rankers go beyond term matching by performing relevance estimation based on the semantics of queries and documents.However, these improvements in performance don't come without sacrifice. In this thesis, we focus on two fundamental challenges of neural ranking models, specifically, ones based on large language models: On the one hand, due to their complexity, the models are inefficient; they require considerable amounts of computational power, which often comes in the form of specialized hardware, such as GPUs or TPUs. Consequently, the carbon footprint is an increasingly important aspect of systems using neural IR. This effect is amplified when low latency is required, as in, for example, web search. On the other hand, neural models are known for being inherently unexplainable; in other words, it is often not comprehensible for humans why a neural model produced a specific output. In general, explainability is deemed important in order to identify undesired behavior, such as bias.We tackle the efficiency challenge of neural rankers by proposing Fast-Forward indexes, which are simple vector forward indexes that heavily utilize pre-computation techniques. Our approach substantially reduces the computational load during query processing, enabling efficient ranking solely on CPUs without requiring hardware acceleration. Furthermore, we introduce BERT-DMN to show that the training efficiency of neural rankers can be improved by training only parts of the model.In order to improve the explainability of neural ranking, we propose the Select-and-Rank paradigm to make ranking models explainable by design: First, a query-dependent subset of the input document is extracted to serve as an explanation; second, the ranking model makes its decision based only on the extracted subset, rather than the complete document. We show that our models exhibit performance similar to models that are not explainable by design and conduct a user study to determine the faithfulness of the explanations.Finally, we introduce BoilerNet, a web content extraction technique that allows the removal of boilerplate from web pages, leaving only the main content in plain text. Our method requires no feature engineering and can be used to aid in the process of creating new document corpora from the web.
Lizenzbestimmungen: CC BY 3.0 DE
Publikationstyp: DoctoralThesis
Publikationsstatus: publishedVersion
Erstveröffentlichung: 2023
Die Publikation erscheint in Sammlung(en):Fakultät für Elektrotechnik und Informatik
Dissertationen

Verteilung der Downloads über den gewählten Zeitraum:

Herkunft der Downloads nach Ländern:

Pos. Land Downloads
Anzahl Proz.
1 image of flag of Netherlands Netherlands 146 41,83%
2 image of flag of Germany Germany 93 26,65%
3 image of flag of United States United States 37 10,60%
4 image of flag of Canada Canada 9 2,58%
5 image of flag of Korea, Republic of Korea, Republic of 8 2,29%
6 image of flag of United Kingdom United Kingdom 8 2,29%
7 image of flag of Russian Federation Russian Federation 7 2,01%
8 image of flag of Taiwan Taiwan 4 1,15%
9 image of flag of India India 4 1,15%
10 image of flag of France France 4 1,15%
    andere 29 8,31%

Weitere Download-Zahlen und Ranglisten:


Hinweis

Zur Erhebung der Downloadstatistiken kommen entsprechend dem „COUNTER Code of Practice for e-Resources“ international anerkannte Regeln und Normen zur Anwendung. COUNTER ist eine internationale Non-Profit-Organisation, in der Bibliotheksverbände, Datenbankanbieter und Verlage gemeinsam an Standards zur Erhebung, Speicherung und Verarbeitung von Nutzungsdaten elektronischer Ressourcen arbeiten, welche so Objektivität und Vergleichbarkeit gewährleisten sollen. Es werden hierbei ausschließlich Zugriffe auf die entsprechenden Volltexte ausgewertet, keine Aufrufe der Website an sich.