Milchevski, E.; Anand, A.; Michel, S.: The Sweet Spot between Inverted Indices and Metric-Space Indexing for Top-K–List Similarity Search. In: Advances in Database Technology - EDBT 2015 Proceedings, S. 253-264
Zusammenfassung: | |
We consider the problem of processing similarity queries over a set of top-k rankings where the query ranking and the similarity threshold are provided at query time. Spearman’s Footrule distance is used to compute the similarity between rankings, considering how well rankings agree on the positions (ranks) of ranked items (i.e., the L1 distance). This setup allows the application of metric index structures such as M- or BK-trees and, alternatively, enables the use of traditional inverted indices for retrieving rankings that overlap (in items) with the query. Although both techniques are reasonable, they come with individual drawbacks for our specific problem. In this paper, we propose a hybrid indexing strategy, which blends inverted indices and metric space indexing, resulting in a structure that resembles both indexing methods with tunable emphasis on one or the other. To find the sweet spot, we propose an assumption-lean but highly accurate (empirically validated) cost model through theoretical analysis. We further present optimizations to the inverted index component, for early termination and minimizing bookkeeping. The performance of the proposed algorithms, hybrid variants, and competitors is studied in a comprehensive evaluation using real-world benchmark data consisting of Web-search–result rankings and entity rankings based on Wikipedia. | |
Lizenzbestimmungen: | CC BY-NC-ND 4.0 Unported |
Publikationstyp: | BookPart |
Publikationsstatus: | publishedVersion |
Erstveröffentlichung: | 2015 |
Die Publikation erscheint in Sammlung(en): | Fakultät für Elektrotechnik und Informatik |
Pos. | Land | Downloads | ||
---|---|---|---|---|
Anzahl | Proz. | |||
1 | United States | 43 | 32,58% | |
2 | Germany | 42 | 31,82% | |
3 | Chile | 14 | 10,61% | |
4 | China | 7 | 5,30% | |
5 | Poland | 3 | 2,27% | |
6 | India | 3 | 2,27% | |
7 | Ireland | 2 | 1,52% | |
8 | France | 2 | 1,52% | |
9 | Czech Republic | 2 | 1,52% | |
10 | Canada | 2 | 1,52% | |
andere | 12 | 9,09% |
Hinweis
Zur Erhebung der Downloadstatistiken kommen entsprechend dem „COUNTER Code of Practice for e-Resources“ international anerkannte Regeln und Normen zur Anwendung. COUNTER ist eine internationale Non-Profit-Organisation, in der Bibliotheksverbände, Datenbankanbieter und Verlage gemeinsam an Standards zur Erhebung, Speicherung und Verarbeitung von Nutzungsdaten elektronischer Ressourcen arbeiten, welche so Objektivität und Vergleichbarkeit gewährleisten sollen. Es werden hierbei ausschließlich Zugriffe auf die entsprechenden Volltexte ausgewertet, keine Aufrufe der Website an sich.