Efficiently identifying top k similar entities

Hanasoge Sudheendra, Supreetha

Download statistics for this document (evaluated according to COUNTER):

Hanasoge Sudheendra, Supreetha: Efficiently identifying top k similar entities. Hannover : Gottfried Wilhelm Leibniz Universität, Master Thesis, 2020, 81 S. DOI: https://doi.org/10.15488/10466

Total downloads: 241

Name:	Efficiently_ident ...

Size:	3.17 MB

Format:	Adobe PDF

Abstract:
With the rapid growth of genomic studies, a growing body of successful research integrates tools and technologies from interdisciplinary sciences. Computational biology, or bioinformatics, is one such field that successfully applies computational tools to capture and transcribe biological data. In genomic studies specifically, the detection and analysis of co-occurring mutations is a leading area of study. Concurrently, computer science and information technology have in recent years seen increased interest in association analysis and co-occurrence computation. The traditional method of finding the top similar entities examines every possible pair of entities, which leads to a prohibitive quadratic time complexity. Most existing approaches also require a similarity measure and a threshold to be chosen beforehand in order to retrieve the top similar entities, and these parameters are not always easy to tune. Heuristically, an adaptive method can have wider applications for identifying the most similar pairs of mutations (or of entities in general). In this thesis, we present an algorithm that efficiently identifies the top k similar pairs of mutations, using co-occurrence as the similarity measure. Our approach uses an upper-bound condition to iteratively prune the search space and thereby tackles the quadratic complexity. The empirical evaluation shows that the proposed approach is computationally efficient in terms of execution time while remaining accurate, particularly on large datasets. In addition, we evaluate the impact of parameters such as the input size and k on the execution time of top-k approaches. This study concludes that systematic pruning of the search space using an adaptive threshold condition optimizes the process of identifying the top similar pairs of entities.
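The abstract does not spell out the upper-bound condition, so the following is only an illustrative sketch of the general idea, not the thesis's algorithm. It assumes that each sample is a set of entity labels, that similarity is the raw co-occurrence count, and that the bound used is min(support(a), support(b)), which can never be smaller than the co-occurrence count of a and b. The function name `top_k_cooccurring_pairs` and the data layout are hypothetical. The k-th best count found so far acts as the adaptive threshold.

```python
import heapq
from collections import defaultdict


def top_k_cooccurring_pairs(samples, k):
    """Return up to k (count, a, b) tuples with the highest co-occurrence.

    Assumption (not from the thesis): co-occurrence of (a, b) is bounded
    above by min(support(a), support(b)), which is used for pruning.
    """
    # Inverted index: entity -> ids of the samples that contain it.
    occurrences = defaultdict(set)
    for sample_id, entities in enumerate(samples):
        for entity in entities:
            occurrences[entity].add(sample_id)

    # Rank entities by support, highest first (ties broken by label).
    ranked = sorted(occurrences, key=lambda e: (-len(occurrences[e]), e))
    support = [len(occurrences[e]) for e in ranked]

    heap = []  # min-heap; heap[0][0] is the current adaptive threshold
    n = len(ranked)
    for i in range(n):
        # Every remaining pair is bounded by support[i + 1]; stop early
        # once that bound cannot beat the k-th best count found so far.
        if len(heap) == k and i + 1 < n and support[i + 1] <= heap[0][0]:
            break
        for j in range(i + 1, n):
            # support[j] <= support[i], so support[j] is the pair's bound.
            if len(heap) == k and support[j] <= heap[0][0]:
                break  # later j have even smaller support
            count = len(occurrences[ranked[i]] & occurrences[ranked[j]])
            if count == 0:
                continue
            item = (count, ranked[i], ranked[j])
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif count > heap[0][0]:
                heapq.heapreplace(heap, item)

    return sorted(heap, key=lambda t: (-t[0], t[1], t[2]))


samples = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"C", "D"}, {"A", "C"}]
print(top_k_cooccurring_pairs(samples, 2))
```

In this toy input, the pair (A, D) is never intersected: D's support of 2 cannot exceed the threshold of 2 already held by the heap, so the inner loop stops before reaching it. The worst case is still quadratic; the pruning only helps when supports are skewed enough for the threshold to rise quickly.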
License terms:	German copyright law applies. The document may be used free of charge for personal use, but may not be made available on the Internet or passed on to third parties.
Publication type:	MasterThesis
Publication status:	publishedVersion
First published:	2020-12-28
The publication appears in collection(s):	Fakultät für Elektrotechnik und Informatik

Downloads by country of origin:

Pos.	Country	Downloads	Percent
1	Germany	92	38.17%
2	United States	39	16.18%
3	China	17	7.05%
4	No geo information available	14	5.81%
5	Russian Federation	13	5.39%
6	United Kingdom	6	2.49%
7	Iran, Islamic Republic of	5	2.07%
8	Israel	5	2.07%
9	Czech Republic	5	2.07%
10	Austria	5	2.07%
	Other	40	16.60%

Note

Download statistics are collected in line with the "COUNTER Code of Practice for e-Resources", applying internationally recognized rules and standards. COUNTER is an international non-profit organization in which library associations, database providers, and publishers work together on standards for collecting, storing, and processing usage data for electronic resources, with the aim of ensuring objectivity and comparability. Only accesses to the full texts themselves are counted, not visits to the website.
