COCOA: COrrelation coefficient-aware data augmentation

Download statistics - Document (COUNTER):

Esmailoghli, M.; Quiané-Ruiz, J.-A.; Abedjan, Z.: COCOA: COrrelation coefficient-aware data augmentation. In: Velegrakis, Yannis; Zeinalipour, Demetris; Chrysanthis, Panos K.; Guerra, Francesco (Eds.): Advances in Database Technology - EDBT 2021. Konstanz, Germany : OpenProceedings.org, University of Konstanz, University Library, 2021, pp. 331-336. DOI: https://doi.org/10.5441/002/edbt.2021.30

Repository version

To cite the version in the repository, please use this identifier: https://doi.org/10.15488/16496

Sum total of downloads: 7




Abstract: 
Correlation coefficients are among the most widely used measures in data science. Although linear correlations are fast and easy to calculate, they lack robustness and effectiveness in the presence of non-linear associations. Rank-based coefficients such as Spearman's are more suitable. However, rank-based measures first require sorting the values to obtain the ranks, which makes their calculation super-linear. One use case affected by this is data enrichment for Machine Learning (ML) through feature extraction from large databases. Finding the most promising features among millions of candidates to increase ML accuracy requires billions of correlation calculations. In this paper, we introduce an index structure that ensures rank-based correlation calculation in linear time. Our solution accelerates the correlation calculation by up to 500 times in the data enrichment setting.
License of this version: CC BY-NC-ND 4.0 Unported
Document Type: BookPart
Publishing status: publishedVersion
Issue Date: 2021
Appears in Collections: Fakultät für Elektrotechnik und Informatik
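
The abstract notes that rank-based coefficients need a sort to obtain the ranks, which is what makes them super-linear. The following is a minimal sketch of that baseline computation only, not of the index structure the paper introduces. The function names (`average_ranks`, `pearson`, `spearman`) are illustrative assumptions and do not come from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    # The sort is the O(n log n) step that makes rank-based measures super-linear.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Linear (Pearson) correlation; computed in a single linear pass over the data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Assumption: neither input is constant, so sx and sy are non-zero.
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v * v for v in x]  # monotone but non-linear relationship
    print("Pearson: ", pearson(x, y))
    print("Spearman:", spearman(x, y))
```

On a monotone but non-linear relationship such as `y = x * x`, the rank-based coefficient reports a perfect association while the linear one does not, which illustrates the robustness argument in the abstract.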

downloads by country:

Pos.  Country             Downloads (total)  Downloads (perc.)
1     Germany             5                  71.43%
2     Russian Federation  1                  14.29%
3     Indonesia           1                  14.29%

Note

Download statistics are collected in accordance with the "COUNTER Code of Practice for e-Resources", which applies internationally recognized rules and standards. COUNTER is an international non-profit organization in which library associations, database providers, and publishers jointly develop standards for collecting, storing, and processing usage data for electronic resources, with the aim of ensuring objectivity and comparability. Only accesses to the corresponding full texts are counted, not visits to the website itself.
