COCOA: COrrelation coefficient-aware data augmentation

dc.identifier.uri http://dx.doi.org/10.15488/16496
dc.identifier.uri https://www.repo.uni-hannover.de/handle/123456789/16623
dc.contributor.author Esmailoghli, Mahdi
dc.contributor.author Quiané-Ruiz, Jorge-Arnulfo
dc.contributor.author Abedjan, Ziawasch
dc.contributor.editor Velegrakis, Yannis
dc.contributor.editor Zeinalipour, Demetris
dc.contributor.editor Chrysanthis, Panos K.
dc.contributor.editor Guerra, Francesco
dc.date.accessioned 2024-03-05T08:20:30Z
dc.date.available 2024-03-05T08:20:30Z
dc.date.issued 2021
dc.identifier.citation Esmailoghli, M.; Quiané-Ruiz, J.-A.; Abedjan, Z.: COCOA: COrrelation coefficient-aware data augmentation. In: Velegrakis, Yannis; Zeinalipour, Demetris; Chrysanthis, Panos K.; Guerra, Francesco (Eds.): Advances in Database Technology - EDBT 2021. Konstanz, Germany : OpenProceedings.org, University of Konstanz, University Library, 2021, pp. 331-336. DOI: https://doi.org/10.5441/002/edbt.2021.30
dc.description.abstract Correlation coefficients are among the most widely used measures in data science. Although linear correlations are fast and easy to calculate, they lack robustness and effectiveness in the presence of non-linear associations, where rank-based coefficients such as Spearman's are more suitable. However, rank-based measures require first sorting the values to obtain the ranks, which makes their calculation super-linear. One use case affected by this is data enrichment for Machine Learning (ML) through feature extraction from large databases: finding the most promising features among millions of candidates to increase ML accuracy requires billions of correlation calculations. In this paper, we introduce an index structure that ensures rank-based correlation calculation in linear time. Our solution accelerates the correlation calculation by up to 500 times in the data enrichment setting. eng
dc.language.iso eng
dc.publisher Konstanz, Germany : OpenProceedings.org, University of Konstanz, University Library
dc.relation.ispartof Advances in Database Technology - EDBT 2021
dc.rights CC BY-NC-ND 4.0 Unported
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject Data Science eng
dc.subject Correlation coefficient eng
dc.subject Data augmentation eng
dc.subject Data enrichments eng
dc.subject Index structure eng
dc.subject.classification Konferenzschrift (conference publication) ger
dc.subject.ddc 004 | Computer science
dc.title COCOA: COrrelation coefficient-aware data augmentation eng
dc.type BookPart
dc.type Text
dc.relation.essn 2367-2005
dc.relation.isbn 978-3-89318-084-4
dc.relation.doi https://doi.org/10.5441/002/edbt.2021.30
dc.bibliographicCitation.firstPage 331
dc.bibliographicCitation.lastPage 336
dc.description.version publishedVersion
tib.accessRights freely accessible
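
The cost argument in the abstract can be illustrated with a short Python sketch. This is a generic textbook computation of Spearman's coefficient, not the COCOA index structure described in the paper, and every name in it (`average_ranks`, `pearson`, `spearman`, `rank_index`, the example columns) is hypothetical. Spearman's rho is the Pearson correlation of the two rank vectors: obtaining the ranks needs a sort, which costs O(n log n), whereas the Pearson step on already-ranked data is a constant number of linear passes. The usage lines assume the candidate columns are already aligned row by row with the target (no join, no dropped rows), so each column's ranks can be computed once and reused; real enrichment also involves joining candidates to the query table, which this sketch ignores.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their positions.

    The sort makes this step O(n log n), which is what makes Spearman's
    coefficient super-linear when ranks are computed from scratch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation using a constant number of linear passes."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors (O(n log n))."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example data; assumes each candidate column is already
# row-aligned with the target (the join step of real enrichment is ignored).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "quadratic": [1.0, 4.0, 9.0, 16.0, 25.0],  # non-linear but monotonic
    "reversed": [5.0, 4.0, 3.0, 2.0, 1.0],
}

# Sort each column once, ahead of time, and keep the ranks.
rank_index = {name: average_ranks(col) for name, col in candidates.items()}
target_ranks = average_ranks(target)

# With ranks precomputed, each correlation is only a linear Pearson pass.
scores = {name: pearson(target_ranks, r) for name, r in rank_index.items()}
print(scores)
```

The "quadratic" column also shows why rank-based measures are preferred for non-linear associations: its Spearman score against the target is exactly 1, while its plain Pearson correlation with the raw target values is noticeably below 1.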


Die Publikation erscheint in Sammlung(en):

Zur Kurzanzeige

 

Suche im Repositorium


Durchblättern

Mein Nutzer/innenkonto

Nutzungsstatistiken