GVC: efficient random access compression for gene sequence variations

dc.identifier.uri https://www.repo.uni-hannover.de/handle/123456789/14826
dc.identifier.uri https://doi.org/10.15488/14708
dc.contributor.author Adhisantoso, Yeremia Gunawan
dc.contributor.author Voges, Jan
dc.contributor.author Rohlfing, Christian
dc.contributor.author Tunev, Viktor
dc.contributor.author Ohm, Jens-Rainer
dc.contributor.author Ostermann, Jörn
dc.date.accessioned 2023-09-06T05:16:24Z
dc.date.available 2023-09-06T05:16:24Z
dc.date.issued 2023
dc.identifier.citation Adhisantoso, Y.G.; Voges, J.; Rohlfing, C.; Tunev, V.; Ohm, J.-R. et al.: GVC: efficient random access compression for gene sequence variations. In: BMC Bioinformatics 24 (2023), 121. DOI: https://doi.org/10.1186/s12859-023-05240-0
dc.description.abstract Background: In recent years, advances in high-throughput sequencing technologies have enabled the use of genomic information in many fields, such as precision medicine, oncology, and food quality control. The amount of genomic data being generated is growing rapidly and is expected to soon surpass the amount of video data. The majority of sequencing experiments, such as genome-wide association studies, have the goal of identifying variations in the gene sequence to better understand phenotypic variations. We present a novel approach for compressing gene sequence variations with random access capability: the Genomic Variant Codec (GVC). We use techniques such as binarization, joint row- and column-wise sorting of blocks of variations, as well as the image compression standard JBIG for efficient entropy coding. Results: Our results show that GVC provides the best trade-off between compression and random access compared to the state of the art: it reduces the genotype information size from 758 GiB down to 890 MiB on the publicly available 1000 Genomes Project (phase 3) data, which is 21% less than the state of the art in random-access capable methods. Conclusions: By providing the best results in terms of combined random access and compression, GVC facilitates the efficient storage of large collections of gene sequence variations. In particular, the random access capability of GVC enables seamless remote data access and application integration. The software is open source and available at https://github.com/sXperfect/gvc/. eng
dc.language.iso eng
dc.publisher London : BioMed Central
dc.relation.ispartofseries BMC Bioinformatics 24 (2023)
dc.rights CC BY 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject Compression eng
dc.subject Random access eng
dc.subject Variants eng
dc.subject VCF eng
dc.subject.ddc 004 | Computer science eng
dc.subject.ddc 570 | Life sciences, biology eng
dc.title GVC: efficient random access compression for gene sequence variations eng
dc.type Article
dc.type Text
dc.relation.essn 1471-2105
dc.relation.doi https://doi.org/10.1186/s12859-023-05240-0
dc.bibliographicCitation.volume 24
dc.bibliographicCitation.firstPage 121
dc.description.version publishedVersion
tib.accessRights freely accessible

