Coding of Genomic Sequencing Data

TNT members involved in this project:
Yeremia G. Adhisantoso, M.Sc.
Dr.-Ing. Marco Munderloh
Prof. Dr.-Ing. Jörn Ostermann
Dipl.-Ing. Jan Voges

Over the past years, technological advances in genomic sequencing (the process of reading out genomic information from biological samples) have made sequencing individual genomes and other genomic samples considerably faster and cheaper. Because of the enormous amount of sequencing data being generated, its processing, storage, and analysis pose novel challenges for the scientific community. New processes and tools have to be developed to overcome current limitations in terms of storage space, processing speed, and other aspects. Our goal is to develop novel algorithms that enhance data processing "from the tissue to the hard drive".

Within the scope of this project, we actively contribute to the MPEG-G series of standards (ISO/IEC 23092). More information is available on the MPEG-G website.

If you are interested in contributing to this project by writing your thesis, please contact Jan Voges.

  • J. Voges and J. Ostermann: Streaming für die Genomforschung, Binaire, vol. 2019, no. 2, 2019

  • Conference Contributions
    • Brian E Bliss, Joshua M Allen, Saurabh Baheti, Matthew A Bockol, Shubham Chandak, Jaime Delgado, Jan Fostier, Josep L Gelpi, Steven N Hart, Mikel Hernaez Arrazola, Matthew E Hudson, Michael T Kalmbach, Eric W Klee, Liudmila S Mainzer, Fabian Müntefering, Daniel Naro, Idoia Ochoa-Alvarez, Jörn Ostermann, Tom Paridaens, Christian A Ross, Jan Voges, Eric D Wieben, Mingyu Yang, Tsachy Weissman, Mathieu Wiepert
      Genie: an MPEG-G conformant software to compress genomic data
      International Conference for High Performance Computing, Networking, Storage and Analysis (SC19) (poster), Denver, CO (US), November 2019
    • Tom Paridaens, Jan Voges, Mikel Hernaez, Jan Fostier, Jörn Ostermann
      GABAC: an arithmetic coding solution for genomic data
      27th Conference on Intelligent Systems for Molecular Biology (ISMB) and 18th European Conference on Computational Biology (ECCB) 2019, International Society for Computational Biology (ISCB), Vol. 8, p. 1463 (poster), Basel (CH), July 2019
    • Jan Voges
      Optimization Strategy for MPEG-G Compliant Entropy Encoding
      Contributions 5th ITG/VDE Summer School Video Compression and Processing (SVCP), University of Konstanz, pp. 228-255, Konstanz (DE), June 2019, edited by Dietmar Saupe, André Kaup, Jens-Rainer Ohm
    • Idoia Ochoa, Hongyi Li, Florian Baumgarte, Charles Hergenrother, Jan Voges, Mikel Hernaez
      AliCo: a new efficient representation for SAM files
      2019 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 93-102, Snowbird, UT (US), March 2019
    • Jan Voges
      MPEG-G: The Standard for Genomic Information Representation
      Proceedings of the 4th Summer School on Video Compression and Processing (SVCP) 2018, Leibniz Universität Hannover, Institut für Informationsverarbeitung, pp. 7-8, Hannover (DE), July 2018, edited by Jan Voges
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-Level Scheme for Quality Score Compression
      Proceedings of the 10th International Conference on Bioinformatics and Computational Biology (BICOB 2018), International Society for Computers and their Applications (ISCA), pp. 161-167, Las Vegas, NV (US), March 2018, edited by Hisham Al-Mubaid, Qin Ding, Oliver Eulenstein
    • Ana A Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Lossy Compression of Quality Scores in Differential Gene Expression: A First Assessment and Impact Analysis
      2018 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 167-176, Snowbird, UT (US), March 2018
    • Jan Voges, Jörn Ostermann
      MPEG-G: The Emerging Standard for Genomic Data
      Poster abstracts of the 25th German Conference on Bioinformatics, PeerJ, Vol. 5, p. 2 (poster), Tübingen (DE), September 2017
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data
      Joint 25th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 16th European Conference on Computational Biology (ECCB) 2017, International Society for Computational Biology (ISCB), Vol. 6, p. 1382 (poster), Prague (CZ), August 2017
    • Ana A Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Differential Gene Expression with Lossy Compression of Quality Scores in RNA-Seq Data
      2017 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), p. 444 (poster), Snowbird, UT (US), April 2017
    • Claudio Alberti, Noah Daniels, Mikel Hernaez, Jan Voges, Rachel L Goldfeder, Ana A Hernandez-Lopez, Marco Mattavelli, Bonnie Berger
      An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values
      2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 221-230, Snowbird, UT (US), April 2016
    • Jan Voges, Marco Munderloh, Jörn Ostermann
      Predictive Coding of Aligned Next-Generation Sequencing Data
      2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 241-250, Snowbird, UT (US), April 2016
  • Journals
    • Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez
      GABAC: an arithmetic coding solution for genomic data
      Bioinformatics, Oxford University Press, Vol. 36, No. 7, pp. 2275-2277, December 2019, edited by John Hancock
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-level Scheme for Quality Score Compression
      Journal of Computational Biology, Mary Ann Liebert, Inc., Vol. 25, No. 10, October 2018
    • Claudio Alberti, Tom Paridaens, Jan Voges, Daniel Naro, Junaid J. Ahmad, Massimo Ravasi, Daniele Renzi, Giorgio Zoia, Idoia Ochoa, Marco Mattavelli, Jaime Delgado, Mikel Hernaez
      An introduction to MPEG-G, the new ISO standard for genomic information representation
      bioRxiv, Cold Spring Harbor Laboratory, September 2018
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data
      Bioinformatics, Oxford University Press, Vol. 34, No. 10, pp. 1650-1658, May 2018, edited by Bonnie Berger
    • Ibrahim Numanagic, James K Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, S Cenk Sahinalp
      Comparison of high-throughput sequencing data compression tools
      Nature Methods, Nature Publishing Group, Vol. 13, No. 12, pp. 1005-1008, October 2016