Multivariate Statistical Methods for High-Dimensional Multiset Omics Data Analysis

Attila Csala; Aeilko H. Zwinderman

doi:10.15586/computationalbiology.2019.ch5

Published: Oct 31, 2019

Keywords:

canonical correlation analysis, high-dimensional data analysis, integrative omics data, multivariate statistics, redundancy analysis

Attila Csala

Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, Amsterdam, The Netherlands

Aeilko H. Zwinderman

Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, Amsterdam, The Netherlands

ABSTRACT

This chapter covers state-of-the-art multivariate statistical methods designed for high-dimensional multiset omics data analysis. Recent biotechnological developments have enabled large-scale measurement of various biomolecular data, such as genotypic and phenotypic data, spread across multiple omics domains. An emerging research direction is to analyze these data sources with an integrated approach in order to better model and understand the underlying biology of complex disease conditions. However, comprehensive analysis techniques that can handle both the size and the complexity of such data, while also accounting for its hierarchical structure, are lacking. This chapter provides an overview of some of the developments in multivariate techniques for high-dimensional omics data analysis, highlighting two well-known multivariate methods: canonical correlation analysis (CCA) and redundancy analysis (RDA). Penalized versions of CCA are widespread in the omics data analysis field, and there is recent work on multiset penalized RDA that is applicable to multiset omics data. The chapter presents how these methods meet the statistical challenges of high-dimensional multiset omics data analysis and how they help to further our understanding of the human condition in terms of health and disease. Finally, it discusses the current challenges that remain to be resolved in the field of omics data analysis.