Multiblock Data Fusion in Statistics and Machine Learning

Applications in the Natural and Life Sciences
ISBN-13: 9781119600961
Published: 2022
Publication date: 28 April 2022
Pages: 416
Author: Age K. Smilde
Weight: 1048 g
Format: 256x181x26 mm
Language: German
Description:

Age K. Smilde is a Professor of Biosystems Data Analysis at the Swammerdam Institute for Life Sciences at the University of Amsterdam. He also holds a part-time position at the Department of Machine Intelligence of Simula Metropolitan Center for Digital Engineering in Oslo, Norway. His research interest is multiblock data analysis and its implementation in different fields of life sciences. He is currently the Editor-in-Chief of the Journal of Chemometrics.
 
Tormod Næs is a Senior Scientist at Nofima, a food research institute in Norway. He is also an adjunct professor at the Department of Food Science, University of Copenhagen, Denmark, and an extraordinary professor at the University of Stellenbosch, South Africa. His main research interest is multivariate analysis, with special emphasis on applications in sensory science and spectroscopy.
 
Kristian Hovde Liland is an Associate Professor with a top scientist scholarship in Data Science at the Norwegian University of Life Sciences and works in the areas of chemometrics, data analysis and machine learning. His main research is in linear prediction modelling, spectroscopy, and the transition between chemometrics and machine learning.

Foreword
Preface
List of Figures
List of Tables
Part I Introductory Concepts and Theory
1 Introduction
1.1 Scope of the Book
1.2 Potential Audience
1.3 Types of Data and Analyses
1.3.1 Supervised and Unsupervised Analyses
1.3.2 High-, Mid- and Low-level Fusion
1.3.3 Dimension Reduction
1.3.4 Indirect Versus Direct Data
1.3.5 Heterogeneous Fusion
1.4 Examples
1.4.1 Metabolomics
1.4.2 Genomics
1.4.3 Systems Biology
1.4.4 Chemistry
1.4.5 Sensory Science
1.5 Goals of Analyses
1.6 Some History
1.7 Fundamental Choices
1.8 Common and Distinct Components
1.9 Overview and Links
1.10 Notation and Terminology
1.11 Abbreviations
2 Basic Theory and Concepts
2.i General Introduction
2.1 Component Models
2.1.1 General Idea of Component Models
2.1.2 Principal Component Analysis
2.1.3 Sparse PCA
2.1.4 Principal Component Regression
2.1.5 Partial Least Squares
2.1.6 Sparse PLS
2.1.7 Principal Covariates Regression
2.1.8 Redundancy Analysis
2.1.9 Comparing PLS, PCovR and RDA
2.1.10 Generalised Canonical Correlation Analysis
2.1.11 Simultaneous Component Analysis
2.2 Properties of Data
2.2.1 Data Theory
2.2.2 Scale-types
2.3 Estimation Methods
2.3.1 Least-squares Estimation
2.3.2 Maximum-likelihood Estimation
2.3.3 Eigenvalue Decomposition-based Methods
2.3.4 Covariance or Correlation-based Estimation Methods
2.3.5 Sequential Versus Simultaneous Methods
2.3.6 Homogeneous Versus Heterogeneous Fusion
2.4 Within- and Between-block Variation
2.4.1 Definition and Example
2.4.2 MAXBET Solution
2.4.3 MAXNEAR Solution
2.4.4 PLS2 Solution
2.4.5 CCA Solution
2.4.6 Comparing the Solutions
2.4.7 PLS, RDA and CCA Revisited
2.5 Framework for Common and Distinct Components
2.6 Preprocessing
2.7 Validation
2.7.1 Outliers
2.7.1.1 Residuals
2.7.1.2 Leverage
2.7.2 Model Fit
2.7.3 Bias-variance Trade-off
2.7.4 Test Set Validation
2.7.5 Cross-validation
2.7.6 Permutation Testing
2.7.7 Jackknife and Bootstrap
2.7.8 Hyper-parameters and Penalties
2.8 Appendix
3 Structure of Multiblock Data
3.i General Introduction
3.1 Taxonomy
3.2 Skeleton of a Multiblock Data Set
3.2.1 Shared Sample Mode
3.2.2 Shared Variable Mode
3.2.3 Shared Variable or Sample Mode
3.2.4 Shared Variable and Sample Mode
3.3 Topology of a Multiblock Data Set
3.3.1 Unsupervised Analysis
3.3.2 Supervised Analysis
3.4 Linking Structures
3.4.1 Linking Structure for Unsupervised Analysis
3.4.2 Linking Structures for Supervised Analysis
3.5 Summary
4 Matrix Correlations
4.i General Introduction
4.1 Definition
4.2 Most Used Matrix Correlations
4.2.1 Inner Product Correl
 
Explore the advantages and shortcomings of various forms of multiblock analysis, and the relationships between them, with this expert guide
 
Driven by fusion problems in many fields of the natural and life sciences, the methods available for fusing multiple data sets have expanded dramatically in recent years. They sit alongside older methods rooted in psychometrics and chemometrics.
 
Multiblock Data Fusion in Statistics and Machine Learning: Applications in the Natural and Life Sciences is a detailed overview of all relevant multiblock data analysis methods for fusing multiple data sets. It focuses on methods based on components and latent variables, including both well-known and lesser-known methods with potential applications in different types of problems.
 
Many of the included methods are illustrated by practical examples and are accompanied by a freely available R package. The authors have created an accessible guide that helps readers fuse data, develop new data fusion models, discover how the algorithms and models involved work, and understand the advantages and shortcomings of the various approaches.
 
This book includes:
* A thorough introduction to the different options available for the fusion of multiple data sets, including methods originating in psychometrics and chemometrics
* Practical discussions of well-known and lesser-known methods with applications in a wide variety of data problems
* Functional R code for applying many of the discussed methods
 
Perfect for graduate students studying data analysis in the context of the natural and life sciences, including bioinformatics, sensometrics, and chemometrics, Multiblock Data Fusion in Statistics and Machine Learning: Applications in the Natural and Life Sciences is also an indispensable resource for developers and users of the results of multiblock methods.
