Principles of statistical genomics

Bibliographic Information

Principles of statistical genomics

Shizhong Xu

Springer, c2013

Note

Includes bibliographical references (p. 413-421) and index

Description and Table of Contents

Description

Statistical genomics is a rapidly developing field that is drawing in more and more researchers. However, the lack of synthetic reference books and textbooks in statistical genomics has become a major hurdle to the development of the field. Although many bioinformatics books have been published recently, most of them emphasize DNA sequence analysis under a deterministic approach. Principles of Statistical Genomics synthesizes the state-of-the-art statistical methodologies (stochastic approaches) applied to genome study. It facilitates understanding of the statistical models and methods behind the major bioinformatics software packages, which will help researchers choose the optimal algorithm to analyze their data and better interpret the results of their analyses. Understanding existing statistical models and algorithms also helps researchers develop improved statistical methods that extract maximum information from their data. Resourceful and easy to use, Principles of Statistical Genomics is a comprehensive reference for researchers and graduate students studying statistical genomics.

Table of Contents

Part I Genetic Linkage Map

1 Map Functions
  1.1 Physical map and genetic map
  1.2 Derivation of map functions
  1.3 Haldane map function
  1.4 Kosambi map function
2 Recombination Fraction
  2.1 Mating designs
  2.2 Maximum likelihood estimation of recombination fraction
  2.3 Standard error and significance test
  2.4 Fisher's scoring algorithm for estimating
  2.5 EM algorithm for estimating
3 Genetic Map Construction
  3.1 Criteria of optimality
  3.2 Search algorithms
    3.2.1 Exhaustive search
    3.2.2 Heuristic search
    3.2.3 Simulated annealing
    3.2.4 Branch and bound
  3.3 Bootstrap confidence of a map
4 Multipoint Analysis of Mendelian Loci
  4.1 Joint distribution of multiple locus genotype
    4.1.1 BC design
    4.1.2 F2 design
    4.1.3 Four-way cross design
  4.2 Incomplete genotype information
    4.2.1 Partially informative genotype
    4.2.2 BC and F2 are special cases of FW
    4.2.3 Dominance and missing markers
  4.3 Conditional probability of a missing marker genotype
  4.4 Joint estimation of recombination fractions
  4.5 Multipoint analysis for m markers
  4.6 Map construction with unknown recombination fractions

Part II Analysis of Quantitative Traits

5 Basic Concepts of Quantitative Genetics
  5.1 Gene frequency and genotype frequency
  5.2 Genetic effects and genetic variance
  5.3 Average effect of allelic substitution
  5.4 Genetic variance components
  5.5 Heritability
  5.6 An F2 family is in Hardy-Weinberg equilibrium
6 Major Gene Detection
  6.1 Estimation of major gene effect
    6.1.1 BC design
    6.1.2 F2 design
  6.2 Hypothesis tests
    6.2.1 BC design
    6.2.2 F2 design
  6.3 Scale of the genotype indicator variable
  6.4 Statistical power
    6.4.1 Type I error and statistical power
    6.4.2 Wald-test statistic
    6.4.3 Size of a major gene
    6.4.4 Relationship between W-test and Z-test
    6.4.5 Extension to dominance effect
7 Segregation Analysis
  7.1 Gaussian mixture distribution
  7.2 EM algorithm
    7.2.1 Closed form solution
    7.2.2 EM steps
    7.2.3 Derivation of the EM algorithm
    7.2.4 Proof of the EM algorithm
  7.3 Hypothesis tests
  7.4 Variances of estimated parameters
  7.5 Estimation of the mixing proportions
8 Genome Scanning for Quantitative Trait Loci
  8.1 The mouse data
  8.2 Genome scanning
  8.3 Missing genotypes
  8.4 Test statistics
  8.5 Bonferroni correction
  8.6 Permutation test
  8.7 Piepho's approximate critical value
  8.8 Theoretical consideration
9 Interval Mapping
  9.1 Least squares method
  9.2 Weighted least squares
  9.3 Fisher scoring
  9.4 Maximum likelihood method
    9.4.1 EM algorithm
    9.4.2 Variance-covariance matrix of
    9.4.3 Hypothesis test
  9.5 Remarks on the four methods of interval mapping
10 Interval Mapping for Ordinal Traits
  10.1 Generalized linear model
  10.2 ML under homogeneous variance
  10.3 ML under heterogeneous variance
  10.4 ML under mixture distribution
  10.5 ML via the EM algorithm
  10.6 Logistic analysis
  10.7 Example
11 Mapping Segregation Distortion Loci
  11.1 Probabilistic model
    11.1.1 The EM Algorithm
    11.1.2 Hypothesis test
    11.1.3 Variance matrix of the estimated parameters
    11.1.4 Selection coefficient and dominance
  11.2 Liability model
    11.2.1 EM algorithm
    11.2.2 Variance matrix of estimated parameters
    11.2.3 Hypothesis test
  11.3 Mapping QTL under segregation distortion
    11.3.1 Joint likelihood function
    11.3.2 EM algorithm
    11.3.3 Variance-covariance matrix of estimated parameters
    11.3.4 Hypothesis tests
    11.3.5 Example
12 QTL Mapping in Other Populations
  12.1 Recombinant inbred lines
  12.2 Double haploids
  12.3 Four-way crosses
  12.4 Full-sib family
  12.5 F2 population derived from outbreds
  12.6 Example
13 Random Model Approach to QTL Mapping
  13.1 Identity-by-descent (IBD)
  13.2 Random effect genetic model
  13.3 Sib-pair regression
  13.4 Maximum likelihood estimation
    13.4.1 EM algorithm
    13.4.2 EM algorithm under singular value decomposition
    13.4.3 Multiple siblings
  13.5 Estimating the IBD value for a marker
  13.6 Multipoint method for estimating the IBD value
  13.7 Genome scanning and hypothesis tests
  13.8 Multiple QTL model
  13.9 Complex pedigree analysis
14 Mapping QTL for Multiple Traits
  14.1 Multivariate model
  14.2 EM algorithm for parameter estimation
  14.3 Hypothesis tests
  14.4 Variance matrix of estimated parameters
  14.5 Derivation of the EM algorithm
  14.6 Example
15 Bayesian Multiple QTL Mapping
  15.1 Bayesian regression analysis
  15.2 Markov chain Monte Carlo
  15.3 Mapping multiple QTL
    15.3.1 Multiple QTL model
    15.3.2 Prior, likelihood and posterior
    15.3.3 Summary of the MCMC process
    15.3.4 Post MCMC analysis
  15.4 Alternative methods of Bayesian mapping
    15.4.1 Reversible jump MCMC
    15.4.2 Stochastic search variable selection
    15.4.3 Lasso and Bayesian Lasso
  15.5 Example: Arabidopsis data
16 Empirical Bayesian QTL Mapping
  16.1 Classical mixed model
    16.1.1 Simultaneous updating for matrix G
    16.1.2 Coordinate descent method
    16.1.3 Block coordinate descent method
    16.1.4 Bayesian estimates of QTL effects
  16.2 Hierarchical mixed model
    16.2.1 Inverse chi-square prior
    16.2.2 Exponential prior
    16.2.3 Dealing with sparse models
  16.3 Infinitesimal model for whole genome sequence data
    16.3.1 Data trimming
    16.3.2 Concept of continuous genome
  16.4 Example: Simulated data

Part III Microarray Data Analysis

17 Microarray Differential Expression Analysis
  17.1 Data preparation
    17.1.1 Data transformation
    17.1.2 Data normalization
  17.2 F-test and t-test
  17.3 Type I error and false discovery rate
  17.4 Selection of differentially expressed genes
    17.4.1 Permutation test
    17.4.2 Selecting genes by controlling FDR
    17.4.3 Problems of the previous methods
    17.4.4 Regularized t-test
  17.5 General linear model
    17.5.1 Fixed model approach
    17.5.2 Random model approach
18 Hierarchical Clustering of Microarray Data
  18.1 Distance matrix
  18.2 UPGMA
  18.3 Neighbor joining
    18.3.1 Principle of neighbor joining
    18.3.2 Computational algorithm
  18.4 Other methods
  18.5 Bootstrap confidence
19 Model-Based Clustering of Microarray Data
  19.1 Cluster analysis with the K-means method
  19.2 Cluster analysis under Gaussian mixture
    19.2.1 Multivariate Gaussian distribution
    19.2.2 Mixture distribution
    19.2.3 The EM algorithm
    19.2.4 Supervised cluster analysis
    19.2.5 Semi-supervised cluster analysis
  19.3 Inferring the number of clusters
  19.4 Microarray experiments with replications
20 Gene Specific Analysis of Variances
  20.1 General linear model
  20.2 The SEM algorithm
  20.3 Hypothesis testing
21 Factor Analysis of Microarray Data
  21.1 Background of factor analysis
    21.1.1 Linear model of latent factors
    21.1.2 EM algorithm
    21.1.3 Number of factors
  21.2 Cluster analysis
  21.3 Differential expression analysis
  21.4 MCMC algorithm
22 Classification of Tissue Samples Using Microarrays
  22.1 Logistic regression
  22.2 Penalized logistic regression
  22.3 The coordinate descent algorithm
  22.4 Cross validation
  22.5 Prediction of disease outcome
  22.6 Multiple category classification
23 Time-Course Microarray Data Analysis
  23.1 Gene expression profiles
  23.2 Orthogonal polynomial
  23.3 B-spline
  23.4 Mixed effect model
  23.5 Mixture mixed model
  23.6 EM algorithm
  23.7 Best linear unbiased prediction
  23.8 SEM algorithm
    23.8.1 Monte Carlo sampling
    23.8.2 SEM steps
24 Quantitative Trait Associated Microarray Data Analysis
  24.1 Linear association
    24.1.1 Linear model
    24.1.2 Cluster analysis
    24.1.3 Three-cluster analysis
    24.1.4 Differential expression analysis
  24.2 Polynomial and B-spline
  24.3 Multiple trait association
25 Mapping Expression Quantitative Trait Loci
  25.1 Individual marker analysis
    25.1.1 SEM algorithm
    25.1.2 MCMC algorithm
  25.2 Joint analysis of all markers
    25.2.1 Multiple eQTL model
    25.2.2 SEM algorithm
    25.2.3 MCMC algorithm
    25.2.4 Hierarchical evolutionary stochastic search (HESS)

by "Nielsen BookData"
