Nancy Zhang Seminar
Stanford University, "Simultaneous Change-point Models with Applications to Cross-sample and Cross-platform Analysis of DNA Copy Number"
| When | 2009-10-22, 14:00 to 15:15 |
|---|---|
| Where | RRI 101 |
1 abstract
DNA copy number analysis involves the detection of chromosomal gains and losses using high-density microarray platforms. Change-point methods have been applied successfully to detecting signals in single data sequences derived from one biological sample. However, it is common to have data sets involving hundreds to thousands of biological samples. How should information be combined across samples to detect population level common polymorphisms?
Also, how should the samples be summarized to give a sparse signature of variation across the cohort? It is also now common to have the same biological sample assayed using multiple experimental platforms. For example, in the Cancer Genome Atlas project, each biological sample is processed using Illumina, Affymetrix and Agilent chips. How should data be integrated across platforms to achieve higher accuracy?
I will discuss the statistical issues underlying these problems and formulate a class of simultaneous change-point models for cross-sample and cross-platform data integration. These models lead to interpretable scan statistics whose significance level can be theoretically analyzed. I will also discuss model selection approaches for this class of models. The insights gained from this study can be applied to integrative analysis of data from other types of genome-wide profiling experiments, such as methylation or RNA expression.
2 log
DNA copy number
?? same boundary
2.1 pull samples together to detect
sparse representation
shared change-point free jump model
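f_i(t) = u + sigma_i I(s-t)   (u: baseline; sigma_i: jump size for sample i; s: change-point shared by all samples)
sum of squared t-stats across samples, then take the max over positions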
false positive rate
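A minimal sketch of the pooled scan noted above, assuming a single shared change-point, unit noise variance (so a z-score stands in for the t-stat), and an N x T array of samples by probes. The name `pooled_scan` and the array layout are illustrative assumptions, not from the talk.

```python
import numpy as np

def pooled_scan(y):
    """Scan for one change-point shared by all samples.

    y: array of shape (N samples, T probes).
    Returns (best_t, max_stat), where best_t is the split position
    (first index of the right-hand segment) that maximises the sum
    over samples of squared z-scores.
    """
    y = np.asarray(y, dtype=float)
    n_samples, T = y.shape
    csum = np.cumsum(y, axis=1)
    t = np.arange(1, T)                          # candidate split positions
    left_mean = csum[:, :-1] / t                 # mean of y[:, :t]
    right_mean = (csum[:, -1:] - csum[:, :-1]) / (T - t)  # mean of y[:, t:]
    # z-score of the mean difference, assuming unit noise variance
    z = (right_mean - left_mean) / np.sqrt(1.0 / t + 1.0 / (T - t))
    stat = (z ** 2).sum(axis=0)                  # pool across samples
    k = int(np.argmax(stat))                     # take the max over positions
    return int(t[k]), float(stat[k])
```

Squaring before summing lets samples with jumps of opposite sign (gain in one, loss in another) reinforce each other rather than cancel.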
2.2 multi-platform integration (MPCBS)
r_k linear coefficient for different platforms
scaling factor for each platform: (signal response rate) sqrt(number of probes)/error-stdev
2.3 Recursive segmentation approach (MSCBS)
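Reading the scaling-factor note literally, a platform's weight grows with its response rate and probe count and shrinks with its noise. A small sketch, where the function name and the two platforms' numbers are hypothetical values chosen only for illustration:

```python
import math

def platform_weight(response, n_probes, error_sd):
    """Scaling factor for one platform: response * sqrt(n_probes) / error_sd."""
    if n_probes <= 0 or error_sd <= 0:
        raise ValueError("n_probes and error_sd must be positive")
    return response * math.sqrt(n_probes) / error_sd

# Hypothetical (response, n_probes, error_sd) triples, not real platform specs.
platforms = {
    "platform_a": (1.0, 100, 0.5),
    "platform_b": (0.5, 400, 0.25),
}
weights = {name: platform_weight(*spec) for name, spec in platforms.items()}
```

Under these made-up numbers, platform_b ends up with the larger weight even though its response is weaker, because it has more probes and less noise.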
scan by z-score
find the max
go back and do it again on each resulting segment
number of change points chosen by BIC
classic BIC requires the likelihood to be differentiable and also limited number of models
model selection involves: N (#samples), T(#probes), m(#change points), M (#"mean" parameters)
BIC - N*m*H(M/(N*m)), where H(p) is the entropy. The extra term offsets the decrease in residual
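The scan, split, repeat loop and the entropy term can be sketched as below. This is a rough sketch under stated assumptions, not the MSCBS implementation: it uses unit noise variance, stops on a fixed `threshold` as a stand-in for the BIC-based choice of the number of change points, and takes H to be the binary entropy with natural logs. All function names are illustrative.

```python
import math
import numpy as np

def best_split(y):
    """Return (t, stat): the split of y (N x T) maximising summed squared z-scores."""
    n_samples, T = y.shape
    csum = np.cumsum(y, axis=1)
    t = np.arange(1, T)
    left = csum[:, :-1] / t
    right = (csum[:, -1:] - csum[:, :-1]) / (T - t)
    z = (right - left) / np.sqrt(1.0 / t + 1.0 / (T - t))
    stat = (z ** 2).sum(axis=0)
    k = int(np.argmax(stat))
    return int(t[k]), float(stat[k])

def segment(y, threshold, offset=0):
    """Recursively split y; return change-point positions in increasing order."""
    y = np.asarray(y, dtype=float)
    if y.shape[1] < 2:                 # nothing left to split
        return []
    t, stat = best_split(y)
    if stat < threshold:               # assumed stopping rule, not the BIC
        return []
    left = segment(y[:, :t], threshold, offset)
    right = segment(y[:, t:], threshold, offset + t)
    return left + [offset + t] + right

def entropy_term(N, m, M):
    """N*m*H(M/(N*m)), with H assumed to be the binary entropy in nats."""
    if m == 0:
        return 0.0
    p = M / (N * m)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    H = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
    return N * m * H
```

The term vanishes when every sample changes at every change point (M = N*m) and is largest when about half of the sample-by-change-point pairs carry a change.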
2.4 Validation:
- number of disagreements between technical replicates
- compare child and parents: a call in the child counts only if a parent has it too
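Both checks reduce to simple counting. A sketch, assuming calls are stored either as per-probe labels (for replicates) or as sets of region identifiers (for trios); these representations and the function names are assumptions made for illustration:

```python
def replicate_disagreements(calls_a, calls_b):
    """Count positions where two technical replicates give different calls."""
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same probes")
    return sum(a != b for a, b in zip(calls_a, calls_b))

def unsupported_child_calls(child, mother, father):
    """Regions called in the child but in neither parent."""
    return set(child) - (set(mother) | set(father))

# Toy data, invented for illustration only.
rep1 = [0, 0, 1, 1, 0, -1]
rep2 = [0, 0, 1, 0, 0, -1]
n_diff = replicate_disagreements(rep1, rep2)

suspect = unsupported_child_calls(
    child={"r1", "r2", "r3"}, mother={"r1"}, father={"r3", "r4"}
)
```

Fewer replicate disagreements and fewer unsupported child calls both point to a more reliable caller; de novo events would also land in `suspect`, so that count is an upper bound on errors.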
2.5 cons
- the statistic (essentially a sum of chi-square statistics) is not good for rare variants (<5%)