BIOST 2025: Biostatistics Seminar Notices

Seminar Speakers, Spring Term 2007
Kelly H. Zou, February 8, 2007
Yan Lin, March 1, 2007
Qing Xu, March 1, 2007
Wentao Feng, March 1, 2007
Kiros Berhane, March 22, 2007
Huiman X. Barnhart, Ph.D., April 12, 2007

SEMINAR DATE: Thursday, February 8, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Kelly H. Zou, Director of Biostatistics, Clinical Research Program, Children's Hospital Boston; Associate Professor of Radiology, Harvard Medical School
TOPIC: Accuracy and Reliability of Statistical Classification Methods with Applications in Image-Guided Therapy and Functional MR Imaging

The validity of image segmentation is an important issue in image processing and image-guided interventions because it directly affects therapeutic planning. We examined classification accuracy in image analysis using three two-sample validation metrics computed against an estimated composite latent gold standard. This gold standard was derived from several experts' manual segmentations by an expectation-maximization (EM) algorithm called simultaneous truth and performance level estimation (STAPLE). The distributions of the target and control pixel data were assumed parametrically to follow a mixture of two beta distributions with different shape parameters. We estimated the corresponding receiver operating characteristic (ROC) curve, Dice similarity coefficient, and mutual information over all possible decision thresholds. For each validation metric, an optimal threshold was then computed by maximization. We illustrated these methods using magnetic resonance (MR) imaging data from three radiological examples: (1) a hidden gold standard in prostate peripheral zone segmentation for brachytherapy, (2) the accuracy of brain tumor segmentation, and (3) multi-institutional functional imaging for the detection of brain activation.
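The threshold search described above can be illustrated empirically for the Dice metric. The following is a minimal sketch, not the speaker's method. It assumes a continuous per-pixel score map and a binary reference mask, which stands in for the STAPLE-estimated gold standard. Rather than fitting the beta-mixture model, it tries each observed score as a cutoff. The function names `dice` and `best_dice_threshold` are hypothetical.

```python
def dice(pred, ref):
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) for two binary masks."""
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    size_sum = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly by convention.
    return 2.0 * overlap / size_sum if size_sum else 1.0


def best_dice_threshold(scores, ref):
    """Try each observed score t as a cutoff (score >= t means 'target'); keep the Dice maximizer."""
    best_t, best_d = None, -1.0
    for t in sorted(set(scores)):
        pred = [s >= t for s in scores]
        d = dice(pred, ref)
        if d > best_d:
            best_t, best_d = t, d
    return best_t, best_d


# Hypothetical pixel scores and reference labels (1 = target, 0 = control).
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
ref = [0, 0, 0, 1, 1, 1]
print(best_dice_threshold(scores, ref))  # (0.7, 1.0)
```

The same scan works for any metric that can be computed from a thresholded mask. The talk's parametric approach instead evaluates the metric analytically over all thresholds.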
Extensions to spatial correlation structures under a Markov random field model were also considered.

Key Words: sensitivity, specificity, receiver operating characteristic (ROC) curve, Dice similarity coefficient, mutual information, expectation-maximization (EM) algorithm, magnetic resonance (MR) imaging, image-guided therapy, functional MR imaging (fMRI).

SEMINAR DATE: Thursday, March 1, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Yan Lin, Graduate Student, Department of Biostatistics, University of Pittsburgh
TOPIC: Smarter Clustering Methods for High-throughput SNP Genotype Calling

Many high-throughput genotyping technologies for single nucleotide polymorphism (SNP) markers have been developed. Most use clustering methods to "call" SNP genotypes. However, standard clustering methods are not optimal for distinguishing the genotype clusters of a SNP because they do not exploit several features specific to the genotype calling problem. In particular, when family data are available, pedigree information is mostly ignored. Furthermore, when prior information about the distribution of the measurements in each cluster is available, choosing an appropriate model-based clustering method can significantly improve the genotype calls. In this paper, we discuss the impact of incorporating external information into clustering algorithms for genotype calling. We also propose two new methods for calling genotypes using family data. The first is a modification of the K-means method that uses family information by updating all members of a family together. The second is a likelihood-based method that combines a Gaussian or beta mixture model with pedigree information. We compare the performance of these two methods and several existing methods in simulation studies.
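For contrast with the family-aware methods described above, the sketch below shows the standard pedigree-blind K-means baseline. It is applied to a hypothetical one-dimensional allele-contrast measurement with three genotype clusters. The data, the starting centers, and the function name `kmeans_1d` are illustrative assumptions. The speaker's modification, which updates all members of a family jointly, is not reproduced here.

```python
def kmeans_1d(x, centers, max_iter=100):
    """Lloyd's K-means on scalar data; returns (labels, centers)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in x]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [v for v, lab in zip(x, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


# Hypothetical allele-contrast values near -1 (BB), 0 (AB), and +1 (AA).
x = [-0.95, -0.9, -1.05, 0.05, -0.1, 0.02, 0.92, 1.1, 0.97]
labels, centers = kmeans_1d(x, centers=[-0.5, 0.0, 0.5])
genotypes = [("BB", "AB", "AA")[lab] for lab in labels]
print(genotypes)  # ['BB', 'BB', 'BB', 'AB', 'AB', 'AB', 'AA', 'AA', 'AA']
```

Each sample is assigned independently here. That independence is the limitation the abstract targets when pedigree data could constrain related individuals' calls.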
We also compare these methods on a real dataset generated by the Sequenom platform (www.sequenom.com).

SEMINAR DATE: Thursday, March 1, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Qing Xu, Graduate Student, Department of Biostatistics, University of Pittsburgh
TOPIC: A Weighted Log-Rank Test to Detect Early Difference in Censored Survival Distributions

We revisit a weighted log-rank test whose weight function was derived by assuming an inverse Gaussian distribution for an omitted exponentiated covariate. This covariate induces nonproportional hazards under the proportional hazards model (Oakes and Jeong, 1998). The method is based on the score test statistic for a group comparison in the proportional hazards model. In this paper, we perform a simulation study that compares the inverse Gaussian-based test statistic with statistics using other popular weight functions, including members of the Harrington-Fleming G-rho family (1982). Nonproportional hazards data are generated by letting the hazard ratio change over time under the proportional hazards model. The results indicate that the inverse Gaussian-based test tends to have higher power than some members of the G-rho family for detecting a difference between two survival distributions when the populations become homogeneous as time progresses. A dataset from one of several phase III breast cancer clinical trials will serve as a real example.

SEMINAR DATE: Thursday, March 1, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Wentao Feng, Graduate Student, Department of Biostatistics, University of Pittsburgh
TOPIC: A Supremum Log-Rank Test for Adaptive Two-Stage Treatment Strategies and Corresponding Sample Size Formula

In two-stage adaptive treatment strategies, patients receive one of several induction treatments. Patients who respond to their induction treatment then receive a maintenance treatment. To test for differences in effect among combinations of induction and maintenance treatments, a modified supremum weighted log-rank test is proposed. The test is applied to data from a two-stage randomized trial, and the results are compared with those obtained using a standard weighted log-rank test. A sample size formula is proposed based on the limiting distribution of the supremum weighted log-rank statistic. When there is no second randomization, this formula reduces to Eng and Kosorok's sample size formula for the two-sample supremum log-rank test. Monte Carlo simulation studies show that, under proportional hazards alternatives, the proposed test yields sample sizes close to those obtained with the standard weighted log-rank test. Under nonproportional hazards alternatives, however, the proposed test is more powerful than the standard weighted log-rank test.

KEYWORDS: Adaptive treatment strategies; Brownian motion; Censoring distribution; Counting process; Proportional hazards; Two-stage designs; Sample size formula; Supremum log-rank statistics; Survival function

SEMINAR DATE: Thursday, March 22, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Kiros Berhane, Associate Professor, Department of Preventive Medicine, University of Southern California
TOPIC: Functional-Based Multilevel Models for Multiple Longitudinal Outcomes

We propose a flexible multilevel modeling technique that uses splines to model multiple longitudinal outcomes. This work is motivated by our desire to study the long-term effects of air pollution on children's health. We do so by examining the relationship between ecologic covariates (e.g., air pollution) and functionals of various nonlinear lung function growth curves. A latent variable approach is used to connect the outcomes within a subject. This technique allows estimation of cluster-specific growth curves after adjusting for subject-specific growth curve parameters. A Gibbs sampling approach is implemented to obtain posterior means and variances of nonlinear functionals of the growth curves in a unified way. Ecologic inference is then conducted in a multilevel setting. We illustrate the technique with an analysis of data from the Southern California Children's Health Study.

SEMINAR DATE: Thursday, April 12, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Huiman Barnhart, Associate Professor, Department of Biostatistics and Bioinformatics, Duke University
TOPIC: An Overview on Assessing Agreement with Continuous Measurements

Reliable and accurate measurements serve as the basis for evaluation in many scientific disciplines. Issues of reliable and accurate measurement have been studied for more than a century, dating back to pioneering work by Galton (1886), Pearson (1896, 1899, 1901), and Fisher (1925).
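Lin's concordance correlation coefficient is one widely used index of agreement between two continuous measurements. It is defined as rho_c = 2 s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2) and lies between -1 and 1. It is offered here only as an illustrative example of a scaled agreement index, not as a summary of the talk. The sketch below uses moment estimates with divisor n, and the function name `ccc` is hypothetical. It shows why agreement differs from correlation: a constant offset between two instruments leaves Pearson's r at 1 but lowers rho_c.

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient using divisor-n moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic bias between the two readings.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


x = [1.0, 2.0, 3.0, 4.0]
print(ccc(x, x))                     # 1.0: identical readings
print(ccc(x, [v + 1.0 for v in x]))  # about 0.714, although Pearson's r is 1
```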
Often it is not practical to require a new measurement to be identical to the truth. Either (1) we are willing to accept a measurement up to some tolerable (or acceptable) error, or (2) the truth is simply not available to us, because it is either not measurable or measurable only with some error. To address issues related to both (1) and (2), different concepts, methods, and theories have been developed in different disciplines. Some of these concepts have been used across disciplines, while others have been limited to a particular field but may be useful elsewhere. In this talk, I elucidate and contrast fundamental concepts used in different disciplines and bring them under one common theme: assessing the closeness (agreement) of observations. We focus on assessing agreement for continuous measurements. We classify statistical approaches for expressing agreement as (a) descriptive tools; (b) unscaled summary indices based on absolute differences of measurements; and (c) scaled summary indices taking values between -1 and 1. These approaches are considered for various data structures and for cases with and without a reference. We identify gaps that require further research, as well as future directions in assessing agreement. This is joint work with Michal Haber of Emory University and Lawrence Lin of Baxter Healthcare Inc. This research is supported by NIH R01-MH070028.

Seminar Speakers, Fall Term 2006
Chien-Cheng (George) Tseng, September 7, 2006
Dev Chakraborty, September 21, 2006
Sanat K. Sarkar, October 19, 2006
Jeffrey Blume, November 3, 2006
Yu Cheng, November 16, 2006
Joseph Ibrahim, December 7, 2006

SEMINAR DATE: Thursday, September 7, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Chien-Cheng (George) Tseng, Assistant Professor, Graduate School of Public Health, Department of Biostatistics, University of Pittsburgh
TOPIC: Which Missing Value Imputation Method to Use in Expression Profiles: A Comparative Study and Two Selection Schemes

Gene expression data frequently contain missing values; however, most downstream analyses of microarray experiments require complete data. Many methods in the literature estimate missing values from the correlation patterns within the gene expression matrix. Each method appears to have its pros and cons, and the specific conditions under which each method is preferred remain largely unclear. In this report, we describe an extensive evaluation of current imputation methods on multiple types of microarray experiments. These include time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given dataset. We found that the success of each imputation method depended largely on the underlying complexity of the expression data. We developed an entropy measure to quantify the complexity of expression matrices. By incorporating this information, the entropy-based selection (EBS) scheme performed better than any single method. In addition, we propose a self-training selection (STS) scheme, which determines the optimal imputation method via simulation. It performed even better than the EBS scheme, at increased computational cost. Our findings provide insight into the problem of missing value imputation in microarray data. We conclude that the EBS and STS schemes are complementary and effective tools for selecting the optimal imputation method.

SEMINAR DATE: Thursday, September 21, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Dev Chakraborty, Associate Professor, School of Medicine, Department of Radiology, University of Pittsburgh
TOPIC: Free-Response Observer Performance Methodology: Recent Advances and Applications to Imaging System Assessment

The free-response paradigm is increasingly used in the assessment of medical imaging systems. It differs from the receiver operating characteristic (ROC) method in that it uses location information. The observer is rewarded when lesions are correctly located and penalized when they are not. In the free-response ROC (FROC) paradigm, the data unit is a variable number of mark-rating pairs per image. A mark is the indicated location of a suspicious region, and the rating is the corresponding degree of suspicion. A method for analyzing free-response data, termed JAFROC analysis, is summarized. JAFROC has been shown to have significantly greater statistical power than the ROC method for discriminating between modalities, but it has significant limitations. A recent search model for free-response data will be described. The search model can be used to infer a figure of merit for observer performance and to predict ROC and FROC curves. An approach to estimating search model parameters for computer-aided detection algorithms is described. Significant problems inhibiting full use of the method are summarized.

SEMINAR DATE: Thursday, October 19, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Sanat K. Sarkar, Professor, The Fox School of Business, Department of Statistics, Temple University
TOPIC: Generalizing the False Discovery Rate in Multiple Testing of a Large Number of Hypotheses

When testing a large number of hypotheses, one is often willing to tolerate a few false rejections but wants to guard against too many of them, say k or more.
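The generalized Bonferroni rule is a simple procedure that tolerates up to k - 1 false rejections. It rejects every hypothesis with p_i <= k * alpha / m, where m is the number of tests. The rule controls the k-FWER, the probability P(V >= k) of k or more false rejections V, under any dependence among valid p-values. The argument uses Markov's inequality: P(V >= k) <= E[V] / k <= m0 * (k * alpha / m) / k <= alpha, where m0 <= m is the number of true nulls. The sketch below is this baseline single-step k-FWER rule, not one of the k-FDR procedures presented in the talk. The function name and p-values are hypothetical.

```python
def generalized_bonferroni(pvalues, alpha, k):
    """Indices rejected by the single-step rule p_i <= k * alpha / m, which controls the k-FWER."""
    m = len(pvalues)
    cutoff = k * alpha / m
    return [i for i, p in enumerate(pvalues) if p <= cutoff]


# Hypothetical p-values from m = 10 tests.
p = [0.001, 0.004, 0.006, 0.02, 0.03, 0.2, 0.5, 0.9, 0.04, 0.7]
print(generalized_bonferroni(p, alpha=0.05, k=1))  # [0, 1]  (ordinary Bonferroni)
print(generalized_bonferroni(p, alpha=0.05, k=2))  # [0, 1, 2]
```

Raising k loosens the per-test cutoff, which is the power gain the abstract describes when a few false rejections are acceptable.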
In such a case, consider a procedure that controls the occurrence of k or more false discoveries. It will detect false null hypotheses better than one that controls the occurrence of even one false discovery. With that rationale, the traditional familywise error rate (FWER) has recently been generalized. The FWER is the probability of at least one false discovery; its generalization, the k-FWER, is the probability of at least k false discoveries, and procedures controlling it have been proposed. In this talk, I will introduce an error rate that is less conservative than the k-FWER: the k-FDR, the expected proportion of k or more false discoveries among the total number of discoveries. Procedures controlling the k-FDR will be presented under certain assumptions on the dependence structure of the underlying p-values. Some of these procedures are more powerful than the corresponding k-FWER procedures, and some are often more powerful than the Benjamini-Hochberg FDR procedure.

SEMINAR DATE: Friday, November 3, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jeffrey Blume, Assistant Professor, Center for Statistical Sciences, Brown University
TOPIC: Why Likelihood is on the Critical Path

The statistical challenges and opportunities along the FDA’s ‘critical path to new medical products’ are substantial. The critical path calls for greater flexibility and efficiency of statistical methods without sacrificing their accuracy. This is exactly what likelihood methods provide. Consider using the likelihood ratio, instead of a p-value or posterior probability, to measure the strength of statistical evidence in the data. It has already been established that likelihood ratios are flexible, efficient, and seldom misleading. More to the point, they are more flexible, more efficient, and more accurate than traditional hypothesis testing methods.
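To make the evidential use of the likelihood ratio concrete, the sketch below compares two simple hypotheses about a success probability, H1: p = 0.7 versus H0: p = 0.5, using hypothetical binomial data. The benchmarks of 8 ("fairly strong") and 32 ("strong") are a convention from Royall's likelihood paradigm. They are assumed here for illustration and are not stated in the abstract. The function names are hypothetical.

```python
import math


def binomial_lr(x, n, p1, p0):
    """Likelihood ratio L(p1) / L(p0) for x successes in n Bernoulli trials (computed on the log scale)."""
    log_lr = x * (math.log(p1) - math.log(p0)) + (n - x) * (math.log(1 - p1) - math.log(1 - p0))
    return math.exp(log_lr)


def describe(lr, fairly_strong=8.0, strong=32.0):
    """Report which hypothesis is favored and how strongly, using conventional benchmarks."""
    favored = "H1" if lr >= 1 else "H0"
    ratio = max(lr, 1.0 / lr)
    if ratio >= strong:
        return favored, "strong"
    if ratio >= fairly_strong:
        return favored, "fairly strong"
    return favored, "weak"


lr = binomial_lr(14, 20, 0.7, 0.5)
print(round(lr, 2), describe(lr))  # about 5.18, ('H1', 'weak')
print(describe(binomial_lr(18, 20, 0.7, 0.5)))  # ('H1', 'strong')
```

No sample-space adjustment enters the calculation: the ratio depends only on the observed data through the two likelihoods.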
For example, examining likelihood ratios as I propose minimizes the overall probability of making an error (either type I or type II) in an experiment. This result holds even with multiple endpoints, where the type I error is adjusted to avoid inflation. In addition, likelihood ratios remain seldom misleading even when the study is (statistically) rigged to produce evidence favoring a pet hypothesis over the true hypothesis. The take-home message is that likelihood ratios represent an excellent compromise between the Bayesian and frequentist paradigms for measuring the strength of evidence in data. They retain the desirable properties of both paradigms (irrelevance of the sample space, good performance probabilities) while shedding the undesirable ones (dependence on prior distributions, ad hoc adjustments to control error probabilities). Hence, likelihood ratios make ideal regulatory tools, because (1) they eliminate the need for seemingly arbitrary adjustments, (2) they frequently characterize the evidence correctly even when the study is (statistically) rigged to do otherwise, and (3) they can be made robust to the choice of underlying model in a way that preserves (1) and (2).

SEMINAR DATE: Thursday, November 16, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Yu Cheng, Assistant Professor, Department of Statistics, University of Pittsburgh
TOPIC: Modeling Association of Bivariate Competing Risks Data through Cumulative Incidence Functions

Frailty models are frequently used to analyze clustered survival data and evaluate within-cluster associations. However, they are seldom used in multivariate competing risks settings because of the challenge posed by the dependence structure between the primary and competing risks. To address this challenge, we focus on a nonparametrically identifiable quantity: the cumulative incidence function (CIF).
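For a single sample, the CIF can be estimated nonparametrically with the Aalen-Johansen estimator. At each event time t, the CIF for cause j increases by S(t-) * d_j(t) / n(t). Here S is the overall Kaplan-Meier survival, d_j(t) is the number of cause-j events at t, and n(t) is the number still at risk. The sketch below assumes right-censored data coded with 0 for censoring, and the function name is hypothetical. The talk's bivariate, frailty-based association model is not shown.

```python
def cumulative_incidence(times, causes, cause):
    """Aalen-Johansen CIF estimate for `cause`; causes are coded 0 for censored observations."""
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv = 1.0  # overall Kaplan-Meier survival just before the current event time
    cif = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)
        d_all = sum(1 for s, c in zip(times, causes) if s == t and c != 0)
        d_cause = sum(1 for s, c in zip(times, causes) if s == t and c == cause)
        cif += surv * d_cause / at_risk      # mass assigned to this cause at time t
        surv *= 1.0 - d_all / at_risk        # update overall survival past time t
        curve.append((t, cif))
    return curve


times = [1, 2, 3, 4, 5]
causes = [1, 2, 1, 0, 1]  # hypothetical data; 0 = censored at time 4
print(cumulative_incidence(times, causes, cause=1))  # final value about 0.8
print(cumulative_incidence(times, causes, cause=2))  # final value about 0.2
```

Unlike one minus a cause-specific Kaplan-Meier curve, the two CIFs here add up to the overall failure probability (about 1.0 at time 5). This additivity is why the CIF is a natural target in competing risks settings.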
Frailty models are constructed by expressing the bivariate CIF in terms of its marginals, using improper random variables whose distribution functions correspond to the CIFs. Estimating equations are proposed to estimate the unknown association parameter in the frailty models. The large-sample properties of the association parameter estimators are established using empirical process techniques, and their finite-sample performance is studied by Monte Carlo simulation. We illustrate their practical utility with an analysis of dementia in the Cache County Study.

SEMINAR DATE: Thursday, December 7, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Joseph Ibrahim, Alumni Distinguished Professor, School of Public Health, Department of Biostatistics, University of North Carolina
TOPIC: Variable Selection in Regression Mixture Modeling for the Discovery of Gene Regulatory Networks

The profusion of genomic data from genome sequencing and gene expression microarray technology has facilitated statistical research on the gene interactions that regulate biological processes. Current methods generally consist of a two-stage procedure. First, gene expression measurements are clustered. Second, the DNA sequence adjacent to the genes is searched for regulatory "switches", typically short, conserved sequence patterns (motifs). This process often leads to misleading conclusions, because incorrect cluster selection may cause important regulatory motifs to be missed or many false discoveries to be made. Treating cluster memberships as known, rather than estimated, introduces bias into analyses and ignores uncertainty about cluster parameters. Further, the available data are underutilized: sequence information is ignored for the purpose of expression clustering, and vice versa.
We propose to address these issues by combining gene clustering and motif discovery in a unified framework: a mixture of hierarchical regression models. Its unknown components represent the latent gene clusters, and genomic sequence features are linked to the resulting gene expression through a multivariate hierarchical regression. We demonstrate a Monte Carlo method for simultaneous variable selection (for motifs) and clustering (for genes). The number of components in the mixture is selected by computing the analytically intractable Bayes factor with a novel multi-stage mixture importance sampling approach. The methodology is illustrated on a yeast cell cycle dataset. There it determines an optimal set of motifs that discriminates between groups of genes while simultaneously finding the most significant gene clusters.
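A Bayes factor comparing two models is the ratio of their marginal likelihoods. Importance sampling estimates a marginal likelihood by averaging p(y | theta) p(theta) / q(theta) over draws of theta from a proposal density q. The sketch below is a basic single-stage importance sampler, not the multi-stage mixture approach of the talk. It uses an assumed conjugate toy model (y_i ~ N(theta, 1), theta ~ N(0, 1)) whose exact marginal likelihood is available for checking. The data, the proposal (a normal centered at the posterior mean with twice the posterior standard deviation), and the function names are all illustrative assumptions.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_marginal_exact(y):
    """Closed form for the toy model: y ~ N(0, I + 11'), so det = n + 1 and the inverse is I - 11'/(n + 1)."""
    n = len(y)
    s, ss = sum(y), sum(v * v for v in y)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1) - 0.5 * (ss - s * s / (n + 1))


def log_marginal_is(y, n_draws=20000, inflate=2.0, seed=1):
    """Single-stage importance sampling estimate of log p(y) for the toy model."""
    rng = random.Random(seed)
    n = len(y)
    post_mean = sum(y) / (n + 1)
    q_sd = inflate / math.sqrt(n + 1)  # proposal deliberately wider than the posterior
    log_w = []
    for _ in range(n_draws):
        theta = rng.gauss(post_mean, q_sd)
        log_lik = sum(log_normal_pdf(v, theta, 1.0) for v in y)
        log_prior = log_normal_pdf(theta, 0.0, 1.0)
        log_w.append(log_lik + log_prior - log_normal_pdf(theta, post_mean, q_sd))
    top = max(log_w)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(w - top) for w in log_w) / n_draws)


y = [0.3, -0.5, 1.2, 0.8, 0.1]  # hypothetical data
est = log_marginal_is(y)
print(est, log_marginal_exact(y))  # the two values should be close
log_m0 = sum(log_normal_pdf(v, 0.0, 1.0) for v in y)  # competing model with theta fixed at 0
print("log Bayes factor (M1 vs M0):", est - log_m0)
```

In realistic mixture models no closed form exists. There, the choice of proposal largely determines whether the weights have manageable variance, which is the difficulty that more elaborate multi-stage schemes aim to address.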