BIOST 2025: Biostatistics Seminar Notices

Seminar Notices Spring Term 2008
Seminar Notices Fall Term 2007

Seminar Speakers Spring Term 2008
Daniel Nagin, January 17, 2008
Taeyoung Park, January 31, 2008
Zhezhen Jin, February 21, 2008
Fiona Callaghan, February 28, 2008
Chunrong Cheng, February 28, 2008
Sarah Haile, February 28, 2008
Jia Li, February 28, 2008
Tao Song, February 28, 2008
Rick Blakesley, February 28, 2008
Peter F. Thall, March 6, 2008
Hongwei Zhao, March 27, 2008
Mitchell H. Gail, April 3, 2008

SEMINAR DATE: Thursday, January 17, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Daniel Nagin, Teresa and H. John Heinz III Professor of Public Policy and Statistics, Carnegie Mellon University
TOPIC: THE RELATIONSHIP BETWEEN FIRST IMPRISONMENT AND CRIMINAL CAREER DEVELOPMENT: A MATCHED SAMPLES COMPARISON
Using data from the Netherlands-based Criminal Career and Life-course Study, we examine the effect of first-time imprisonment between ages 18 and 38 on conviction rates in the three years immediately following the year of imprisonment. Unadjusted comparisons of those imprisoned and those not imprisoned will be biased because imprisonment is not meted out randomly. Selection processes will tend to make the imprisoned group disproportionately crime prone compared to the not imprisoned group. In this study we combine group-based trajectory modeling with risk set matching to balance a variety of measurable indicators of criminal propensity. We find that first-time imprisonment is associated with an increase in criminal activity in the three years following release. The effect of imprisonment is similar across offence types.

SEMINAR DATE: Thursday, January 31, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Taeyoung Park, Assistant Professor, Department of Statistics, University of Pittsburgh
TOPIC: A Melodious Harmony of Dissonance: Efficiency of Incompatibility in Partially Collapsed Gibbs Samplers
Ever-increasing computational power, along with ever more sophisticated statistical computing techniques, is making it possible to fit ever more complex statistical models. Among the popular, computationally intensive methods, the Gibbs sampler (Geman and Geman 1984) has been spotlighted because of its simplicity and its power to generate samples effectively from a high-dimensional probability distribution. Despite its simple implementation and description, however, the Gibbs sampler is criticized for its sometimes slow convergence, especially when it is used to fit highly structured complex models. Here, we present partially collapsed Gibbs sampling strategies that improve convergence by capitalizing on a set of functionally incompatible conditional distributions. Such incompatibility is generally avoided in the construction of a Gibbs sampler because the resulting convergence properties are not well understood. We, however, introduce three basic tools (marginalization, permutation, and trimming) that allow us to transform a Gibbs sampler into a partially collapsed Gibbs sampler with a known stationary distribution and faster convergence. We illustrate our partially collapsed Gibbs sampling strategies by fitting joint change-point (or joint segmentation) models for Poisson time-series data from different signals in astrophysics. The change-point models assume that the observed data for each signal are generated from an inhomogeneous Poisson process with constant intensity within unknown time blocks. Because the number of time blocks is unknown and depends on the change points, the standard Gibbs sampler constructed to fit the models is not computationally feasible.
A typical strategy to avoid the infeasible steps in the Gibbs sampler is to marginalize over the Poisson intensities, which depend on the unknown time blocks. Such marginalization, however, results in a set of incompatible conditional distributions, so a partially collapsed Gibbs sampler must be designed.

SEMINAR DATE: Thursday, February 21, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Zhezhen Jin, Associate Professor, Department of Biostatistics, Mailman School of Public Health, Columbia University
TOPIC: Regression Analysis of Censored Data
In this talk, I will present estimation and inference for right-censored data based on a semiparametric linear regression model, the accelerated failure time (AFT) model, which has the form of the ordinary linear regression model with a completely unspecified distribution for the random errors. Since the existing estimating functions for the regression parameters are nonregular, i.e., non-smooth and non-monotone, it is challenging to obtain point estimates and their variance estimates. I will review recently developed estimation methods and present a user-friendly general S-Plus/R program package implementing these methods, along with real examples. Unsolved issues and problems will also be discussed.

SEMINAR DATE: Thursday, February 28, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Fiona Callaghan, University of Pittsburgh
TOPIC: Classification Trees for Survival Data with Competing Risks
Classification trees are the most popular tool for categorizing individuals into groups and subgroups based on particular outcomes of interest. To date, trees have not been developed to deal with survival data involving competing risks.
In this study, we propose two classification trees to analyze data with competing risks: a tree that maximizes between-node heterogeneity and a tree that maximizes within-node homogeneity. After we describe the methods used in growing and pruning the trees, we demonstrate and compare their performance with simulations in a variety of competing risk model configurations. We also illustrate their use by analyzing survival data concerning patients who had end-stage liver disease and were on the waiting list to receive a liver transplant.

SEMINAR DATE: Thursday, February 28, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Chunrong Cheng, University of Pittsburgh
TOPIC: Carrying prediction models across microarray data sets generated by different labs and different platforms
The reproducibility of microarray experiments has greatly improved in the past decade, and their application in biomedical research is increasingly prevalent. Multiple published studies of the same disease use different array platforms and are carried out in different labs. Similarly high disease prediction accuracies are often reported in these studies; however, applying a prediction model established in one study to another usually yields poor performance. We investigated the application of gene-wise normalization following the commonly practiced global sample-wise normalization. The proposed gene-wise normalization often dramatically increases prediction accuracy in cross-dataset prediction. We further propose a bootstrapping method and an alternative analytical method to adjust for differential sample ratios of disease groups that may affect the performance of gene-wise normalization. Simulation results and an application to three lung cancer data sets show significant and robust improvement from our method. A simple calibration scheme is developed to apply our method to future clinical trials.
The number of calibration samples needed is estimated from existing studies and suggested for application to future studies.

SEMINAR DATE: Thursday, February 28, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Sarah Haile, University of Pittsburgh
TOPIC: Parametric Inference for the Cumulative Incidence Function
The cumulative incidence function is of great importance when analyzing data where competing risks are present. We present a new distribution for parametric inference on competing risks. As the cumulative incidence function is used to model a subset of events, it is logical to model them using a distribution that is improper. The proposed 4-parameter Gompertz distribution is very flexible, permits several different hazard shapes, including unimodal ones, and can be extended to include covariates. The model is applied to data from the National Surgical Adjuvant Breast and Bowel Project breast cancer trial B-14.

SEMINAR DATE: Thursday, February 28, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jia Li, University of Pittsburgh
TOPIC: Meta-analysis for identifying signature genes in the integration of multiple genomic studies
With the availability of a large number of expression profiles, the need for meta-analyses to integrate different types of microarray data is obvious. For the detection of differentially expressed genes, most current efforts focus on comparing and evaluating gene lists obtained from each individual dataset. The statistical framework is often not rigorously formulated, and true information integration is rarely performed. In this paper, we tackle two frequently asked biological questions: "Which genes are significant in one or more data sets?" and "Which genes are significant in all data sets?"
We illustrate two statistical hypothesis settings and propose a best weighted statistic and a maximum p-value statistic for the two questions, respectively. Permutation analysis is then applied to control the false discovery rate. The proposed test statistic is shown to be admissible. We further show the advantage of our proposed test procedures over existing methods through power comparisons, a simulation study, and real data analyses of multiple-tissue energy metabolism mouse model data and prostate cancer data sets.

SEMINAR DATE: Thursday, February 28, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Tao Song, University of Pittsburgh
TOPIC: A GENERALIZED NONPARAMETRIC APPROACH OF COMPARING FROC SYSTEMS
ROC curve analysis is a widely used method of comparing diagnostic imaging systems. One formulation of the area under the ROC curve is based on the probability of selecting the abnormal subject from a random pair of normal-abnormal subjects. In a Free Response ROC (FROC) process, which requires searching for and marking the locations of all suspected abnormalities with a level of suspicion (rating), normal subjects may have multiple false positives, and abnormal subjects may have multiple true positives and false positives. We consider a general approach that uses as a summary index the area under an ROC curve derived from an FROC process. The method entails specifying a function that is used to select the abnormal subject from the normal-abnormal pair. A previously proposed index based on the highest rating on a subject can be viewed as a special case of this method. We consider various discriminating functions, including average score and stochastic dominance. Simulation studies are conducted to compare the statistical power of these methods to distinguish between two FROC processes.

SEMINAR DATE: Thursday, February 28, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Rick Blakesley, University of Pittsburgh
TOPIC: Considering P-Value Dependence in a Stepwise Multiplicity Adjustment Method
Multiple hypothesis testing with correlated outcomes has proven challenging. Nonparametric multiplicity adjustment methods that incorporate resampling have demonstrated type I error protection and good power, but implementation remains an obstacle. Parametric methods derived from the Bonferroni method have demonstrated power, but they control type I error conservatively as the correlation between outcomes increases. In contrast, methods derived from the Sidak method incorporate correlation coefficients, though with unstable type I error protection. We propose a parametric method that combines and refines elements of existing methods to control type I error while considering correlation. These elements include the Sidak functional form and the Hochberg stepwise component. We also use a refined adjustment component, similar to the Dubey/Armitage-Parmar and R2 Adjustment methods, which incorporates a measure of dependence between the p-values under the null hypothesis. We conducted a simulation study to estimate the type I error (familywise error) and power rates of the proposed method and ten existing methods across many combinations of simulation trial parameters, with the chosen rejection threshold α = 0.05. The proposed method demonstrated type I error in the range [0.047, 0.057] across the conditions explored, with power rates similar to those of the Hommel and step-down minP methods and exceeding those of all other methods with conservative type I error performance. While not proven to control type I error in a theoretical context, the proposed parametric method has corroborated, through simulation, the desired properties of a multiplicity adjustment method.

SEMINAR DATE: Thursday, March 6, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Peter F. Thall, PhD, Dept. of Biostatistics, Division of Quantitative Sciences, University of Texas, M.D. Anderson Cancer Center
TOPIC: Patient-Specific Dose-Finding Based On Bivariate Outcomes and Covariates
A Bayesian method for covariate-adjusted (“individualized”) dose-finding based on a bivariate (efficacy, toxicity) outcome is presented. The method extends Thall and Cook (Biometrics 60:684-693, 2004). Implementation requires an informative prior on covariate effects, obtained from historical data or by elicitation. In the underlying probability model, dose and covariate main effects and dose-covariate interactions are included in the linear components of the marginal efficacy and toxicity outcome probabilities. For each of a representative set of covariate vectors, limits on the probabilities of efficacy and toxicity specified by the physician are used to construct bounding functions that are used to determine the acceptability of each dose for each possible covariate vector. The physician also must specify equally desirable target (efficacy, toxicity) probability pairs for a reference patient’s covariates to characterize trade-offs between the two outcomes. Each patient's dose is chosen to optimize the efficacy-toxicity trade-off for his/her specific covariates. Because the selected doses are covariate-specific and the method is sequentially outcome-adaptive, different patients may receive different doses at the same interim point in the trial, and some initially eligible patients may have no acceptable dose. The method is illustrated by application to a phase I/II trial in acute leukemia.

SEMINAR DATE: Thursday, March 27, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Hongwei Zhao, Sc.D., Associate Professor of Biostatistics, University of Rochester Medical Center, Department of Biostatistics and Computational Biology
TOPIC: Regression Analysis of Mean Quality-Adjusted Lifetime with Censored Data
In clinical trials of chronic diseases such as AIDS, cancer, or cardiovascular disease, it has been realized that it is not enough to consider only overall survival time; quality of life is also very important. The quality-adjusted lifetime (QAL) is a measure that combines both the quantity and the quality of a patient's lifetime and has thus received increasing attention. Because of the induced informative censoring problem, the techniques commonly used for analyzing survival time are no longer valid. We propose a new method for studying the regression problem for the mean QAL when the data are subject to right censoring. We allow a very general form for the mean model as a function of covariates. The applicability of our method is demonstrated by both simulation experiments and a data example from a breast cancer clinical trial.

SEMINAR DATE: Thursday, April 3, 2008
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Mitchell H. Gail, M.D., Ph.D., Senior Investigator, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute
TOPIC: Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies
Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking the corresponding p-values, rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be “T-selected”, namely that it has one of the top T largest chi-square values (or smallest p-values) for trend tests of association. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk.
DP increases with genetic effect size and case-control sample size, and decreases with the number of non-disease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. We show that DP increases very slowly with T, and the increment in DP per unit increase in T declines rapidly with T. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio per minor disease allele of 1.2 or less, even a CCGWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. Extensions of these methods show that multi-stage designs have appreciably lower DP than a one-stage design with the same number of cases and controls if the proportion of cases and controls in the first stage of the multi-stage design is less than 25%.

Seminar Speakers Fall Term 2007
Jing Wu, September 20, 2007
Heejung Bang, September 27, 2007
Lu Tian, October 11, 2007
Jason Connor, October 18, 2007
Dulal K. Bhaumik, November 1, 2007
André Rogatko, December 13, 2007

SEMINAR DATE: Thursday, September 20, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jing Wu, Assistant Professor of Statistics, Department of Statistics, Purdue University
TOPIC: COMPUTATION-BASED DISCOVERY OF CIS REGULATORY MODULES BY HIDDEN MARKOV MODEL
A key component in genome sequence analysis is the identification of regions of the genome that contain regulatory information. In higher eukaryotes, this information is organized into modular units called cis-regulatory modules. Each module contains multiple binding sites for a specific combination of several transcription factors. In this article, we propose a hidden Markov model (HMM) to identify transcription factor binding sites (TFBSs) and cis-regulatory modules (CRMs).
For a given genomic sequence, we first select potential TFBSs from a large database (e.g., TRANSFAC), then construct an HMM in which the TFBSs are counted only when they occur within a specialized CRM state. A novel feature of the proposed method is that it does not assume a small set of TFBSs for a given gene; instead, it uses information from a large collection of well-characterized TFBSs and is therefore computationally more efficient and more robust than de novo methods. Our approach is applied to three data sets with experimentally evaluated TFBSs. The method shows better specificity and sensitivity than other similar computational tools in identifying CRMs and TFBSs. This is joint work with Dr. Jun Xie in the Department of Statistics at Purdue University.

SEMINAR DATE: Thursday, September 27, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Heejung Bang, Assistant Professor of Public Health, Weill Cornell Medical College, Cornell University
TOPIC: DE-MYSTIFYING MEDICAL COST ESTIMATORS: WHAT WE FOUND AFTER 10 YEARS
In clinical trials comparing different treatments and in observational studies in health economics and outcomes research, medical costs are now frequently collected and analyzed. Since Lin et al. (1997) first identified the problems with applying standard analysis techniques, such as the sample mean and the Kaplan-Meier estimator, to censored cost data, many new methods have been proposed. In this talk, I will review valid methods for statistical estimation and inference that have been developed over the last 10 years and show what Zhao, Bang, Wang and Pfeifer (2007) recently discovered: analytic relationships among several widely adopted medical cost estimators that are seemingly different.

SEMINAR DATE: Thursday, October 11, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Lu Tian, Assistant Professor, Department of Preventive Medicine, Northwestern University
TOPIC: MODEL EVALUATION BASED ON THE SAMPLING DISTRIBUTION OF ESTIMATED ABSOLUTE PREDICTION ERROR
The construction of a reliable, practically useful prediction rule for future responses is heavily dependent on the "adequacy" of the fitted regression model. In this article, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion. This prediction error is easier to interpret than the average squared error and is equivalent to the misclassification error for a binary outcome. We show that the prediction error can be consistently estimated via the re-substitution and cross-validation methods even when the fitted model is not correctly specified. Furthermore, we show that the resulting estimators are asymptotically normal. When the prediction rule is "unsmooth", the variance of the above normal distribution can be estimated well with a perturbation-resampling method. With real examples and an extensive simulation study, we demonstrate that the interval estimates obtained from the above normal approximation for the prediction errors provide much more information about model adequacy than their point estimate counterparts.

SEMINAR DATE: Thursday, October 18, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jason Connor, Statistical Scientist, Berry Consultants
TOPIC: ETHICS & EXECUTION OF ADAPTIVE BAYESIAN CLINICAL TRIALS: A UTERINE CANCER CASE STUDY
I describe a brief history of adaptive designs and then provide a case study in uterine cancer. This includes the ethics of adaptive randomization and the statistical benefits of the Bayesian paradigm.
In the case study I illustrate components of a trial that we may choose to adapt during the trial and show ways to present the novel designs to clinicians, IRBs, and regulatory agencies. I illustrate how adaptive designs often provide shorter, less expensive trials in which a greater proportion of patients receive the most efficacious treatment.

SEMINAR DATE: Thursday, November 1, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Dulal K. Bhaumik, Professor, Department of Psychiatry, Division of Epidemiology and Biostatistics, University of Illinois at Chicago
TOPIC: SAMPLE SIZE DETERMINATION FOR HIERARCHICAL LONGITUDINAL DESIGNS WITH DIFFERENTIAL ATTRITION RATES
We consider the problem of sample size determination for three-level mixed-effects linear regression models for the analysis of clustered longitudinal data. Three-level designs are used in many areas, in particular in multi-center randomized longitudinal clinical trials in medical or health-related research. In this case, level 1 represents measurement occasion, level 2 represents subject, and level 3 represents center. The model we consider involves random effects of the time trends at both the subject level and the center level. In the most common case, we have two random effects (a constant and a single trend) at both the subject and center levels. The approach presented here is general with respect to sampling proportions, number of groups, and attrition rates over time. We derive sample size requirements (i.e., power characteristics) for a test of treatment-by-time interaction(s) for designs based on either subject-level or cluster-level randomization. The general methodology is illustrated using two characteristic examples.

SEMINAR DATE: Thursday, December 13, 2007
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: André Rogatko, Associate Director for Biostatistics Research and Informatics, Winship Cancer Institute, Professor, Department of Biostatistics, Rollins School of Public Health, Professor, Department of Hematology and Oncology, School of Medicine, Emory University
TOPIC: INDIVIDUALIZED PATIENT DOSING IN CANCER CLINICAL TRIALS
We will discuss EWOC (Escalation with Overdose Control), the first statistical method to directly incorporate formal safety constraints into the design of cancer phase I trials. The method controls the frequency of overdosing by selecting dose levels for use in the trial so that the predicted proportion of patients administered a dose exceeding the MTD (Maximum Tolerated Dose) is equal to a specified upper bound. We will also discuss an extension of EWOC that permits the use of information concerning individual patient differences in susceptibility to treatment. This is the first method described for designing cancer clinical trials that not only guides dose escalation but also permits personalization of the dose level for each specific patient. The method adjusts doses according to patient-specific characteristics and allows the dose to be escalated as quickly as possible while safeguarding against overdosing. The extension of EWOC to covariate utilization was implemented in five FDA-approved phase I studies that will be discussed. A new paradigm for drug development based on individual dosing will be proposed.