BIOST 2025: Biostatistics Seminar Notices

Seminar Speakers, Spring Term 2006
Paul J. Rathouz, January 19, 2006
Lan Kong, February 2, 2006
Nicholas Lange, February 9, 2006
Matthias Schonlau, February 23, 2006
David Dunson, March 2, 2006
Feng-shou Ko, March 23, 2006
Jia Li, March 23, 2006
Li Qin, March 23, 2006
Hua Yun Chen, April 6, 2006

SEMINAR DATE: Thursday, January 19, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Paul J. Rathouz, Associate Professor, Department of Health Studies, University of Chicago
TOPIC: Missing Covariate Data in Matched Case-Control Studies

We consider the problem of highly stratified or matched studies with a binary outcome that are analyzed using conditional logistic regression (CLR). We assume that data on some covariates are missing for some study participants, and we illustrate the problem with an example data set. Existing CLR methods for this problem involve either modeling the distribution of the missing covariates or modeling the probability that data are missing. When the missingness process is modeled, a previously proposed method did not use data from records with missing covariates except in the model for the missingness. We extend this method, embedding it in a new class of estimators that use the outcome and available covariate data for all study participants. We show that a particular member of this class is always more efficient than the previously proposed estimator. A simulation study compares these methods with respect to efficiency and robustness to model misspecification. We then present a variation on our method for missingness due to drop-out in longitudinal data analyses with fixed-effects models. Time permitting, we consider the approach in which the distribution of the missing covariate is modeled. The semiparametric efficient estimator of the regression parameters is identified, and a new estimator that reduces dependence on the model for the missing covariate is proposed.

SEMINAR DATE: Thursday, February 2, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Lan Kong, Assistant Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh
TOPIC: Case-Cohort Analysis with Accelerated Failure Time Model

In a case-cohort design, covariate data are assembled only for a subcohort randomly selected from the entire cohort and for any additional cases outside the subcohort. This design is appealing for large cohort studies of rare diseases, especially when the exposures of interest are expensive to ascertain for all subjects. We propose statistical methods for analyzing case-cohort data with a semiparametric accelerated failure time model, which interprets covariate effects as accelerating or decelerating the time to failure. Asymptotic properties of the proposed estimators are developed. The finite-sample properties of the case-cohort estimator and its efficiency relative to the full-cohort estimator are assessed via simulation studies. A real example from a study of cardiovascular disease illustrates the estimation procedure.

SEMINAR DATE: Thursday, February 9, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Nicholas Lange, Associate Professor, Department of Biostatistics, Harvard School of Public Health
TOPIC: Pediatric Brain Development and Bayesian Model Averaging

In this talk, I will first describe an ongoing nationwide longitudinal MRI study of brain and behavioral development in approximately N = 500 healthy, representative children aged 10 days to 19 years. I will then outline model selection strategies for numerous, potentially important covariates when neither adequate neurodevelopmental theory nor previous empirical evidence is available for guidance. Frequentist and Bayesian model averaging methods will be described and applied, and their results compared. My emphasis will be on the interpretability and generalizability of findings, for effective communication with pediatric neurologists, psychologists, and psychiatrists. No previous knowledge of human neuroanatomy, pediatric development, or MRI will be necessary.

SEMINAR DATE: Thursday, February 23, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Matthias Schonlau, Statistician and Head, RAND Statistical Consulting Service, RAND Corporation
TOPIC: The Sinking of the Titanic: An Application of Interactive Visualization of Data

On her maiden voyage in 1912, the Titanic hit an iceberg and sank. Due to outdated laws, the Titanic did not carry enough lifeboats to evacuate all passengers. Following the social norm of the day, "women and children" were evacuated first, with the remaining lifeboat seats going to men. Do the data support this characterization of events? Did it extend to third-class passengers? We will explore data on the sinking of the Titanic interactively using the freely available software package Mondrian. Rather than the data set available in R, we use a different source, from which we have also been able to obtain information about who went into which lifeboat.

SEMINAR DATE: Thursday, March 2, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: David Dunson, Adjunct Associate Professor, Institute of Statistics and Decision Sciences, Duke University, and Senior Investigator, Biostatistics Branch, NIEHS
TOPIC: Bayesian Density Regression: Applications in Epidemiology

In biomedical studies, there is often interest in the relationship between one or more response variables and a set of predictors. Most regression models allow only one aspect of the distribution, such as the mean or median, to change with the predictors. In many cases, however, interest focuses on how the distribution evolves across the predictor space, and there may be unanticipated changes in shape. In epidemiologic studies, such changes in shape are a natural consequence of gene × environment interactions with unmeasured genetic factors. To address this problem, a nonparametric Bayesian approach is proposed that allows a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of regression models, with the mixture distribution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP)-distributed innovation measure and neighboring random measures. Theoretical properties are outlined, an efficient computational approach is developed, and the methods are illustrated through applications to genetic and epidemiologic studies.

SEMINAR DATE: Thursday, March 23, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Feng-shou Ko
TOPIC: Assessing the Effectiveness of Potential Longitudinal Biomarkers in Multivariate Survival Analysis

We develop an extension of the method proposed by Henderson et al. (2002), which combines the analysis of longitudinal and time-to-event information. In developing the joint likelihood function, we incorporated a frailty parameter into a semiparametric survival model. To compare our method with that of Henderson et al., we performed simulations to assess the power to detect longitudinal biomarkers, generating three latent processes similar to those used in their work. Our results were: 1) higher correlations between the longitudinal biomarker values and the survival time functions were associated with higher power of the score test; 2) for equal sample sizes, the score test had higher power with relatively many subjects and few time points than with relatively few subjects and many observed time points; and 3) the power of our method was somewhat higher than that of the method of Henderson et al. To further compare the two methods, we analyzed the liver cirrhosis data set presented in their paper and obtained similar results. Both methods showed that prothrombin is an effective surrogate for survival in patients with liver cirrhosis.

SEMINAR DATE: Thursday, March 23, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jia Li, Biostatistics Graduate Student, University of Pittsburgh
TOPIC: A Comparison of Two Multiple Imputation Packages: MICE and MIX in Survival Analysis with Missing Covariates

Estimating the effects of potential prognostic factors is a common and important task in analyzing medical data, because the main goal of such studies is to distinguish the effects of a set of covariates on an outcome such as survival. However, missing data complicate the whole process. Multiple imputation is a general method for handling missing-data problems, developed from a Bayesian perspective. Its practical advantages are that it allows standard complete-data methods of analysis to be used and that it reflects the uncertainty due to the missing data in the models. In this article, we review two easy-to-use packages, MICE and MIX, that implement the multiple imputation procedure. We compare the performance of the packages with respect to bias, efficiency, and robustness using several simulation studies in survival settings and data from the National Surgical Adjuvant Breast and Bowel Project (NSABP) Protocol B-06. Practical limitations and valuable features of the packages are also assessed in detail.

SEMINAR DATE: Thursday, March 23, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Li Qin, Biostatistics Graduate Student, University of Pittsburgh
TOPIC: An Extension of a Latent Variable Model for Informative Intermittent Missing Data

In longitudinal studies, subjects are followed over time, so missing data are a frequent problem. We propose a latent variable model for informative intermittent missingness that extends Roy's (2003) latent dropout class model. In our model, the value of the latent variable depends on the missingness pattern, and the latent variable also enters as a covariate in the model for the longitudinal response. In this way, the latent variable links the longitudinal response and the missingness process. The latent variable is continuous rather than categorical, and we assume that it follows a normal distribution with unit variance. To simplify the analysis of intermittent missingness patterns, we define two variables: one for the dropout time and the other for the number of missed time points before dropout. The EM algorithm is used to estimate the parameters of interest, and Gauss-Hermite quadrature is used to approximate the integral over the latent variable (Sammel et al., 1997). Standard errors of the parameter estimates are obtained from the inverse of the Fisher information matrix of the final marginal likelihood. The method is illustrated using data from a pediatric obesity study evaluating the effectiveness of a family-based intervention. Generalized Pearson residuals are used to assess model fit.

SEMINAR DATE: Thursday, April 6, 2006
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Hua Yun Chen, Ph.D., Associate Professor, Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago
TOPIC: Likelihood Robustification in Parametric Regression with Missing Data

In parametric regression with missing data, whether covariates are missing with or without auxiliary information or the outcome is missing with auxiliary information, additional parametric or semiparametric models are often required to carry out likelihood inference. To protect the regression parameter estimator against misspecification of the covariate models or, in the alternative inverse-probability-weighted approach, of the missing-data mechanism model, Robins et al. proposed a doubly robust procedure for inference on the regression parameter. Their approach to finding doubly robust estimating equations was based on projecting the weighted estimating score. A general doubly robust estimator may have low efficiency, and computing the best doubly robust estimator, the locally efficient estimator, can be very challenging under Robins et al.'s approach, even for the simplest missing-data pattern. We propose an alternative representation of the semiparametric efficient score. Likelihood robustification, an approximation to the locally semiparametric efficient estimator based on the proposed representation, is relatively easy to compute, and the estimator obtained from the approximating score is doubly robust when data are missing at random. Asymptotic inference on the regression parameter based on the likelihood robustification estimator is also proposed. Simulation results show that likelihood robustification estimates perform well in finite samples.

Seminar Speakers, Fall Term 2005
Kaifeng Lu, September 15, 2005
Yijian Huang, September 29, 2005
Cyrille Joutard, October 6, 2005
Xuelin Huang, October 20, 2005
Zhen Chen, November 3, 2005
Larry Wasserman, November 10, 2005
Hong Wang, December 8, 2005

SEMINAR DATE: Thursday, September 15, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Kaifeng Lu, Biometrician, Merck Research Laboratories
TOPIC: Responder Analysis in Longitudinal Studies with Missing Data

In some longitudinal studies, researchers may be interested in estimating the treatment difference in response rate at the primary time point of interest, where the responder indicator is obtained by dichotomizing an underlying continuous measurement at that time point according to a prespecified threshold. The underlying continuous measurement is often analyzed using a repeated-measures model, and the responder status using a logistic regression model. In this article, we discuss two approaches to responder analysis when the underlying continuous measurement is missing because patients skip visits or drop out before completing the study. One approach imputes the missing data from the repeated-measures model for the underlying continuous measurement and then fits the logistic regression model to the observed or imputed responder status. The other approach is based on an estimating equation and does not impute missing data. Large-sample properties of the resulting estimators are derived, and simulation studies are conducted to assess the performance of the proposed estimators.

SEMINAR DATE: Thursday, September 29, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Yijian (Eugene) Huang, PhD, Associate Professor, Department of Biostatistics, Emory University
TOPIC: Errors-in-Covariates Effect on Estimating Functions: Additivity in Limit and Nonparametric Correction

We consider Poisson, logistic, and Cox regressions when some covariates cannot be accurately ascertained and are instead contaminated with additive errors. Huang and Wang (1999, technical report; 2000, JASA; 2001, JASA) showed that the slope parameters can be consistently estimated via nonparametric correction, without imposing distributional assumptions on either the underlying true covariates or the errors. However, certain instrumental variables, in particular replicated error-contaminated covariates, are required. In this talk, we show that the error effect is additive in the limit for suitably formulated estimating functions. This finding gives rise to a new nonparametric correction technique that accommodates a broad variety of internal and external error-assessment data. Simulations for Cox regression with external reliability data are conducted, and an application to an AIDS study is presented as an illustration. This talk is based on joint work with C. Y. Wang.

SEMINAR DATE: Thursday, October 6, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Cyrille Joutard, Post-doctoral Fellow, Department of Statistics, Carnegie Mellon University
TOPIC: Grade of Membership Analysis of a Subset of Functional Disability Data

The rapid growth of the elderly population in the United States has given rise to increasing interest in assessing chronic disability among the elderly. The National Long Term Care Survey (NLTCS) is a longitudinal survey of the U.S. population aged 65 and older, with waves conducted in 1982, 1984, 1989, 1994, 1999, and 2004. We analyze a subset of 16 functional disability measures that Erosheva (2002) extracted from the NLTCS analytic file for the 1982, 1984, 1989, and 1994 waves. The disability measures consist of measures of activities of daily living (ADL), which include basic activities of hygiene and personal care, and measures of instrumental activities of daily living (IADL), which include basic activities necessary to reside in the community. For each ADL/IADL measure, individuals are classified as either healthy or disabled on that measure. We use the grade of membership (GoM) model to analyze these data. The GoM model assumes a hierarchical latent structure for mixed membership in a set of extreme profiles (clusters). This paper focuses on problems of model selection. We carry out the GoM analysis using MCMC methods to obtain the posterior distribution of the model parameters, and we use several criteria to select an optimal number of extreme profiles. Finally, we study variational approximation methods for estimating the model parameters and compare the resulting model selection with that obtained using MCMC methods.
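The grade of membership model mentioned above can be made concrete with a short sketch of its likelihood for binary items. It assumes the standard GoM formulation for dichotomous responses: each individual has a membership vector g of nonnegative scores summing to 1 over the K extreme profiles, each profile k has a probability lam[k][j] of being disabled on measure j, and, given g, the measures are independent with P(x_j = 1 | g) = sum_k g[k] * lam[k][j]. The function names and the example numbers are hypothetical, and the sketch only evaluates the likelihood; it does not implement the MCMC or variational estimation discussed in the talk.

```python
import math
from typing import List, Sequence


def gom_disability_prob(g: Sequence[float], lam: Sequence[Sequence[float]]) -> List[float]:
    """P(x_j = 1 | g) = sum_k g[k] * lam[k][j] for each measure j.

    g   -- membership scores over K extreme profiles (nonnegative, sum to 1)
    lam -- K x J matrix; lam[k][j] = P(disabled on measure j | pure profile k)
    """
    if len(g) != len(lam):
        raise ValueError("g and lam must have the same number of profiles")
    if any(s < 0 for s in g) or not math.isclose(sum(g), 1.0):
        raise ValueError("membership scores must be nonnegative and sum to 1")
    n_measures = len(lam[0])
    return [sum(g_k * lam_k[j] for g_k, lam_k in zip(g, lam)) for j in range(n_measures)]


def gom_loglik(x: Sequence[int], g: Sequence[float], lam: Sequence[Sequence[float]]) -> float:
    """Log-likelihood of one person's binary responses (1 = disabled, 0 = healthy),
    treating the measures as conditionally independent given g.
    Assumes every probability lies strictly between 0 and 1."""
    probs = gom_disability_prob(g, lam)
    return sum(math.log(p) if x_j == 1 else math.log(1.0 - p) for x_j, p in zip(x, probs))


# Hypothetical two-profile example with three measures.
LAM = [
    [0.05, 0.10, 0.02],  # profile 1: mostly healthy
    [0.90, 0.80, 0.95],  # profile 2: mostly disabled
]
```

With g = (0.5, 0.5), the disability probabilities are (0.475, 0.45, 0.485), halfway between the two profiles. This mixed membership is what distinguishes GoM from a latent class model, in which each person belongs to exactly one profile.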

SEMINAR DATE: Thursday, October 20, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Xuelin Huang, Assistant Professor, Biostatistics & Applied Mathematics, University of Texas MD Anderson Cancer Center
TOPIC: Optimization of Multi-Stage Treatment Sequences for Recurrent Diseases

In cancer treatment, patients commonly receive a variety of sequential treatments. Both the initial treatments administered after diagnosis and the salvage regimens given after disease recurrence can vary. Because the initiation and choice of salvage treatments depend on the progress of the disease, the Cox proportional hazards model with time-dependent covariates cannot be used to make causal inferences about the effects of initial and salvage treatments on survival. Beyond the difficulty of estimating the effect of a multi-stage treatment sequence, there is a further problem: how to identify optimal treatment strategies from sufficiently rich but nonrandomized data. The work of Tsiatis, Robins, Murphy, and colleagues is extended to solve this problem in the setting of recurrent diseases. Simulation studies are conducted to evaluate the methods. The methods are illustrated by a retrospective study of soft tissue sarcoma, which motivated this research. Various dynamic and non-dynamic treatment strategies are evaluated and compared.

SEMINAR DATE: Thursday, November 3, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Zhen Chen, Assistant Professor, Department of Biostatistics & Clinical Epidemiology, University of Pennsylvania
TOPIC: Modelling Developmental Toxicity Endpoints: A Causal Inference Approach

In a typical Segment II developmental toxicity study, pregnant dams are randomized into groups for treatment with a toxin during organogenesis. The pregnant dams are sacrificed near term, and the fetuses are examined for a variety of developmental endpoints. Current approaches to analyzing data from these Segment II studies take the observed outcome differences in live fetuses between dose groups as the effect of the toxin. The resulting estimates usually lack a causal interpretation, for two reasons. First, fetuses that survive to birth at a high dose level of the toxin might be more robust on average than those at a low dose level. Consequently, the set of fetuses that would survive to term in a high-dose group is likely not identical to the set that would survive to term in a low-dose group. Second, litter size (the number of fetuses that develop and survive until birth) plays a mediating role in data from Segment II studies, since it is affected by the toxin on the one hand and affects the developmental endpoints on the other. As a result, failing to control for litter size is likely to produce bias. In this paper, we use the potential outcomes framework to estimate the causal effect of a toxic agent on developmental endpoints. We define the causal toxic effect for a fetus as the difference between the outcome it would have had if its dam had been exposed to dose level Z = z* and the outcome it would have had at dose level Z = z. In particular, we employ the principal stratification approach of Frangakis and Rubin (2002) and construct principal strata that are functions of the fetuses' survival status. We propose various estimands of the effect and consider their identification and interpretation. We also use random effects to accommodate within-dam correlation among fetuses in both the outcomes and the principal strata. Bayesian procedures are developed to fit the model and make inferences. We illustrate the proposed methodology with data from a developmental toxicity study of ethylene glycol in mice conducted by the National Toxicology Program. This is joint work with M. R. Elliott and M. M. Joffe.

SEMINAR DATE: Thursday, November 10, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Larry Wasserman, Professor, Department of Statistics, Carnegie Mellon University
TOPIC: Rodeo: Sparse Nonparametric Regression in High Dimensions

I will present a method for simultaneously performing bandwidth selection and variable selection in nonparametric regression. The method starts with a local linear estimator with large bandwidths and incrementally decreases the bandwidth in directions where the gradient of the estimator with respect to the bandwidth is large. When the unknown function satisfies a sparsity condition, the approach avoids the curse of dimensionality. The method, called rodeo (regularization of derivative expectation operator), conducts a sequence of hypothesis tests and is easy to implement. A modified version that replaces testing with soft thresholding may be viewed as solving a sequence of lasso problems. When applied in one dimension, the rodeo yields a method for choosing the locally optimal bandwidth. Joint work with John Lafferty, Computer Science Department, Carnegie Mellon.

SEMINAR DATE: Thursday, December 8, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Hong Wang, Research Assistant Professor, Biostatistics, University of Pittsburgh Cancer Institute
TOPIC: Analysis of Smad3 Cell Migration

In this collaborative study, Smad3 cell growth and migration were tracked in wells using the cell culture imaging system in the Department of Radiation Oncology. A database was created from the images and queried to obtain information on cell growth and migration. Stationary objects were removed from the wells based on the distribution of their locations, and large objects were eliminated based on the distribution of object area in early scans. The cumulative distribution function of velocity was examined. Fixed-effects ANOVA models and multiple comparison procedures were used to compare cell velocity at different times, and several multilevel mixed models were also considered. Three Smad3 cell lines (+/+, -/-, and -/-3) were studied and compared.
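The velocity analysis described above can be illustrated with a minimal sketch, assuming cell tracks are available as (x, y) positions at equally spaced scan times. The function names and the path-length definition of speed are hypothetical choices, not the study's actual pipeline. The sketch computes per-object mean speed, evaluates the empirical cumulative distribution function of speed, and computes a one-way fixed-effects ANOVA F statistic comparing speeds across groups such as scan-time windows. The object-filtering steps and the multilevel mixed models from the study are not reproduced.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_speed(track: Sequence[Point], dt: float) -> float:
    """Average speed of one tracked object: total path length divided by elapsed
    time, for (x, y) positions recorded every dt time units."""
    if len(track) < 2:
        raise ValueError("need at least two positions")
    path = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return path / (dt * (len(track) - 1))


def ecdf(values: Sequence[float], t: float) -> float:
    """Empirical CDF of the speeds: fraction of values <= t."""
    return sum(v <= t for v in values) / len(values)


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way fixed-effects ANOVA F statistic for k groups and N observations:
    F = [SS_between / (k - 1)] / [SS_within / (N - k)]."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For made-up speed groups [1, 2, 3] and [4, 5, 6], the F statistic is 13.5. In practice it would be compared with an F distribution on (k - 1, N - k) degrees of freedom and followed by a multiple-comparison procedure, which is not shown here.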