University of Pittsburgh

 

BIOST 2025: Biostatistics Seminar Notices

Seminar Notices Fall Term 2005
Seminar Notices Spring Term 2006


Seminar Speakers
Spring Term 2006

Paul J. Rathouz, January 19, 2006
Lan Kong, February 2, 2006
Nicholas Lange, February 9, 2006
Matthias Schonlau, February 23, 2006
David Dunson, March 2, 2006
Feng-shou Ko, March 23, 2006
Jia Li, March 23, 2006
Li Qin, March 23, 2006
Hua Yun Chen, April 6, 2006


 

SEMINAR

DATE: Thursday, January 19, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Paul J. Rathouz, Associate Professor, Department of Health Studies, University of Chicago

TOPIC: Missing Covariate Data in Matched Case-Control Studies

We consider the problem of highly stratified or matched studies with a binary outcome that are analyzed using conditional logistic regression (CLR). We assume that data on some covariates are missing for some study participants and illustrate the problem with an example data set. Existing CLR methods for this problem involve either modeling the distribution of missing covariates or modeling the probability of data being missing. When the missingness process is modeled, a previously proposed method did not use data from records with missing covariates except in the model for the missingness. We extend this method, embedding it in a new class of estimators that use outcome and available covariate data for all study participants. We show that a particular member of this class is always more efficient than the previously proposed estimator. A simulation study compares these methods with respect to efficiency and robustness to model misspecification. We then present a variation on our method for the case of missingness due to drop-out in longitudinal data analyses with fixed effects models.

Time permitting, we consider the approach wherein the distribution of the missing covariate is modeled. The semiparametric efficient estimator of the regression parameters is identified, and a new estimator, which reduces dependence on the model for the missing covariate, is proposed.
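For readers new to CLR: in a matched set with one case and M controls, the conditional likelihood contribution is the probability that the observed case, rather than one of its matched controls, is the case, given the covariates of everyone in the set. A minimal sketch of that complete-data log-likelihood follows, assuming one case per set; the function names are hypothetical, and none of the missing-covariate estimators from the talk are implemented here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def linear_predictor(beta: Vector, x: Vector) -> float:
    """Return beta'x for one participant."""
    return sum(b * xi for b, xi in zip(beta, x))

def clr_set_loglik(beta: Vector, case_x: Vector, control_xs: Sequence[Vector]) -> float:
    """Log conditional likelihood of one matched set containing a single case.

    The contribution is exp(beta'x_case) / sum_j exp(beta'x_j) over all set
    members, so any stratum-specific intercept cancels out.
    """
    etas = [linear_predictor(beta, case_x)] + [linear_predictor(beta, x) for x in control_xs]
    top = max(etas)  # log-sum-exp shift for numerical stability
    log_denominator = top + math.log(sum(math.exp(e - top) for e in etas))
    return etas[0] - log_denominator

def clr_loglik(beta: Vector, matched_sets: Sequence[Tuple[Vector, List[Vector]]]) -> float:
    """Total conditional log-likelihood over (case_x, [control_x, ...]) sets."""
    return sum(clr_set_loglik(beta, case, controls) for case, controls in matched_sets)
```

At beta = 0 every 1:M set contributes -log(M + 1), and shifting all covariates in a set by the same amount leaves its contribution unchanged, which is the cancellation that makes CLR suitable for fine stratification.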



 

SEMINAR

DATE: Thursday, February 2, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Lan Kong, Assistant Professor, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh

TOPIC: Case-Cohort Analysis with Accelerated Failure Time Model

In a case-cohort design, covariates are assembled only for a subcohort randomly selected from the entire cohort and for any additional cases outside the subcohort. This design is appealing for large cohort studies of rare diseases, especially when the exposures of interest are expensive to ascertain for all subjects. We propose statistical methods for analyzing case-cohort data with a semiparametric accelerated failure time model, which interprets covariate effects as accelerating or decelerating the time to failure. Asymptotic properties of the proposed estimators are developed. The finite-sample properties of the case-cohort estimator and its efficiency relative to the full-cohort estimator are assessed via simulation studies. A real example from a study of cardiovascular disease illustrates the estimation procedure.
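The sampling scheme in the first sentence can be stated in a few lines of code. A minimal sketch, assuming simple random sampling of a fixed-size subcohort and a cohort given as a list of case indicators (the function name is hypothetical); it only selects who is measured and does not fit the accelerated failure time model.

```python
import random
from typing import Sequence, Set, Tuple

def case_cohort_sample(is_case: Sequence[bool], subcohort_size: int,
                       seed: int = 0) -> Tuple[Set[int], Set[int]]:
    """Select who gets the expensive covariates measured in a case-cohort design.

    Returns (subcohort, sampled): the subcohort is a simple random sample of
    the full cohort (cases and non-cases alike), and `sampled` adds every
    case that fell outside the subcohort.
    """
    rng = random.Random(seed)
    subcohort = set(rng.sample(range(len(is_case)), subcohort_size))
    cases = {i for i, c in enumerate(is_case) if c}
    return subcohort, subcohort | cases
```

The subcohort is returned separately from the full sample because case-cohort estimators generally treat subcohort members and outside cases differently (for example, through sampling weights).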



 

SEMINAR

DATE: Thursday, February 9, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Nicholas Lange, Associate Professor, Department of Biostatistics, Harvard School of Public Health

TOPIC: Pediatric Brain Development and Bayesian Model Averaging

In this talk, I will first describe an ongoing, nationwide longitudinal MRI study of brain and behavioral development in approximately N = 500 healthy, representative children aged 10 days to 19 years. I will then outline model selection strategies for numerous and potentially important covariates of interest in the absence of adequate neurodevelopmental theory or previous empirical evidence for guidance. Frequentist and Bayesian model averaging methods will be described and applied, and their results compared. My emphasis will be on the interpretability and generalizability of findings for effective communication with pediatric neurologists, psychologists and psychiatrists. No previous knowledge of human neuroanatomy, pediatric development or MRI is necessary.
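One common way to carry out Bayesian model averaging, not necessarily the one used in this study, approximates each candidate model's posterior probability from its BIC (assuming equal prior probabilities for all models) and averages a coefficient's estimates across models with those weights. A minimal sketch with hypothetical function names; the BIC values would come from fitting each candidate covariate set.

```python
import math
from typing import List, Sequence

def bic_weights(bics: Sequence[float]) -> List[float]:
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every candidate model and uses
    exp(-BIC / 2), shifted by the minimum BIC for numerical stability.
    """
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def model_average(estimates: Sequence[float], bics: Sequence[float]) -> float:
    """BIC-weighted average of one coefficient across candidate models.

    A model that excludes the covariate should contribute an estimate of 0.
    """
    return sum(w * e for w, e in zip(bic_weights(bics), estimates))
```

Two models with equal BIC get equal weight; a BIC difference of 2 corresponds to a posterior odds ratio of e under this approximation.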



 

SEMINAR

DATE: Thursday, February 23, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Matthias Schonlau, Statistician and Head, RAND Statistical Consulting Service, RAND Corporation

TOPIC: The Sinking of the Titanic: An Application of Interactive Visualization of Data

On her maiden voyage in 1912, the Titanic hit an iceberg and sank. Because of outdated laws, the Titanic did not carry enough lifeboats to evacuate all passengers. Following the social norm of the day, "women and children" were evacuated first, with the remaining seats in the lifeboats going to men. Do the data support this characterization of events? Did it extend to third-class passengers? We will explore data related to the sinking of the Titanic interactively using the freely available software package Mondrian. Rather than the data set available in R, we use a different source, from which we were also able to obtain information about who went into which boat.



 

SEMINAR

DATE: Thursday, March 2, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: David Dunson, Adjunct Associate Professor, Institute of Statistics and Decision Sciences, Duke University, and Senior Investigator, Biostatistics Branch, NIEHS

TOPIC: Bayesian Density Regression: Applications in Epidemiology

In biomedical studies, there is often interest in the relationship between one or more response variables and a set of predictors. Most regression models allow only one aspect of the distribution, such as the mean or median, to change with predictors. However, in many cases interest focuses on how the distribution evolves across the predictor space, and there may be unanticipated changes in shape. In fact, in epidemiologic studies, such changes in shape are a natural consequence of gene × environment interactions with unmeasured genetic factors. To address this problem, a nonparametric Bayesian approach is proposed that allows a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of regression models, with the mixture distribution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP) distributed innovation measure and neighboring random measures. Theoretical properties are outlined, an approach is developed for efficient computation, and the methods are illustrated through application to genetic and epidemiologic studies.
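The DP-distributed innovation measure mentioned above is a random discrete distribution, and a standard way to simulate one is the stick-breaking construction. A minimal sketch that truncates the construction at a fixed number of atoms; the truncation level, the base-measure sampler and the function name are assumptions for illustration, and this draws a plain DP, not the dependent prior proposed in the talk.

```python
import random
from typing import Callable, List, Tuple

def truncated_dp_draw(alpha: float, truncation: int,
                      base_sampler: Callable[[random.Random], float],
                      seed: int = 0) -> List[Tuple[float, float]]:
    """Approximate draw G ~ DP(alpha, G0) as a list of (weight, atom) pairs.

    Stick-breaking: V_k ~ Beta(1, alpha), w_k = V_k * prod_{l<k} (1 - V_l),
    atoms theta_k ~ G0. The final weight takes the leftover stick so the
    truncated weights sum to one.
    """
    rng = random.Random(seed)
    pairs, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        pairs.append((v * remaining, base_sampler(rng)))
        remaining *= 1.0 - v
    pairs.append((remaining, base_sampler(rng)))
    return pairs
```

For example, `truncated_dp_draw(1.0, 50, lambda r: r.gauss(0.0, 1.0))` uses a standard normal base measure. Smaller `alpha` concentrates the weight on fewer atoms.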


 

SEMINAR

DATE: Thursday, March 23, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Feng-shou Ko

TOPIC: Assessing the Effectiveness of Potential Longitudinal Biomarkers in Multivariate Survival Analysis

We develop an extension of a method proposed by Henderson et al. (2002) that combines the analysis of longitudinal and time-to-event information. In developing the joint likelihood function, we incorporated a frailty parameter into a semiparametric survival model. To compare our method with that of Henderson et al., we performed simulations to assess the power for detecting longitudinal biomarkers, generating three latent processes similar to those used in their work. Our results were: 1) higher correlations between the longitudinal biomarker values and survival time functions were associated with higher power of the score test; 2) for equal sample sizes, the score test had higher power with a relatively large number of subjects and a small number of time points than with a relatively small number of subjects and a large number of observed time points; and 3) the power of our method was somewhat higher than that of Henderson et al. To compare the methods further, we analyzed the liver cirrhosis data set presented in their paper and obtained results similar to theirs. Both methods showed that prothrombin is an effective surrogate for survival in liver cirrhosis patients.



 

SEMINAR

DATE: Thursday, March 23, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Jia Li, Biostatistics Graduate Student, University of Pittsburgh

TOPIC: A Comparison of Two Multiple Imputation Packages: MICE and MIX in Survival Analysis with Missing Covariates

Estimating the effects of potential prognostic factors is a common and important task when analyzing medical data, because the main goal of such studies is to distinguish the effects of a set of covariates on an outcome such as survival. However, missing data complicate the whole process. Multiple imputation is a general method, developed from a Bayesian perspective, for handling missing-data problems. It has the practical advantages of allowing standard complete-data methods of analysis to be used and of reflecting the uncertainty due to missing data in the models. In this article, we review two easy-to-use packages, MICE and MIX, that implement the multiple imputation procedure. We compare the performance of the packages with respect to bias, efficiency and robustness using several simulation studies in survival settings and data from the National Surgical Adjuvant Breast and Bowel Project (NSABP) Protocol B-06. Practical limitations and valuable features of the packages are also assessed in detail.
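Whichever package generates the imputations, the complete-data estimates from the m imputed data sets are usually combined with Rubin's rules, which is how the extra uncertainty due to missing data enters the final standard error. A minimal sketch of that pooling step for a single parameter; the function name is hypothetical, and this does not reproduce either package's own interface.

```python
import statistics
from typing import Sequence, Tuple

def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine one parameter's estimates from m imputed data sets (Rubin's rules).

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation (sample) variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)
    w = statistics.fmean(variances)
    b = statistics.variance(estimates)  # divisor m - 1
    return q_bar, w + (1.0 + 1.0 / m) * b
```

In a survival setting, `estimates` and `variances` would be, for instance, a log hazard ratio and its squared standard error from a Cox model fitted to each imputed data set.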



 

SEMINAR

DATE: Thursday, March 23, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Li Qin, Biostatistics Graduate Student, University of Pittsburgh

TOPIC: An Extension of a Latent Variable Model for Informative Intermittent Missing Data

In longitudinal studies, subjects are followed over time, so missing data are a frequent problem. We propose a latent variable model for informative intermittent missingness that extends Roy's (2003) latent dropout class model. In our model, the value of the latent variable is affected by the missingness pattern, and the latent variable also enters as a covariate in the model for the longitudinal response. In this way, the latent variable links the longitudinal response and the missingness process. In our model, the latent variable is continuous rather than categorical, and we assume that it follows a normal distribution with unit variance. To simplify the analysis of intermittent missingness patterns, we define two variables: one for the dropout time and the other for the number of missed time points before dropout. The EM algorithm is used to estimate the parameters of interest, and Gauss-Hermite quadrature is used to approximate the integral over the latent variable (Sammel et al., 1997). Standard errors of the parameter estimates are obtained from the inverse of the Fisher information matrix of the final marginal likelihood. The method is illustrated using data from a pediatric obesity study evaluating the effectiveness of a family-based intervention. We use generalized Pearson residuals to assess model fit.
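Because the latent variable is assumed normal with unit variance, each integral over it has the form E[f(b)] with b ~ N(0, 1), which Gauss-Hermite quadrature approximates as a weighted sum over fixed nodes. A minimal sketch, assuming a toy conditional model (responses independent and normal given b) in place of the speaker's joint model for the response and the missingness pattern; the function names are hypothetical.

```python
import numpy as np

def expect_over_std_normal(f, n_nodes: int = 20) -> float:
    """Approximate E[f(b)] for b ~ N(0, 1) by Gauss-Hermite quadrature.

    hermgauss integrates against exp(-x^2), so substitute b = sqrt(2) * x
    and divide by sqrt(pi). f must accept a NumPy array of nodes.
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return float(np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi))

def marginal_likelihood(y, beta0: float, gamma: float, sigma: float,
                        n_nodes: int = 20) -> float:
    """Marginal likelihood of one subject's responses under a toy model:
    given latent b ~ N(0, 1), responses are independent N(beta0 + gamma*b, sigma^2).
    """
    y = np.asarray(y, dtype=float)

    def conditional_likelihood(b):
        # One row per quadrature node, one column per observed response.
        resid = y[None, :] - (beta0 + gamma * b[:, None])
        dens = np.exp(-0.5 * (resid / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        return dens.prod(axis=1)

    return expect_over_std_normal(conditional_likelihood, n_nodes)
```

For a single response this toy model has a closed form (y ~ N(beta0, gamma^2 + sigma^2)), which is a convenient check on the quadrature. In an EM or direct-maximization routine the marginal likelihood would be computed per subject and summed on the log scale; the number of nodes is a tuning choice.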



 

SEMINAR

DATE: Thursday, April 6, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Hua Yun Chen, Ph.D., Associate Professor, Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago

TOPIC: Likelihood Robustification in Parametric Regression with Missing Data

In parametric regression with missing data, when covariates are missing, with or without auxiliary information, or when the outcome is missing with auxiliary information, additional semiparametric or parametric models are often required to carry out likelihood inference. To protect the regression parameter estimator from misspecification of the covariate models, or of the missing-data mechanism model in the alternative inverse-probability-weighted approach, Robins et al. proposed a doubly robust procedure for making inferences on the regression parameter. Their approach to finding the doubly robust estimating equations was based on the projection of the weighted estimating score. A general doubly robust estimator may have low efficiency, and computing the best doubly robust estimator, the locally efficient estimator, can be very challenging even for the simplest missing-data pattern when Robins et al.'s approach is followed. We propose an alternative representation of the semiparametric efficient score. Based on this representation, likelihood robustification, an approximation to the locally efficient semiparametric estimator, is relatively easy to compute, and the estimator obtained from the approximating score is doubly robust when data are missing at random. Asymptotic inference on the regression parameter based on the likelihood robustification estimator is also proposed. Simulation results show that estimates based on likelihood robustification perform well in finite samples.
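Double robustness is easiest to see in the simplest case: estimating the mean of an outcome that is missing at random given covariates. A minimal sketch of the standard augmented inverse-probability-weighted (AIPW) estimator, assuming the fitted response probabilities `pi` and outcome-model predictions `m` are supplied (the function name is hypothetical). It illustrates the doubly robust property only and is not the likelihood robustification estimator proposed in the talk.

```python
from typing import Sequence

def aipw_mean(y: Sequence[float], observed: Sequence[bool],
              pi: Sequence[float], m: Sequence[float]) -> float:
    """Doubly robust (augmented IPW) estimate of E[Y] when Y is missing at random.

    y        : outcomes; entries where observed is False are never used
    observed : response indicators R_i
    pi       : fitted P(R_i = 1 | X_i) from a missingness model
    m        : fitted E[Y_i | X_i] from an outcome model
    Each term is R*Y/pi - (R/pi - 1)*m; the average stays consistent if
    either the pi model or the m model is correctly specified.
    """
    n = len(observed)
    if not (len(y) == len(pi) == len(m) == n):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for yi, ri, pi_i, mi in zip(y, observed, pi, m):
        weighted = yi / pi_i if ri else 0.0
        total += weighted - (float(ri) / pi_i - 1.0) * mi
    return total / n
```

The correction term -(R/pi - 1)*m has mean zero when pi is correct, and it removes the bias of the weighted term when m is correct; either condition alone suffices.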



Seminar Speakers
Fall Term 2005

Kaifeng Lu, September 15, 2005
Yijian Huang, September 29, 2005
Cyrille Joutard, October 6, 2005
Xuelin Huang, October 20, 2005
Zhen Chen, November 3, 2005
Larry Wasserman, November 10, 2005
Hong Wang, December 8, 2005

SEMINAR

DATE: Thursday, September 15, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Kaifeng Lu, Biometrician, Merck Research Laboratories

TOPIC: Responder Analysis in Longitudinal Studies with Missing Data

In some longitudinal studies, researchers are interested in estimating the treatment difference in response rate at the primary time point, where the responder indicator is generated by dichotomizing an underlying continuous measurement at that time point according to a prespecified threshold. The underlying continuous measurement is often analyzed using a repeated measures model, and the responder status using a logistic regression model. In this article, we discuss two approaches to responder analysis when the underlying continuous measurement is missing because patients skip visits or drop out before completing the study. One approach imputes missing data from the repeated measures model for the underlying continuous measurement and fits the logistic regression model to the observed or imputed responder status. The other approach is based on an estimating equation and does not impute missing data. Large-sample properties of the resulting estimators are derived, and simulation studies are conducted to assess the performance of the proposed estimators.
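The imputation approach can be sketched for one treatment arm: impute each missing primary-time-point value from its predictive distribution, dichotomize at the threshold, and average the response rate over several imputed data sets. Several simplifications are assumptions of this sketch: the predictive means and SDs are taken as given (fitting the repeated measures model is not shown), parameter uncertainty is ignored, a plain proportion replaces the logistic regression, "above the threshold" is taken to mean response, and the function name is hypothetical.

```python
import random
from typing import Optional, Sequence

def imputed_response_rate(values: Sequence[Optional[float]],
                          pred_mean: Sequence[float],
                          pred_sd: Sequence[float],
                          threshold: float,
                          n_imputations: int = 20,
                          seed: int = 0) -> float:
    """Average response rate over imputed data sets at the primary time point.

    values    : continuous measurement at the primary time point (None if missing)
    pred_mean : each patient's predictive mean from a fitted repeated measures
                model (hypothetical input; fitting that model is not shown)
    pred_sd   : the matching predictive standard deviations
    A patient counts as a responder when the observed or imputed value
    exceeds the threshold (direction assumed for illustration).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_imputations):
        responders = 0
        for v, mu, sd in zip(values, pred_mean, pred_sd):
            value = v if v is not None else rng.gauss(mu, sd)
            responders += value > threshold
        rates.append(responders / len(values))
    return sum(rates) / len(rates)
```

The treatment difference would then be the rate computed for the treated arm minus the rate for the control arm; a standard error would need Rubin's rules or the large-sample theory developed in the talk.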



SEMINAR

DATE: Thursday, September 29, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Yijian (Eugene) Huang, PhD, Associate Professor, Department of Biostatistics, Emory University

TOPIC: Errors-in-Covariates Effect on Estimating Functions: Additivity in Limit and Nonparametric Correction

We consider Poisson, logistic, and Cox regressions when some covariates cannot be measured accurately but are contaminated with additive errors. Huang and Wang (1999, Technical report; 2000, JASA; 2001, JASA) showed that the slope parameters can be consistently estimated via nonparametric correction, without imposing distributional assumptions on either the underlying true covariates or the errors. However, certain instrumental variables, particularly replicated error-contaminated covariates, are required. In this talk, we show that the error effect is additive in the limit for suitably formulated estimating functions. This finding gives rise to a new nonparametric correction technique that accommodates a broad variety of internal and external error-assessment data. Simulations for Cox regression with external reliability data are conducted, and an application to an AIDS study is presented as an illustration. This talk is based on joint work with C. Y. Wang.



SEMINAR

DATE: Thursday, October 6, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Cyrille Joutard, Post-doctoral Fellow, Department of Statistics, Carnegie Mellon University

TOPIC: Grade of Membership Analysis of a Subset of Functional Disability Data

The rapid growth of the elderly population in the United States has given rise to increasing interest in the assessment of chronically disabled elderly people. The National Long Term Care Survey (NLTCS) is a longitudinal survey of the U.S. population aged 65 and older, with waves conducted in 1982, 1984, 1989, 1994, 1999 and 2004. We deal with a subset of 16 functional disability measures that Erosheva (2002) extracted from the analytic file of the NLTCS for the 1982, 1984, 1989 and 1994 waves. The disability measures consist of activities of daily living (ADL), which include basic activities of hygiene and personal care, and instrumental activities of daily living (IADL), which include basic activities necessary to reside in the community. For each ADL/IADL measure, individuals are classified as either healthy or disabled on that measure. We used the grade of membership (GoM) model to analyze these data.

The GoM model assumes a hierarchical latent structure for mixed membership in a set of extreme profiles (clusters).

This paper focuses on problems of model selection. We carry out the GoM analysis using MCMC methods to obtain the posterior distribution of the model parameters and use different criteria to select an optimal number of extreme profiles. Finally, we study variational approximation methods for estimating the model parameters and compare their model selection results with those obtained using MCMC methods.
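In the GoM model for binary items, each person has membership scores g_k (nonnegative, summing to one) over K extreme profiles, and the probability of being disabled on item j is the score-weighted mix of the pure profiles' probabilities, with items conditionally independent given the scores. A minimal sketch of that per-person likelihood; the notation `lam[k][j]` and the function names are assumptions for illustration, and the MCMC and variational fitting discussed in the talk are not shown.

```python
import math
from typing import List, Sequence

def gom_item_probs(g: Sequence[float], lam: Sequence[Sequence[float]]) -> List[float]:
    """P(disabled on item j | membership scores g) = sum_k g[k] * lam[k][j].

    g   : membership scores over K extreme profiles (nonnegative, sum to 1)
    lam : lam[k][j] = probability of disability on item j for pure profile k
    """
    n_items = len(lam[0])
    return [sum(g[k] * lam[k][j] for k in range(len(g))) for j in range(n_items)]

def gom_loglik(x: Sequence[int], g: Sequence[float], lam: Sequence[Sequence[float]]) -> float:
    """Log-likelihood of one person's 0/1 disability indicators given g,
    assuming the items are conditionally independent given the scores."""
    ll = 0.0
    for xj, pj in zip(x, gom_item_probs(g, lam)):
        ll += math.log(pj) if xj else math.log(1.0 - pj)
    return ll
```

A person with all membership in one profile reproduces that profile's item probabilities exactly; mixed membership interpolates between profiles, which is what distinguishes GoM from a latent class model.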



SEMINAR

DATE: Thursday, October 20, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Xuelin Huang, Assistant Professor, Biostatistics & Applied Math, University of Texas MD Anderson Cancer Center

TOPIC: Optimization of Multi-Stage Treatment Sequences for Recurrent Diseases

In the treatment of cancer, patients commonly receive a variety of sequential treatments. The initial treatments administered following diagnosis can vary, as can the subsequent salvage regimens given after disease recurrence. Because the initiation and choice of salvage treatments depend on the progress of the disease, the Cox proportional hazards model with time-dependent covariates cannot be used to make causal inferences about the effects of initial and salvage treatments on survival. Beyond the difficulty of estimating the effects of multi-stage treatment sequences, a further problem is how to identify optimal treatment strategies using sufficiently rich but non-randomized data. The work of Tsiatis, Robins, Murphy and colleagues is extended to solve this problem in the setting of recurrent diseases. Simulation studies are conducted to evaluate the methods, which are illustrated by a retrospective study of soft tissue sarcoma that motivated this research. Various dynamic and non-dynamic treatment strategies are evaluated and compared.



SEMINAR

DATE: Thursday, November 3, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Zhen Chen, Assistant Professor, Department of Biostatistics & Clinical Epidemiology, University of Pennsylvania

TOPIC: Modelling Developmental Toxicity Endpoints: A Causal Inference Approach

In a typical Segment II developmental toxicity study, pregnant dams are randomized into groups for treatment with a toxin during organogenesis. The pregnant dams are sacrificed near term, and the fetuses are examined for a variety of developmental endpoints. Current approaches to analyzing data from these Segment II studies take the observed differences in outcomes among live fetuses between dose groups as the effect of the toxin. Estimates from these practices usually lack a causal interpretation for two reasons. First, fetuses that survive to birth at a high dose level of the toxin might be more robust on average than those at a low dose level. As such, the set of fetuses that would survive to term in a high-dose group is likely not the same as the set that would survive to birth in a low-dose group. Second, litter size (the number of fetuses that develop and survive until birth) plays a mediating role in data from Segment II studies, since it is affected by the toxin on the one hand and affects the developmental endpoints on the other. As a result, failing to control for litter size likely produces bias.

In this paper, we use the potential outcomes framework to estimate the causal effect of a toxic agent on developmental endpoints. We define the causal toxic effect as the difference between the outcomes a fetus would have had if the dam in which it develops had been exposed to dose level Z = z* rather than dose level Z = z. In particular, we employ the principal stratification approach of Frangakis and Rubin (2002) and construct principal strata that are a function of the survival status of the fetuses. We propose various estimands of the effect and consider their identification and interpretation. We also accommodate correlation among fetuses within the same dam, for both the outcomes and the principal strata, by using random effects.
Bayesian procedures are developed to estimate the model and to make inference. We illustrate the proposed methodology through application to data from a developmental toxicity study of ethylene glycol in mice conducted by the National Toxicology Program.

This is joint work with M. R. Elliott and M. M. Joffe.



SEMINAR

DATE: Thursday, November 10, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Larry Wasserman, Professor, Department of Statistics, Carnegie Mellon University

TOPIC: Rodeo: Sparse Nonparametric Regression in High Dimensions

I will present a method for simultaneously performing bandwidth selection and variable selection in nonparametric regression. The method starts with a local linear estimator with large bandwidths and incrementally decreases the bandwidth in directions where the gradient of the estimator with respect to the bandwidth is large. When the unknown function satisfies a sparsity condition, the approach avoids the curse of dimensionality. The method, called rodeo (regularization of derivative expectation operator), conducts a sequence of hypothesis tests and is easy to implement. A modified version that replaces testing with soft thresholding may be viewed as solving a sequence of lasso problems. When applied in one dimension, the rodeo yields a method for choosing the locally optimal bandwidth.

Joint work with John Lafferty, Computer Science Department, Carnegie Mellon.
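The procedure described above can be sketched at a single target point x: start every bandwidth at a large value h0, and keep shrinking h_j by a factor beta while the derivative of the local linear estimate with respect to h_j is large relative to its standard deviation. Several details are assumptions of this sketch rather than the speaker's specification: a product Gaussian kernel, a known noise SD `sigma`, a finite-difference derivative, a threshold of s_j * sqrt(2 log n), and a floor `h_min` added to guarantee termination. The function names are hypothetical.

```python
import numpy as np

def local_linear_weights(X: np.ndarray, x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Weights l(h) such that the local linear estimate at x equals l(h) @ Y.

    Uses a product Gaussian kernel with one bandwidth per coordinate.
    """
    D = X - x                                          # offsets from the target point, (n, d)
    k = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))    # kernel weights, (n,)
    B = np.hstack([np.ones((X.shape[0], 1)), D])       # local design [1, X_i - x]
    BtW = B.T * k                                      # B'W, (d + 1, n)
    # Row 0 of (B'WB)^{-1} B'W maps Y to the fitted intercept, i.e. m_hat(x).
    return np.linalg.solve(BtW @ B, BtW)[0]

def rodeo(X, Y, x, sigma, h0=1.0, beta=0.9, h_min=0.05, eps=1e-4):
    """Greedy bandwidth shrinkage at a single point x (sketch of the rodeo idea).

    sigma is the noise SD, assumed known here. Returns (bandwidths, estimate).
    """
    n, d = X.shape
    h = np.full(d, float(h0))
    active = set(range(d))
    scale = np.sqrt(2.0 * np.log(n))                   # threshold multiplier
    while active:
        to_shrink = []
        for j in sorted(active):
            step = np.zeros(d)
            step[j] = eps
            # Central finite difference of the weights with respect to h_j.
            G = (local_linear_weights(X, x, h + step)
                 - local_linear_weights(X, x, h - step)) / (2.0 * eps)
            Z = G @ Y                                  # d m_hat / d h_j
            s = sigma * np.sqrt(G @ G)                 # SD of Z given the X's
            if abs(Z) > s * scale and beta * h[j] >= h_min:
                to_shrink.append(j)
            else:
                active.discard(j)                      # freeze this bandwidth
        for j in to_shrink:                            # update all active j together
            h[j] *= beta
    return h, float(local_linear_weights(X, x, h) @ Y)
```

Because a local linear fit reproduces constant and linear functions exactly, the derivative Z is essentially zero for such data and every bandwidth stays at h0; for a function that varies in only a few coordinates, the bandwidths of the irrelevant coordinates tend to stay large, which is how the method avoids the curse of dimensionality.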



SEMINAR

DATE: Thursday, December 8, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Hong Wang, Research Assistant Professor, Biostatistics, University of Pittsburgh Cancer Institute

TOPIC: Analysis of Smad3 Cell Migration

In this collaborative study, Smad3 cell growth and migration were tracked in wells with the cell culture imaging system in the Department of Radiation Oncology. A database was created from the images and queried to obtain cell growth and migration information. Stationary objects were removed from wells based on the distribution of their locations. Large objects were eliminated based on the distribution of object area in early scans. The cumulative distribution function of velocity was examined. Fixed effects ANOVA models and multiple comparison procedures were used to compare cell velocity at different times. Several multilevel mixed models were also considered. Three Smad3 cell lines (+/+, -/-, and -/-3) were studied and compared.
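The velocity summaries described above start from tracked positions. A minimal sketch, assuming each object's track is a list of (x, y) positions at scans taken `dt` time units apart; the data layout and function names are hypothetical, and the stationary-object filter below uses net displacement as a simple stand-in for the location-distribution rule used in the study.

```python
import math
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

def drop_stationary(tracks: Sequence[Sequence[Point]], min_displacement: float) -> List[Sequence[Point]]:
    """Keep only objects whose net displacement over the run reaches a cutoff.

    A simple stand-in for removing stationary objects; the study's own rule
    was based on the distribution of object locations.
    """
    return [t for t in tracks if math.dist(t[0], t[-1]) >= min_displacement]

def velocities(track: Sequence[Point], dt: float) -> List[float]:
    """Speed between consecutive scans taken dt time units apart."""
    return [math.dist(p, q) / dt for p, q in zip(track, track[1:])]

def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: F(v) = fraction of observed velocities <= v."""
    data = sorted(sample)
    return lambda v: bisect_right(data, v) / len(data)
```

Pooling `velocities` over the retained tracks of each cell line and comparing their `ecdf` curves gives the kind of distributional comparison described above, before any ANOVA or mixed-model analysis.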


© 2001-2005
Dept. of Biostatistics, University of Pittsburgh

Program Contact:
Registrar, biostat@pitt.edu

Webmaster:
Susan Grasky, BSIS




Department of Biostatistics, 130 Desoto Street, 311 Parran Hall,
Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261
Phone: (412) 624-3022 Fax: (412) 624-2183

Revised on March 17, 2006