BIOST 2025: Biostatistics Seminar Notices

Seminar Speakers Spring Term 2005

SEMINAR
DATE: Thursday, January 13, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Gong Tang, Department of Biostatistics, University of Pittsburgh
TOPIC: A pseudolikelihood method for analysis of multivariate data with nonignorable nonresponse

We consider multivariate regression analysis with missing data in the outcome variables, when the nonresponse mechanism depends on the underlying values of the responses and hence is nonignorable. Related problems include response-biased sampling where data are sampled with probability depending only on the univariate response. Standard approaches require specification of the missing-data mechanism, and misspecification often leads to biased estimates. If the marginal distribution of the covariates is known, all the regression parameters can be identified and estimated via a conditional likelihood method. Otherwise, a pseudolikelihood method, obtained by substituting a parametric or empirical estimator of the marginal distribution of the covariates into the conditional likelihood, can be applied. These methods yield consistent and asymptotically normal estimates and are robust in that specification of the missing-data mechanism is not necessary. This pseudolikelihood method is extended to analysis of multivariate missing data with a monotone pattern and in turn to analysis of longitudinal data with nonignorable dropouts where the dropout mechanism depends on the underlying values of the repeatedly measured outcomes. These methods are illustrated via simulation studies and analysis of a schizophrenia trial dataset.

SEMINAR
DATE: Thursday, February 3, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Guy Brock, Department of Human Genetics, University of Pittsburgh
TOPIC: Estimating significance levels in NPL scores for large pedigrees

The significance level of a non-parametric linkage statistic is often found by simulation since the distribution of the test statistic is complex and unknown. Ideally, simulation occurs by assigning founder genotypes and then "dropping" genotypes through the pedigree. This situation mimics the actual pedigree data, where IBD sharing information is not known with certainty. However, this approach is usually computationally infeasible for larger pedigrees which require MCMC techniques to calculate the statistic, as an additional MCMC run is required to estimate the statistic for each gene-drop. One alternative is to drop inheritance vectors rather than genotypes, but this assumes that IBD sharing is known unambiguously and results in an overestimate of the variance and therefore a conservative significance level. In this work, we propose a novel method to estimate the significance level of the statistic. This is accomplished by estimating the Markov chain variability using the first gene drop, and then estimating the variability due to gene-dropping by running shorter MCMC chains on the additional simulated datasets. The two estimates are combined to form an overall estimate of the NPL statistic variability.

SEMINAR
DATE: Thursday, February 17, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Debashis Ghosh, Assistant Professor, Department of Biostatistics, University of Michigan
TOPIC: Statistical Methods for Genomic Localization with Microarray Data

In many disease and scientific settings, understanding the spatial distribution of transcription is quite important. In a variety of settings, ranging from yeast biology to cancer, the role of the location of transcriptional events is important.
Developing such methods requires incorporating various biological features, such as variable gene density. In this talk, we discuss methods for the analysis of gene expression as a function of chromosomal position. Developing this methodology requires combining microarray data information with that on genomic location. The methods are illustrated with data from an ovarian cancer study. This is joint work with Al Levin, Kathy Cho and Sharon Kardia.

SEMINAR
DATE: Thursday, March 3, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Wei Pan, Division of Biostatistics, School of Public Health, University of Minnesota
TOPIC: Incorporating prior information via shrinkage: a combined analysis of genome-wide location data and gene expression data

Transcriptional control is a critical step in regulation of gene expression. Understanding such control on a genomic level involves deciphering the mechanisms and structures of regulatory programs and networks. A difficulty arises due to the weak signal and high noise in various sources of data, while most current approaches are limited to analysis of a single source of data. A natural alternative is to improve statistical efficiency and power by a combined analysis of multiple sources of data. Here we propose a shrinkage method to combine genome-wide location data and gene expression data to detect the binding sites or target genes of a transcription factor. Specifically, a prior "non-target" gene list is generated by analyzing the expression data, and then this information is incorporated into the subsequent binding data analysis via a shrinkage method. There is a Bayesian justification for this shrinkage method. Both simulated and real data were used to evaluate the proposed method. In simulation studies, the proposed method gives higher sensitivity and lower false discovery rate (FDR) in detecting the target genes.
In a real data example, the proposed method can reduce the estimated FDR and increase the power to detect the previously known target genes of a broad transcription regulator, leucine responsive regulatory protein (Lrp) in Escherichia coli. This method can also be used to incorporate other information, such as Gene Ontology (GO) information, into microarray data analysis to detect differentially expressed genes. This is joint work with Yang Xie, Keyong S. Jeong and Arkady Khodursky.

SEMINAR ENAR STUDENT PRESENTATION
DATE: Thursday, March 17, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Andriy Bandos, Department of Biostatistics, University of Pittsburgh
TOPIC: A Permutation Test Sensitive To Differences in Areas for Comparing ROC Curves from a Paired Design

The Area Under the ROC Curve (AUC) is a widely accepted summary index of the overall performance of diagnostic procedures, and the difference between AUCs is often used when comparing two diagnostic systems. We developed an exact non-parametric statistical procedure for comparing two ROC curves in paired design settings. The test, which is based on all permutations of the subject-specific rank ratings, is formally a test for equality of ROC curves that is sensitive to alternatives involving a difference in AUC. The operating characteristics of the proposed test were evaluated using extensive simulations over a wide range of parameters. The proposed procedure can be easily implemented in experimental ROC datasets. For small samples and for underlying parameters that are common in experimental studies in diagnostic imaging, the test possesses good operating characteristics and is more powerful than the conventional non-parametric procedure for AUC comparisons. We also derived an asymptotic version of the test which uses an exact estimate of the variance in the permutation space and provides a good approximation even when the sample sizes are small.
This asymptotic procedure is a simple and precise approximation to the exact test and is useful for large sample sizes where the exact test may be computationally burdensome.
KEY WORDS: Non-parametric procedure; Permutation test; Receiver operating characteristic (ROC) curve; Paired design.

SEMINAR ENAR STUDENT PRESENTATION
DATE: Thursday, March 17, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Gina D'Angelo, Department of Biostatistics, University of Pittsburgh
TOPIC: Likelihood-Based Approach for Truncated Covariates

Truncated and censored data methodology has been developed over the last 30 years with the focus on the outcome variable. This has led to the development of models such as the Tobit regression model for truncated data in addition to numerous models for censored data. Another commonly encountered problem is that of truncated covariate data. This type of data is generally observed in the laboratory setting, where the lower limit of detection of an assay is often observed. To address this problem, we propose a method to estimate the coefficients and their standard errors for a regression model with a left truncated covariate using a likelihood-based technique. This method will be compared to a standard method of filling in the truncated values with the lower threshold value. The application of this method is illustrated in a sepsis study conducted at the University of Pittsburgh. One aim of this study is to determine the relationship between severe sepsis status and measures of inflammation such as interleukin-6 and interleukin-10.

SEMINAR ENAR STUDENT PRESENTATION
DATE: Thursday, March 18, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Li Qin, Department of Biostatistics, University of Pittsburgh
TOPIC: A Latent Class Model for Longitudinal Binary Response Data with Nonignorable Missingness

Nonignorable missing data is a common problem in longitudinal studies. Latent class models are attractive for simplifying the modeling of missing data when the data are subject to either a monotone or intermittent missing data pattern. Roy (2004) proposed a latent class model for continuous data, in which the classes are related to the time of dropouts. In our study, we extend this approach to categorical data, dividing the observed data into two latent classes: a special class in which subjects definitely have '0' outcomes and a second one in which the outcomes can be modeled using logistic regression. In latent class models, the latent classes connect the longitudinal responses and the missingness process under the assumption of conditional independence. Thus the longitudinal responses and the missingness process are independent given the latent classes. The latent class model is also a special case of a pattern mixture model. Parameters are estimated by the method of maximum likelihood based on the above assumption and correlation between responses (le Cessie and van Houwelingen, 1994). This methodology is illustrated with a data set of weight concern in a smoking cessation study for women. For this study we compare the proposed method with a mixed effects model (Ten Have et al., 1998) and weighted GEE (Robins et al., 1995). The results show that our method and Ten Have's model are similar and differ from the weighted GEE model. Although the results obtained using the proposed method and Ten Have's model are similar, our method is simpler to implement and can also be used for intermittent missing data.

SEMINAR ENAR STUDENT PRESENTATION
DATE: Thursday, March 17, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Dianxu Ren, Department of Biostatistics, University of Pittsburgh
TOPIC: A Bayesian Adjustment for covariate misclassification with correlated binary outcome data

Estimates of association between an outcome variable and misclassified covariates tend to be biased when the usual methods of estimation that ignore the classification error are applied. Available methods to account for misclassification often require the use of a gold standard (i.e., a validation subsample). But in practice, a gold standard may be unavailable or impractical. We propose a Bayesian approach to adjust for misclassification in a binary covariate in logistic and random effect logistic models when a gold standard is not available. This Markov Chain Monte Carlo (MCMC) approach uses two imperfect measures of a dichotomous exposure under the assumption of conditional independence and non-differential misclassification. We illustrate the proposed approach to adjust for misclassification with respect to oxygenation status in a multi-center trial of patients with pneumonia. We validate the approach with a simulation study. Ignoring misclassification produces downwardly biased estimates and underestimates uncertainty.

SEMINAR
DATE: Thursday, March 24, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Henry Block, Department of Statistics, University of Pittsburgh
TOPIC: The Failure Rates of Mixtures

Mixtures of distributions of lifetimes occur in many settings. In engineering applications it is often the case that populations are heterogeneous, often with a small number of subpopulations. In survival analysis, selection effects can often occur. The concept of a failure rate in these settings becomes a complicated topic, especially when one attempts to interpret the shape as a function of time.
Even if the failure rates of the subpopulations of the mixture have simple geometric or parametric forms, the shape of the mixture is often not transparent. Recent results, developed by the author (with Joe, Li, Mi, Savits and Wondmagegnehu) in a series of papers, are presented. These results focus on general results concerning the asymptotic limit and eventual monotonicity of a mixture and also of the overall behavior for mixtures of specific parametric families. Connections between unimodality of densities and their mixtures and changes in monotonicity of the failure rates have recently been studied and these results will also be presented. An overall picture of the various things which influence the behavior of the failure rate of a mixture will be given.

SEMINAR
DATE: Thursday, March 31, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Kerby Shedden, Department of Statistics, University of Michigan
TOPIC: Intergene and Serial Correlations in Microarray Data: Implications for Inference

In microarray data sets, correlations between the expression levels of different genes exist for a number of technical and biological reasons. Such intergene correlations have a substantial impact on significance levels for many common microarray data analysis procedures. While permutation approaches are sometimes adequate for addressing this issue, potential problems arise in certain cases. I will illustrate this in the case of a meta-analysis procedure, and propose an alternative procedure that is not subject to this critique. I will also consider serial correlations that are present in timecourse microarray experiments, focusing on how serial and intergene correlations affect the sampling properties of several common analyses. To illustrate I will discuss the identification of differentially expressed genes during the cell cycle.

SEMINAR
DATE: Thursday, April 14, 2005
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jane Fridlyand, Department of Epidemiology and Biostatistics, UCSF
TOPIC: Application of copy number transitions finder to the analysis of the tumor copy number data and to mapping sequence variations in mice using BAC array CGH

This talk will discuss two different applications of microarray-based comparative genomic hybridization (array CGH). In the first part of the talk we will discuss detection and characterization of the genomic alterations in tumors, and in the second part we will describe a study we conducted on laboratory mice in search of small/large scale sequence variation. The development of solid tumors is associated with acquisition of complex genetic alterations, indicating that failures in the mechanisms that maintain the integrity of the genome contribute to tumor evolution. Thus, one expects that the particular types of genomic derangement seen in tumors reflect underlying failures in maintenance of genetic stability, as well as selection for changes that provide growth advantage. The computational task is to map and characterize the number and types of copy number alterations present in the tumors, and so define copy number phenotypes as well as to associate them with known biological markers. We discuss computational approaches leading to copy number phenotypes and the application of the results to testing and classification. We also introduce a connection between copy number and expression subtypes. In the second part of the talk we describe a study we conducted on laboratory mice. In this study we used BAC arrays to map sequence variation among several inbred and outbred mouse strains. We have identified a number of autosomal loci of copy number variation and have shown that these variant loci distinguish laboratory strains.
Additionally, we have shown that small ratio changes detected using copy number finder distinguish homozygous and heterozygous regions of the genome in interspecific backcross mice, providing an efficient method for genotyping progeny of backcrosses.

Seminar Speakers Fall Term 2004

SEMINAR
DATE: Thursday, September 9, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: John Storey, Department of Biostatistics, University of Washington
TOPIC: A Significance Method for Time Course Microarray Experiments Applied to Two Human Studies

A common goal in microarray experiments is to identify genes that are differentially expressed among two or more biological conditions. There is currently no standard methodology for detecting differential expression in time course studies. However, it is clear that monitoring the behavior of gene expression over time is important and will be a common experimental design in the future. Here we present a general statistical significance method for detecting temporal differential expression that can be applied to the typical types of comparisons and sampling schemes. We apply this method to two studies that we have carried out on humans. The goal of one study is to identify genes showing temporal differential expression between controls and endotoxin-treated individuals, and the other is to identify genes that show aging effects in the kidney.

SEMINAR
DATE: Thursday, September 23, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jason Fine, Department of Statistics, Department of Biostatistics & Medical Informatics, University of Wisconsin, Madison
TOPIC: Cumulative Incidence Regression, with Application to Cancer Clinical Trials

With explanatory covariates, the standard analysis for competing risks data involves modeling the cause-specific hazard functions via a proportional hazards assumption.
Unfortunately, the cause-specific hazard function does not have a direct interpretation in terms of survival probabilities for the particular failure type. In recent years, many clinicians have begun using the cumulative incidence function -- the marginal failure probabilities for a particular cause -- which is intuitively appealing and more easily explained to the non-statistician. The cumulative incidence is especially relevant in decision analyses in which the survival probabilities are needed to determine treatment utility. Previously, researchers have considered methods for combining estimates of the cause-specific hazard functions under the proportional hazards formulation. However, these methods do not allow the analyst to assess the net effect of a covariate on the cumulative incidence function. In this talk, we discuss an alternative semiparametric modelling strategy in which the cumulative incidence is modelled directly, as one ordinarily models the survival function. Inference procedures are developed for covariate effects, separately from baseline failure probabilities, similarly to standard Cox model analyses. A uniformly consistent estimator for the predicted cumulative incidence for an individual with certain covariates is given, and confidence intervals and bands can be obtained analytically or with a resampling technique. To contrast the different modelling frameworks, data from a breast cancer clinical trial are analyzed using both approaches.

SEMINAR
DATE: Thursday, September 30, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Stanley Lemeshow, Dean, School of Public Health, Ohio State University
TOPIC: Assessing Scale of Continuous Covariates in Logistic Regression Modeling

Interval and ratio scale data present a challenge in logistic regression modeling because they should not be included in their original, continuous, form unless linearity in the logit holds.
This talk will discuss three methods for assessing the scale of continuous covariates: quartile dummy variables, lowess plots and fractional polynomials. These methods can help determine whether the variable should be entered "as is" into the model, whether a transformation of the variable should be used, or whether categories should be established and included in the model through the use of dummy variables. Application to real data will be presented and discussed.

SEMINAR
DATE: Friday, October 8, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Haiyan Huang, Department of Statistics, University of California at Berkeley
TOPIC: Characterizing short DNA-binding motifs

The recognition of regulatory motifs is challenging as the regulatory binding sites are often short and fuzzy. Here I will introduce several statistical methods we developed for characterizing short DNA-binding sites, in which we have taken into account the evolutionary context and dependence among positions. These newly developed motif models have been found to be advantageous in many applications.

SEMINAR
DATE: Thursday, October 28, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Jeanne Kowalski, Division of Biostatistics, Johns Hopkins University
TOPIC: Nonparametric, Hypothesis-based Analysis of Molecular Heterogeneity for Comparative Phenotype Characterization

In this talk, I describe two novel, inference-based approaches to analysis of molecular heterogeneity associated with phenotypes. A common theme among them is the construction of testable hypotheses in a very high-dimensional setting, based on developed U-statistic theory, with nonparametric inference. With a modest sample, I discuss a distance-based approach for analysis of sequence heterogeneity.
In the extreme case of several single, high-dimensional samples that are to be compared from a microarray experiment, I introduce a class of stochastic linear hypotheses that includes the Mann-Whitney Wilcoxon rank sum test as a special case. In each setting, I discuss the statistical and bioinformatic approaches developed to characterize either genes within a genome or locations within a sequence that depict groups of similar phenotype. As motivation, I examine two separate problems, one for relating sequence heterogeneity in a region of the HIV genome to drug resistance, and a second for relating gene expressions to hypothesized pathways for immunogenetic analysis of T cells.

SEMINAR
DATE: Thursday, November 11, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Zhiqiang Tan, Assistant Professor, Department of Biostatistics, Johns Hopkins University
TOPIC: A distributional approach for causal inference using the propensity score

Drawing inferences about the effects of exposures or treatments is a common challenge in many scientific fields. We propose two new methods serving complementary purposes in causal inference. One can be used to estimate average causal effects, assuming "no confounding" given measured covariates. The other can be used to assess the sensitivity of the estimates to possible departures from "no confounding". We establish asymptotic results for the methods, and also address practical issues in planning data analysis, checking propensity score models, and interpreting sensitivity parameters. Both methods are developed from a nonparametric likelihood perspective. We illustrate the methods by analyzing the data from an observational study on right heart catheterization. The talk will be based on the working paper, "Efficient and Robust Causal Inference: A Distributional Approach", available at http://www.bepress.com/jhubiostat/paper48.

SEMINAR
DATE: Thursday, November 18, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Chiara Sabatti, Assistant Professor, Department of Statistics, University of California at Los Angeles
TOPIC: Gene regulatory networks and blind deconvolution

The problem of reconstructing gene regulatory networks in simple systems such as E. coli bears some resemblance to a blind deconvolution one: the concentrations of active forms of regulatory proteins are unobserved sources and the transcript levels of genes are observed outputs. In general, it is not known which source influences which output; genome sequence data can be used to obtain information on the presence of binding sites in the region upstream of genes, providing some indication of the set of genes controlled by a given regulatory protein. The control strength of a regulatory protein on target genes varies across genes, and across chemical conditions of the cell. It is of interest to reconstruct both the concentration of active forms of the regulatory proteins, and their control strength, in a series of conditions on the basis of measurements of gene transcript levels, obtained with microarrays. We describe a Bayesian model for this system and illustrate its estimation via MCMC.

SEMINAR
DATE: Thursday, December 2, 2004
TIME: 3:30 p.m.
PLACE: A-115 Crabtree Hall, Graduate School of Public Health
SPEAKER: Wesley Thompson, Assistant Professor, Department of Statistics, University of Pittsburgh
TOPIC: An Introduction To Functional Data Analysis With Some Applications

Functional data analysis (FDA) is a relatively new body of statistical methodologies in which the observations consist of continuous functions, often of time. FDA can be viewed as a fusion of longitudinal data analysis and smoothing techniques; it tends to take a nonparametric approach which is well-suited to exploratory data analysis.
This talk will provide some background and a brief introduction to FDA. Additionally, some applications of FDA techniques to psychiatric studies will be presented.