University of Pittsburgh

BIOST 2025: Biostatistics Seminar Notices

Seminar Notices Spring Term 2005
Seminar Notices Fall Term 2004


Seminar Speakers
Spring Term 2005

Gong Tang, January 13, 2005
Guy Brock, February 3, 2005
Debashis Ghosh, February 17, 2005
Wei Pan, March 3, 2005
Andriy Bandos, March 17, 2005
Gina D'Angelo, March 17, 2005
Li Qin, March 17, 2005
Dianxu Ren, March 17, 2005
Henry Block, March 24, 2005
Kerby Shedden, March 31, 2005
Jane Fridlyand, April 14, 2005


SEMINAR

DATE: Thursday, January 13, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Gong Tang, Department of Biostatistics, University of Pittsburgh

TOPIC: A pseudolikelihood method for analysis of multivariate data
with nonignorable nonresponse

We consider multivariate regression analysis with missing data in the outcome variables, when the nonresponse mechanism depends on the underlying values of the responses and hence is nonignorable. Related problems include response-biased sampling, where data are sampled with probability depending only on the univariate response. Standard approaches require specification of the missing-data mechanism, and misspecification often leads to biased estimates. If the marginal distribution of the covariates is known, all the regression parameters can be identified and estimated via a conditional likelihood method. Otherwise, a pseudolikelihood method, obtained by substituting a parametric or empirical estimator of the marginal distribution of the covariates into the conditional likelihood, can be applied. These methods yield consistent and asymptotically normal estimates and are robust in that specification of the missing-data mechanism is not necessary. This pseudolikelihood method is extended to analysis of multivariate missing data with a monotone pattern and in turn to analysis of longitudinal data with nonignorable dropouts where the dropout mechanism depends on the underlying values of the repeatedly-measured outcomes. These methods are illustrated via simulation studies and analysis of a schizophrenia trial dataset.



SEMINAR

DATE: Thursday, February 3, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Guy Brock, Department of Human Genetics, University of Pittsburgh

TOPIC: Estimating significance levels in NPL scores for large pedigrees

The significance level of a non-parametric linkage statistic is often found by simulation since the distribution of the test statistic is complex and unknown. Ideally, simulation occurs by assigning founder genotypes and then 'dropping' genotypes through the pedigree. This situation mimics the actual pedigree data, where IBD sharing information is not known with certainty. However, this approach is usually computationally infeasible for larger pedigrees which require MCMC techniques to calculate the statistic, as an additional MCMC run is required to estimate the statistic for each gene-drop. One alternative is to drop inheritance vectors rather than genotypes, but this assumes that IBD sharing is known unambiguously and results in an overestimate of the variance and therefore a conservative significance level. In this work, we propose a novel method to estimate the significance level of the statistic. This is accomplished by estimating the Markov chain variability using the first gene drop, and then estimating the variability due to gene-dropping by running shorter MCMC chains on the additional simulated datasets. The two estimates are combined to form an overall estimate of the NPL statistic variability.



SEMINAR

DATE: Thursday, February 17, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Debashis Ghosh, Assistant Professor, Department of Biostatistics, University of Michigan

TOPIC: Statistical Methods for Genomic Localization with Microarray Data

In many disease and scientific settings, understanding the spatial
distribution of transcription is quite important. In a variety of settings,
ranging from yeast biology to cancer, the role of the location of
transcriptional events is important. Developing such methods requires
incorporating various biological features, such as variable gene density.
In this talk, we discuss methods for the analysis of gene expression as a
function of chromosomal position. Developing this methodology requires
combining microarray data information with that on genomic location. The
methods are illustrated with data from an ovarian cancer study.
This is joint work with Al Levin, Kathy Cho and Sharon Kardia.



SEMINAR

DATE: Thursday, March 3, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Wei Pan, Division of Biostatistics, School of Public Health, University of Minnesota

TOPIC: Incorporating prior information via shrinkage:
a combined analysis of genome-wide location data and gene expression data

Transcriptional control is a critical step in regulation of gene expression. Understanding such control at the genomic level involves deciphering the mechanisms and structures of regulatory programs and networks. A difficulty arises from the weak signal and high noise in the various sources of data, while most current approaches are limited to analysis of a single source of data. A natural alternative is to improve statistical efficiency and power by a combined analysis of multiple sources of data. Here we propose a shrinkage method to combine genome-wide location data and gene expression data to detect the binding sites or target genes of a transcription factor. Specifically, a prior "non-target" gene list is generated by analyzing the expression data, and then this information is incorporated into the subsequent binding data analysis via a shrinkage method. There is a Bayesian justification for this shrinkage method. Both simulated and real data were used to evaluate the proposed method. In simulation studies, the proposed method gives higher sensitivity and a lower false discovery rate (FDR) in detecting the target genes. In a real data example, the proposed method can reduce the estimated FDR and increase the power to detect the previously known target genes of a broad transcription regulator, leucine responsive regulatory protein (Lrp), in Escherichia coli. This method can also be used to incorporate other information, such as Gene Ontology (GO) information, into microarray data analysis to detect differentially expressed genes. This is joint work with Yang Xie, Keyong S. Jeong and Arkady Khodursky.



SEMINAR

ENAR STUDENT PRESENTATION


DATE: Thursday, March 17, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Andriy Bandos, Department of Biostatistics, University of Pittsburgh

TOPIC: A Permutation Test Sensitive To Differences in Areas for Comparing ROC Curves from a Paired Design

The Area Under the ROC Curve (AUC) is a widely accepted summary index of the overall performance of diagnostic procedures, and the difference between AUCs is often used when comparing two diagnostic systems. We developed an exact non-parametric statistical procedure for comparing two ROC curves in paired design settings. The test, which is based on all permutations of the subject-specific rank ratings, is formally a test for equality of ROC curves that is sensitive to the alternatives of AUC difference. The operating characteristics of the proposed test were evaluated using extensive simulations over a wide range of parameters.
The proposed procedure can be easily implemented in experimental ROC datasets. For small samples and for underlying parameters that are common in experimental studies in diagnostic imaging, the test possesses good operating characteristics and is more powerful than the conventional non-parametric procedure for AUC comparisons. We also derived an asymptotic version of the test which uses an exact estimate of the variance in the permutation space and provides a good approximation even when the sample sizes are small. This asymptotic procedure is a simple and precise approximation to the exact test and is useful for large sample sizes where the exact test may be computationally burdensome.
KEY WORDS: Non-parametric procedure; Permutation test; Receiver operating characteristic (ROC) curve; Paired design.
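
The general idea of a paired permutation test on AUC differences can be sketched as follows. This is a simplified illustration and not the speaker's exact procedure: it assumes one rating per subject per modality, converts ratings to midranks within each modality, and enumerates all 2**n within-subject exchanges, so it is practical only for small n. The function names are hypothetical.

```python
from itertools import product


def auc(diseased, healthy):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def paired_auc_permutation_test(ratings_a, ratings_b, is_diseased):
    """Return (observed AUC difference, exact two-sided permutation p-value).

    Assumption of this sketch: under the null hypothesis the two ranks of a
    subject are exchangeable between modalities A and B.
    """
    ranks_a, ranks_b = midranks(ratings_a), midranks(ratings_b)
    n = len(is_diseased)

    def auc_difference(a, b):
        dis_a = [a[i] for i in range(n) if is_diseased[i]]
        hea_a = [a[i] for i in range(n) if not is_diseased[i]]
        dis_b = [b[i] for i in range(n) if is_diseased[i]]
        hea_b = [b[i] for i in range(n) if not is_diseased[i]]
        return auc(dis_a, hea_a) - auc(dis_b, hea_b)

    observed = auc_difference(ranks_a, ranks_b)
    extreme = 0
    total = 0
    # Enumerate every way of exchanging the two ranks within each subject.
    for swaps in product((False, True), repeat=n):
        a = [ranks_b[i] if s else ranks_a[i] for i, s in enumerate(swaps)]
        b = [ranks_a[i] if s else ranks_b[i] for i, s in enumerate(swaps)]
        if abs(auc_difference(a, b)) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total
```

Calling `paired_auc_permutation_test(ratings_a, ratings_b, is_diseased)` with three equal-length lists returns the observed difference and its permutation p-value; for larger samples the enumeration would have to be replaced by random sampling of exchanges or by an asymptotic approximation such as the one the abstract mentions.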



SEMINAR

ENAR STUDENT PRESENTATION

DATE: Thursday, March 17, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Gina D'Angelo, Department of Biostatistics, University of Pittsburgh

TOPIC: Likelihood-Based Approach for Truncated Covariates

Truncated and censored data methodology has been developed for the last 30 years with the focus on the outcome variable. This has led to the development of models such as the Tobit regression model for truncated data in addition to numerous models for censored data. Another commonly encountered problem is that of truncated covariate data. This type of data is generally observed in the laboratory setting, where the lower limit of detection of an assay is often observed. To address this problem, we propose a method to estimate the coefficients and their standard errors for a regression model with a left truncated covariate using a likelihood-based technique. This method will be compared to a standard method of filling in the truncated values with the lower threshold value. The application of this method is illustrated in a sepsis study conducted at the University of Pittsburgh. One aim of this study is to determine the relationship between severe sepsis status and measures of inflammation such as interleukin-6 and interleukin-10.
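
One generic way to write a likelihood of this kind is shown below. It is an illustrative formulation rather than necessarily the speaker's: the outcome density f with parameters β, the covariate density g with parameters θ, and the detection limit c are all notation assumed here.

```latex
L_i(\beta,\theta) =
\begin{cases}
f(y_i \mid x_i;\beta)\, g(x_i;\theta),
  & x_i \ge c \quad \text{(covariate observed)},\\[4pt]
\displaystyle\int_{-\infty}^{c} f(y_i \mid x;\beta)\, g(x;\theta)\, dx,
  & x_i < c \quad \text{(covariate below the detection limit)},
\end{cases}
\qquad
L(\beta,\theta) = \prod_{i=1}^{n} L_i(\beta,\theta).
```

In this notation, the fill-in comparator described in the abstract corresponds to using f(y_i | c; β) for every subject whose covariate falls below the limit, instead of integrating over the unobserved value.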



SEMINAR

ENAR STUDENT PRESENTATION

DATE: Thursday, March 17, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Li Qin, Department of Biostatistics, University of Pittsburgh

TOPIC: A Latent Class Model for Longitudinal Binary Response Data with
Nonignorable Missingness

Nonignorable missing data is a common problem in longitudinal studies. Latent class models are attractive for simplifying the modeling of missing data when the data are subject to either a monotone or intermittent missing data pattern. Roy (2004) proposed a latent class model for continuous data, in which the classes are related to the time of dropouts. In our study, we extend this approach to categorical data, dividing the observed data into two latent classes: a special class in which subjects definitely have '0' outcomes and a second one in which the outcomes can be modeled using logistic regression. In latent class models, the latent classes connect the longitudinal responses and the missingness process under the assumption of conditional independence. Thus the longitudinal responses and the missingness process are independent given the latent classes. The latent class model is also a special case of a pattern mixture model. Parameters are estimated by the method of maximum likelihood based on the above assumption and correlation between responses (le Cessie and van Houwelingen, 1994). This methodology is illustrated with a data set of weight concern in a smoking cessation study for women. For this study we compare the proposed method with a mixed effects model (Ten Have et al., 1998) and weighted GEE (Robins et al., 1995). The results show that our method and Ten Have's model are similar and differ from the weighted GEE model. Although the results obtained using the proposed method and Ten Have's model are similar, our method is simpler to implement and can also be used for intermittent missing data.



SEMINAR

ENAR STUDENT PRESENTATION

DATE: Thursday, March 17, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Dianxu Ren, Department of Biostatistics, University of Pittsburgh

TOPIC: A Bayesian Adjustment for covariate misclassification with correlated binary outcome data

Estimates of association between an outcome variable and misclassified covariates tend to be biased when the usual methods of estimation that ignore the classification error are applied. Available methods to account for misclassification often require the use of a gold standard (i.e., a validation subsample). But in practice, a gold standard may be unavailable or impractical. We propose a Bayesian approach to adjust for misclassification in a binary covariate in logistic and random effect logistic models when a gold standard is not available. This Markov Chain Monte Carlo (MCMC) approach uses two imperfect measures of a dichotomous exposure under the assumption of conditional independence and non-differential misclassification. We illustrate the proposed approach to adjust for misclassification with respect to oxygenation status in a multi-center trial of patients with pneumonia. We validate the approach with a simulation study. Ignoring misclassification produces downwardly biased estimates and underestimates uncertainty.



SEMINAR

DATE: Thursday, March 24, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Henry Block, Department of Statistics, University of Pittsburgh

TOPIC: The Failure Rates of Mixtures

Mixtures of distributions of lifetimes occur in many settings. In engineering applications it is often the case that populations are heterogeneous, often with a small number of subpopulations. In survival analysis, selection effects can often occur. The concept of a failure rate in these settings becomes a complicated topic, especially when one attempts to interpret the shape as a function of time. Even if the failure rates of the subpopulations of the mixture have simple geometric or parametric forms, the shape of the mixture is often not transparent.

Recent results, developed by the author (with Joe, Li, Mi, Savits and Wondmagegnehu) in a series of papers, are presented. These results focus on general results concerning the asymptotic limit and eventual monotonicity of a mixture and also of the overall behavior for mixtures of specific parametric families.

Connections between unimodality of densities and their mixtures and changes in monotonicity of the failure rates have recently been studied and these results will also be presented.

An overall picture of the various things which influence the behavior of the failure rate of a mixture will be given.
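
As a small illustration of why the shape of a mixture failure rate is not transparent, consider a mixture of exponential lifetimes. Each component has a constant failure rate, yet the mixture failure rate h(t) = f(t) / S(t) decreases over time and approaches the smallest component rate. The sketch below uses hypothetical weights and rates chosen only for illustration; it is not taken from the papers referred to above.

```python
import math


def mixture_failure_rate(t, weights, rates):
    """Failure rate h(t) = f(t) / S(t) of a mixture of exponential lifetimes.

    weights: mixing proportions (assumed to sum to 1).
    rates:   constant failure rate of each exponential subpopulation.
    """
    density = sum(p * lam * math.exp(-lam * t) for p, lam in zip(weights, rates))
    survival = sum(p * math.exp(-lam * t) for p, lam in zip(weights, rates))
    return density / survival


# Hypothetical two-component population: half fail at rate 1, half at rate 3.
weights = [0.5, 0.5]
rates = [1.0, 3.0]
for t in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(t, mixture_failure_rate(t, weights, rates))
```

At time zero the mixture rate equals the weighted average of the component rates; as the frailer subpopulation dies out, the survivors are increasingly drawn from the stronger one, which is the selection effect mentioned in the abstract.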



SEMINAR

DATE: Thursday, March 31, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Kerby Shedden, Department of Statistics, University of Michigan

TOPIC: Intergene and Serial Correlations in Microarray Data: Implications for Inference

In microarray data sets, correlations between the expression levels of
different genes exist for a number of technical and biological reasons.
Such intergene correlations have a substantial impact on significance levels
for many common microarray data analysis procedures. While permutation
approaches are sometimes adequate for addressing this issue, potential
problems arise in certain cases. I will illustrate this in the case of a
meta-analysis procedure, and propose an alternative procedure that is not
subject to this critique.
I will also consider serial correlations that are present in timecourse
microarray experiments, focusing on how serial and intergene correlations
affect the sampling properties of several common analyses. To illustrate I
will discuss the identification of differentially expressed genes during the
cell cycle.



SEMINAR

DATE: Thursday, April 14, 2005

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Jane Fridlyand, Department of Epidemiology and Biostatistics, UCSF

TOPIC: Application of copy number transitions finder to the analysis of the tumor copy number data and to mapping sequence variations in mice using BAC array CGH

This talk will discuss two different applications of microarray-based comparative genomic hybridization (array CGH). In the first part of the talk we will discuss detection and characterization of the genomic alterations in tumors, and in the second part we will describe a study we conducted on laboratory mice in search of small- and large-scale sequence variation.

The development of solid tumors is associated with acquisition of complex genetic alterations, indicating that failures in the mechanisms that maintain the integrity of the genome contribute to tumor evolution. Thus, one expects that the particular types of genomic derangement seen in tumors reflect underlying failures in maintenance of genetic stability, as well as selection for changes that provide growth advantage. The computational task is to map and characterize the number and types of copy number alterations present in the tumors, and so define copy number phenotypes as well as to associate them with known biological markers. We discuss computational approaches leading to copy number phenotypes and the application of the results to testing and classification. We also introduce a connection between copy number and expression subtypes.

In the second part of the talk we describe a study we conducted on laboratory mice. In this study we used BAC arrays to map sequence variation among several inbred and outbred mouse strains. We have identified a number of autosomal loci of copy number variation and have shown that these variant loci distinguish laboratory strains. Additionally, we have shown that small ratio changes detected using copy number finder distinguish homozygous and heterozygous regions of the genome in interspecific backcross mice, providing an efficient method for genotyping progeny of backcrosses.



Seminar Speakers
Fall Term 2004

John Storey, September 9, 2004
Jason Fine, September 23, 2004
Stanley Lemeshow, September 30, 2004
Haiyan Huang, October 8, 2004
Jeanne Kowalski, October 28, 2004
Zhiqiang Tan, November 11, 2004
Chiara Sabatti, November 18, 2004
Wesley Thompson, December 2, 2004


SEMINAR

DATE: Thursday, September 9, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: John Storey, Department of Biostatistics, University of Washington

TOPIC: A Significance Method for Time Course Microarray Experiments
Applied to Two Human Studies

A common goal in microarray experiments is to identify genes that are differentially expressed among two or more biological conditions. There is currently no standard methodology for detecting differential expression in time course studies. However, it is clear that monitoring the behavior of gene expression over time is important and will be a common experimental design in the future. Here we present a general statistical significance method for detecting temporal differential expression that can be applied to the typical types of comparisons and sampling schemes. We apply this method to two studies that we have carried out on humans. The goal of one study is to identify genes showing temporal differential expression between controls and endotoxin-treated individuals, and the other is to identify genes that show aging effects in the kidney.



SEMINAR

DATE: Thursday, September 23, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Jason Fine, Department of Statistics, Department of Biostatistics & Medical Informatics, University of Wisconsin, Madison

TOPIC: Cumulative Incidence Regression, with Application to Cancer Clinical Trials

With explanatory covariates, the standard analysis for competing risks data involves modeling the cause-specific hazard functions via a proportional hazards assumption. Unfortunately, the cause-specific hazard function does not have a direct interpretation in terms of survival probabilities for the particular failure type. In recent years, many clinicians have begun using the cumulative incidence function -- the marginal failure probabilities for a particular cause -- which is intuitively appealing and more easily explained to the non-statistician. The cumulative incidence is especially relevant in decision analyses in which the survival probabilities are needed to determine treatment utility. Previously, researchers have considered methods for combining estimates of the cause-specific hazard functions under the proportional hazards formulation. However, these methods do not allow the analyst to assess the net effect of a covariate on the cumulative incidence function. In this talk, we discuss an alternative semiparametric modelling strategy in which the cumulative incidence is modelled directly, as one ordinarily models the survival function. Inference procedures are developed for covariate effects, separately from baseline failure probabilities, similarly to standard Cox model analyses. A uniformly consistent estimator for the predicted cumulative incidence for an individual with certain covariates is given, and confidence intervals and bands can be obtained analytically or with a resampling technique. To contrast the different modelling frameworks, data from a breast cancer clinical trial are analyzed using both approaches.
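
For readers unfamiliar with the cumulative incidence function itself, the sketch below computes the standard nonparametric estimate for one cause without covariates. It is not the semiparametric regression model of the talk. The data layout is an assumption of the example (cause code 0 means censored), and subjects censored at an event time are treated as still at risk at that time.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence estimate for one failure cause.

    times:  observed time for each subject.
    causes: 0 if censored, otherwise a positive integer failure type.
    Returns a list of (time, estimate) pairs at each distinct failure time.
    """
    n_at_risk = len(times)
    overall_survival = 1.0  # probability of no failure of any type so far
    estimate = 0.0
    curve = []
    for t in sorted(set(times)):
        at_t = [c for u, c in zip(times, causes) if u == t]
        failures_any = sum(1 for c in at_t if c != 0)
        failures_k = sum(1 for c in at_t if c == cause_of_interest)
        if failures_any > 0:
            # Probability of surviving all causes up to t, then failing
            # from the cause of interest at t.
            estimate += overall_survival * failures_k / n_at_risk
            overall_survival *= 1.0 - failures_any / n_at_risk
            curve.append((t, estimate))
        n_at_risk -= len(at_t)  # failures and censorings both leave the risk set
    return curve
```

Unlike one minus a Kaplan-Meier curve that treats the competing failures as censored, the estimates for the different causes add up to the overall failure probability, which is what makes them interpretable as marginal probabilities.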



SEMINAR

DATE: Thursday, September 30, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Stanley Lemeshow, Dean, School of Public Health, Ohio State University

TOPIC: Assessing Scale of Continuous Covariates in Logistic Regression Modeling

Interval and ratio scale data present a challenge in logistic regression modeling because they should not be included in their original, continuous, form unless linearity in the logit holds. This talk will discuss three methods for assessing the scale of continuous covariates: quartile dummy variables, lowess plots and fractional polynomials. These methods can help determine whether the variable should be entered "as is" into the model, whether a transformation of the variable should be used, or whether categories should be established and included in the model through the use of dummy variables. Application to real data will be presented and discussed.
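
The first of the three methods, quartile dummy variables, starts by recoding the continuous covariate into design variables, as sketched below. The function name and the choice of the lowest quartile as the reference group are assumptions of this example, and the cut points follow the default rule of Python's `statistics.quantiles`. Fitting the logistic model and plotting the estimated coefficients against the quartile midpoints are not shown.

```python
from statistics import quantiles


def quartile_dummies(values):
    """Code a continuous covariate as three 0/1 design variables.

    The lowest quartile is the reference group, so each returned row is
    (in_second_quartile, in_third_quartile, in_fourth_quartile).
    """
    q1, q2, q3 = quantiles(values, n=4)  # the three quartile cut points
    rows = []
    for v in values:
        group = int(v > q1) + int(v > q2) + int(v > q3)  # 0, 1, 2 or 3
        rows.append(tuple(int(group == g) for g in (1, 2, 3)))
    return rows


# Hypothetical covariate values, for illustration only.
ages = [1, 2, 3, 4, 5, 6, 7, 8]
for age, row in zip(ages, quartile_dummies(ages)):
    print(age, row)
```

If the estimated coefficients for the three design variables rise roughly in a straight line across quartiles, that supports entering the covariate in its continuous form; a clearly non-linear pattern suggests a transformation or keeping the categories.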



SEMINAR

DATE: Friday, October 8, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Haiyan Huang, Department of Statistics, University of California at Berkeley

TOPIC: Characterizing short DNA-binding motifs

The recognition of regulatory motifs is challenging as the regulatory binding sites are often short and fuzzy. Here I will introduce several statistical methods we developed for characterizing short DNA-binding sites, in which we have taken into account the evolutionary context and dependence among positions. These newly developed motif models have been found to be advantageous in many applications.



SEMINAR

DATE: Thursday, October 28, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Jeanne Kowalski, Division of Biostatistics, Johns Hopkins University

TOPIC: Nonparametric, Hypothesis-based Analysis of Molecular Heterogeneity for Comparative Phenotype Characterization

In this talk, I describe two novel, inference-based approaches to analysis of molecular heterogeneity associated with phenotypes. A common theme among them is the construction of testable hypotheses in a very high-dimensional setting, based on developed U-statistic theory, with nonparametric inference. With a modest sample, I discuss a distance-based approach for analysis of sequence heterogeneity. In the extreme case of several single, high-dimensional samples that are to be compared from a microarray experiment, I introduce a class of stochastic linear hypotheses that includes the Mann-Whitney Wilcoxon rank sum test as a special case. In each setting, I discuss the statistical and bioinformatic approaches developed to characterize either genes within a genome or locations within a sequence that depict groups of similar phenotype. As motivation, I examine two separate problems, one for relating sequence heterogeneity in a region of the HIV genome to drug resistance, and a second for relating gene expressions to hypothesized pathways for immunogenetic analysis of T cells.



SEMINAR

DATE: Thursday, November 11, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Zhiqiang Tan, Assistant Professor, Department of Biostatistics,
Johns Hopkins University

TOPIC: A distributional approach for causal inference using the propensity score

Drawing inferences about the effects of exposures or treatments is a common challenge in many scientific fields. We propose two new methods serving complementary purposes in causal inference. One can be used to estimate average causal effects, assuming "no confounding" given measured covariates. The other can be used to assess the sensitivity of the estimates to possible departures from "no confounding". We establish asymptotic results for the methods, and also address practical issues in planning data analysis, checking propensity score models, and interpreting sensitivity parameters. Both methods are developed from a nonparametric likelihood perspective. We illustrate the methods by analyzing the data from an observational study on right heart catheterization.

The talk will be based on the working paper, "Efficient and Robust Causal Inference: A Distributional Approach", available at
http://www.bepress.com/jhubiostat/paper48.


 


SEMINAR

DATE: Thursday, November 18, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Chiara Sabatti, Assistant Professor, Department of Statistics, University of California at Los Angeles

TOPIC: Gene regulatory networks and blind deconvolution

The problem of reconstructing gene regulatory networks in simple systems such as E. coli bears some resemblance to a blind deconvolution problem: the concentrations of active forms of regulatory proteins are unobserved sources, and the transcript levels of genes are observed outputs. In general, it is not known which source influences which output; genome sequence data can be used to obtain information on the presence of binding sites in the regions upstream of genes, providing some indication of the set of genes controlled by a given regulatory protein. The control strength of a regulatory protein on target genes varies across genes, and across chemical conditions of the cell. It is of interest to reconstruct both the concentration of active forms of the regulatory proteins, and their control strength, in a series of conditions on the basis of measurements of gene transcript levels, obtained with microarrays.
We describe a Bayesian model for this system and illustrate its estimation via MCMC.



 

SEMINAR

DATE: Thursday, December 2, 2004

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Wesley Thompson, Assistant Professor, Department of Statistics, University of Pittsburgh

TOPIC: An Introduction To Functional Data Analysis With Some Applications

Functional data analysis (FDA) is a relatively new body of statistical methodologies in which the observations consist of continuous functions, often of time. FDA can be viewed as a fusion of longitudinal data analysis and smoothing techniques; it tends to take a nonparametric approach which is well-suited to exploratory data analysis.
This talk will provide some background and a brief introduction to FDA. Additionally, some applications of FDA techniques to psychiatric studies will be presented.


 
© 2001-2005
Dept. of Biostatistics, University of Pittsburgh

Program Contact:
Registrar, biostat@pitt.edu

Webmaster:
Susan Grasky, BSIS




Department of Biostatistics, 130 Desoto Street, 311 Parran Hall,
Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261
Phone: (412) 624-3022 Fax: (412) 624-2183

Revised on April 6, 2005