University of Pittsburgh

BIOST 2025: Biostatistics Seminar Notices


Seminar Notices Spring Term 2007

Seminar Notices Fall Term 2006



Seminar Speakers
Spring Term 2007

Kelly H. Zou, February 8, 2007
Yan Lin, March 1, 2007
Qing Xu, March 1, 2007
Wentao Feng, March 1, 2007
Kiros Berhane, March 22, 2007
Huiman X. Barnhart, Ph.D., April 12, 2007

SEMINAR

DATE: Thursday, February 8, 2007

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Kelly H. Zou, Director of Biostatistics, Clinical Research Program, Children's Hospital Boston, Associate Professor of Radiology, Harvard Medical School

TOPIC: Accuracy and Reliability of Statistical Classification Methods with Applications in Image-Guided Therapy and Functional MR Imaging

The validity of image segmentation is an important issue in image processing and image-guided interventions because it has a direct impact on therapeutic planning. We examined classification accuracy in imaging analysis based on three two-sample validation metrics against the estimated composite latent gold standard, which was derived from several experts' manual segmentations by an expectation-maximization (EM) algorithm called simultaneous truth and performance level estimation (STAPLE). The distribution functions of the target and control pixel data were parametrically assumed to be a mixture of two beta distributions with different shape parameters. We estimated the corresponding receiver operating characteristic (ROC) curve, Dice similarity coefficient, and mutual information, over all possible decision thresholds. Based on each validation metric, an optimal threshold was then computed via maximization. We illustrated these methods using magnetic resonance (MR) imaging data on three radiological examples: (1) hidden gold standard in prostate peripheral zone segmentation for brachytherapy, (2) accuracy of brain tumor segmentation, and (3) multi-institutional functional imaging for detection of brain activation. Extensions to spatial correlation structures were also considered under a Markov random fields model.

Key Words: Sensitivity, Specificity, Receiver operating characteristic (ROC) curve, Dice similarity coefficient, Mutual information, Expectation maximization (EM) algorithm, magnetic resonance (MR) imaging, image-guided therapy, Functional MR imaging (fMRI).
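As a rough illustration of choosing a decision threshold by maximizing a validation metric, the sketch below computes the Dice similarity coefficient over a few candidate thresholds. It assumes a known binary reference mask, whereas the talk uses a latent gold standard estimated by STAPLE; the function names, scores, and thresholds are hypothetical.

```python
def dice(pred, truth):
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) for binary lists."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 2.0 * overlap / total if total else 1.0


def best_threshold(scores, truth, thresholds):
    """Return the (threshold, dice) pair that maximizes Dice."""
    best = None
    for c in thresholds:
        # Label a pixel as target when its score reaches the threshold.
        pred = [1 if s >= c else 0 for s in scores]
        d = dice(pred, truth)
        if best is None or d > best[1]:
            best = (c, d)
    return best


# Hypothetical pixel scores and a fixed reference segmentation (assumption).
scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.95]
truth = [0, 0, 1, 1, 1, 1]
threshold, value = best_threshold(scores, truth, [0.2, 0.4, 0.6, 0.8])
```

The same loop could maximize any other metric, such as mutual information, by swapping out the `dice` function.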



SEMINAR

DATE: Thursday, March 1, 2007

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Yan Lin, Graduate Student, Department of Biostatistics, University of Pittsburgh

TOPIC: Smarter Clustering Methods for High-throughput SNP Genotype Calling

Many high-throughput genotyping technologies for single nucleotide polymorphism (SNP) markers have been developed. Most use clustering methods to "call" the SNP genotypes, but standard clustering methods are not optimal in distinguishing the genotype clusters of a SNP because they do not take advantage of a number of specific features of the genotype calling problem. In particular, when family data are available, pedigree information is mostly ignored. Furthermore, when prior information about the distribution of the measurements for each cluster is available, choosing an appropriate model-based clustering method can significantly improve the genotype calls. In this paper, we discuss the impact of incorporating external information into clustering algorithms to call the genotypes. We also propose two new methods to call genotypes using family data. The first method is a modification of the K-means method which uses the family information by updating all members from a family together. The second method is a likelihood-based method that combines the Gaussian or Beta mixture model with pedigree information. We compare the performance of these two methods and some other existing methods using simulation studies. We also compare the performance of these methods on a real dataset generated by the Sequenom platform (www.sequenom.com).



SEMINAR

DATE: Thursday, March 1, 2007

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Qing Xu, Graduate Student, Department of Biostatistics, University of Pittsburgh

TOPIC: A weighted log-rank test to detect early difference in censored survival distributions

We revisit the weighted log-rank test where the weight function was derived by assuming the inverse Gaussian distribution for an omitted exponentiated covariate that induces nonproportionality under the proportional hazards model (Oakes and Jeong, 1998). The method was based on the score test statistic for a group comparison from the proportional hazards model. In this paper, we perform a simulation study to compare the test statistic based on the inverse Gaussian distribution with ones using other popular weight functions, including members of Harrington and Fleming's G-rho family (1982). The nonproportional hazards data are generated by changing the hazard ratios over time under the proportional hazards model. The results indicate that the inverse Gaussian-based test tends to have higher power than some members of the G-rho family in detecting a difference between two survival distributions when populations become homogeneous as time progresses. A dataset from phase III clinical trials on breast cancer will be used as a real example.
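For readers unfamiliar with the G-rho family used as a comparator, the following is a minimal sketch of a two-sample weighted log-rank statistic with weight S(t-)^rho, where S is the pooled Kaplan-Meier estimate. It does not implement the inverse Gaussian weight from the talk; the function name and toy data are assumptions, and the code assumes at least one event time with subjects at risk in both groups.

```python
import math


def g_rho_logrank(times, events, groups, rho=0.0):
    """Standardized weighted log-rank statistic with weight S(t-)**rho.

    times: follow-up times; events: 1 = event, 0 = censored;
    groups: 0 or 1. rho = 0 gives the ordinary log-rank statistic.
    """
    data = sorted(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time
    num = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)             # at risk, pooled
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group 1
        d = sum(1 for u, e, _ in data if u == t and e == 1)   # events, pooled
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        w = surv ** rho
        num += w * (d1 - d * n1 / n)  # observed minus expected in group 1
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv *= 1 - d / n
    return num / math.sqrt(var)


# Toy data (hypothetical): four uncensored subjects, two per group.
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
groups = [1, 0, 1, 0]
z = g_rho_logrank(times, events, groups, rho=0.0)
z_early = g_rho_logrank(times, events, groups, rho=1.0)
```

With rho greater than zero the weights decrease over time, so early differences between the groups count more heavily than late ones.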



SEMINAR

DATE: Thursday, March 1, 2007

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Wentao Feng, Graduate Student, Department of Biostatistics, University of Pittsburgh

TOPIC: A Supremum Log-Rank Test for Adaptive Two-Stage Treatment Strategies and Corresponding Sample Size Formula

In two-stage adaptive treatment strategies, patients receive one of the induction treatments, followed by a maintenance treatment if they respond to the induction treatment they received. To test for a difference in the effect of different induction and maintenance treatment combinations, a modified supremum weighted log-rank test is proposed. The test is applied to the data from a two-stage randomized trial and compared to the results obtained using a standard weighted log-rank test.

A sample size formula is proposed based on the limiting distribution of the supremum weighted log-rank statistic. The sample size formula reduces to Eng and Kosorok's sample size formula for the two-sample supremum log-rank test when there is no second randomization. Monte Carlo simulation studies show that the proposed test provides sample sizes that are close to those obtained by the standard weighted log-rank test under proportional hazards alternatives. However, the proposed test is more powerful than the standard weighted log-rank test under non-proportional hazards alternatives.

KEYWORDS: Adaptive treatment strategies; Brownian motion; Censoring distribution; Counting process; Proportional hazards; Two-stage designs; Sample size formula; Supremum log-rank statistics; Survival function



 

SEMINAR

DATE: Thursday, March 22, 2007

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Kiros Berhane, Associate Professor, Department of Preventive Medicine, University of Southern California

TOPIC: Functional Based Multi Level Models for Multiple Longitudinal Outcomes

We propose a flexible multi-level modeling technique that uses splines to model multiple longitudinal outcomes. This work is motivated by our desire to study the long-term effects of air pollution on children's health by examining the relationship between ecologic covariates (e.g., air pollution) and functionals related to various nonlinear lung function growth curves. The latent variable approach is used in connecting the outcomes within a subject. This technique allows for the estimation of cluster-specific growth curves, after adjusting for subject-specific growth-curve parameters. A Gibbs sampling approach is implemented to obtain posterior mean and variance estimates of nonlinear functionals of growth curves in a unified way. Ecologic inference is then conducted in a multi-level setting. We illustrate the technique via analysis of data from the Southern California Children's Health Study.



SEMINAR

DATE: Thursday, April 12, 2007

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Huiman Barnhart, Associate Professor, Department of Biostatistics and Bioinformatics, Duke University

TOPIC: An Overview on Assessing Agreement with Continuous Measurement

Reliable and accurate measurements serve as the basis for evaluation in many disciplines of science. The issues related to reliable and accurate measurement have evolved over several decades, dating back to the 19th century with pioneering work by Galton (1886), Pearson (1896, 1899, 1901) and Fisher (1925). Oftentimes, it is not practical to require the new measurement to be identical to the truth, either because (1) we are willing to accept a measurement up to some tolerable (or acceptable) error or (2) the truth is simply not available to us because either it is not measurable or it is only possible to measure with some error. To deal with issues related to both (1) and (2), different concepts, methods, or theories have been developed in different disciplines. Some of these concepts have been used across different disciplines, and some have been limited to a particular field and may have potential use in other disciplines. In this talk, I elucidate and contrast fundamental concepts used in different disciplines and bring these concepts into one common theme: assessing closeness (agreement) of the observations. We focus on assessing agreement with continuous measurement and classify different statistical approaches for expressing agreement in terms of (a) descriptive tools, (b) unscaled summary indices based on absolute differences of measurements, and (c) scaled summary indices attaining values between -1 and 1 for various data structures and for cases with and without reference. We identify gaps that require further research as well as future directions in assessing agreement.

This is joint work with Michael Haber at Emory University and Lawrence Lin at Baxter Healthcare Inc. This research is supported by NIH R01-MH070028.

 

 



 

Seminar Speakers
Fall Term 2006

Chien-Cheng (George) Tseng, September 7, 2006
Dev Chakraborty, September 21, 2006
Sanat K. Sarkar, October 19, 2006
Jeffrey Blume, November 3, 2006
Yu Cheng, November 16, 2006
Joseph Ibrahim, December 7, 2006

 

SEMINAR

DATE: Thursday, September 7, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Chien-Cheng (George) Tseng, Assistant Professor, Graduate School of Public Health, Department of Biostatistics, University of Pittsburgh

TOPIC: Which Missing Value Imputation Method to Use in Expression Profiles: a Comparative Study and Two Selection Schemes

Gene expression data frequently contain missing values; however, most downstream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information on the correlation patterns within the gene expression matrix. Each method appears to have its pros and cons, and the specific conditions under which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given dataset. We found that the success of each imputation method largely depended on the underlying complexity of the expression data. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme performed better than any single method. In addition, a self-training selection (STS) scheme, which determines the optimal imputation method via simulation, was proposed and performed even better than the EBS scheme, at increased computational cost. Our findings provide much insight into the problem of missing value imputation in microarray data. We conclude that the EBS and STS schemes are complementary and effective tools for selecting the optimal imputation method.



SEMINAR

DATE: Thursday, September 21, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Dev Chakraborty, Associate Professor, School of Medicine, Department of Radiology, University of Pittsburgh

TOPIC: Free-response observer performance methodology - recent advances and applications to imaging system assessment

The free-response paradigm is being increasingly used in the assessment of medical imaging systems. It differs from the receiver operating characteristic (ROC) method in that it uses location information, rewarding the observer when lesions are correctly located and penalizing the observer when they are not. In the free-response ROC (FROC) paradigm the data unit is a variable number of mark-rating pairs per image. A mark is the indicated location of a suspicious region. The rating is the corresponding degree of suspicion. A method for analyzing free-response data, termed JAFROC analysis, is summarized. JAFROC has been shown to have significantly greater statistical power at discriminating between modalities than the ROC method, but it has significant limitations. A recent search model for free-response data will be described. The search model can be used to infer a figure of merit of observer performance and to predict ROC and FROC curves. An approach to estimating search model parameters for computer-aided detection algorithms is described. Significant problems inhibiting full utilization of the method are summarized.



SEMINAR

DATE: Thursday, October 19, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Sanat K. Sarkar, Professor, The Fox School of Business, Department of Statistics, Temple University

TOPIC: Generalizing the False Discovery Rate in Multiple Testing of Large Number of Hypotheses

While testing a large number of hypotheses, one is often willing to tolerate a few false rejections but wants to guard against too many of them, say k or more. In such a case, a procedure controlling the occurrence of at least k false discoveries will have better ability to detect false null hypotheses than one that controls the occurrence of at least one false discovery. With that rationale, the traditional concept of familywise error rate (FWER), the probability of at least one false discovery, has recently been generalized to that of the k-FWER, the probability of at least k false discoveries, and procedures controlling it have been proposed. In this talk, I'll introduce a less conservative notion of error rate than the k-FWER, which is the k-FDR, the expected proportion of k or more false discoveries among the total number of discoveries. Procedures controlling the k-FDR will be presented under certain assumptions on the dependence structure of the underlying p-values. Some of these procedures are more powerful than the corresponding k-FWER procedures and some are often more powerful than the Benjamini-Hochberg FDR procedure.
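To make the baseline procedures concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure alongside a single-step generalized Bonferroni rule that controls the k-FWER by rejecting p-values at or below k*alpha/n. Neither is the k-FDR procedure presented in the talk; the function names and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank whose p-value meets its threshold.
        if pvalues[i] <= rank * alpha / n:
            cutoff = rank
    return sorted(order[:cutoff])


def k_fwer_bonferroni(pvalues, k=1, alpha=0.05):
    """Single-step generalized Bonferroni rule: reject when p <= k*alpha/n."""
    n = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= k * alpha / n]


# Hypothetical p-values for eight hypotheses.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bh_rejected = benjamini_hochberg(pvals)
fwer_rejected = k_fwer_bonferroni(pvals, k=1)
two_fwer_rejected = k_fwer_bonferroni(pvals, k=2)
```

Setting k = 1 in the second function recovers the ordinary Bonferroni correction; larger k relaxes the per-test threshold in exchange for tolerating up to k - 1 false rejections.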



SEMINAR

DATE: Friday, November 3, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Jeffrey Blume, Assistant Professor, Center for Statistical Sciences, Brown University

TOPIC: Why Likelihood is on the Critical Path

The statistical challenges and opportunities along the FDA’s ‘critical path of new medical products’ are substantial. The critical path calls for increased flexibility and greater efficiency of statistical methods without sacrificing their accuracy. This is exactly what Likelihood methods provide.

Consider using the likelihood ratio, instead of a p-value or posterior probability, to measure the strength of statistical evidence in the data. It has already been established that likelihood ratios are flexible, efficient, and seldom misleading. But more to the point is that they are more flexible, more efficient, and more accurate than traditional hypothesis testing methods. For example, examining likelihood ratios as I propose minimizes the overall probability of making an error (either type I or type II) in an experiment, and this result holds even in cases with multiple endpoints where the type I error is adjusted to avoid inflation. In addition, likelihood ratios remain seldom misleading even when the study is (statistically) rigged to produce evidence favoring the pet hypothesis over the true hypothesis.

The take-home message is that likelihood ratios represent an excellent compromise between the Bayesian and frequentist paradigms when it comes to measuring the strength of evidence in data. This is because likelihood ratios retain the desirable properties from both paradigms (irrelevance of sample space, good performance probabilities) while shedding the undesirable ones (dependence on prior distributions, ad hoc adjustments to control error probabilities). Hence, likelihood ratios make ideal regulatory tools because (1) they eliminate the need for seemingly arbitrary adjustments, (2) they frequently characterize the evidence correctly even when the study is (statistically) rigged to do otherwise, and (3) they can be made robust to the underlying model choice in a way that preserves (1) and (2) above.
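As a small worked example of measuring evidence with a likelihood ratio, the sketch below compares two simple hypotheses about a binomial success probability. The data and the two candidate probabilities are hypothetical and are not taken from the talk.

```python
import math


def binomial_log_likelihood(successes, trials, p):
    """Binomial log-likelihood of p, omitting the constant binomial coefficient."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


def likelihood_ratio(successes, trials, p1, p0):
    """Likelihood ratio L(p1) / L(p0) for the observed binomial data."""
    return math.exp(
        binomial_log_likelihood(successes, trials, p1)
        - binomial_log_likelihood(successes, trials, p0)
    )


# Hypothetical data: 14 successes in 20 trials, comparing p = 0.7 with p = 0.5.
lr = likelihood_ratio(successes=14, trials=20, p1=0.7, p0=0.5)
```

A ratio above 1 means the data support p1 over p0, and the ratio depends only on the observed data and the two hypotheses, not on a prior distribution or on outcomes that were never observed.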



SEMINAR

DATE: Thursday, November 16, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Yu Cheng, Assistant Professor, Department of Statistics, University of Pittsburgh

TOPIC: Modeling Association of Bivariate Competing Risks Data Through Cumulative Incidence Functions

Frailty models are frequently used to analyze clustered survival data and evaluate within-cluster associations. However, they are seldom used in multivariate competing risks settings because of the challenge imposed by the dependence structure between the primary and competing risks. To address this challenge, we focus on a nonparametrically identifiable quantity: the cumulative incidence function (CIF). Frailty models are constructed expressing the bivariate CIF in terms of its marginals based on some improper random variables whose distribution functions correspond to CIFs. Estimating equations are proposed to estimate the unknown association parameter involved in frailty models. The large sample properties of the association parameter estimators are established using empirical process techniques, and their practical performance is studied by Monte Carlo simulations. We illustrate their practical utility by an analysis of dementia in the Cache County Study.
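For orientation, the following is a minimal sketch of the standard nonparametric estimate of a single (univariate) cumulative incidence function under competing risks. It does not implement the bivariate frailty model or the estimating equations from the talk; the function name and toy data are assumptions.

```python
def cumulative_incidence(times, causes, cause=1):
    """Nonparametric CIF estimate for one cause.

    times: follow-up times; causes: 0 = censored, otherwise the cause of failure.
    Returns a list of (time, CIF) pairs at each distinct failure time.
    """
    data = sorted(zip(times, causes))
    surv = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    curve = []
    for t in sorted({u for u, c in data if c != 0}):
        n = sum(1 for u, _ in data if u >= t)                  # number at risk
        d_all = sum(1 for u, c in data if u == t and c != 0)   # failures, any cause
        d_k = sum(1 for u, c in data if u == t and c == cause)  # failures, this cause
        cif += surv * d_k / n
        surv *= 1 - d_all / n
        curve.append((t, cif))
    return curve


# Toy data (hypothetical): causes 1 and 2 compete; 0 marks censoring.
times = [1, 2, 3, 4]
causes = [1, 2, 0, 1]
cif_primary = cumulative_incidence(times, causes, cause=1)
cif_competing = cumulative_incidence(times, causes, cause=2)
```

Because each cause-specific increment is weighted by the overall event-free survival, the estimated CIFs across causes add up to one minus that overall survival.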



SEMINAR

DATE: Thursday, December 7, 2006

TIME: 3:30p.m.

PLACE: A-115 Crabtree Hall, Graduate School of Public Health

SPEAKER: Joseph Ibrahim, Alumni Distinguished Professor, School of Public Health, Department of Biostatistics, University of North Carolina

TOPIC: Variable selection in regression mixture modeling for the discovery of gene regulatory networks


The profusion of genomic data through genome sequencing and gene expression microarray technology has facilitated statistical research in determining gene interactions regulating a biological process. Current methods generally consist of a two-stage procedure: clustering gene expression measurements, and searching for regulatory "switches", typically short, conserved sequence patterns (motifs) in the DNA sequence adjacent to the genes.

This process often leads to misleading conclusions as incorrect cluster selection may lead to missing important regulatory motifs or making many false discoveries.

Treating cluster memberships as known, rather than estimated, introduces bias into analyses and prevents uncertainty about cluster parameters from being accounted for. Further, there is under-utilization of the available data, as the sequence information is ignored for purposes of expression clustering and vice versa.

We propose a way to address these issues by combining gene clustering and motif discovery in a unified framework, a mixture of hierarchical regression models, with unknown components representing the latent gene clusters, and genomic sequence features linked to the resultant gene expression through a multivariate hierarchical regression. We demonstrate a Monte Carlo method for simultaneous variable selection (for motifs) and clustering (for genes). The selection of the number of components in the mixture is addressed by computing the analytically intractable Bayes factor through a novel multi-stage mixture importance sampling approach.

This methodology is illustrated on a yeast cell cycle dataset to determine an optimal set of motifs that discriminates between groups of genes and simultaneously finds the most significant gene clusters.

 

© 2001-2007
Dept. of Biostatistics, University of Pittsburgh

Program Contact:
Registrar, biostat@pitt.edu

Webmaster:
Susan Grasky, BSIS



Department of Biostatistics, 130 Desoto Street, 311 Parran Hall,
Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261
Phone: (412) 624-3022 Fax: (412) 624-2183

Revised on March 29, 2007