Seminars

Spring 2026 Seminars 

The Department of Statistics Seminar Series for the Spring 2026 semester will take place on Mondays at 2 PM EST. Events will be held in person (with a remote viewing option) unless otherwise noted. 

Zoom Link: 
https://pitt.zoom.us/j/91721753371
Meeting ID: 917 2175 3371


February 2, 2026 

Antti Honkela, Professor at the University of Helsinki Department of Computer Science

Title: TBD 

Abstract: TBD 


February 9, 2026 

Jerome Reiter, Professor at Duke University Department of Statistical Science 

Title: TBD

Abstract: TBD 


February 16, 2026 

Amin Rahimian, Assistant Professor at the University of Pittsburgh Swanson School of Engineering 

Title: TBD 

Abstract: TBD 


February 23, 2026 

Thibault Randrianarisoa, Assistant Professor at the University of Toronto Scarborough Department of Computer and Mathematical Sciences 

Title: TBD

Abstract: TBD 


March 23, 2026 

Felipe Barrientos, Associate Professor at Florida State University Department of Statistics 

Title: TBD 

Abstract: TBD 


March 30, 2026 

Larry Wasserman, Professor at Carnegie Mellon University, Department of Statistics and Data Science and the Machine Learning Department 

Title: TBD

Abstract: TBD 


April 6, 2026 

Stephane Guerrier, Associate Professor at the University of Geneva Department of Statistics and Data Science 

Title: TBD

Abstract: TBD  

Fall 2025 Seminars

The Department of Statistics Seminar Series for the Fall 2025 semester will take place on Mondays at 2 PM EST. Events will be held in person (with a remote viewing option) unless otherwise noted. 

Zoom Link: 
https://pitt.zoom.us/j/91721753371
Meeting ID: 917 2175 3371


September 15, 2025

Yu-Xiang Wang, Professor at the Halıcıoğlu Data Science Institute, UC San Diego 

Title: Automating Differential Privacy: New Algorithmic Recipes and How AI can help.

Abstract: Differential privacy (DP) is one of the most promising approaches to solving the data privacy issues of the AI era. However, the design and implementation of DP algorithms are hard and error-prone even for experts, especially if we want the best utility guarantees. This talk will cover two recent projects in my group. The first is a new algorithmic primitive named “purification,” which allows generic conversion of approximate-DP mechanisms into pure-DP mechanisms and enables more flexible design of near-optimal and data-adaptive pure-DP mechanisms. The second is an effort to use large language models (LLMs) for semi- or fully automated design and implementation of DP algorithms, as well as for checking the code and proofs for potential errors.
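For context on the pure versus approximate DP distinction the abstract builds on, here is a minimal Python sketch of the two textbook baselines: the Laplace mechanism (pure eps-DP) and the classical Gaussian mechanism ((eps, delta)-DP). The talk's "purification" primitive is not reproduced here; the numbers and data below are purely illustrative.

    import numpy as np

    def laplace_mean(x, eps, lo=0.0, hi=1.0, rng=np.random.default_rng()):
        # Pure eps-DP release of the mean of data clipped to [lo, hi].
        x = np.clip(np.asarray(x, dtype=float), lo, hi)
        sens = (hi - lo) / len(x)                      # L1 sensitivity of the clipped mean
        return x.mean() + rng.laplace(scale=sens / eps)

    def gaussian_mean(x, eps, delta, lo=0.0, hi=1.0, rng=np.random.default_rng()):
        # Approximate (eps, delta)-DP release via the classical Gaussian mechanism (eps < 1).
        x = np.clip(np.asarray(x, dtype=float), lo, hi)
        sens = (hi - lo) / len(x)                      # L2 sensitivity of the clipped mean
        sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        return x.mean() + rng.normal(scale=sigma)

    data = np.random.default_rng(0).uniform(size=1000)
    print(laplace_mean(data, eps=1.0), gaussian_mean(data, eps=1.0, delta=1e-5))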


September 22, 2025

Cesare Miglioli, Postdoctoral Researcher at the University of Pittsburgh Department of Statistics 

Title: Incomplete U-Statistics of Equireplicate Designs: Berry-Esseen Bound and Efficient Construction.

Abstract: U-statistics are a fundamental class of estimators that generalize the sample mean and underpin much of nonparametric statistics. Although extensively studied in both statistics and probability, key challenges remain. These include their inherently high computational cost—addressed partly through incomplete U-statistics—and their non-standard asymptotic behavior in the degenerate case, which typically requires resampling methods for hypothesis testing. This talk presents a novel perspective on U-statistics, grounded in hypergraph theory and combinatorial designs. Our approach bypasses the traditional Hoeffding decomposition, which is the main analytical tool in this literature but is highly sensitive to degeneracy. By fully characterizing the dependence structure of a U-statistic, we derive a new Berry–Esseen bound that applies to all incomplete U-statistics based on deterministic designs, yielding conditions under which Gaussian limiting distributions can be established even in the degenerate case and when the order diverges. Moreover, we introduce efficient algorithms to construct incomplete U-statistics of equireplicate designs, a subclass of deterministic designs that, in certain cases, make it possible to achieve minimum variance. To illustrate the power of this novel framework, we apply it to kernel-based testing, focusing on the widely used two-sample Maximum Mean Discrepancy (MMD) test. Our approach leads to a permutation-free variant of the MMD test that delivers substantial computational gains while retaining statistical validity. 
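As a rough illustration of the object in the title (not the constructions or theory from the talk), the Python sketch below evaluates an order-2 incomplete U-statistic on a cyclic design, which is equireplicate because every observation appears in exactly the same number of selected pairs.

    import numpy as np

    def incomplete_u_cyclic(x, kernel, k):
        # Average a symmetric order-2 kernel over the cyclic design {(i, (i+d) mod n): d = 1..k},
        # i.e. n*k pairs instead of all n*(n-1)/2, with each index used exactly 2k times.
        n = len(x)
        vals = [kernel(x[i], x[(i + d) % n]) for d in range(1, k + 1) for i in range(n)]
        return float(np.mean(vals))

    # Example kernel h(x, y) = (x - y)^2 / 2, whose complete U-statistic is the unbiased
    # sample variance; the incomplete version trades some variance for O(n*k) cost.
    rng = np.random.default_rng(1)
    x = rng.normal(size=2000)
    print(incomplete_u_cyclic(x, lambda a, b: 0.5 * (a - b) ** 2, k=5))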


October 6, 2025

Roberto Molinari, Assistant Professor at Auburn University Department of Mathematics and Statistics 

Title: More of Less: A Rashomon Algorithm for Sparse Model Sets

Abstract: The current paradigm of machine learning consists of finding a single best model to deliver predictions and, if possible, interpretations for a specific problem. In recent years, however, this paradigm has been strongly challenged through the study of the "Rashomon effect," a term coined by Leo Breiman. This phenomenon occurs when there exist many good predictive models for a given dataset/problem, with considerable practical implications for interpretation, usability, variable importance, replicability, and more. The set of models (within a specific class of functions) that satisfy this definition is referred to as the "Rashomon set," and a substantial amount of recent work has focused on ways of finding these sets as well as studying their properties. Developed in parallel to current research on the Rashomon effect and motivated by sparse latent representations for high-dimensional problems, we present a heuristic procedure that aims to find sets of sparse models with good predictive power through a greedy forward search that explores the low-dimensional variable space. Throughout this algorithm, good low-dimensional models identified in previous steps are used to build models with more variables in the following steps. While preserving almost-equal performance with respect to a single reference model in a given class (i.e., a Rashomon set), the sparse model sets from this algorithm include diverse models that can be combined into networks, delivering additional layers of interpretation and new insights into how variable combinations can explain the Rashomon effect.
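A rough Python sketch of the kind of greedy forward search described above (a generic reading of the idea, not the authors' exact algorithm): subsets are grown one variable at a time, and every subset whose cross-validated score falls within a tolerance of the best score found so far is retained, so the output is a set of near-equally-good sparse models rather than a single winner.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def sparse_rashomon_search(X, y, max_size=3, eps=0.01):
        def score(cols):
            # Cross-validated R^2 of a linear model restricted to the columns in `cols`.
            return cross_val_score(LinearRegression(), X[:, list(cols)], y, cv=5).mean()

        kept = {(j,): score((j,)) for j in range(X.shape[1])}        # all single-variable models
        best = max(kept.values())
        kept = {c: s for c, s in kept.items() if s >= best - eps}
        for _ in range(max_size - 1):
            # Extend every retained subset by one unused variable and re-score.
            cand = {tuple(sorted(c + (j,))) for c in kept for j in range(X.shape[1]) if j not in c}
            for c in cand:
                kept.setdefault(c, score(c))
            best = max(best, max(kept.values()))
            kept = {c: s for c, s in kept.items() if s >= best - eps}  # the sparse "Rashomon" set
        return kept

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    y = X[:, 0] + 0.95 * X[:, 1] + rng.normal(size=200)
    print(sparse_rashomon_search(X, y))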


October 20, 2025

Seonjoo Lee, Associate Professor at Columbia University Data Science Institute 

Title: Statistical methods for longitudinal neuroimaging data – longitudinal CCA

Abstract: This talk considers canonical correlation analysis for two or more longitudinal variables that are possibly sampled at different time resolutions with irregular grids. We modeled trajectories of the multivariate variables using random effects and found the most correlated sets of linear combinations in the latent space. Our numerical simulations showed that the longitudinal canonical correlation analysis (LCCA) effectively recovers underlying correlation patterns between two high-dimensional longitudinal data sets. We will also discuss the recent extensions of LCCA to longitudinal categorical variables. 


October 27, 2025

Ethan Fang, Associate Professor at Duke University Department of Biostatistics & Bioinformatics  

Title: Offline Data-Driven Decision Making with Applications to Assortment Optimization: Estimation and Inference

Abstract: We present a unified offline decision-making framework. In the first part, we consider a class of assortment optimization problems in an offline data-driven setting. A firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimization, the problem of insufficient data coverage is likely to occur in the offline dataset. Therefore, designing a provably efficient offline learning algorithm becomes a significant challenge. To this end, we propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA), following the spirit of pessimism. We show that the algorithm identifies the optimal assortment by only requiring the offline data to cover the optimal assortment under general settings. In particular, we establish a regret bound for the offline assortment optimization problem under the celebrated multinomial logit model and its generalizations, where the regret is shown to be minimax optimal. Joint work with Juncheng Dong, Weibin Mo, Zhengling Qi, Cong Shi, and Vahid Tarokh.

In the second part, we consider the inferential problem in assortment optimization. Uncertainty quantification for the optimal assortment is still largely unexplored despite its great practical significance. Instead of estimating and recovering the complete optimal offer set, decision-makers may only be interested in testing whether a given property holds true for the optimal assortment, such as whether they should include several products of interest in the optimal set, or how many categories of products the optimal set should include. We propose a novel inferential framework for testing such properties. We reduce inferring a general optimal assortment property to quantifying the uncertainty associated with the sign change point detection of the marginal revenue gaps. We show the asymptotic normality of the marginal revenue gap estimator, and construct a maximum statistic via the gap estimators to detect the sign change point. Joint work with Shuting Shen, Alex Belloni, Xi Chen, and Junwei Lu.
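For readers unfamiliar with the multinomial logit (MNL) objective that both parts build on, here is a small hedged Python sketch of the standard MNL expected-revenue calculation with a brute-force search over a tiny, hypothetical catalog. The PASTA algorithm and the inferential machinery from the talk are not reproduced here.

    from itertools import combinations
    import numpy as np

    def mnl_expected_revenue(assortment, utilities, revenues):
        # Under the MNL model, P(choose i | S) = exp(v_i) / (1 + sum_{j in S} exp(v_j)),
        # where the "1" accounts for the no-purchase option.
        weights = np.exp(utilities[list(assortment)])
        probs = weights / (1.0 + weights.sum())
        return float(np.dot(probs, revenues[list(assortment)]))

    # Hypothetical catalog of five products: brute-force the revenue-maximizing assortment.
    utilities = np.array([0.8, 0.5, 0.1, -0.2, 1.2])
    revenues = np.array([4.0, 6.0, 9.0, 12.0, 3.0])
    best = max(
        (s for r in range(1, 6) for s in combinations(range(5), r)),
        key=lambda s: mnl_expected_revenue(s, utilities, revenues),
    )
    print(best, mnl_expected_revenue(best, utilities, revenues))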


November 3, 2025

David Banks, Professor at Duke University Department of Statistical Science 

Title: The Future of Statistical Publication

Abstract: It is not clear that our publication practices serve the professional interests of our field.  Much has changed in recent decades.  We no longer access and share information in the way that we did in the 1990s.  Artificial intelligence is affecting both the submissions we receive and the reviews that are given. It is time for us to collectively rethink the costs and benefits of our current system.


November 10, 2025

Jing Lei, Professor at Carnegie Mellon University Department of Statistics and Data Science  

Title:  When Cross-Validation Meets Stability: Online Selection and Discrete Confidence Sets

Abstract: Cross-validation is one of the most widely used tools for model quality assessment and comparison. When combined with appropriate notions of stability, cross-validation can be adapted to solve many interesting inference problems. In this talk, I will describe two examples. The first is a variant of cross-validation, called "rolling validation," which can achieve superior model selection accuracy for batch data and is naturally extendable to online problems. The second is the construction of confidence sets in discrete population comparison or model selection problems. 
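A generic forward-chaining sketch of the rolling idea in Python (this is the standard "fit on the past, score on the next point" recipe, not necessarily the exact rolling-validation procedure from the talk; model names and data are illustrative):

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    def rolling_validation(models, X, y, burn_in=30):
        # For each candidate model: repeatedly refit on all data seen so far and
        # score the one-step-ahead prediction, so comparisons can be updated online.
        errs = {name: [] for name in models}
        for t in range(burn_in, len(y)):
            for name, make_model in models.items():
                fit = make_model().fit(X[:t], y[:t])      # past data only
                pred = fit.predict(X[t:t + 1])[0]         # next observation
                errs[name].append((pred - y[t]) ** 2)
        return {name: float(np.mean(e)) for name, e in errs.items()}

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=300)
    scores = rolling_validation({"ols": LinearRegression, "ridge": lambda: Ridge(alpha=10.0)}, X, y)
    print(min(scores, key=scores.get), scores)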


November 17, 2025 

Yaotian Wang, Postdoctoral Fellow at Emory University Department of Biostatistics and Bioinformatics 

Title: Statistical Learning for the Developing Brain Connectome across Large Cohorts

Abstract:  Late childhood to early adulthood is a critical period of brain maturation that shapes lifelong cognition and behavior. Understanding normal brain development during this period is crucial, as deviations from these processes may underlie cognitive impairments as well as neurological and psychiatric disorders. To elucidate brain development, I developed statistical learning methods to characterize the brain connectome, a network of interconnected brain regions. In this talk, I will present two approaches: (1) The first approach focuses on the functional connectome, where network edges represent functional connectivity (FC), the statistical associations in neural activity. I developed a Bayesian blind source separation (BSS) framework to decompose FC data into latent connectivity matrices that represent neural circuits associated with different brain functions. This BSS framework is the first to simultaneously account for connectome topology, incorporate neuroimaging domain knowledge, and capture population heterogeneity. (2) The second approach focuses on effective connectivity (EC), which is the directed influence exerted from one brain region on another. I developed an AI-enhanced task-aware directed acyclic graph learning framework to identify EC associated with downstream tasks, including a novel graph neural network message-passing mechanism tailored for directed edges and a unique pretraining procedure based on my Bayesian EC model. Applied to functional magnetic resonance imaging (fMRI) data from two large-scale brain studies, the Lifespan Human Connectome Project in Development and the Philadelphia Neurodevelopmental Cohort, these two approaches revealed novel and reproducible findings about the developing connectome.

Spring 2025 Seminars

The Department of Statistics Seminar Series for the Spring 2025 semester will take place on Mondays at 2 PM EST. Events will be held in person (with a remote viewing option) unless otherwise noted. 

Zoom Link: https://pitt.zoom.us/j/99769710291

Meeting ID: 997 6971 0291


March 24, 2025

Wei-Yin Loh, Professor at the University of Wisconsin-Madison 

Title: A Regression Tree Approach to Missing Data

Abstract: When data are incomplete, missing values are often imputed for those observations. We use a large COVID-19 electronic health record dataset to show that imputation can hide important information in the variables being imputed. When imputation is absolutely necessary, as in estimation of a population mean for a variable with missing values, we show how a regression tree algorithm called GUIDE can be used, without imputation of missing values in the predictor variables. Data from the Bureau of Labor Statistics and the Department of Agriculture are used to illustrate the method.


March 31, 2025 

Ali Shojaie, Associate Chair of Biostatistics at the University of Washington

Title: A Unified Framework for Semiparametrically Efficient Semi-Supervised Learning 

Abstract: We consider statistical inference under a semi-supervised setting with access to both labeled and unlabeled datasets and ask the question: under what circumstances, and by how much, can incorporating the unlabeled dataset improve upon inference using the labeled data? To answer this question, we investigate semi-supervised learning through the lens of semiparametric efficiency theory. We characterize the efficiency lower bound under the semi-supervised setting for an arbitrary inferential problem, and show that incorporating unlabeled data can potentially improve efficiency if the parameter is not well-specified. We then propose two types of semi-supervised estimators: a safe estimator that imposes minimal assumptions, is simple to compute, and is guaranteed to be at least as efficient as the initial supervised estimator; and an efficient estimator, which (under stronger assumptions) achieves the semiparametric efficiency bound. Our findings unify existing semiparametric efficiency results for particular special cases, and extend these results to a much more general class of problems. Moreover, we show that our estimators can flexibly incorporate predicted outcomes arising from “black-box” machine learning models, and thereby achieve the same goal as prediction-powered inference (PPI), but with superior theoretical guarantees. We also provide a complete understanding of the theoretical basis for the existing set of PPI methods. Finally, we apply the theoretical framework developed to derive and analyze efficient semi-supervised estimators in a number of settings, including M-estimation, U-statistics, and average treatment effect estimation, and demonstrate the performance of the proposed estimators in simulation.


April 7, 2025 

Huixia Wang, Department Chair at George Washington University 

Title: Conformal Prediction in Non-Exchangeable Data Contexts

Abstract: Conformal prediction is a distribution-free method for uncertainty quantification that ensures finite sample guarantees. However, its validity relies on the assumption of data exchangeability. In this talk, I will introduce several conformal prediction approaches tailored for non-exchangeable data settings, including clustered data with missing responses, nonignorable missing data, and label shift data. To provide an asymptotic conditional coverage guarantee for a given subject, we propose constructing prediction regions by establishing the highest posterior density region of the target. This method is more accurate under complex error distributions, such as asymmetric and multimodal distributions, making it beneficial for personalized and heterogeneous scenarios. I will present some numerical results to illustrate their effectiveness.
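For reference, below is a minimal Python sketch of the standard split-conformal baseline under exchangeability, which the talk's methods extend beyond; the non-exchangeable extensions themselves are not implemented here, and the data are synthetic.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def split_conformal_interval(X_train, y_train, X_cal, y_cal, x_new, alpha=0.1):
        # Fit on the training split, compute absolute residuals on the calibration split,
        # and use their finite-sample-corrected quantile as a symmetric interval width.
        model = LinearRegression().fit(X_train, y_train)
        resid = np.abs(y_cal - model.predict(X_cal))
        n = len(resid)
        q = np.quantile(resid, np.ceil((1 - alpha) * (n + 1)) / n, method="higher")
        pred = model.predict(np.atleast_2d(x_new))[0]
        return pred - q, pred + q      # marginal coverage >= 1 - alpha under exchangeability

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 3))
    y = X @ np.array([2.0, -1.0, 0.0]) + rng.normal(size=400)
    print(split_conformal_interval(X[:200], y[:200], X[200:], y[200:], X[0], alpha=0.1))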

Fall 2023 Seminars

The Department of Statistics Seminar Series for the Fall 2023 semester will take place on Mondays at 3 PM EST. Events will be held in person (with a remote viewing option) unless otherwise noted.


September 18, 2023 (via Zoom)

Phyllis Wan, Assistant Professor at Erasmus University Rotterdam

Title: Graphical Lasso for Extremes

Abstract: The Gaussian graphical lasso is a powerful tool for modeling sparse dependence structure in non-extreme data. For extreme data, the Gaussian graphical model is not suitable; the Huesler-Reiss graphical model was recently proposed as an alternative. Adapting the graphical lasso to this setting is not straightforward due to the different structure of the parameter matrix. We propose a graphical lasso for the Huesler-Reiss graphical model through a reparametrization. The estimator is obtained via a penalized likelihood and enjoys convenient properties of the traditional graphical lasso: concentration inequalities, fast computation, and the ability to scale up to large dimensions.
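For comparison, the classical Gaussian graphical lasso is available in scikit-learn; a brief sketch on simulated data follows (the Huesler-Reiss/extremes version described in the talk is not implemented here).

    import numpy as np
    from sklearn.covariance import GraphicalLassoCV

    # Simulate an AR(1)-style chain so only neighboring variables are conditionally dependent.
    rng = np.random.default_rng(0)
    n, p = 500, 8
    X = np.zeros((n, p))
    X[:, 0] = rng.normal(size=n)
    for j in range(1, p):
        X[:, j] = 0.6 * X[:, j - 1] + rng.normal(size=n)

    model = GraphicalLassoCV().fit(X)
    # Zeros in the estimated precision matrix encode estimated conditional independencies,
    # so the sparsity pattern should roughly recover the chain structure.
    print(np.round(model.precision_, 2))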


September 11, 2023

Barry Nussbaum, Adjunct Professor at the University of Maryland, Baltimore County

Title: It's Not What We Said, It's Not What They Heard, It's What They Say They Heard

Abstract: Statisticians have long known that success in our profession frequently depends on our ability to succinctly explain our results so decision makers may correctly integrate our efforts into their actions. However, this is no longer enough. While we still must make sure that we carefully present results and conclusions, the real difficulty is what the recipient thinks we just said. The situation becomes more challenging in the age of “big data”. This presentation will discuss what to do, and what not to do. Examples, including those used in court cases, executive documents, and material presented for the President of the United States, will illustrate the principles.

Spring 2023 Seminars

Unless otherwise noted, the Department of Statistics Seminar Series for the Spring 2023 semester will take place on Wednesdays at 3 PM EST. Please note that our public seminar series is abbreviated this term due to faculty recruitment season.


March 29, 2023

Nicole A. Lazar, Professor of Statistics at Penn State University 

Title: Hypothesis Testing for Shapes Using Vectorized Persistence Diagrams

Abstract: Topological data analysis involves the statistical characterization of data shape. One of the key tools for this purpose is persistent homology, which can be used to summarize the relevant features. In this talk, I will start with a quick survey of the topological data analysis approach and the inferential challenges it poses. I will then introduce a two-stage hypothesis test for vectorized persistence diagrams represented in Euclidean space. The method is flexible, and can be applied to a wide variety of data types. Furthermore, the proposed procedure yields more accurate and informative inference compared to other hypothesis testing methods for persistent homology.


April 5, 2023 (via Zoom)

Xueying Tang, Assistant Professor of Mathematics at University of Arizona

Title: Modeling sparsity using log-Cauchy priors

Abstract: Sparsity is often a desired structure for parameters in high-dimensional statistical problems. Within a Bayesian framework, sparsity is usually induced by spike-and-slab priors or global-local shrinkage priors. The latter choice is often expressed as a scale mixture of normal distributions and marginally places a polynomial-tailed distribution on the parameter. In general, a heavy-tailed prior with significant probability mass around zero is preferred when estimating sparse parameters. In this talk, we consider a general class of priors, with log-Cauchy priors as a special case, in the normal mean estimation problem. This class of priors is proper while having a tail order arbitrarily close to one. The resulting posterior mean is a shrinkage estimator, and the posterior contraction rate attains the sharp minimax rate. We also demonstrate the performance of this class of priors on simulated and real datasets.
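A hedged numerical illustration of the shrinkage behavior described above, using a generic normal-means setup with a log-Cauchy prior on the local scale; this is chosen for illustration and is not taken from the talk's theory.

    import numpy as np
    from scipy import stats

    def posterior_mean(x, n_draws=200_000, seed=0):
        # Single observation x ~ N(theta, 1), theta | lam ~ N(0, lam^2), log(lam) ~ Cauchy(0, 1).
        rng = np.random.default_rng(seed)
        log_lam = np.clip(stats.cauchy.rvs(size=n_draws, random_state=rng), -20.0, 20.0)  # avoid overflow
        lam2 = np.exp(2.0 * log_lam)
        # Importance weights: marginally, x | lam ~ N(0, 1 + lam^2).
        w = stats.norm.pdf(x, loc=0.0, scale=np.sqrt(1.0 + lam2))
        cond_mean = x * lam2 / (1.0 + lam2)                    # E[theta | x, lam]
        return float(np.sum(w * cond_mean) / np.sum(w))

    # Heavy tails: near-zero observations are shrunk hard, large ones barely at all.
    for x in [0.5, 2.0, 6.0]:
        print(x, round(posterior_mean(x), 3))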


April 19, 2023

Martin Lindquist, Professor of Biostatistics at Johns Hopkins University 

Title: Individualized spatial topology in functional neuroimaging

Abstract: Neuroimaging is poised to take a substantial leap forward in understanding the neurophysiological underpinnings of human behavior, due to a combination of improved analytic techniques and the quality of imaging data. These advances are allowing researchers to develop population-level multivariate models of the functional brain representations underlying behavior, performance, clinical status and prognosis, and other outcomes. Population-based models can identify patterns of brain activity, or ‘signatures’, that can predict behavior and decode mental states in new individuals, producing generalizable knowledge and highly reproducible maps. These signatures can capture behavior with large effect sizes and can be used and tested across research groups. However, the potential of such signatures is limited by neuroanatomical constraints, in particular individual variation in functional brain anatomy. To circumvent this problem, current models are either applied only to individual participants, severely limiting generalizability, or force participants’ data into anatomical reference spaces (atlases) that do not respect individual functional topology and boundaries. Here we seek to overcome this shortcoming by developing new topological models for inter-subject alignment, which register participants’ functional brain maps to one another. This increases effective spatial resolution and, more importantly, allows us to explicitly analyze the spatial topology of functional maps and make inferences about differences in activation location and shape across persons and psychological states. In this talk we discuss several approaches to functional alignment and highlight their promises and pitfalls.


April 26, 2023

Snigdha Panigrahi, Assistant Professor of Statistics at University of Michigan

Title: Approximate selective inference via maximum likelihood

Abstract: Several strategies have been developed recently to ensure valid inferences after model selection; some of these are easy to compute, while others fare better in terms of inferential power. In this talk, we will address the problem of selective inference through approximate maximum likelihood estimation. 

Our goal is to: (i) efficiently utilize hold-out information from selection with the aid of randomization, and (ii) bypass expensive MCMC sampling from exact conditional distributions that are hard to evaluate in closed form. At the core of our new method is the solution to a convex optimization problem which assumes a separable form across multiple learning queries during selection. We illustrate the potential of our method across wide-ranging values of signal-to-noise ratio in simulated experiments.


Tuesday, June 6, 2023

Jason Fine, Visiting Researcher at the National Cancer Institute

Title: Emerging Areas for Competing Risks in Nested Case Control Studies

Abstract: This talk will overview some emerging statistical areas for competing risks in nested case-control studies, which are popular in cancer research. A key issue in the design of such studies is the definition of a "case". We consider an alternative "case" definition to the usual definition based on not having failed from any of the failure types of interest. Each failure type may have a marginal "case" definition. Issues of endpoint definition, efficiency, and interpretability are presented, with a focus on unification of the two case definitions in a single modelling framework. A second issue is that many nested case-control studies are often conducted using a common dataset. The potential for efficiency gain when combining data from nested case-control studies on different cancer types has not been fully explored. I will discuss a model specification based on the Lunn-McNeil proportional risk model which can be applied in the nested case-control setting. The main idea is that efficiency may be gained by specifying a single proportional hazards model which subsumes data from the multiple nested case-control studies. Large gains in efficiency may be achieved, particularly for event types which are rare. Finally, I consider the issue of secondary endpoints in competing risks nested case-control studies, with a focus on fitting the Fine-Gray model for the cumulative incidence function using data from a standard nested case-control study, where the primary endpoint is the cause-specific hazard function. I will heuristically sketch an approach based on weighting and describe some of the complicating factors which distinguish this analysis from previous analyses of secondary endpoints in nested case-control studies.

Please be sure to check back for updates, or email srh75@pitt.edu to be added to the Seminar Series mailing list. For any virtual seminars, a Zoom link will be sent to the mailing list.