Fall 2022 Seminars

The Department of Statistics Seminar Series for the Fall 2022 semester will take place on Wednesdays at 3 PM EST. Events will be held in person, with a remote viewing option, unless otherwise noted.

August 31, 2022 (via Zoom)

Siyuan Ma, Postdoctoral Researcher at University of Pennsylvania

Title: Modelling the Joint Distribution of Compositional Microbiome Data

Abstract: Microbiome epidemiology demands generative models of community profiles for study design considerations such as power analysis. We developed SparseDOSSA, a statistical model that parameterizes microbial communities and can be used to simulate new, realistic profiles to inform study designs. Our model connects zero-inflated marginals with a Gaussian copula, and has an additional renormalization component. As such, it uniquely satisfies common compositional, zero-inflation, and interaction properties of microbiome data. We demonstrate that SparseDOSSA accurately models human-associated microbiomes, and can generate realistic synthetic communities with prescribed population and ecological structures. We provide an open-source implementation for SparseDOSSA, which can be used in practice for power analysis and method benchmarking to inform microbiome study designs.


September 14, 2022

Peter Song, Professor of Biostatistics at University of Michigan

Title: Method Of Contraction-Expansion (MOCE) For Simultaneous Inference in Linear Models

Abstract: Simultaneous inference after model selection is of critical importance to address scientific hypotheses involving a set of parameters. We consider a high-dimensional linear regression model in which a regularization procedure such as LASSO is applied to yield a sparse model. To establish simultaneous post-model selection inference, we propose a method of contraction and expansion (MOCE) along the line of debiasing estimation, in which we investigate a desirable trade-off between model selection variability and sample variability by means of forward screening. We establish key theoretical results for the inference from the proposed MOCE procedure. Once the expanded model is properly selected, the theoretical guarantees and simultaneous confidence regions can be constructed by the joint asymptotic normal distribution. In comparison with existing methods, our proposed method exhibits stable and reliable coverage at a nominal significance level and enjoys substantially less computational burden. Thus, our MOCE approach is trustworthy in solving real-world problems. This is joint work with Wang, Zhou and Tang.


September 21, 2022

Arun Kumar Kuchibhotla, Assistant Professor of Statistics and Data Science at CMU

Title: Median bias, HulC, and valid inference

Abstract: Confidence intervals for functionals are an integral part of statistical inference. Traditional methods of constructing confidence intervals rely on studying the limiting distribution of an estimator of the functional and then estimating the unknown parameters of the limiting distribution. This is crucial for methods including the Wald intervals, bootstrap intervals, and the subsampling intervals (among many others). In this talk, I will argue that the median bias of an estimator (as opposed to the limiting distribution) is more central to the problem of performing inference. Firstly, we introduce an inference methodology called the HulC that uses the convex hull of several independent estimators as a confidence interval. Validity of HulC intervals only requires control of the median bias of the estimator and hence, is more widely applicable than the Wald, bootstrap, and subsampling. Secondly, we prove that (asymptotically) valid inference for a functional is possible if and only if there exists an estimator that is (asymptotically) median unbiased. The same holds true for uniformly valid (honest) inference. This leads to a new notion of regularity we call "median regularity" which is necessary for uniformly valid confidence intervals. The classical notion of regular estimators is not necessary for uniformly valid inference while median regularity is both necessary and sufficient.
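
The convex-hull idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest setting only: a one-dimensional target, an estimator that is exactly median unbiased, and independent batches. Under those assumptions the hull of B batch estimates misses the target only when all B estimates fall on the same side of it, which happens with probability 2^(1-B). The function names, the batch-splitting scheme, and the toy data are illustrative choices and are not taken from the talk; the published procedure also handles nonzero median bias, which this sketch omits.

```python
import math
import random
import statistics


def hulc_num_batches(alpha):
    """Smallest B with 2**(1 - B) <= alpha (median-unbiased case only)."""
    return math.ceil(math.log2(2.0 / alpha))


def hulc_interval(data, estimator, alpha=0.05, seed=0):
    """Convex hull (min, max) of the estimator computed on B disjoint batches.

    Simplified illustration: assumes the estimator is median unbiased and
    that the observations are independent, so the batches are independent.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    b = hulc_num_batches(alpha)
    # Deal the shuffled observations into b batches of near-equal size.
    batches = [shuffled[i::b] for i in range(b)]
    estimates = [estimator(batch) for batch in batches]
    return min(estimates), max(estimates)


# Toy data (hypothetical): 600 draws from a normal distribution with median 2.
toy_rng = random.Random(1)
sample = [toy_rng.gauss(2.0, 1.0) for _ in range(600)]
low, high = hulc_interval(sample, statistics.median, alpha=0.05)
print(hulc_num_batches(0.05), low, high)
```

Note that no variance estimate or limiting distribution is used anywhere; the only inputs are the batch estimates themselves.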


September 28, 2022

Zhigang Yao, Associate Professor of Statistics and Data Science at the National University of Singapore and Center of Mathematical Sciences and Applications at Harvard University

Title: Principal sub-manifolds and beyond

Abstract: While classical statistics has dealt with observations which are real numbers or elements of a real vector space, nowadays many statistical problems of high interest in the sciences deal with the analysis of data which consist of more complex objects, taking values in spaces which are naturally not (Euclidean) vector spaces but which still feature some geometric structure. I will discuss the problem of finding principal components for multivariate datasets that lie on an embedded nonlinear Riemannian manifold within the higher-dimensional space. The aim is to extend the geometric interpretation of PCA, while being able to capture the non-geodesic form of variation in the data. I will introduce the concept of a principal sub-manifold, a manifold passing through the center of the data, and at any point on the manifold extending in the direction of highest variation in the space spanned by the eigenvectors of the local tangent space PCA. We show the principal sub-manifold yields the usual principal components in Euclidean space. We illustrate how to find, use and interpret the principal sub-manifold, by which a principal boundary can be further defined for data sets on manifolds.


October 12, 2022

Brian Caffo, Professor of Biostatistics at Johns Hopkins University

Title: AI for organoids and organoids for AI

Abstract: In this talk, we discuss unsupervised methods for studying brain organoids and functional neural systems. Particularly, we consider the important role that parsimony can play in non-parsimonious decompositions and non-linear embeddings. We consider a study of neurogenesis including in vivo, in vitro, single cell and bulk RNA sequencing. We contrast several methods for joint decompositions that share information across experiment and tissue types and contrast results in novel experiments not used in model training.

We conclude with a discussion of the role that in vitro neural systems can play in performing AI tasks. In this, we use multi-electrode arrays (MEAs) to study functioning brain organoids. Such use of organoids for biocomputing is a nascent and exciting field that we refer to as organoid intelligence.


October 26, 2022

Zhigen Zhao, Associate Professor of Statistics, Operations, and Data Science

Title: On the testing of multiple hypothesis in sliced inverse regression

Abstract: We consider the multiple testing of the general regression framework aiming at studying the relationship between a univariate response and a p-dimensional predictor. To test the hypothesis of the effect of each predictor, we construct a mirror statistic based on the estimator of the sliced inverse regression without assuming a model of the conditional distribution of the response. According to the developed limiting distribution results in this paper, we have shown that the mirror statistic is asymptotically symmetric with respect to zero under the null hypothesis. We then propose the Model-free Multiple testing procedure using Mirror statistics and show theoretically that the false discovery rate of this method is less than or equal to a designated level asymptotically. Numerical evidence has shown that the proposed method is much more powerful than its alternatives, subject to the control of the false discovery rate.


November 2, 2022

Yanyuan Ma, Professor of Statistics at Penn State University

Title: A Versatile Estimation Procedure without Estimating the Nonignorable Missingness Mechanism

Abstract: We consider the estimation problem in a regression setting where the outcome variable is subject to nonignorable missingness and identifiability is ensured by the shadow variable approach. We propose a versatile estimation procedure where modeling of missingness mechanism is completely bypassed. We show that our estimator is easy to implement and we derive the asymptotic theory of the proposed estimator. We also investigate some alternative estimators under different scenarios. Comprehensive simulation studies are conducted to demonstrate the finite sample performance of the method. We apply the estimator to a children's mental health study to illustrate its usefulness.


November 9, 2022

Kean Ming Tan, Assistant Professor of Statistics at University of Michigan

Title: Convolution-Type Smoothing Approach for Quantile Regression

Abstract: Quantile regression is a powerful tool for learning the relationship between a response variable and a multivariate predictor while exploring heterogeneous effects. However, the non-smooth piecewise linear loss function introduces challenges to the computational aspect when the number of covariates is large. To address the aforementioned challenge, we propose a convolution-type smoothing approach that turns the non-differentiable quantile piecewise linear loss function into a twice-differentiable, globally convex, and locally strongly convex surrogate, which admits a fast and scalable gradient-based algorithm to perform optimization. In the low-dimensional setting, we establish nonasymptotic error bounds for the resulting smoothed estimator. In the high-dimensional setting, we propose the concave regularized smoothed quantile regression estimator, which we solve using a multi-stage convex relaxation algorithm. Theoretically, we characterize both the algorithmic error due to non-convexity and statistical error for the resulting estimator simultaneously. We show that running the multi-stage algorithm for a few iterations will yield an estimator that achieves the oracle property. Our results suggest that the smoothing approach leads to a significant computational gain without a loss in statistical accuracy.
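
As a rough illustration of the smoothing step described in the abstract, the sketch below convolves the check loss with a Gaussian kernel and minimizes the result by plain gradient descent. It is restricted to an intercept-only model (estimating a single quantile of a sample), and the Gaussian kernel, bandwidth h = 0.2, step size, iteration count, and function names are all assumptions made for this example rather than choices from the talk. With a Gaussian kernel, the derivative of the smoothed loss at residual u is tau - Phi(-u/h), where Phi is the standard normal CDF, and its second derivative is the kernel density itself, which is why the surrogate is smooth and convex.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_loss_derivative(u, tau, h):
    """Derivative in u of the check loss convolved with a Gaussian kernel.

    The unsmoothed derivative is tau - 1{u < 0}; smoothing replaces the
    indicator with Phi(-u / h), which is differentiable everywhere.
    """
    return tau - normal_cdf(-u / h)


def mean_gradient(ys, b, tau, h):
    """Gradient in b of the average smoothed loss of the residuals y - b."""
    return -sum(smoothed_loss_derivative(y - b, tau, h) for y in ys) / len(ys)


def smoothed_quantile(ys, tau, h=0.2, step=1.0, iters=300):
    """Intercept-only smoothed quantile estimate by gradient descent."""
    b = sum(ys) / len(ys)  # start from the sample mean
    for _ in range(iters):
        b -= step * mean_gradient(ys, b, tau, h)
    return b


# Toy data (hypothetical): 1000 standard normal draws.
toy_rng = random.Random(0)
ys = [toy_rng.gauss(0.0, 1.0) for _ in range(1000)]
q50 = smoothed_quantile(ys, tau=0.5)
q90 = smoothed_quantile(ys, tau=0.9)
print(q50, q90)
```

In a regression setting the same derivative would be multiplied by each covariate vector to form the gradient; the intercept-only case is used here only to keep the example short.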


November 16, 2022

Maggie Niu, Associate Professor and Director of the Statistical Consulting Center at Penn State University

Title: Learning Network Properties without Network Data -- A Correlated Network Scale-up Model

Abstract: The network scale-up method based on "how many X's do you know?" questions has gained popularity in estimating the sizes of hard-to-reach populations. The success of the method relies primarily on the easy nature of the data collection and the flexibility of the procedure, especially since the model does not require a sample from the target population, a major limitation of traditional size estimation models. In this talk, we propose a new network scale-up model which incorporates respondent and subpopulation covariates in a regression framework and includes a bias term that is correlated between subpopulations. We also introduce a new scaling procedure utilizing the correlation structure. In addition to estimating the unknown population sizes, our proposed model depicts people's social network patterns at an aggregated level without using the network data.


December 7, 2022

Leying Guan, Assistant Professor of Biostatistics at Yale University

Title: Localized conformal prediction

Abstract: We propose a new inference framework called localized conformal prediction. It generalizes the framework of conformal prediction and offers a single-test-sample adaptive construction by emphasizing a local region around it. Although there have been methods constructing heterogeneous prediction intervals by designing better conformal score functions, to our knowledge, this is the first work that introduces an adaptive nature to the inference framework itself. We prove that our proposal leads to an assumption-free and finite sample marginal coverage guarantee, as well as an approximate conditional coverage guarantee. Our proposal achieves asymptotic conditional coverage under suitable assumptions. The localized conformal prediction can be combined with many existing works in conformal prediction, including different types of conformal score constructions. We will demonstrate how to change from conformal prediction to localized conformal prediction in these related works and a potential gain via numerical examples.
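
For readers unfamiliar with the baseline that the abstract generalizes, here is a minimal sketch of ordinary split conformal prediction with absolute-residual scores. It is not the localized procedure from the talk: every calibration point receives equal weight here, whereas the abstract describes emphasizing a local region around the test sample. The function names, the `predict` callable, and the toy data are hypothetical.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Returns infinity when the calibration set is too small for the
    requested level, which gives a trivially valid interval.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval for x_new from a held-out calibration set.

    `predict` is any fitted point predictor (an assumption of this sketch);
    the scores are absolute residuals on the calibration data.
    """
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q


# Toy example (hypothetical): y = 2x + noise, with a fixed predictor 2x.
toy_rng = random.Random(0)
calib_x = [toy_rng.uniform(0.0, 1.0) for _ in range(200)]
calib_y = [2.0 * x + toy_rng.gauss(0.0, 0.1) for x in calib_x]
lower, upper = split_conformal_interval(lambda x: 2.0 * x, calib_x, calib_y, 0.5)
print(lower, upper)
```

The interval width here is the same for every test point, which is the limitation that adaptive constructions such as the one in this talk aim to address.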


Please be sure to check back for updates, or email srh75@pitt.edu to be added to the Seminar Series mailing list. For any virtual seminars, a Zoom link will be sent to the mailing list.