The Department of Statistics Seminar Series for the Fall 2022 semester will take place on Wednesdays at 3 PM EST. Events will be held in person (with a remote viewing option) unless otherwise noted.
August 31, 2022 (via Zoom)
Title: Modelling the Joint Distribution of Compositional Microbiome Data
Abstract: Microbiome epidemiology demands generative models of community profiles for study design considerations such as power analysis. We developed SparseDOSSA, a statistical model that parameterizes microbial communities and can be used to simulate new, realistic profiles to inform study designs. Our model connects zero-inflated marginals with a Gaussian copula, and has an additional renormalization component. As such, it uniquely satisfies common compositional, zero-inflation, and interaction properties of microbiome data. We demonstrate that SparseDOSSA accurately models human-associated microbiomes, and can generate realistic synthetic communities with prescribed population and ecological structures. We provide an open-source implementation of SparseDOSSA, which can be used in practice for power analysis and method benchmarking to inform microbiome study designs.
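The three ingredients named in the abstract (zero-inflated marginals, a Gaussian copula, and renormalization) can be illustrated with a minimal sketch. This is not the SparseDOSSA implementation: the function name `simulate_profiles`, its parameters, and the choice of log-normal marginals for the nonzero part are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_norm_cdf = np.vectorize(_STD_NORMAL.cdf)
_norm_ppf = np.vectorize(_STD_NORMAL.inv_cdf)


def simulate_profiles(n_samples, pi_zero, mu, sigma, corr, seed=0):
    """Draw compositional profiles (hypothetical helper, illustration only).

    pi_zero: per-feature probability of an exact zero (assumed < 1).
    mu, sigma: log-scale mean and sd of the nonzero part (assumed log-normal).
    corr: correlation matrix of the Gaussian copula.
    """
    rng = np.random.default_rng(seed)
    pi_zero, mu, sigma = (np.asarray(a, dtype=float) for a in (pi_zero, mu, sigma))
    # 1. Gaussian copula: correlated normals mapped to uniforms.
    z = rng.multivariate_normal(np.zeros(len(mu)), corr, size=n_samples)
    u = _norm_cdf(z)
    # 2. Zero-inflated marginal quantile: mass pi_zero sits at exactly zero.
    present = u > pi_zero
    p = np.where(present, (u - pi_zero) / (1.0 - pi_zero), 0.5)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    abundance = np.where(present, np.exp(mu + sigma * _norm_ppf(p)), 0.0)
    # 3. Renormalize each sample so that its features sum to one.
    totals = abundance.sum(axis=1, keepdims=True)
    return np.divide(abundance, totals, out=np.zeros_like(abundance), where=totals > 0)
```

A sample whose features are all zero is left as a row of zeros rather than renormalized.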
September 14, 2022
Title: Method Of Contraction-Expansion (MOCE) For Simultaneous Inference in Linear Models
Abstract: Simultaneous inference after model selection is of critical importance to address scientific hypotheses involving a set of parameters. We consider a high-dimensional linear regression model in which a regularization procedure such as LASSO is applied to yield a sparse model. To establish a simultaneous post-model selection inference, we propose a method of contraction and expansion (MOCE) along the line of debiasing estimation in that we investigate a desirable trade-off between model selection variability and sample variability by means of forward screening. We establish key theoretical results for the inference from the proposed MOCE procedure. Once the expanded model is properly selected, theoretical guarantees can be established and simultaneous confidence regions can be constructed from the joint asymptotic normal distribution. In comparison with existing methods, our proposed method exhibits stable and reliable coverage at a nominal significance level and enjoys substantially less computational burden. Thus, our MOCE approach is trustworthy in solving real-world problems. This is joint work with Wang, Zhou and Tang.
September 21, 2022
Title: Median bias, HulC, and valid inference
Abstract: Confidence intervals for functionals are an integral part of statistical inference. Traditional methods of constructing confidence intervals rely on studying the limiting distribution of an estimator of the functional and then estimating the unknown parameters of the limiting distribution. This is crucial for methods including Wald intervals, bootstrap intervals, and subsampling intervals (among many others). In this talk, I will argue that the median bias of an estimator (as opposed to the limiting distribution) is more central to the problem of performing inference. Firstly, we introduce an inference methodology called the HulC that uses the convex hull of several independent estimators as a confidence interval. Validity of HulC intervals only requires control of the median bias of the estimator and hence, is more widely applicable than the Wald, bootstrap, and subsampling intervals. Secondly, we prove that (asymptotically) valid inference for a functional is possible if and only if there exists an estimator that is (asymptotically) median unbiased. The same holds true for uniformly valid (honest) inference. This leads to a new notion of regularity we call "median regularity" which is necessary for uniformly valid confidence intervals. The classical notion of regular estimators is not necessary for uniformly valid inference while median regularity is both necessary and sufficient.
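The convex-hull idea can be sketched for a one-dimensional functional as follows. The sketch assumes the estimator has zero median bias: then the hull of B independent estimates misses the target only when all B fall on the same side, which happens with probability 2 * (1/2)^B, so B is chosen as the smallest integer with 2^(1-B) <= alpha. The function names are hypothetical, and the procedure presented in the talk may treat nonzero median bias and the choice of B more carefully.

```python
import math

import numpy as np


def hulc_num_batches(alpha):
    """Smallest B with 2 * (1/2)**B <= alpha (assumes zero median bias)."""
    return math.ceil(math.log2(2.0 / alpha))


def hulc_interval(data, estimator, alpha=0.05, seed=0):
    """Convex hull (min, max) of the estimator applied to disjoint batches."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n_batches = hulc_num_batches(alpha)
    if len(data) < n_batches:
        raise ValueError("need at least one observation per batch")
    # Random disjoint batches give independent estimates for i.i.d. data.
    batches = np.array_split(rng.permutation(data), n_batches)
    estimates = [estimator(batch) for batch in batches]
    return min(estimates), max(estimates)
```

For example, `hulc_interval(sample, np.median, alpha=0.05)` splits the sample into six batches and returns the smallest and largest batch medians.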
September 28, 2022
Zhigang Yao, Associate Professor of Statistics and Data Science at the National University of Singapore and Center of Mathematical Sciences and Applications at Harvard University
Title: Principal sub-manifolds and beyond
Abstract: While classical statistics has dealt with observations which are real numbers or elements of a real vector space, nowadays many statistical problems of high interest in the sciences deal with the analysis of data which consist of more complex objects, taking values in spaces which are naturally not (Euclidean) vector spaces but which still feature some geometric structure. I will discuss the problem of finding principal components of multivariate datasets that lie on an embedded nonlinear Riemannian manifold within the higher-dimensional space. The aim is to extend the geometric interpretation of PCA, while being able to capture the non-geodesic form of variation in the data. I will introduce the concept of a principal sub-manifold, a manifold passing through the center of the data, and at any point on the manifold extending in the direction of highest variation in the space spanned by the eigenvectors of the local tangent space PCA. We show the principal sub-manifold yields the usual principal components in Euclidean space. We illustrate how to find, use and interpret the principal sub-manifold, by which a principal boundary can be further defined for data sets on manifolds.
October 12, 2022
Title: AI for organoids and organoids for AI
Abstract: In this talk, we discuss unsupervised methods for studying brain organoids and functional neural systems. Particularly, we consider the important role that parsimony can play in non-parsimonious decompositions and non-linear embeddings. We consider a study of neurogenesis including in vivo, in vitro, single cell and bulk RNA sequencing. We contrast several methods for joint decompositions that share information across experiment and tissue types and contrast results in novel experiments not used in model training.
We conclude with a discussion of the role that in vitro neural systems can play in performing AI tasks. In this, we use multi-electrode arrays (MEAs) to study functioning brain organoids. Such use of organoids for biocomputing is a nascent and exciting field that we refer to as organoid intelligence.
October 26, 2022
Title: On the testing of multiple hypothesis in sliced inverse regression
Abstract: We consider multiple testing in a general regression framework that studies the relationship between a univariate response and a p-dimensional predictor. To test the hypothesis of the effect of each predictor, we construct a mirror statistic based on the estimator of the sliced inverse regression without assuming a model of the conditional distribution of the response. Using the limiting distribution results developed in this paper, we show that the mirror statistic is asymptotically symmetric with respect to zero under the null hypothesis. We then propose the Model-free Multiple testing procedure using Mirror statistics and show theoretically that the false discovery rate of this method is less than or equal to a designated level asymptotically. Numerical evidence has shown that the proposed method is much more powerful than its alternatives, subject to the control of the false discovery rate.
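The symmetry property is what makes false discovery rate control possible: if null mirror statistics are symmetric about zero, the number of statistics below -t estimates the number of false positives above t. A minimal sketch of a data-driven cutoff built on that idea follows. The function name and the exact thresholding convention are assumptions; the rule used in the talk may differ, and computing the mirror statistics from sliced inverse regression is not shown.

```python
import numpy as np


def mirror_select(mirror_stats, q=0.1):
    """Select features whose mirror statistic exceeds a data-driven cutoff.

    Returns (selected indices, cutoff). Hypothetical helper for illustration.
    """
    m = np.asarray(mirror_stats, dtype=float)
    # Candidate cutoffs: the observed nonzero magnitudes, smallest first.
    for t in np.sort(np.abs(m[m != 0])):
        # Left-tail count estimates false positives in the right tail.
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(m >= t), t
    # No cutoff meets the target level: select nothing.
    return np.array([], dtype=int), np.inf
```

The loop returns the smallest cutoff whose estimated false discovery proportion is at most q, which keeps as many discoveries as the target level allows.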
November 2, 2022
Title: A Versatile Estimation Procedure without Estimating the Nonignorable Missingness Mechanism
Abstract: We consider the estimation problem in a regression setting where the outcome variable is subject to nonignorable missingness and identifiability is ensured by the shadow variable approach. We propose a versatile estimation procedure where modeling of the missingness mechanism is completely bypassed. We show that our estimator is easy to implement and we derive the asymptotic theory of the proposed estimator. We also investigate some alternative estimators under different scenarios. Comprehensive simulation studies are conducted to demonstrate the finite sample performance of the method. We apply the estimator to a children's mental health study to illustrate its usefulness.
November 9, 2022
Title: Convolution-Type Smoothing Approach for Quantile Regression
Abstract: Quantile regression is a powerful tool for learning the relationship between a response variable and a multivariate predictor while exploring heterogeneous effects. However, the non-smooth piecewise linear loss function introduces challenges to the computational aspect when the number of covariates is large. To address the aforementioned challenge, we propose a convolution-type smoothing approach that turns the non-differentiable quantile piecewise linear loss function into a twice-differentiable, globally convex, and locally strongly convex surrogate, which admits a fast and scalable gradient-based algorithm to perform optimization. In the low-dimensional setting, we establish nonasymptotic error bounds for the resulting smoothed estimator. In the high-dimensional setting, we propose the concave regularized smoothed quantile regression estimator, which we solve using a multi-stage convex relaxation algorithm. Theoretically, we characterize both the algorithmic error due to non-convexity and statistical error for the resulting estimator simultaneously. We show that running the multi-stage algorithm for a few iterations will yield an estimator that achieves the oracle property. Our results suggest that the smoothing approach leads to a significant computational gain without a loss in statistical accuracy.
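To make the smoothing step concrete, here is a minimal sketch for the low-dimensional case. It assumes a logistic kernel, chosen only because the convolution has a closed form: the check loss u * (tau - 1{u < 0}) becomes tau * u + h * log(1 + exp(-u / h)), with derivative tau - 1 / (1 + exp(u / h)). The abstract does not specify a kernel, and the function names, bandwidth, and plain gradient descent solver are illustrative assumptions rather than the speaker's algorithm.

```python
import numpy as np


def check_loss(u, tau):
    """Standard quantile (check) loss: u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def smoothed_loss(u, tau, h):
    """Check loss convolved with a logistic kernel of bandwidth h."""
    u = np.asarray(u, dtype=float)
    return tau * u + h * np.logaddexp(0.0, -u / h)


def smoothed_loss_derivative(u, tau, h):
    """Derivative in u: tau - 1 / (1 + exp(u / h)), written with tanh for stability."""
    u = np.asarray(u, dtype=float)
    return tau - 0.5 * (1.0 - np.tanh(u / (2.0 * h)))


def fit_smoothed_quantile(X, y, tau=0.5, h=0.5, n_iter=500):
    """Plain gradient descent on the mean smoothed loss of y - X @ beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # The second derivative of the smoothed loss is at most 1 / (4h), so this
    # step is the reciprocal of an upper bound on the Hessian's top eigenvalue.
    step = 4.0 * h / np.linalg.eigvalsh(X.T @ X / n)[-1]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        residuals = y - X @ beta
        gradient = -X.T @ smoothed_loss_derivative(residuals, tau, h) / n
        beta -= step * gradient
    return beta
```

As h shrinks, `smoothed_loss` approaches `check_loss`, while the curvature bound 1 / (4h) grows, so the bandwidth trades approximation error against how well conditioned the optimization is.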
November 16, 2022
Maggie Niu, Associate Professor and Director of the Statistical Consulting Center at Penn State University
Title: Learning Network Properties without Network Data -- A Correlated Network Scale-up Model
Abstract: The network scale-up method based on "how many X's do you know?" questions has gained popularity in estimating the sizes of hard-to-reach populations. The success of the method relies primarily on the easy nature of the data collection and the flexibility of the procedure, especially since the model does not require a sample from the target population, a major limitation of traditional size estimation models. In this talk, we propose a new network scale-up model which incorporates respondent and subpopulation covariates in a regression framework and includes a bias term that is correlated between subpopulations. We also introduce a new scaling procedure utilizing the correlation structure. In addition to estimating the unknown population sizes, our proposed model depicts people's social network patterns at an aggregate level without using the network data.
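For background, the classical scale-up calculation that such models build on can be sketched in a few lines. This is the basic ratio estimator, not the correlated regression model proposed in the talk; the function name and argument layout are assumptions for illustration.

```python
import numpy as np


def scale_up_estimate(known_counts, known_sizes, target_counts, total_population):
    """Basic network scale-up estimate of a hidden population's size.

    known_counts: (respondents x known groups) answers to "how many X's do you know?"
    known_sizes: true sizes of those known groups.
    target_counts: per-respondent counts for the hard-to-reach group.
    """
    known_counts = np.asarray(known_counts, dtype=float)
    # Personal network size: the share of known-group members a respondent
    # reports knowing, scaled up to the whole population.
    degrees = total_population * known_counts.sum(axis=1) / np.sum(known_sizes)
    # Share of all reported ties that point to the target group, scaled up.
    return total_population * np.sum(target_counts) / degrees.sum()
```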
December 7, 2022
Title: Localized conformal prediction
Abstract: We propose a new inference framework called localized conformal prediction. It generalizes the framework of conformal prediction and offers a single-test-sample adaptive construction by emphasizing a local region around it. Although there have been methods constructing heterogeneous prediction intervals by designing better conformal score functions, to our knowledge, this is the first work that introduces an adaptive nature to the inference framework itself. We prove that our proposal leads to an assumption-free and finite sample marginal coverage guarantee, as well as an approximate conditional coverage guarantee. Our proposal achieves asymptotic conditional coverage under suitable assumptions. The localized conformal prediction can be combined with many existing works in conformal prediction, including different types of conformal score constructions. We will demonstrate how to change from conformal prediction to localized conformal prediction in these related works and illustrate the potential gain via numerical examples.
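The contrast between standard and localized constructions can be sketched with absolute-residual scores and a one-dimensional covariate. The first function is ordinary split conformal prediction. The second is a simplified illustration of the local idea: calibration scores are weighted by a Gaussian kernel centered at the test point before taking the quantile. Function names, the kernel, and the bandwidth are assumptions, and this naive weighting omits the calibration step behind the finite-sample guarantee described in the abstract.

```python
import math

import numpy as np


def split_conformal_radius(cal_scores, alpha=0.1):
    """Half-width r so that [pred - r, pred + r] has marginal coverage 1 - alpha."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return np.inf if k > n else scores[k - 1]


def localized_radius(cal_x, cal_scores, x_test, alpha=0.1, bandwidth=1.0):
    """Kernel-weighted score quantile around x_test (illustrative only)."""
    cal_x = np.asarray(cal_x, dtype=float)
    scores = np.asarray(cal_scores, dtype=float)
    weights = np.exp(-0.5 * ((cal_x - x_test) / bandwidth) ** 2)
    order = np.argsort(scores)
    # The test point carries weight 1 at score +inf, hence the "+ 1.0".
    cumulative = np.cumsum(weights[order]) / (weights.sum() + 1.0)
    idx = np.searchsorted(cumulative, 1 - alpha)
    return np.inf if idx >= len(scores) else scores[order][idx]
```

With a very large bandwidth all weights are nearly equal and `localized_radius` reduces to `split_conformal_radius`; a small bandwidth lets the interval width follow the scores of calibration points near the test point.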
Please be sure to check back for updates, or email email@example.com to be added to the Seminar Series mailing list. For any virtual seminars, a Zoom link will be sent to the mailing list.