## Tuesday, September 15, 2020 at 4 PM

### Vivak Patel, University of Wisconsin Department of Statistics

Click here to view a recording of this seminar

Title: When Do We Stop SGD?

Abstract: Stochastic gradient descent (SGD) and related methods have become a staple in a broad class of optimization problems, even though the question of when to terminate the iterations is still open. In this talk, we will discuss why termination criteria are important, what the specific challenges are for developing rigorous termination criteria for SGD, and our recent efforts to address these challenges.
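
To make the termination question concrete, here is a minimal sketch of SGD on a synthetic least-squares problem with a naive stopping heuristic (a moving average of recent stochastic gradient norms). The heuristic and every parameter choice below are illustrative assumptions, not the criteria developed in the talk; the difficulty of choosing `tol` near SGD's noise floor is precisely why rigorous rules are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize (1/2n) * ||A x - b||^2
n, d = 1000, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

x = np.zeros(d)
step = 0.01
window = []                  # recent stochastic gradient norms
tol, patience = 0.3, 50      # illustrative, not principled, choices

for t in range(100_000):
    i = rng.integers(n)                  # sample one observation
    g = (A[i] @ x - b[i]) * A[i]         # stochastic gradient
    x -= step * g
    window.append(np.linalg.norm(g))
    if len(window) > patience:
        window.pop(0)
        # naive rule: stop once recent gradient norms are small on average;
        # the noise floor of SGD makes choosing `tol` the hard part
        if np.mean(window) < tol:
            break

print(f"stopped at iteration {t}, error {np.linalg.norm(x - x_true):.3f}")
```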

## Tuesday, September 22, 2020 at 3:30 PM

### Claire McKay Bowen, Lead Data Scientist, Privacy and Data Security at the Urban Institute

Title: Data Privacy in the Real World

Abstract: With recent misuses of data access such as the Facebook-Cambridge Analytica scandal, society has raised valid privacy concerns about private companies and other entities gathering personal information. Statistical disclosure control (SDC), also known as statistical disclosure limitation, comprises methods that aim to release high-quality data products while preserving the confidentiality of sensitive data. These techniques have existed within the statistics field since the mid-twentieth century, but over the past two decades the data landscape has dramatically changed. With the advances in modern information infrastructure and computation, data adversaries (or intruders) can more easily reconstruct datasets and identify individuals from supposedly anonymized data. While traditional methods of SDC and secure data centers are still used extensively, varying opinions about procedures have developed across academia, government, and industry and in different countries. A definition known as Differential Privacy (DP) has garnered much attention, and many researchers and data maintainers are moving to develop and implement differentially private methods. In this talk, I will introduce and survey SDC and DP and the current challenges in applying these methods to real-world data. I will provide motivating examples, such as the ongoing collaboration between the Urban Institute and the IRS to generate synthetic data (pseudo record data) from tax returns, which is invaluable for analyzing US presidential candidates’ proposed tax policies.
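
As generic background, the textbook Laplace mechanism shows how a counting query can be released under epsilon-differential privacy; the sensitivity argument and all numbers below are a standard illustration, not the specific methods discussed in the talk.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release a noisy statistic satisfying epsilon-differential privacy."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(42)

# Counting query: adding or removing one person's record changes the
# count by at most 1, so the sensitivity is 1.
true_count = 1024
for epsilon in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, 1.0, epsilon, rng)
    print(f"epsilon={epsilon:>4}: released count = {noisy:.1f}")
```

Smaller epsilon means a stronger privacy guarantee and hence more noise added to the released count.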

## Tuesday, September 29, 2020 at 3:30 PM

### Jesús Arroyo Relión, postdoctoral fellow, Center for Imaging Science at Johns Hopkins University

Title: Simultaneous prediction and community detection for networks with application to neuroimaging

Abstract: Community structure in networks is observed in many different domains, and unsupervised community detection has received a lot of attention in the literature. Increasingly, the focus of network analysis is shifting towards using network information in some other prediction or inference task rather than just analyzing the network itself. In neuroimaging applications, brain networks are available for multiple subjects and the goal is often to predict a phenotype of interest. Community structure is well known to be a feature of brain networks, typically corresponding to different regions of the brain responsible for different functions. There are standard parcellations of the brain into such regions, usually obtained by applying clustering methods to brain connectomes of healthy subjects. However, when the goal is predicting a phenotype or distinguishing between different conditions, these unsupervised communities from an unrelated set of healthy subjects may not be useful.

In this talk, I will present a method for supervised community detection, aiming to find a partition of the network into communities that are most useful for predicting a particular response. We use a block-structured regularization penalty combined with a prediction loss function, and compute the solution with a combination of a spectral method and an ADMM optimization algorithm. We show that the spectral clustering method recovers the correct communities under a weighted stochastic block model. The method performs well on both simulated and real brain networks, providing support for the idea of task-dependent brain regions. This is joint work with Elizaveta Levina.

## Tuesday, October 6, 2020 at 3:30 PM

### Georgia Papadogeorgou, Assistant Professor, Department of Statistics, University of Florida

Title: Causal inference with spatio-temporal data: estimating the effects of airstrikes on insurgent violence in Iraq

Abstract: Many causal processes have spatial and temporal dimensions. Yet the classic causal inference framework is not directly applicable when the treatment and outcome variables are generated by spatio-temporal processes with an infinite number of possible event locations at each point in time. We take up the challenge of extending the potential outcomes framework to these settings by formulating the treatment point process as a stochastic intervention. Our causal estimands include the expected number of outcome events in a specified area of interest under a particular stochastic treatment assignment strategy. We develop an estimation technique that applies the inverse probability of treatment weighting method to spatially-smoothed outcome surfaces. We demonstrate that the proposed estimator is consistent and asymptotically normal as the number of time periods approaches infinity. A primary advantage of our methodology is its ability to avoid structural assumptions about spatial spillover and temporal carryover effects. We use the proposed methods to estimate the effects of American airstrikes on insurgent violence in Iraq (February 2007 – July 2008). We find that increasing the average number of daily airstrikes for up to one month increases insurgent attacks across Iraq and within Baghdad. We also find evidence that airstrikes can displace attacks from Baghdad to new locations up to 400 kilometers away.
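
For intuition about the weighting method, here is a classic inverse-probability-of-treatment-weighting estimate in a simple non-spatial setting with a known propensity score; the spatio-temporal smoothing and stochastic-intervention machinery of the paper are beyond this sketch, and the data-generating model is made up.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy illustration of IPW: estimate the mean outcome if everyone were treated.
n = 50_000
x = rng.normal(size=n)                         # confounder
p = 1 / (1 + np.exp(-x))                       # true propensity score
t = rng.random(n) < p                          # treatment assignment
y = 2.0 * t + x + rng.normal(size=n)           # outcome; truth under treat-all is 2

naive = y[t].mean()                            # biased: treated units have larger x
ipw = np.mean(t * y / p)                       # Horvitz-Thompson IPW estimate

print(f"naive treated mean: {naive:.2f}, IPW estimate: {ipw:.2f} (truth 2.0)")
```

Reweighting by the inverse propensity removes the confounding that biases the naive comparison.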

## Tuesday, October 13, 2020 at 3:30 PM

### Ran Dai, University of Nebraska Medical Center, Department of Biostatistics

Title: Statistical and Algorithmic Guarantees for the High Dimensional Single Index Model

Abstract: High dimensional semiparametric models are useful for analyses of data that are not only high-dimensional, but also exhibit complex data structures. One example of such models is the High Dimensional Single Index Model (HDSIM), which is more flexible than a parametric model, but also circumvents the “curse of dimensionality”, since its nonparametric element is a one-dimensional isotonic regression. We are interested in the statistical and algorithmic convergence guarantees of HDSIMs. In this presentation, we first talk about the bias of isotonic regression, which provides insights for open questions about the statistical properties of single index models (SIMs). Then we discuss a projected gradient descent algorithm for estimation of HDSIMs, for which we show a polynomial-time algorithmic convergence guarantee and statistical convergence guarantees for estimation and prediction.
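
The one-dimensional isotonic regression at the heart of the model can be fit exactly by the pool adjacent violators algorithm (PAVA); below is a minimal implementation on a made-up sequence, included for illustration only.

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators Algorithm: non-decreasing least-squares
    fit to the sequence y."""
    # Maintain blocks as (mean, weight); merge while the order is violated.
    means, weights = [], []
    for v in y:
        m, w = float(v), 1.0
        while means and means[-1] > m:
            m_prev, w_prev = means.pop(), weights.pop()
            m = (m * w + m_prev * w_prev) / (w + w_prev)
            w += w_prev
        means.append(m)
        weights.append(w)
    return np.concatenate([np.full(int(w), m) for m, w in zip(means, weights)])

y = np.array([1.0, 3.0, 2.0, 2.0, 5.0, 4.0])
# Blocks: [1], then three values pooled at 7/3, then two pooled at 4.5.
print(pava(y))
```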

## Tuesday, October 20, 2020 at 3:30 PM

### David Hong, postdoctoral researcher, University of Pennsylvania (Wharton) Statistics

Title: Optimally weighted PCA for large-dimensional and heterogeneous data

Abstract: Modern applications increasingly involve high-dimensional and heterogeneous data. Datasets are often formed by combining numerous measurements from myriad sources of varying quality. Principal Component Analysis (PCA) is a standard and ubiquitous tool for discovering patterns in such large-dimensional data, but PCA does not robustly recover underlying components when samples have heterogeneous quality, i.e., noise variance. Specifically, PCA suffers from treating all data samples as if they are equally informative. We consider a weighted variant of PCA that gives noisier samples less weight. Using tools from random matrix theory, we analyze the asymptotic recovery of the underlying components as a function of the chosen weights in the large-dimensional regime, i.e., sample dimension comparable with the number of samples. Surprisingly, it turns out that whitening the noise by using inverse noise variance weights is suboptimal! We derive optimal weights, characterize the performance of weighted PCA, and consider the problem of optimally collecting samples under budget constraints.
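
A small simulation of the weighting mechanics: samples drawn at two noise levels, with the leading component extracted from a weighted sample covariance. This only shows that inverse-noise-variance weights beat uniform weights here; the talk's point is that even these weights are not optimal. All sizes and noise levels are made up.

```python
import numpy as np

rng = np.random.default_rng(3)

d, n1, n2 = 100, 300, 300
u = np.ones(d) / np.sqrt(d)              # true underlying component
scores = rng.normal(size=n1 + n2)

# Two groups of samples with very different noise levels.
sigma = np.r_[np.full(n1, 0.5), np.full(n2, 3.0)]
X = 2.0 * scores[:, None] * u + sigma[:, None] * rng.normal(size=(n1 + n2, d))

def leading_component(X, w):
    """Leading eigenvector of the weighted covariance sum_i w_i x_i x_i^T."""
    C = (X * w[:, None]).T @ X / w.sum()
    return np.linalg.eigh(C)[1][:, -1]   # eigh: ascending eigenvalues

recovery = {}
for name, w in [("uniform", np.ones_like(sigma)),
                ("inverse variance", 1.0 / sigma**2)]:
    recovery[name] = abs(leading_component(X, w) @ u)
    print(f"{name:>16} weights: |<v, u>| = {recovery[name]:.3f}")
```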

## Tuesday, October 27, 2020 at 3:30 PM

### Cynthia Rudin, Professor of Computer Science, Electrical and Computer Engineering, and Statistical Science, Duke University

Title: The Extremes of Interpretability: Sparse Decision Trees and Scoring Systems

Abstract: With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions, including flawed bail and parole decisions in criminal justice, flawed models in healthcare, and black box loan decisions in finance. Transparency and interpretability of machine learning models are critical in high-stakes decisions. In this talk, I will focus on two of the most fundamental and important problems in the field of interpretable machine learning: optimal sparse decision trees and optimal scoring systems. I will also briefly describe work on interpretable neural networks for computer vision.

Optimal sparse decision trees: We want to find trees that maximize accuracy and minimize the number of leaves in the tree (sparsity). This is an NP-hard optimization problem with no polynomial-time approximation. I will present the first practical algorithm for solving this problem, which uses a highly customized dynamic-programming-with-bounds procedure, computational reuse, specialized data structures, analytical bounds, and bit-vector computations.
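
To illustrate the objective such algorithms optimize (classification error plus a per-leaf sparsity penalty), here is a brute-force search over depth-one stumps on toy data; the branch-and-bound machinery described above is far more sophisticated, and the data and penalty are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data: the label depends on feature 0 only; feature 1 is noise.
X = rng.random((200, 2))
y = (X[:, 0] > 0.6).astype(int)

lam = 0.01  # per-leaf penalty

def objective(err, leaves):
    return err + lam * leaves

# Baseline: a single leaf predicting the majority class.
best = (objective(min(y.mean(), 1 - y.mean()), 1), "single leaf")

# Exhaustive search over depth-one stumps (one threshold split).
for j in range(X.shape[1]):
    for thr in np.unique(X[:, j]):
        left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
        err = (min(left.sum(), len(left) - left.sum())
               + min(right.sum(), len(right) - right.sum())) / len(y)
        best = min(best, (objective(err, 2), f"x{j} <= {thr:.3f}"))

print(best)
```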

Optimal scoring systems: Scoring systems are sparse linear models with integer coefficients. Traditionally, scoring systems have been designed using manual feature elimination on logistic regression models, with a post-processing step where coefficients have been rounded. However, this process can fail badly to produce optimal (or near optimal) solutions. I will present a novel cutting plane method for producing scoring systems from data. The solutions are globally optimal according to the logistic loss, regularized by the number of terms (sparsity), with coefficients constrained to be integers. Predictive models from our algorithm have been used for many medical and criminal justice applications, including in intensive care units in hospitals.
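
For contrast, the traditional fit-then-round heuristic criticized above can be sketched in a few lines: fit a plain logistic regression, then rescale and round its coefficients to small integer points. The data, scaling, and point range below are all hypothetical; this is the baseline the cutting-plane method improves on, not the method itself.

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy data generated from an integer-valued underlying rule.
n = 2000
X = rng.normal(size=(n, 3))
logit = 2 * X[:, 0] - 1 * X[:, 1]            # feature 2 is irrelevant
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

# Plain logistic regression via batch gradient descent.
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

# The traditional heuristic: rescale and round to small integer points.
points = np.round(w / np.abs(w).max() * 2).astype(int)
print("fitted:", np.round(w, 2), "score-card points:", points)
```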

Interpretable neural networks for computer vision: We have developed a neural network that performs case-based reasoning. It aims to explain its reasoning process in a way that humans can understand, even for complex classification tasks such as bird identification.

## Tuesday, November 3, 2020 at 3:30 PM

### Ulises Pereira, postdoctoral researcher, NYU

Title: Attractors, chaos, sequences and meta-stable attractors in recurrent networks endowed with Hebbian learning

Abstract: During behavior, cortical circuits display a diverse repertoire of neuronal dynamics, including persistent activity, sequences, meta-stable states, and heterogeneous activity. It has been hypothesized that such dynamics could arise from unsupervised learning processes in which the synaptic strengths are modified via Hebbian learning from random synaptic inputs. In this talk, I will present a modeling framework in which I explore this hypothesis. Throughout the talk, I will highlight insights from the mean-field theories developed for analyzing these networks. First, I will present a recurrent network model endowed with Hebbian learning in which both learning rules and the distribution of stored patterns are inferred from distributions of visual responses for novel and familiar images in the monkey inferior temporal cortex (ITC). We show that two types of retrieval states exist: one in which firing rates are constant in time (fixed-point attractors), and another in which firing rates fluctuate chaotically. Consistent with what has been observed in ITC, fixed-point attractor retrieval states exhibit distributions of firing rates that are close to lognormal, while chaotic retrieval states present irregular temporal dynamics that strongly resemble the temporal variability observed during delay periods in the frontal cortex. Second, I will describe the effect of introducing temporal asymmetry into the learning process. In this scenario, instead of fixed-point or chaotic attractors, the network naturally learns sequences of activity, reflected in the transient correlation of network activity with each of the stored patterns. Interestingly, sequences maintain robust decodability but display highly labile dynamics when synaptic connectivity is continuously modified due to noise or storage of other patterns, similar to recent observations in the hippocampus and parietal cortex.
Third, I will show that low-dimensional correlated variability leads to sequences of meta-stable attractors in a network endowed with both temporally symmetric and asymmetric Hebbian learning processes. The dynamics displayed by this network recapitulate several statistical features extracted from recordings in the rat motor cortex during self-initiated behavior, suggesting a new neuronal mechanism accounting for the behavioral variability underlying self-initiated actions.
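
As generic background on Hebbian storage and retrieval, here is a classic binary Hopfield-style sketch (outer-product learning rule, retrieval from a corrupted cue); the talk's firing-rate networks and data-inferred learning rules are much richer than this, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(11)

# Store P random binary patterns in an N-unit network.
N, P = 500, 10
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian outer-product rule; no self-connections.
W = patterns.T @ patterns / N
np.fill_diagonal(W, 0)

# Cue: stored pattern 0 with 20% of units flipped.
state = patterns[0].astype(float)
flip = rng.choice(N, size=N // 5, replace=False)
state[flip] *= -1

for _ in range(20):                  # synchronous sign updates
    state = np.sign(W @ state)

overlap = state @ patterns[0] / N
print(f"overlap with stored pattern: {overlap:.2f}")
```

At this low memory load the corrupted cue falls into the stored pattern's basin of attraction and the overlap returns close to 1.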

## Tuesday, November 10, 2020 at 3:30 PM

### Max Goplerud, Assistant Professor of Political Science at Pitt

Abstract: Estimating non-linear hierarchical models can be computationally burdensome in the presence of large datasets and many non-nested random effects. Popular inferential techniques may take hours to fit even relatively straightforward models. This paper provides two contributions to scalable and accurate inference. First, I propose a new mean-field algorithm for estimating logistic hierarchical models with an arbitrary number of non-nested random effects. Second, I propose "marginally augmented variational Bayes" (MAVB) that further improves the initial approximation through a post-processing step. I show that MAVB provides a guaranteed improvement in the approximation quality at low computational cost and induces dependencies that were assumed away by the initial factorization assumptions. I apply these techniques to a study of voter behavior. Existing estimation took hours, whereas the proposed algorithms run in minutes. The posterior means are well-recovered even under strong factorization assumptions. Applying MAVB further improves the approximation by partially correcting the under-estimated variance. The proposed methodology is implemented in an open source software package.
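
The under-estimated variance mentioned above is a well-known feature of mean-field factorizations; a two-dimensional Gaussian example makes it concrete (a generic illustration with made-up numbers, not the paper's model).

```python
import numpy as np

# Target: a 2-D Gaussian "posterior" with correlated coordinates.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# Mean-field VB approximates it with an independent product q(z1)q(z2).
# For a Gaussian target, coordinate-wise updates fix each q variance at
# 1 / precision_ii, which under-states the true marginal variance.
Lambda = np.linalg.inv(Sigma)            # precision matrix
mf_var = 1 / np.diag(Lambda)             # mean-field variances
true_var = np.diag(Sigma)                # true marginal variances

print("true marginal variances:", true_var)
print("mean-field variances:   ", mf_var)
```

Here the mean-field variance is 1 - 0.8^2 = 0.36 against a true marginal variance of 1; the stronger the posterior correlation assumed away, the larger the shortfall.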

## Tuesday, November 17, 2020 at 3:30 PM

### Jiebiao Wang, Assistant Professor of Biostatistics at Pitt

Title: Bayesian estimation of cell-type-specific gene expression for each tissue sample with prior derived from single-cell data

Abstract: When assessed over a large number of samples, bulk RNA sequencing provides reliable data for gene expression at the tissue level. Single-cell RNA sequencing (scRNA-seq) deepens those analyses by evaluating gene expression in each cell. Both data types lend insights into disease etiology. With current technologies, however, scRNA-seq data are known to be noisy. Moreover, constrained by costs, scRNA-seq data are typically generated from a relatively small number of samples, which limits their utility for some analyses, such as identification of gene expression quantitative trait loci (eQTLs). To address these issues while maintaining the unique advantages of each data type, we develop a Bayesian method (bMIND) to integrate bulk and scRNA-seq data. With a prior derived from scRNA-seq data, we propose to estimate sample-level cell-type-specific (CTS) expression from bulk expression data. The CTS expression enables large-scale sample-level downstream analyses, such as detecting CTS differentially expressed genes (DEGs), CTS eQTLs, and CTS networks. Through simulations, we demonstrate that bMIND improves the accuracy of sample-level CTS expression estimates and power to discover CTS-DEGs when compared to existing methods. To further our understanding of two complex phenotypes, autism and Alzheimer’s disease, we apply bMIND to gene expression data of relevant brain tissue to identify CTS-DEGs. Our results complement findings for CTS-DEGs obtained from scRNA-seq studies, replicating certain DEGs in specific cell types while nominating other novel genes in those cell types. Finally, we calculate CTS-eQTLs for eleven brain regions by analyzing the GTEx V8 genotype and gene expression data, creating a new resource for biological insights.
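
The prior-plus-noisy-estimate idea can be seen in a one-dimensional conjugate-normal caricature: a single-cell-derived prior shrinks a noisy bulk-derived estimate toward plausible values. All numbers are hypothetical, and bMIND itself is a full multivariate Bayesian model.

```python
import numpy as np

# Conjugate normal sketch: combine a noisy bulk-derived estimate of one
# gene's cell-type-specific expression with a prior from single-cell data.
prior_mean, prior_var = 5.0, 1.0     # hypothetical scRNA-seq-derived prior
obs, obs_var = 8.0, 4.0              # hypothetical noisy bulk-derived estimate

post_prec = 1 / prior_var + 1 / obs_var
post_mean = (prior_mean / prior_var + obs / obs_var) / post_prec
post_var = 1 / post_prec

print(f"posterior mean {post_mean:.2f}, variance {post_var:.2f}")
```

The posterior mean lands between the prior and the observation, weighted by their precisions, and its variance is smaller than either source alone.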