# Alexis Derumigny

## Assistant Professor of Statistics

**Research Interests**

Dependence modeling, copulas, high-dimensional statistics, non-parametric statistics, kernel smoothing, statistical modeling of conditional distributions

## Biography

### Jobs

- February 2021 – now: Assistant Professor of Statistics at Delft University of Technology (Delft, Netherlands)
- August 2019 – January 2021: Researcher at the University of Twente (Enschede, Netherlands)
- October 2016 – July 2019: Teaching Assistant (“Chargé de TD”) at ENSAE ParisTech (Palaiseau, France)
- May 2016 – September 2016: Graduate Research Intern, CREST (Palaiseau, France)
- June 2015 – January 2016: Quantitative Analyst Intern, Meteo Protect (Paris, France)

### Education

- 2016 – 2019: PhD Student in the Laboratory of Statistics and the Laboratory of Finance-Insurance at CREST and Université Paris-Saclay under the joint supervision of Alexandre Tsybakov and Jean-David Fermanian.Thesis: “Some statistical results in high-dimensional dependence modeling” (“Contributions à l’analyse statistique des modèles de dépendance en grande dimension”)
- 2013 – 2016: M.Sc. in Probability, Statistics, Economics and Finance at ENSAE ParisTech
- 2011 – 2013: Preparatory class for entrance to graduate schools (“Grandes Écoles”), Lycée Henri IV, MPSI-MP*

## Publications

- Testing for equality between conditional copulas given discretized conditioning events, with Jean-David Fermanian and Aleksey Min (Technical University of Munich). Canadian Journal of Statistics (**2022**).

**Abstract:** “Several procedures have been recently proposed to test the simplifying assumption for conditional copulas. Instead of considering pointwise conditioning events, we study the constancy of the conditional dependence structure when some covariates belong to general Borel conditioning subsets. We introduce several test statistics based on the equality of conditional Kendall’s taus and derive their asymptotic distributions under the null hypothesis. In settings where such conditioning events are not fixed *ex ante*, we propose a data-driven procedure to recursively build such relevant subsets. This procedure is based on decision trees that maximize the differences between the conditional Kendall’s taus, which correspond to the leaves of the trees. Empirical results for such tests are illustrated in the supplementary materials. Moreover, a study of the conditional dependence between financial stock returns is presented, and highlights specific contagion effects of past returns. The last application deals with conditional dependence between coverage amounts in an insurance dataset.”

GitHub page of the package: https://github.com/AlexisDerumigny/CondCopulas

Also available on CRAN at: https://cran.r-project.org/package=CondCopulas. The algorithms proposed in this article are available in the R functions `bCond.simpA.CKT()` and `bCond.treeCKT()` of the `CondCopulas` package.
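As an illustration of the quantity these tests are built on, here is a minimal pure-Python sketch (not the package's algorithm, and with a made-up toy data-generating process): estimate Kendall's tau separately on two conditioning subsets defined by a covariate, and compare.

```python
import random

def sign(a):
    return (a > 0) - (a < 0)

def kendall_tau(x, y):
    """Sample Kendall's tau: average concordance sign over all pairs."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return 2 * s / (n * (n - 1))

random.seed(0)
n = 400
# Toy data: the dependence between x1 and x2 is stronger on {Z > 0}.
z = [random.gauss(0, 1) for _ in range(n)]
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [x1[i] * (1.0 if z[i] > 0 else 0.1) + random.gauss(0, 1) for i in range(n)]

subset_A = [i for i in range(n) if z[i] > 0]    # conditioning event {Z > 0}
subset_B = [i for i in range(n) if z[i] <= 0]   # conditioning event {Z <= 0}
tau_A = kendall_tau([x1[i] for i in subset_A], [x2[i] for i in subset_A])
tau_B = kendall_tau([x1[i] for i in subset_B], [x2[i] for i in subset_B])
# The tests are built on differences such as tau_A - tau_B, which should
# be near zero under the simplifying assumption.
print(tau_A - tau_B)
```

Here the difference is large because the toy model violates the simplifying assumption by construction.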

- Conditional empirical copula processes and generalized dependence measures, with Jean-David Fermanian. Electronic Journal of Statistics, 16(2):5692–5719 (**2022**).

**Abstract:** “We study the weak convergence of conditional empirical copula processes indexed by general families of conditioning events that have nonzero probabilities. Moreover, we also study the case where the conditioning events are chosen in a data-driven way. The validity of several bootstrap schemes is stated, including the exchangeable bootstrap. We define general multivariate measures of association, possibly given some fixed or random conditioning events. By applying our theoretical results, we prove the asymptotic normality of the estimators of such measures. We detail the link between pointwise conditional copulas and conditional copulas indexed by general events and their application in statistical methodology. We illustrate our results with financial data.”
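The unconditional building block, the empirical copula evaluated at rank-based pseudo-observations, can be sketched in a few lines of Python (an illustration, not the processes studied in the paper); restricting the sums to observations whose covariate falls in a given Borel subset gives a conditional version.

```python
def pseudo_observations(x):
    """Rank-transform a sample to (0,1): u_i = rank(x_i) / (n + 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]

def empirical_copula(u, v, s, t):
    """C_n(s, t): fraction of pseudo-observations with U <= s and V <= t."""
    n = len(u)
    return sum(1 for i in range(n) if u[i] <= s and v[i] <= t) / n
```

For instance, `empirical_copula(u, v, 1, 1)` is always 1, and for perfectly comonotone data `empirical_copula(u, u, s, s)` is approximately `s`.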

- Spatial clustering of waste reuse in a circular economy: A spatial autocorrelation analysis on locations of waste reuse in the Netherlands using global and local Moran’s I, with Tanya Tsui, David Peck, Arjan van Timmeren and Alexander Wandl. Frontiers in Built Environment (**2022**).

**Abstract:** “In recent years, implementing a circular economy in cities has been considered by policy makers as a potential solution for achieving sustainability. Existing literature on circular cities is mainly focused on two perspectives: urban governance and urban metabolism. Both these perspectives, to some extent, miss an understanding of space. A spatial perspective is important because circular activities, such as the recycling, reuse, or storage of materials, require space and have a location. It is therefore useful to understand where circular activities are located, and how they are affected by their location and surrounding geography. This study therefore aims to understand the existing state of waste reuse activities in the Netherlands from a spatial perspective, by analyzing the degree, scale, and locations of spatial clusters of waste reuse. This was done by measuring the spatial autocorrelation of waste reuse locations using global and local Moran’s I, with waste reuse data from the national waste registry of the Netherlands. The analysis was done for 10 material types: minerals, plastic, wood and paper, fertilizer, food, machinery and electronics, metal, mixed construction materials, glass, and textile. It was found that all materials except for glass and textiles formed spatial clusters. By varying the grid cell sizes used for data aggregation, it was found that different materials had different “best fit” cell sizes where spatial clustering was the strongest. The best fit cell size is ∼7 km for materials associated with construction and agricultural industries, and ∼20–25 km for plastic and metals. The best fit cell sizes indicate the average distance of companies from each other within clusters, and suggest a suitable spatial resolution at which the material can be understood. Hotspot maps were also produced for each material to show where reuse activities are most spatially concentrated.”
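Global Moran's I itself is straightforward to compute. The sketch below is a minimal pure-Python illustration on a toy one-dimensional arrangement of cells, not the analysis pipeline used in the paper.

```python
def morans_i(values, weights):
    """Global Moran's I: spatial autocorrelation of `values` given a
    spatial weight matrix `weights` (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(weights[i][j] for i in range(n) for j in range(n))
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)

# Four cells on a line; adjacent cells are neighbours.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([10, 10, 0, 0], W))  # clustered values: positive I
print(morans_i([10, 0, 10, 0], W))  # alternating values: negative I
```

Positive values indicate spatial clustering (as found for most materials in the paper), negative values indicate dispersion.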

- Identifiability and estimation of meta-elliptical copula generators, with Jean-David Fermanian. Journal of Multivariate Analysis, 190, article 104962 (**2022**).

**Abstract:** “Meta-elliptical copulas are often proposed to model dependence between the components of a random vector. They are specified by a correlation matrix and a map g, called a density generator. While the correlation matrix can easily be estimated from pseudo-samples of observations, this is not the case for the density generator when it does not belong to a parametric family. We state sufficient conditions to non-parametrically identify this generator. Several nonparametric estimators of g are then proposed, by M-estimation, simulation-based inference or by an iterative procedure available in an R package. Some simulations illustrate the relevance of the latter method.”

GitHub page of the package: https://github.com/AlexisDerumigny/ElliptCopulas

Also available on CRAN at: https://cran.r-project.org/package=ElliptCopulas

- Estimation of copulas via Maximum Mean Discrepancy, with Pierre Alquier (RIKEN AIP), Badr-Eddine Chérief-Abdellatif (University of Oxford), and Jean-David Fermanian. Journal of the American Statistical Association (**2022**).

**Abstract:** “This paper deals with robust inference for parametric copula models. Estimation using Canonical Maximum Likelihood might be unstable, especially in the presence of outliers. We propose to use a procedure based on the Maximum Mean Discrepancy (MMD) principle. We derive non-asymptotic oracle inequalities, consistency and asymptotic normality of this new estimator. In particular, the oracle inequality holds without any assumption on the copula family, and can be applied in the presence of outliers or under misspecification. Moreover, in our MMD framework, the statistical inference of copula models for which there exists no density with respect to the Lebesgue measure on [0,1]^{d}, such as the Marshall-Olkin copula, becomes feasible. A simulation study shows the robustness of our new procedures, especially compared to pseudo-maximum likelihood estimation. An R package implementing the MMD estimator for copula models is available.”

GitHub page of the package: https://github.com/AlexisDerumigny/MMDCopula

Also available on CRAN at: https://cran.r-project.org/package=MMDCopula

The R scripts to reproduce the simulations and the figures of the paper are available at https://github.com/AlexisDerumigny/Reproducibility-EstimationOfCopulasViaMMD.
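The discrepancy at the heart of the method can be illustrated with a small pure-Python sketch of the (biased, V-statistic) squared-MMD estimate between two univariate samples under a Gaussian kernel. This is an illustration of the criterion only, not the `MMDCopula` implementation: the paper minimizes such a discrepancy over the copula parameter, and the boundedness of the kernel is the source of the robustness.

```python
import math

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared Maximum Mean
    Discrepancy between two univariate samples, Gaussian kernel."""
    k = lambda a, b: math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))
    m, n = len(xs), len(ys)
    kxx = sum(k(a, b) for a in xs for b in xs) / (m * m)
    kyy = sum(k(a, b) for a in ys for b in ys) / (n * n)
    kxy = sum(k(a, b) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy
```

The estimate is 0 for identical samples and approaches 2 (for this kernel) for well-separated ones; because each kernel evaluation is bounded by 1, a single outlier can only move the criterion by a bounded amount.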

- A multifunctional matching algorithm for sample design in agricultural plots, with N. Ohana-Levi, A. Peeters, A. Ben-Gal, I. Bahat, L. Katz, Y. Netzer, A. Naor, Y. Cohen. Computers and Electronics in Agriculture, 187, article 106262 (**2021**).

**Abstract:** “Collection of accurate and representative data from agricultural fields is required for efficient crop management. Since growers have limited available resources, there is a need for advanced methods to select representative points within a field in order to best satisfy sampling or sensing objectives. The main purpose of this work was to develop a data-driven method for selecting locations across an agricultural field given observations of some covariates at every point in the field. These chosen locations should be representative of the distribution of the covariates in the entire population and represent the spatial variability in the field. They can then be used to sample an unknown target feature whose sampling is expensive and cannot be realistically done at the population scale.

An algorithm for determining these optimal sampling locations, namely the multifunctional matching (MFM) criterion, was based on matching of moments (functionals) between sample and population. The selected functionals in this study were standard deviation, mean, and Kendall’s tau. An additional algorithm defined the minimal number of observations that could represent the population according to a desired level of accuracy. The MFM was applied to datasets from two agricultural plots: a vineyard and a peach orchard. The data from the plots included measured values of slope, topographic wetness index, normalized difference vegetation index, and apparent soil electrical conductivity. The MFM algorithm selected the number of sampling points according to a representation accuracy of 90% and determined the optimal location of these points. The algorithm was validated against values of vine or tree water status measured as crop water stress index (CWSI). Algorithm performance was then compared to two other sampling methods: the conditioned Latin hypercube sampling (cLHS) model and a uniform random sample with spatial constraints. Comparison among sampling methods was based on measures of similarity between the target variable population distribution and the distribution of the selected sample.”

GitHub page of the package: https://github.com/AlexisDerumigny/MFunctMatching

- On Kendall’s regression, with Jean-David Fermanian. Journal of Multivariate Analysis, 178, article 104610 (**2020**).

**Abstract:** “Conditional Kendall’s tau is a measure of dependence between two random variables, conditionally on some covariates. We assume a regression-type relationship between conditional Kendall’s tau and some covariates, in a parametric setting with a large number of transformations of a small number of regressors. This model may be sparse, and the underlying parameter is estimated through a penalized criterion and a two-step inference procedure. We prove non-asymptotic bounds with explicit constants that hold with high probabilities. We derive the consistency of the latter estimator, its asymptotic law and some oracle properties. Some simulations and applications to real data conclude the paper.”

- On kernel-based estimation of conditional Kendall’s tau: finite-distance bounds and asymptotic behavior, with Jean-David Fermanian. Dependence Modeling, 7:292–321 (**2019**).

**Abstract:** “We study nonparametric estimators of conditional Kendall’s tau, a measure of concordance between two random variables given some covariates. We prove non-asymptotic pointwise and uniform bounds, that hold with high probabilities. We provide “direct proofs” of the consistency and the asymptotic law of conditional Kendall’s tau. A simulation study evaluates the numerical performance of such nonparametric estimators. An application to the dependence between energy consumption and temperature conditionally to calendar days is finally provided.”
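One such kernel-based estimator can be sketched as follows: a pure-Python illustration with a Gaussian kernel, Nadaraya-Watson weights and a single covariate (the paper covers general kernels and normalizations).

```python
import math

def conditional_kendall_tau(x1, x2, z, z0, h):
    """Kernel-weighted estimator of conditional Kendall's tau at Z = z0,
    with Nadaraya-Watson weights from a Gaussian kernel of bandwidth h."""
    n = len(z)
    w = [math.exp(-0.5 * ((zi - z0) / h) ** 2) for zi in z]
    total = sum(w)
    w = [wi / total for wi in w]
    num = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                c = (x1[i] - x1[j]) * (x2[i] - x2[j])
                num += w[i] * w[j] * ((c > 0) - (c < 0))
    # normalize by the total off-diagonal weight, so that perfectly
    # concordant data yield a value of 1
    return num / (1 - sum(wi * wi for wi in w))

grid = [float(i) for i in range(10)]
print(conditional_kendall_tau(grid, grid, grid, 5.0, 1.0))  # ≈ 1 (concordant)
```

Observations with covariate values close to `z0` dominate the weighted sum, which is what localizes the dependence measure at `Z = z0`.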

- A classification point-of-view about conditional Kendall’s tau, with Jean-David Fermanian. Computational Statistics & Data Analysis, 135:70–94 (**2019**).

**Abstract:** “It is shown how the problem of estimating conditional Kendall’s tau can be rewritten as a classification task. Conditional Kendall’s tau is a conditional dependence parameter that is a characteristic of a given pair of random variables. The goal is to predict whether the pair is concordant (value of 1) or discordant (value of -1) conditionally on some covariates. The consistency and the asymptotic normality of a family of penalized approximate maximum likelihood estimators is proven, including the equivalent of the logit and probit regressions in our framework. Specific algorithms are detailed, adapting usual machine learning techniques, including nearest neighbors, decision trees, random forests and neural networks, to the setting of the estimation of conditional Kendall’s tau. Finite sample properties of these estimators and their sensitivities to each component of the data-generating process are assessed in a simulation study. Finally, all these estimators are applied to a dataset of European stock indices.”
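The reduction to classification can be made concrete: build a dataset where each pair (i, j) carries the label +1 (concordant) or -1 (discordant) and the covariate pair as features. A classifier estimating the concordance probability p(z) then yields the estimate 2·p(z) − 1 of conditional Kendall's tau. A minimal sketch, with a hypothetical helper name:

```python
def pairs_dataset(x1, x2, z):
    """Recast conditional Kendall's tau estimation as classification:
    each pair (i, j) gets label +1 (concordant) or -1 (discordant),
    with the covariate pair (z_i, z_j) as features. Ties are simply
    dropped in this sketch."""
    features, labels = [], []
    n = len(x1)
    for i in range(n):
        for j in range(i + 1, n):
            c = (x1[i] - x1[j]) * (x2[i] - x2[j])
            if c != 0:
                features.append((z[i], z[j]))
                labels.append(1 if c > 0 else -1)
    return features, labels
```

Any probabilistic classifier (nearest neighbors, trees, forests, neural networks, as in the paper) can then be trained on `features` and `labels`.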

- Improved bounds for Square-Root Lasso and Square-Root Slope. Electronic Journal of Statistics, 12:741–766 (**2018**).

**Abstract:** “Extending the results of Bellec, Lecué and Tsybakov to the setting of sparse high-dimensional linear regression with unknown variance, we show that two estimators, the Square-Root Lasso and the Square-Root Slope, can achieve the optimal minimax prediction rate, which is *(s/n) log(p/s)*, up to some constant, under some mild conditions on the design matrix. Here, *n* is the sample size, *p* is the dimension and *s* is the sparsity parameter. We also prove optimality for the estimation error in the *l_{q}*-norm, with
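The Square-Root Lasso criterion itself is easy to state. The sketch below is illustrative only (objective evaluation, no solver); it shows why the estimator is pivotal: the residual norm is divided by √n, so a good tuning parameter does not depend on the unknown noise level.

```python
import math

def sqrt_lasso_objective(beta, X, y, lam):
    """Square-Root Lasso criterion:
    ||y - X beta||_2 / sqrt(n) + lam * ||beta||_1."""
    n = len(y)
    residuals = [y[i] - sum(X[i][k] * beta[k] for k in range(len(beta)))
                 for i in range(n)]
    return (math.sqrt(sum(r * r for r in residuals) / n)
            + lam * sum(abs(b) for b in beta))
```

Minimizing this objective over `beta` (by any convex solver) gives the Square-Root Lasso estimator; replacing the l1 penalty by a sorted-l1 penalty gives the Square-Root Slope.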

- About tests of the “simplifying” assumption for conditional copulas, with Jean-David Fermanian. Dependence Modeling, 5:154–197 (**2017**).

**Abstract:** “We discuss the so-called “simplifying assumption” of conditional copulas in a general framework. We introduce several tests of the latter assumption for non- and semiparametric copula models. Some related test procedures based on conditioning subsets instead of point-wise events are proposed. The limiting distribution of such test statistics under the null are approximated by several bootstrap schemes, most of them being new. We prove the validity of a particular semiparametric bootstrap scheme. Some simulations illustrate the relevance of our results.”

## Preprints and submitted articles

- Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion, with Lucas Girard (CREST-ENSAE) and Yannick Guyonvarch (INRAE) (**2022**)

**Abstract:** “In this article, we obtain explicit bounds on the uniform distance between the cumulative distribution function of a standardized sum S_n of n independent centered random variables with moments of order four and its first-order Edgeworth expansion. Those bounds are valid for any sample size with n^{-1/2} rate under moment conditions only and n^{-1} rate under additional regularity constraints on the tail behavior of the characteristic function of S_n. In both cases, the bounds are further sharpened if the variables involved in S_n are unskewed. We also derive new Berry-Esseen-type bounds from our results and discuss their links with existing ones. We finally apply our results to illustrate the lack of finite-sample validity of one-sided tests based on the normal approximation of the mean.”
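The first-order Edgeworth expansion that the bounds are measured against can be written down directly. In this pure-Python sketch (an illustration of the expansion, not of the paper's bounds), the correction term vanishes for unskewed variables, the case where the abstract notes sharper bounds.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def edgeworth_cdf(x, n, skewness):
    """First-order Edgeworth approximation to the cdf of the standardized
    sum of n iid variables with the given standardized skewness:
    Phi(x) + skewness / (6 sqrt(n)) * (1 - x^2) * phi(x)."""
    correction = skewness / (6 * math.sqrt(n)) * (1 - x * x) * std_normal_pdf(x)
    return std_normal_cdf(x) + correction
```

For positive skewness the approximate cdf is pulled above Φ at 0, which is exactly the kind of asymmetry that invalidates one-sided tests based on the plain normal approximation.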

- Fast estimation of Kendall’s tau and conditional Kendall’s tau matrices under structural assumptions, with Rutger van der Spek (**2022**)

**Abstract:** “Kendall’s tau and conditional Kendall’s tau matrices are multivariate (conditional) dependence measures between the components of a random vector. For large dimensions, available estimators are computationally expensive and can be improved by averaging. Under structural assumptions on the underlying Kendall’s tau and conditional Kendall’s tau matrices, we introduce new estimators that have a significantly reduced computational cost while keeping a similar error level. In the unconditional setting we assume that, up to reordering, the underlying Kendall’s tau matrix is block-structured with constant values in each of the off-diagonal blocks. Consequences on the underlying correlation matrix are then discussed. The estimators take advantage of this block structure by averaging over (part of) the pairwise estimates in each of the off-diagonal blocks. Derived explicit variance expressions show their improved efficiency. In the conditional setting, the conditional Kendall’s tau matrix is assumed to have a constant block structure, independently of the conditioning variable. Conditional Kendall’s tau matrix estimators are constructed similarly as in the unconditional case by averaging over (part of) the pairwise conditional Kendall’s tau estimators. We establish their joint asymptotic normality, and show that the asymptotic variance is reduced compared to the naive estimators. Then, we perform a simulation study which displays the improved performance of both the unconditional and conditional estimators. Finally, the estimators are used for estimating the value at risk of a large stock portfolio; backtesting illustrates the obtained improvements compared to the previous estimators.”
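The averaging idea in the unconditional setting can be sketched as follows. This is an illustration only: the computational gain in the paper comes from computing just part of the pairwise estimates, whereas this sketch averages an already-computed matrix to show the block structure being exploited.

```python
def block_averaged_tau(tau_matrix, groups):
    """Average pairwise Kendall's tau estimates over each off-diagonal
    block defined by the group labels, exploiting the assumption that
    the underlying matrix is constant on each such block."""
    labels = sorted(set(groups))
    d = len(groups)
    averages = {}
    for a in labels:
        for b in labels:
            if a < b:
                vals = [tau_matrix[i][j] for i in range(d) for j in range(d)
                        if groups[i] == a and groups[j] == b]
                averages[(a, b)] = sum(vals) / len(vals)
    return averages
```

Since each block mean averages many noisy pairwise estimates of the same underlying value, its variance is reduced relative to any single entry.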

- Robust-to-outliers square-root LASSO, simultaneous inference with a MOM approach, with Gianluca Finocchio (University of Twente) and Katharina Proksch (University of Twente) (**2021**)

**Abstract:** “We consider the least-squares regression problem with unknown noise variance, where the observed data points are allowed to be corrupted by outliers. Building on the median-of-means (MOM) method introduced by Lecué and Lerasle (Ann. Statist., 48(2):906–931, April 2020) in the case of known noise variance, we propose a general MOM approach for simultaneous inference of both the regression function and the noise variance, requiring only an upper bound on the noise level. Interestingly, this generalization requires care due to regularity issues that are intrinsic to the underlying convex-concave optimization problem. In the general case where the regression function belongs to a convex class, we show that our simultaneous estimator achieves with high probability the same convergence rates and a similar risk bound as if the noise level were known, as well as convergence rates for the estimated noise standard deviation. In the high-dimensional sparse linear setting, our estimator yields a robust analog of the square-root LASSO. Under weak moment conditions, it jointly achieves with high probability the minimax rates of estimation s^{1/p}((1/n)log(p/s))^{1/2} for the ℓ_{p}-norm of the coefficient vector, and the rate ((s/n)log(p/s))^{1/2} for the estimation of the noise standard deviation. Here n denotes the sample size, p the dimension and s the sparsity level. We finally propose an extension to the case of unknown sparsity level s, providing a jointly adaptive estimator (β˜,σ˜,s˜). It simultaneously estimates the coefficient vector, the noise level and the sparsity level, with proven bounds on each of these three components that hold with high probability.”
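The median-of-means building block underlying the approach is simple to state. A pure-Python sketch (the paper's contribution is the simultaneous convex-concave extension to unknown noise level, not this basic estimator):

```python
def median_of_means(sample, n_blocks):
    """Median-of-means: split the sample into blocks, average each block,
    and return the median of the block means. A few outliers can corrupt
    at most a few blocks, leaving the median unaffected."""
    size = len(sample) // n_blocks
    means = sorted(sum(sample[k * size:(k + 1) * size]) / size
                   for k in range(n_blocks))
    mid = n_blocks // 2
    if n_blocks % 2 == 1:
        return means[mid]
    return (means[mid - 1] + means[mid]) / 2

data = [1.0] * 9 + [1000.0]          # one gross outlier
print(median_of_means(data, 5))      # robust: 1.0
print(sum(data) / len(data))         # plain mean: 100.9
```

The number of blocks trades robustness (more blocks tolerate more outliers) against the variance of each block mean.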

- On lower bounds for the bias-variance trade-off, with Johannes Schmidt-Hieber (University of Twente) (**2020**)

**Abstract:** “It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to which extent the bias-variance trade-off is unavoidable and allows to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In a second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we propose to combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. To highlight possible extensions of the proposed framework, we moreover briefly discuss the trade-off between bias and mean absolute deviation.”

- On the construction of confidence intervals for ratios of expectations, with Lucas Girard (CREST-ENSAE) and Yannick Guyonvarch (CREST-ENSAE) (**2019**)

**Abstract:** “In econometrics, many parameters of interest can be written as ratios of expectations. The main approach to construct confidence intervals for such parameters is the delta method. However, this asymptotic procedure yields intervals that may not be relevant for small sample sizes or, more generally, in a sequence-of-model framework that allows the expectation in the denominator to decrease to 0 with the sample size. In this setting, we prove a generalization of the delta method for ratios of expectations and the consistency of the nonparametric percentile bootstrap. We also investigate finite-sample inference and show a partial impossibility result: nonasymptotic uniform confidence intervals can be built for ratios of expectations but not at every level. Based on this, we propose an easy-to-compute index to appraise the reliability of the intervals based on the delta method. Simulations and an application illustrate our results and the practical usefulness of our rule of thumb.”
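The delta-method interval under scrutiny can be sketched directly. This is illustrative pure Python (the name `ratio_ci` and the normal critical value 1.96 are choices for the example): its reliability degrades as the denominator mean approaches 0, which is what motivates the paper's rule of thumb.

```python
import math

def ratio_ci(ys, xs, z=1.96):
    """Delta-method confidence interval for E[Y] / E[X], using the
    approximation Var(ybar/xbar) ~ (var_y - 2 r cov + r^2 var_x) / (n xbar^2),
    where r = ybar / xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    r = ybar / xbar
    var_x = sum((x - xbar) ** 2 for x in xs) / n
    var_y = sum((y - ybar) ** 2 for y in ys) / n
    cov = sum((xs[i] - xbar) * (ys[i] - ybar) for i in range(n)) / n
    se = math.sqrt(var_y - 2 * r * cov + r * r * var_x) / (abs(xbar) * math.sqrt(n))
    return r - z * se, r + z * se
```

As `xbar` shrinks toward 0, the standard error blows up and the normal approximation behind the 1.96 quantile becomes unreliable, the regime studied in the paper.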

**Abstract:** “U-statistics constitute a large class of estimators, generalizing the empirical mean of a random variable *X* to sums over every *k*-tuple of distinct observations of *X*. They may be used to estimate a regular functional *θ(P_{X})* of the law of

## Conferences & communications:

### 2020:

- On lower bounds for the bias-variance trade-off, with Johannes Schmidt-Hieber, at the 13th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics2020, virtual conference, 19-21 December 2020).
- On lower bounds for the bias-variance trade-off, with Johannes Schmidt-Hieber, Meeting in Mathematical Statistics 2020 (CIRM virtual conference, 14-18 December 2020).
- Estimation of copulas by Maximum Mean Discrepancy, with Pierre Alquier (RIKEN AIP), Badr-Eddine Chérief-Abdellatif (University of Oxford), and Jean-David Fermanian (online Statistics Seminar at the University of Strasbourg and IRMA, France, 2 November 2020).

### 2019:

- Meeting in Mathematical Statistics (Luminy, France, 16-20 December 2019).
- On the estimation of elliptical copula generators, with Jean-David Fermanian, Invited speaker at the 12th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics2019, London, UK, 14-16 December 2019).
- On machine learning methods for the estimation of conditional Kendall’s tau (poster in .pdf format), at the Stochastics Meeting Lunteren 2019 (Lunteren, Netherlands, 11-13 November 2019).
- Two presentations: An introduction to copulas and dependence modeling / On machine learning methods for the estimation of conditional Kendall’s tau (slides in .pdf format), at the Rencontres Statistiques Lyonnaises (Lyon, France, 5 November 2019).
- On machine learning methods for the estimation of conditional Kendall’s tau (poster in .pdf format), at the 3rd edition of Data Science Summer School (DS3, Palaiseau, France, 24-28 June 2019).
- On machine learning methods for the estimation of conditional Kendall’s tau (poster in .pdf format), at the Statistics Conference in honor of Aad van der Vaart’s 60th birthday (Leiden, Netherlands, 17-21 June 2019).
- Sur l’estimation du tau de Kendall conditionnel à l’aide de méthodes de classification (.pdf, in French) with Jean-David Fermanian, at the 51èmes Journées de Statistique (JDS2019, Nancy, France, 3-7 June 2019).

### 2018:

- About the estimation of conditional Kendall’s tau and Kendall’s regression, at the Meeting in Mathematical Statistics 2018 (Fréjus, France, 16-21 December 2018).
- A classification point-of-view about conditional Kendall’s tau, at the 11th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics2018, Pisa, Italy, 14-16 December 2018).
- Improved bounds for Square-root Lasso and Square-root Slope, at the 12th International Vilnius Conference on Probability Theory and Mathematical Statistics and 2018 IMS Annual Meeting on Probability and Statistics (Vilnius, Lithuania, 2-6 July 2018).
- Improved bounds for Square-root Lasso and Square-root Slope (poster in .pdf format), at the 2nd edition of Data Science Summer School (DS3, Palaiseau, France, 25-29 June 2018).
- Improved bounds for Square-root Lasso and Square-root Slope, at the 4th Conference of the International Society for Nonparametric Statistics (ISNPS2018, Salerno, Italy, 11-15 June 2018).
- À propos de la régression du tau de Kendall conditionnel (.pdf, in French) with Jean-David Fermanian, at the 50èmes Journées de Statistique (JDS2018, Palaiseau, France, 28 May – 1 June 2018).
- About Kendall’s Regression, with Jean-David Fermanian (CREST Financial Econometrics seminar, Palaiseau, France, 15 February 2018).

### 2017:

- Meeting in Mathematical Statistics (Luminy, France, 18-22 December 2017).
- About the estimation of the conditional Kendall’s tau and Kendall’s Regression, with Jean-David Fermanian, Invited speaker at the 11th International Conference on Computational and Financial Econometrics (CFE2017, London, UK, 16-18 December 2017).
- À propos des tests de l’hypothèse simplificatrice pour les copules conditionnelles (.pdf, in French) with Jean-David Fermanian, at the 49èmes Journées de Statistique (JDS2017, Avignon, France, 29 May – 2 June 2017).
- Workshop Statistical Recovery of Discrete, Geometric and Invariant Structures (Oberwolfach, Germany, 19-25 March 2017).
- About tests of the “simplifying” assumption for conditional copulas, with Jean-David Fermanian, Rencontres ENSAE-ENSAI de Statistiques (Bruz, France, 26-27 January 2017).

### 2016:

- Inference of elliptical copula generators, with Jean-David Fermanian, Invited speaker at the 9th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics2016, Seville, Spain, 9-11 December 2016).

## Teaching

### 2020 – 2021:

- Introduction to Statistics (University of Twente, ATLAS Bachelor program 3rd Semester)

### 2018 – 2019:

- Numerical Analysis (ENSAE 1st year)
- Probability Theory; C++; Mathematical Statistics 1 (ENSAE 2nd year)
- Time Series; Financial Econometrics (ENSAE 3rd year)

### 2017 – 2018:

- Probability Theory; Numerical Analysis (ENSAE 1st year)
- C++ (ENSAE 2nd year)
- Time Series; Financial Econometrics (ENSAE 3rd year)

### 2016 – 2017:

- Analysis and Topology; Convex Optimization; Numerical Analysis (ENSAE 1st year)
- Financial Econometrics (ENSAE 3rd year)