Abstract

Statistics often focuses on designing models, deriving theoretical estimates, developing related algorithms, and performing model selection. However, some aspects of this overall process are rarely addressed by statisticians, leaving the practitioner with purely empirical choices and thus poor theoretical guarantees. In this context, we identify two situations of interest: first, the definition of the data unit, when the practitioner hesitates between several candidates; and second, ways of saving computational time, for instance through early stopping rules for estimation algorithms.

In the first case (data units), we highlight that it is possible to embed data unit selection into a classical model selection principle. We introduce the problem in a regression context before focusing on model-based clustering and co-clustering, for data of different kinds (continuous, categorical). This is joint work with Alexandre Lourme (University of Bordeaux).
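To make the idea concrete, here is a minimal sketch (an assumed illustration, not the authors' actual criterion) of treating the data unit as a model-selection choice. Two candidate units for the same positive data, the raw scale and the log scale, are compared via BIC; the change-of-variables Jacobian puts both likelihoods on the original scale so they are comparable.

```python
# Hedged illustration: data unit selection cast as model selection via BIC.
# The data, the Gaussian working model, and the two candidate units (raw vs.
# log) are all assumptions made for this sketch.
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=0.8, size=500)  # positive synthetic data

def gaussian_bic(z, n_params, log_jacobian=0.0):
    """BIC of a Gaussian MLE fit to z, corrected back to the original scale."""
    n = len(z)
    mu, sigma = z.mean(), z.std()
    loglik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                    - (z - mu) ** 2 / (2 * sigma**2))
    loglik += log_jacobian  # change of variables: density on the x scale
    return -2 * loglik + n_params * np.log(n)

# Unit 1: model x directly.  Unit 2: model z = log(x); the Jacobian
# |dz/dx| = 1/x contributes -sum(log x) to the original-scale log-likelihood.
bic_raw = gaussian_bic(x, n_params=2)
bic_log = gaussian_bic(np.log(x), n_params=2, log_jacobian=-np.sum(np.log(x)))
print(f"BIC raw units: {bic_raw:.1f}, BIC log units: {bic_log:.1f}")
```

Because the likelihoods are expressed on a common scale, the usual model-selection machinery (here BIC) can arbitrate between unit definitions; for lognormal data the log unit is correctly preferred.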

In the second case (computational saving), we recall that an increasingly recurrent statistical question is how to trade off estimation accuracy against computation time. In practice, most estimates arise from algorithmic processes that optimize standard, but usually only asymptotically relevant, criteria. The quality of the resulting estimate is therefore a function of both the number of iterations and the sample size involved. We focus on estimating an early stopping time for a gradient descent process that maximizes the likelihood, in the simplified context of linear regression (with some discussion of other contexts). This is joint work with Alain Célisse and Maxime Brunin (both University of Lille and Inria).
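As a rough illustration of the kind of procedure in question (a generic validation-based rule, not the authors' estimator of the stopping time), the sketch below runs gradient descent on the least-squares objective, which is equivalent to maximizing the Gaussian likelihood in linear regression, and stops once a held-out validation error stalls. The data, step size, and patience parameter are all assumptions.

```python
# Hedged sketch: early stopping of gradient descent for linear least squares
# (equivalently, Gaussian maximum likelihood), using a held-out validation set.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ beta_true + noise (assumed setup).
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Split into training and validation halves.
X_tr, X_val = X[:100], X[100:]
y_tr, y_val = y[:100], y[100:]

beta = np.zeros(p)
lr = 0.1            # step size (assumed small enough for convergence)
patience = 20       # iterations without improvement before stopping
best_val, best_iter = np.inf, 0

for t in range(1000):
    grad = X_tr.T @ (X_tr @ beta - y_tr) / len(y_tr)  # gradient of squared loss
    beta -= lr * grad
    val = np.mean((X_val @ beta - y_val) ** 2)        # validation error
    if val < best_val:
        best_val, best_iter = val, t
    elif t - best_iter > patience:                    # error has stalled: stop
        break

print(f"stopped at iteration {t}, best validation MSE {best_val:.3f}")
```

The stopping iteration plays the role of a regularization parameter: stopping earlier saves computation and can also prevent the iterates from overfitting the training sample, which is precisely the accuracy/time trade-off the talk addresses.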