Statistics

(last modified 30.4.2024)

Contact person: Professor Anthony Davison

Bachelor (3rd and 4th semesters)

Bachelor (5th and 6th semesters)

Master

Description

All students take the second-year courses in Probability (MATH-230) and Statistics (MATH-240). MATH-230 provides a careful introduction to the key notions of probability, including limit theorems crucial for statistical applications. MATH-240 gives a rigorous introduction to the elements of statistical inference (estimation, testing, confidence intervals) for a scalar parameter based on a random sample.

Linear Models (MATH-341) and Time Series (MATH-342) are core third-year courses for the Statistics track. They consider two important modes of departure from the standard "i.i.d." setup encountered in the second year. In MATH-341, the data remain independent but have differing parameters, subject to linear constraints, and with Gaussian behaviour. In MATH-342, the data may have the same distribution (i.e., are stationary), but are typically dependent. Like many of the other courses below, these two courses describe methods for the analysis of existing data, but do not say how to plan investigations that lead to secure inferences. Randomisation and Causation (MATH-336) has two main topics: how randomisation can be used to design experiments that yield strong data from which reliable inferences can be drawn, and the circumstances under which causal inferences (e.g., "behaviour A causes health outcome B") can be drawn from observational data. Stochastic Processes (MATH-332), which is also strongly recommended, considers more general dependence structures than in MATH-342, emphasising both dependence and non-stationarity, primarily through the Markov property. It is a basic course for further studies in random processes, and also a source of models for statistical work. Risk and Environmental Sustainability (MATH-XXX), a new course to be given for the first time in the spring semester of 2025, will discuss basic stochastic models for rare events and forecast assessment, with applications to environmental problems.

Students considering higher studies in statistics are strongly encouraged to take Measure and Integration (MATH-303), which provides the theoretical underpinning necessary for the study of advanced mathematical statistics.

There is an EPFL MSc in Statistics.

The master-level courses in statistics cover more advanced material, building on the third-year courses. Statistical Inference (MATH-562) gives an overview of the key ideas on which statistical inferences are based, including the likelihood and Bayesian frameworks. Regression Methods (MATH-408) is the natural follow-up to Linear Models (MATH-341), exploring models for non-Gaussian response variables, more complex dependence structures in which some variables may be treated as random, and situations where smoothing is important. Multivariate Statistics (MATH-444) treats inference for collections of random vectors, which are widespread in applications. Statistical Machine Learning (MATH-412) studies methods of supervised and unsupervised machine learning from a mathematical viewpoint. Statistical Computation and Visualisation (MATH-517) and Applied Statistics (MATH-516) together form the applied statistics sequence at master's level. Further statistical theory, following on from MATH-562, is developed in Statistical Theory (MATH-442).

In addition to the above master courses on general theory and methods, various courses on more specialised topics are available; not all of these are given every year. Biostatistics (MATH-449) presents some of the core methods and applications of statistics in the life sciences and medicine. Applied Biostatistics (MATH-493) focuses on the use of the software package R for the analysis of biomedical data. Statistics for Genomic Data Analysis (MATH-443) explores the key challenges and statistical techniques used in the analysis of massive genomic data. Statistical Genetics (MATH-438) covers key probability models and statistical methods that are used for the analysis of genetic data. Statistical Analysis of Network Data (MATH-448) describes methods and models for the analysis of data that arise in connection with networks, which have become very prominent in recent years. Other theory courses are Nonparametric Estimation and Inference (MATH-YYY) and Empirical Processes (MATH-ZZZ).

Stochastic Simulation (MATH-414) is an introduction to Monte Carlo methods, which are widely used in statistical applications, especially for Bayesian inference. Probability Theory (MATH-432) takes a second look at probability using the tools of measure theory and is strongly recommended for students wishing to pursue graduate study in statistics. Inference for Graphics (MATH-455) concerns learning from network data, and is a natural complement to MATH-448.

Some other mathematics courses related to statistics

All courses in the Probability track are particularly recommended.

Numerical Analysis (MATH-250), Advanced Numerical Analysis (MATH-351) and Numerical Integration of Stochastic Differential Equations (MATH-452) contain useful background for nonparametric statistics, statistical optimisation and functional data analysis respectively. 

Computational Linear Algebra (MATH-453) considers numerical methods to solve large-scale linear algebra problems, which can be particularly pertinent in multivariate and high-dimensional statistics when massive amounts of data must be stored and manipulated for the purposes of inference.

Discrete Optimization (MATH-261) has important links with statistical inference problems related to discrete structures. Nonlinear Optimization (MATH-329) and Convexity (MATH-461) discuss aspects of high-dimensional geometry that are central to many methods of modern high-dimensional statistics.

Other courses and related minors

Mathematics students can take a few credits outside mathematics, some of which may be related to statistics.  Examples are Convex Optimization (MGT-418), Mathematics of Data (EE-556) and Optimization for Machine Learning (CS-439).   Machine Learning (CS-433) is also a good choice, but it has several overlaps with Statistical Machine Learning (MATH-412); interested students should therefore take MATH-412 and, if needed, CS-433 outside their curriculum.

There is a minor in Data Science.