# Statistics

(last modified 31.10.2020)

Contact person: Professor Anthony Davison

**Bachelor (3rd and 4th semesters)**

- Probabilités (MATH-230)
- Statistique (MATH-240)

**Bachelor (5th and 6th semesters)**

- Linear Models (MATH-341)
- Time Series (MATH-342)
- Randomisation and Causation (MATH-336)
- Stochastic Processes (MATH-332)
- Mesure et Intégration (MATH-303)

**Master**

- Statistical Theory (MATH-442)
- Modern Regression Methods (MATH-408)
- Multivariate Statistics (MATH-444)
- Statistical Machine Learning (MATH-412)
- Biostatistics (MATH-449)
- Applied Biostatistics (MATH-493)
- Statistics for Genomic Data Analysis (MATH-474)
- Statistical Genetics (MATH-438)
- Statistical Analysis of Network Data (MATH-448)
- Risk, Rare Events, and Extremes (MATH-447)
- Stochastic Simulation (MATH-414)
- Probability Theory (MATH-432)
- Combinatorial Statistics (MATH-455)

**Description**

All students take the second-year courses in Probability (MATH-230) and Statistics (MATH-240). MATH-230 provides a careful introduction to the key notions of probability, including the limit theorems that are crucial for statistical applications. MATH-240 gives a rigorous introduction to the elements of statistical inference (estimation, testing, confidence intervals) for a scalar parameter based on a random sample.

Linear Models (MATH-341) and Time Series (MATH-342) are core third-year courses for the Statistics track. They consider two important departures from the standard i.i.d. setting encountered in the second year. In MATH-341, the data remain independent but have differing parameters, subject to linear constraints, and exhibit Gaussian behaviour. In MATH-342, the data may have the same distribution (i.e., are stationary) but are typically dependent. Like many of the other courses below, these two courses describe methods for the analysis of existing data, but do not say how to plan investigations that lead to secure inferences. Randomisation and Causation (MATH-336) has two main topics: how randomisation can be used to design experiments that yield strong data from which reliable inferences can be drawn, and the circumstances under which causal inferences (e.g., "behaviour A causes health outcome B") can be drawn from observational data. Stochastic Processes (MATH-332), which is also strongly recommended, considers more general dependence structures than MATH-342, emphasising both dependence and non-stationarity, primarily through the Markov property.

Students considering higher studies in statistics are strongly encouraged to take *Mesure et Intégration* (MATH-303), which provides the theoretical underpinning necessary for the study of advanced mathematical statistics.

The master-level courses in statistics cover more advanced material, building on the third-year courses. Statistical Theory (MATH-442) treats the theoretical foundations of statistics, including optimality theory and asymptotic theory. Modern Regression Methods (MATH-408) is the natural follow-up to Linear Models (MATH-341), exploring models for non-Gaussian response variables, more complex dependence structures in which some variables may be treated as random, and situations where smoothing is important. Multivariate Statistics (MATH-444) treats inference for collections of random vectors, which are widespread in applications. Statistical Machine Learning (MATH-412) studies methods of supervised and unsupervised machine learning from a mathematical viewpoint.

In addition to the master courses on general theory and methods above, various courses on more specialised topics are available; not all of them are given every year. Biostatistics (MATH-449) presents some of the core methods and applications of statistics in the life sciences and medicine. Applied Biostatistics (MATH-493) focuses on the use of the software package R for the analysis of biomedical data. Statistics for Genomic Data Analysis (MATH-474) explores the key challenges and statistical techniques used in the analysis of massive genomic data. Statistical Genetics (MATH-438) covers key probability models and statistical methods used for the analysis of genetic data. Statistical Analysis of Network Data (MATH-448) describes methods and models for the analysis of data that arise in connection with networks, which have become very prominent in recent years. Risk, Rare Events, and Extremes (MATH-447) deals with the formulation and estimation of risk associated with improbable events.

Stochastic Simulation (MATH-414) is an introduction to Monte Carlo methods, which are widely used in statistical applications, especially for Bayesian inference. Probability Theory (MATH-432) takes a second look at probability using the tools of measure theory and is strongly recommended for students wishing to pursue graduate study in statistics. Combinatorial Statistics (MATH-455) concerns learning from network data, and is a natural complement to MATH-448.

**Some other mathematics courses related to statistics**

All courses in the Probability track are particularly recommended.

Numerical Analysis (MATH-250), Advanced Numerical Analysis (MATH-351) and Numerical Integration of Stochastic Differential Equations (MATH-452) contain useful background for nonparametric statistics, statistical optimisation and functional data analysis respectively.

Computational Linear Algebra (MATH-453) considers numerical methods to solve large-scale linear algebra problems, which can be particularly pertinent in multivariate and high-dimensional statistics when massive amounts of data must be stored and manipulated for the purposes of inference.

Discrete Optimization (MATH-261) has important links with statistical inference problems related to discrete structures. Nonlinear Optimization (MATH-329) and Convexity (MATH-461) discuss aspects of high-dimensional geometry that are central to many methods of modern high-dimensional statistics.

**Other courses and related minors**

Mathematics students can earn a few credits outside mathematics, some of which may be related to statistics. Examples are Convex Optimization (MGT-418), Mathematics of Data (EE-556) and Optimization for Machine Learning (CS-439). Machine Learning (CS-433) is also a good choice, but it overlaps in several respects with Statistical Machine Learning (MATH-412); interested students should therefore take MATH-412 and, if needed, CS-433 outside their curriculum.

There is also a minor in Data Science.