Statistics course for PhD students
Introduction
Welcome to the Statistics for Physical Scientists short course! It's
designed to give researchers, particularly in the physical sciences,
some practical background and guidance in applying common statistical
tools. The course covers:
- basic summary statistics, probability distributions and data combinations,
- overview of the Frequentist and Bayesian frameworks,
- correlation testing and significance, and sample comparisons,
- hypothesis tests and p-values,
- model-fitting and hypothesis testing using the chi-squared statistic,
- regression analysis (including least-squares and Gaussian processes),
- principal component analysis,
- practical error estimates (jackknife, bootstrap and Monte Carlo simulations),
- error propagation and the Fisher matrix,
- Bayesian likelihood methods (including MCMC) and model selection.
The full introduction and content summary can be found here.
The course is structured in six classes, described below. Each class is
split into a content presentation, worked examples and practical
activities using the datasets provided. Each class comes with an
accompanying Python Jupyter notebook, which provides summary notes and
code for all the worked examples, and a recorded video describing the
slides.
Useful books
The following is an (incomplete!) list of books which contain a great
deal of practical wisdom in using statistics:
- Practical Statistics for Astronomers (Wall & Jenkins)
- Statistics for Nuclear and Particle Physicists (Lyons)
- Practical Bayesian Inference: A Primer for Physical Scientists (Bailer-Jones)
- Modern Statistical Methods for Astronomy (Feigelson & Babu)
- Principles of Data Analysis (Saha)
- Bayesian Logical Data Analysis for the Physical Sciences (Gregory)
- Data Analysis: A Bayesian Tutorial (Sivia)
- Numerical Recipes: The Art of Scientific Computing (Press, Teukolsky, Vetterling, Flannery)
Class material
Datasets
Here are the datasets that are used in the worked examples and activities:
Class 1: Probability and statistics
Here are the Class 1 content slides as PDF and PowerPoint.
Here is the accompanying Python Jupyter notebook for Class 1.
Here is a video describing the Class 1 slides.
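One of the topics listed above is combining data. As a small illustration (a minimal sketch assuming NumPy, not code taken from the Class 1 notebook), independent measurements x_i with Gaussian errors sigma_i can be combined with an inverse-variance weighted mean, whose error is (sum of 1/sigma_i^2)^(-1/2). The function name `weighted_mean` and the measurement values are hypothetical.

```python
import numpy as np


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its standard error.

    Assumes independent measurements with Gaussian errors.
    """
    values = np.asarray(values, dtype=float)
    # Each measurement is weighted by 1 / sigma_i^2
    weights = 1.0 / np.asarray(errors, dtype=float) ** 2
    mean = np.sum(weights * values) / np.sum(weights)
    # Standard error of the weighted mean: (sum of weights)^(-1/2)
    error = 1.0 / np.sqrt(np.sum(weights))
    return mean, error


# Hypothetical measurements of the same quantity, for illustration only
x = [10.2, 9.8, 10.5]
sigma = [0.3, 0.2, 0.5]
mean, err = weighted_mean(x, sigma)
print(f"weighted mean = {mean:.2f} +/- {err:.2f}")
```

With equal errors this reduces to the ordinary mean with error sigma/sqrt(N), which is a quick sanity check.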
Class 2: Correlation Testing
Here are the Class 2 content slides as PDF and PowerPoint.
Here is the accompanying Python Jupyter notebook for Class 2.
Here is a video describing the Class 2 slides.
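As a quick taste of correlation testing (a sketch assuming NumPy and SciPy, not taken from the Class 2 notebook), `scipy.stats.pearsonr` measures linear correlation, while `scipy.stats.spearmanr` works on ranks and so detects any monotonic relationship. Both return the coefficient and a p-value for the null hypothesis of no correlation. The data below are simulated purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated data: y depends linearly on x, plus Gaussian noise
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# Pearson's r measures linear correlation; Spearman's rho uses ranks,
# so it tests for any monotonic relationship
r, p_r = stats.pearsonr(x, y)
rho, p_rho = stats.spearmanr(x, y)
print(f"Pearson r = {r:.2f}, p = {p_r:.3g}")
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.3g}")

# A monotonic but non-linear relation: the ranks of x and x^3 are
# identical, so Spearman's rho is 1
rho_cubic, _ = stats.spearmanr(x, x**3)
print(f"Spearman rho for y = x^3: {rho_cubic:.2f}")
```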
Class 3: Model Fitting
Here are the Class 3 content slides as PDF and PowerPoint.
Here is the accompanying Python Jupyter notebook for Class 3.
Here is a video describing the Class 3 slides.
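Model fitting with the chi-squared statistic can be sketched as follows (an illustration assuming NumPy and SciPy, not the Class 3 notebook's code). A straight line y = a + b x is fitted by minimising chi^2 = sum(((y_i - a - b x_i)/sigma_i)^2), and the goodness of fit is judged from the chi-squared distribution with N - 2 degrees of freedom. The helper `fit_line_chi2` and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_line_chi2(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x.

    Returns the best-fit parameters (a, b), the minimum chi-squared and
    the p-value of that chi-squared for N - 2 degrees of freedom.
    """
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    # Dividing each row by sigma_i means ordinary least squares minimises
    # chi^2 = sum(((y_i - a - b*x_i) / sigma_i)^2)
    A = np.column_stack([np.ones_like(x), x]) / sigma[:, None]
    params, *_ = np.linalg.lstsq(A, y / sigma, rcond=None)
    model = params[0] + params[1] * x
    chi2 = np.sum(((y - model) / sigma) ** 2)
    dof = len(x) - 2  # two fitted parameters
    # Probability of a chi-squared at least this large if the model is correct
    p_value = stats.chi2.sf(chi2, dof)
    return params, chi2, p_value


# Simulated data scattered about y = 1 + 2x with known errors
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 20)
sigma = np.full_like(x, 0.5)
y = 1.0 + 2.0 * x + rng.normal(0, sigma)

(a, b), chi2, p = fit_line_chi2(x, y, sigma)
print(f"a = {a:.2f}, b = {b:.2f}, chi2 = {chi2:.1f} for {len(x) - 2} dof, p = {p:.2f}")
```

A very small p-value suggests the model (or the quoted errors) is inadequate; a p-value very close to 1 can indicate overestimated errors.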
Class 4: Regression
Here are the Class 4 content slides as PDF and PowerPoint.
Here is the accompanying Python Jupyter notebook for Class 4.
Here is a video describing the Class 4 slides.
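The course list above includes Gaussian processes under regression. A minimal sketch of Gaussian process regression (assuming NumPy, a zero-mean prior and a squared-exponential kernel with fixed, hand-picked hyperparameters; not code from the Class 4 notebook) computes the posterior mean K_*^T K^-1 y and variance K_** - K_*^T K^-1 K_* via a Cholesky factorisation. The function names and the simulated sine-curve data are hypothetical.

```python
import numpy as np


def sq_exp_kernel(a, b, amplitude=1.0, length=1.0):
    """Squared-exponential covariance between two 1D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / length) ** 2)


def gp_predict(x_train, y_train, x_test, noise=0.1, amplitude=1.0, length=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = sq_exp_kernel(x_train, x_train, amplitude, length)
    K += noise**2 * np.eye(len(x_train))  # white measurement noise
    K_s = sq_exp_kernel(x_train, x_test, amplitude, length)
    K_ss = sq_exp_kernel(x_test, x_test, amplitude, length)

    # Cholesky factorisation K = L L^T avoids forming K^-1 explicitly
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Simulated noisy samples of a sine curve
rng = np.random.default_rng(0)
x_train = np.linspace(0, 10, 15)
y_train = np.sin(x_train) + rng.normal(0, 0.1, size=x_train.size)
x_test = np.linspace(0, 10, 200)

mean, std = gp_predict(x_train, y_train, x_test, noise=0.1, length=1.0)
```

Far from the data the prediction relaxes back to the prior (mean 0, standard deviation equal to `amplitude`); in practice the hyperparameters would be fitted rather than fixed.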
Class 5: Error Estimates
Here are the Class 5 content slides as PDF and PowerPoint.
Here is the accompanying Python Jupyter notebook for Class 5.
Here is a video describing the Class 5 slides.
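The bootstrap and jackknife error estimates named in the course list can be sketched in a few lines (assuming NumPy; not the Class 5 notebook's code, and the helper names are hypothetical). The bootstrap resamples the data with replacement and takes the spread of the recomputed statistic; the jackknife recomputes the statistic with each point left out in turn and scales the spread by (n - 1)/n. For the mean, both should agree with the analytic error s/sqrt(n).

```python
import numpy as np


def bootstrap_error(data, statistic, n_boot=2000, seed=None):
    """Bootstrap standard error: resample the data with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]
        estimates[i] = statistic(resample)
    return estimates.std(ddof=1)


def jackknife_error(data, statistic):
    """Jackknife standard error: leave out one point at a time."""
    data = np.asarray(data)
    n = len(data)
    leave_one_out = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2))


# Simulated sample for illustration
rng = np.random.default_rng(3)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# For the mean, both should agree with the analytic s / sqrt(n)
print(f"analytic:  {sample.std(ddof=1) / np.sqrt(sample.size):.4f}")
print(f"bootstrap: {bootstrap_error(sample, np.mean, seed=4):.4f}")
print(f"jackknife: {jackknife_error(sample, np.mean):.4f}")
```

The bootstrap also works for statistics without a simple analytic error, such as the median, whereas the plain jackknife is unreliable for non-smooth statistics like the median.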
Class 6: Bayesian Methods
Here are the Class 6 content slides as PDF and PowerPoint.
Here is the accompanying Python Jupyter notebook for Class 6.
Here is a video describing the Class 6 slides.
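MCMC, listed among the Bayesian methods above, can be illustrated with a random-walk Metropolis sampler (a minimal sketch assuming NumPy, not the Class 6 notebook's code; the function names, step size and simulated data are hypothetical). Here it samples the posterior for the mean of Gaussian data with known sigma and a flat prior, for which the exact posterior is Gaussian with mean equal to the sample mean and width sigma/sqrt(N), so the chain can be checked.

```python
import numpy as np


def log_posterior(mu, data, sigma):
    """Log posterior (up to a constant) for a Gaussian mean mu with
    known sigma and a flat prior on mu."""
    return -0.5 * np.sum(((data - mu) / sigma) ** 2)


def metropolis(log_post, start, n_steps, step_size, seed=None, **kwargs):
    """Random-walk Metropolis sampler for a single parameter."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    current = start
    current_lp = log_post(current, **kwargs)
    n_accept = 0
    for i in range(n_steps):
        # Symmetric Gaussian proposal centred on the current position
        proposal = current + step_size * rng.normal()
        proposal_lp = log_post(proposal, **kwargs)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            n_accept += 1
        chain[i] = current
    return chain, n_accept / n_steps


# Simulated data: 100 draws with true mean 5 and known sigma = 2
rng = np.random.default_rng(7)
data = rng.normal(5.0, 2.0, size=100)

chain, acceptance = metropolis(log_posterior, start=0.0, n_steps=20000,
                               step_size=0.5, seed=8, data=data, sigma=2.0)
samples = chain[2000:]  # discard burn-in
print(f"posterior mean = {samples.mean():.2f} +/- {samples.std():.2f}")
print(f"acceptance fraction = {acceptance:.2f}")
```

The recovered width should be close to 2/sqrt(100) = 0.2; in real problems the burn-in length and step size need checking rather than fixing by hand.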