Data Analysis in Astronomy and Physics (B-KUL-G0Z22A)

Aims
After successful completion of this course, the student will have learned:
- to recognize different types of astronomy and physics data analysis problems
- how to translate these types of problems into a statistical model, and understand the limitations of the model
- how to implement the statistical model in Python using existing libraries and using real-world astronomical and physics datasets
- how to critically assess the numerical results, and quantify the uncertainties of the estimates and the predictions
- how to select the optimal model
- how to visualize the dataset, the model parameters and their uncertainties, and predictions
The course aims to convince the students that statistical data analysis is an indispensable tool to make discoveries in observational astronomy and experimental physics, by showing them inspiring success stories where statistical analysis tools have been used to solve concrete physical and astronomical challenges.
Previous knowledge
The students should have had an introductory course in astronomy (e.g. similar to “Inleiding tot de sterrenkunde” B-KUL-G0U45A), an introductory course in physics (e.g. similar to “Algemene natuurkunde I” B-KUL-G0N29B), and an introductory course in probability and statistics (e.g. similar to “Kansrekenen” B-KUL-G0W66A and “Statistiek” B-KUL-G0U47A).
Activities
4 ECTS. Data Analysis in Astronomy and Physics (B-KUL-G0Z22a)




Content
Each lecture starts by highlighting a specific type of astronomical and/or physical data analysis problem, illustrated with many examples, and then explains the quantitative statistical tools used to solve it. These tools include:
1. Regression revisited
- Robust regression: dealing with outliers
- Total least-squares: regression with errors in both variables
- Regularized least-squares: constraining the solution using lasso, ridge, and elastic nets
- Generalized linear models
- Overfitting, underfitting, BIC, AIC, and cross-validation
Although students are familiar with ordinary least-squares, this chapter teaches several other regression methods that often appear in the astronomical and physical literature. Regularized least-squares, for example, has important applications in helioseismology and spectral disentangling, among others.
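As a minimal illustration of regularized least-squares, the sketch below compares an ordinary and a ridge solution on synthetic data. The dataset, the polynomial basis, the penalty value `lam`, and the helper name `ridge_fit` are assumptions made for this example and are not taken from the course notes.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve min ||y - X b||^2 + lam * ||b||^2 via the normal equations."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (assumption): a noisy sine sampled at 20 points.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
X = np.vander(x, 8, increasing=True)   # polynomial basis up to degree 7
y = np.sin(np.pi * x) + rng.normal(0.0, 0.1, size=x.size)

b_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least-squares
b_ridge = ridge_fit(X, y, lam=1.0)   # the penalty shrinks the coefficients

print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```

The ridge coefficients have a smaller norm than the ordinary least-squares ones, which is the constraint on the solution that the penalty imposes. In practice `lam` would be chosen with e.g. cross-validation.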
2. Resampling methods
- Bootstrapping and jackknife
- Bootstrapping for regression models
- Bootstrap-based model selection
Resampling is used regularly in the astronomical and physical literature, most often for parameter and uncertainty estimation. This chapter teaches when bootstrapping is useful, how to apply it, and what its limitations are.
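A minimal sketch of bootstrap uncertainty estimation, using only the Python standard library. The synthetic sample, the number of replicates, and the helper name `bootstrap_se` are assumptions for illustration.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=2000, seed=42):
    """Estimate the standard error of `estimator` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=n)   # draw n points with replacement
        replicates.append(estimator(sample))
    return statistics.stdev(replicates)

# Synthetic sample (assumption): 100 Gaussian measurements.
rng = random.Random(1)
data = [rng.gauss(10.0, 2.0) for _ in range(100)]

se_boot = bootstrap_se(data, statistics.mean)
se_analytic = statistics.stdev(data) / len(data) ** 0.5
print(f"bootstrap SE = {se_boot:.3f}, analytic SE = {se_analytic:.3f}")
```

For the sample mean the bootstrap estimate can be checked against the analytic standard error. The same function works for estimators without a simple formula, such as `statistics.median`.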
3. Bayesian inference
- Posterior, likelihood, and prior distributions
- Hierarchical Bayesian models
- Model selection and model averaging, Bayesian evidence
- Numerical methods for Bayesian estimation
This chapter teaches how to apply Bayesian techniques to astronomical and physical data analysis, focusing on applications from the literature. Examples include the Period-Luminosity-Color relation of contact binaries, the Initial Mass Function, and the fraction of red spirals as a function of bulge size. The examples are chosen to teach students how to set up a Bayesian hierarchical model, to introduce them to different types of distributions (not only Normal, but also lognormal, Bernoulli, Beta, etc.), and to show how hierarchical models can be solved using dedicated statistical software libraries. The section on numerical methods does not give an overly detailed treatment; it aims for a basic understanding of the most popular numerical methods in the astronomical and physical literature, together with their limitations, so that the quality (e.g. convergence) of the numerical results can be assessed.
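The relation between prior, likelihood, and posterior can be sketched with the simplest non-hierarchical case: a Beta prior on a fraction combined with Binomial counts. The counts below are hypothetical and the helper name `beta_binomial_posterior` is an assumption; a hierarchical model would additionally place a prior on the Beta parameters themselves.

```python
import random

def beta_binomial_posterior(k, n, a=1.0, b=1.0):
    """Posterior Beta(a + k, b + n - k) for a Binomial likelihood and a Beta(a, b) prior."""
    return a + k, b + (n - k)

# Hypothetical counts (assumption): 12 red spirals among 80 observed spirals.
a_post, b_post = beta_binomial_posterior(k=12, n=80)
post_mean = a_post / (a_post + b_post)

# Monte Carlo draws from the posterior give an equal-tailed 95% credible interval.
rng = random.Random(0)
draws = sorted(rng.betavariate(a_post, b_post) for _ in range(20000))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"posterior mean = {post_mean:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Because the Beta prior is conjugate to the Binomial likelihood, the posterior is available in closed form here. Models without this property require the numerical methods listed above.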
4. Count models
Problems that involve counting or the analysis of population sizes occur so often in astronomy that they deserve their own chapter. One example is the relation between the number of globular clusters and the absolute visual magnitude of nearby galaxies. Statistically, such problems involve Poisson models, Negative Binomial models, zero-truncated models, or generalizations of these models that account for, e.g., overdispersion. Students will learn which model to choose and how to estimate its parameters and uncertainties.
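One simple diagnostic for choosing between a Poisson and a Negative Binomial model is the variance-to-mean ratio of the counts. The sketch below uses simulated counts; the rates, sample sizes, and helper names `poisson_draw` and `dispersion_index` are assumptions for illustration.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 when overdispersed."""
    return statistics.variance(counts) / statistics.mean(counts)

rng = random.Random(3)
# Pure Poisson counts with rate 5 (assumed value).
poisson_counts = [poisson_draw(rng, 5.0) for _ in range(2000)]
# Gamma-mixed Poisson counts, i.e. Negative Binomial, with the same mean of 5.
nb_counts = [poisson_draw(rng, rng.gammavariate(2.0, 2.5)) for _ in range(2000)]

print(dispersion_index(poisson_counts), dispersion_index(nb_counts))
```

A ratio well above 1 suggests that a plain Poisson model underestimates the scatter and that a model with an extra dispersion parameter is worth considering.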
5. Spatial analysis of points
A typical task in the physical sciences is to analyze how a group of point sources is spatially distributed, e.g. in 3D or on the sky. One famous example is the clustering of galaxies. Statistically, this topic includes spatial autocorrelation, quantitative clustering measures such as the 2-point correlation function, and model-based spatial analysis with e.g. the Von Mises-Fisher distribution or more complicated mixture models.
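As a sketch of a quantitative clustering measure, the code below evaluates the natural (Peebles-Hauser) estimator DD/RR - 1 of the 2-point correlation function in a single cumulative bin of pairs closer than `r_max`. The 2D unit-square geometry, the toy clustered catalogue, and all function names are assumptions for illustration; real analyses use several separation bins and usually more refined estimators.

```python
import math
import random

def pair_counts(points, r_max):
    """Count unordered pairs separated by less than r_max."""
    count = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < r_max:
                count += 1
    return count

def xi_natural(data, randoms, r_max):
    """Natural estimator DD/RR - 1, with pair counts normalized by the number of pairs."""
    dd = pair_counts(data, r_max) / (len(data) * (len(data) - 1) / 2)
    rr = pair_counts(randoms, r_max) / (len(randoms) * (len(randoms) - 1) / 2)
    return dd / rr - 1.0

rng = random.Random(7)

def uniform(n):
    """Uniform random points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

# Toy clustered catalogue (assumption): 30 parents, each with 10 offspring, sigma = 0.02.
parents = uniform(30)
clustered = [(px + rng.gauss(0, 0.02), py + rng.gauss(0, 0.02))
             for px, py in parents for _ in range(10)]

xi_clustered = xi_natural(clustered, uniform(300), r_max=0.05)
xi_uniform = xi_natural(uniform(300), uniform(300), r_max=0.05)
print(xi_clustered, xi_uniform)
```

The clustered catalogue gives a clearly positive value, while a uniform catalogue gives a value close to zero, up to sampling noise.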
6. Gaussian processes
Gaussian processes (GPs) are becoming increasingly popular in astronomy. They have been used to model star formation histories, the Galactic halo, and the 3D dust distribution in our Galaxy, and to study the rotational modulation of the Sun, among other applications. A Gaussian process is a generalization of the Gaussian distribution. Loosely speaking, where the latter is a distribution over scalars, the former is a distribution over functions. This makes them great tools not only for regression, but also for classification and clustering.
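The idea of a distribution over functions can be sketched with GP regression on synthetic data. The squared-exponential kernel, its fixed hyperparameters, the noise level, and the function names `rbf_kernel` and `gp_predict` are assumptions for this example; dedicated libraries would also fit the hyperparameters.

```python
import numpy as np

def rbf_kernel(xa, xb, amp=1.0, scale=1.0):
    """Squared-exponential covariance between two sets of 1D inputs."""
    d = xa[:, None] - xb[None, :]
    return amp**2 * np.exp(-0.5 * (d / scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise**2 * np.eye(x_train.size)
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic data (assumption): 15 noisy samples of a sine curve on [0, 5].
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 5.0, 15))
y_train = np.sin(x_train) + rng.normal(0.0, 0.1, x_train.size)
x_test = np.array([2.5, 10.0])   # one point inside the data range, one far outside

mean, std = gp_predict(x_train, y_train, x_test)
print(mean, std)
```

Inside the data range the posterior uncertainty is small. Far from the data the prediction reverts to the prior, with mean near zero and standard deviation near the kernel amplitude.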
The course is mathematical rather than descriptive but refrains from being overly rigorous; the focus is on application.
Course material
Course notes will be given on Toledo.
Format: more information
The course consists of 15 lectures of 2 hours each, in which the theoretical background of the different data analysis tools is explained using real-world astronomical and physical problems.
2 ECTS. Data Analysis in Astronomy and Physics: Exercises and Applications (B-KUL-G0Z23a)




Content
For each of the chapters covered in the lectures, the students receive concrete astronomical and/or physical data analysis problems based on real-world datasets in the Exercises & Applications sessions. The students will then practice:
- Translating the (astro)physical problem at hand to one or more useful statistical models
- Implementing the models in existing software tools
- Assessing the reliability and physical meaning of the numerical estimates
- Selecting the optimal model
- Visualizing the dataset, the model parameters and their uncertainties, and predictions
The exercise sessions will make heavy use of the educational capabilities provided by Jupyter notebooks. Existing software tools will be used as much as possible, to keep the amount of programming to a minimum.
Course material
Jupyter notebooks and concrete datasets will be made available to the students, either through Toledo, or through GitHub.
Format: more information
The exercises and applications course consists of 10 sessions of 2 hours each, in which students solve concrete problems in small groups using the theoretical background they received during the lectures.