Course Summary
This graduate course is aimed primarily at Ph.D. students who have basic familiarity with computer vision, image processing, and machine learning and want to bring their knowledge and toolset up to the state of the art, with direct utility in their own research.
The course focuses on the challenge of computer vision by learning. We address the theoretical foundations of computer vision in conjunction with machine learning and present algorithms that achieve state-of-the-art performance while maintaining efficient execution with minimal supervision. This year we explain and emphasize vision foundation models, including challenges such as 3D object detection, fine-grained recognition, geometric deep learning, self-supervised representation learning, and video understanding. We give an overview of the latest developments and future trends in the field on the basis of several recent challenges, and we indicate how to obtain improvements in the near future.
Course Location
The course will take place at the Amsterdam Science Park conference centre at Science Park 125, close to CWI.
Course Registration
Course registration is handled by the ASCI research school, via this form. Note that the number of seats for this course is limited.
Lab requirements: bring your own device
For the lab, you are expected to bring your own device: either a laptop with a good GPU or a laptop that can connect to a workstation with a good GPU. If you cannot connect to a GPU, create a Google Colab account and make sure you can run a GPU-powered notebook (to turn the GPU on, go to Edit->Notebook settings->Hardware accelerator->GPU). The lab assignments are detailed on a separate page.
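To confirm before the first lab that a GPU is actually visible from your laptop, workstation, or Colab notebook, a small check along the following lines may help. This is only a sketch: it assumes an NVIDIA GPU whose driver provides the `nvidia-smi` tool, and the helper name `gpu_available` is made up for illustration. The lab assignments may rely on a specific framework with its own check (for example `torch.cuda.is_available()` if PyTorch is used), which should take precedence.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi reports at least one GPU (NVIDIA-only assumption)."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, assume no usable GPU.
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "-L" prints one line per detected GPU, e.g. "GPU 0: ...".
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU detected" if gpu_available() else "No GPU detected")
```

In a Colab notebook, paste the code into a cell and run it after switching the hardware accelerator to GPU. If it reports no GPU, the runtime setting has probably not taken effect yet.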
Course Schedule
Monday January 13, 2025: Foundations
Time | Room | Topic | Lecturer |
---|---|---|---|
0900-0930 | Foyer | Welcome with coffee and tea | |
0930-1030 | Turingzaal | What foundation models cannot perceive | Cees Snoek |
1030-1100 | | Break | |
1100-1200 | Turingzaal | Grounding Foundation Models in Reality: Physics- & Causality-informed World Models | Efstratios Gavves |
1200-1330 | Newtonzaal | Lunch break (included) | |
1330-1600 | Eulerzaal | Lab session | |
Tuesday January 14, 2025: Machine learning for computer vision
Time | Room | Topic | Lecturer |
---|---|---|---|
0900-0930 | Foyer | Welcome with coffee and tea | |
0930-1030 | Turingzaal | Hyperbolic deep learning | Pascal Mettes |
1030-1100 | | Break | |
1100-1200 | Turingzaal | Learning of time and dynamics | Efstratios Gavves |
1200-1330 | Newtonzaal | Lunch break (included) | |
1330-1600 | Eulerzaal | Lab session | |
Wednesday January 15, 2025: 3D vision by learning
Time | Room | Topic | Lecturer |
---|---|---|---|
0900-0930 | Foyer | Welcome with coffee and tea | |
0930-1030 | Turingzaal | 3D representation learning | Martin Oswald |
1030-1100 | | Break | |
1100-1200 | Turingzaal | 3D human-centric perception and synthesis | Dimitris Tzionas |
1200-1330 | Newtonzaal | Lunch break (included) | |
1330-1600 | Eulerzaal | Lab session | |
Thursday January 16, 2025: Computer video by learning
Time | Room | Topic | Lecturer |
---|---|---|---|
0900-0930 | Foyer | Welcome with coffee and tea | |
0930-1010 | Turingzaal | Learning to Generalize in Video Space and Time | Cees Snoek |
1010-1020 | | Short break | |
1020-1100 | Turingzaal | Data and Evaluation in Video Understanding | Hazel Doughty |
1100-1130 | | Break | |
1130-1215 | Turingzaal | Object-centric representations for real-world videos | Andrii Zadaianchuk |
1215-1330 | Newtonzaal | Lunch break (included) | |
1330-1600 | Eulerzaal | Lab session | |
Friday January 17, 2025: Invited tutorial by Yuki Asano
Time | Room | Topic | Lecturer |
---|---|---|---|
0900-0930 | Foyer | Welcome with coffee and tea | |
0930-1030 | Turingzaal | Better Foundation Models: Self-supervised Learning for Generalisation | Yuki Asano |
1030-1100 | | Break | |
1100-1200 | Turingzaal | Better Foundation Models: Self-supervised Learning for Generalisation | Yuki Asano |
1200-1330 | Newtonzaal | Lunch break (included) | |


Invited tutorial
- Yuki Asano is head of the Fundamental AI (FunAI) Lab and a full Professor at the University of Technology Nuremberg. Prior to this, he led the QUVA lab at the University of Amsterdam, where he closely collaborated with Qualcomm AI Research. He completed his PhD at the Visual Geometry Group (VGG) at the University of Oxford.
Lecturers
- Cees Snoek is a full professor in computer science at the University of Amsterdam, where he heads the Video & Image Sense Lab. He is also a director of two public-private AI research labs: QUVA Lab with Qualcomm and Atlas Lab with TomTom. He also leads the HAVA-Lab, an interdisciplinary PhD programme of the UvA Data Science Centre that aligns video-AI technologies with human values and ethical principles. He was a visiting scientist at Carnegie Mellon University, Pittsburgh and the University of California, Berkeley. His research interest is video and image understanding by computer vision and machine learning.
- Efstratios Gavves is an Associate Professor at the University of Amsterdam in the Netherlands. He received his Ph.D. in 2014 at the University of Amsterdam. He was a post-doctoral researcher at KU Leuven from 2014 to 2015. He has authored several papers in major computer vision and machine learning conferences and journals. He is a recipient of the ERC Career Starting Grant 2020 and the NWO VIDI grant 2020 to research the Computational Learning of Temporality for spatiotemporal sequences.
Guest Lecturers
- Hazel Doughty is an Assistant Professor at Leiden University. Previously she was a postdoctoral researcher at the University of Amsterdam and a PhD student at the University of Bristol. Her area of interest is Video Understanding, focusing on fine-grained and detailed video understanding with weak, noisy or other forms of incomplete supervision.
- Pascal Mettes is an Assistant Professor at the University of Amsterdam. He received his PhD in 2017 at the University of Amsterdam and was a visiting scientist at Columbia University in 2016. His research interests are in hyperbolic learning for computer vision.
- Martin Oswald is an assistant professor at the Atlas lab of the University of Amsterdam. He was previously at the Computer Vision and Geometry Group at ETH Zurich. He obtained his PhD at Technische Universität München.
- Dimitris Tzionas is an Assistant Professor for 3D Computer Vision at the University of Amsterdam. Earlier, he was a Research Scientist and postdoc at the Perceiving Systems department at MPI for Intelligent Systems. He received a PhD from the University of Bonn.