Omdena Academy Courses

Understanding Vision Transformers

December 29, 2023

For whom is this course?

Transformers are now considered state-of-the-art in sequence modeling tasks. Recent work has shown that transformers, which use self-attention to relate every part of the input to every other part (global attention), also achieve remarkable performance on multiple computer vision tasks. In this course, we will explore the baseline vision transformer models and evaluate their performance on remote sensing image classification.

What will you learn?

  • A good understanding of vision transformers
  • How to deploy transformer models for remote sensing image classification
  • Good knowledge of model implementation in PyTorch


Prerequisites

  • Python basics
  • PyTorch
  • Deep Learning basics
  • Linear Algebra basics


Session 1: Understanding Vision Transformers (3 hours)

  • What is a vision transformer?
  • The overall structure of a vision transformer
  • Components of a vision transformer

Session 2: The Attention mechanism (2 hours)

  • What is self-attention?
  • Role of global attention in transformers
  • How attention is computed
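
As a preview of the last topic above, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration, not the course's own implementation: the function names and the toy tokens are hypothetical, and the tokens are used directly as queries, keys, and values, whereas a real transformer first passes them through learned linear projections.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Returns the attended outputs and the attention weights.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: three 2-dimensional "patch tokens" (assumed values).
# Simplification: no learned projections, so Q = K = V = tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and spans all tokens, which is what "global" attention refers to: every token can draw on every other token in a single step.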

Session 3: Using vision transformers for remote sensing image classification (8 hours)

  • Study the impact of data augmentation strategies
  • Understand the relationship between network depth and transformer performance
  • Impact of changing the image size on model accuracy

Session 4: Vision Transformers vs CNNs (2 hours)

  • What is inductive bias?
  • Understand the impact of field of view
  • The difference in data and memory requirements


Course Info

Duration: 15 hours
Start Date: July 25, 2022
Last Registration Date: July 20, 2022
No. of Students: 40
Skill Level: Intermediate
