MediaPipe Python Tutorial [How to Install + Real-Time Hand Tracking Example]

December 30, 2021

In this tutorial, we will show you step by step how to install MediaPipe in Python, with an example real-time hand-tracking project.

MediaPipe Python is a powerful tool for developers who want to bring computer vision and machine learning into their projects. It provides a high-level API for building real-time ML solutions for mobile, edge, cloud, and web.

In this tutorial, we’ll cover the following topics:

  • Understanding the MediaPipe API
  • Installing MediaPipe Python
  • Building a real-time hand-tracking application

Let’s start! 🙂

What is MediaPipe Python?

MediaPipe is Google’s open-source framework for media processing. It is cross-platform: it runs on Android, iOS, the web, and servers such as YouTube’s. That is what cross-platform means, running everywhere.

What do you think is common in all of the pictures below? Think for a while and guess!

If you guessed MediaPipe, you are absolutely right: MediaPipe is common to all of these images.

What are the uses of MediaPipe?

Every YouTube video we watch is processed by machine learning models that use MediaPipe. Google has not hired thousands of employees to watch every uploaded video, because even thousands of people could not check each published video: the amount of data Google receives daily is far too large for humans to review. Machine learning models are developed to make our lives easier. Tasks that are hard for us, machine learning and deep learning models can complete in much less time, and they also save the cost of hiring more employees.

Yes, Google has machine learning and deep learning models that check whether videos comply with its policies and whether the content has copyright issues.

Basically, MediaPipe is a framework for computer vision and deep learning that builds perception pipelines. For now, you just need to know that a perception pipeline takes sensory data, such as audio, video, or time-series streams, and passes it through a chain of processing steps that turn it into useful outputs, such as detected objects or landmarks.

Why does Google use MediaPipe?

Google has been using MediaPipe for a long time, mainly for two tasks:

1. Dataset preparation for machine learning training

Pose Estimation

Pose estimation means finding the key points of a person or an object. A person’s key points include the elbows, knees, and wrists, so MediaPipe can be used to train an ML model to learn these key points and then use that knowledge for specific tasks, such as action recognition.

2. ML inference pipelines

Live Data

ML inference is the process of running live data points through a trained model to get predictions.

Example: most of us have used Snapchat and Instagram filters, perhaps while recording videos. Applying such a filter to the live camera feed is ML inference.

What is possible with MediaPipe?

There are a number of AI problems that MediaPipe can solve, including:

  • Object Tracking
  • Box Tracking
  • Face Mesh
  • Hair Segmentation
  • Live Hand Tracking and many more.

MediaPipe Hands: Real-Time Hand Tracking Project

Here I have developed a live hand-tracking project using MediaPipe.

Hand tracking uses two modules on the backend:

1. Palm detection

Works on the complete image, detects the palm, and crops the image down to the hand so that the next module works only on that region.

2. Hand Landmarks

From the cropped image, the landmark module finds 21 different landmarks on the hand.

How to Install MediaPipe in Python?

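For this task we need three modules: cv2 (OpenCV), MediaPipe, and time. The time module is part of Python’s standard library, so only OpenCV and MediaPipe need to be installed.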

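In the Jupyter Notebook, I installed pyforest, a helper that lazy-imports popular data science libraries the first time they are used. pyforest does not install those libraries for you, so OpenCV and MediaPipe still have to be installed with pip.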

Once the modules are installed, running the same command again reports “Requirement already satisfied”. See the image below.



If MediaPipe is still not installed or does not work, install it separately with pip. I first planned to work directly in a Kaggle notebook, but MediaPipe did not work there, so I installed it locally and worked in a Jupyter Notebook instead. A plus point of a local Jupyter Notebook is that it does not need an internet connection once the packages are installed.

This is how to install MediaPipe in a Jupyter Notebook.
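Before importing anything, it can help to confirm which packages are actually missing. This is a minimal sketch assuming the standard pip package names `opencv-python` (imported as `cv2`) and `mediapipe`; the helper name `missing_packages` is hypothetical and not part of either library.

```python
import importlib.util

# Import name -> pip package name (assumed standard package names)
REQUIRED_PACKAGES = {
    "cv2": "opencv-python",
    "mediapipe": "mediapipe",
}


def missing_packages(required=REQUIRED_PACKAGES):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None
    ]


print("Missing:", missing_packages() or "nothing")
```

If the list is not empty, running `%pip install mediapipe opencv-python` in a notebook cell installs them into the kernel’s environment; `time` ships with Python and never needs installing.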

How to Import Modules in Jupyter?

How to Create a Camera Object in Python?

In the code below, I created a camera object just to check whether the camera is working properly.
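Since the original code survives only as an image, here is a minimal sketch of such a camera check with OpenCV’s `cv2.VideoCapture`. The function name `check_camera`, camera index 0, and pressing `q` to quit are assumptions, not the author’s exact code.

```python
def check_camera(camera_index: int = 0) -> None:
    """Show the webcam feed in a window until 'q' is pressed."""
    import cv2  # OpenCV (pip install opencv-python)

    cap = cv2.VideoCapture(camera_index)  # 0 is usually the built-in webcam
    if not cap.isOpened():
        raise RuntimeError(f"Could not open camera {camera_index}")
    try:
        while True:
            success, img = cap.read()  # img is a BGR NumPy array
            if not success:
                break
            cv2.imshow("Image", img)
            # waitKey(1) also lets the window refresh; stop on 'q'
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling `check_camera()` in a notebook cell should open a window with the live feed.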

Here is the output.

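How to Create an Object from the Hands Class?

Next, I created a hands object from the Hands class. OpenCV captures frames in BGR order, while the hands object only accepts RGB images, so each frame is converted from BGR to RGB before it is passed to the hands object.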
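A sketch of this step, assuming the hands object comes from MediaPipe’s legacy Solutions API (`mp.solutions.hands.Hands()`), whose `process()` method takes an RGB image. For a 3-channel frame, reversing the last axis gives the same result as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`; the helper names `bgr_to_rgb` and `detect_hands` are hypothetical.

```python
import numpy as np


def bgr_to_rgb(img_bgr: np.ndarray) -> np.ndarray:
    """Swap OpenCV's BGR channel order to the RGB order MediaPipe expects."""
    # Same effect as cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB) for 3-channel images
    return np.ascontiguousarray(img_bgr[:, :, ::-1])


def detect_hands(hands, img_bgr: np.ndarray):
    """Run a MediaPipe Hands object on one BGR frame and return its results."""
    img_rgb = bgr_to_rgb(img_bgr)
    return hands.process(img_rgb)
```

In the capture loop this would be used roughly as `hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)` once before the loop, then `results = detect_hands(hands, img)` for every frame.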



Extracting Information from the Results Object

Before extracting further details about the hands, make sure there is something in the results object. A simple check is to print results and see what it holds. It only shows the object’s type (mediapipe.python.solution_base.SolutionOutputs) and nothing else, even when a hand is shown.

How to Check if the Hand is Being Detected or not?

Update the print statement to print results.multi_hand_landmarks and see whether the camera is detecting hands.

With the updated print statement, the output is “None” because no hand is shown.

Let’s see what information is extracted when one or more hands are shown.

As you can see, when a hand is detected by the camera, it prints the x, y, and z values of each landmark.
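The two checks above can be wrapped in a small helper. This sketch assumes `results` is the object returned by `hands.process()`, whose `multi_hand_landmarks` field is `None` when no hand is visible and otherwise holds one landmark list per detected hand; `count_hands` is a hypothetical name.

```python
def count_hands(results) -> int:
    """Return how many hands were detected in one frame."""
    # None means MediaPipe found no hand in this frame
    if results.multi_hand_landmarks is None:
        return 0
    # One entry (a list of 21 landmarks) per detected hand
    return len(results.multi_hand_landmarks)
```

Printing `count_hands(results)` inside the loop gives 0 with no hand in view, and 1 or 2 once hands appear (with the default `max_num_hands=2`).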

How to Detect Landmarks and Draw Points on the Hand?

In the code below, a drawing object (mp_draw) is created. The if statement checks whether landmarks were detected; if they were, the for loop runs over each detected hand and draws a point on every landmark.

Interesting, right? See the image.
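A sketch of that loop, assuming `mp_draw = mp.solutions.drawing_utils` and `results` from `hands.process()`. Passing only the image and one hand’s landmarks to `draw_landmarks` draws the points without any connecting lines; the wrapper name `draw_landmark_points` is hypothetical.

```python
def draw_landmark_points(img, results, mp_draw) -> int:
    """Draw a point on every landmark of every detected hand; return the hand count."""
    hands_drawn = 0
    # Only loop when MediaPipe actually detected at least one hand
    if results.multi_hand_landmarks:
        for hand_lms in results.multi_hand_landmarks:
            # With no connections argument, only the 21 points are drawn
            mp_draw.draw_landmarks(img, hand_lms)
            hands_drawn += 1
    return hands_drawn
```

Taking `mp_draw` as a parameter keeps the function easy to reuse with any drawing object that has a `draw_landmarks` method.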

How to Draw Connections Between Landmarks?

Connections are drawn by passing mp_hand.HAND_CONNECTIONS, the set of landmark pairs that make up the hand skeleton, to the drawing function.
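In practice the call is `mp_draw.draw_landmarks(img, hand_lms, mp_hand.HAND_CONNECTIONS)`, assuming `mp_hand = mp.solutions.hands`. To show what that constant holds, here is a sketch that turns each (start, end) landmark-index pair into a pixel-space line segment, which is roughly what the drawing utility does before drawing lines. `connection_segments` is a hypothetical helper.

```python
def connection_segments(landmarks, connections, width, height):
    """Convert (start, end) landmark-index pairs into pixel line segments."""
    segments = []
    for start_idx, end_idx in connections:
        start, end = landmarks[start_idx], landmarks[end_idx]
        # Landmark x/y are normalized to [0, 1]; scale them to pixels
        p1 = (int(start.x * width), int(start.y * height))
        p2 = (int(end.x * width), int(end.y * height))
        segments.append((p1, p2))
    return segments
```

With OpenCV, each segment could then be drawn with `cv2.line(img, p1, p2, (0, 255, 0), 2)`, where `landmarks` is `hand_lms.landmark` and `connections` is `mp_hand.HAND_CONNECTIONS`.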



Frame Rate

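To measure the frame rate (FPS), two variables are declared, p_time and c_time (previous and current time). The FPS is 1 / (c_time - p_time), and p_time is then set to c_time for the next frame.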
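A minimal sketch of this calculation, assuming the timestamps come from `time.time()`. The `FrameRateMeter` class is hypothetical; its clock is injectable so the arithmetic can be checked without a camera.

```python
import time
from typing import Callable, Optional


class FrameRateMeter:
    """Compute FPS from the time between consecutive frames."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self.p_time: Optional[float] = None  # previous frame time

    def tick(self) -> float:
        """Call once per frame; returns the current FPS (0.0 for the first frame)."""
        c_time = self.clock()  # current frame time
        fps = 0.0
        if self.p_time is not None and c_time > self.p_time:
            fps = 1.0 / (c_time - self.p_time)
        self.p_time = c_time
        return fps
```

In the loop, `fps = meter.tick()` followed by `cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)` writes the value onto the frame. Starting with no previous timestamp avoids a meaningless reading on the first frame.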

Extracting the value of each landmark

This is useful when a specific point needs to be tracked for a particular purpose.

As we know, there are 21 landmarks on a hand (numbered 0 to 20). The landmark information gives the x, y, and z coordinates of each landmark, listed in order of id. The x and y values are normalized to the range 0 to 1 by the image width and height, and we can use them to find the location of a landmark on the hand.
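A sketch of that listing, assuming `hand_lms` is one entry of `results.multi_hand_landmarks`, whose `landmark` field holds the 21 points in id order, so `enumerate` yields the id. `landmark_table` is a hypothetical helper name.

```python
def landmark_table(hand_lms):
    """Return (id, x, y, z) for each landmark of one detected hand."""
    # enumerate gives the landmark id (0 = wrist, ..., 20 = pinky tip)
    return [(lm_id, lm.x, lm.y, lm.z) for lm_id, lm in enumerate(hand_lms.landmark)]
```

Looping with `for lm_id, x, y, z in landmark_table(hand_lms): print(lm_id, x, y, z)` prints one line per landmark.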

First, I read the height, width, and channels (h, w, c) of the image. The previous code gave decimal (normalized) values, but I wanted exact pixel positions, so I multiplied x by w and y by h and converted the resulting circle coordinates (cx, cy) to integers.
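The conversion can be sketched as follows, assuming `img_shape` is `img.shape`, i.e. `(h, w, c)` for an OpenCV frame. `landmark_pixels` is a hypothetical name; `int()` simply drops the decimals, which is fine for on-screen points.

```python
def landmark_pixels(hand_lms, img_shape):
    """Convert normalized landmarks of one hand to integer pixel positions."""
    h, w, c = img_shape  # height, width, channels (c is not needed here)
    points = []
    for lm_id, lm in enumerate(hand_lms.landmark):
        # Scale normalized x by width and y by height, then drop the decimals
        cx, cy = int(lm.x * w), int(lm.y * h)
        points.append((lm_id, cx, cy))
    return points
```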

Drawing a circle on a specific landmark

For drawing, I created a drawing object (mp_draw) and then added an if condition for point 0, because I wanted a filled circle at landmark 0.
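A sketch of the condition, assuming the filled circle is drawn with OpenCV’s `cv2.circle` and `cv2.FILLED`. The radius and magenta color are arbitrary choices, and both function names are hypothetical. The selection logic is kept separate from the drawing so it can be reused for other landmark ids.

```python
def points_to_highlight(hand_lms, width, height, highlight_ids=(0,)):
    """Pixel positions of the landmarks whose id is in highlight_ids."""
    points = []
    for lm_id, lm in enumerate(hand_lms.landmark):
        if lm_id in highlight_ids:  # the if condition for point 0 by default
            points.append((int(lm.x * width), int(lm.y * height)))
    return points


def draw_filled_circles(img, points, radius=15, color=(255, 0, 255)):
    """Draw a filled circle at each (cx, cy); color is in BGR order."""
    import cv2  # OpenCV (pip install opencv-python)

    for cx, cy in points:
        cv2.circle(img, (cx, cy), radius, color, cv2.FILLED)
```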

Highlighting fingertips

The fingertip landmarks are 4, 8, 12, 16, and 20. See the code in the image below.
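A sketch of picking out the fingertips, assuming `hand_lms` is one detected hand and `width`/`height` come from the frame. These ids match MediaPipe’s `HandLandmark` enum (`THUMB_TIP`, `INDEX_FINGER_TIP`, `MIDDLE_FINGER_TIP`, `RING_FINGER_TIP`, `PINKY_TIP`); the constant and function names are hypothetical.

```python
# Landmark ids of the five fingertips: thumb, index, middle, ring, pinky
FINGERTIP_IDS = (4, 8, 12, 16, 20)


def fingertip_pixels(hand_lms, width, height):
    """Map each fingertip id to its (cx, cy) pixel position."""
    tips = {}
    for tip_id in FINGERTIP_IDS:
        lm = hand_lms.landmark[tip_id]  # landmarks are stored in id order
        tips[tip_id] = (int(lm.x * width), int(lm.y * height))
    return tips
```

Each position can then be passed to `cv2.circle(img, (cx, cy), 10, (255, 0, 255), cv2.FILLED)` to highlight the tips.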

This is how we can use these landmarks for different tasks. This is where the article ends, but not the study: there is still a lot to explore.
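Putting the steps together, here is a minimal end-to-end sketch (not the author’s exact code, which appears only in images) using MediaPipe’s legacy Solutions API: `mp.solutions.hands.Hands`, `mp.solutions.drawing_utils`, and `HAND_CONNECTIONS`. The confidence thresholds, colors, window title, and function name are assumptions.

```python
import time


def run_hand_tracking(camera_index: int = 0, max_num_hands: int = 2) -> None:
    """Track hands on the webcam feed, highlighting the wrist and fingertips."""
    import cv2  # pip install opencv-python
    import mediapipe as mp  # pip install mediapipe

    mp_hand = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils
    highlight_ids = {0, 4, 8, 12, 16, 20}  # wrist + five fingertips

    cap = cv2.VideoCapture(camera_index)
    p_time = time.time()  # previous frame time
    with mp_hand.Hands(max_num_hands=max_num_hands,
                       min_detection_confidence=0.5,
                       min_tracking_confidence=0.5) as hands:
        try:
            while cap.isOpened():
                success, img = cap.read()
                if not success:
                    break
                # MediaPipe expects RGB; OpenCV delivers BGR
                results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

                if results.multi_hand_landmarks:
                    h, w, c = img.shape
                    for hand_lms in results.multi_hand_landmarks:
                        # Points plus the skeleton lines between them
                        mp_draw.draw_landmarks(img, hand_lms, mp_hand.HAND_CONNECTIONS)
                        for lm_id, lm in enumerate(hand_lms.landmark):
                            if lm_id in highlight_ids:
                                cx, cy = int(lm.x * w), int(lm.y * h)
                                cv2.circle(img, (cx, cy), 10, (255, 0, 255), cv2.FILLED)

                c_time = time.time()  # current frame time
                fps = 1 / (c_time - p_time) if c_time > p_time else 0.0
                p_time = c_time
                cv2.putText(img, f"FPS: {int(fps)}", (10, 40),
                            cv2.FONT_HERSHEY_PLAIN, 2, (255, 0, 255), 2)

                cv2.imshow("Hand Tracking", img)
                if cv2.waitKey(1) & 0xFF == ord("q"):
                    break
        finally:
            cap.release()
            cv2.destroyAllWindows()
```

Call `run_hand_tracking()` and press `q` in the video window to stop.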

How to contact the Omdena Pakistan Chapter?

If you face any issues with an AI or ML project, want details about workshops, or want to be part of an AI project but don’t know where to start, you can reach us for assistance. Our social media team is always active in helping engineers and posting about upcoming workshops and ongoing projects. Follow us on the pages below to stay updated.

Facebook: Omdena Pakistan Chapter

GitHub: Qasim Hassan

Medium: Iqra Anwar
