
7+ Best Computer Vision Projects on GitHub You Need to Know (Including Research Papers with Source Code)

May 19, 2022


In this article, you will find a curated list of the best open-source Computer Vision projects, drawn largely from GitHub's trending repositories for 2024.


The quest for computers’ ability to actually “see” and understand digital images has been a driving force in recent years. In the past two years, the pandemic accelerated organizations’ rush to adopt automation on a larger scale, and computer vision is the technology solution many of them turn to today. But that’s just the tip of the iceberg for this technology. In this article, we talk about the vast potential of computer vision project ideas and about how to get our hands on some exciting computer vision projects on GitHub.

What are the uses of Computer Vision?

Computer vision is a broad term covering the many projects that use deep neural networks to develop human-like vision capabilities for various applications. Today, computer vision has immense potential in real-world applications spanning retail, banking, construction, sports, automotive, agriculture, insurance, and beyond. Many of these use cases involve computer vision projects that make a positive impact on the world.

Where are Computer Vision projects used?

The healthcare sector is always looking for better ways to treat patients with deeper insights. Computer vision is already assisting doctors in improving patient diagnosis, monitoring diseases, and prescribing appropriate treatments.

Computer vision applications like COVID-Net can detect the virus in patients from chest X-ray images with high accuracy. Such applications can also perform X-ray analysis, cancer detection, blood loss detection, CT & MRI analysis, digital pathology, and much more.


Standard CCTV cameras, security drones, and visual monitoring devices continuously produce high volumes of footage, making it impossible for humans to monitor everything and take proactive action. Here’s where computer vision helps: it can warn about perimeter intrusions, intruders, and concealed weapons, and can even stream live video to human security personnel who can choose to intervene.

Computer vision helps with innovative solutions in access control, checkpoint security, theft detection in retail, public health & safety by reducing crime, and many more security applications.


Computer vision solutions for transportation help to overcome obstacles and save passenger and pedestrian lives. For example, automating railroad maintenance solutions using computer vision involves drone-based solutions that leverage video analysis and real-time alerts. 

A computer vision-based traffic monitoring system enables tracking vehicles, detecting abnormal behavior of the driver, predicting collisions, analyzing traffic to help reduce congestion, etc.


Computer vision applications play a significant role in product and component assembly in manufacturing. They help conduct fully automated product assembly and management processes; Tesla’s manufacturing process, for example, is reportedly about 70% automated. Based on 3D modeling designs, computer vision systems guide the assembly process precisely. Computer vision also helps in other manufacturing areas such as predictive maintenance, defect detection (spotting flaws across large batches of manufactured products), safety and security compliance, and counting stock in inventory management.

How does Computer Vision work?

Computer vision works by analyzing massive amounts of data until the system learns to recognize distinctions and classify images correctly. For example, we can train a computer to know the difference between a good car tire and a defective one by feeding in many pictures of both types. Eventually, the computer learns the difference and can recognize a flawless tire.

A couple of technologies help establish our desired computer vision outcome of identifying a good tire from a tire with defects.

1. A type of machine learning called Deep Learning

2. Convolutional neural network (CNN) 

Machine learning uses algorithmic models that teach the computer about the context of visual data so it can differentiate one image from another. These algorithms let machines learn to recognize images on their own, without being explicitly programmed for each case.

A CNN is used to understand single images, while a recurrent neural network (RNN) is used for video, helping computers understand the relations between pictures in a series of frames. A CNN lets the machine learning model look deeper into images by breaking them into pixels and labeling them. The labels are used to perform convolutions and make predictions about what is “seen.” The neural network runs convolutions over many iterations, checking the accuracy of its predictions each time, until it can reliably recognize the specified images much as a human would.
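The convolution step described above can be sketched in a few lines of NumPy. This is an illustrative toy, not a full CNN: it slides one hand-written edge-detection kernel over a tiny image, whereas a real CNN learns many such kernels from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2-D image and sum element-wise products (valid padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny 5x5 "image" with a dark-to-bright vertical edge down the middle
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# Sobel-like kernel that responds strongly to vertical edges
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

feature_map = conv2d(image, sobel_x)
print(feature_map)  # large values where the edge is, zeros elsewhere
```

The output feature map is large exactly where the kernel's pattern (a vertical edge) appears in the image; stacking many learned kernels and repeating this over several layers is what lets a CNN build up from edges to textures to whole objects.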

7 New Computer Vision Projects on GitHub 2024

1. Pathology Classification

The amount of data pathologists need to analyze each day is massive and challenging, and deep learning algorithms can identify patterns in such large amounts of data. Optical coherence tomography (OCT) uses light waves to look inside a living body and can help detect various diseases in humans, plants, and animals. It makes it possible to evaluate issues like thinning skin, broken blood vessels, heart disease, and many other medical problems.

Learn more at the Github link here

2. Automatic colorization using deep neural networks

Image colorization is the process of adding plausible colors to monochrome photographs or videos so that they look visually acceptable and perceptually meaningful, convincing the viewer of their authenticity.

As color is a very important component of visual representation, black-and-white photos make it impossible to fully imagine the actual scene they represent. Since objects can take on different colors due to perspective, lighting, and many other factors, there are numerous plausible ways to assign colors to the pixels of an image. Insight into the original colors of old photographs is very often unavailable, so automatic colorization is very challenging and has no unique solution. Nevertheless, the aim of colorization is not to reconstruct the colors accurately but to make the viewer believe in the authenticity of the colorized image.

The process of automatic image colorization assigns color information to grayscale images without any user intervention, combining machine learning and deep neural networks with art.
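A common setup for learned colorization is to work in a lightness/chrominance color space: the network keeps the input's lightness channel and predicts only the two chrominance channels. The sketch below shows that data flow with NumPy only; `predict_ab` is a hypothetical placeholder for a trained network, not part of any specific repository.

```python
import numpy as np

def predict_ab(l_channel):
    """Hypothetical stand-in for a trained colorization network.

    A real system would run a CNN mapping the lightness (L) channel to the
    two chrominance (a, b) channels; here we just return neutral chroma.
    """
    h, w = l_channel.shape
    return np.zeros((h, w, 2), dtype=l_channel.dtype)

def colorize(grayscale):
    """Combine the input lightness channel with predicted a/b channels."""
    ab = predict_ab(grayscale)
    # Stack L with predicted (a, b) into an (H, W, 3) Lab-style image
    lab = np.concatenate([grayscale[..., None], ab], axis=-1)
    return lab

gray = (np.random.rand(64, 64) * 100.0).astype(np.float32)  # L in [0, 100]
lab = colorize(gray)
print(lab.shape)  # (64, 64, 3)
```

Because the lightness channel is copied through unchanged, the network only has to solve the genuinely ambiguous part of the problem: choosing plausible colors.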

The Python code for automatic image colorization can be found in this GitHub repository.

3. Text Recognition using OpenCV and Tesseract (OCR)

Text recognition with OpenCV and OCR (Optical Character Recognition) identifies each letter in an image and converts it into text. The approach is ideal for anyone seeking to extract information from an image or video and convert it into text-based data.

Tesseract is an open-source application backed by Google that can recognize text in 100+ languages. We can also train this application to identify many other languages.

Apps like PDF scanners and Google Lens use OCR.
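Before handing an image to Tesseract, OCR pipelines usually binarize it, since clean black-on-white input recognizes best. Here is a minimal NumPy-only sketch of that preprocessing step; the mean-intensity threshold is a crude stand-in for Otsu's method, and the values in the toy "page" are made up for illustration.

```python
import numpy as np

def binarize(gray, threshold=None):
    """Turn a grayscale image into pure black/white for OCR.

    If no threshold is given, use the mean intensity as a simple
    stand-in for Otsu's automatic thresholding.
    """
    if threshold is None:
        threshold = gray.mean()
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# Toy 'scanned page': a dark block (intensity 30) standing in for printed
# text on a light background (intensity 220)
page = np.full((32, 32), 220, dtype=np.uint8)
page[10:20, 5:25] = 30
binary = binarize(page)

# In a real pipeline the binarized array would then go to Tesseract, e.g.:
#   import pytesseract
#   text = pytesseract.image_to_string(binary)
print(np.unique(binary))  # [  0 255]
```

In practice OpenCV's built-in thresholding (including Otsu's method) replaces the hand-rolled function above, but the idea is the same: separate ink from paper before recognition.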

Learn more at the GitHub link here

4. Image Animation using First Order Motion Model

This project animates faces from videos and images through unsupervised region detection: the model takes a driving video and maps its motion onto static source images to give a realistic appearance. The model loading uses Python, and the source code is available in the repository.

The First Order Motion Model (FOMM) consists of two main parts: motion estimation and image generation. Motion estimation covers coarse motion estimation, modeled as sparse motions between separate object parts, and dense motion estimation, which produces an optical flow along with a confidence map for the entire image.
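The role of the dense optical flow is to tell the generator where each output pixel should be sampled from in the source image. The toy backward-warping function below illustrates that idea with nearest-neighbor sampling; FOMM itself uses differentiable bilinear sampling learned end-to-end, so treat this as a sketch of the concept only.

```python
import numpy as np

def warp(image, flow):
    """Backward-warp a 2-D image with a dense flow field.

    Each output pixel (y, x) is read from (y + dy, x + dx), where
    (dy, dx) = flow[y, x]. Sampling is nearest-neighbor with border clamping.
    """
    h, w = image.shape
    ys, xs = np.indices((h, w))
    src_y = np.clip((ys + flow[..., 0]).round().astype(int), 0, h - 1)
    src_x = np.clip((xs + flow[..., 1]).round().astype(int), 0, w - 1)
    return image[src_y, src_x]

image = np.arange(16, dtype=float).reshape(4, 4)

# Uniform flow field: every pixel samples from one pixel to its right,
# shifting the image content one pixel to the left
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0

shifted = warp(image, flow)
print(shifted[0])  # [1. 2. 3. 3.]
```

A real driving video produces a spatially varying flow field (different motion for the head, mouth, and background regions), which is exactly what the sparse-to-dense motion estimation stage predicts.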

The improved model’s GitHub repository, together with the paper explaining the approach, can be found here

5. One Shot Face Stylization

In this repository, you can learn about a style mapper that applies a fixed style to input images (for example, faces), based on the JoJoGAN procedure.

It uses StyleGAN’s style-mixing property to produce a paired style dataset from a single example and then, trained on those pairs, performs the style mapping via GAN inversion followed by the fine-tuned StyleGAN.

JoJoGAN can successfully use extreme style references (say, animal faces), and one can control which aspects of the style are used and how strongly the style is applied. Additionally, the algorithm produces high-quality, high-resolution output.

To find out more about the face stylization algorithm, visit the GitHub repo here

6. Image restoration

One can restore videos and images that are blurred. Image restoration is a long-standing low-level vision problem: restoring high-quality (HQ) images from low-quality (LQ) ones (downscaled, noisy, or compressed).

VRT, a Video Restoration Transformer, addresses this problem. Video restoration (e.g., video super-resolution) sets out to restore high-quality frames from low-quality ones. Unlike single-image restoration, video restoration generally requires using temporal information from multiple adjacent, but usually misaligned, video frames. VRT provides parallel frame prediction and long-range temporal dependency modeling.
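The reason temporal information helps can be shown with a naive sketch: if several frames show the same content corrupted by independent noise, averaging them cancels much of the noise. The example below assumes the frames are already perfectly aligned, which real video is not; handling misalignment with learned attention instead of naive averaging is precisely what VRT adds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy video: the same static 32x32 frame corrupted by independent
# Gaussian noise in each of 7 frames
clean = rng.random((32, 32))
frames = [clean + rng.normal(0.0, 0.1, clean.shape) for _ in range(7)]

# Averaging N aligned frames reduces the noise standard deviation by
# roughly 1/sqrt(N)
restored = np.mean(frames, axis=0)

err_single = np.abs(frames[3] - clean).mean()
err_restored = np.abs(restored - clean).mean()
print(err_restored < err_single)  # True
```

With seven frames the mean absolute error drops by roughly a factor of sqrt(7); a learned model does far better by also aligning moving content before aggregating it.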

Video Deblurring

Source here

Learn more at the GitHub link here

7. RelTR: end-to-end scene graph generation

Researchers open-sourced RelTR, a one-stage method that predicts only the noteworthy relationships between objects in an image, using visual appearance alone.

Different objects in the same scene are related to each other to varying degrees; however, only a limited number of these relationships are noteworthy.

Scene graph generation is viewed as a set prediction problem. The proposed solution is RelTR, an end-to-end scene graph generation model with an encoder-decoder architecture. As a one-stage method, RelTR predicts a set of relationships directly from visual appearance alone, rather than combining entities and labeling all possible predicates.
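Set prediction in this style typically means the decoder emits a fixed number of queries, each scoring possible predicate classes, with a special "no relationship" class letting most queries opt out. The sketch below illustrates only that filtering idea with random scores; the predicate names, query count, and class layout are invented for the example and are not RelTR's actual interface.

```python
import numpy as np

# Illustrative predicate vocabulary; the last class means "no relationship"
PREDICATES = ["on", "holding", "next to", "<no-rel>"]

rng = np.random.default_rng(1)
# Toy decoder output: 5 triplet queries, each with a score per predicate class
logits = rng.normal(size=(5, len(PREDICATES)))

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(logits)
labels = probs.argmax(axis=-1)

# Keep only queries whose best class is a real predicate: the fixed-size
# set of queries collapses to the small set of noteworthy relationships
kept = [(i, PREDICATES[l]) for i, l in enumerate(labels)
        if PREDICATES[l] != "<no-rel>"]
print(len(kept) <= len(logits))  # True
```

This is what makes the approach one-stage: instead of scoring every object pair against every predicate, a fixed budget of queries each commits to (or declines) one relationship directly.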

Learn more about RelTR at the GitHub link here.

Best Computer Vision Projects in Real Life by Omdena

Omdena overcame challenges for people with visual impairment using computer vision. Public transportation, such as buses, is the main tool many individuals use to navigate and get where they want to go. In this challenge, together with RenewSenses, an Israeli company developing assistive technologies for people who are blind, the team set out to assist people with visual impairment in their experience of catching a bus.

The automated solution uses a computer vision pipeline for detecting buses, bus lines, and empty seats.

Results from this challenge will be tested directly with people with visual impairment through pilots the company is conducting, enabling fast feedback and creating a highly meaningful challenge that will impact the daily independence of many.

You can learn more about it in this project.


You can find many innovative computer vision projects on GitHub with source code available. 

Omdena runs AI Projects with organizations that want to get started with AI, solve a real-world problem, or build deployable solutions within two months.

If you want to learn more about us, you can check out all our projects and real-world case studies here.

Ready to test your skills?

If you’re interested in collaborating, apply to join an Omdena project at:

Crop Yield Prediction Using Deep Neural Networks
Comparing Semantic and Instance Segmentation for Weed and Crop Detection