Overcoming Data Challenges in AI Projects through Diverse Teams

March 4, 2022


At this demo day, we talked about the data roadblocks that inevitably come up in real-world AI projects. The insights shared come from our experience with more than 20 AI projects, working with partners including the UN Refugee Agency (UNHCR), the World Resources Institute, the World Energy Council, and numerous NGOs and corporations.

Omdena is a collaborative platform to build innovative, ethical, and efficient AI solutions to real-world problems. Collaborators from around the world come together on Omdena projects to address significant issues related to hunger, sexual harassment, land conflicts, gang violence, wildfire prevention, and energy poverty.

We’ve seen that our approach to AI development, bottom-up collaboration among diverse team members, fosters the innovation and creativity needed to break down data roadblocks. Innovation is inherent in the Omdena process.

We shared three Omdena projects to act as case studies for these innovative approaches to tackling data challenges.

Data Roadblock 1: Incomplete Data Sets

In the real world, datasets are rarely complete. We find that with large teams of dozens of people, data gathering, cleaning, and wrangling happen at a phenomenal speed. And by taking a bottom-up approach, we have multiple sub-teams looking at data problems from different angles, which allows innovative approaches to be explored.

In the following case study, the Omdena team worked out ways to identify safe routes in a city in the aftermath of an earthquake, where the relevant data sets were inconsistent and unreliable.

Case Study: Disaster Response: Improving the Aftermath Management of an Earthquake

In collaboration with Istanbul’s Impact Hub innovation center, Omdena data scientists combined satellite imagery of Istanbul with street map data in order to build a tool that facilitates family reunification by indicating the shortest and safest route between two points after an earthquake.
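The idea of a route that is both short and safe can be illustrated with a standard shortest-path search in which each street segment's length is inflated by a risk score. This is only a minimal sketch, not the project's actual method: the graph layout, the `risk` values (imagined here as coming from the satellite imagery), the `risk_weight` parameter, and the cost formula are all assumptions made for illustration.

```python
import heapq

def safest_shortest_route(graph, start, goal, risk_weight=2.0):
    """Dijkstra's algorithm with cost = length * (1 + risk_weight * risk).

    graph maps a node to a list of (neighbor, length_m, risk) tuples,
    where risk is a hypothetical score in [0, 1] for that street segment.
    Returns the list of nodes on the cheapest path, or None if unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, length_m, risk in graph.get(node, []):
            new_cost = cost + length_m * (1.0 + risk_weight * risk)
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return None
    # Walk back from the goal to rebuild the path.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical toy street graph: the bridge route is shorter but riskier.
TOY_GRAPH = {
    "home": [("bridge", 100, 0.9), ("park", 150, 0.1)],
    "bridge": [("school", 100, 0.9)],
    "park": [("school", 150, 0.1)],
    "school": [],
}

print(safest_shortest_route(TOY_GRAPH, "home", "school"))
```

With `risk_weight=0.0` the search reduces to a plain shortest path and picks the bridge; raising the weight trades distance for safety.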

“Omdena's approach to AI development is by far the best that I have seen in 2019.” - Semih Boyaci, Co-Founder, Impact Hub Istanbul

Data Roadblock 2: No Data

We don’t see the lack of data as a showstopper. On projects without data, the team starts by asking what we need to know to address the problem. Where might that data live? If it doesn’t exist, how can we create it from something that does exist? Here the diversity of the team members is very powerful.

We’ve seen time and again the impact of bringing together people with vastly different professional and life experiences. Our teams are typically 30% or more female. On any project, we’ll have on average 14 countries represented. Our collaborators range in age from 17 to 65. Not only does this diversity lead to ethical and trusted solutions, but it also fosters creativity and alternative ideas about what data is relevant and where to find it.

In the following project, we looked at how to assess post-traumatic stress disorder among people who have suffered trauma in low-resource environments. In this case, the team started with no data in hand.

Case Study: Building a Chatbot for Post-Traumatic Stress Disorder (PTSD) Assessment

32 Omdena collaborators developed a machine learning-driven chatbot for PTSD assessment in war and refugee zones.

The unique aspect of the project was that we did not start with a data set.

Through the collaborative efforts of the project community, the team identified and annotated suitable patient data. The team then applied Natural Language Processing (NLP) with linear classifiers for PTSD risk assessment, and transfer learning for data augmentation.
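To make "linear classifier" concrete, here is a minimal sketch of one such model, a perceptron over bag-of-words counts. The article does not say which linear classifier or features the team used, so the perceptron, the whitespace tokenizer, and the made-up example sentences are all assumptions for illustration. The sentences are not patient data and the output is in no way a clinical assessment.

```python
from collections import defaultdict

def tokenize(text):
    # Simplistic whitespace tokenizer, used only for this illustration.
    return text.lower().split()

def train_perceptron(examples, epochs=10):
    """Train a binary linear classifier on bag-of-words counts.

    examples is a list of (text, label) pairs with label +1 or -1.
    Returns (weights, bias), where weights maps a token to a float.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            score = bias + sum(weights[t] for t in tokens)
            if label * score <= 0:
                # Misclassified: nudge the weights toward the correct label.
                for t in tokens:
                    weights[t] += label
                bias += label
    return weights, bias

def predict(model, text):
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if score > 0 else -1

# Entirely made-up sentences; +1 and -1 are hypothetical labels.
TOY_EXAMPLES = [
    ("i have nightmares and cannot sleep", 1),
    ("loud noises make me panic", 1),
    ("i slept well and feel calm", -1),
    ("i enjoyed a quiet walk today", -1),
]

model = train_perceptron(TOY_EXAMPLES)
print(predict(model, "nightmares and panic"))
```

A linear model like this is easy to inspect: each token's weight shows how strongly it pushes the score toward one label, which matters when annotated data is scarce.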

Data Roadblock 3: Disparate Data Sources

Relevant data doesn’t typically come packaged in just one form. We often need to meld disparate data sources to get at a solution. Through collaboration, sub-teams focused on separate data and AI techniques come together to integrate those efforts to derive insights into the problem.

In the following project, the goal was to uncover domestic violence in India hidden due to COVID lockdowns. Among the many challenges the team addressed was the integration of data culled from disparate sources.

Case Study: Analyzing Domestic Violence through Natural Language Processing

This project was done with the award-winning Red Dot Foundation. Within Omdena’s collaborative platform, the team set out to craft a dataset to reveal domestic violence and online harassment patterns in India during COVID-19 lockdowns. The AI experts scraped data from news articles as well as social media and applied various natural language processing (NLP) techniques such as topic modeling, document annotation, and stacking machine learning models.

Video: https://www.youtube.com/watch?v=rEM1RaYgVPw

Case Study: Analyzing Land Conflicts and Government Policies through NLP

The project aimed at addressing land disputes in India by leveraging the power of machine learning and natural language processing (NLP). The team built a machine learning-driven visualization app that matches land conflict events from news articles with mediating government policies. This enables policymakers to make data-driven decisions and resolve land conflicts faster, save resources, and facilitate environmental sustainability efforts.

A web-based tool for the visualization of conflict events and policy matches.

The team used NLP techniques to extract key information from unstructured text and fed this data into a machine learning model to predict the likelihood of land use conflicts. The model was trained on a diverse dataset of past land disputes and validated on real-world cases. The project has the potential to contribute to resolving land use conflicts in India, promoting social justice and economic growth.
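One simple way to match a conflict event from a news article to the most relevant policy is to compare the texts by TF-IDF cosine similarity. The article does not describe the team's actual matching method, so the sketch below is only an assumption-laden illustration: the smoothed IDF formula, the whitespace tokenization, and the example sentences are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_events_to_policies(events, policies):
    """For each event, return the index of the most similar policy."""
    vectors = tfidf_vectors(events + policies)
    event_vecs = vectors[:len(events)]
    policy_vecs = vectors[len(events):]
    matches = []
    for event_vec in event_vecs:
        scores = [cosine(event_vec, policy_vec) for policy_vec in policy_vecs]
        matches.append(max(range(len(policies)), key=scores.__getitem__))
    return matches

# Hypothetical one-line summaries, written for this example only.
EVENTS = [
    "farmers protest forest land acquisition for mining",
    "dispute over highway compensation for village land",
]
POLICIES = [
    "policy on compensation for land acquired for highway projects",
    "forest rights policy governing mining on forest land",
]

print(match_events_to_policies(EVENTS, POLICIES))
```

In practice a pipeline like this would also need stop-word handling and better tokenization; the sketch keeps only the core matching step.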

You can learn more about this project here:

Case Study: Helping People with Visual Impairment to Find and Use Buses through Computer Vision

The team utilized computer vision techniques to develop an AI-powered solution that could assist individuals with visual impairment in navigating their surroundings. The project involved analyzing large amounts of image and video data to identify and classify objects, landmarks, and obstacles in real time. The team created a machine learning model that could accurately detect and locate these objects, enabling the development of a navigation system that could provide voice-guided directions to the user. The project has the potential to improve the quality of life for people with visual impairment, enhancing their independence and mobility in their daily lives.

Video: https://www.youtube.com/watch?v=I_bVzTjETbk

You can learn more about this project here:

Case Study: Digitizing Case Management and Risk Scoring for Cross-Border Child Protection

The team of 40 Omdena collaborators developed a solution in just eight weeks that can aid in case management, benefiting families in need. Initially, access to expert knowledge was limited by confidentiality agreements, but the team gathered over 230 publicly available cases on child protection and abuse through collaborative efforts. Using various Natural Language Processing (NLP) techniques, the team made the data usable and developed an easy-to-use web application with essential information. With this AI-powered tool, caseworkers can acquaint themselves with cases more quickly and access the collective experiences of colleagues worldwide.

A snippet of the web application

Case Study: Retail Customer Journey Analysis Using Edge Computer Vision on CCTV Cameras

Omdena conducted a project that utilized edge computer vision to enhance the analysis of the retail customer journey. The project involved developing a machine learning model that could analyze video data from in-store cameras to track customer movement and behavior. The team used edge computing techniques on CCTV cameras, eliminating the need for expensive cloud computing resources. By analyzing customer behavior, such as dwell time and interactions with specific products, the model could provide insights into customer preferences and identify areas for improvement in store layout and product placement.
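Dwell time, mentioned above, can be derived from tracker output once each detection has been assigned to a store zone. The sketch below assumes a hypothetical input format of `(track_id, timestamp_s, zone)` tuples and a hypothetical `max_gap_s` threshold for dropped tracks; the article does not describe the project's actual data format or logic.

```python
from collections import defaultdict

def dwell_times(observations, max_gap_s=2.0):
    """Sum the time each tracked person spends in each zone.

    observations is an iterable of (track_id, timestamp_s, zone) tuples
    in any order. Returns {(track_id, zone): seconds}.
    """
    by_track = defaultdict(list)
    for track_id, timestamp_s, zone in observations:
        by_track[track_id].append((timestamp_s, zone))

    totals = defaultdict(float)
    for track_id, points in by_track.items():
        points.sort()  # chronological order within one track
        for (t0, zone0), (t1, zone1) in zip(points, points[1:]):
            # Count an interval only if the person stayed in the same zone
            # and the gap is short enough that the track was not lost.
            if zone0 == zone1 and t1 - t0 <= max_gap_s:
                totals[(track_id, zone0)] += t1 - t0
    return dict(totals)

# Hypothetical tracker output: track 2 has a 10-second gap and is ignored.
TOY_OBSERVATIONS = [
    (1, 0.0, "entrance"), (1, 1.0, "entrance"),
    (1, 2.0, "shelf_a"), (1, 3.0, "shelf_a"), (1, 4.0, "shelf_a"),
    (2, 0.0, "shelf_a"), (2, 10.0, "shelf_a"),
]

print(dwell_times(TOY_OBSERVATIONS))
```

An aggregation this light is the kind of step that could plausibly run next to the camera, since it needs only the tracker's output rather than the video itself.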

The project has the potential to revolutionize retail customer journey analysis, providing retailers with valuable insights into customer behavior and preferences that can inform business decisions and improve the customer experience. The project also demonstrates the power of edge computing for running sophisticated AI models in resource-constrained environments.

Provide Customer Journey Analysis Using CCTV Cameras & IoT

Case Study: Applying NLP to Identify Financial Incentives for Forest and Landscape Restoration

Omdena collaborated with the World Resources Institute (WRI) on a project to use Natural Language Processing (NLP) to identify financial incentives for forest and landscape restoration in Latin America. To accomplish this, the team needed to create a dataset of 700,000 PDFs. The team started with only a few dozen PDFs, which was not enough to train the NLP models. To retrieve more policies, the team used Scrapy and Selenium to access the websites of the Federal Official Gazettes, but there were too many states and regions to access the Official Gazette of every one of them.

The goal of the project was to mine policy documents using NLP to promote knowledge sharing between stakeholders and enable the rapid identification of incentives for policy change that could restore degraded land more quickly.
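The document-collection step described above comes down to finding links to PDF files on gazette pages. The team used Scrapy and Selenium for this; the sketch below deliberately swaps in Python's standard-library `html.parser` so that it runs without extra dependencies, and it covers only link extraction, not crawling or rendering JavaScript pages. The sample HTML and the `example.org` URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point to PDFs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))

def extract_pdf_links(html, base_url):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links

# Made-up page fragment standing in for a gazette listing.
SAMPLE_HTML = (
    '<a href="/docs/policy1.pdf">Policy 1</a>'
    '<a href="about.html">About</a>'
    '<a href="https://example.org/x/Decree.PDF?v=2">Decree</a>'
)

print(extract_pdf_links(SAMPLE_HTML, "https://example.org/gazette/"))
```

A real crawler would add request throttling, retries, and de-duplication of links before downloading anything.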

You can learn more about this project here:

Want to work with us?

If you want to discuss a project or workshop, schedule a demo call with us by visiting: https://form.jotform.com/230053261341340
