
Increasing Drug Safety By Detecting Anomalies in Clinical Data Using Machine Learning

Project completed!


The goal of this Omdena AI Challenge was to build an online service that detects anomalies in clinical data. Such a service helps democratize clinical data anomaly detection and makes devices and drugs safer, more effective, and more accessible for patients and physicians. The challenge partner, Flaskdata, is an Israeli tech startup that specializes in helping life sciences companies complete their Phase 2 and Phase 3 clinical trials.

The problem

Real-world clinical data is exploding: its variety, volume, and velocity are growing exponentially. Clinical data is generated continuously by multiple sources at different times, producing an endless body of data that is unsynchronized in place, time, and doctor visits. According to Flaskdata, this raises the following questions:

Can we rely on this data to make decisions? Can we quantify the risk? Is there a more relevant question for a $1.5 trillion/year life science industry, in a world where over a million people have died from SARS-CoV-2 and the FDA is tightening guidance on safety follow-up for vaccine development? A world where a major global pharmaceutical company discovered, just before FDA submission, that 20% of the subjects in its COVID-19 vaccine trial had been misdosed?

Regulatory agencies, the global life science industry, and the big tech players all understand the immense value of our real-world clinical data. Amazon, Google, Apple, and Microsoft are intensely engaged in healthcare data processing and delivery.

Can we rely on tech companies to use our clinical data for decisions that affect our lives without independent validation?


The team built an API to detect anomalies in structured clinical data (not free text or images) for two use cases: clinical trials and connected devices.

The algorithm(s) behind the API should readily work on high-dimensional data, be model-free, and scale well.

Use case 1: Clinical trial data

Clinical trial data is timestamped and typically has a large number of dimensions (300 to over 3,000) compared with textbook anomaly detection use cases in online commerce or factory process control. The data is often sparse across dimensions because of missing values or because some data model items are not relevant to every patient.
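Because of this sparsity, a detector cannot assume that every patient has every item. As a minimal, model-free sketch (an illustration, not the team's actual algorithm), each record can be scored by its largest modified z-score, based on the median and the median absolute deviation (MAD), across only the items it actually contains, so missing values are skipped rather than imputed. The function name `robust_scores`, the record layout, and the item names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

# One record per patient visit: data-model item -> value (None or absent = missing).
Record = Dict[str, Optional[float]]


def robust_scores(records: List[Record], min_obs: int = 3) -> List[float]:
    """Score each record by its largest modified z-score over the items it contains.

    For each item, z = 0.6745 * |x - median| / MAD is computed only from the
    records where that item is present, so sparse data needs no imputation.
    """
    stats = {}
    items = {key for rec in records for key in rec}
    for item in items:
        values = [rec[item] for rec in records if rec.get(item) is not None]
        if len(values) < min_obs:
            continue  # too few observations to characterise this item
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad > 0:
            stats[item] = (med, mad)  # items with zero spread are skipped in this sketch

    scores = []
    for rec in records:
        worst = 0.0
        for item, value in rec.items():
            if value is None or item not in stats:
                continue
            med, mad = stats[item]
            worst = max(worst, 0.6745 * abs(value - med) / mad)
        scores.append(worst)
    return scores


# Hypothetical visits; heart_rate is missing for one patient.
visits = [
    {"systolic_bp": 120, "heart_rate": 70},
    {"systolic_bp": 118, "heart_rate": None},
    {"systolic_bp": 122, "heart_rate": 72},
    {"systolic_bp": 121, "heart_rate": 68},
    {"systolic_bp": 210, "heart_rate": 71},
]
flagged = [i for i, s in enumerate(robust_scores(visits)) if s > 3.5]
```

The cutoff of 3.5 is a commonly used rule of thumb for the modified z-score; with these values only the fifth visit (systolic 210) exceeds it. Scoring one item at a time keeps the method model-free and cheap per dimension, at the cost of missing anomalies that only appear as unusual combinations of items.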

Clinical data poses several challenges that textbook problems do not:

  • Physiological data does not necessarily come from a stationary process
  • Humans (patients and investigators) do unexpected things
  • Each combination of therapeutic and medical indication is unique, unlike textbook anomaly detection (AD) cases such as revenue or machine anomalies.
  • At the beginning of a clinical trial, there is no training set
  • Small data sets (an average of 350 patients across Phases 1 to 4, and about 50,000 in a large Phase 3 trial)
  • Issues of bias
  • Use AD to assess the reliability of the data and the efficacy of the therapeutic, and to monitor patient safety.
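Since there is no training set when a trial starts and physiological signals are not necessarily stationary, one option is a detector that learns its statistics online from the incoming data. The sketch below is an illustration, not the method the team shipped: it keeps an exponentially weighted moving average (EWMA) and variance for a single signal and flags points that lie far from the running mean. The parameters `alpha`, `threshold`, and `warmup` are assumed defaults that would need tuning per study.

```python
import math
from typing import Iterable, List


def ewma_anomalies(values: Iterable[float], alpha: float = 0.1,
                   threshold: float = 4.0, warmup: int = 5) -> List[int]:
    """Return indices of points far from an exponentially weighted running mean.

    No training set is needed: mean and variance are learned from the stream
    itself, and older points are gradually forgotten (rate `alpha`), so slow
    physiological drift is tolerated.
    """
    mean = 0.0
    var = 0.0
    flagged: List[int] = []
    for i, x in enumerate(values):
        if i == 0:
            mean = x  # initialise from the first observation
            continue
        std = math.sqrt(var)
        if i >= warmup and std > 0 and abs(x - mean) / std > threshold:
            flagged.append(i)
            continue  # keep the anomaly out of the running statistics
        # Incremental exponentially weighted mean and variance update.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return flagged


heart_rate = [100, 101, 99, 100, 102, 98, 100, 101, 99, 140, 100, 101]
print(ewma_anomalies(heart_rate))
```

Excluding flagged points from the update stops a single spike from inflating the variance, but it also means that a genuine, permanent level shift would keep being flagged until the parameters are revisited.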

We used an ensemble approach that combines the detection of outliers in multidimensional data points, of drifts and spikes in time series, and of conditions defined by therapeutic-user-specified rules. For example, in psychiatric studies, some patients may report suicidal tendencies on the QIDS (Quick Inventory of Depressive Symptomatology) form through a mobile app; a rule that flags such responses is an example of a therapeutic-user-specified rule.
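One way to structure such an ensemble is to give every detector the same signature, whether it is a statistical outlier detector, a time-series drift or spike detector, or a user-written rule, and to report the union of their findings. The sketch below shows only two simple detectors, a plausibility range check and a user rule; the item names `systolic_bp` and `qids_suicidal_ideation`, the range, and the cutoff of 2 are hypothetical and would in practice come from the study protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Visit = Dict[str, Optional[float]]  # one patient visit: item name -> value


@dataclass
class Finding:
    detector: str
    message: str


# Every detector maps one visit to zero or more findings.
Detector = Callable[[Visit], List[Finding]]


def plausibility_check(item: str, low: float, high: float) -> Detector:
    """Flag values outside a plausible physiological range (a data-reliability check)."""
    def detect(visit: Visit) -> List[Finding]:
        value = visit.get(item)
        if value is not None and not low <= value <= high:
            return [Finding("plausibility", f"{item}={value} outside [{low}, {high}]")]
        return []
    return detect


def user_rule(item: str, cutoff: float, label: str) -> Detector:
    """Therapeutic-user-specified rule: flag any visit where `item` reaches `cutoff`."""
    def detect(visit: Visit) -> List[Finding]:
        value = visit.get(item)
        if value is not None and value >= cutoff:
            return [Finding("user_rule", f"{label} ({item}={value})")]
        return []
    return detect


def run_ensemble(visit: Visit, detectors: List[Detector]) -> List[Finding]:
    """Union of all detectors' findings: any one detector is enough to raise a flag."""
    findings: List[Finding] = []
    for detect in detectors:
        findings.extend(detect(visit))
    return findings


# Hypothetical configuration for a psychiatric study.
detectors = [
    plausibility_check("systolic_bp", 70, 250),
    user_rule("qids_suicidal_ideation", 2, "possible suicidal ideation"),
]
print(run_ensemble({"systolic_bp": 128, "qids_suicidal_ideation": 3}, detectors))
```

Statistical detectors fit the same `Detector` shape, so new rules can be added per therapeutic area without touching the ensemble itself.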

Use case 2: Connected devices

This use case covers time-series data from connected wearables, smartwatches, and connected medical devices. Connected wearables may be standalone consumer devices or devices used in clinical trials to monitor efficacy, compliance, and patient safety.

  • Very large data sets with a small number of parameters
  • Current work on fall and stress detection does not address the full potential of AD for clinical data from devices.
  • Use AD to assess the reliability of the data.
  • Use AD to assess the suitability of patients to participate in a clinical trial (so-called ‘pre-screening’).
  • Use AD to monitor clinical trial participants for adverse events and safety issues (for example, consistently rising or dropping blood pressure may indicate that a serious adverse event is developing).
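The blood pressure example is a drift problem rather than a single-point outlier. A standard technique for it is the cumulative sum (CUSUM) control chart; the sketch below is a minimal two-sided version, offered as an illustration rather than as the project's detector. The baseline `target`, the `slack` allowance, and the alarm `limit` are assumptions that would have to be set per patient or per study, for example from a patient's screening readings.

```python
from typing import List, Optional, Tuple


def cusum_alarm(readings: List[float], target: float, slack: float,
                limit: float) -> Optional[Tuple[int, str]]:
    """Two-sided CUSUM on a series of readings (e.g. systolic blood pressure).

    high_t = max(0, high_{t-1} + (x_t - target - slack))
    low_t  = max(0, low_{t-1} + (target - x_t - slack))
    Returns (index, "rising" or "dropping") at the first alarm, or None.
    """
    high = low = 0.0
    for i, x in enumerate(readings):
        # Deviations within +/- slack of the target are absorbed; only a
        # sustained shift accumulates enough evidence to cross the limit.
        high = max(0.0, high + (x - target - slack))
        low = max(0.0, low + (target - x - slack))
        if high > limit:
            return i, "rising"
        if low > limit:
            return i, "dropping"
    return None


print(cusum_alarm([121, 124, 128, 131, 135, 138, 142], target=120, slack=5, limit=30))
```

With `target=120`, `slack=5`, and `limit=30`, a single reading of 150 mmHg in an otherwise stable series does not raise an alarm, whereas a steady climb from 121 to 138 mmHg does. This is the behaviour wanted for "consistently rising or dropping" signals.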

The project outcomes

A RESTful API service for the automated detection of anomalies in clinical data, intended to help democratize clinical data anomaly detection and make devices and drugs safer, more effective, and more accessible for patients and physicians.
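The service's actual routes and schema are not documented here, so the sketch below assumes a hypothetical `POST /v1/anomalies` endpoint that accepts per-variable time series as JSON and returns the indices it flags. It is written as a framework-agnostic handler that returns an HTTP status code and a JSON body, so it could be wired into any web framework. The per-variable detector is a deliberately simple median/MAD rule standing in for the ensemble, and the request and response schema are assumptions.

```python
import json
from statistics import median
from typing import List, Tuple


def _flag(values: List[float], cutoff: float = 3.5) -> List[int]:
    """Indices whose modified z-score (median/MAD based) exceeds `cutoff`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Zero spread: in this sketch any value off the median counts as anomalous.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > cutoff]


def handle_detect(body: str) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a hypothetical POST /v1/anomalies call.

    Assumed request:  {"series": {"<variable>": [number, ...], ...}}
    Assumed response: {"anomalies": {"<variable>": [index, ...], ...}}
    """
    try:
        payload = json.loads(body)
        series = payload["series"]
        if not isinstance(series, dict):
            raise TypeError("series must be a JSON object")
        result = {}
        for name, values in series.items():
            if not isinstance(values, list) or len(values) < 3:
                raise ValueError(f"{name}: need a list of at least 3 numbers")
            result[name] = _flag([float(v) for v in values])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        return 400, json.dumps({"error": str(exc)})  # malformed client input
    return 200, json.dumps({"anomalies": result})


status, response = handle_detect('{"series": {"heart_rate": [72, 74, 71, 73, 180, 72]}}')
print(status, response)
```

Keeping the handler free of any web framework makes the detection logic easy to unit-test and lets the same code back a REST endpoint, a batch job, or a message-queue consumer.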


