Deploying a Model Using Docker as Endpoint in a Pathology Mobile App

May 14, 2021

This article walks through deploying a model with Flask, packaging it in a Docker container so that it can be deployed easily in the cloud, and building an offline pathology mobile app for use in places without a stable internet connection, such as parts of Africa. See the mobile app in the results below.

Problem Statement

We participated in the Omdena challenge "Detecting Pathologies Through Computer Vision in Ultrasound" to build an ultrasound solution that detects the type and location of different pathologies. The solution works with 2D images and can also process a video stream.

The goal is to identify the presence of a specific pathology in an ultrasound image and to provide its location as bounding box coordinates and a mask. Ultrasound is a relatively inexpensive and portable modality for diagnosing life-threatening diseases and for point-of-care use. This helps deliver impactful and feasible medical solutions to countries that face significant resource challenges.

Inference Pipeline

We deploy the model in a Docker container that exposes REST services: they receive an image, preprocess it if needed, run the model, and send the output back as bytes or JSON.

Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly.

The model we are deploying is a Mask R-CNN (Region-Based Convolutional Neural Network) model that returns a mask, bounding box coordinates, the type of pathology, and a score (i.e., probability) for each detection.

Source: Omdena Inference Pipeline

Import the required libraries: Flask provides the API, Flasgger integrates Swagger documentation, NumPy handles array processing, PIL handles image processing, and ONNX Runtime is a cross-platform inference and training accelerator for machine learning that is compatible with deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn. The prediction code further down also uses io and OpenCV (cv2) to normalize and serialize the mask.

import io
import cv2
from flask import Flask, Response, request, jsonify
from flasgger import Swagger
import numpy as np
from PIL import Image
import onnxruntime as rt

We initialize the Flask app and write a template for Swagger. We also define the class labels and a function that checks the file name and allows only png, jpg, and jpeg files.

app = Flask(__name__)
swagger = Swagger(app, template={
  "swagger": "2.0",
  "info": {
    "title": "Inference",
    "version": "1.0.0"
  }
})

ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}
labels = ["Normal", "Benign", "Malignant"]

def allowed_file(filename):
  # Accept only names whose extension (case-insensitive) is in the allow-list
  return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
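
As a quick check of the extension filter, the following self-contained sketch repeats the allowed_file helper and runs it on a few hypothetical file names (the names are illustrative only and do not come from the project).

```python
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}

def allowed_file(filename):
    # Same rule as in the app: the name must contain a dot, and the text
    # after the last dot (lower-cased) must be in the allow-list.
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

# Hypothetical file names, chosen only to exercise the rule
for name in ['scan.png', 'SCAN.JPG', 'archive.tar.gz', 'no_extension', 'frame.jpeg.exe']:
    print(name, allowed_file(name))
```

The check looks only at the file name, so it does not prove that the upload is a valid image; opening the file with PIL is what actually validates the content.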

Load the model with ONNX Runtime, which reads models saved in ONNX format. Its main class, InferenceSession, wraps model loading and execution in one place, and its run method computes the prediction. The model is loaded only once, when the Docker container starts. We can use different models and different inference runtimes (such as ONNX Runtime or TensorFlow). When adding models, weigh the increased memory consumption and inference time against the resources available for deployment.

model = rt.InferenceSession('model.onnx')  # path is relative
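
The prediction code needs to know the names of the model's input and output tensors. As a minimal sketch, assuming a session object like the one created above, the helper below (describe_io is a hypothetical name, not part of ONNX Runtime) lists them through the session's get_inputs and get_outputs methods. The demo at the bottom runs only if a model.onnx file is present in the working directory.

```python
import os

def describe_io(session):
    """Return (name, shape, type) tuples for the session's inputs and outputs."""
    inputs = [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]
    outputs = [(arg.name, arg.shape, arg.type) for arg in session.get_outputs()]
    return inputs, outputs

# Demo: runs only when the model file is available locally
if os.path.exists('model.onnx'):
    import onnxruntime as rt

    model = rt.InferenceSession('model.onnx')
    inputs, outputs = describe_io(model)
    print('inputs:', inputs)
    print('outputs:', outputs)
```

The input name printed here is the key to use in the dictionary passed to run.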

The route decorator in Flask binds a URL to a function. We add a route, specify its method, and define the function that handles it. Swagger uses the text inside the triple quotes (""") to show the description, parameters, and sample responses. We then check the type of the image file and return 400 if the input is not supported. Otherwise, we load the image and predict the outputs. We send the mask as a PNG image and the label and bounding box as headers.

Flasgger comes with embedded Swagger UI so you can access http://localhost:5001/apidocs and visualize and interact with your API resources. It also provides validation of the incoming data. We can add what type of responses can be expected too. A response is defined by its HTTP status code and the data returned in the response body and/or headers.

@app.route('/predict', methods=['POST'])  # the route path is an assumption
def predict():
  """
  Upload Image and get mask, bounded box coordinates and label
  ---
  parameters:
    - in: formData
      name: image
      type: file
      required: true
  responses:
    200:
      description: gets output
    400:
      description: input not supported
  """
  image_file = request.files['image']
  if image_file and allowed_file(image_file.filename):
    pil_img = Image.open(image_file)
    img = np.array(pil_img)
    img = img[np.newaxis, ...]  # add a batch dimension
    pred = model.run(None, {'input': img.astype(np.float32)})
    data = pred[0]
    # Scale the raw mask to 0-255 so it can be saved as an 8-bit image
    data = cv2.normalize(data, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
    data = data.transpose((1, 2, 0))  # channels-first to channels-last
    mask = Image.fromarray(data, 'RGB')
    output = io.BytesIO()
    mask.save(output, format="PNG")
    response = Response(output.getvalue(), mimetype='image/png')
    response.headers['Label'] = labels[pred[1]]
    # Header values must be strings, so serialize the coordinates as a list
    response.headers['Bounded_box-Coordinates'] = str(pred[2].tolist())
    return response, 200
  return 'only png, jpg, jpeg file types are allowed', 400
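
The post-processing step can be tried without a model. The sketch below assumes the raw mask is a float array in channels-first (3, H, W) layout, as the transpose in the code suggests, and uses a NumPy min-max rescale as a stand-in for cv2.normalize with NORM_MINMAX. The helper name to_uint8_image is hypothetical, and rounding may differ slightly from OpenCV.

```python
import numpy as np

def to_uint8_image(raw):
    """Min-max scale a (C, H, W) float array to 0-255 and return it as (H, W, C) uint8."""
    raw = raw.astype(np.float32)
    lo, hi = raw.min(), raw.max()
    if hi == lo:
        # A constant input has no range to stretch; map it to all zeros
        scaled = np.zeros_like(raw)
    else:
        scaled = (raw - lo) / (hi - lo) * 255.0
    # Round, convert to 8-bit, and move channels last for image libraries
    return np.rint(scaled).astype(np.uint8).transpose((1, 2, 0))

# Random data standing in for the model's mask output
rng = np.random.default_rng(0)
fake_mask = rng.normal(size=(3, 8, 6))
image = to_uint8_image(fake_mask)
print(image.shape, image.dtype, image.min(), image.max())
```

The resulting array has the layout that Image.fromarray expects for an RGB image.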

We can also send the result as JSON by converting the mask image to a base64 string, which requires importing base64. The purpose of converting to a base64 string is to send multiple images in one response: we can add heat maps or mask outlines as outputs and send them alongside the mask image.

    encoded_mask = base64.b64encode(output.getvalue())
    response = jsonify({'mask':encoded_mask.decode('utf-8')})
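
To illustrate how base64 lets several images travel in one JSON body, here is a minimal round trip that uses only the standard library. The byte strings are placeholders rather than real PNG data, and the key names mask and heatmap, as well as the helper names, are assumptions for illustration.

```python
import base64
import json

def encode_images(images):
    """Map each name to a base64 text string so the dict can be serialized as JSON."""
    return {name: base64.b64encode(data).decode('utf-8') for name, data in images.items()}

def decode_images(payload):
    """Reverse encode_images on the client side."""
    return {name: base64.b64decode(text) for name, text in payload.items()}

# Placeholder bytes standing in for PNG-encoded images
images = {'mask': b'\x89PNG-mask-bytes', 'heatmap': b'\x89PNG-heatmap-bytes'}

body = json.dumps(encode_images(images))    # what the server would send
restored = decode_images(json.loads(body))  # what the client would recover
print(restored == images)
```

Base64 text is about one third larger than the raw bytes, which is the price of fitting binary data into JSON.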

The Flask app, port, and environment are set in the .flaskenv file.

Then we run the app:

if __name__ == '__main__':
  app.run(debug = True)

Swagger URL: http://localhost:5001/apidocs/

Source: Omdena Swagger Documentation for Inference Pipeline - Pathology Mobile App

Parameters specify the input. The incoming request should carry the image file in binary format as multipart form data. We can use the requests library to send the request, with open reading the data from the file.

import requests

# Pass the endpoint URL as the first argument
response = requests.post('', files = {'image': open('file_path/file_name.png', 'rb')})
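
On the client side, the mask arrives in the response body and the label and box arrive in the headers. A minimal sketch of unpacking them follows. The helper read_prediction is hypothetical, and it assumes the server writes the coordinates header as a JSON-style list, which the server code may or may not do. With requests, it would be called with response.headers and response.content.

```python
import json
import os
import tempfile

def read_prediction(headers, body, out_path):
    """Save the PNG body to out_path and return (label, box) from the headers."""
    with open(out_path, 'wb') as f:
        f.write(body)
    label = headers.get('Label')
    # Assumption: the header holds a JSON list such as "[10, 20, 110, 140]"
    box = json.loads(headers.get('Bounded_box-Coordinates', 'null'))
    return label, box

# Stand-ins for response.headers and response.content from requests
headers = {'Label': 'Benign', 'Bounded_box-Coordinates': '[10, 20, 110, 140]'}
body = b'placeholder-png-bytes'

out_path = os.path.join(tempfile.mkdtemp(), 'mask.png')
label, box = read_prediction(headers, body, out_path)
print(label, box)
```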

The input and output are shown below.

Source: Omdena Inference Pipeline Inputs and Outputs

Docker provides the ability to package and run an application in a loosely isolated environment called a container. The isolation and security allow you to run many containers simultaneously on a given host. Containers are lightweight and contain everything needed to run the application, so you do not need to rely on what is currently installed on the host.

Docker uses a client-server architecture. The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing your Docker containers. A Docker registry stores Docker images. Docker Hub is a public registry that anyone can use. An image is a read-only template with instructions for creating a Docker container. A container is a runnable instance of an image.

Source: Docker Architecture

We list all libraries and their versions in a requirements.txt file and use a Dockerfile to create the Docker image.

The requirements.txt


The Dockerfile

FROM python:3.8-slim-buster
WORKDIR /user/src/app

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD [ "flask", "run" ]

Build the image. A docker_id is only required if you want to push the image

export DOCKERID=docker_id
docker image build --tag $DOCKERID/project_name:project_version .

Push the image after building it

docker login
docker push $DOCKERID/project_name:project_version

Run the image. Docker pulls it first if it is not available locally.

docker run -p 5001:5001 --network host --rm docker_id/project_name:project_version

Pathology Mobile App

A YOLOv5s model was trained using PyTorch and later converted to TorchScript for deployment on Android. Android Studio was used for development, and the PyTorch Android API was used to load the model, run it, and get the predictions within the Android environment. The deployment workflow followed PyTorch’s mobile development guides.

PyTorch’s workflow for Android development and deployment (

Steps to deploy a trained model to a pathology mobile app (Android)

Step 1: As shown in the figure above, the first step is to convert the trained PyTorch model to TorchScript. Tracing turns the PyTorch code into a serializable and optimizable model as follows:

ts_module = torch.jit.trace(model, data)
ts_module.save("")  # pass the output file path here

Step 2: In Android Studio, add the PyTorch Android dependencies to Gradle. pytorch_android is the main dependency, and pytorch_android_torchvision provides utilities to convert Image and Bitmap objects to tensors.

dependencies {
   implementation 'org.pytorch:pytorch_android:1.7.0'
   implementation 'org.pytorch:pytorch_android_torchvision:1.7.0'
}

final Tensor inputTensor = TensorImageUtils.bitmapToFloat32Tensor(bmap, PrePostProcessor.NO_MEAN_RGB, PrePostProcessor.NO_STD_RGB);

Step 3: Load the model and run inference

mdl = PyTorchAndroid.loadModuleFromAsset(getAssets(), "");
IValue[] outputTuple = mdl.forward(IValue.from(inputTensor)).toTuple();


The benefit of deploying the model directly in the pathology mobile app, without any API, is that it works offline. The purpose was to empower health workers in remote areas with a tool that works without an internet connection. A mobile device is the best fit for this because it is easy to carry and transport.

Source: Omdena EndPoint and Pathology Mobile App


The article showed the steps for deploying the model with Flask, packaging it in a Docker container so that it can be deployed easily in the cloud, and building an offline pathology mobile app for use in places without a stable internet connection, such as parts of Africa.

This article is written by Gerald Okioma, Gowthami Wudaru, Shashi Gharti.
