Optimising Water Consumption and Predicting Plant Health using Multispectral Drone Data

February 28, 2022

Project partner: this use case comes from Brain Pool Tech, which hosted an Omdena Challenge as part of Omdena's AI Incubator for impact startups.

In December 2018, the United Nations General Assembly declared 2020 the International Year of Plant Health (IYPH). Amongst a plethora of social issues, the UN decided to highlight plant health as one of the most pressing concerns of our time. Ever wondered why?

Plants make up 80% of the food we eat and produce 98% of the oxygen we breathe. It’s hard for me to summarise the different areas influenced by plants in a single sentence. From feeding a growing world population to protecting biodiversity & ecosystems, the significance of plants is tremendous! So is the protection of plants. In an era of changing climate, rising temperatures, and hungry locusts, the preservation of plants has never been more important.

Step 1 is to determine the status of plant health. To make this step easier and faster, Brain Pool Tech together with Omdena came up with an interesting AI Innovation Challenge.

The Problem Statement

The project aims to build an automated plant health prediction model using drone-derived multispectral and thermal data. The Brainpool team chose a golf course as the test site for developing such a model because moisture and salinity vary considerably across the field. Furthermore, sensors are already part of the land infrastructure, which makes it easier to cross-reference outcomes.

Source: Golfbit

A golf course experiences a dynamic environment that exhibits numerous phenomena and effects due to seasons, weather patterns, provision of resources like water, nutrients (fertilisers), etc. Keeping the turf in optimal condition year-round is a challenging task.

Thus, our objective is to utilize multispectral drone imagery to accurately identify problematic regions on the field, optimizing water consumption, and thereby saving time that previously went into manual scanning of turf on the ground.


Soon after the kick-off call, most of us had a vague idea of what the end goal looked like and what we were supposed to do with the dataset in hand. The dataset comprised multispectral drone imagery captured from April to October, covering most of the year and spanning different seasons and weather conditions. The imagery was captured with Altum sensors from MicaSense and had seven bands: Blue, Green, Red, Red Edge, Near-infrared, Thermal and Transparency. Additionally, soil moisture survey data collected in the field with Field Scout probes were provided.

Before you read ahead and find yourself uncomfortable with technical jargon, let me quickly define a few terms we'll be using over & over.

  • NDVI: Normalised Difference Vegetation Index, calculated from the Near-Infrared (NIR) and Red bands as (NIR − Red) / (NIR + Red). Ranges from -1 to +1. Higher values indicate healthy vegetation. Lower values indicate bare soil, rocks, etc.
  • DEM: Digital Elevation Model, helps understand elevation.
  • Water stressed: Regions on the golf course that experience stress due to a shortage of water.
  • Waterlogged: Regions on the golf course that are saturated with water.
  • Fairway: The part of a golf course that has short grass and lies between a tee and a green. Our area of interest also goes by ‘Holes’.
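
To make the NDVI definition concrete, here is a minimal sketch of the calculation with numpy. The function name `ndvi` and the toy reflectance values are ours for illustration and are not taken from the project scripts.

```python
import numpy as np


def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), with NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Toy reflectance values (hypothetical, not from the project data).
nir_band = np.array([[0.50, 0.30], [0.10, 0.00]])
red_band = np.array([[0.10, 0.30], [0.30, 0.00]])
ndvi_grid = ndvi(nir_band, red_band)
```

Dense vegetation reflects strongly in NIR and absorbs Red, so the first pixel lands well above zero, while the pixel with more Red than NIR goes negative.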


It’s intriguing to see how our approach to this problem evolved over time. Initially, we viewed the project through the lens of a supervised learning problem. Because the outcomes were ambiguous and interpreting the data was subjective, we chose instead to automate the labelling of the images and use unsupervised learning algorithms to identify problematic regions on the ground. A call with Craig then changed the direction of the project completely.

Craig is the superintendent of the Cypress Lakes Golf & Country Club, with a wealth of knowledge in this area. He was crystal clear on the outcomes and what he expected from us at the end of the project. He desired the AI model to point him to regions that showed water deficits & water surpluses so that he could optimise irrigation & save time in the manual inspection. Thus, considering the complexity of labels, we decided to shift our focus to drought control using an unsupervised approach. 

The rest of the article comprises three major technical sections:

  • SMI
  • Threshold-based model
  • Clustering Model

SMI estimation

Soil Moisture Index (SMI) integrates thermal data with the Normalised Difference Vegetation Index (NDVI). In other words, it is a way to tie thermal data to vegetated areas. It was used in this project for its potential to identify waterlogged and water-stressed areas, as waterlogged areas should contain high soil moisture and water-stressed areas should have low soil moisture.

SMI was calculated using the Red, Near-Infrared, and Thermal bands of drone-collected data over the study area. The theory says that a Dry edge and Wet edge can be calculated from the scatterplot of Temperature vs NDVI based on the methodology below:

Figure 1: Dry edge and Wet edges in the Surface temperature vs NDVI plot (Source: Hydrology research)


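$$T_{max} = a_1 \cdot \mathrm{NDVI} + b_1 \quad \text{(warm/dry edge)}$$

$$T_{min} = a_2 \cdot \mathrm{NDVI} + b_2 \quad \text{(cold/wet edge)}$$

$$\mathrm{SMI} = \frac{T_{max} - T_s}{T_{max} - T_{min}}$$

where $T_s$ is the observed surface temperature (°C) at a given pixel; $T_{min}$ is the minimum surface temperature observed for a given NDVI, which defines the wet edge; $T_{max}$ is the maximum surface temperature observed for a given NDVI, which defines the dry edge; and the coefficients $a$ and $b$ of each edge come from a linear fit to the data.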
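
The relation between a pixel's temperature and the two edges can be sketched in a few lines. This block assumes the usual triangle-method form, SMI = (Tmax − Ts) / (Tmax − Tmin), clipped to [0, 1]. The function name, the clipping and the edge coefficients in the demo are our own illustrative choices, not values from the project.

```python
import numpy as np


def soil_moisture_index(ts, ndvi, a1, b1, a2, b2):
    """SMI per pixel from surface temperature, NDVI and the two edge lines."""
    ts = np.asarray(ts, dtype="float64")
    ndvi = np.asarray(ndvi, dtype="float64")
    t_max = a1 * ndvi + b1  # warm/dry edge at this NDVI
    t_min = a2 * ndvi + b2  # cold/wet edge at this NDVI
    span = t_max - t_min
    smi = np.full(ts.shape, np.nan)
    # Leave NaN wherever the edges cross or coincide.
    np.divide(t_max - ts, span, out=smi, where=span > 0)
    return np.clip(smi, 0.0, 1.0)


# Hypothetical edges: dry edge T = -20 * NDVI + 40, wet edge T = 10.
ndvi_px = np.array([0.5, 0.5, 0.5])
ts_px = np.array([30.0, 20.0, 10.0])  # at the dry edge, halfway, at the wet edge
smi_px = soil_moisture_index(ts_px, ndvi_px, a1=-20.0, b1=40.0, a2=0.0, b2=10.0)
```

A pixel sitting on the dry edge gets an SMI of 0, one on the wet edge gets 1, and everything else scales linearly in between.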


The scripts were written utilizing:

  • Geospatial and scientific python libraries: rasterio, geopandas, pandas, scikit-learn, scipy and numpy. 
  • zonal_stats from rasterstats
  • QGIS

The SMI pipeline consists of two components:

  • The SMI estimation pipeline (see Figure 2)
  • The SMI validation pipeline (see Figure 3)

SMI estimation pipeline 

Figure 2: SMI estimation pipeline (Source: Omdena)

  • NDVI, Thermal bands: the NDVI raster grid and the Thermal grid (in Celsius) are the inputs.
  • Crop to features of interest: crops the inputs to water, greens, tees and fairways.
  • Downsample: raster grids are downsampled to smooth out noise in the data.
  • Scatterplot to calculate the dry and wet edges: create a scatterplot of Temperature vs NDVI. Based on this scatterplot, the data is binned, sorted and the 95th percentile is calculated for each bin of temperature. Using linear regression, the 95th percentile points across the binned Temperature ranges are used to create the dry edge. The 5th percentile points are used to create the wet edge. 
  • SMI: the output SMI raster grid is exported in .tif format.
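
The edge-fitting step can be sketched as follows. This is a minimal version under our own assumptions: it bins along the NDVI axis and takes a temperature percentile within each bin, which is one common reading of the step (the project scripts may have binned differently), and the bin count, minimum bin size and synthetic data are placeholders.

```python
import numpy as np


def fit_edge(ndvi, temp, percentile, n_bins=20, min_points=5):
    """Fit T = a * NDVI + b through a per-bin temperature percentile."""
    ndvi = np.asarray(ndvi, dtype="float64").ravel()
    temp = np.asarray(temp, dtype="float64").ravel()
    valid = np.isfinite(ndvi) & np.isfinite(temp)
    ndvi, temp = ndvi[valid], temp[valid]

    edges = np.linspace(ndvi.min(), ndvi.max(), n_bins + 1)
    # Bin index per pixel; the maximum NDVI is folded into the last bin.
    idx = np.clip(np.digitize(ndvi, edges) - 1, 0, n_bins - 1)

    centres, levels = [], []
    for k in range(n_bins):
        in_bin = temp[idx == k]
        if in_bin.size >= min_points:  # skip sparsely populated bins
            centres.append(0.5 * (edges[k] + edges[k + 1]))
            levels.append(np.percentile(in_bin, percentile))

    slope, intercept = np.polyfit(centres, levels, deg=1)
    return slope, intercept


# Synthetic triangle-shaped scatter (hypothetical, not project data).
rng = np.random.default_rng(0)
ndvi_s = rng.uniform(0.2, 0.9, 5000)
dry_true = 40.0 - 20.0 * ndvi_s
temp_s = 10.0 + (dry_true - 10.0) * rng.uniform(0.0, 1.0, 5000)

a1, b1 = fit_edge(ndvi_s, temp_s, percentile=95)  # dry edge
a2, b2 = fit_edge(ndvi_s, temp_s, percentile=5)   # wet edge
```

Using the 95th and 5th percentiles rather than the raw maximum and minimum keeps a handful of outlier pixels from dragging the edges around.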

SMI validation pipeline

The SMI validation pipeline generates a scatterplot of predicted SMI vs Actual soil moisture with correlation metrics.

Figure 3: SMI validation pipeline (Source: Omdena)

  • Inputs: SMI and Field data. Both were collected near the same time as the drone image acquisition. 
  • Average SMI: SMI cell values around each field data point are averaged to smooth the variability of an exact location. Because the SMI grid was projected in WGS84 (latitude/longitude in degrees), the smoothing radius has to be converted from metres to decimal degrees. In this study area, a 0.5 m radius was 0.000005 deg.
  • SMI validation scatterplot: generates a scatterplot of averaged SMI values vs the field data points with a 1:1 line.  The points are coloured by type (fairways, greens, tees) and the R and R-squared values are displayed.
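
The validation steps above can be sketched with plain numpy. The metres-to-degrees factor is the rough length of one degree of latitude and ignores how longitude degrees shrink away from the equator. The helper names, the toy grid and the sample values are hypothetical, and the project itself lists zonal_stats from rasterstats for this kind of averaging.

```python
import numpy as np

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def metres_to_degrees(metres):
    """Rough conversion; adequate for sub-metre smoothing radii."""
    return metres / METRES_PER_DEGREE


def mean_within_radius(grid, xs, ys, px, py, radius):
    """Mean of grid cells whose centres lie within `radius` of (px, py)."""
    xx, yy = np.meshgrid(xs, ys)  # shapes match grid: (len(ys), len(xs))
    mask = (xx - px) ** 2 + (yy - py) ** 2 <= radius ** 2
    return float(np.nanmean(grid[mask])) if mask.any() else float("nan")


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two 1-D samples."""
    return float(np.corrcoef(predicted, observed)[0, 1])


radius_deg = metres_to_degrees(0.5)  # on the order of 0.000005 degrees

# Toy 3 x 3 SMI grid with unit spacing (hypothetical values).
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 1.0, 2.0])
smi_grid = np.array([[0.1, 0.2, 0.3],
                     [0.4, 0.5, 0.6],
                     [0.7, 0.8, 0.9]])
avg_smi = mean_within_radius(smi_grid, xs, ys, px=1.0, py=1.0, radius=1.0)

# Hypothetical averaged SMI vs probe soil moisture for four points.
predicted = np.array([0.30, 0.45, 0.50, 0.70])
observed = np.array([0.25, 0.30, 0.40, 0.38])
r = pearson_r(predicted, observed)
r_squared = r ** 2
```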


The results of both pipelines, SMI estimation and validation, were tested with drone imagery and field data collected over Field 1 (Holes 2 – 8).

Figure 4: Temperature vs NDVI scatterplot of 2021.06.16 data, Field 1 (Source: Omdena)

In the Temperature vs NDVI scatterplot, the dry edge (also called the warm edge) and the wet edge (also called the cold edge) enclose the range of data points. On 2021.06.16, the water temperature was cool, and some temperatures on the greens approached 0 °C. The dry edge was extended by a value of 5 to include the high-NDVI greens.

Figure 5: SMI prediction of 2021.06.16 data, Field 1 (Source: Omdena)

In the map for 2021.06.16, the cropped area correctly shows the water bodies in blue, indicating SMI values of 1. The fairways show SMI estimates in the 0.5-0.7 range, while the greens are erroneously highly variable, from 0.3 to 0.9.

The SMI validation results show that the greens have the widest spread in predicted values, although a tighter range would be expected.

Figure 6: Predicted vs observed soil moisture scatterplot of 2021.06.16 data, Field 1 (Source: Omdena)

Figure 7: Soil moisture field data points and SMI prediction of 2021.06.16 data, Field 1 (Source: Omdena)

The results of this analysis suggest a poor correlation between the SMI prediction and the soil moisture observed with the Field Scout probe. This may be attributable to numerous factors. One might be the calibration of the Thermal band's surface temperature against the Field Scout ground temperature. Take the greens on Hole #2 as an example. NDVI is not measured in the field, but the drone-based NDVI is in the expected range of 0.6 to 1. The probe, however, measured temperatures of 5 and 6 °C, while the Thermal band ranged from 1.5 °C in a tree shadow to 21 °C. The probe-measured soil moisture ranged from 25% (0.25) to 40% (0.4), whereas the resulting SMI prediction ranged from 0.1 to 0.8. Could the poor prediction be a result of this thermal/temperature mismatch?

Figure 8: Temperature and soil moisture field data points and NDVI, Thermal and SMI prediction of the greens on Hole #2, 2021.06.16 (Source: Omdena)

Threshold-based model


The script used geospatial and scientific Python libraries: rasterio, geopandas, pandas, scikit-learn, scipy and numpy.

Figure 9. Thresholding model pipeline (Source: Omdena)

The model comprises the following steps (see Figure 9):

• Downsample: images are reduced in resolution to allow a shorter processing time.

• Rescaling: images are rescaled to integer ranges so that the polygonization step can categorize pixel values individually at an interval of 1.

  • NDVI images: from [-1 to 1] to [0 to 30]
  • Thermal images: from [0 to 382.35]  to [0 to 800]

• Masking: by applying a mask, the required ROIs are extracted from the images.

• Polygonization: the masked images are converted to vector format in order to generate geometries.

• Thresholding: the threshold values are estimated based on the ROIs, following the criteria:

  • Unhealthy: all values below the third quantile from NDVI data points are estimated as unhealthy.


  • Water-stressed: all values above the third quantile from thermal images are estimated as water-stressed.


  • Waterlogged: both NDVI and Thermal images are thresholded to identify unhealthy and thermally waterlog-prone regions, respectively. Both sets of geometries are then combined with a Depth geometry, and the regions they have in common are identified as waterlogged.
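
The rescaling and thresholding steps can be sketched as below. The quantile level `q` is left as a parameter because the article does not give a numeric level. The 0.25 and 0.75 used in the demo are placeholders, the sample values are made up, and the waterlogged combination with the Depth layer is not covered.

```python
import numpy as np


def rescale(values, src_min, src_max, dst_min, dst_max):
    """Linearly map values from [src_min, src_max] to integer steps in [dst_min, dst_max]."""
    values = np.asarray(values, dtype="float64")
    scaled = (values - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min
    return np.rint(scaled).astype("int32")


def below_quantile(values, q):
    """Boolean mask of values strictly below the q-th quantile."""
    return values < np.quantile(values, q)


def above_quantile(values, q):
    """Boolean mask of values strictly above the q-th quantile."""
    return values > np.quantile(values, q)


# Ranges taken from the article: NDVI [-1, 1] -> [0, 30], Thermal [0, 382.35] -> [0, 800].
ndvi_scaled = rescale([-1.0, 0.0, 1.0], -1.0, 1.0, 0, 30)
thermal_scaled = rescale([0.0, 382.35], 0.0, 382.35, 0, 800)

# Hypothetical rescaled pixel values inside one region of interest.
ndvi_roi = np.array([5, 10, 15, 20, 25, 30])
thermal_roi = np.array([100, 200, 300, 400, 500, 600])
unhealthy = below_quantile(ndvi_roi, q=0.25)          # placeholder level
water_stressed = above_quantile(thermal_roi, q=0.75)  # placeholder level
```

Because the thresholds are quantiles of each region's own values, they adapt to the day and the hole rather than relying on one fixed cut-off.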



The results obtained with this model were checked against information provided by the golf course superintendent.

The identification of  ROIs is obtained by applying this model on the pixel level of the drone images.

Figure 10 shows the identified waterlogged areas (on the right) on Fairway 2 that correspond to the noticeable brown regions displayed on the RGB image on the left. 

Figure 10. Waterlogged regions on Fairway 2 from 16-06-2021. Left: RGB image, Right: Identified waterlogged regions (Source: Omdena)

Figure 11 shows the identified water-stressed areas (on the right), which correspond to the dark red regions on the thermal image (on the left).

Figure 11. Water-stressed regions on Fairway 7 from 23-04-2021. Left: Thermal image, Right: Identified water-stressed regions (Source: Omdena)

Figure 12 shows the average threshold values for the thermal layer in the winter and autumn seasons. This step served to verify the trends in the threshold selection. The “maximum temperature” and “minimum temperature” curves show the average maximum and minimum temperature in the image for each Hole during the season, and the “average season temp” curve represents the average day temperature recorded. The “average water stress” and “waterlogged” thresholds always remain between the minimum and maximum temperatures.

Figure 12. Average thermal threshold values in the winter (above) and autumn (below) season (Source: Omdena)

For more details on the applied theoretical concepts, check out Data-Centric AI for a Sustainable Water Irrigation System.

Clustering Model

The script used geospatial and scientific Python libraries: geopandas, gdal, scikit-learn and numpy.

Several modelling experiments were carried out with unsupervised methods such as DBSCAN & K-Means clustering. We decided to go ahead with K-Means as DBSCAN yielded unsatisfactory results.

Source: Omdena

The model runs one fairway at a time. We utilize NDVI, Thermal layers, and Slope in the final model. The slope is calculated from the DEM layer and is a helpful attribute that gives an account of flatter & steeper regions on the course. Areas at the bottom of an inclined slope are more prone to waterlogging.
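
Deriving slope from a DEM can be sketched with finite differences. This is a minimal numpy version under our own assumptions (square cells, elevation and cell size in the same units). The article does not say which tool computed the slope, so treat the function below as illustrative.

```python
import numpy as np


def slope_degrees(dem, cell_size):
    """Slope angle per cell from the elevation gradient, in degrees."""
    dem = np.asarray(dem, dtype="float64")
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# Toy DEM: a plane rising 1 m per 1 m cell in the x direction (hypothetical).
dem = np.tile(np.arange(4.0), (3, 1))
slope = slope_degrees(dem, cell_size=1.0)
```

A surface that climbs one metre for every metre travelled has a 45 degree slope everywhere, which is what the toy plane produces.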

As can be seen from the pipeline the model includes the following steps:

Masking the raster using fairway geometries to extract the Regions of Interest.

Filtering: Data points are filtered with value ranges on NDVI and Thermal features to remove outliers and focus on the more varying mid-range distribution of NDVI.

Scaling is performed using the scikit-learn library to make it convenient to form clusters.

Clustering: Points are clustered using the K-Means algorithm with k=4. The elbow method gave k=3 as the optimal number of clusters for some fairways, and the optimal number varied between fairways with the variation in band values. However, our requirement lay in identifying 4 problematic regions on the field, so we kept k=4.

Our visual evaluation favored more graduated outputs and eliminated clusters that were too small or too large.
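
The filter, scale and cluster steps for a single fairway can be sketched with scikit-learn. `StandardScaler` is our assumption, since the article only says scaling was done with scikit-learn. The filter ranges, the random seed and the synthetic points are placeholders as well.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def filter_points(features, ndvi_range, thermal_range):
    """Keep rows whose NDVI (column 0) and Thermal (column 1) fall inside the ranges."""
    ndvi, thermal = features[:, 0], features[:, 1]
    keep = ((ndvi >= ndvi_range[0]) & (ndvi <= ndvi_range[1]) &
            (thermal >= thermal_range[0]) & (thermal <= thermal_range[1]))
    return features[keep]


def cluster_fairway(features, k=4, seed=0):
    """Scale the features, run K-Means and return labels plus scaled centroids."""
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled)
    return model.labels_, model.cluster_centers_


# Synthetic pixels with columns (NDVI, Thermal, Slope): four groups plus one outlier.
rng = np.random.default_rng(0)
centres = np.array([[0.40, 30.0, 2.0], [0.55, 25.0, 4.0],
                    [0.70, 20.0, 6.0], [0.85, 15.0, 8.0]])
spread = np.array([0.02, 0.5, 0.3])
groups = [rng.normal(loc=c, scale=spread, size=(50, 3)) for c in centres]
pixels = np.vstack(groups + [np.array([[-0.5, 20.0, 1.0]])])

kept = filter_points(pixels, ndvi_range=(0.2, 1.0), thermal_range=(0.0, 50.0))
labels, centroids = cluster_fairway(kept, k=4)
```

Scaling matters here because Thermal and Slope span far larger numeric ranges than NDVI and would otherwise dominate the distance computation.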

Post-processing involves two further steps:

  • Sieve is used to remove noisy, insignificant clusters that provide no value. The algorithm removes clustered regions that contain fewer pixels than a given threshold. For example, the following image compares the original clustered image (left) with the one filtered at a threshold of 500 (right).


  • Polygonize: The sieved clusters are converted to vector format for more convenient interfacing with user-facing modules.
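
The sieve idea can be sketched with scipy's connected-component labelling. This is a simplified stand-in and not necessarily what the project ran: small regions are set to a nodata value here, whereas GDAL's sieve filter merges them into the largest neighbouring region. The function name, the nodata value and the toy raster are ours.

```python
import numpy as np
from scipy import ndimage


def sieve(labels, min_pixels, nodata=-1):
    """Mask out connected regions of each class that have fewer than min_pixels cells."""
    labels = np.asarray(labels)
    out = labels.copy()
    for cls in np.unique(labels):
        if cls == nodata:
            continue
        # 4-connected components of the current class; component 0 is background.
        regions, _ = ndimage.label(labels == cls)
        sizes = np.bincount(regions.ravel())
        small = np.flatnonzero(sizes < min_pixels)
        small = small[small != 0]  # never treat the background as a region
        out[np.isin(regions, small)] = nodata
    return out


# Toy clustered raster: class 1 appears only as two isolated pixels.
clustered = np.array([[0, 0, 0, 1],
                      [0, 0, 0, 0],
                      [2, 2, 0, 0],
                      [2, 2, 0, 1]])
sieved = sieve(clustered, min_pixels=3)
```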


The clustering results from the model are inspected against water-logged regions previously identified in a meeting with the superintendent. The following image showcases the clustering results from the 17th of June. 

Source: Omdena

These results display balanced cluster sizes and spatial distribution. The output becomes a functional tool for the superintendent that quickly highlights the areas needing closer inspection. The spatial distribution of the classes also provides insights into emerging patterns.

Classes 0 to 3 in the clustered output are sorted based on NDVI. Because NDVI tracks vegetation health, the class order generally follows the same progression in overall plant health.
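
K-Means assigns cluster numbers arbitrarily, so sorting classes by NDVI needs a relabelling step. The sketch below assumes ascending order and that NDVI is the first centroid column. Both are our assumptions, as are the centroid values in the demo.

```python
import numpy as np


def relabel_by_ndvi(labels, centroids, ndvi_column=0):
    """Renumber clusters so that class 0 has the lowest NDVI centroid."""
    labels = np.asarray(labels)
    centroids = np.asarray(centroids)
    order = np.argsort(centroids[:, ndvi_column])  # old labels, lowest NDVI first
    mapping = np.empty_like(order)
    mapping[order] = np.arange(order.size)         # old label -> new class
    return mapping[labels], centroids[order]


# Hypothetical centroids with columns (NDVI, Thermal), in K-Means' arbitrary order.
raw_centroids = np.array([[0.80, 15.0], [0.30, 30.0], [0.60, 20.0], [0.45, 25.0]])
raw_labels = np.array([0, 1, 2, 3, 0])
sorted_labels, sorted_centroids = relabel_by_ndvi(raw_labels, raw_centroids)
```

After relabelling, the same class number means the same relative NDVI rank on every hole, which makes maps from different fairways comparable.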

We also obtain the cluster centroids for every hole. We can derive further insights into the clustering model's operation by inspecting the centroids for a clustered field. The following figure shows centroid values for holes 1 through 8 on 2021.06.16. The NDVI layer has the highest clustering weight, so its centroid values are more spread out than those of the other two channels.



Limitations & Future Scope

As discussed earlier, the nature of the project’s goal and the available data made room only for qualitative evaluation. Thus, accomplishing the following work soon can be advantageous:

  • Formalizing the objective more with specific ground truth data
  • More data collection (At least covering a full year)
  • Using the SMI grid as an input to Threshold & Clustering models
  • Capturing more aspects of the system
    • Record water sprinkler configurations/amount of watering
    • On-ground soil moisture sensors
    • Capture information with higher frequency


In an attempt to automate plant health prediction & optimize water consumption in the field, we have performed extensive research into different approaches. Supervised Learning methods proved ineffective, so unsupervised algorithms were utilized.

The final outputs of the project are three models:

  • SMI estimation using the triangle method
  • Threshold-based model for identifying water stress and waterlogging
  • K-Means Clustering model for general health clustering on each hole of the course

Besides learning the tools & technologies used in this project, we learned how to collaborate effectively in a team, brainstorm insightful questions and join forces across different time zones. Undeniably, it is an experience we will never forget and will forever cherish!

This article is written by Amir Memon.
