
How We Leveraged Advanced Data Science and AI to Make Farms Greener

April 20, 2024


Kelly, Agreed Earth’s CEO, speaking with a UK regenerative farmer

The Problem

Farming, especially large-scale farming, is not as green as you’d think. The activities that make up modern farming, such as crop production, livestock farming, and land-use change, contribute significantly to greenhouse gas (GHG) emissions and other environmental impacts.

The global imperative to address climate change and promote sustainable development has placed financial institutions, particularly banks, at the forefront of supporting environmentally responsible practices through their investments. Sustainable finance, the practice of using the resources available to banks and other financial institutions in a way that encourages and rewards sustainability, is becoming increasingly popular.

In sustainable finance, accurately estimating and reporting emissions from financed agricultural activities has emerged as a critical challenge. Understanding the emissions associated with agricultural financing is vital for banks to assess their environmental footprint, inform investment decisions, and drive positive change in the agricultural sector. However, this task is fraught with complexities and obstacles that call for innovative approaches.

The Background

AI in Agriculture

Agricultural activities contribute approximately 30 percent of the world’s overall greenhouse gas emissions, mainly through the use of chemical fertilizers, pesticides, and animal waste. This share is set to rise further as a growing global population demands more food, demand for dairy and meat products strengthens, and agricultural practices intensify. Agricultural expansion also often involves converting non-agricultural land, such as forests, into farmland. Nitrous oxide and methane account for over half of total greenhouse gas emissions from agriculture.

The Goal

In this article, we dive into the “Agreed Earth” Omdena AI Innovation Challenge, in which we developed a machine learning model for estimating greenhouse gas emissions from farming. The model draws on both synthetic data generated by simulating biochemical processes and ground-truth data obtained from actual emissions measurements. More available and more accurate estimates of greenhouse gas emissions can help farmers adopt sustainable practices and support banks in their commitment to sustainable finance.

Challenges in Estimating and Reporting Agricultural Emissions

The agricultural sector is unique in its processes and workings. As a result, data collection, emissions quantification, and reporting are all complex. There is a critical need to promote regenerative agriculture and mitigate the adverse effects of synthetic fertilizers on the nitrogen cycle, soil health, water quality, and the environment. We identified the following key challenges based on insights from publications by the Principles for Responsible Investment [1], Ceres [2], and the Task Force on Climate-related Financial Disclosures (TCFD) [3,4]:

1. Data Availability and Quality

Banks face the obstacle of ensuring comprehensive and accurate emissions data from various actors in the agricultural supply chain. To enable reliable emissions estimation, it is essential that the data collection processes are robust, standardized, and transparent.

2. Scope and Boundaries

Defining the scope and boundaries for emissions estimation within agricultural supply chains is critical. Collaborating with stakeholders is necessary to establish consistent methodologies and determine the responsibilities of different actors in the supply chain.

3. Measurement and Verification

Accurately quantifying and verifying emissions across diverse agricultural supply chains requires appropriate measurement methodologies and consistent application. Verification of emissions data enhances credibility and reliability.

4. Supply Chain Complexity and Transparency

Agricultural supply chains are complex, involving numerous actors and intermediaries. Banks must navigate this complexity and foster transparency, collaboration, and data sharing among stakeholders to trace emissions effectively.

5. Integration of Emerging Technologies

Adopting and integrating emerging technologies, such as remote sensing, satellite imagery, Internet of Things (IoT) sensors, and specialized tools for assessing GHG emissions, can improve estimation accuracy. Overcoming technical barriers, data compatibility issues, and cost constraints is necessary to leverage the potential of these technologies effectively.

6. Alignment with Reporting Standards

Aligning emissions estimation and reporting with recognized reporting standards, such as the TCFD guidelines, enhances transparency and comparability. Climate-related financial disclosures provide consistent and standardized information to stakeholders for informed decision-making and risk assessment.

Our Approach

This Omdena Challenge project aimed to address some of the challenges listed above by developing a machine learning-based system for estimating nitrous oxide (N2O) emissions from input data such as soil properties, weather conditions, and crop information. The initial part of the project focused on exploring and analyzing available data sets, particularly satellite data. We identified the APIs available for accessing satellite data and wrote short summaries of their functionality. We also collected the characteristics of the available satellites, including their spatial and temporal resolution and available bands.

| Name | Satellite Data Available |
| --- | --- |
| Google Earth Engine | Landsat |
| STAC | STAC Catalogs |
| Satellite Imaging Corporation | NDVI |
| Planet Explorer | Imagery from Planet’s catalog (PlanetScope, SkySat, and RapidEye) as well as public imagery from Sentinel-2 and Landsat 8 |
| SentinelSat Python API | Sentinel satellite images |

Available satellite data APIs

Characteristics of available satellites

Knowledge-Guided Machine Learning (KGML)

The project employed the Knowledge-Guided Machine Learning (KGML) framework to improve N2O emission prediction, which is crucial for advancing sustainable farming practices. KGML combines scientific principles with data-driven methods by training on synthetic data produced by process-based scientific models, with the aim of more accurate forecasts. It addresses limitations of existing process-based systems such as Ecosys and DNDC, which are known for their complexity and outdated code bases, and in doing so supports more effective agricultural practices.

KGML trains the model in two stages: it first learns from synthetic data generated through process-based simulations and is then fine-tuned on ground-truth emissions data. Despite initial challenges in selecting a model architecture and finding suitable data, the project settled on an approach that prioritizes mapping the relevant variables directly to N2O emissions. Careful dataset preparation and architectural modifications reduced the model’s data dependencies and improved its predictive accuracy, giving a workable methodology for N2O flux prediction in agriculture.

Figure 1: Two KGML architectures. The left architecture stacks layers of GRU units and directly maps fertilizer rate, soil and crop properties, weather conditions, and intermediate variables (IMVs) to the N2O flux. The right architecture contains two independent modules of GRU layers, one for predicting IMVs and the other for predicting the N2O flux (reproduced and adapted from Ref. [6]).
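To make the left-hand architecture in Figure 1 more concrete, here is a minimal, forward-pass-only sketch of a single GRU layer with a linear read-out, written in plain Python. It is an illustration under assumptions, not the project's implementation: the class name `TinyGRU`, the four input features, the hidden size, and the random (untrained) weights are all hypothetical, and a real model would stack several layers and be trained in a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """One GRU layer plus a linear read-out; forward pass only (hypothetical sketch)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Per gate: input weights W, recurrent weights U, bias b.
        # "z" = update gate, "r" = reset gate, "h" = candidate state.
        self.W = {g: rand_matrix(n_hidden, n_inputs, rng) for g in "zrh"}
        self.U = {g: rand_matrix(n_hidden, n_hidden, rng) for g in "zrh"}
        self.b = {g: [0.0] * n_hidden for g in "zrh"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]

    def _gate(self, name, x, hidden, activation):
        from_input = matvec(self.W[name], x)
        from_hidden = matvec(self.U[name], hidden)
        return [
            activation(a + c + bias)
            for a, c, bias in zip(from_input, from_hidden, self.b[name])
        ]

    def step(self, x, h):
        """Advance the hidden state by one time step (one day of inputs)."""
        z = self._gate("z", x, h, sigmoid)
        r = self._gate("r", x, h, sigmoid)
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = self._gate("h", x, reset_h, math.tanh)
        # Blend old state and candidate; z close to 1 favours the new candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]

    def predict(self, sequence):
        """Return one N2O flux estimate per time step of the input sequence."""
        h = [0.0] * self.n_hidden
        fluxes = []
        for x in sequence:
            h = self.step(x, h)
            fluxes.append(sum(w * hi for w, hi in zip(self.w_out, h)))
        return fluxes


# Made-up, pre-scaled daily inputs:
# [fertilizer rate, soil moisture, air temperature, rainfall]
week = [
    [0.0, 0.30, 0.45, 0.1],
    [1.0, 0.32, 0.50, 0.0],
    [0.0, 0.35, 0.55, 0.6],
    [0.0, 0.40, 0.50, 0.2],
]
model = TinyGRU(n_inputs=4, n_hidden=3)
daily_flux = model.predict(week)  # untrained, so the values carry no meaning yet
```

Because the hidden state is carried from one day to the next, the prediction for a given day can depend on earlier events such as a fertilizer application, which is the main reason to use a recurrent layer for this kind of time series.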

Data Collection

Two types of data were required for the development of the KGML model for predicting N2O emissions: 

  1. Input data for DNDC to generate synthetic data for pre-training, and 
  2. Ground-truth data for fine-tuning.

DNDC Input Data for Pre-Training

To generate synthetic data, we used the graphical interface (GUI) of the DNDC software to supply location-specific input data on climate, soil characteristics, vegetation, and management practices. We created a tabular dataset including daily and annual climate data for the selected years, various soil properties for the specific locations, and crop and management practices for each year. Vegetation was assumed to be crops, and management practices (tillage and fertilizer application) were set specifically for the tests and experiments considered. Multiple DNDC runs were performed for a range of configurations to capture various scenarios.

Running DNDC simulations with these input data generated output files containing daily values for each variable for the selected site, including soil temperature, moisture, oxygen content, microbial activity, pools and fluxes of elements (carbon, nitrogen, phosphorus), soil water, field management, crop information, and grazing. These outputs were essential for further analysis, ensuring a robust dataset for training the KGML model to predict N2O emissions.

Ground-Truth Data for Fine-Tuning

We obtained the ground-truth N2O flux measurements for real UK sites from GHG Nitrous Oxide Datasets in the Agricultural and Environmental Data Archive (AEDA). The raw files from these datasets included most of the required input variables, such as geographical coordinates and daily values for soil moisture, soil mineral nitrogen, rainfall, and air temperature. However, they lacked some crucial variables, which we obtained from other sources. For example, we obtained wind and humidity data from NASA’s data access viewer for the specific sites of the experiments where the N2O flux was measured.

To prepare the dataset for fine-tuning, the raw data was split into time series samples based on location, block number, and treatment. The Harmonized World Soil Database (HWSD) provided sand and silt content. We then selected, renamed, unit-converted, and calculated the variables needed for the model. The datasets were arranged as a full-year time series for each input variable, with missing time steps generated and filled with known constants and values from the weather data sources. Missing NH4 flux values were filled using interpolation, and missing N2O flux values were imputed with values predicted by DNDC.
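The preparation steps described above can be sketched in plain Python as follows. This is a hedged illustration rather than the project's pipeline: the field names (`site`, `block`, `treatment`, `day`, `value`), the helper names, and all numbers are hypothetical, the interpolation is assumed to be linear, and gaps before the first or after the last observation are assumed to copy the nearest observed value.

```python
from collections import defaultdict
from datetime import date, timedelta


def split_into_series(rows):
    """Group raw rows into one {day: value} series per (site, block, treatment)."""
    series = defaultdict(dict)
    for row in rows:
        key = (row["site"], row["block"], row["treatment"])
        series[key][row["day"]] = row["value"]
    return dict(series)


def full_year_days(year):
    """Every calendar day of the given year, in order."""
    day = date(year, 1, 1)
    days = []
    while day.year == year:
        days.append(day)
        day += timedelta(days=1)
    return days


def to_full_year(observations, year):
    """Lay a {day: value} series over the whole year; missing days become None."""
    return [observations.get(day) for day in full_year_days(year)]


def interpolate_gaps(values):
    """Fill None entries by linear interpolation between observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")
    filled = list(values)
    # Assumption: edge gaps copy the nearest observation.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps lie on the straight line between their two neighbours.
    for left, right in zip(known, known[1:]):
        slope = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + slope * (i - left)
    return filled


def fill_from_fallback(values, fallback):
    """Replace None entries with the value from a second series (e.g. a simulation)."""
    return [f if v is None else v for v, f in zip(values, fallback)]


# Made-up example rows for a single plot.
rows = [
    {"site": "A", "block": 1, "treatment": "control", "day": date(2012, 1, 1), "value": 1.0},
    {"site": "A", "block": 1, "treatment": "control", "day": date(2012, 1, 5), "value": 3.0},
]
series = split_into_series(rows)
nh4 = interpolate_gaps(to_full_year(series[("A", 1, "control")], 2012))

# Sparse N2O measurements, with gaps filled from a stand-in for simulated output.
n2o_measured = {date(2012, 1, 2): 0.4}
n2o_simulated = [0.1] * len(full_year_days(2012))
n2o = fill_from_fallback(to_full_year(n2o_measured, 2012), n2o_simulated)
```

Keeping interpolation and fallback filling as separate helpers mirrors the text: one variable is filled from its own neighbouring measurements, while the other borrows values from the process-based model.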

Training and Results

With the design choices made and the data cleaned, we had everything in place for the two steps required to train our KGML model. The synthetic data obtained from DNDC was used for the first training step (pre-training), and the ground-truth data for the UK sites was used to fine-tune the model.
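The two-step idea can be shown with a deliberately tiny example. The sketch below swaps the GRU network for a one-variable linear model so that it fits in a few lines. The data, learning rates, and epoch counts are made up, and the function names are hypothetical. Only the pattern matters: pre-train on plentiful synthetic samples, then continue training the same parameters on a handful of measurements.

```python
def train(params, data, lr, epochs):
    """Full-batch gradient descent on mean squared error for y ~ w * x + b."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(params, data):
    """Mean squared error of the linear model on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Step 1 data: many samples from a stand-in "simulator" (y = 2.0 * x + 0.5).
synthetic = [(k / 10, 2.0 * (k / 10) + 0.5) for k in range(50)]

# Step 2 data: a few stand-in "measurements" that deviate from the simulator.
measured = [(0.5, 1.9), (1.5, 4.4), (2.5, 6.9), (3.5, 9.4)]

# Pre-train from scratch on synthetic data.
pretrained = train((0.0, 0.0), synthetic, lr=0.05, epochs=1000)

# Fine-tune: start from the pre-trained parameters, use a smaller learning rate.
finetuned = train(pretrained, measured, lr=0.01, epochs=200)

error_before = mse(pretrained, measured)
error_after = mse(finetuned, measured)
```

In this toy setting the pre-trained parameters already sit close to the measured relationship, so a short fine-tuning run with few samples is enough to reduce the error on the measurements. That is the behaviour the two-step schedule is meant to exploit when real measurements are scarce.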

The results from our trained model indicate that the machine learning approach has the potential to overcome the limitations of relying solely on process-based models. By combining the ground-truth data with the synthetic data coming from DNDC, we can mitigate the challenges of data scarcity in certain locations. These findings are consistent with recent publications from the scientific community and aligned with the guidelines of the Intergovernmental Panel on Climate Change (IPCC).

By utilizing the KGML model, banks can improve their estimation and reporting of emissions from financed agricultural activities. This solution leverages synthetic data to train the machine learning model, allowing it to learn from existing scientific knowledge in addition to real-world observations.

Results from pre-training (top) and fine-tuning (bottom) after 1000 epochs.



Integrating AI technologies, such as the KGML model delivered as a B2B SaaS platform, can empower banks to facilitate sustainable farming practices and support the transition to a low-carbon economy.

The solution offers remote sensing insights on farm-level N2O emissions. Through the utilization of satellite imagery, drones, and other remote sensing tools, the platform collects comprehensive data on agricultural activities, enabling banks to gain valuable insights into emissions hotspots and identify opportunities for emission reduction. The AI-powered analysis and visualization capabilities of the solution empower banks to navigate the complexities of sustainable farming by providing them with actionable information to support decision-making and risk assessment.

The KGML model and the B2B SaaS platform would work in tandem to enhance the accuracy of emissions estimation, improve risk assessment, and promote environmentally conscious lending practices. The use of AI in enabling sustainable farming can also extend beyond emissions estimation and risk assessment. AI technologies can be leveraged to optimize resource management, improve crop yield predictions, and support precision agriculture practices. By analyzing vast amounts of data and generating actionable insights, AI empowers farmers to make data-driven decisions, maximize resource efficiency, and minimize environmental impact.


Real World Applications

Risk Assessment

Understanding the emissions profile of agricultural supply chains can help banks identify potential risks from regulatory changes, market shifts, and climate change impacts. By implementing proactive risk mitigation strategies and supporting the transition to low-carbon agricultural systems, banks can contribute to a more resilient and sustainable agricultural sector.

Market Incentives

Transparent information on emissions intensity incentivizes practices that reduce carbon footprints and promote regenerative agriculture. Banks can play a crucial role in driving the shift towards a sustainable and low-carbon agricultural sector by encouraging farmers and agribusinesses to adopt sustainable practices, invest in renewable energy, and implement climate-smart technologies.

Standardization and Transparency

Addressing the challenges of estimating and reporting agricultural emissions requires collaboration, knowledge sharing, and the integration of emerging technologies. The KGML model can be used to promote transparency in emissions estimates, enabling banks to work together with financial institutions, agricultural stakeholders, and scientific communities to develop standardized methodologies and share best practices. 

Influencing Policy and Regulations

Banks can leverage their insights and data to support the development of policies that incentivize regenerative agricultural practices, promote carbon pricing mechanisms, and facilitate the transition to a low-carbon economy. Through policy influence and advocacy, banks can create an enabling environment for sustainable finance and drive systemic changes in the agricultural sector.

Other Applications of This Model

1. Transportation

With more accurate emission estimates for logistics and transit planning, businesses can optimize routes, minimize emissions, and improve public health.

2. Environmental Monitoring

Environmental protection organizations can use this model to predict air and water quality, track biodiversity changes, and forecast environmental impacts. It can also analyze sensor data to provide insights into pollution levels, habitat degradation, and climate change trends.

3. Energy Production

The model can be used by energy companies to optimize energy generation processes, improve renewable energy system efficiency, and minimize environmental impacts. 

4. Manufacturing

Manufacturers can optimize manufacturing processes, reduce waste, and minimize pollution.

5. Urban Planning

The model can support sustainable urban planning efforts by analyzing population growth, land use, and infrastructure data. It can also predict urbanization’s environmental impact, assess planning policies, and inform decision-making for sustainable development.
