Flood Risk Assessment Using Analytical Hierarchy Process (AHP) and Machine Learning Models
A step-by-step case study on using the Analytical Hierarchy Process (AHP) and machine learning models to support decision makers in flood risk assessment, applied in Togo. The models quantify the risk and estimate the extent of damage or destruction to assets such as buildings and crops.
This project was hosted with the impact-driven startup Finz.
Introduction
Natural disasters are among the most pressing issues to be addressed at the global, regional, and local levels. Climate change may increase the frequency and magnitude of catastrophic events like floods, droughts, and wildfires.
Togo, in West Africa, is highly vulnerable to these natural calamities. Flooding and drought are common occurrences in the country, with negative socioeconomic consequences for its inhabitants, environment, and economy. Floods have been extremely devastating in recent years, wrecking infrastructure and destroying cultivated land.
While excessive rainfall is the primary cause of flooding, numerous other factors contribute to it, including deforestation, land degradation, rapid population growth, urbanization, poor land use planning, and inadequate drainage and discharge management.

Figure 2: Monthly prediction of temperature and Precipitation [1]
Analytical Hierarchy Process (AHP)
The Analytical Hierarchy Process (AHP) model has been built to identify and map areas of high flood risk in Togo. AHP is a multi-criteria decision-making method that integrates several features/conditioning factors, such as drainage density, slope, soil type, precipitation, population density, Euclidean distance, and land use, to map flood risk. A vulnerability map and a hazard map have been generated from these factors.
Hazard map
- Drainage density (D): the total length of all channels within the basin divided by the area of the basin.
- Drainage density = Length of all channels / Area of basin
- A dense drainage network at any location is a good indicator that the area is more likely to flood, as it carries a high flow accumulation path.
- Precipitation (isohyet): a major determining factor when creating hazard maps.
- Slope: one of the important conditioning factors for floods. Water accumulates in flat, low-lying terrain, so flood danger generally increases as the slope decreases.
- Soil type – The type of soil and the texture are very important factors in determining the infiltration and water holding capacity of an area which affects flood susceptibility. The runoff from intense rainfall is likely to be more rapid with clay soils than with sand.
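The drainage density formula above can be sketched in a few lines of Python; the channel lengths and basin area below are hypothetical values for illustration, not measurements from the study area:

```python
# Drainage density = total channel length / basin area.
# The channel lengths (km) and basin area (km^2) are made-up illustrative
# values; in practice they come from the stream network GIS layer.
def drainage_density(channel_lengths_km, basin_area_km2):
    """Return drainage density in km of channel per km^2 of basin."""
    return sum(channel_lengths_km) / basin_area_km2

print(drainage_density([12.5, 8.0, 4.5], 50.0))  # 25 km over 50 km^2 -> 0.5
```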
Vulnerability map
- Euclidean distance (ED): areas located close to the main channel and the flow accumulation path are more likely to flood.
- Land use land cover (LULC): the amount and type of vegetation, which reflects its use, environment, cultivation, and seasonal phenology, is used to classify the landscape.
- Population density (PD): rapid population growth drives severe land use change and uncontrolled urbanization, increasing exposure.
Analytical Hierarchy Process uses hierarchical structures to represent a problem and then develop priorities for alternatives based on user judgement (Saaty, 1980). The process consists of the following steps:
- Break down the problem into its component factors
- Develop the hierarchy
- Develop the paired comparison matrix based on subjective judgements
- Calculate the relative weights of each criterion
- Check consistency of subjective judgement
The AHP process is broadly divided into the following steps:
- Data collection
- Data pre-processing
- AHP modelling

Figure 3: AHP Pipeline [2]
Data Collection
The following datasets were used to create the vulnerability and hazard maps.
- The country boundary shapefile is downloaded from the DIVA-GIS website [3]
- The Digital Elevation Model (DEM) is generated from the Advanced Land Observing Satellite (ALOS) PALSAR "ALOS World 3D" global digital surface model, which has a resolution of 30 m [4]
- Land Use Land Cover (LULC) is generated from the Copernicus Global Land Service website [5]
- Precipitation data is obtained from the University of California, Irvine (UCI) Center for Hydrometeorology and Remote Sensing (CHRS) website [6]
- Population density data is obtained from Facebook's Data for Good program [7]
- The soil map is downloaded from the Food and Agriculture Organization of the United Nations (FAO) website [8]
- The stream network shapefile is downloaded from Stanford University [9]

Figure 4: Data collected from various sources [4] [9] [7] [6] [5]
Data Pre-processing
Data pre-processing includes:
- Generating the layers from collected data: the layers are generated using QGIS/ArcGIS. The slope map is created from the DEM, while the Euclidean distance and drainage density maps are created from the river network.

Figure 5: Maps generated from collected data [2]
- Reclassifying layers
The generated layers cannot be used directly for further analysis because they are defined in different units. They must be reclassified and converted to a common scale.
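As a sketch of this reclassification step, each layer can be binned onto a common 1-5 scale with NumPy; the precipitation values and class breaks below are hypothetical, not the project's actual thresholds:

```python
import numpy as np

# Toy precipitation raster in mm/year; a real layer would come from QGIS/ArcGIS.
precip_mm = np.array([[600., 900., 1200.],
                      [1500., 300., 1100.]])

# Hypothetical class breaks: values are binned into five classes
# (1 = low susceptibility, 5 = high) so all layers share the same units.
breaks = [500., 800., 1100., 1400.]
reclassified = np.digitize(precip_mm, breaks) + 1

print(reclassified)  # [[2 3 4]
                     #  [5 1 4]]
```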
AHP modelling
Creating hierarchy:
In AHP, there are different levels set up as a hierarchy:
- Level 0: main objective, which in our case is the flood risk map
- Level 1: Different Criteria which are hazard map and vulnerability map
- Level 2: Elements (Parameters) to be considered in each criterion. We try to measure their influence on the criteria

Figure 6: AHP hierarchy [10]
Pairwise Comparison Matrix:
- Generating pairwise comparison matrix and checking consistency ratio
For each criterion, a pairwise comparison matrix is created. The scores used in the matrix are based on the Saaty scale (Saaty, 1980), shown below:

| Scale | Meaning |
| --- | --- |
| 1 | Equally important |
| 3 | Moderately important |
| 5 | Strongly important |
| 7 | Very strongly important |
| 9 | Extremely important |
| 2, 4, 6, 8 | Intermediate values between adjacent scales |
For every pair in the hazard comparison matrix, the more important factor is assigned a value between 1 (equally important) and 9 (extremely important), and the reverse pair receives the reciprocal of that value. For example, for the pair D (row)-ST (column) we assign a value of 3, while for the pair ST (row)-D (column) we assign 1/3. Applying this operation for each pair gives the matrix:
|  | D | ST | S | P |
| --- | --- | --- | --- | --- |
| D | 1 | 3 | 1/3 | 1/5 |
| ST | 1/3 | 1 | 1/3 | 1/5 |
| S | 3 | 3 | 1 | 1/3 |
| P | 5 | 5 | 3 | 1 |
Please note the values used are based on literature.
Then, for each row, the eigenvector value Vp is determined using the formula below:
Vp = (W1 × W2 × … × Wk)^(1/k)
where Vp is the eigenvector value, Wk is the k-th element of the row, and k is the number of elements.
We then get the following:
|  | D | ST | S | P | Vp |
| --- | --- | --- | --- | --- | --- |
| D | 1 | 3 | 1/3 | 1/5 | 0.67 |
| ST | 1/3 | 1 | 1/3 | 1/5 | 0.39 |
| S | 3 | 3 | 1 | 1/3 | 1.32 |
| P | 5 | 5 | 3 | 1 | 2.94 |
We then calculate the weighting coefficients Cp using the equation below:
Cp = Vp / (Vp1 + … + Vpk)
The sum of Cp of all the parameters must equal to 1. We then get the following:
|  | D | ST | S | P | Vp | Cp |
| --- | --- | --- | --- | --- | --- | --- |
| D | 1 | 3 | 1/3 | 1/5 | 0.67 | 0.13 |
| ST | 1/3 | 1 | 1/3 | 1/5 | 0.39 | 0.07 |
| S | 3 | 3 | 1 | 1/3 | 1.32 | 0.25 |
| P | 5 | 5 | 3 | 1 | 2.94 | 0.55 |
| Sum |  |  |  |  | 5.32 | 1 |
Check Consistency:
Now that we have our weights, we need to check if the weights are correct. In other words, we need to check whether the scores we assigned to the pairwise comparison matrix based on our subjective judgement are acceptable.
We create a matrix, call it A3, by multiplying the pairwise comparison matrix (a 4×4 matrix) by the eigenvector matrix (a 4×1 matrix). We then create another matrix, A4, by dividing every value of A3 by the corresponding eigenvector value. For example, for row D, we divide 2.87 / 0.67. We get the following matrix:
|  | A4 |
| --- | --- |
| D | 4.29 |
| ST | 4.21 |
| S | 4.15 |
| P | 4.15 |
We then average the above values to get 4.1975. This value is known as the maximum eigenvalue (λmax).
We then calculate the consistency index (CI) using the formula:
CI = (λmax - k) / (k - 1), where k is the number of parameters
CI = (4.1975 - 4) / (4 - 1) = 0.066
We then determine the consistency ratio (CR) by the formula:
CR = CI/RI, RI = random index
The random index value is taken from the following table (Saaty, 1980):

| Number of parameters | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RI | 0 | 0 | 0.58 | 0.9 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49 |
CR = 0.066/0.9 = 0.073
If the value of the Consistency Ratio is less than or equal to 10%, the weights are acceptable. If the value is greater than 10%, we need to revise our subjective judgment.
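The whole derivation above (eigenvector, weights, λmax, CI, CR) can be reproduced in a few lines of NumPy; this is a minimal sketch using the hazard matrix with rows and columns in the order D, ST, S, P:

```python
import numpy as np

# Pairwise comparison matrix from the worked example (order: D, ST, S, P).
A = np.array([[1,   3,   1/3, 1/5],
              [1/3, 1,   1/3, 1/5],
              [3,   3,   1,   1/3],
              [5,   5,   3,   1  ]])
k = A.shape[0]

Vp = A.prod(axis=1) ** (1 / k)   # row geometric means (eigenvector values)
Cp = Vp / Vp.sum()               # normalised weights; they sum to 1

lam_max = (A @ Vp / Vp).mean()   # maximum eigenvalue estimate
CI = (lam_max - k) / (k - 1)     # consistency index
RI = 0.9                         # random index for k = 4 (Saaty, 1980)
CR = CI / RI                     # consistency ratio

print(np.round(Cp, 2))   # [0.13 0.07 0.25 0.55]
print(round(CR, 3))      # 0.073
```

Since CR ≈ 0.073 is below the 10% threshold, the subjective judgements pass the consistency check.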
- Generating vulnerability and hazard map
AHP Hazard Map: a hazard is defined as a natural or man-made phenomenon occurring with an intensity that can cause harm, in this case through stream overflow.
The hazard map identifies all regions that are at risk of flooding. The spatial extent of locations potentially vulnerable to flood-inducing climatic threats is mapped by combining the conditioning factors, each assigned a different weight.
We can calculate the hazard map using the formula:
Hazard index = 0.13*D + 0.07*ST + 0.25*S + 0.55*P

Figure 7: AHP workflow for generating hazard map [2]
AHP vulnerability map
Vulnerability represents the extent of the expected repercussions of a natural phenomenon; exposure is its most important component, because it determines whether or not someone is affected by a hazard.
Flood vulnerability mapping is the process of determining a given area's susceptibility and exposure to flooding.
Using the same process as for the hazard map, we calculate the weights of the vulnerability factors and apply the formula below to obtain the vulnerability map:
Vulnerability index = 0.26*PD + 0.64*LULC + 0.1*ED

Figure 8: AHP workflow for generating vulnerability map [2]
Flood risk map
The flood risk map is a combination of hazard map and vulnerability map.
Flood risk = Hazard index * Vulnerability index
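Putting the three formulas together, a minimal NumPy sketch looks like the following; the tiny 2×2 arrays are made-up stand-ins for the reclassified raster layers:

```python
import numpy as np

# Toy 2x2 "rasters", already reclassified to a common susceptibility scale.
D  = np.array([[1., 3.], [2., 5.]])    # drainage density
ST = np.array([[2., 2.], [1., 4.]])    # soil type
S  = np.array([[1., 4.], [3., 5.]])    # slope
P  = np.array([[2., 5.], [4., 5.]])    # precipitation

PD   = np.array([[1., 4.], [2., 5.]])  # population density
LULC = np.array([[2., 5.], [3., 5.]])  # land use land cover
ED   = np.array([[1., 3.], [2., 4.]])  # Euclidean distance to channels

# Weighted overlays with the AHP weights, then the cell-wise product.
hazard = 0.13 * D + 0.07 * ST + 0.25 * S + 0.55 * P
vulnerability = 0.26 * PD + 0.64 * LULC + 0.10 * ED
risk = hazard * vulnerability

print(np.round(risk, 2))
```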

Figure 9: AHP workflow for generating flood risk map [2]

Figure 10: Hazard, vulnerability and flood risk map [2]

Figure 11: Feature importance or heatmap for machine learning models [2]
In addition to AHP, the following machine learning models were trained to predict flood risk:
- Linear regression model
- Decision tree
- Random forest
- XGBoost
- Neural network
- Ensemble model
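As an illustrative sketch (not the project's exact pipeline), the listed models can be trained on a pixel table of conditioning factors with scikit-learn. The data here is synthetic, and logistic regression stands in for the linear model since the flood/no-flood target is binary:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 500 pixels x 7 conditioning factors.
rng = np.random.default_rng(0)
X = rng.random((500, 7))
y = (X[:, 3] + X[:, 5] > 1.0).astype(int)   # made-up flood / no-flood label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft-voting ensemble over three of the listed model families.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",   # average predicted probabilities across models
)
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 2))   # held-out accuracy
```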
Model validation is done using the ROC curve. An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
- True Positive Rate
- False Positive Rate
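A minimal sketch of this validation step with scikit-learn, using hypothetical labels and model scores rather than the project's predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical ground truth (1 = flooded pixel) and predicted probabilities.
y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3])

# FPR and TPR at every threshold trace out the ROC curve;
# the area under it (AUC) summarises ranking quality.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print(round(auc, 4))  # 0.9375
```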

Figure 12: ROC curves comparisons of Machine Learning Models [2]


Figure 13: Confusion matrix for ensemble and Xgboost models [2]
This shows that, when training time is also taken into account, the ensemble model performed best.
Comparison between AHP and Machine Learning models:

Figure 14: Flood risk map created using machine learning model [2]

Figure 15: Flood risk map created using AHP model [2]
Conclusion
While generating the hazard map, the team found that precipitation is the most dominant factor. The hazard map shows how prone each region is to floods: regions in red are more flood-prone because of high precipitation, while regions in green are much less prone because of low precipitation.
While generating the vulnerability map, the land use land cover layer was given the highest weight. When the hazard and vulnerability maps were combined into the flood risk map, the most dominant factors were precipitation, land use/land cover, and population density.
The study shows that stringent action needs to be taken: proper land use planning and better drainage and discharge management are necessary to mitigate flood risk.
These risk maps can be further improved by adding more relevant information, such as flow accumulation and lithology. There is a wide spectrum of research opportunities in which AHP modelling could be applied.
Alternatively, the AHP model could be built for a few target countries, with a machine learning model producing risk scores for the surrounding areas without having to build an AHP model for each. Applying a machine learning model directly to the AHP input layers reduces the computational steps needed to create a risk score, and expert knowledge can be used to set up a regional AHP model to refine the scoring where the machine learning model does not produce credible scores.
References
- worldbank.org, 2021. World Bank Climate Change Knowledge Portal. [online] Available at: https://climateknowledgeportal.worldbank.org/country/togo/climate-data-historical [Accessed 26 September 2021].
- diva-gis.org, 2021. Download data by country | DIVA-GIS. [online] Available at: http://diva-gis.org/gdata [Accessed 26 September 2021].
- eorc.jaxa.jp, 2021. Advanced Land Observing Satellite. [online] Available at: https://www.eorc.jaxa.jp/ALOS/en/aw3d30/index.htm [Accessed 26 September 2021].
- land.copernicus.eu, 2021. Copernicus Global Land Service. [online] Available at: https://land.copernicus.eu/en/products/global-dynamic-land-cover [Accessed 26 September 2021].
- chrsdata.eng.uci.edu, 2021. CHRS Data Portal. [online] Available at: https://chrsdata.eng.uci.edu/ [Accessed 26 September 2021].
- data.humdata.org, 2021. HDX Facebook Data for Good. [online] Available at: https://data.humdata.org/ [Accessed 26 September 2021].
- fao.org, 2021. Food and Agriculture Organization of the United Nations. [online] Available at: http://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/faounesco-soil-map-of-the-world/en/ [Accessed 26 September 2021].
- maps.princeton.edu, 2021. Princeton University Library Digital Maps and Geospatial Data. [online] Available at: https://maps.princeton.edu/catalog/stanford-jr133wm5800 [Accessed 26 September 2021].
- Geoenvironmental Disasters journal article. Available at: https://geoenvironmental-disasters.springeropen.com/articles/10.1186/s40677-016-0044-y
- MLjar AutoML library: https://mljar.com/automl/