Bayesian Network Integration with GIS for the Analysis of Areas Vulnerable to the Outbreak of COVID-19 in Bangkok, Thailand

Klanreungsang, B.¹ and Suppawimut, W.²*

Geography and Geoinformatics, Faculty of Humanities and Social Sciences, Chiang Mai Rajabhat University, Thailand, E-mail: baromasak_kla@g.cmru.ac.th¹, worawit_sup@cmru.ac.th²*

*Corresponding Author



Abstract

The COVID-19 pandemic prompted a search for new methods of preventing the spread of the virus. This study established a model of the areas in Bangkok that were vulnerable to the COVID-19 pandemic by combining a Bayesian network (BN) with a geographic information system (GIS). The model was developed using a data-driven approach and was evaluated with 10-fold cross validation and ROC analysis. The results demonstrated that the proposed method effectively predicted vulnerability to disease outbreak. The areas most vulnerable to the pandemic were around the center and in the west of Bangkok, while areas of low vulnerability were found in the north and east of the city. Sensitivity analysis affirmed that population density and the aerosol index were highly influential factors in the outbreaks. Furthermore, the model was used to conduct a scenario analysis, which identified vulnerability management strategies.

Keywords: COVID-19, Vulnerable disease area, Bayesian network, Geographic information systems

1. Introduction

The coronavirus disease of 2019, also known as COVID-19, is an infection caused by the SARS-CoV-2 virus that produces respiratory and gastrointestinal symptoms. Since the disease can be transmitted directly from person to person, it was categorized as an emerging pandemic disease. Such transmission allows it to spread quickly across geographic boundaries (Mourmouris et al., 2021). Many countries had to impose restrictions, including lockdowns, in order to limit the outbreak (Naveen and Gurtoo, 2022). These regulations and mandates have had very serious effects on world trade and the global economy because of border closures and travel restrictions (Mouratidis, 2021). The pandemic also greatly affected the daily lives of the world’s citizens. People of all ages needed to adapt rapidly to the changes, whether by working from home, learning online, social distancing or adopting the new behavior that was called the “new normal” (Barbour et al., 2021 and Tomikawa et al., 2021).

Thailand’s first COVID-19 case was detected at the beginning of 2020 in a Chinese tourist who had arrived in Thailand from Wuhan, China. Afterwards the outbreak expanded to Bangkok, where it occurred in clusters at venues such as boxing arenas and entertainment places (Tantrakarnapa and Bhopdhornangkul, 2020). The disease then spread throughout the country, and the Thai government declared COVID-19 to be the 14th dangerous communicable disease according to the Communicable Disease Act 2020 on February 26, 2020 (Ministry of Public Health Notification Regarding of the Names and Significant Symptoms of Dangerous Communicable Disease (No.3) 2020, 2020). According to a report by WHO (2021), on 31 October 2021 Thailand ranked 24th worldwide in the number of cases, with an overall total of 1,912,024 patients. The area with the largest number of patients was Bangkok, with 332,236 patients, equivalent to 17.38 percent of the national total (DGA, 2021).

An assessment of the spatial spread of the disease, or of the areas vulnerable to it, based on the exposure of communities to various georeferenced factors, is an essential basis for an analysis of the pandemic.

The most widely used and accepted method is to build a geographic information system (GIS) model that gives a simplified representation of the real world in order to predict the possibility or severity of an outbreak (Jerrett et al., 2010 and Khashoggi and Murad, 2020). A weighted scoring method is commonly used for estimating the vulnerability index of a disease, and it relies on the opinion of experts or existing domain knowledge to distinguish between the criteria (Malczewski, 2000). In addition, the analytic hierarchy process (AHP) has been used to assist in systematic weighting decisions in spatial models of the COVID-19 pandemic (Mahato et al., 2020, Rahman et al., 2021 and Shadeed and Alawna, 2021). However, the results may vary because of the complex decision making involved, and they cannot be assumed to be completely correct (Malczewski, 2006 and Feizizadeh et al., 2014).

To avoid such problems, research has used the Bayesian network (BN), a causal probabilistic model that can compute the likelihood of the spread of COVID-19 from expert judgment and observed data (Wei et al., 2020). For example, BNs have been used to estimate the probability that a person has no infection, a mild infection, or a severe infection, based on the patient’s symptoms, background, and previous illnesses (Alsuwat et al., 2021 and Wu et al., 2021). They have also been used to estimate the occurrence rate of COVID-19 infections and the fatality rates from disease statistics (Neil et al., 2020). This study, therefore, exploited spatial elements through the integration of BN and GIS to bridge the existing gap in the current methodology. BN assisted in establishing relevant geographical links and estimated the possibility of outbreaks by learning from empirical data, while GIS was used to manage spatial variables and to define a spatially explicit model.

These considerations raise the question of whether BN and GIS can be used to conceptualize the interaction of spatial indicators related to COVID-19 transmission and to identify which locations in Bangkok could be affected by the pandemic. The objective of this research is to establish a model of the areas vulnerable to the spread of COVID-19 using BN with GIS. Here, vulnerability refers to the degree of susceptibility of an area to COVID-19 infection due to physical and socioeconomic factors. This research identifies the characteristics of the spread of COVID-19 in terms of the affected areas. The results can be used to support the planning of health services, surveillance operations, and appropriate disease control procedures for each area, which should in turn improve the health of the population, the economy, and people’s quality of life.

2. Study Area

The capital city of Thailand, Bangkok, was selected as the study area. It is located between latitudes 13° 26′ and 13° 57′ N and longitudes 100° 19′ and 100° 58′ E. As illustrated in Figure 1, the study area covers approximately 1,570 square kilometers and is subdivided into 50 administrative districts. Bangkok was chosen because it was the city with the highest number of detected COVID-19 patients. Moreover, Bangkok was the first city in which a COVID-19 cluster was identified, from which the disease spread all over the country. In this investigation, the number of people infected with COVID-19 was measured by the number of confirmed cases. A total of 332,236 cases were detected between January 12, 2020, and October 31, 2021, the period from the beginning to the third stage of the outbreak in Bangkok.

3. Material and Methods

3.1 Data

The data used in this study were obtained from various sources. The number of confirmed COVID-19 cases, classified by district, came from the Open Government Data Center, Digital Government Development Agency (Public Organization). The socioeconomic data, comprising population statistics, the number of elderly people, the poverty rate, and crowded places, were acquired from Bangkok’s Office of Strategy and Evaluation. The aerosol index was calculated using data from the European Space Agency’s Sentinel-5 satellite. Land use data for the study area were obtained from the Policy and Land Use Planning Division, Department of Land Development, Bangkok. Data on building locations and the transportation network were obtained from the Department of City Planning in Bangkok, as a geographic information system database at a scale of 1:4,000.

3.2 Conceptual Framework

In this study, the vulnerability of Bangkok to the COVID-19 pandemic was assessed using BN and GIS. The study’s framework is depicted in Figure 2. Modeling in GIS was done on a grid dataset, in which each cell is treated as an independent individual with predictive indicators and a target variable. The attribute table extracted from the grid was used to construct the BN model.

Figure 1 Geographical location of the study area

Figure 2 Conceptual framework

In BN, a structural graph was used to show the potential relationships between factors and parameter learning was used to estimate the conditional probability distribution among the connected variables. The BN model was then used to establish causal inference to obtain COVID-19 pandemic probability values for each grid square, allowing GIS to develop a model reflecting COVID-19 pandemic vulnerability.

In this study, the independent variables included environmental characteristics (aerosol index, land use, building density, and transportation network density) and socioeconomic aspects (population density, density of the elderly, poverty rate, and the density of crowded places, where social gatherings occur). The COVID-19 morbidity rate was the dependent variable, and it was used in both the training and testing data to evaluate the model’s performance. Two research tools were used: GeNIe 3.0.R2 (BayesFusion, 2020), graphical network interface software, for the BN model operations, and ArcGIS 10.8 (ESRI, 2019), GIS software, for processing geospatial data and mapping.

3.3 Methodology

3.3.1 Morbidity rate

The morbidity rate is the proportion of people in a specific geographical location who suffered from a particular disease during a specific period of time. It indicates the frequency of COVID-19 in the population. The COVID-19 morbidity rate per 1,000 persons in Bangkok and in each district was calculated using Equation (1) (Robert and Thomas, 2020).

$$M = \frac{C}{P} \times 1{,}000 \qquad (1)$$

where M is the COVID-19 morbidity rate per 1,000 persons, C is the number of existing cases of COVID-19 during a time period, and P is the population during the same period of time. The data were then represented using a choropleth map in GIS. High morbidity of COVID-19 indicates a high incidence, prolonged survival without cure, or both. On the other hand, low morbidity suggests a low incidence, a rapid fatal process, or a quick recovery.
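As a quick illustration of this calculation, the following minimal Python sketch computes a morbidity rate per 1,000 persons, assuming Equation (1) is the usual ratio of cases to population scaled by 1,000. The function name and the district figures are hypothetical and are not taken from the study data.

```python
def morbidity_rate(cases: int, population: int, per: int = 1000) -> float:
    """Return the morbidity rate per `per` persons: cases / population * per."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * per


# Hypothetical district: 1,200 confirmed cases among 50,000 residents.
rate = morbidity_rate(cases=1200, population=50000)
print(round(rate, 2))  # 24.0
```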

3.3.2 Bayesian network

A BN combines graph theory and probability theory. A complete BN, as shown in Equation (2), has both a qualitative and a quantitative component (Li et al., 2018).

$$BN = (G, \Theta) \qquad (2)$$

where G stands for the BN structure, a directed acyclic graph (DAG) with a set of nodes corresponding to the variables in X and a set of directed edges linking the nodes. In particular, an edge from node $X_i$ to node $X_j$ represents a statistical dependence between the corresponding variables. Node $X_i$ is referred to as a parent of $X_j$, and $X_j$ is called a child of $X_i$; in other words, variable $X_i$ influences variable $X_j$. Θ denotes the set of parameters that quantify the network. These parameters are condensed into conditional probability tables (CPTs), which contain the probability distribution of each node for each combination of states of its parent nodes. The CPT of node $X_i$ is expressed as $\{P_i = P_i(X_i \mid pa(X_i))\}$, which describes the relationship between $X_i$ and its parent nodes.
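To make the roles of the structure and the CPTs concrete, here is a minimal Python sketch of a three-node chain (land use, aerosol index, morbidity). The two-state nodes and all probability values are hypothetical and chosen only for illustration; they are not the CPTs learned in this study. The sketch shows how the joint probability factorizes into one CPT entry per node given its parents.

```python
from itertools import product

# Hypothetical CPTs for the chain land_use -> aerosol -> morbidity.
p_land_use = {"built_up": 0.6, "open": 0.4}
p_aerosol = {  # P(aerosol | land_use)
    "built_up": {"high": 0.7, "low": 0.3},
    "open": {"high": 0.2, "low": 0.8},
}
p_morbidity = {  # P(morbidity | aerosol)
    "high": {"high": 0.8, "low": 0.2},
    "low": {"high": 0.1, "low": 0.9},
}


def joint(land_use: str, aerosol: str, morbidity: str) -> float:
    """Joint probability as the product of each node's CPT entry given its parents."""
    return (
        p_land_use[land_use]
        * p_aerosol[land_use][aerosol]
        * p_morbidity[aerosol][morbidity]
    )


# Marginal P(morbidity = high): sum the joint over the other two nodes.
p_high = sum(joint(l, a, "high") for l, a in product(p_land_use, ["high", "low"]))
print(round(p_high, 3))  # 0.45
```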

3.3.3 The Development of the BN-GIS Vulnerability Model

This study defined a spatially explicit probabilistic model of the COVID-19 pandemic in five steps: (1) identification of the factors that influence the spread of COVID-19; (2) development of the BN topology; (3) construction of a BN prediction model; (4) evaluation of the BN model performance; and (5) inference of the posterior distribution onto the map.

(1) Identification of the Factors: To identify the main factors which directly or indirectly affect the COVID-19 infection occurrence, the literature was reviewed to gather ideas on the recent impact of the pandemic. According to the findings, there are eight possible factors.

Aerosol Index: The aerosol index refers to the concentration of suspended aerosols in the atmosphere, which can act as a carrier on which the COVID-19 virus condenses or attaches, potentially leading to a virus buildup (Hassan et al., 2020 and Sahu et al., 2021). Furthermore, a high aerosol concentration may reduce the immune response, making it easier for viruses to penetrate and reproduce (Conticini et al., 2020, Martelletti and Martelletti, 2020 and Zhang et al., 2021).

Land Use: Land use is one component linked to COVID-19 exposure. Different land uses may lead to unequal infection rates (Huang et al., 2020 and Li et al., 2020). According to Kanga et al. (2021), the epidemic potential is high in built-up and industrial areas, moderate in agricultural areas, low in wetlands, and very low in open spaces.

Building Density: Building density is the ratio between the total gross floor area and the parcel area upon which the building is located. It positively affects the propagation of COVID-19 at the community level: areas with a higher building density have experienced a faster spread of COVID-19 (Choerunnisa et al., 2020 and Mokhtari and Jahangir, 2021). In this research, building density was calculated using the kernel density approach.

Transportation Network Density: Transportation network density is the total length of streets and transit networks per unit area, which in this research was calculated using the kernel density approach. A denser network can be linked to a greater risk of contact, exposure, and interaction between people inside and outside the area, all of which can make it easier for an epidemic to spread (Mollalo et al., 2020, Kwok et al., 2021 and Malakar, 2021).

Population Density: Population density is the average number of individuals per unit area of a district. It has a significant effect on the spread of COVID-19. According to several studies, the more densely populated an area, the greater the risk of disease transmission (Ganasegeran et al., 2021 and Ilardi et al., 2021).

Density of the Elderly: Older persons, classified as those aged 60 or above, are at higher risk of contracting COVID-19 and of dying from it because of their weakened physical condition and deteriorating immune systems. The higher the density of older adults, the greater the expected risk of virus diffusion (Ali et al., 2020 and Cutrini and Salvati, 2021).

Poverty Rate: The poverty rate is defined as the percentage of the total district population living below the upper poverty line. The majority of low-income earners live in unstable homes with unsanitary conditions. Inequality of access to basic services is another barrier that raises the probability of COVID-19 infection and leads to poor outcomes (Pourghasemi et al., 2020 and Kianfar and Mesgari, 2022).

Density of Crowded Places: COVID-19 can spread quickly in situations where many people assemble to undertake activities or come into contact with the same objects, such as public handrails, doorknobs, coins or bank notes (Ramadan and Ramadan, 2021 and Razavi-Termeh et al., 2021). This study looked at the kernel density of transit stations, markets, temples, churches, mosques, and petrol stations.

In GIS, all the factors were integrated to create an aggregated raster used to insert attribute data into the BN learning process in steps (2) and (3).
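Several of the factors above are described as kernel densities. The following Python sketch shows one way such a density could be evaluated at a single grid cell from point features. The quartic (biweight) kernel, the search radius, and the point coordinates are assumptions made for illustration; the kernel settings actually used in the study are not stated.

```python
import math


def quartic_kernel_density(points, cell_x, cell_y, radius):
    """Density at one cell centre from points lying within `radius` of it.

    Coordinates and radius share the same length unit, so the result is in
    points per squared unit.
    """
    total = 0.0
    for px, py in points:
        d2 = ((px - cell_x) ** 2 + (py - cell_y) ** 2) / radius ** 2
        if d2 < 1.0:
            # Quartic kernel, scaled so that it integrates to 1 over the disc.
            total += 3.0 / (math.pi * radius ** 2) * (1.0 - d2) ** 2
    return total


# Hypothetical point features (e.g. markets), in kilometres.
markets = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
density = quartic_kernel_density(markets, cell_x=0.0, cell_y=0.0, radius=1.0)
print(round(density, 4))
```

Points outside the search radius contribute nothing, so the third hypothetical market does not affect the density at this cell.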

(2) Development of BN Topology: The BN topology depicts the potential relationships between the factors influencing the COVID-19 pandemic through a DAG. The aforementioned factors were regarded as nodes in the BN structure graph, with edges showing the relationships among the nodes. In this study, the BN topology was learned from the COVID-19 morbidity data, a process called structural learning. The greedy thick thinning (GTT) algorithm was selected to evaluate whether there should be a connection between two nodes based on a conditional independence test. It has been tested repeatedly and is recognized as a highly efficient algorithm (Kelangath et al., 2012, Ruangudomsakul et al., 2018 and Fan et al., 2019). The test is given in Equation (3).

$$I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)} \qquad (3)$$

where $C$ is a set of nodes, and $X_i$ and $X_j$ are two nodes that can be deemed independent when $I(X_i, X_j \mid C)$ is less than a threshold, implying no connection between them; otherwise, a connection exists.
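A conditional independence test of the kind described here can be illustrated with conditional mutual information, a common choice for such tests. The sketch below assumes that measure and estimates it from discrete observations by simple counting; the function name and the toy samples are hypothetical, and this is not the GeNIe implementation.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """Estimate I(X; Y | Z) in nats from a list of (x, y, z) observations."""
    n = len(samples)
    n_xyz = Counter(samples)
    n_xz = Counter((x, z) for x, _, z in samples)
    n_yz = Counter((y, z) for _, y, z in samples)
    n_z = Counter(z for _, _, z in samples)
    cmi = 0.0
    for (x, y, z), count in n_xyz.items():
        # P(x, y | z) / (P(x | z) P(y | z)) written in terms of raw counts.
        ratio = count * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)])
        cmi += count / n * math.log(ratio)
    return cmi


# X and Y independent given Z: every combination occurs once.
independent = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
# X always equals Y: fully dependent whatever the value of Z.
dependent = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]

print(conditional_mutual_information(independent))
print(round(conditional_mutual_information(dependent), 4))
```

With a threshold between the two printed values, the first pair of nodes would be left unconnected and the second pair would be linked.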

(3) Construction of the BN Prediction Model: The BN model was developed to predict the probability of the spread of COVID-19. Given the structure G obtained from structural learning, the parameters Θ (the CPTs) were estimated using the expectation-maximization (EM) algorithm, which has been widely adopted in previous studies (Ruggieri et al., 2020 and Huang et al., 2021). The output is the maximum posterior probability, determined by Equation (4).

$$P(A \mid E) = \frac{P(E \mid A)\, P(A)}{P(E)} \qquad (4)$$

where $P(A \mid E)$ is the posterior probability of an event $A$ given evidence $E$, $P(E \mid A)$ is the likelihood of the evidence given $A$, $P(A)$ is the prior probability of $A$, and the denominator $P(E)$ is the probability of evidence $E$.
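A short numerical example may help to fix the idea of a posterior update. The sketch below applies Bayes' theorem to a single binary event, expanding the probability of the evidence with the law of total probability. The function name and all three input probabilities are hypothetical.

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """P(A | E) from P(A), P(E | A) and P(E | not A), via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: P(A) = 0.2, P(E | A) = 0.9, P(E | not A) = 0.3.
p = bayes_posterior(prior=0.2, likelihood=0.9, likelihood_if_not=0.3)
print(round(p, 4))  # 0.4286
```

Observing the evidence raises the probability of the event from the prior 0.2 to roughly 0.43 in this toy case.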

(4) Evaluation of the BN Model Performance: The developed model’s prediction ability was verified using k-fold cross validation. This procedure randomly splits the entire dataset into k groups, called folds. Each fold in turn was held out as the validation set for evaluating the built model, while the remaining folds were used as the training dataset for building it. This process was repeated k times, and the model’s performance was then summarized by accuracy and F1 score. A commonly used value for k is 10 (Marcot and Hanea, 2021).
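The splitting logic of k-fold cross validation can be sketched in a few lines of Python. This is a generic illustration using only the standard library, not the procedure built into GeNIe; the function name, the random seed, and the sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=100, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```

In each round the model would be fitted on the training indices and scored on the held-out fold, and the k scores would then be averaged.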

Moreover, Receiver Operating Characteristic (ROC) analysis was carried out to measure the overall efficiency of the model. The ROC curve is a graphical presentation of sensitivity versus 1 – specificity as the threshold varies. The area under the ROC curve is referred to as the Area Under the Curve (AUC), whose values range from 0 to 1. An AUC of 1 indicates the best performance, whereas an AUC of 0.5 or below means the model performs no better than random guessing (Wixted et al., 2017).
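For a binary outcome, the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. Applying it to each morbidity level against all others (one-vs-rest) is an assumption about how per-level AUC values could be obtained; the labels and scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical one-vs-rest labels (1 = "very high" morbidity) and predicted probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))  # 0.75
```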

(5) Inferencing Posterior Distribution to Map: This step involved importing the posterior probabilities computed from the BN model into GIS. These probabilities reflect the likelihood or risk of COVID-19 exposure in the presence of certain variables. The result was a map depicting, for each grid cell of the input data, the degree of vulnerability to a COVID-19 outbreak.
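One simple way to turn a posterior distribution into a map class is to label each grid cell with its most probable level. The sketch below assumes that rule, which is an illustration rather than the documented procedure of this study. The example probabilities are those of the last column of Table 2.

```python
LEVELS = ["low", "moderate", "high", "very high"]


def most_probable_level(posterior):
    """Return the level with the highest posterior probability for one grid cell."""
    return max(LEVELS, key=lambda level: posterior[level])


# Posterior for one cell, taken from the last column of Table 2.
cell_posterior = {"low": 0.0013, "moderate": 0.0013, "high": 0.1119, "very high": 0.8855}
print(most_probable_level(cell_posterior))  # very high
```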

3.3.4 Model Application

(1) Sensitivity Analysis: A sensitivity analysis was performed to measure how the posterior probabilities of query nodes change when the parameters and inputs change. Let x be a probability parameter, y be a query, and e be evidence entered into the BN model. The posterior probability $P(y \mid e)$ is a fraction of two linear functions of x as follows:

$$P(y \mid e)(x) = \frac{\alpha x + \beta}{\gamma x + \delta} \qquad (5)$$

where $\alpha$, $\beta$, $\gamma$ and $\delta$ are constants that do not depend on x. Then the partial derivative of $P(y \mid e)$ with respect to x can be expressed as:

$$\frac{\partial P(y \mid e)}{\partial x} = \frac{\alpha \delta - \beta \gamma}{(\gamma x + \delta)^2} \qquad (6)$$

Substituting the value of x into Equation (6) gives the sensitivity value of query y at x given e (Wang et al., 2002). The higher the sensitivity value, the more strongly the parameter affects the predicted COVID-19 outbreak.
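The derivative used as the sensitivity value can be checked numerically. The sketch below assumes the posterior has the form of a ratio of two linear functions of x, with hypothetical coefficients named alpha, beta, gamma and delta, and compares the closed-form derivative with a central finite difference.

```python
def posterior(x, alpha, beta, gamma, delta):
    """Posterior P(y | e) as a ratio of two linear functions of the parameter x."""
    return (alpha * x + beta) / (gamma * x + delta)


def sensitivity(x, alpha, beta, gamma, delta):
    """Derivative of the posterior with respect to x, from the quotient rule."""
    return (alpha * delta - beta * gamma) / (gamma * x + delta) ** 2


# Hypothetical coefficients, chosen only to exercise the formulas.
coeffs = dict(alpha=0.3, beta=0.1, gamma=0.2, delta=0.5)
x0, h = 0.4, 1e-6
analytic = sensitivity(x0, **coeffs)
numeric = (posterior(x0 + h, **coeffs) - posterior(x0 - h, **coeffs)) / (2 * h)
print(abs(analytic - numeric) < 1e-6)  # True
```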

(2) Scenario Analysis: Scenario analysis is a process of examining and evaluating possible future events and predicting their feasible outcomes. In the BN model, a scenario is defined by selecting a combination of states for the relevant nodes while leaving all other nodes in their default state. In this study, scenarios were designed specifically from nodes that the sensitivity analysis found to have a high impact on disease vulnerability.
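The mechanics of a scenario can be sketched with a toy network in which two parent nodes, population density and aerosol index, feed a morbidity node. Fixing one parent to a chosen state while the other keeps its prior mimics setting evidence on a node. All probabilities are hypothetical, and the two parents are treated as independent root nodes purely to keep the example short.

```python
# Hypothetical priors for two root nodes and a CPT for the morbidity node.
p_population = {"low": 0.5, "high": 0.5}
p_aerosol = {"low": 0.6, "high": 0.4}
p_high_morbidity = {  # P(morbidity = high | population, aerosol)
    ("low", "low"): 0.05, ("low", "high"): 0.20,
    ("high", "low"): 0.40, ("high", "high"): 0.80,
}


def prob_high_morbidity(evidence=None):
    """P(morbidity = high | evidence); nodes without evidence keep their priors."""
    evidence = evidence or {}
    weighted, normaliser = 0.0, 0.0
    for pop, p_pop in p_population.items():
        if evidence.get("population", pop) != pop:
            continue
        for aer, p_aer in p_aerosol.items():
            if evidence.get("aerosol", aer) != aer:
                continue
            weight = p_pop * p_aer
            normaliser += weight
            weighted += weight * p_high_morbidity[(pop, aer)]
    return weighted / normaliser


baseline = prob_high_morbidity()
scenario_1 = prob_high_morbidity({"population": "low"})   # optimistic
scenario_2 = prob_high_morbidity({"aerosol": "high"})     # pessimistic
print(round(baseline, 3), round(scenario_1, 3), round(scenario_2, 3))
```

In this toy case the probability of high morbidity falls when population density is fixed at low and rises when the aerosol index is fixed at high, which is the kind of comparison a scenario analysis reports.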

4. Results

4.1 The Morbidity of COVID-19

The overall COVID-19 morbidity rate in Bangkok reached 59.78 per 1,000 persons. When the rate was mapped per district as a choropleth map with four equal-interval levels, the highest morbidity rates occurred in the inner area, as shown in Figure 3. The highest rate was in Pathum Wan district, which recorded 146.89 per 1,000 persons, followed by Ratchathewi with 120.40, Pom Prap Sattru Phai with 117.51, and Samphanthawong with 114.14 per 1,000 persons. The lowest morbidity rates were in Saphan Sung district, with only 23.27 per 1,000 persons, and Bueng Kum, with 28.80 per 1,000 persons.

4.2 Processing of Spatial Datasets

The values of the seven quantitative variables were divided into four levels: low, moderate, high, and very high. They were classified automatically using equal intervals, as shown in Table 1. Land use, in contrast, is naturally grouped into eight categories: (1) agriculture, (2) commerce, (3) industry, (4) institutions, (5) forest, (6) open spaces, (7) residential areas, and (8) water.
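Equal-interval classification divides the range of a variable into classes of the same width. The sketch below shows the idea for four levels; the function name and the 0 to 100 range are hypothetical, and the class breaks produced by GIS software may be rounded differently.

```python
def equal_interval_class(value, vmin, vmax,
                         labels=("low", "moderate", "high", "very high")):
    """Assign `value` to one of len(labels) equal-width classes on [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    width = (vmax - vmin) / len(labels)
    index = int((value - vmin) // width)
    # Clamp so that values at or beyond the ends fall in the outer classes.
    index = min(max(index, 0), len(labels) - 1)
    return labels[index]


# Hypothetical variable ranging from 0 to 100, giving breaks at 25, 50 and 75.
print([equal_interval_class(v, 0, 100) for v in (10, 30, 60, 100)])
# ['low', 'moderate', 'high', 'very high']
```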

Figure 3 COVID-19 morbidity rate per 1,000 persons in Bangkok

Table 1 Classification of different quantitative vulnerability factors

| Factors / Classes | Low | Moderate | High | Very High |
|---|---|---|---|---|
| Aerosol Index (no unit) | < -0.41 | -0.41 – -0.23 | -0.22 – -0.04 | > -0.04 |
| Building Density (unit per square kilometer) | < 2,406 | 2,406 – 4,812 | 4,813 – 7,219 | > 7,219 |
| Transportation Network Density (kilometer per square kilometer) | < 8 | 8 – 16 | 17 – 25 | > 25 |
| Population Density (person per square kilometer) | < 6,194 | 6,194 – 11,634 | 11,635 – 17,075 | > 17,075 |
| Density of Elderly (person per square kilometer) | < 1,600 | 1,600 – 3,110 | 3,111 – 4,621 | > 4,621 |
| Poverty Rate (percent) | < 1.88 | 1.88 – 2.82 | 2.83 – 3.77 | > 3.77 |
| Density of Crowded Places (unit per square kilometer) | < 1.50 | 1.51 – 3.00 | 3.01 – 4.50 | > 4.50 |

All factors were converted into raster format, and the city of Bangkok was divided into 39,250 grid cells of 200 m × 200 m, as presented in Figure 4.

4.3 BN Prediction Model

The structure of the BN model was computed using the structure learning process described above and is illustrated in Figure 5. Each node in the network represents one factor, and the arrows depict the connections among the factors. The states of each node are portrayed by bars, and the percentage on each bar reflects the probability of that state. For example, the states of the node for the density of the elderly follow the classification in Table 1: “low” means that the density of the elderly is below 1,600 persons per square kilometer, with a probability of 19 percent; “moderate” indicates a density between 1,600 and 3,110 persons per square kilometer, with a probability of 23 percent; the “high” state, between 3,111 and 4,621 persons per square kilometer, accounts for more than two fifths; and the “very high” state, over 4,621 persons per square kilometer, accounts for less than a fifth.

Figure 4 Factors influencing the spread of COVID-19 in Bangkok: (a) Aerosol index; (b) Land use; (c) Building density; (d) Transportation network density; (e) Population density; (f) Density of the elderly; (g) Poverty rate; (h) Density of crowded places

Figure 5 BN model for predicting vulnerability to the COVID-19 pandemic

This network describes all eight factors relevant to COVID-19 morbidity in Bangkok. Six factors are directly connected to morbidity: the aerosol index, transportation network density, population density, density of the elderly, poverty rate, and density of crowded places. Two factors, land use and building density, affect COVID-19 morbidity indirectly, through other factors that are directly related to it. For instance, land use influences the aerosol index, and the aerosol index affects morbidity; land use can therefore be said to affect COVID-19 morbidity through its effect on the aerosol index.

4.4 Combination of Probability Distribution

The posterior probabilities of the COVID-19 pandemic under different combinations of factor levels in Bangkok were processed in the parameter learning procedure. Since there were six factors causally associated with morbidity, and each variable had four alternative levels, the probability distribution of the COVID-19 pandemic generated a total of $4^6 = 4{,}096$ possible combinations.
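The size of this combination space is easy to verify by enumeration. The following Python sketch lists every combination of four levels across the six factors directly connected to morbidity; the variable names are illustrative only. The figure of 210 valid combinations is the one reported in the text.

```python
from itertools import product

LEVELS = ("low", "moderate", "high", "very high")
FACTORS = (
    "aerosol index", "transportation network density", "population density",
    "density of the elderly", "poverty rate", "density of crowded places",
)

combinations = list(product(LEVELS, repeat=len(FACTORS)))
print(len(combinations))                        # 4096
# Share of combinations that actually occur, using the 210 reported in the text.
print(round(100 * 210 / len(combinations), 2))  # 5.13
```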

Most of these combinations do not occur on the map. For example, there are no samples of places with a very high density of crowded places, high poverty rate, moderate transportation network density, low aerosol index, low population density, and low density of the elderly. All CPTs of missing combinations of evidence are considered meaningless, because they were not trained on any samples and any predictions from these combinations are either random or constant. After eliminating the useless CPTs, there were 210 valid combinations of evidence and thus 210 valid CPTs. In Table 2, for example, according to the last column, when the aerosol index at a certain place was very high, the transportation network density moderate, the population density very high, the density of the elderly very high, the poverty rate low, and the density of crowded places low, the probability of very high COVID-19 morbidity was 88.55 percent and of high morbidity 11.19 percent, while the probabilities of moderate and low morbidity were 0.13 percent each.

Table 2 Examples of some CPTs of COVID-19 morbidity

| Evidence | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
|---|---|---|---|---|---|
| Aerosol Index | Moderate | Very High | Very High | Very High | Very High |
| Transportation Network Density | Low | High | High | Moderate | Moderate |
| Population Density | Low | Moderate | Moderate | High | Very High |
| Density of the Elderly | Low | Low | Low | Moderate | Very High |
| Poverty Rate | High | Moderate | Low | Moderate | Low |
| Density of Crowded Places | Low | Low | High | Low | Low |
| Probabilities of different morbidity levels | | | | | |
| Low | 81.25% | 27.92% | 0.54% | 0.21% | 0.13% |
| Moderate | 6.25% | 71.94% | 98.38% | 54.37% | 0.13% |
| High | 6.25% | 0.07% | 0.54% | 45.21% | 11.19% |
| Very High | 6.25% | 0.07% | 0.54% | 0.21% | 88.55% |

Table 3 Accuracy and F1 score of 10-fold cross validation

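| Fold | Accuracy | F1-score |
|---|---|---|
| 1 | 0.7942 | 0.7718 |
| 2 | 0.8457 | 0.7694 |
| 3 | 0.8624 | 0.7711 |
| 4 | 0.7932 | 0.7682 |
| 5 | 0.7844 | 0.7951 |
| 6 | 0.8260 | 0.7719 |
| 7 | 0.7962 | 0.7826 |
| 8 | 0.7846 | 0.7637 |
| 9 | 0.8521 | 0.7842 |
| 10 | 0.7849 | 0.7697 |
| Mean | 0.8124 | 0.7748 |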

Table 4 Classification of the vulnerability of the COVID-19 outbreak in Bangkok

| Level | Number of Grids | Area (square kilometers) | Percentage |
|---|---|---|---|
| Low | 21,125 | 845 | 53.82 |
| Moderate | 16,500 | 660 | 42.04 |
| High | 1,099 | 44 | 2.80 |
| Very High | 526 | 21 | 1.34 |

4.5 BN Prediction Model Performance

A 10-fold cross validation method was used to further test the BN model. The results are summarized in Table 3. The accuracy of the individual folds varied between 0.7844 and 0.8624. The small differences in performance across folds suggest that overfitting of the BN model is minimal. The average accuracy of the model is 0.8124, which means the model makes a correct prediction 81.24% of the time. The average F1 score, the harmonic mean of precision and recall, is 0.7748, which indicates that the model identifies the morbidity levels with reasonable precision and recall. Considering the ROC curves in Figure 6, the model has high overall efficiency, with an average AUC of 0.90, and the AUC for every level of morbidity is higher than 0.85. This means that the model can efficiently distinguish each level from the other levels. Therefore, the BN model performs quite consistently in predicting morbidity.
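The summary figures quoted above can be reproduced directly from the fold-level values in Table 3. The short Python check below uses only those published numbers; the variable names are illustrative.

```python
from statistics import mean

# Fold-level results copied from Table 3.
accuracy = [0.7942, 0.8457, 0.8624, 0.7932, 0.7844,
            0.8260, 0.7962, 0.7846, 0.8521, 0.7849]
f1_score = [0.7718, 0.7694, 0.7711, 0.7682, 0.7951,
            0.7719, 0.7826, 0.7637, 0.7842, 0.7697]

print(round(mean(accuracy), 4), round(mean(f1_score), 4))  # 0.8124 0.7748
print(min(accuracy), max(accuracy))                        # 0.7844 0.8624
```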

4.6 Predictive Vulnerable Map

By transferring the posterior probabilities of the COVID-19 pandemic from the BN model into GIS, the levels of vulnerability to COVID-19 were mapped as shown in Figure 7, and the predicted values for each vulnerability class are summarized in Table 4. Just over half of the study area (53.82 percent) was classified as having low vulnerability, and 42.04 percent as moderately vulnerable, indicating that these areas were less likely to be affected; however, 2.80 percent was determined to be highly vulnerable and another 1.34 percent very highly vulnerable.
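The areas and percentages in Table 4 follow from the grid counts and the 200 m × 200 m cell size. The Python sketch below recomputes them from the published counts as a consistency check; the variable names are illustrative.

```python
CELL_AREA_KM2 = 0.2 * 0.2  # one 200 m x 200 m grid cell, in square kilometers

# Grid counts per vulnerability level, copied from Table 4.
grid_counts = {"low": 21125, "moderate": 16500, "high": 1099, "very high": 526}
total_cells = sum(grid_counts.values())

summary = {
    level: (round(count * CELL_AREA_KM2), round(100 * count / total_cells, 2))
    for level, count in grid_counts.items()
}
print(total_cells, round(total_cells * CELL_AREA_KM2))  # 39250 1570
print(summary)
```

The recomputed totals match the 39,250 cells and roughly 1,570 square kilometers stated for the study area.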

Figure 6 ROC curves comparing different levels of vulnerability prediction

Figure 7 Model of vulnerability to the COVID-19 outbreak in Bangkok

The COVID-19 vulnerability level in the center of Bangkok was relatively higher than in the surrounding districts, while the northern and eastern sectors, which have little built-up property, appeared less vulnerable.

4.7 Sensitivity Analysis

According to Figure 8, the sensitivity analysis showed that population density and the aerosol index contribute much more to predicting pandemic vulnerability than the density of the elderly, the density of crowded places, transportation network density, and poverty rate.

Figure 8 Sensitivity analysis of individual nodes on the probability of the COVID-19 outbreak

Figure 9 Areas vulnerable to the outbreak of COVID-19 under scenario 1 when population density was at a low level

Figure 10 Areas vulnerable to the outbreak of COVID-19 under scenario 2 when the aerosol index was at a very high level

4.8 Scenario Analysis

The scenario analysis took the factors that most influence the outbreak and used them to design two scenarios. It aimed to identify changes in the probability distribution of the relevant variables. The following scenarios were analyzed:

The first was an optimistic scenario, with population density at the lowest level. As shown in Figure 9, if the population density in the area were controlled, the probability of virus outbreaks would be reduced. Under this scenario, the vulnerability level dropped overall, especially in the inner parts of Bangkok. The normal vulnerability levels in this region were often high, reaching 0.51 percent in Pathum Wan district. Meanwhile, the forecast COVID-19 vulnerability in other districts remained relatively unchanged.

The second was a pessimistic scenario, in which the aerosol index was set to a very high level. As displayed in Figure 10, the situation would deteriorate severely if the amount of aerosols increased. The entire region recorded an elevated vulnerability level: 60.19 percent of the area had moderate vulnerability and 37.01 percent had high vulnerability. Nonetheless, the inner region remained the most vulnerable, with a slight increase from its previous level.

5. Discussion

As a means of making probabilistic inferences about the COVID-19 pandemic, BN offers several specific advantages over the traditional approach, especially when operated within the context of GIS. BN supports model construction based on a real-world training dataset without requiring expert opinions, which reduces bias and avoids problems of inadequate system knowledge. When the performance of the BN prediction model was evaluated with 10-fold cross validation, the model had an accuracy of 801.24 percent, an average F1 score of 0.7748, and an aggregate AUC of 0.90. These results indicated that the model has good predictive performance. They therefore support the BN model as a feasible and reasonable method of assessing outbreak vulnerability, in line with Fuster-Parra et al. (2016) and Ruangudomsakul et al. (2018). Heterogeneous data sources can be integrated into the BN model to represent the causal probabilistic relationships among a set of random variables. This study found that population density is the most influential factor in the COVID-19 outbreak, a finding similar to those reported for Malaysia (Ganasegeran et al., 2021), Mexico (Benita and Gasca-Sanchez, 2021), Italy (Ilardi et al., 2021) and Indonesia (Widiawaty et al., 2022). However, the findings contradicted studies of outbreaks in China, where population density did not affect the spread of COVID-19 under strict lockdown policies (Sun et al., 2020), and in several cities in the USA, where the difference was attributed to superior health care systems (Hamidi et al., 2020). In addition, aerosols, which act as a carrier for airborne transmission of the virus, are also a factor that correlates with the COVID-19 outbreak. This result was consistent with Hassan et al. (2021). Integrating the probabilistic approach into GIS allowed the researchers to quantify and visualize uncertainties in a spatially explicit manner and to locate areas vulnerable to the COVID-19 outbreak (Figure 7).

The study results revealed that central Bangkok was the most vulnerable area, especially Pathum Wan, Ratchathewi and Pom Prap Sattru Phai districts. Vulnerability decreased as the distance from the urban center increased. These results were consistent with the actual morbidity distribution (Figure 3), which indicated that the model was spatially accurate. The BN model can also update the predicted likelihood of a COVID-19 outbreak by simulating scenarios in which the variables change. The study results enabled the researchers to propose two approaches to handling the COVID-19 pandemic situation: (1) mitigation strategies, which aim to control population density by minimizing close contact between individuals, for example through social distancing measures, which have been very effective in reducing outbreaks in many areas (Girum et al., 2021), and lockdowns, as well as through city planning law, such as allocating public utilities in proportion to the population density of each area; and (2) environmental control strategies, which involve introducing laws to control aerosol-generating activities in order to eliminate carriers of disease transmission, and reviewing city planning to improve air circulation in the area. This strategy should be strictly implemented and enforced, especially in the inner-city and southwestern areas, where outbreak vulnerability is very high, to keep epidemic clusters from spreading to other parts of the area. Nevertheless, one limitation of this study is data availability. Some potentially valuable factors, such as the literacy rate and the percentage of people with health insurance, were unavailable due to a lack of individual or district level data. The locations of COVID-19 infected patients would also have been beneficial as training and testing data. Unfortunately, these data were not available at the time of the study owing to privacy concerns.

There is, however, always scope for improvement. Future researchers could apply additional structure learning or parameter learning algorithms to derive such a COVID-19 vulnerability model as the data are updated. More characteristics, such as ethnic variables, healthcare facilities, and other additional indicators, could strengthen the study. Subsequent studies could also include spatio-temporal components or comparisons between places to provide more detailed results.
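For readers who wish to reproduce the kind of evaluation discussed in this section, the following minimal sketch shows how 10-fold test indices, accuracy, the F1 score, and the AUC can be computed for a two-class problem. The labels and scores are invented toy data, and the two-class setup is an assumption made for brevity; it does not reproduce the averaging across vulnerability classes used in this study.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous test folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Probability that a random positive case outranks a random negative case (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: true classes and predicted probabilities of the positive class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

folds = k_fold_indices(25, k=10)
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
print(acc, f1, area)
```

In a full cross validation, the model would be trained on the nine folds not held out, scored on the remaining fold, and the metrics pooled or averaged over the ten repetitions.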

6. Conclusion

The BN-GIS model was developed in this study to estimate the probability of COVID-19 outbreak vulnerability using an epidemiological dataset. The BN quantifies the impact of causal factors on the probability of contracting the virus, and the model was validated with 10-fold cross validation and an ROC curve test. The case study revealed that the developed BN model could successfully predict the likelihood of an outbreak. Based on the spatial distribution of different vulnerability levels in GIS, it was found that COVID-19 outbreak vulnerability in Bangkok decreased from the central area to the surrounding areas. The sensitivity analysis results revealed that population density and the aerosol index had the highest impact on the COVID-19 outbreak in Bangkok. Furthermore, the model was used to conduct scenario analysis to evaluate the impact of potential future situations so that area vulnerability can be managed more effectively. Therefore, the model proved itself to be worthwhile and beneficial for this investigation.

References

Ali, T., Mortula, M. and Sadiq, R., 2020, GIS-based Vulnerability Analysis of the United States to COVID-19 Occurrence. Journal of Risk Research, Vol. 24(3-4), 416-431. https://doi.org/10.1080/13669877.2021.1881991.

Alsuwat, E., Alzahrani, S. and Alsuwat, H., 2021, Detecting COVID-19 Utilizing Probabilistic Graphical Models. International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 12(6). http://dx.doi.org/10.14569/IJACSA.2021.0120692.

Barbour, N., Menon, N. and Mannering, F., 2021, A Statistical Assessment of Work-From-Home Participation During Different Stages of the COVID-19 Pandemic. Transportation Research Interdisciplinary Perspectives, Vol. 11, https://doi.org/10.1016/j.trip.2021.100441.

BayesFusion, 2020, GeNIe 3.0.R.2. Pittsburgh: BayesFusion.

Benita, F. and Gasca-Sanchez, F., 2021, The Main Factors Influencing COVID-19 Spread and Deaths in Mexico: A Comparison between Phase I and II. Applied Geography, Vol. 134, https://doi.org/10.1016/j.apgeog.2021.102523.

Choerunnisa, D. N., Maula, F. K. and Iman, H. K., 2020, The Vulnerability of COVID-19 Pandemic Based on Urban Density (A Case Study of the Core Urban Area in Cirebon City, West Java). IOP Conference Series: Earth and Environmental Science, Vol. 592, https://doi.org/10.1088/1755-1315/592/1/012036.

Conticini, E., Frediani, B. and Caro, D., 2020, Can Atmospheric Pollution be Considered a Co-Factor in Extremely High Level of SARS-CoV-2 Lethality in Northern Italy?. Environmental Pollution, Vol. 261, https://doi.org/10.1016/j.envpol.2020.114465.

Cutrini, E. and Salvati, L., 2021, Unraveling Spatial Patterns of COVID-19 in Italy: Global Forces and Local Economic Drivers. Regional Science Policy & Practice, Vol. 13(S1), 73-108. https://doi.org/10.1111/rsp3.12465.

DGA, 2021, Daily Report of COVID-19 in Thailand. Retrieved from https://data.go.th/en/dataset/covid-19-daily.

ESRI, 2019, ArcGIS Desktop: Release 10.8. Redlands: Environmental Systems Research Institute.

Fan, L., Zhang, Z., Yin, J. and Wang, X., 2019, The Efficiency Improvement of Port State Control Based on Ship Accident Bayesian Networks. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, Vol. 233(1), 71-83. https://doi.org/10.1177/1748006X18811199.

Feizizadeh, B., Jankowski, P. and Blaschke, T., 2014, A GIS-based Spatially-Explicit Sensitivity and Uncertainty Analysis Approach for Multi-Criteria Decision Analysis. Computers & Geosciences, Vol. 64, 81-95. https://doi.org/10.1016/j.cageo.2013.11.009.

Fuster-Parra, P., Tauler, P., Bennasar-Veny, M., Ligęza, A., López-González, A. A. and Aguiló, A., 2016, A Bayesian Network Modeling: A Case Study of An Epidemiologic System Analysis of Cardiovascular Risk. Computer Methods and Programs in Biomedicine, Vol. 126, 128-142. https://doi.org/10.1016/j.cmpb.2015.12.010.

Ganasegeran, K., Jamil, M. F. A., Ch'ng, A. S. H. and Looi, I., 2021, Influence of Population Density for COVID-19 Spread in Malaysia: An Ecological Study. International Journal of Environmental Research and Public Health, Vol. 18(18), https://doi.org/10.3390/ijerph18189866.

Girum, T., Lentiro, K., Geremew, M., Migora, B., Shewamare, S. and Shimbre, M. S., 2021, Optimal Strategies for COVID-19 Prevention from Global Evidence Achieved through Social Distancing, Stay at Home, Travel Restriction and Lockdown: A Systematic Review. Archives of Public Health, Vol. 79, https://doi.org/10.1186/s13690-021-00663-8.

Hamidi, S., Ewing, R. and Sabouri, S., 2020, Longitudinal Analyses of the Relationship Between Development Density and the COVID-19 Morbidity and Mortality Rates: Early Evidence from 1,165 Metropolitan Counties in the United States. Health and Place, Vol. 64, https://doi.org/10.1016/j.healthplace.2020.102378.

Hassan, M. S., Bhuiyan, M. A. H., Tareq, F., Bodrud-Doza, M., Tanu, S. M. and Rabbani, K. A., 2021, Relationship between COVID-19 Infection Rates and Air Pollution, Geo-Meteorological, and Social Parameters. Environmental Monitoring and Assessment, Vol. 193, https://doi.org/10.1007/s10661-020-08810-4.

Huang, J., Kwan, M. P., Kan, Z., Wong, M. S., Kwok, C. Y. T. and Yu, X., 2020, Investigating the Relationship between the Built Environment and Relative Risk of COVID-19 in Hong Kong. ISPRS International Journal of Geo-Information, Vol. 9(11), https://doi.org/10.3390/ijgi9110624.

Huang, S., Wang, H., Xu, Y., She, J. and Huang, J., 2021, Key Disaster-Causing Factors Chains on Urban Flood Risk Based on Bayesian Network. Land, Vol. 10(2). https://doi.org/10.3390/land10020210.

Ilardi, A., Chieffi, S., Iavarone, A. and Ilardi, C. R., 2021, SARS-CoV-2 in Italy: Population Density Correlates with Morbidity and Mortality. Japanese Journal of Infectious Diseases, Vol. 74(1), 61-64. https://doi.org/10.7883/yoken.JJID.2020.200.

Jerrett, M., Gale, S. and Kontgis, C., 2010, Spatial Modeling in Environmental and Public Health Research. International Journal of Environmental Research and Public Health, Vol. 7(4), 1302-1329. https://doi.org/10.3390/ijerph7041302.

Kanga, S., Meraj, G., Farooq, M., Nathawat, M.S. and Singh, S. K., 2021, Analyzing the Risk to COVID-19 Infection Using Remote Sensing and GIS. Risk Analysis, Vol. 41(5), 801-813. https://doi.org/10.3390/ijerph7041302.

Kelangath, S., Das, P. K., Quigley, J. and Hirdaris, S. E., 2012, Risk Analysis of Damaged Ships – A Data-Driven Bayesian Approach. Ship and Offshore Structures, Vol. 7(3), 333-347. https://doi.org/10.1080/17445302.2011.592358.

Khashoggi, B. F. and Murad, A., 2020, Issues of Healthcare Planning and GIS: A Review. ISPRS International Journal of Geo-Information, Vol. 9(6), https://doi.org/10.3390/ijgi9060352.

Kianfar, N. and Mesgari, M. S., 2022, GIS-based Spatio-Temporal Analysis and Modeling of COVID-19 Incidence Rates in Europe. Spatial and Spatio-temporal Epidemiology, Vol. 41, https://doi.org/10.1016/j.sste.2022.100498.

Kwok, C. Y. T., Wong, M. S., Chan, K. L., Kwan, M. P., Nichol, J. E., Liu, C. H., Wong, J. Y. H., Wai, A. K. C., Chan, L. W. C., Xu, Y., Li, H., Huang, J. and Kan, Z., 2021, Spatial Analysis of the Impact of Urban Geometry and Socio-Demographic Characteristics on COVID-19, A Study in Hong Kong. Science of the Total Environment, Vol. 764, https://doi.org/10.1016/j.scitotenv.2020.144455.

Li, M., Hong, M. and Zhang, R., 2018, Improved Bayesian Network-Based Risk Model and its Application in Disaster Risk Assessment. International Journal of Disaster Risk Science, Vol. 9, 237-248. https://doi.org/10.1007/s13753-018-0171-z.

Li, W., Zhao, S. C., Ji, X. F. and Ma, J. W., 2020, Impact of Traffic Exposure and Land Use Patterns on the Risk of COVID-19 Spread at the Community Level. China Journal of Highway and Transport, Vol. 33(11), 43-54. https://doi.org/10.19721/j.cnki.1001-7372.2020.11.005.

Mahato, R., Bushi, D. and Nimasow, G., 2020, AHP and GIS-Based Risk Zonation of COVID-19 in North East India. Current World Environment, Vol. 15(3), 640-652. http://dx.doi.org/10.12944/CWE.15.3.29.

Malakar, S., 2021, Geospatial Modelling of COVID-19 Vulnerability Using an Integrated Fuzzy MCDM Approach: A Case Study of West Bengal, India. Modeling Earth Systems and Environment, 1-14. https://doi.org/10.1007/s40808-021-01287-1.

Malczewski, J., 2000, On the Use of Weighted Linear Combination Method in GIS: Common and Best Practice Approaches. Transactions in GIS, Vol. 4(1), 5-22. https://doi.org/10.1111/1467-9671.00035.

Malczewski, J., 2006, GIS-based Multicriteria Decision Analysis: A Survey of the Literature. International Journal of Geographical Information Science, Vol. 20(7), 703-726. https://doi.org/10.1080/13658810600661508.

Marcot, B. G. and Hanea, A. M., 2021, What is an Optimal Value of k in K-Fold Cross-Validation in Discrete Bayesian Network Analysis?. Computational Statistics, Vol. 36, 2009-2031. https://doi.org/10.1007/s00180-020-00999-9.

Martelletti, L. and Martelletti, P., 2020, Air Pollution and The Novel COVID-19 Disease: A Putative Disease Risk Factor. SN Comprehensive Clinical Medicine, Vol. 2(4), 383-387. https://doi.org/10.1007/s42399-020-00274-4.

Mokhtari, R. and Jahangir, M. H., 2021, The Effect of Occupant Distribution on Energy Consumption and COVID-19 Infection in Buildings: A Case Study of University Building. Building and Environment, Vol. 190, https://doi.org/10.1016/j.buildenv.2020.107561.

Mollalo, A., Vahedi, B. and Rivera, K. M., 2020, GIS-Based Spatial Modeling of COVID-19 Incidence Rate in the Continental United States. Science of the Total Environment, Vol. 728, https://doi.org/10.1016/j.scitotenv.2020.138884.

Mouratidis, K., 2021, How COVID-19 Reshaped Quality of Life in Cities: A Synthesis and Implications for Urban Planning. Land Use Policy, Vol. 111, https://doi.org/10.1016/j.landusepol.2021.105772.

Mourmouris, P., Tzelves, L., Roidi, C. and Fotsali, A., 2021, COVID-19 Transmission: A Rapid Systematic Review of Current Knowledge. OSONG Public Health and Research Perspectives, Vol. 12(2), 54-63. https://doi.org/10.1016/j.landusepol.2021.105772.

Neil, M., Fenton, N., Osman, M. and McLachlan, S., 2020, Bayesian Network Analysis of COVID-19 Data Reveals Higher Infection Prevalence Rates and Lower Fatality Rates than Widely Reported. Journal of Risk Research, Vol. 23(7-8), 866-879. https://doi.org/10.1080/13669877.2020.1778771.

Notifications of the Ministry of Public Health Re: Names and Significant Symptoms of Dangerous Communicable Disease (No. 3) 2020. (2020, 29 February). Government Gazette. Vol. 137, Special Section 48 D.

Pourghasemi, H. R., Pouyan, S., Heidari, B., Farajzadeh, Z., Fallah Shamsi, S. R., Babaei, S., Khosravi, R., Etemadi, M., Ghanbarian, G., Farhadi, A., Safaeian, R., Heidari, Z., Tarazkar, M. H., Tiefenbacher, J. P., Azmi, A. and Sadeghian, F., 2020, Spatial Modeling, Risk Mapping, Change Detection, and Outbreak Trend Analysis of Coronavirus (COVID-19) in Iran (days between February 19 and June 14, 2020). International Journal of Infectious Diseases: IJID: Official Publication of the International Society for Infectious Diseases, Vol. 98, 90-108. https://doi.org/10.1016/j.ijid.2020.06.058.

Rahman, M. R., Islam, A. H. M. H. and Islam, M. N., 202, Geospatial Modelling on the Spread and Dynamics of 154 Day Outbreak of the Novel Coronavirus (COVID-19) Pandemic in Bangladesh Towards Vulnerability Zoning and Management Approaches. Modeling Earth Systems and Environment, Vol. 7(3), 2059-2087. https://doi.org/10.1007/s40808-020-00962-z.

Ramadan, R. H. and Ramadan, M. S., 2021, Prediction of Highly Vulnerable Areas to COVID-19 Outbreaks Using Spatial Model: Case Study of Cairo Governorate, Egypt. The Egyptian Journal of Remote Sensing and Space Sciences, Vol. 25(1), 233-247. https://doi.org/10.1016/j.ejrs.2021.08.003.

Razavi-Termeh, S. V., Sadeghi-Niaraki, A., Farhangi, F. and Choi, S. M., 2021, COVID-19 Risk Mapping with Considering Socio-Economic Criteria Using Machine Learning Algorithms. International Journal of Environmental Research and Public Health, Vol. 18(18), https://doi.org/10.3390/ijerph18189657.

Robert, H. F. and Thomas, A. S., 2020, Epidemiology for Public Health Practice. (6th ed). Burlington: Jones & Bartlett Learning.

Ruangudomsakul, C., Duangsin, A., Kerdprasop, K. and Kerdprasop, N., 2018, Application of Remote Sensing Data for Dengue Outbreak Estimation Using Bayesian Network. International Journal of Machine Learning and Computing, Vol. 8(4), 394-398. https://doi.org/10.18178/ijmlc.2018.8.4.718.

Ruggieri, A., Stranieri, F., Stella, F. and Scutari, M., 2020, Hard and Soft EM in Bayesian Network Learning from Incomplete Data. Algorithms, Vol. 13(12), https://doi.org/10.3390/a13120329.

Sahu, S. K., Mangaraj, P., Beig, G., Tyagi, B., Tikle, S. and Vinoj, V., 2021, Establishing a Link Between Fine Particulate Matter (PM2.5) Zones and COVID-19 Over India Based on Anthropogenic Emission Sources and Air Quality Data. Urban Climate, Vol. 38, https://doi.org/10.1016/j.uclim.2021.100883.

Shadeed, S. and Alawna, S., 2021, GIS-based COVID-19 Vulnerability Mapping in the West Bank, Palestine. International Journal of Disaster Risk Reduction, Vol. 64, https://doi.org/10.1016/j.ijdrr.2021.102483.

Sun, Z., Zhang, H., Yang, Y., Wan, H. and Wang, Y., 2020, Impacts of Geographic Factors and Population Density on the COVID-19 Spreading Under the Lockdown Policies of China. Science of the Total Environment, Vol. 746, https://doi.org/10.1016/j.scitotenv.2020.141347.

Tantrakarnapa, K. and Bhopdhornangkul, B., 2020, Challenging the Spread of COVID-19 in Thailand. One Health, Vol. 11, https://doi.org/10.1016/j.onehlt.2020.100173.

Tomikawa, S., Niwa, Y., Lim, H. and Kida, M., 2021, The Impact of “COVID-19 Life” on the Tokyo Metropolitan Area Households with Primary School-Aged Children: A Study Based on Spatial Characteristics. Journal of Urban Management, Vol. 10(2), 139-154. https://doi.org/10.1016/j.jum.2021.03.003.

Wang, H., Rish, I. and Ma, S., 2020, Using Sensitivity Analysis For Selective Parameter Update in Bayesian Network Learning. Proceedings of the AAAI Spring Symposium on Information Refinement and Revision for Decision Making: Modeling for Diagnostics, Prognostics and Prediction, 29-36.

Wei, J., Li, Y. and Nie, Y., 2020, A Risk Assessment System of COVID-19 Based On Bayesian Inference. Journal of Physics: Conference Series, Vol. 1634(1), 1-7, https://doi.org/10.1088/1742-6596/1634/1/012084.

WHO, 2021, WHO Coronavirus (COVID-19) Dashboard. Retrieved from https://covid19.who.int/

Widiawaty, M. A., Lam, K. C., Dede, M. and Asnawi, N. H., 2022, Spatial Differentiation and Determinants of COVID-19 in Indonesia. BMC Public Health, Vol. 22, https://doi.org/10.1186/s12889-022-13316-4.

Wixted, J. T., Mickes, L., Wetmore, S. A., Gronlund, S. D. and Neuschatz, J. S., 2017, ROC Analysis in Theory and Practice. Journal of Applied Research in Memory and Cognition, Vol. 6(3), 343-351. https://doi.org/10.1016/j.jarmac.2016.12.002.

Wu, Y., Foley, D., Ramsay, J., Woodberry, O., Mascaro, S., Nicholson, A. and Snelling, T., 2021, Bridging the Gaps in Test Interpretation of SARS-CoV-2 through Bayesian Network Modelling. Epidemiology and Infection, Vol. 149, https://doi.org/10.1017/S0950268821001357.

Zhang, X., Ji, Z., Yue, Y., Liu, H. and Wang, J., 2021, Infection Risk Assessment of COVID-19 through Aerosol Transmission: A Case Study of South China Seafood Market. Environmental Science & Technology, Vol. 55(7), 4123-4133. https://doi.org/10.1021/acs.est.0c02895.