International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(4): 389-398
Published online December 25, 2023
https://doi.org/10.5391/IJFIS.2023.23.4.389
© The Korean Institute of Intelligent Systems
Herlawati Herlawati1,2, Edi Abdurachman1, Yaya Heryadi1, and Haryono Soeparno1
1Department of Computer Science, Bina Nusantara University, Jakarta, Indonesia
2Department of Informatics, Universitas Bhayangkara Jakarta Raya, Jakarta, Indonesia
Correspondence to: Herlawati Herlawati (herlawati@binus.ac.id)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The availability of land-cover segmentation and classification maps at multiple time frames is crucial for spatial and regional planning. At present, remote sensing and geographic information system practitioners rely on object-based image analysis for land-cover segmentation/classification. Although deep learning methods are available, their use with satellite imagery datasets remains limited. DeepLabV3+ and U-Net are popular methods owing to their accuracy and speed. In this study, we propose a method for enhancing the accuracy of DeepLabV3+ to closely match ground truth datasets by integrating the normalized difference vegetation index (NDVI), normalized difference built-up index (NDBI), and normalized difference water index (NDWI) on the decoder side to correct land-cover segmentation. Testing of the proposed method in Karawang Regency, West Java, Indonesia demonstrated a 0.3% improvement in accuracy when the NDVI, NDBI, and NDWI were incorporated on the output side of DeepLabV3+.
Keywords: DeepLabV3+, U-Net, Semantic segmentation, Atrous spatial pyramid pooling, Multispectral dataset
Planners rely on satellite imagery to analyze land-cover conditions, which generates large amounts of data that require effective models for processing useful information. Methods include land-cover segmentation, land-cover change prediction, and clustering [1]. Remote sensing and geographic information system (RS-GIS) practitioners have traditionally used statistical methods such as object-based image analysis (OBIA), which are accurate but time-consuming and require skilled users [2, 3]. Therefore, several researchers have developed deep neural network-based methods for image segmentation, such as SegNet [4], U-Net [5], PSPNet [6], and DeepLab [7], which use various basic deep learning (DL) methods including the convolutional neural network (CNN), ResNet, MobileNet, Inception, and Xception. Most of these methods use benchmark datasets that are not obtained from satellite imagery, and most of these are RGB and not multispectral datasets.
Several researchers have attempted to enhance the accuracy and processing speed of deep neural networks by incorporating hybrid methods or by adding specific components to their network layers, as demonstrated in [8–19]. However, most proposed models use benchmark datasets that comprise solely non-satellite imagery and unmanned aerial vehicle (UAV) images, which have limited coverage areas, such as Indian Pines [20, 21], Hamlin Beach State (RIT) Datasets [22], and Eurosat [23]. These datasets may not offer sufficient coverage for urban planners who require data encompassing city/district-wide areas.
Satellites such as Sentinel, Landsat, Terra, and Envisat carry remote sensing instruments whose imagery has multispectral characteristics covering a broader range of frequencies, which is beneficial for segmentation and classification. Composite images comprising various band combinations, such as the normalized difference vegetation index (NDVI), green NDVI (GNDVI), normalized difference built-up index (NDBI), normalized difference water index (NDWI), and optimized soil-adjusted vegetation index (OSAVI), have advantages in detecting certain objects, such as plant diseases [24–26], and are frequently employed by researchers in segmentation. A drone or UAV is required to capture multispectral images, along with data preparation techniques such as orthomosaic and cropping methods.
The integration of DL methods with remote sensing technology offers the potential to enhance land-cover segmentation and classification performance significantly. Therefore, this study employed both DL methods and RS-GIS techniques. A multispectral dataset from Karawang, West Java, Indonesia, which was downloaded from Landsat 8, was used as a case study. Two baseline methods, namely U-Net and DeepLabV3+ [27], were compared with our proposed method. This study contributes to improving the performance of segmentation models by integrating DL models with spectral features; specifically, normalized satellite indices.
This study discusses the current popular models for land-cover segmentation, including U-Net and DeepLabV3+. We describe the process of preparing the datasets for training, validation, and testing and explain the framework for improving land-cover segmentation methods. Subsequently, we evaluate the accuracy of the proposed method using a simple prototype. Finally, the paper concludes with a summary of the experimental results.
The datasets used in this study were Landsat 8 images downloaded from the United States Geological Survey website [28]. The study area selected was around Karawang, West Java, Indonesia (107°02′–107°40′ E and 5°56′–6°34′ S) (Figure 1), with a capture date of May 11, 2021. Landsat 8 has a resolution of 30 m and is categorized as Level 1 of the land-cover classification standard, which includes urban land, agricultural land, rangeland, forestland, water, barren land, tundra, and ice, according to the Anderson classification system [29]. Three classes were selected for the experiment: urban, vegetation (combined agricultural land and forestland), and water.
Landsat 8 imagery offers 11 bands; however, in this study, only six bands were used as input data: Band 2 (blue), Band 3 (green), Band 4 (red), Band 5 (near-infrared), Band 6 (short-wave infrared-1), and Band 7 (short-wave infrared-2). These six bands were sufficient to meet the requirements for land-cover segmentation and classification [30].
Various geoprocessing tasks were conducted to match the study areas. These tasks included cropping Landsat tiles based on the study area, which is the Karawang District, dissolving them, and converting them into ready-to-use datasets in the MAT-file format. Suitable images were obtained from these tasks (Figure 2). Three datasets were created, namely training, validation, and testing data, which were evenly distributed to represent each segment class. These three datasets were used in the training, validation, and testing phases of the DL model for segmentation.
Two baseline semantic segmentation models, namely U-Net and DeepLabV3+, were used in this study. The tools for the experiment included ArcGIS, TerrSet, and the MATLAB programming language. To evaluate the model, the ground truth of Karawang District was prepared semi-automatically using the OBIA method and validated with local government documents.
Figure 3 shows the encoder and decoder structures of U-Net and DeepLabV3+. The encoder performs feature extraction/downsampling on the input side. In U-Net, the encoder blocks are concatenated with decoders of the same size, whereas in DeepLabV3+, this occurs only at a 4
The proposed method integrates DeepLabV3+ with normalized satellite indices, including NDVI, NDBI, and NDWI. Figure 4 depicts the proposed framework, which involves constructing the DeepLabV3+ model and using three datasets for training, validation, and testing. The U-Net and DeepLabV3+ models were trained and tested using the same datasets for comparison. The proposed method not only performs segmentation, but also generates composite images (NDVI, NDBI, and NDWI) to correct the segmentation result.
The training process is computationally intensive and requires significant resources, particularly the graphics processing unit (GPU). Therefore, the framework is divided into two models: the training and evaluation models. The training process may take several hours, whereas the testing process usually takes only 1 to 5 minutes. The results obtained from the training process are stored and subsequently used in the testing and evaluation processes.
The proposed model generates composite maps, followed by matrix conversion from decimals to integers using a certain threshold limit. Following the improvement step, evaluations are conducted with and without NDVI, NDBI, and NDWI corrections for analysis. The composite map generation step uses an appropriate band to calculate the index. For this purpose, the segmented study area must be equipped with multispectral satellite imagery data, particularly in Bands 2 to 6. In summary, the proposed model performs three activities during the prediction process: land-cover segmentation, composite map generation (NDVI, NDBI, and NDWI), and correction of the segmentation results to obtain the final land-cover segmentation.
The NDVI, NDBI, and NDWI are generated using the corresponding spectral bands based on
The blue, green, red, near-infrared, and short-wave infrared bands correspond to Bands 2, 3, 4, 5, and 6 of Landsat 8, respectively. For other satellites (e.g., Sentinel and Terra), the band numbers must be remapped to the bands that cover the same spectral ranges as these Landsat 8 channels.
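The index equations themselves are not reproduced in this text, so the following is only a minimal sketch that assumes the commonly used definitions: NDVI = (NIR − Red)/(NIR + Red), NDBI = (SWIR1 − NIR)/(SWIR1 + NIR), and NDWI = (Green − NIR)/(Green + NIR). The function names, the dictionary keyed by Landsat 8 band number, and the zero-fill for empty pixels are illustrative assumptions, not the authors' MATLAB implementation. Plain Python lists are used to keep the sketch dependency-free.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise for two equally sized 2-D grids."""
    result = []
    for row_a, row_b in zip(a, b):
        out_row = []
        for x, y in zip(row_a, row_b):
            total = x + y
            # Assumed convention: pixels where both bands are zero map to 0.0.
            out_row.append((x - y) / total if total != 0 else 0.0)
        result.append(out_row)
    return result


def compute_indices(bands):
    """Compute the three indices from a dict of Landsat 8 band number -> 2-D grid.

    Only Bands 3 (green), 4 (red), 5 (near-infrared), and 6 (short-wave
    infrared-1) are read. The formulas are the commonly used definitions,
    assumed here because the source equations are not available.
    """
    green, red, nir, swir1 = bands[3], bands[4], bands[5], bands[6]
    return {
        "NDVI": normalized_difference(nir, red),    # vegetation
        "NDBI": normalized_difference(swir1, nir),  # built-up areas
        "NDWI": normalized_difference(green, nir),  # water
    }
```

Each returned grid holds continuous values between −1 and +1, which is why a later thresholding step is needed before the indices can be compared with class labels.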
After the proposed model produces the final land-cover segmentation and classification, the evaluation stage uses a confusion matrix that compares the prediction results with the ground truth data. The accuracy is calculated using

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively. Several additional processes are required to correct the initial land-cover segmentation and classification results with the three normalized satellite indices. As normalized satellite indices have a continuous range from −1 to +1, a process is required to convert the NDVI, NDBI, and NDWI composite images into the appropriate data format.
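As a minimal sketch of the evaluation step, the four counts can be tallied for one class by comparing a predicted label grid with the ground truth, and then combined into an accuracy value. The function names and the integer class labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(predicted, truth, positive_class):
    """Count TP, TN, FP, FN for one class over two equally sized label grids."""
    tp = tn = fp = fn = 0
    for row_p, row_t in zip(predicted, truth):
        for p, t in zip(row_p, row_t):
            pred_pos = p == positive_class
            true_pos = t == positive_class
            if pred_pos and true_pos:
                tp += 1
            elif pred_pos and not true_pos:
                fp += 1
            elif not pred_pos and true_pos:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of correctly labeled pixels among all pixels."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, `accuracy(*confusion_counts(predicted, truth, positive_class=1))` gives the accuracy for the class labeled 1.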
Algorithm 1 was applied to the three indices NDVI, NDBI, and NDWI, which produced three matrices with integer elements. The matrices represent class and non-class segments with values of 1 and 0, respectively. The algorithm allows for an increase or decrease in a class owing to the addition of another segment of the normalized satellite index. The hypothesis of this study is that the use of the NDVI, NDBI, and NDWI can increase the accuracy of identifying vegetation, urban areas, and water.
From the multispectral satellite imagery, each band was cropped according to the study area. The training, validation, and testing datasets (2425
MATLAB was used to perform the training processes for U-Net and DeepLabV3+ [33]. This process requires a powerful GPU and the parallel processing capabilities of MATLAB, which were provided by four processors known as workers. During the training process, graphs were generated to evaluate the model performance. The models exhibited satisfactory accuracy after two epochs and 2,000 iterations, as indicated by the generated graphs.
The results of the training process were saved for use in the testing and evaluation processes. The entire study area was used as the testing dataset. The performances of U-Net and DeepLabV3+ were compared with that of the proposed model, which implemented enhancements using the normalized satellite indices (NDVI, NDBI, and NDWI). As shown in Figure 6 and Table 1, the U-Net model achieved an accuracy of 94.473%, with a processing time of approximately 4 minutes.
Prior to the enhancement process, the normalized satellite indices, which range from −1 to +1, must be converted into integers 0 and 1. The optimal threshold value for the normalized satellite indices was determined for comparison with the proposed approach. Several threshold values were tested and a graph was generated, as shown in Figure 7. Threshold values that resulted in the highest accuracy were used for the proposed model. The threshold values for the NDVI, NDBI, and NDWI were 0.29, 0, and 0.03, respectively.
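The conversion and the threshold search can be sketched as follows. The threshold values are the ones reported above; everything else is an assumption made for illustration: the function names, the use of a strict greater-than comparison, and the scoring of each candidate by pixel agreement with a binary ground-truth mask.

```python
# Threshold values reported in the text for each index.
THRESHOLDS = {"NDVI": 0.29, "NDBI": 0.0, "NDWI": 0.03}


def binarize(index_grid, threshold):
    """Map a continuous index grid to 1 (class) / 0 (non-class).

    Whether the boundary value itself counts as class is an assumption;
    a strict comparison is used here.
    """
    return [[1 if v > threshold else 0 for v in row] for row in index_grid]


def mask_accuracy(mask, truth_mask):
    """Fraction of pixels where the binary mask agrees with the truth mask."""
    total = matches = 0
    for row_m, row_t in zip(mask, truth_mask):
        for m, t in zip(row_m, row_t):
            total += 1
            matches += int(m == t)
    return matches / total


def best_threshold(index_grid, truth_mask, candidates):
    """Return the candidate threshold with the highest agreement (first on ties)."""
    return max(
        candidates,
        key=lambda t: mask_accuracy(binarize(index_grid, t), truth_mask),
    )
```

A call such as `binarize(ndvi_grid, THRESHOLDS["NDVI"])` then yields the integer matrix that the correction step consumes.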
Another experiment was conducted by creating a graphical user interface (GUI) that compared the performance of DeepLabV3+ with the improved DeepLabV3+ models. Figure 7 illustrates that the incorporation of the NDVI, NDBI, and NDWI enhanced the accuracy of DeepLabV3+ by 0.221%, 0.023%, and 0.022%, respectively, compared with the original accuracy of DeepLabV3+ (95.183%). Furthermore, an even higher accuracy of 95.438% was achieved by combining the NDVI, NDBI, and NDWI.
The trained models can be applied to other areas surrounding the study site with similar geographical conditions (two seasons). However, if the segment classes change or new ground truth data become available, the training process can be repeated. The U-Net and DeepLabV3+ models use existing Landsat 8 bands for the feature extraction; however, these bands do not provide information that is specific to the detection of certain segments such as vegetation, built-up areas, or water. Therefore, normalized satellite indices can be incorporated to improve the accuracy of these specific segments. However, owing to limitations in detecting certain features such as forests, agriculture, and wetlands, a DL model is still necessary. The proposed model is also useful for non-multispectral image segmentation, such as that of RGB images, which is usually faster but less accurate. By incorporating normalized satellite indices, the accuracy of the model can be improved with only three bands (red, green, and blue).
To demonstrate the performance of the proposed model, a prototype was created to perform the land-cover segmentation and classification processes (Figure 8). The prototype demonstrated the accuracy of DeepLabV3+ compared to the proposed model, which involved normalized satellite indices.
The experimental results indicated that DeepLabV3+ was slightly more accurate and faster than U-Net, as shown in Table 1. The number of parameters used in this study was based on the success of previous studies using pre-trained MATLAB models [34]. In general, the number of parameters in a DL model is correlated with the inference time; when a model has more parameters, it runs slower. The proposed model had better accuracy than DeepLabV3+ and U-Net both when each normalized satellite index was used individually and when all three indices (NDVI, NDBI, and NDWI) were combined.
The proposed approach, which integrates a normalized satellite index improvement module at the end of the DL model, is applicable to any segmentation model that uses multispectral satellite imagery. Furthermore, the experimental results indicated an increase in accuracy with the NDVI, NDBI, and NDWI, without a significant increase in the inference time. This technique can also be used for RGB images if Landsat 8 images from Bands 2 to 6 are accessible.
The implementation of DL in RS-GIS is currently being carried out by many researchers. Many new methods exhibit good performance, including DeepLabV3+, which is fast and accurate. Despite the potential of remote sensing techniques to enhance segmentation model performance, research using these methods remains limited. In this study, a normalized satellite index approach, which includes the NDVI, NDBI, and NDWI, was used to improve the DeepLabV3+ performance. The proposed model achieved a 0.3% increase in the accuracy of DeepLabV3+ in the research area, demonstrating its feasibility without compromising other performance aspects such as speed. Future studies will explore the use of other normalized satellite indices and modifications to the baseline methods to improve the accuracy. These studies will also focus on complex land-cover segments such as wetlands, barren land, agriculture, and forests.
No potential conflict of interest relevant to this article is reported.
Table 1. Performance of U-Net, DeepLabV3+, and proposed model.
Model | #Params | Accuracy (%) | Size in memory (MB) | Inference (s) |
---|---|---|---|---|
U-Net | 31.0 M | 94.473 | 110.1 | 439.08 |
DeepLabV3+ | 20.6 M | 95.184 | 48.4 | 20.77 |
Proposed model | 20.6 M | **95.438** | 48.4 | 20.77 |
The results in bold text represent the best values.
Algorithm 1. Normalized satellite index improvement.
INPUT: Segmented image, normalized index
New Segmented = Segmented image result + normalized index
For all matrix elements in New Segmented
    If there is a class segment change
        Change to appropriate class of segment
    End
End
OUTPUT: New Segmented
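Algorithm 1 leaves the details of the class change open, so the following is one possible reading rather than the authors' implementation: wherever the binarized index marks a pixel as belonging to its class, the segmentation label at that pixel is set to that class. The integer class codes and the function name are hypothetical.

```python
# Hypothetical integer labels for the three classes used in the study.
URBAN, VEGETATION, WATER = 1, 2, 3


def apply_index_correction(segmented, index_mask, target_class):
    """Relabel pixels flagged by a binary index mask.

    segmented: 2-D grid of class labels from the segmentation model.
    index_mask: 2-D grid of 0/1 values from a thresholded index.
    target_class: label assigned where the mask is 1 (assumed interpretation
    of "change to appropriate class of segment").
    The input grid is copied, not modified in place.
    """
    corrected = [row[:] for row in segmented]
    for r, mask_row in enumerate(index_mask):
        for c, flag in enumerate(mask_row):
            if flag == 1 and corrected[r][c] != target_class:
                corrected[r][c] = target_class
    return corrected
```

Under this reading, a class grows where its index fires on pixels labeled otherwise, and the other classes shrink by the same pixels. Applying the three masks in sequence (for example NDVI with `VEGETATION`, NDBI with `URBAN`, NDWI with `WATER`) makes the result depend on the order, which is a design choice the text does not specify.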
E-mail: herlawati@binus.ac.id
E-mail: edia@binus.ac.id
E-mail: yayaheryadi@binus.edu
E-mail: haryono@binus.edu
Figure 1. Location of study area and datasets.
Figure 2. Preprocessing satellite images into usable datasets.
Figure 3. (a) U-Net and (b) DeepLabV3+.
Figure 4. Proposed model framework.
Figure 5. Training performance of U-Net and DeepLabV3+.
Figure 6. U-Net model experiment.
Figure 7. Optimal threshold for (a) NDVI, (b) NDBI, and (c) NDWI.
Figure 8. Comparison of DeepLabV3+ and proposed model.