Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(4): 389-398

Published online December 25, 2023

https://doi.org/10.5391/IJFIS.2023.23.4.389

© The Korean Institute of Intelligent Systems

Improving DeepLabV3+ Using Normalized Satellite Indices in Land-Cover Segmentation

Herlawati Herlawati1,2 , Edi Abdurachman1, Yaya Heryadi1, and Haryono Soeparno1

1Department of Computer Science, Bina Nusantara University, Jakarta, Indonesia
2Department of Informatics, Universitas Bhayangkara Jakarta Raya, Jakarta, Indonesia

Correspondence to :
Herlawati Herlawati (herlawati@binus.ac.id)

Received: June 30, 2023; Accepted: September 8, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The availability of land-cover segmentation and classification maps at multiple time frames is crucial for designing spatial and regional plans. At present, remote sensing and geographic information system practitioners rely on object-based image analysis for land-cover segmentation/classification. Although deep learning methods are available, their application to satellite imagery datasets remains limited. DeepLabV3+ and U-Net are popular methods owing to their accuracy and speed. In this study, we propose a method for enhancing the accuracy of DeepLabV3+ to more closely match ground truth datasets by integrating the normalized difference vegetation index (NDVI), normalized difference built-up index (NDBI), and normalized difference water index (NDWI) on the decoder side to correct the land-cover segmentation. Testing of the proposed method in Karawang Regency, West Java, Indonesia, demonstrated a 0.3% improvement in accuracy when the NDVI, NDBI, and NDWI were incorporated on the output side of DeepLabV3+.

Keywords: DeepLabV3+, U-Net, Semantic segmentation, Atrous spatial pyramid pooling, Multispectral dataset

1. Introduction

Planners rely on satellite imagery to analyze land-cover conditions, which generates large amounts of data and therefore requires effective models to extract useful information. Methods include land-cover segmentation, land-cover change prediction, and clustering [1]. Remote sensing and geographic information system (RS-GIS) practitioners have traditionally used statistical methods such as object-based image analysis (OBIA), which are accurate but time-consuming and require skilled users [2, 3]. Therefore, several researchers have developed deep neural network-based methods for image segmentation, such as SegNet [4], U-Net [5], PSPNet [6], and DeepLab [7], which use various basic deep learning (DL) methods including the convolutional neural network (CNN), ResNet, MobileNet, Inception, and Xception. Most of these methods use benchmark datasets that are not obtained from satellite imagery, and most of these are RGB rather than multispectral datasets.

Several researchers have attempted to enhance the accuracy and processing speed of deep neural networks by incorporating hybrid methods or by adding specific components to their network layers, as demonstrated in [8–19]. However, most proposed models use benchmark datasets that comprise non-satellite imagery or unmanned aerial vehicle (UAV) images with limited coverage areas, such as Indian Pines [20, 21], the Hamlin Beach State Park (RIT-18) dataset [22], and EuroSAT [23]. These datasets may not offer sufficient coverage for urban planners who require data encompassing city/district-wide areas.

Satellites such as Sentinel, Landsat, Terra, and Envisat carry multispectral remote sensing sensors that cover a broader range of frequencies, which is beneficial for segmentation and classification. Composite images derived from various band combinations, such as the normalized difference vegetation index (NDVI), green NDVI (GNDVI), normalized difference built-up index (NDBI), normalized difference water index (NDWI), and optimized soil-adjusted vegetation index (OSAVI), are advantageous for detecting certain objects, such as plant diseases [24–26], and are frequently employed by researchers in segmentation. By contrast, capturing multispectral images with a drone or UAV requires additional data preparation techniques, such as orthomosaicking and cropping.

The integration of DL methods with remote sensing technology offers the potential to enhance land-cover segmentation and classification performance significantly. Therefore, this study employed both DL methods and RS-GIS techniques. A multispectral dataset from Karawang, West Java, Indonesia, which was downloaded from Landsat 8, was used as a case study. Two baseline methods, namely U-Net and DeepLabV3+ [27], were compared with our proposed method. This study contributes to improving the performance of segmentation models by integrating DL models with spectral features; specifically, normalized satellite indices.

This study discusses the current popular models for land-cover segmentation, including U-Net and DeepLabV3+. We describe the process of preparing the datasets for training, validation, and testing and explain the framework for improving land-cover segmentation methods. Subsequently, we evaluate the accuracy of the proposed method using a simple prototype. Finally, the paper concludes with a summary of the experimental results.

2. Datasets

The datasets used in this study were Landsat 8 images downloaded from the United States Geological Survey website [28]. The study area was around Karawang, West Java, Indonesia (107°02′–107°40′ E and 5°56′–6°34′ S) (Figure 1), with a capture date of May 11, 2021. Landsat 8 has a spatial resolution of 30 m, which corresponds to Level 1 of the Anderson land-cover classification system, comprising urban land, agricultural land, rangeland, forestland, water, barren land, tundra, and ice [29]. Three classes were selected for the experiment: urban, vegetation (combined agricultural land and forestland), and water.

Landsat 8 imagery offers 11 bands; however, only six bands were used as input data in this study: Band 2 (blue), Band 3 (green), Band 4 (red), Band 5 (near infrared), Band 6 (shortwave infrared 1), and Band 7 (shortwave infrared 2). These six bands were sufficient to meet the requirements for land-cover segmentation and classification [30].

Various geoprocessing tasks were conducted to match the study area. These tasks included cropping the Landsat tiles to the study area (Karawang District), dissolving them, and converting them into ready-to-use datasets in the MAT-file format. Suitable images were obtained from these tasks (Figure 2). Three datasets were created, namely training, validation, and testing data, which were evenly distributed to represent each segment class. These three datasets were used in the training, validation, and testing phases of the DL model for segmentation.
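
To make the final preparation step concrete, the following minimal MATLAB sketch stacks cropped Landsat 8 band rasters into a single array and saves it in the MAT-file format. The file and variable names are hypothetical and do not represent the authors' actual pipeline.

% Hypothetical file names for the cropped Karawang band rasters (Bands 2-7).
bandFiles = {'karawang_B2.tif', 'karawang_B3.tif', 'karawang_B4.tif', ...
             'karawang_B5.tif', 'karawang_B6.tif', 'karawang_B7.tif'};

% Read each band and stack them into a rows x cols x 6 multispectral cube.
bands = cellfun(@(f) double(imread(f)), bandFiles, 'UniformOutput', false);
X = cat(3, bands{:});

% Save in the MAT-file format used for the training/validation/testing datasets.
save('karawang_dataset.mat', 'X', '-v7.3');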

3. Model Building

3.1 Baseline Model

Two baseline semantic segmentation models, namely U-Net and DeepLabV3+, were used in this study. The tools for the experiment included ArcGIS, TerrSet, and the MATLAB programming language. To evaluate the model, the ground truth of Karawang District was prepared semi-automatically using the OBIA method and validated with local government documents.

Figure 3 shows the encoder and decoder structures of U-Net and DeepLabV3+. The encoder performs feature extraction/downsampling on the input side. In U-Net, each encoder block is concatenated with the decoder block of the same size, whereas in DeepLabV3+, this occurs only at the 4× upsampling stage. Furthermore, DeepLabV3+ uses atrous spatial pyramid pooling (ASPP) at the end of the feature extraction/downsampling process to improve the detection of objects at multiple scales [31]. DeepLabV3+ employs ResNet-50 as its backbone, which mitigates the vanishing gradient problem and thereby offers an advantage over the plain CNN encoder used in U-Net [32]. In addition, the ResNet characteristics of DeepLabV3+ improve the processing speed compared with the CNNs in U-Net. In this study, both U-Net and DeepLabV3+ used Landsat 8 multispectral datasets as inputs and produced a segmented image as the output.
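
As an illustration only, a DeepLabV3+ network with a ResNet-50 backbone can be constructed in MATLAB (Computer Vision Toolbox) roughly as follows. The tile size is an assumption, and the sketch uses a three-channel input for brevity; the six-band input used in this study would require adapting the image input layer.

% Minimal sketch, not the authors' exact configuration.
imageSize  = [256 256 3];                       % assumed tile size and channel count
classNames = ["urban" "vegetation" "water"];
numClasses = numel(classNames);
lgraph = deeplabv3plusLayers(imageSize, numClasses, "resnet50");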

3.2 Proposed Model

The proposed method integrates DeepLabV3+ with normalized satellite indices, including NDVI, NDBI, and NDWI. Figure 4 depicts the proposed framework, which involves constructing the DeepLabV3+ model and using three datasets for training, validation, and testing. The U-Net and DeepLabV3+ models were trained and tested using the same datasets for comparison. The proposed method not only performs segmentation, but also generates composite images (NDVI, NDBI, and NDWI) to correct the segmentation result.

The training process is computationally intensive and requires significant resources, particularly the graphics processing unit (GPU). Therefore, the framework is divided into two models: the training and evaluation models. The training process may take several hours, whereas the testing process usually takes only 1 to 5 minutes. The results obtained from the training process are stored and subsequently used in the testing and evaluation processes.

The proposed model generates composite maps, and the resulting index matrices are then converted from decimal to integer values using threshold limits. Following this improvement step, evaluations are conducted with and without the NDVI, NDBI, and NDWI corrections for analysis. The composite map generation step uses the appropriate bands to calculate each index. For this purpose, the segmented study area must be accompanied by multispectral satellite imagery data, particularly Bands 2 to 6. In summary, the proposed model performs three activities during the prediction process: land-cover segmentation, composite map generation (NDVI, NDBI, and NDWI), and correction of the segmentation results to obtain the final land-cover segmentation.
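
A minimal MATLAB sketch of the first activity (segmentation inference) is shown below, assuming a trained network net from the training model and the prepared multispectral cube X; both names are assumptions.

% Run the trained network on the multispectral input and keep an integer class map.
C = semanticseg(X, net);          % categorical label map (urban / vegetation / water)
predictedSeg = double(C);         % numeric class indices used by the correction step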

The NDVI, NDBI, and NDWI are generated using the corresponding spectral bands based on Eqs. (1)–(3). The output is a composite image obtained from the fusion of the Landsat 8 spectral bands.

\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}},    (1)

\mathrm{NDBI} = \frac{\mathrm{SWIR} - \mathrm{NIR}}{\mathrm{SWIR} + \mathrm{NIR}},    (2)

\mathrm{NDWI} = \frac{\mathrm{GREEN} - \mathrm{NIR}}{\mathrm{GREEN} + \mathrm{NIR}}.    (3)

The blue, green, red, near-infrared, and shortwave infrared bands correspond to Landsat 8 Bands 2, 3, 4, 5, and 6, respectively. For other satellites (e.g., Sentinel or Terra), the band numbering must be adjusted to the bands that cover the same wavelengths as those of Landsat 8.
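
Assuming the band cube X prepared earlier (channels ordered as Bands 2 to 7), Eqs. (1)–(3) translate directly into element-wise MATLAB operations; the variable names are illustrative.

% Extract the bands needed by Eqs. (1)-(3).
green = X(:,:,2);   red  = X(:,:,3);
nir   = X(:,:,4);   swir = X(:,:,5);

ndvi = (nir - red)   ./ (nir + red);      % Eq. (1): vegetation
ndbi = (swir - nir)  ./ (swir + nir);     % Eq. (2): built-up areas
ndwi = (green - nir) ./ (green + nir);    % Eq. (3): water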

3.3 Model Evaluation

After the proposed model produces the final land-cover segmentation and classification, the evaluation stage uses a confusion matrix that compares the prediction results with the ground truth data. The accuracy is calculated using Eq. (4):

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},    (4)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. Several additional processes are required to correct the initial land-cover segmentation and classification results with the three normalized satellite indices. As the normalized satellite indices have a continuous range from −1 to +1, a process is required to convert the NDVI, NDBI, and NDWI composite images into the appropriate data format.
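
For reference, Eq. (4), generalized to a multi-class confusion matrix, can be computed in MATLAB as follows; groundTruth and predictedSeg are assumed integer label maps of the same size.

% Overall pixel accuracy from the confusion matrix of ground truth vs. prediction.
cm  = confusionmat(groundTruth(:), predictedSeg(:));
acc = sum(diag(cm)) / sum(cm(:));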

Algorithm 1 was applied to the three indices (NDVI, NDBI, and NDWI), producing three matrices with integer elements. The matrices represent class and non-class segments with values of 1 and 0, respectively. The algorithm allows a class to grow or shrink when the segment indicated by a normalized satellite index is added. The hypothesis of this study is that using the NDVI, NDBI, and NDWI can increase the accuracy of identifying vegetation, urban areas, and water.
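
One plausible reading of Algorithm 1 is sketched below in MATLAB, using the index thresholds reported in the experimental results and assumed class codes (1 = vegetation, 2 = urban, 3 = water); the paper does not specify how overlapping index segments are resolved, so the last assignment wins in this sketch.

% Binarize each index with its threshold (values taken from the results section).
vegMask   = ndvi > 0.29;
builtMask = ndbi > 0;
waterMask = ndwi > 0.03;

% Correct the initial segmentation where an index indicates a different class.
newSeg = predictedSeg;
newSeg(vegMask)   = 1;    % NDVI-based correction (vegetation)
newSeg(builtMask) = 2;    % NDBI-based correction (urban)
newSeg(waterMask) = 3;    % NDWI-based correction (water)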

4. Experimental Results and Discussion

From the multispectral satellite imagery, each band was cropped according to the study area. The training, validation, and testing datasets (2425 × 2075 pixels) were converted into the MAT-file format before the training process. Training was performed for two epochs (2,000 iterations). Figure 5 shows the training performance of U-Net and DeepLabV3+; the training process took approximately 8 hours.

MATLAB was used to train both U-Net and DeepLabV3+ [33]. This process requires a powerful GPU and the parallel processing capabilities of MATLAB, which in this study used four parallel workers. During training, graphs were generated to evaluate the model performance. The models exhibited satisfactory accuracy after two epochs and 2,000 iterations, as indicated by the generated graphs.
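
The training configuration is not fully reported; as a rough sketch under the stated constraints (two epochs, parallel execution with four workers), the MATLAB setup might look like the following. The solver and batch size are assumptions.

% parpool(4);                                   % the study used four parallel workers
opts = trainingOptions('sgdm', ...
    'MaxEpochs', 2, ...                         % two epochs, about 2,000 iterations
    'MiniBatchSize', 8, ...                     % assumed batch size
    'ExecutionEnvironment', 'parallel', ...     % GPU/CPU workers
    'Plots', 'training-progress');              % the performance graphs referred to above
% net = trainNetwork(dsTrain, lgraph, opts);    % dsTrain: datastore of image/label pairs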

The results of the training process were saved for use in the testing and evaluation processes. The entire study area was used as the testing dataset. The performance of U-Net and DeepLabV3+ was compared with that of the proposed model, which implemented enhancements using the normalized satellite indices (NDVI, NDBI, and NDWI). As shown in Figure 6 and Table 1, U-Net achieved an accuracy of 94.473%, with a processing time of approximately 4 minutes.

Prior to the enhancement process, the normalized satellite indices, which range from −1 to +1, must be converted into binary values of 0 and 1. The optimal threshold value for each normalized satellite index was therefore determined. Several threshold values were tested and the resulting accuracies were plotted, as shown in Figure 7. The threshold values that yielded the highest accuracy were used in the proposed model: 0.29 for the NDVI, 0 for the NDBI, and 0.03 for the NDWI.
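
The threshold search behind Figure 7 can be sketched as a simple sweep; correctWithMask is a hypothetical helper standing in for the Algorithm 1 correction step, and the sweep range is an assumption.

% Sweep candidate thresholds for one index (NDVI shown; NDBI and NDWI are analogous).
thresholds = -0.5:0.01:0.5;
accs = zeros(size(thresholds));
for k = 1:numel(thresholds)
    vegMask   = ndvi > thresholds(k);
    corrected = correctWithMask(predictedSeg, vegMask, 1);   % hypothetical helper
    cm        = confusionmat(groundTruth(:), corrected(:));
    accs(k)   = sum(diag(cm)) / sum(cm(:));
end
[bestAcc, idx] = max(accs);
bestThreshold  = thresholds(idx);   % reported optima: 0.29 (NDVI), 0 (NDBI), 0.03 (NDWI)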

Another experiment was conducted by creating a graphical user interface (GUI) that compared the performance of DeepLabV3+ with that of the improved DeepLabV3+ model. Figure 7 illustrates that the incorporation of the NDVI, NDBI, and NDWI enhanced the accuracy of DeepLabV3+ by 0.221%, 0.023%, and 0.022%, respectively, compared with the original accuracy of DeepLabV3+ (95.183%). Furthermore, an even higher accuracy of 95.438% was achieved by combining the NDVI, NDBI, and NDWI.

The trained models can be applied to other areas surrounding the study site with similar geographical conditions (two seasons). However, if the segment classes change or new ground truth data become available, the training process can be repeated. The U-Net and DeepLabV3+ models use the existing Landsat 8 bands for feature extraction; however, these bands do not provide information that is specific to the detection of certain segments, such as vegetation, built-up areas, or water. Therefore, normalized satellite indices can be incorporated to improve the accuracy for these specific segments. However, owing to the limitations of the indices in detecting certain features, such as forests, agriculture, and wetlands, a DL model is still necessary. The proposed model is also useful for non-multispectral image segmentation, such as that of RGB images, which is usually faster but less accurate. By incorporating normalized satellite indices, the accuracy of a model that uses only three bands (red, green, and blue) can be improved.

To demonstrate the performance of the proposed model, a prototype was created to perform the land-cover segmentation and classification processes (Figure 8). The prototype displays the accuracy of DeepLabV3+ alongside that of the proposed model, which incorporates the normalized satellite indices.

The experimental results indicated that DeepLabV3+ was slightly more accurate and faster than U-Net, as shown in Table 1. The number of parameters used in this study was based on the success of previous studies using pre-trained MATLAB models [34]. In general, the number of parameters in a DL model is correlated with the inference time: a model with more parameters runs more slowly. The proposed model achieved better accuracy than DeepLabV3+ and U-Net, both when using each normalized satellite index individually and when combining all three indices (NDVI, NDBI, and NDWI).

The proposed approach, which integrates a normalized satellite index improvement module at the end of the DL model, is applicable to any segmentation model that uses multispectral satellite imagery. Furthermore, the experimental results indicated an increase in accuracy with the NDVI, NDBI, and NDWI, without a significant increase in the inference time. This technique can also be used for RGB images if Landsat 8 images from Bands 2 to 6 are accessible.

5. Conclusion

The implementation of DL in RS-GIS is currently being pursued by many researchers. Many new methods exhibit good performance, including DeepLabV3+, which is fast and accurate. Despite the potential of remote sensing techniques to enhance segmentation model performance, research combining the two remains limited. In this study, a normalized satellite index approach comprising the NDVI, NDBI, and NDWI was used to improve the performance of DeepLabV3+. The proposed model achieved a 0.3% increase in the accuracy of DeepLabV3+ in the research area, demonstrating its feasibility without compromising other performance aspects such as speed. Future studies will explore the use of other normalized satellite indices and modifications to the baseline methods to improve the accuracy further. These studies will also focus on complex land-cover segments such as wetlands, barren land, agriculture, and forests.

Fig. 1.

Location of study area and datasets.


Fig. 2.

Preprocessing satellite images into usable datasets.


Fig. 3.

(a) U-Net and (b) DeepLabV3+.


Fig. 4.

Proposed model framework.


Fig. 5.

Training performance of U-Net and DeepLabV3+.


Fig. 6.

U-Net model experiment.


Fig. 7.

Optimal threshold for (a) NDVI, (b) NDBI, and (c) NDWI.


Fig. 8.

Comparison of DeepLabV3+ and proposed model.


Table 1. Performance of U-Net, DeepLabV3+, and proposed model.

Model            #Params   Accuracy (%)   Size in memory (MB)   Inference (s)
U-Net            31.0 M    94.473         110.14                39.08
DeepLabV3+       20.6 M    95.184         48.42                 0.77
Proposed model   20.6 M    95.438         48.42                 0.77

The results in bold text represent the best values.


Algorithm 1. Normalized satellite index improvement.

INPUT: Segmented image, normalized index
1: New Segmented = Segmented image + normalized index
2: For all matrix elements in New Segmented
3:   If there is a class segment change
4:     Change to the appropriate segment class
5:   End
6: End
OUTPUT: New Segmented

References

  1. Wu, H, Liu, Q, and Liu, X (2019). A review on deep learning approaches to image classification and object segmentation. Computers, Materials and Continua. 60, 575-597. https://doi.org/10.32604/cmc.2019.03595
    CrossRef
  2. Anders, N, Seijmonsbergen, H, Bouten, W, and Smith, M. Optimizing object-based image analysis for semiautomated geomorphological mapping., Proceedings of Geomorphometry, Redlands, 2016, CA, USA, pp.117-120.
  3. Martini, BF, and Miller, DA (2021). Using object-based image analysis to detect laughing gull nests. GIScience and Remote Sensing. 58, 1497-1517. https://doi.org/10.1080/15481603.2021.1999376
    CrossRef
  4. Badrinarayanan, V, Kendall, A, and Cipolla, R (2017). SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 39, 2481-2495. https://doi.org/10.1109/TPAMI.2016.2644615
    Pubmed CrossRef
  5. Ronneberger, O, Fischer, P, and Brox, T (2015). U-Net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015. Cham, Switzerland: Springer, pp. 234-241 https://doi.org/10.1007/978-3-319-24574-4_28
    CrossRef
  6. Zhao, H, Shi, J, Qi, X, Wang, X, and Jia, J. Pyramid scene parsing network., Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, Honolulu, HI, USA, pp.6230-6239. https://doi.org/10.1109/CVPR.2017.660
    CrossRef
  7. Chen, LC, Zhu, Y, Papandreou, G, Schroff, F, and Adam, H (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. Computer Vision - ECCV 2018. Cham, Switzerland: Springer, pp. 833-851 https://doi.org/10.1007/978-3-030-01234-2_49
    CrossRef
  8. Pang, K, Weng, L, Zhang, Y, Liu, J, Lin, H, and Xia, M (2022). SGBNet: an ultra light-weight network for real-time semantic segmentation of land cover. International Journal of Remote Sensing. 43, 5917-5939. https://doi.org/10.1080/01431161.2021.2022805
    CrossRef
  9. Priyanka, SN, Lal, S, Nalini, J, Reddy, CS, and Dell’Acqua, F (2023). DPPNet: an efficient and robust deep learning network for land cover segmentation from high-resolution satellite images. IEEE Transactions on Emerging Topics in Computational Intelligence. 7, 127-139. https://doi.org/10.1109/TETCI.2022.3182414
    CrossRef
  10. Zhang, X, Wang, Z, Zhang, J, and Wei, A (2022). MSANet: an improved semantic segmentation method using multiscale attention for remote sensing images. Remote Sensing Letters. 13, 1249-1259. https://doi.org/10.1080/2150704X.2022.2142075
    CrossRef
  11. Li, Y, Zhou, Y, Zhang, Y, Zhong, L, Wang, J, and Chen, J (2022). DKDFN: domain knowledge-guided deep collaborative fusion network for multimodal unitemporal remote sensing land cover classification. ISPRS Journal of Photogrammetry and Remote Sensing. 186, 170-189. https://doi.org/10.1016/j.isprsjprs.2022.02.013
    CrossRef
  12. Wei, H, Xu, X, Ou, N, Zhang, X, and Dai, Y (2021). DEANet: dual encoder with attention network for semantic segmentation of remote sensing imagery. Remote Sensing. 13, article no 3900. https://doi.org/10.3390/rs13193900
    CrossRef
  13. Wang, D, Yang, R, Liu, H, He, H, Tan, J, Li, S, Qiao, Y, Tang, K, and Wang, X (2022). HFENet: hierarchical feature extraction network for accurate landcover classification. Remote Sensing. 14, article no 4244. https://doi.org/10.3390/rs14174244
    CrossRef
  14. Wang, L, Li, R, Zhang, C, Fang, S, Duan, C, Meng, X, and Atkinson, PM (2022). UNetFormer: a UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS Journal of Photogrammetry and Remote Sensing. 190, 196-214. https://doi.org/10.1016/j.isprsjprs.2022.06.008
    CrossRef
  15. Huang, J, Weng, L, Chen, B, and Xia, M (2021). DFFAN: Dual function feature aggregation network for semantic segmentation of land cover. ISPRS International Journal of Geo-Information. 10, article no 125. https://doi.org/10.3390/ijgi10030125
    CrossRef
  16. Li, X, Zhang, G, Cui, H, Hou, S, Wang, S, Li, X, Chen, Y, Li, Z, and Zhang, L (2022). MCANet: a joint semantic segmentation framework of optical and SAR images for land use classification. International Journal of Applied Earth Observation and Geoinformation. 106, article no 102638. https://doi.org/10.1016/j.jag.2021.102638
    CrossRef
  17. Gao, J, Weng, L, Xia, M, and Lin, H (2022). MLNet: multichannel feature fusion lozenge network for land segmentation. Journal of Applied Remote Sensing. 16, article no 016513. https://doi.org/10.1117/1.JRS.16.016513
    CrossRef
  18. Zhang, Z, Lu, W, Cao, J, and Xie, G (2022). MKANet: an efficient network with Sobel boundary loss for land-cover classification of satellite remote sensing imagery. Remote Sensing. 14, article no 4514. https://doi.org/10.3390/rs14184514
    CrossRef
  19. Liu, R, Tao, F, Liu, X, Na, J, Leng, H, Wu, J, and Zhou, T (2022). RAANet: a residual ASPP with attention framework for semantic segmentation of high-resolution remote sensing images. Remote Sensing. 14, article no 3109. https://doi.org/10.3390/rs14133109
    CrossRef
  20. Bhosle, K, and Musande, V (2019). Evaluation of deep learning CNN model for land use land cover classification and crop identification using hyperspectral remote sensing images. Journal of the Indian Society of Remote Sensing. 47, 1949-1958. https://doi.org/10.1007/s12524-019-01041-2
    CrossRef
  21. Bai, J, Wen, Z, Xiao, Z, Ye, F, Zhu, Y, Alazab, M, and Jiao, L (2022). Hyperspectral image classification based on multi-branch attention transformer networks. IEEE Transactions on Geoscience and Remote Sensing. 60, article no 5535317. https://doi.org/10.1109/TGRS.2022.3196661
    CrossRef
  22. Kemker, R, Salvaggio, C, and Kanan, C (2018). Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS Journal of Photogrammetry and Remote Sensing. 145, 60-77. https://doi.org/10.1016/j.isprsjprs.2018.04.014
    CrossRef
  23. Helber, P, Bischke, B, Dengel, A, and Borth, D (2019). EuroSAT: a novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 12, 2217-2226. https://doi.org/10.1109/JSTARS.2019.2918242
    CrossRef
  24. Ramos, APM, Osco, LP, Furuya, DEG, Gonçalves, WN, Santana, DC, and Teodoro, LPR (2020). A random forest ranking approach to predict yield in maize with UAV-based vegetation spectral indices. Computers and Electronics in Agriculture. 178, article no 105791. https://doi.org/10.1016/j.compag.2020.105791
    CrossRef
  25. Selvaraj, MG, Vergara, A, Montenegro, F, Ruiz, HA, Safari, N, and Raymaekers, D (2020). Detection of banana plants and their major diseases through aerial images and machine learning methods: a case study in DR Congo and Republic of Benin. ISPRS Journal of Photogrammetry and Remote Sensing. 169, 110-124. https://doi.org/10.1016/j.isprsjprs.2020.08.025
    CrossRef
  26. Zhang, H, Ma, J, Chen, C, and Tian, X (2020). NDVI-Net: a fusion network for generating high-resolution normalized difference vegetation index in remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing. 168, 182-196. https://doi.org/10.1016/j.isprsjprs.2020.08.010
    CrossRef
  27. Scepanovic, S, Antropov, O, Laurila, P, Rauste, Y, Ignatenko, V, and Praks, J (2021). Wide-area land cover mapping with Sentinel-1 imagery using deep learning semantic segmentation models. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 14, 10357-10374. https://doi.org/10.1109/JSTARS.2021.3116094
    CrossRef
  28. Zhu, Z, Wulder, MA, Roy, DP, Woodcock, CE, Hansen, MC, and Radeloff, VC (2019). Benefits of the free and open Landsat data policy. Remote Sensing of Environment. 224, 382-385. https://doi.org/10.1016/j.rse.2019.02.016
    CrossRef
  29. Giri, CP (2012). Remote Sensing of Land Use and Land Cover. Boca Raton, FL: CRC Press
  30. Kemker, R, Salvaggio, C, and Kanan, C. (2017) . High-resolution multispectral dataset for semantic segmentation. [Online]. Available: https://arxiv.org/abs/1703.01918
  31. Chen, LC, Papandreou, G, Kokkinos, I, Murphy, K, and Yuille, AL (2018). DeepLab: semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 40, 834-848. https://doi.org/10.1109/TPAMI.2017.2699184
    Pubmed CrossRef
  32. Shah, A, Kadam, E, Shah, H, Shinde, S, and Shingade, S (2016). Deep residual networks with exponential linear unit., Proceedings of the 3rd International Symposium on Computer Vision and the Internet, 2016, Jaipur, India, pp.59-65. https://doi.org/10.1145/2983402.2983406
    CrossRef
  33. MathWorks. (c2023) . Semantic segmentation of multispectral images using deep learning. [Online]. Available: https://www.mathworks.com/help/images/multispectral-semantic-segmentation-using-deep-learning.html
  34. MathWorks. (c2023) . Pretrained deep neural networks. [Online]. Available: https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html

Herlawati Herlawati is a lecturer at the Faculty of Computer Science, Bhayangkara Jakarta Raya University, Jakarta, Indonesia. She is currently completing her doctoral studies in the Computer Science Department, Binus Graduate Program, specializing in Computer Science, at Bina Nusantara University, Jakarta, Indonesia. In addition, she holds a competency certificate as a database programmer and is a book writer. Her research interests include data mining, deep learning, and spatial data.

E-mail: herlawati@binus.ac.id

Edi Abdurachman is a professor in statistics, inaugurated at Bina Nusantara University in 2009. He earned a doctorate in statistics in 1986 and a master of science in statistical surveying in 1983 from Iowa State University, Ames, USA. He received an M.S. degree and a degree in agricultural engineering (Ir) (cum laude) in 1978 from the Bogor Agricultural University (IPB). He has lengthy experience as a statistical consultant at the national and international levels. He has published substantial research in the field of statistics. He also received the Presidential Award for 10 years Satyalencana, 20 years Satyalencana, and 30 years Satyalencana. He has been the recipient of teaching awards and the Best Lecturer Award.

E-mail: edia@binus.ac.id

Yaya Heryadi is a lecturer and researcher at the Doctor of Computer Science Program, Binus Graduate Program, Bina Nusantara University, Jakarta, Indonesia, where he conducts research in data science, computer vision, machine learning, and natural language processing. In addition, as a certified data scientist and book writer, he is actively involved in developing data science training and certification in Indonesia. He received his doctoral degree in computer science from Universitas Indonesia in 2014, master of science in Computer Science from Indiana University at Bloomington, and bachelor’s degree in statistics and computation from Institut Pertanian Bogor in 1984.

E-mail: yayaheryadi@binus.edu

Haryono Soeparno currently serves as an associate professor in the field of computer science and head of concentration in computer science, Doctor of Computer Science Program, Binus Graduate Program, Bina Nusantara University, Jakarta. He received his doctoral degree from the Division of Computer Science, School of Engineering and Technology, Asian Institute of Technology, Bangkok, Thailand, in 1995. His research interests include artificial intelligence, machine learning, deep learning, knowledge-based system, knowledge management, and enterprise architecture and systems.

E-mail: haryono@binus.edu
