International Journal of Fuzzy Logic and Intelligent Systems 2019; 19(4): 272-282
Published online December 25, 2019
https://doi.org/10.5391/IJFIS.2019.19.4.272
© The Korean Institute of Intelligent Systems
Eka Miranda1,2, Achmad Benny Mutiara3, Ernastuti3 and Wahyu Catur Wibowo4
1Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
2Doctoral Programs in Information Technology, Gunadarma University, Jakarta, Indonesia
3Faculty of Computer Science and Information Technology, Gunadarma University, Jakarta, Indonesia
4Faculty of Computer Science, Universitas Indonesia, Jakarta, Indonesia
Correspondence to: Eka Miranda (ekamiranda@binus.ac.id)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The objective of this study is to develop a classification method based on a convolutional neural network (CNN) and Sentinel-2 satellite imagery that uses spectral features, a spectral index, and spatial features together as input to address the forest monitoring problem. This research also used contextual information from the Indonesia National Standard Agency’s document for land cover classification as a baseline for feature extraction to obtain appropriate classifier features. The test set was located in Semarang, Central Java, Indonesia. The research workflow consists of defining forest classes based on the Indonesia National Standard Agency’s land cover classification, extracting optical image features based on contextual information in the forest class definitions, extracting image features from the Sentinel-2 satellite image, and classifying image object features using a CNN classifier. Image segmentation with eCognition software produced 1,211 segments/objects, which were used as the dataset. Overall accuracy was used to evaluate classification performance. The CNN yielded a high overall accuracy (97.66%) with the image features NDVI, brightness, GLCM homogeneity, and rectangular fit, a small improvement over the gradient boosted tree (GBT), which reached an overall accuracy of 95.50%.
Keywords: Forest classification, Contextual information, Satellite imagery, Sentinel-2, Convolutional Neural Network.
Forest monitoring is an effort to maintain forest sustainability. It is accomplished by observing the existence of forest classes, for example, primary forest, secondary forest, or plantation forest. The existence of each forest class supports environmental sustainability and stability. Because forests are spread widely across the land cover, a reliable monitoring technique is required to help the government monitor them. Forest classification applications require very fine mapping of forest types. A promising technique for the direct mapping of forest types is the use of satellite imagery [1]. This technology enables faster data acquisition and wider data coverage than field observations [2].
Multispectral satellite imagery consists of image data in several different spectral bands. Using one band, or a combination of more than one band, makes it possible to identify various target types [3]. A widely used technique for facilitating land cover classification in remote sensing imagery is the definition of various indexes [3].
A prominent index used as a measure of natural space is the satellite-based Normalized Difference Vegetation Index (NDVI). This index is calculated from the different wavelengths of light that are absorbed by vegetation [4]. A study by [1] successfully used Sentinel-2 satellite imagery for forest classification in Mediterranean environments with a spectral index (NDVI) as a classifier attribute. The study achieved > 83% accuracy.
The development of remote sensing technology was followed by the development of digital image analysis and classification. The challenge in this task is to replicate human abilities in image understanding, so that a computer can recognize an object as a human does. Deep learning algorithms have risen sharply in popularity for remote sensing image analysis over the past few years [5, 6]. Deep learning has been employed for satellite image analysis, including object classification, object detection, and land use and land cover (LULC) classification [6]. A widely adopted deep learning algorithm is the convolutional neural network (CNN), which was designed to imitate the image recognition system of the human visual cortex [5]. Research by [7] explained that the CNN model is designed as a hierarchy of convolutional and pooling layers, and that this hierarchical structure makes it possible to learn hierarchical features of the input data. That research applied CNN to visual object recognition and vision navigation for off-road mobile robots; it did not explore multispectral data. Other research by [8] used information from spectral bands, shape, and color to classify land cover but did not explore spectral indexes.
The Ministry of Environment and Forestry in Indonesia recorded a total forest area of 125,922,474 hectares in 2018, a decrease from 128 million hectares in 2015. Every year, Indonesia loses about 684,000 hectares of forest due to illegal logging, fires, and deforestation. The deforestation rate reached 1.09 million hectares in 2014–2015 and 0.63 million hectares in 2015–2016. Forest cover change is a strategic issue for forestry development in Indonesia. It is also a crucial problem for policy planning because it has a direct impact on forest resources [9].
Technology is required to address the forest monitoring problem. One main problem in forest monitoring in Indonesia is the large area and wide distribution of forest. One potential technology for addressing this problem is a classification method based on deep learning and satellite imagery data. Satellite imagery enables faster data acquisition and wider data coverage than the field observation approach [2].
Several studies have proposed CNN for land cover classification; however, to the best of our knowledge, little has been said about using spectral satellite imagery with spectral features, a spectral index, and spatial features together as input features. Therefore, the objective of this study was to develop a classification method based on CNN and Sentinel-2 satellite imagery to overcome the forest monitoring problem in Indonesia (the large area and wide distribution of the forest). This research used spectral features, a spectral index, and spatial features together as input features. It also used contextual information from the Indonesia National Standard Agency’s document for land cover classification as a baseline for feature extraction to obtain appropriate classifier features. The objective was defined based on the potential work identified in the summary of related work in Table 1. This research also compared the result with another classifier model, the gradient boosted tree. The gradient boosted tree was chosen for comparison because both CNN and gradient tree boosting are powerful techniques that can model any kind of relationship in the data [10]. This paper is organized as follows: Section 2 introduces terminology and describes related work, Section 3 presents the experimental method, Section 4 explains the experimental and evaluation results, and Section 5 presents the conclusion.
CNN refers to a neural network model used particularly to process grid-structured data such as two-dimensional images. CNN is a development of the multilayer perceptron (MLP) and is designed to process two-dimensional data in image form [2]. It is one of the most broadly used deep learning models and is suitable for processing multiband remote sensing image data, where pixels are organized regularly [11]. CNN consists of three different types of hierarchical structures, namely convolution layers, pooling layers, and fully connected layers. On each layer, the input image is convolved with a set of K kernels.
A CNN architecture is designed in two stages. First, a kernel of a certain size is defined; the number of kernels used depends on the number of features produced. Second, the activation function is defined, usually the rectified linear unit (ReLU), which is the default activation function for several types of neural networks [5]. After the activation function, the data passes through the pooling process. This process is repeated several times until an appropriate feature map is obtained and passed to the fully connected neural network [5]. The CNN architecture for image classification is shown in Figure 1.
The use of CNN to classify satellite imagery has gained wide research interest. A study reported by [13] described a method that uses the advantage of the deep CNN framework, namely automatic extraction of robust and discriminative features, to classify complex urban objects (such as building roofs and cars). However, the study did not extract the contextual information of the class at the global level. Research by [14] showed that CNN reached a high accuracy of 85% for the classification of major crops (wheat, maize, sunflower, soybeans, and sugar beet). That study became a foundation for further use of remote sensing data from Sentinel-1A imagery in Ukraine. Research by [15] demonstrated the advantage of CNN for feature extraction and image classification of land use from geolocated level satellite imagery and achieved 76% accuracy; exploring feature extraction and selection could increase the accuracy. A summary of related work is shown in Table 1.
NDVI is a well-known vegetation index and the one most frequently used for land cover identification. The NDVI value is obtained from vegetation canopies in remote sensing imagery. The index is a simple and effective formula for evaluating the vegetation class of land cover quantitatively and qualitatively [16]. Research by [17] showed that the presence of green vegetation land cover can be indicated by NDVI. Other research by [18] explained that NDVI is often used as an index to evaluate vegetation productivity based on leaf index and vegetation photosynthesis. NDVI is calculated from the wavelengths of light absorbed by vegetation and ranges from −1 to 1. NDVI values for the vegetation classes are shown in Table 2. The NDVI formula is
$$\mathrm{NDVI} = \frac{B08 - B04}{B08 + B04}$$
where
B08 = band 8 (near infrared, NIR) on Sentinel-2
B04 = band 4 (red) on Sentinel-2
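As an illustration of the index, the following minimal sketch computes NDVI per pixel from the two Sentinel-2 bands, assuming the standard definition (B08 − B04) / (B08 + B04). The function names and the handling of pixels where both bands are zero are assumptions made for this example, not part of the study.

```python
def ndvi(b08, b04):
    """Return NDVI for one pixel from NIR (B08) and red (B04) reflectance."""
    total = b08 + b04
    if total == 0:
        # Assumption: a pixel with no signal in either band is reported as 0.
        return 0.0
    return (b08 - b04) / total


def ndvi_image(nir_band, red_band):
    """Apply ndvi() element-wise to two equally sized 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectance values: strong NIR and weak red suggest vegetation.
print(round(ndvi(0.5, 0.1), 4))
```

A pixel with high near-infrared and low red reflectance gives a value close to 1, while water, which reflects little near-infrared light, gives a value at or below 0.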
The workflow of this study is organized as follows: defining forest classes based on the Indonesia National Standard Agency’s land cover classification, extracting optical image features based on contextual information in the forest class definitions, extracting image features from the Sentinel-2 satellite image, and classifying image object features using a CNN classifier. The workflow is described in Sections 3.2, 3.3 and 3.4, respectively. The schema of the methodology is shown in Figure 2.
In addition, as a comparison to the CNN result, this research also trained a gradient boosted tree on the dataset.
The test set was located in the Semarang area, Central Java, Indonesia (about 373.67 km²). Forests in the Semarang area consist of primary forest, secondary forest, and plantation forest. The forest is managed by the state (about 12,271.50 hectares) and by citizens (about 15,613.10 hectares) [9]. This research used Sentinel-2 satellite imagery obtained through the web site with URL
The main forest types in the Semarang area fall into three classes: primary dry forest, secondary dry forest, and plantation forest. These classes were chosen because they are productive land for human life. This study extracted the characteristics of each forest class from contextual information in the Indonesia National Standard Agency’s land cover classification. The contextual information for each class is described in terms of the optical features of imagery interpretation [19, 20]. The characteristics are as follows:
a. Primary dry forest. Natural forest that grows and develops naturally and has never been exploited by humans. The forest floor has never been submerged, either periodically or throughout the year. Primary dry forest is characterized by dark green objects with a coarse texture and a clustered tree canopy. There are no logging marks.
b. Secondary dry forest. Forest that regrows naturally after a primary forest has been destroyed, modified, or exploited by humans. This forest is usually characterized by the appearance of road networks or other exploitation system networks. In addition, secondary dry forest is characterized by dark green objects with a coarse texture and a clustered tree canopy. There are logging marks.
c. Plantation forest. Plantation forest is planted to increase the potential and quality of forest production, and also for reforestation and industrial plantations. It is characterized by the appearance of green objects that are neatly arranged in a certain pattern.
The characteristics of each forest class yield the optical features of imagery interpretation: color, hue (such as dark), texture (such as coarse), and shape (such as neatly arranged/regular).
Feature extraction is the process of collecting discriminative information from a set of image samples. Feature extraction and segmentation of the satellite imagery were carried out using eCognition software [21]. Polygon-based segmentation was chosen as the image segmentation technique. This technique divides an image into a number of segments/objects based on pixel similarity. Furthermore, the multi-resolution segmentation technique was applied to sequentially merge pixels with similar values into an object. Image features were extracted from each segment to determine the associated land cover class. These features were selected based on the definition and characteristics of each class [19, 20]. The features extracted in eCognition were mapped from the optical features of imagery interpretation in the Indonesia National Standard Agency’s land cover classification. The corresponding features are shown in Table 3.
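To give a sense of how two of these per-segment features can be derived from pixel values, the sketch below computes a brightness value (the mean over band values) and a GLCM homogeneity value using the common definition, the sum of P(i, j) / (1 + (i − j)²) over a normalized co-occurrence matrix. It is not eCognition's implementation: the horizontal neighbor offset, the symmetric matrix, and the function names are assumptions chosen for illustration.

```python
def brightness(band_means):
    """Mean grey value across the bands of one segment."""
    return sum(band_means) / len(band_means)


def glcm_homogeneity(image, levels):
    """Homogeneity of a symmetric, normalized GLCM.

    `image` is a 2-D list of integer grey levels in range(levels).
    Assumption: pixel pairs are taken with the horizontal offset (0, 1).
    """
    glcm = [[0.0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            # Count each pair in both directions to keep the matrix symmetric.
            glcm[left][right] += 1.0
            glcm[right][left] += 1.0
    total = sum(sum(r) for r in glcm)
    if total == 0:
        raise ValueError("image needs at least two columns")
    return sum(
        glcm[i][j] / total / (1 + (i - j) ** 2)
        for i in range(levels)
        for j in range(levels)
    )


# A uniform patch is perfectly homogeneous; an alternating patch is less so.
print(glcm_homogeneity([[0, 0], [0, 0]], levels=2))
print(glcm_homogeneity([[0, 1], [0, 1]], levels=2))
```

Values near 1 correspond to smooth texture and lower values to coarse texture, which is the distinction the texture feature is meant to capture.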
The forest classification process is shown in Figure 4. The process contains four main steps: loading data from hard disk, splitting the data into training and testing datasets, training the CNN, and evaluating the result. K-fold cross-validation (K is a constant) is a method for evaluating the performance of a classifier model by splitting the input dataset randomly into K partitions [2]. This research used 10-fold cross-validation, a common choice. The main layer of the CNN architecture used in this study is Conv1D (filters 10, kernel size 2, activation relu, input shape (1, 4), padding 'same'), with the following parameters: filters = 10, the number of output filters in the convolution; kernel size = 2, the length of the convolution window; activation = 'relu', the ReLU activation function, which outputs zero for negative input; input shape = (1, 4), the shape of the input vector; padding = 'same', which pads the input so that the output has the same length as the original input; epochs = 5,000, the number of times the entire dataset is passed through the neural network.
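To make the layer parameters above concrete, the following sketch implements the forward pass of a one-dimensional convolution with 'same' padding and ReLU in plain Python. It is an illustration rather than the authors' code: the weights are random, the four input values are hypothetical, and the padding rule (any extra zero step is appended on the right) is an assumption that follows the TensorFlow convention.

```python
import random


def relu(x):
    """Pass positive values through and map negative values to zero."""
    return x if x > 0 else 0.0


def conv1d_same(inputs, kernels, biases):
    """1-D convolution with stride 1, 'same' padding and ReLU.

    inputs:  [steps][channels]
    kernels: [kernel_size][channels][filters]
    biases:  [filters]
    returns: [steps][filters]
    """
    steps, channels = len(inputs), len(inputs[0])
    kernel_size, filters = len(kernels), len(biases)
    pad_total = kernel_size - 1
    pad_left = pad_total // 2           # assumed: extra padding goes right
    pad_right = pad_total - pad_left
    zero = [0.0] * channels
    padded = [zero] * pad_left + inputs + [zero] * pad_right
    output = []
    for t in range(steps):
        row = []
        for f in range(filters):
            acc = biases[f]
            for k in range(kernel_size):
                for c in range(channels):
                    acc += padded[t + k][c] * kernels[k][c][f]
            row.append(relu(acc))
        output.append(row)
    return output


# One segment described by four hypothetical feature values, in the order
# NDVI, brightness, GLCM homogeneity, rectangular fit: input shape (1, 4).
segment = [[0.72, 0.35, 0.61, 0.80]]
rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(10)] for _ in range(4)]
           for _ in range(2)]           # kernel size 2, 4 channels, 10 filters
biases = [0.0] * 10
features = conv1d_same(segment, kernels, biases)
print(len(features), len(features[0]))  # output shape: 1 step, 10 filters
```

With a single input step, one of the two kernel positions only ever sees padding, so in this configuration the layer acts much like a fully connected layer over the four features.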
Image segmentation in eCognition produced 1,211 segments/objects. eCognition then processed all segments to produce the image features representing each segment. A semi-supervised process was used to assign a label to each segment in the training dataset by comparing the objects against the actual image shown in Figure 3. Subsequently, the CNN classifier was trained on the training dataset. Accuracy assessments based on the confusion matrix were carried out to measure the classification result.
Training and testing accuracies were calculated from the confusion matrix, a technique for summarizing the performance of a classification algorithm. Overall accuracy was calculated by dividing the number of correct predictions by the total number of predictions [22]. Producer’s accuracy measures accuracy from the point of view of the map maker (producer): how often real features in the field are correctly shown on the classification map, or the probability that a certain land cover is classified correctly. It was computed as the number of reference samples classified correctly divided by the total number of reference samples for a class. User’s accuracy (reliability) measures accuracy from the point of view of the map user: how often a class on the map is actually present on the ground. It was computed as the number of correct classifications for a class divided by the total number of samples predicted as that class. Figures 5 and 6 show the confusion matrices for the CNN training data and testing data, respectively. Table 4 shows the accuracies for CNN.
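The three measures defined above can be computed from a confusion matrix as in the following minimal sketch. The matrix values are hypothetical and are not the study's results, and the orientation (rows hold reference classes, columns hold predicted classes) is an assumption of this example.

```python
def accuracies(matrix):
    """Return (overall, producer's, user's) accuracy from a confusion matrix.

    matrix[i][j] = number of samples with reference class i predicted as j.
    Assumes every class has at least one reference and one predicted sample.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    # Producer's accuracy: correct / all reference samples of the class (row).
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # User's accuracy: correct / all samples predicted as the class (column).
    users = [matrix[i][i] / sum(matrix[r][i] for r in range(n))
             for i in range(n)]
    return overall, producers, users


# Hypothetical counts for primary dry, secondary dry and plantation forest.
example = [
    [50, 2, 0],
    [3, 45, 1],
    [0, 1, 48],
]
overall, producers, users = accuracies(example)
print(round(overall, 4))
print([round(p, 4) for p in producers])
print([round(u, 4) for u in users])
```

Reading the result per class shows where the errors sit: a low producer's accuracy means reference samples of that class are being missed, while a low user's accuracy means the class is being over-predicted.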
Figure 6 shows that the largest misclassification was between secondary dry forest and primary dry forest. This error arose because the contextual information and NDVI values of secondary dry forest and primary dry forest are quite similar.
In addition, as a comparison to the CNN result, a gradient boosted tree was also trained on the training dataset. Figures 7 and 8 show the confusion matrices for the gradient boosted tree training data and testing data, respectively. Table 5 shows the accuracies for the gradient boosted tree.
The producer’s accuracy (testing data) of our method was higher than that of the gradient boosted tree for primary dry forest and secondary dry forest, and nearly identical for plantation forest (0.9787 against 0.9792; see Tables 4 and 5). This result indicates a high probability that each land cover (primary dry forest, secondary dry forest, and plantation forest) is classified correctly and that almost all real features in the field are correctly shown on the classification map. The user’s accuracy (testing data) of our method was higher than that of the gradient boosted tree for all forest types, which indicates that almost all classes on the classification map would actually be present on the ground. The classification yielded high overall accuracy when using CNN with the image features extracted from contextual information in the Indonesia National Standard Agency’s document for land cover classification (NDVI, brightness, GLCM homogeneity, and rectangular fit). The CNN yielded a small improvement in overall accuracy over the gradient boosted tree (see Figure 9).
The result of this research is in accordance with the previous study [6] on the potential use of deep learning (CNN) for remote sensing image analysis tasks, including scene classification, object detection, and LULC classification. This research also builds on the potential work identified in the summary of related work in Table 1 in two ways: it uses spectral satellite imagery with spectral features, a spectral index, and spatial features together as input features, and it uses contextual information from a thematic map standard published by the local government (the Indonesia National Standard Agency’s document for land cover classification) as a baseline for feature extraction to obtain appropriate classifier features and improve accuracy. This research also yielded higher accuracy than the previous study reported by [1] (97.66% against > 83%), which was chosen as the reference because of the similarity of the land cover class used. The comparison with the reference research [1] is shown in Table 6.
Table 6 shows that this research yielded an accuracy improvement over the reference research, owing to the use of appropriate classifier features, extracted based on the contextual information in the Indonesia National Standard Agency’s document for land cover classification, and to the use of an appropriate classification method (CNN) for forest classification from satellite imagery.
This research has put forward a deep learning model (CNN) for forest classification from satellite imagery. A detailed workflow was introduced with four major steps: defining forest classes based on the Indonesia National Standard Agency’s land cover classification, extracting optical image features based on contextual information in the forest class definitions, extracting image features from the Sentinel-2 satellite image, and classifying image object features using a CNN classifier. Overall accuracy was used to evaluate the feasibility of the classification method. The resulting method classifies forest type based on CNN and Sentinel-2 satellite imagery and could serve as an alternative tool for addressing the forest monitoring problem (the large area and wide distribution of the forest). The method classified forest types with high overall accuracy and yielded a small accuracy improvement over the gradient boosted tree. The methodology is knowledge-driven and should be shared among experts for future improvement. Challenges and future work arising from this research include determining the region of interest and determining classifier features across the diversity of forest types. Technically, the method could be adapted to train a forest classification model at any location, as long as Sentinel-2 satellite imagery for the location of interest and a thematic map standard published by the local government are available to determine the forest classes.
No potential conflict of interest relevant to this article was reported.
We gratefully acknowledge the support of the Research and Technology Transfer Office (RTTO), Bina Nusantara University, which helped fund this paper, and Dr. Robby Kurniawan Harahap, S.Kom., M.T., who helped with the writing process. We also thank Christina Eli Indriyani, M.Hum., who helped proofread the article.
False color image fusion result of the Sentinel-2 satellite imagery for Semarang, Central Java, Indonesia.
Table 1. Related works.
Research | Finding | Potential work | This research |
---|---|---|---|
[7] | The hierarchical structure of CNN makes it possible to learn the hierarchical feature of the input data (visual object recognition and vision navigation for off-road mobile robots). | Multispectral data was not explored in this research. | Explore multispectral feature: color (NDVI), hue (brightness), texture (GLCM), shape (Rectangular-fit). |
[8] | Used information from spectral bands, shape and color to classify land cover. | Spectral index has not been explored yet. | Explore spectral index: NDVI |
[13] | Utilizing the advantage of deep CNN framework (automatically extract robust and discriminative features) to classify complex urban objects (building roofs and cars). | The study has not extracted the contextual information of the class at the global level. | Using contextual information on Indonesia National Standard Agency’s document for Land cover classification as a base to extract the classifier features of the class. |
[14] | CNN for major crops classification (wheat, maize, sunflower, soybeans, and sugar beet). The accuracy of the classification result is 85%. This study became a foundation for further use of remote sensing data within Sentinel-1A imagery in Ukraine. | Exploration of CNN and Sentinel data in Indonesia area. | Explore CNN for forest classification from Sentinel-2 satellite imagery in Indonesia area. |
[15] | CNN for feature extraction and image classification of land use from geo-located level satellite imagery. The accuracy of the classification result is 76%. | The exploration of feature extraction and selection could be employed to increase the accuracy. | Feature extraction and feature selection based on contextual information (Indonesia National Standard Agency’s document) to get appropriate classifier features and to increase accuracy. |
Table 2. NDVI for vegetation class [4, 16].
Vegetation class | NDVI |
---|---|
Dense vegetation such as tropical forests or tropical plants at the peak stage of growth. | 0.6 – 0.9 |
High-density vegetation area (such as forests) | > 0.6 |
Rare vegetation such as shrubs and grasslands or ageing plants | 0.2 – 0.4 |
Grassland | 0 – 0.2 |
Bare land | 0 – 0.05 |
Water body or wetlands | <= 0 |
Table 3. Correspondence between the Indonesia National Standard Agency land cover classification features and eCognition features.
Indonesia National Standard Agency land cover classification feature | eCognition feature | Description [16, 21] |
---|---|---|
Color | NDVI | This best-known vegetation index is simple but effective for quantifying green vegetation. NDVI quantifies vegetation by measuring the difference between near-infrared light (which vegetation strongly reflects) and red light (which vegetation absorbs). |
Hue | Brightness | Brightness is a response to the amount of brightness produced by the eye. Brightness is the average grey value over all satellite bands for each pixel. The brightness index shows the intensity of the spectral reflection of an image. |
Texture | GLCM homogeneity | GLCM homogeneity is a pixel texture profile that distinguishes smooth from rough texture. |
Shape | Rectangular fit | Rectangular fit describes how well an image object fits a rectangle of the same size and proportion. A value of 0 indicates that the object does not fit a rectangular shape, and a value of 1 indicates that it fits. |
Table 4. Accuracies using CNN.
Forest class | Producer’s accuracy (training data) | User’s accuracy (training data) | Producer’s accuracy (testing data) | User’s accuracy (testing data) |
---|---|---|---|---|
Primary dry forest | 0.9839 | 0.9745 | 0.9804 | 0.9709 |
Secondary dry forest | 0.9671 | 0.9735 | 0.9709 | 0.9615 |
Plantation forest | 0.9899 | 0.9932 | 0.9787 | 1.0000 |
Overall accuracy | 0.9803 | | 0.9766 | |
Table 5. Accuracies of gradient boosted tree (GBT).
Forest class | Producer’s accuracy (training data) | User’s accuracy (training data) | Producer’s accuracy (testing data) | User’s accuracy (testing data) |
---|---|---|---|---|
Primary dry forest | 0.9775 | 0.9806 | 0.9709 | 0.9524 |
Secondary dry forest | 0.9610 | 0.9610 | 0.9111 | 0.9425 |
Plantation forest | 0.9835 | 0.9803 | 0.9792 | 0.9691 |
Overall accuracy | 0.9740 | | 0.9550 | |
Table 6. This research against previous study/reference research.
Item | Previous study / reference research [1] | This research |
---|---|---|
Satellite imagery | Sentinel-2 | Sentinel-2 |
Land cover class | Forest | Forest |
Study area | Mediterranean environments | Semarang area, Indonesia |
Classification method | Random forest | CNN, GBT |
Parameter | Spectral index (NDVI) | NDVI, brightness, GLCM homogeneity, rectangular fit |
Feature extraction baseline | - | Contextual information on Indonesia National Standard Agency’s document for Land cover classification |
Accuracy | > 83% | CNN = 97.66%; GBT = 95.50% |
E-mail: ekamiranda@binus.ac.id
Email: amutiara@staff.gunadarma.ac.id
Email: ernas@staff.gunadarma.ac.id
Email: wibowo@cs.ui.ac.id
International Journal of Fuzzy Logic and Intelligent Systems 2019; 19(4): 272-282
Published online December 25, 2019 https://doi.org/10.5391/IJFIS.2019.19.4.272
Copyright © The Korean Institute of Intelligent Systems.
Eka Miranda1;2, Achmad Benny Mutiara3, Ernastuti3 and Wahyu Catur Wibowo4
1Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
2Doctoral Programs in Information Technology, Gunadarma University, Jakarta, Indonesia
3Faculty of Computer Science and Information Technology, Gunadarma University, Jakarta, Indonesia
4Faculty of Computer Science, Universitas Indonesia, Jakarta, Indonesia
Correspondence to:Eka Miranda (ekamiranda@binus.ac.id)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The objective of this study is to develop a classification method based on convolutional neural network (CNN) and Sentinel-2 satellite imagery including the spectral feature, spectral index and spatial feature together as an input to answer forest monitoring problem. This research also used contextual information on Indonesia National Standard Agency’s document for Land cover classification as a baseline for feature extraction to get the appropriate classifier feature. The test set was located in Semarang, Central Java, Indonesia. The research workflow consists of defining forest class based on Indonesia National Standard Agency for Land cover classification, extracting optical image features based on contextual information of the forest class definition, extracting image features from the Sentinel-2 satellite image, and classifying image object features using CNN classifier. Image segmentation produced 1,211 segments/objects by using eCognition software. Subsequently, these objects were used as a dataset. Overall accuracy was used to evaluate the performance of the classification result. The result showed the classification method results in this study yielded high overall accuracy (97.66%) when using CNN with the image features like NDVI, Brightness, GLCM homogeneity and Rectangular fit. Small improvement of overall accuracy was also achieved when it was compared to GBT with an overall accuracy of 95.50%.
Keywords: Forest classification, Contextual information, Satellite imagery, Sentinel-2, Convolutional Neural Network.
Forest monitoring is an effort to maintain forest sustainability. Forest monitoring was accomplished by observing the existence of forest class, for example, primary forest, secondary forest or planting forest. The existence of each forest class supports the environmental sustainability and stability. Since the forests spread widely in the Land covers, a reliable monitoring technique is required to help the government to monitor the forest. Forest classification application requires very fine forest types mapping. A promising technique for the direct mapping of forest types is the usage of satellite imagery [1]. This technology enables faster data acquisition and wider data coverage compared with field observations [2].
Multispectral data on satellite imagery consist of several different bandwidths of image data. Using the band either one band or combined more than one band enables to identify various target types [3]. The widely used technique in remote sensing image to facilitate classification of land covers is defining of various indexes [3].
The prominent index that was used as the measure of natural space is the satellite-based Normalized Difference Vegetation Index (NDVI). This index was calculated using the different wavelengths of light that were absorbed by vegetation [4]. A study by [1] has successfully used Sentinel-2 satellite imagery for forest classification in Mediterranean environments using spectral index (NDVI) as a classifier attribute. The study achieved > 83% accuracy.
The development of remote sensing technology was followed by the development of digital image analysis and classification. The challenge in this task is to replicate human image understanding, so that a computer can recognize an object as a human does. Deep learning algorithms have seen a massive rise in popularity for remote sensing image analysis over the past few years [5, 6]. Deep learning has been employed for satellite image analysis, including object classification, object detection, and land use and land cover (LULC) classification [6]. The most widely adopted deep learning algorithm is the Convolutional Neural Network (CNN), which was designed to imitate the image recognition system of the human visual cortex [5]. Research by [7] explained that the CNN model is designed as a hierarchy of convolutional and pooling layers. This hierarchical structure makes it possible to learn hierarchical features of the input data. That research applied CNN to visual object recognition and vision navigation for off-road mobile robots, but did not explore multispectral data. Research by [8] used information from spectral bands, shape and color to classify land cover, but did not explore the spectral index.
The Ministry of Environment and Forestry in Indonesia recorded a total forest area of 125,922,474 hectares in 2018, a decrease from 128 million hectares in 2015. Every year, Indonesia loses about 684,000 hectares of forest due to illegal logging, fires, and deforestation. The deforestation rate reached 1.09 million hectares in 2014–2015 and 0.63 million hectares in 2015–2016. Forest cover change is a strategic issue for forestry development in Indonesia. It is also a crucial problem for policy planning because it has a direct impact on forest resources [9].
Technology is required to support and answer the forest monitoring problem. One main problem in forest monitoring in Indonesia is the large area and wide distribution of the forest. One potential technology to address this problem is a classification method based on deep learning and satellite imagery data. Satellite imagery enables faster data acquisition and wider data coverage than the field observation approach [2].
Several studies have proposed CNN for land cover classification. To the best of our knowledge, however, little has been said about using spectral satellite images with the spectral feature, spectral index and spatial features together as input features. Therefore, the objective of this study was to develop a classification method based on CNN and Sentinel-2 satellite imagery to overcome the forest monitoring problem in Indonesia (the large area and wide distribution of the forest). This research included the spectral feature, spectral index and spatial feature together as input features. It also used contextual information from the Indonesia National Standard Agency’s document for Land cover classification as a baseline for feature extraction, in order to obtain appropriate classifier features. The objective of this research was defined based on the potential work identified in the summary of related work in Table 1. This research also compared the result with another classifier model, the gradient boosted tree. The gradient boosted tree was chosen for comparison because both techniques, CNN and gradient tree boosting, are powerful techniques that can model any kind of relationship in the data [10]. This paper is organized as follows: Section 2 introduces the terminology used throughout this paper and describes related work, Section 3 proposes the experimental method, Section 4 explains the experimental and evaluation results, and Section 5 presents the conclusion.
CNN refers to a neural network model that is used in particular to process grid-structured data such as two-dimensional images. CNN is a development of the multilayer perceptron (MLP) and is designed to process two-dimensional data in image form [2]. CNN is one of the most broadly used deep learning models. This model is suitable for processing multiband remote sensing image data, in which pixels are organized regularly [11]. CNN consists of three different types of hierarchical structures, namely convolution layers, pooling layers, and fully connected layers. On each layer, the input image is convolved with a set of K kernels.
The CNN architecture was designed in two stages. First, define a kernel of a certain size. The number of kernels used depends on the number of features to be produced. Second, define the activation function, which is usually the rectified linear unit (ReLU). ReLU is the default activation function for several types of neural networks [5]. After the activation function, the data goes through the pooling process. These steps are repeated several times until an appropriate feature map is obtained and passed to the fully connected neural network [5]. The CNN architecture for image classification is shown in Figure 1. The use of CNN to classify satellite imagery has gained wide research interest. A study reported by [13] explained a method that uses the advantage of the deep CNN framework, namely the automatic extraction of robust and discriminative features, to classify complex urban objects (such as building roofs and cars). However, the study did not extract the contextual information of the class at the global level. Research by [14] presented the beneficial use of CNN to reach a high accuracy of 85% for the classification of major crops (wheat, maize, sunflower, soybeans, and sugar beet). This study became a foundation for further use of remote sensing data within Sentinel-1A imagery in Ukraine. Research by [15] demonstrated the advantage of CNN for feature extraction and image classification of land use from geolocated level satellite imagery. This research achieved 76% accuracy, and further exploration of feature extraction and selection could be employed to increase the accuracy. A summary of related work is shown in Table 1.
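The convolution, activation and pooling sequence described above can be illustrated with a minimal, pure-Python sketch. The signal, the kernel values and the helper names (`conv1d_valid`, `relu`, `max_pool1d`) are hypothetical and chosen only for illustration; they are not taken from the study, and a real CNN learns its kernel weights during training.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel over the signal without padding.

    As in most deep-learning libraries, this is a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical input and kernel, for illustration only.
signal = [1.0, 3.0, 2.0, -1.0, 0.0, 4.0]
kernel = [1.0, -1.0]

feature_map = conv1d_valid(signal, kernel)  # [-2.0, 1.0, 3.0, -1.0, -4.0]
activated = relu(feature_map)               # [0.0, 1.0, 3.0, 0.0, 0.0]
pooled = max_pool1d(activated, size=2)      # [1.0, 3.0]
print(pooled)
```

In a full network this sequence would be repeated and the final feature map flattened and passed to the fully connected layers.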
NDVI is a well-known vegetation index and the one most frequently used for land cover identification. The NDVI value is obtained from vegetation canopies in remote sensing imagery. The index is a simple and effective formula for evaluating the vegetation class of land cover quantitatively and qualitatively [16]. Research by [17] revealed that the presence of green vegetation land cover can be indicated by the NDVI. Research by [18] explained that NDVI is often used as an index to evaluate vegetation productivity based on leaf index and vegetation photosynthesis. NDVI is calculated from the wavelengths of light that vegetation absorbs, and its values range from −1 to 1. The NDVI values for the vegetation classes are shown in Table 2. The NDVI formula is
$$\mathrm{NDVI} = \frac{B08 - B04}{B08 + B04}$$
where
B08 = band 8 (near infrared, NIR) on Sentinel-2
B04 = band 4 (red) on Sentinel-2
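As a small worked example, the sketch below computes NDVI per pixel, assuming the standard definition (B08 − B04) / (B08 + B04). The reflectance values are hypothetical, and returning 0 for a pixel whose two bands sum to zero is an assumption made here to avoid division by zero, not something stated in the study.

```python
def ndvi(nir, red):
    """NDVI for one pixel from NIR (B08) and red (B04) reflectance."""
    denominator = nir + red
    if denominator == 0:
        # Assumption: treat an all-zero pixel as NDVI 0 to avoid dividing by zero.
        return 0.0
    return (nir - red) / denominator


# Hypothetical reflectance values for three pixels.
b08 = [0.45, 0.30, 0.05]  # near infrared
b04 = [0.05, 0.10, 0.08]  # red

values = [ndvi(n, r) for n, r in zip(b08, b04)]
for n, r, v in zip(b08, b04, values):
    print(f"B08={n:.2f} B04={r:.2f} NDVI={v:.3f}")
```

The first pixel (strong NIR reflectance, weak red reflectance) gives an NDVI of about 0.8, which falls in the dense vegetation range of Table 2, while the third pixel gives a negative value.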
The workflow of this study is organized as follows: defining the forest classes based on the Indonesia National Standard Agency for Land cover classification, extracting optical image features based on contextual information from the forest class definitions, extracting image features from the Sentinel-2 satellite image, and classifying image object features using a CNN classifier. The workflow is described in Sections 3.2, 3.3 and 3.4, respectively. The schema of the methodology in this study is shown in Figure 2.
In addition, as a comparison to the CNN result, this research also trained a gradient boosted tree on the dataset.
The test set was located in the Semarang area, Central Java, Indonesia (about 373.67 km²). Forests in the Semarang area consist of primary forest, secondary forest and plantation forest. The forest is managed by the state (about 12,271.50 hectares) and by citizens (about 15,613.10 hectares) [9]. This research utilized data from The Sentinel-2 satellite imagery through the web site with URL
The main forest types in the Semarang area consist of three classes: primary dry forest, secondary dry forest and plantation forest. These classes were chosen because they are productive land for human life. This study extracted the characteristics of each forest class based on contextual information from The Indonesia National Standard Agency for Land Cover Classification. The contextual information for each class is described in terms of the optical features of imagery interpretation [19, 20]. The characteristics are as follows:
a. Primary dry forest. Natural forest that grows and develops naturally and has never been exploited by humans. The forest floor has never been submerged, either periodically or throughout the year. The primary dry forest is characterized by dark green objects that tend toward dark tones, with a coarse texture and a clustered tree canopy. There are no logging marks.
b. Secondary dry forest. Forest that grows naturally after the primary forest has been destroyed, modified or exploited by humans. This forest is usually characterized by the appearance of road networks or other exploitation networks. In addition, the secondary dry forest is characterized by dark green objects that tend toward dark tones, with a coarse texture and a clustered tree canopy. There are logging marks.
c. Plantation forest. Forest that is planted to increase the potential and quality of forest production, and also for reforestation and industrial plantations. The plantation forest is characterized by green objects that are neatly arranged and follow a certain pattern.
The characteristics of each forest class subsequently produced the optical features of imagery interpretation. The features are color, hue (such as dark), texture (such as coarse) and shape (such as neatly arranged/regular).
Feature extraction is the process of collecting discriminative information from a set of image samples. Feature extraction and segmentation of the satellite imagery were executed using eCognition software [21]. Polygon-based segmentation was chosen as the image segmentation technique. This technique divides the image into a number of segments/objects based on pixel similarity. Furthermore, the multi-resolution segmentation technique was applied to sequentially combine pixels with similar values into an object. Image features were extracted from each segment to produce the associated land cover class. These features were selected based on the definition and characteristics of each class [19, 20]. The features extracted in eCognition were mapped from the optical features of imagery interpretation in the Indonesia National Standard Agency for Land Cover Classification. The corresponding features are shown in Table 3.
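To make two of the object features more concrete, the following pure-Python sketch computes a brightness value and a GLCM homogeneity value for a small patch. It is an illustration only and not eCognition's implementation: it assumes brightness is the mean of the per-band mean values, uses only horizontally adjacent pixel pairs for the co-occurrence matrix, and uses the common homogeneity definition sum of P(i, j) / (1 + (i − j)²). The patches and band means are hypothetical.

```python
def brightness(band_means):
    """Mean of the per-band mean values of one segment (assumed definition)."""
    return sum(band_means) / len(band_means)


def glcm(image, levels):
    """Symmetric, normalised co-occurrence matrix for horizontal neighbours."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # count both directions so the matrix is symmetric
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]


def glcm_homogeneity(p):
    """Sum of P(i, j) / (1 + (i - j)^2); close to 1 for smooth textures."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Hypothetical grey-level patches quantised to 4 levels (0..3).
smooth_patch = [[1, 1, 1], [1, 1, 1]]
rough_patch = [[0, 3, 0], [3, 0, 3]]

smooth_h = glcm_homogeneity(glcm(smooth_patch, levels=4))  # 1.0
rough_h = glcm_homogeneity(glcm(rough_patch, levels=4))    # approximately 0.1
print(smooth_h, rough_h)
print(brightness([100.0, 120.0, 140.0]))  # 120.0
```

A uniform patch yields the maximum homogeneity of 1, while a patch that alternates between distant grey levels yields a low value, which is how the feature separates smooth from coarse textures.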
The forest classification process is shown in Figure 4. The process contains four main steps: loading the data from the hard disk, splitting the data into training and testing datasets, training the CNN, and evaluating the result. K-fold cross-validation (where K is a constant) is a method to evaluate the performance of a classifier model by randomly splitting the input dataset into K partitions [2]. This research used 10-fold cross-validation, a common choice for splitting data. The main layer of the CNN architecture used in this study is Conv1D (filters 10, kernel size 2, activation relu, input shape (1, 4), padding 'same'). The parameters are as follows:
a. filters = 10: the number of output filters in the convolution.
b. kernel size = 2: the length of the convolution window.
c. activation = 'relu': the ReLU activation function, which outputs zero for negative inputs.
d. input shape = (1, 4): the shape of the input vector.
e. padding = 'same': pads the input so that the output has the same length as the original input.
f. epochs = 5,000: the number of times the entire dataset is passed through the neural network.
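The forward pass of a one-dimensional convolution with the layer parameters stated above (10 filters, kernel size 2, ReLU, input shape (1, 4), 'same' padding) can be sketched in pure Python as follows. This is not the study's code: the weights are random rather than trained, the input values are hypothetical, the reading of the input shape as one step with four feature channels is an assumption, and the choice to place the extra padding zero on the right follows the TensorFlow convention for even kernel sizes.

```python
import random


def conv1d_same_relu(x, weights, biases):
    """1D convolution with 'same' padding followed by ReLU.

    x:       list of steps, each a list of channel values
    weights: weights[k][c][f] for kernel position k, input channel c, filter f
    biases:  one bias per filter
    """
    kernel_size = len(weights)
    in_channels = len(weights[0])
    filters = len(biases)
    pad_total = kernel_size - 1
    pad_left = pad_total // 2          # assumption: the extra zero goes on the right
    pad_right = pad_total - pad_left
    zero_step = [0.0] * in_channels
    padded = [zero_step] * pad_left + x + [zero_step] * pad_right
    output = []
    for t in range(len(x)):            # one output step per input step
        step = []
        for f in range(filters):
            s = biases[f]
            for k in range(kernel_size):
                for c in range(in_channels):
                    s += padded[t + k][c] * weights[k][c][f]
            step.append(max(0.0, s))   # ReLU
        output.append(step)
    return output


random.seed(0)
KERNEL_SIZE, IN_CHANNELS, FILTERS = 2, 4, 10
weights = [
    [[random.uniform(-1, 1) for _ in range(FILTERS)] for _ in range(IN_CHANNELS)]
    for _ in range(KERNEL_SIZE)
]
biases = [0.0] * FILTERS

# Hypothetical segment: [NDVI, brightness, GLCM homogeneity, rectangular fit],
# with all values scaled to the range 0..1 for this example.
segment = [[0.72, 0.35, 0.61, 0.88]]

output = conv1d_same_relu(segment, weights, biases)
n_params = KERNEL_SIZE * IN_CHANNELS * FILTERS + FILTERS
print(len(output), len(output[0]), n_params)  # 1 10 90
```

With 'same' padding the single input step produces a single output step with 10 filter responses, and such a layer would hold 2 × 4 × 10 weights plus 10 biases, that is, 90 trainable parameters.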
The image segmentation process produced 1,211 segments/objects using eCognition software. Furthermore, eCognition processed all segments to produce image features that represent each segment. A semi-supervised process was used to assign a label to each segment in the training dataset by comparing the objects against the factual image shown in Figure 3. Subsequently, the CNN classifier was trained on the training dataset. Accuracy assessments based on the confusion matrix were carried out to measure the classification result.
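The 10-fold partitioning of the 1,211 segments can be sketched as below. This is a minimal illustration under stated assumptions: the helper name `k_fold_indices` and the random seed are hypothetical, and the study does not describe how its folds were actually drawn.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random partition, reproducible via seed
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1211, 10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)  # [122, 121, 121, 121, 121, 121, 121, 121, 121, 121]
```

Each segment appears in exactly one test fold, so every object is used for testing once and for training nine times.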
Training and testing accuracies were calculated based on the confusion matrix. A confusion matrix is a technique for summarizing the performance of a classification algorithm. Overall accuracy was calculated by dividing the number of correct predictions by the total number of predictions [22]. Producer’s accuracy measures accuracy from the point of view of the map maker (producer). This value can be interpreted as how often the real features in the field are correctly displayed on the classification map, or the probability that a certain land cover is classified correctly. Producer’s accuracy was computed as the number of reference data classified correctly divided by the total number of reference data for a class. User’s accuracy (reliability) measures accuracy from the point of view of the map user, that is, how often the class on the map is actually present on the ground. User’s accuracy was computed as the number of correct classifications for a particular class divided by the total number of predictions for that class. Figures 5 and 6 show the confusion matrices for the CNN training data and testing data, respectively. Table 4 shows the accuracies for CNN.
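The three accuracy measures defined above can be computed from a confusion matrix as in the following sketch. The counts are hypothetical and are not the values from Figures 5 to 8; rows are assumed to hold the reference (field) class and columns the predicted (map) class.

```python
def accuracies(matrix):
    """Return overall, producer's and user's accuracy.

    matrix[i][j] = number of samples of reference class i predicted as class j.
    """
    n = len(matrix)
    total = sum(map(sum, matrix))
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    # Producer's accuracy: correct / all reference samples of the class (row sum).
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # User's accuracy: correct / all samples predicted as the class (column sum).
    users = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return overall, producers, users


classes = ["primary dry forest", "secondary dry forest", "plantation forest"]
# Hypothetical counts, for illustration only.
matrix = [
    [48, 2, 0],
    [3, 45, 2],
    [0, 1, 49],
]

overall, producers, users = accuracies(matrix)
print(f"overall accuracy: {overall:.4f}")
for name, pa, ua in zip(classes, producers, users):
    print(f"{name}: producer's {pa:.4f}, user's {ua:.4f}")
```

In this hypothetical matrix the overall accuracy is 142/150, and the secondary dry forest has the lowest producer's accuracy (45/50) because five of its reference samples were assigned to other classes.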
Figure 6 shows that the highest misclassification occurred between the secondary dry forest and the primary dry forest. This misclassification arose because the contextual information and NDVI values of the secondary dry forest and the primary dry forest are quite similar.
In addition, as a comparison to the CNN result, a gradient boosted tree was also trained on the training dataset. Figures 7 and 8 show the confusion matrices for the gradient boosted tree training data and testing data, respectively. Table 5 shows the accuracies for the gradient boosted tree.
The producer’s accuracy (testing data) of our method was higher than that of the gradient boosted tree for the primary dry forest (0.9804 versus 0.9709) and the secondary dry forest (0.9709 versus 0.9111), and almost identical for the plantation forest (0.9787 versus 0.9792). This result indicates a high probability that each land cover class (primary dry forest, secondary dry forest, and plantation forest) is classified correctly in this research, and that almost all real features in the field are correctly displayed on the classification map. The user’s accuracy (testing data) of our method was higher than that of the gradient boosted tree for all forest types. This result indicates that almost all classes on the classification map would actually be present on the ground. The classification yielded high overall accuracy when using CNN with the image features extracted from contextual information in the Indonesia National Standard Agency’s document for Land cover classification (NDVI, Brightness, GLCM homogeneity and Rectangular fit). The CNN classification yielded a small improvement in overall accuracy over the gradient boosted tree, 97.66% versus 95.50% on the testing data (see Figure 9).
The result of this research is in accordance with the previous study [6] on the potential use of deep learning (CNN) for remote sensing image analysis tasks, including scene classification, object detection, and LULC classification. This research also made an improvement based on the potential work identified in the summary of related work in Table 1. It used spectral satellite images with the spectral feature, spectral index and spatial features together as input features. It also used contextual information from the thematic map standard published by the local government (the Indonesia National Standard Agency document for Land cover classification) as a baseline for feature extraction, in order to obtain appropriate classifier features and improve accuracy. This research also yielded an accuracy improvement over the previous study reported by [1], which was chosen as the reference research because of the similarity of the land cover class used. The result of this research against the reference research [1] is shown in Table 6.
Table 6 shows that this research yielded an accuracy improvement over the reference research. The improvement is attributed to the use of appropriate classifier features, extracted on the basis of the contextual information in the Indonesia National Standard Agency’s document for Land cover classification, and to the use of an appropriate classification method (CNN) for forest classification from satellite imagery.
This research has put forward a deep learning model (CNN) for forest classification of satellite imagery. A detailed workflow has been introduced that includes four major steps: defining the forest classes based on the Indonesia National Standard Agency for Land cover classification, extracting optical image features based on contextual information from the forest class definitions, extracting image features from the Sentinel-2 satellite image, and classifying image object features using a CNN classifier. Overall accuracy was used to evaluate the feasibility of the classification method. This research produced a method to classify forest types based on CNN and Sentinel-2 satellite imagery. This method could be used as an alternative tool to support and answer the forest monitoring problem (the large area and wide distribution of the forest). The method classified forest types with high overall accuracy and yielded a small accuracy improvement over the gradient boosted tree. The methodology of this research is knowledge-driven and should be shared among experts for future improvement. The particular challenges and future work that can be explored from this research are determining the region of interest and determining classifier features for the diversity of forest types. Technically, the method in this research could be adapted to train a forest classification model at any location, as long as Sentinel-2 satellite imagery for the location of interest and a thematic map standard published by the local government are available to determine the forest classes.
No potential conflict of interest relevant to this article was reported.
We gratefully acknowledge the support of the Research and Technology Transfer Office (RTTO), Bina Nusantara University, which helped fund this paper, and Dr. Robby Kurniawan Harahap, S.Kom., M.T., who helped with the writing process. Our appreciation also goes to Christina Eli Indriyani, M.Hum., who helped to proofread the article.
Figure 1. CNN architecture for classification process.
Figure 2. Schema of the present study methodology.
Figure 3. False color image fusion result of the Sentinel-2 satellite imagery for Semarang, Central Java, Indonesia.
Figure 4. Training process of the land cover classification.
Figure 5. Confusion matrix: CNN training data.
Figure 6. Confusion matrix: CNN testing data.
Figure 7. Confusion matrix: gradient boosted tree training data.
Figure 8. Confusion matrix: gradient boosted tree testing data.
Figure 9. Testing accuracy of CNN against gradient boosted tree.
Table 1. Related works.
Research | Finding | Potential work | This research |
---|---|---|---|
[7] | The hierarchical structure of CNN makes it possible to learn the hierarchical features of the input data (visual object recognition and vision navigation for off-road mobile robots). | Multispectral data has not been explored in this research. | Explore multispectral features: color (NDVI), hue (brightness), texture (GLCM), shape (Rectangular fit). |
[8] | Used information from spectral bands, shape and color to classify land cover. | The spectral index has not been explored yet. | Explore spectral index: NDVI. |
[13] | Utilizing the advantage of the deep CNN framework (automatically extracting robust and discriminative features) to classify complex urban objects (building roofs and cars). | The study has not extracted the contextual information of the class at the global level. | Using contextual information from the Indonesia National Standard Agency’s document for Land cover classification as a base to extract the classifier features of the class. |
[14] | CNN for major crops classification (wheat, maize, sunflower, soybeans, and sugar beet). The accuracy of the classification result is 85%. This study became a foundation for further use of remote sensing data within Sentinel-1A imagery in Ukraine. | Exploration of CNN and Sentinel data in the Indonesia area. | Explore CNN for forest classification from Sentinel-2 satellite imagery in the Indonesia area. |
[15] | CNN for feature extraction and image classification of land use from geo-located level satellite imagery. The accuracy of the classification result is 76%. | The exploration of feature extraction and selection could be employed to increase the accuracy. | Feature extraction and feature selection based on contextual information (Indonesia National Standard Agency’s document) to get appropriate classifier features and to increase accuracy. |
Table 2. NDVI for vegetation class [4, 16].
Vegetation class | NDVI |
---|---|
Dense vegetation such as tropical forests or tropical plants at the peak stage of growth | 0.6 – 0.9 |
High-density vegetation area (such as forests) | > 0.6 |
Sparse vegetation such as shrubs and grasslands or ageing plants | 0.2 – 0.4 |
Grassland | 0 – 0.2 |
Bare land | 0 – 0.05 |
Water body or wetlands | ≤ 0 |
Table 3. Corresponding features: the Indonesia National Standard Agency for Land Cover Classification feature and the eCognition feature.
The Indonesia National Standard Agency for land cover classification feature | eCognition feature | Description [16, 21] |
---|---|---|
Color | NDVI | This best-known vegetation index is simple but effective for quantifying green vegetation. NDVI quantifies vegetation by measuring the difference between near-infrared light (which vegetation strongly reflects) and red light (which vegetation absorbs). |
Hue | Brightness | Brightness is the eye’s response to the amount of light. Here, brightness is the average grey value of all satellite bands for each pixel. The brightness index shows the intensity of the spectral reflection of an image. |
Texture | GLCM homogeneity | GLCM homogeneity is a pixel texture profile that distinguishes smooth and rough textures. |
Shape | Rectangular fit | Rectangular fit describes how well an image object fits a rectangle of the same size and proportion. A value of 0 indicates that the object does not fit the rectangular shape, and a value of 1 indicates that the object fits the rectangular shape. |
Table 4. Accuracies using CNN.
Forest class | CNN training data: producer’s accuracy | CNN training data: user’s accuracy | CNN testing data: producer’s accuracy | CNN testing data: user’s accuracy |
---|---|---|---|---|
Primary dry forest | 0.9839 | 0.9745 | 0.9804 | 0.9709 |
Secondary dry forest | 0.9671 | 0.9735 | 0.9709 | 0.9615 |
Plantation forest | 0.9899 | 0.9932 | 0.9787 | 1.0000 |
Overall accuracy | 0.9803 | | 0.9766 | |
Table 5. Accuracies of gradient boosted tree (GBT).
Forest class | GBT training: producer’s accuracy | GBT training: user’s accuracy | GBT testing: producer’s accuracy | GBT testing: user’s accuracy |
---|---|---|---|---|
Primary dry forest | 0.9775 | 0.9806 | 0.9709 | 0.9524 |
Secondary dry forest | 0.9610 | 0.9610 | 0.9111 | 0.9425 |
Plantation forest | 0.9835 | 0.9803 | 0.9792 | 0.9691 |
Overall accuracy | 0.9740 | | 0.9550 | |
Table 6. This research against previous study/reference research.
Aspect | Previous study / reference research [1] | This research |
---|---|---|
Satellite imagery | Sentinel-2 | Sentinel-2 |
Land cover class | Forest | Forest |
Study area | Mediterranean environments | Semarang area, Indonesia |
Classification method | Random forest | CNN, GBT |
Parameter | Spectral index (NDVI) | NDVI, brightness, GLCM homogeneity, Rectangular fit |
Feature extraction baseline | | Contextual information in the Indonesia National Standard Agency’s document for Land cover classification |
Accuracy | > 83% | CNN = 97.66%; GBT = 95.50% |