International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(2): 214-228
Published online June 25, 2023
https://doi.org/10.5391/IJFIS.2023.23.2.214
© The Korean Institute of Intelligent Systems
Christine Musanase1,2, Anthony Vodacek2,3, Damien Hanyurwimfura1,2, Alfred Uwitonze1,2, Aloys Fashaho1,2, and Adrien Turamyemyirijuru1,2
1African Center of Excellence in Internet of Things, University of Rwanda, Kigali, Rwanda
2Chester F. Carlson Center for Imaging Science, Rochester Institute of Technology, Rochester, NY, USA
3College of Agriculture, Animal Sciences and Veterinary Medicine, University of Rwanda, Musanze, Rwanda
Correspondence to:
Christine Musanase (musanasechristine@gmail.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The ability to estimate soil quality has great value for agriculture, especially for low-income regions with minimal agricultural and financial resources. This prediction provides users with information that is useful in determining whether the soil is suitable for a specific crop, such as potato (Solanum tuberosum). Farmers in Rwanda lack information on soil quality. There are not enough soil laboratories to perform the requisite measurements of NPK, pH, and organic carbon, nor are there enough experts to analyze the data and provide farmers with timely results. The prime objective of the proposed study is to develop a predictive framework that can estimate soil quality for the ideal cultivation of potato (Solanum tuberosum) considering a case study of Rwanda. In this study, bootstrapping is used to augment the small soil dataset, and fuzzy logic is used to label soil data into four classes of soil suitability, with verification of the labeling by soil experts. Several machine learning methods are then tested on the labeled data, resulting in the classification of suitability for the augmented dataset and an assessment of their performance as a way to support experts in predicting soil quality. All machine learning methods applied were viable, with the best performance achieved using an artificial neural network. The quantified outcome showed that the adoption of a neural-network-based scheme has an average accuracy of 32% in contrast to other learning schemes. However, 70%-80% accuracy was achieved upon the adoption of fuzzy logic.
Keywords: Soil quality, Fuzzy logic, Artificial intelligence, Rwanda, Machine learning, NPK, Predictive model
Soil quality is an important factor in determining the suitability of sites for specific crop types. It is primarily a function of the nitrogen, phosphorus, and potassium (NPK) ratio, soil water pH, and quantity of organic matter [1, 2]. Dependency on nitrogen varies from plant to plant; however, the demand for nitrogen is known to be high in leafy plants. Therefore, an appropriate ratio of NPK fertilizer is required. The availability of water-soluble chemicals and nutrients in the soil is significantly affected by the soil water pH. Nutrient availability differs between acidic and alkaline conditions, and plant development can be adversely affected if the soil is highly acidic, with a pH value less than 5.5. This can occur for various reasons, such as the toxicity associated with calcium, manganese, and aluminum, as well as magnesium deficiency. Furthermore, soil may suffer from deficiencies in manganese, boron, copper, and zinc in cases of high alkalinity. Similarly, a sufficient amount of soil organic matter can increase the capacity of the soil to supply key nutrients and improve soil quality. In addition, a variety of other characteristics such as compaction, soil texture, water-holding capacity, soil biological activity, infiltration rate, sodicity, and cation exchange capacity can affect soil quality. Desertification, biodiversity loss, erosion, deforestation, and agricultural practices also affect soil quality.
Furthermore, site conditions and climatic factors affect potato cultivation [3]. Recent work performed by Coelho et al. [4] suggested that the primary focus should be placed on soil nutrients and organic matter and proposed a scheme emphasizing these attributes. Considering this as an input factor, the proposed scheme uses machine learning to predict the soil quality for potato cultivation. Previous studies [5, 6] noted that machine learning was validated using a segment of an existing dataset with no explicit information required.
The idea of validation is to ensure that appropriate data are considered for predictive analysis. Although the level of nutrients can be altered by farmers, adopting the proposed scheme to predict soil quality for potato cultivation has an added advantage. The scheme offers an autonomous and reliable score for an appropriate fertilizer along with the pH content, thereby facilitating effective decision-making by farmers. The machine learning process considers multiple inputs, along with an uncertainty factor in the form of the anticipated error rate, to ensure that the predictive outcome is practically applicable to the given test data for the soil of Rwanda. The predictors used in the models are the densities of organic carbon (OC), N, P, and K, and the water pH. Yields are significantly higher when soil quality is analyzed and used to determine the site suitability for growing specific crops [7, 8]. In Rwanda, relatively little soil variable data are available, and the interpretation of soil quality from these data is performed by experts.
Therefore, the assessment of soil quality for various crops in Rwanda is limited by the availability of soil data and expert analyses. One method to overcome this limitation is to apply machine learning methods to complement the efforts of agricultural experts [9]. For example, fuzzy logic has been used to interpret the values of NPK obtained from conventional soil tests to assess their levels in the soil and to predict possible NPK inputs. Artificial neural networks (ANNs), support vector machines, cubist regression, random forests, and multiple linear regression are other machine learning methods used to advance the prediction of soil OC content [10]. In [11], the authors showed that a machine learning approach improved the accuracy of soil property prediction based on a comparative analysis of six commonly used techniques: random forest, decision tree, naïve Bayes, support vector machine, least-squares support vector machine, and ANN.
In this study, an analysis was conducted in Rwanda, and a machine learning model was built to classify the type of soil into four different groups: 1) highly suitable, 2) moderately suitable, 3) marginally suitable, and 4) not suitable. Potatoes were the target crop because they are an important commercial crop in Rwanda [12]. Soil quality information must be accessible to every farmer to decide on the type of crop to be grown and to earn more profit. In addition to the NPK ratio of the soil, the water pH plays an important role [13]. Moreover, alkaline soils inhibit micronutrient absorption [14]. Hence, it must be ensured that these plants are not planted in hostile environments. If one type of soil is less suited for a particular plant and more suited for another, this result can be further used to deduce yield predictions, and thereby, profitability can be predicted. With input from soil scientists, soil data were processed using fuzzy logic to create labels for a small set of soil data. The experts validated the results to be reasonable, and then machine learning was applied to expand the results to compensate for the fact that there are not enough experts in Rwanda. The fuzzy logic model contains the values for the classification of soil into the values used to predict soil quality. In summary, a combination of soft computing and supervised machine learning was used to predict the soil suitability for potatoes in different regions of Rwanda.
This section outlines the existing approaches developed for soil quality prediction. Udutalapally et al. [15] developed a device that can explore the health status of crops by assessing any form of disease using Internet of Things (IoT). The study inferred the inclusion of devices for the evaluation of crop health, which is a recent trend in the inclusion of IoT in agriculture. Existing studies have also included the Bayesian machine learning model, and a classified ensemble was reported in the work of Mosavi et al. [16] for identifying the salinity content of ground water. Zia et al. [17] developed a model that could predict the deficiency of nitrates in soil using a machine learning approach. These studies have focused mainly on salinity factors or single nutrient entities without the inclusion of other essential nutrient factors. In addition to machine learning, deep learning is slowly gaining pace in the predictive analysis of agricultural issues, as reported by Elavarasan and Vincent [18]. They used a deep reinforcement model to perform the predictions. Irrespective of the improved form of the model, benchmarking is lacking to prove its applicability on practical grounds. A unique study presented by de Lima Neto et al. [19] investigated the availability of nutrients in a specific form of banana. However, the model does not include any specific nutrients nor is it benchmarked.
The model presented by Zhang et al. [20] used multiple sets of machine learning methods to identify the carbon content in the soil in a specific region. Similarly, various other work was conducted to assess various problems that directly relate to assessment of soil quality; the studies included averaging method for soil properties [21], forecasting carbon in soil using multiple sets of machine learning [22], yield prediction [23], assessing effectiveness of machine learning [24], growth prediction using deep learning [25], assessing CO2 fluxes [26], prediction of texture [27], reliability forecasting model [28], and carbon variability assessment [29]. A fuzzy-logic-based approach for assessing soil quality has also been investigated by Ogunleye et al. [30], Nooriman et al. [31], Hoseini [32], Chen et al. [33], and Atijosan et al. [34]. However, the accuracy of the model largely depends on the massive size of the training data, which is sometimes unavailable. All the aforementioned studies considered various crops and different locations to predict soil quality. These studies aimed to determine the optimal amount of NPK fertilizer required for better soil quality. However, apart from the beneficial features and claims of these studies, there is also an open scope for further development toward predicting soil quality.
Studies have focused on determining the optimal amount of NPK fertilizer required for better soil quality. Additionally, various predictive mechanisms have been reported for investigating soil quality. Wu et al. [35] presented a fuzzy logic model to evaluate the soil quality based on soil samples. They also applied random forest, cubist mathematical models, and hybrid models (regression kriging) to 201 composite surface soil samples to predict soil chemical properties, such as soil OC and available phosphorus content. However, the study did not facilitate assessment based on individual nutrients, which led to unanswered questions about its applicability.
Kaya et al. [36] compared model-averaging techniques for predicting the spatial distribution of soil properties. Their results showed that ANNs and random forest-based learners were the most effective in predicting the soil properties. However, the applicability of learning-based models was found to differ in different use cases, and the study did not offer an inference of soil properties for optimal soil quality retention. Li et al. [37] used ANNs, multiple linear regression, and support vector machines to model the effects of nanomaterials in compacted clay soil. The applicability of this study is restricted to specific forms of soil. Ozcoban et al. [38] developed a model that can predict nitrate deficiency in soil using a machine learning approach. Arciniegas-Ortega et al. [39] used logistic regression to generate an index based on the land-use order. However, it uses a sophisticated approach that makes it difficult to gather data on a daily basis.
Arguments for the need of this study are as follows: the primary motive was to explore the need for predictive modeling to predict soil quality. Therefore, this study explores the scope that could offer improved potato productivity in Rwanda. A clear analysis from the literature review shows that most studies employed sophisticated machine learning techniques to design their predictive framework. It has also been observed that most existing predictive models for soil quality require increased flexibility, which restricts their applicability to a particular crop or region, indicating that the models limit their usage to a particular data size. However, few studies have focused on computing the optimal amount of NPK fertilizer, which is a key indicator for soil prediction. Despite their potential, very few studies have explored the strength factors of the rule set for fuzzy logic type-2, which could assist in enhancing the performance of machine learning models. A review of the existing approaches shows that more work is required to consider real-time soil data, and no similar study has been conducted in Rwanda.
The primary reason for conducting the proposed study was to address the current challenges faced by existing soil quality prediction techniques. The first challenge addressed by the proposed scheme is to develop a simplified fuzzy-logic-based scheme that balances ruleset deployment with effective decision-making performance, as verified by a soil expert. The second challenge addressed in this study was to consider the use case of potato cultivation in Rwanda, which has never been explored before. The third challenge addressed in the current study was the deployment of a simplified learning model to perform predictive tasks for determining soil quality. The primary purpose of the present work was to develop an effective machine learning model capable of the predictive analysis of soil quality in Rwanda. The next section discusses the materials and methodology used to achieve the aims of this study.
This section emphasizes the methodologies and formulated algorithms for predicting soil quality. This study’s model was developed considering the underlying principles of agricultural science as well as data science. The study model aggregates environmental information using agricultural science, and a similar concept is used to configure the thresholds of pH and NPK values.
Prior to discussing the data collection, it is essential to understand the complete structure of the proposed implementation. Figure 1 highlights the structure of the proposed implementation scheme, which exhibits the area from which soil data were collected, followed by a series of processes to analyze soil quality.
The soil in Rwanda is naturally fragile [40]. Agricultural development is governed by crop intensification programs [41]. Crops are selected for cultivation based on their potential to meet the food security demands of the region, the suitability of the climatic conditions of the ecological region, and comparative advantage. As shown in Figure 2, the Buberuka and Birunga regions, which include the Burera and Rubavu districts in the northern and western provinces of Rwanda, are preferred for cultivating potato [42]. Both districts have high rainfall, low temperatures, high altitude, and steeply sloping hills. The soils in these regions mainly fall under the category of volcanic soil, whereas the Gicumbi and Rwamagana districts are in the northeastern and eastern parts of Rwanda, with altitudes ranging from medium to high, generally drier conditions than Burera and Rubavu, and nonvolcanic soils.
To investigate the proposed scheme, soil samples were collected from two districts, Gicumbi and Rwamagana, during two seasons: September 2017 to February 2018 and March 2018 to August 2018. Soil sampling was performed at a depth of 30 cm, and samples were collected from 16 plots. The study also included soil samples collected during the rainy season of 2017 from the Rubavu and Burera districts, where 12 randomly selected plots were sampled at a depth of 0–30 cm. The study was conducted in a controlled research environment following the standard procedure for soil analysis in the laboratory. The final dataset consisted of 6,051 soil samples from four locations: Gicumbi, Rwamagana, Rubavu, and Burera. The five variables of soil data (K, P, N, OC, and pH) that were used to build our models are shown in Table 1. The dataset was split for training and testing, with 80% for training and 20% for testing.
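The 80/20 split mentioned above can be illustrated with a minimal sketch. The function name, the seed, and the sample records are assumptions made for illustration; the source does not state how the split was implemented or whether it was stratified.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records in (K, P, N, OC, pH) order; values are made up.
samples = [(0.30 + i * 0.01, 10.0 + i, 0.08, 2.0, 5.0 + (i % 10) * 0.1)
           for i in range(100)]

train, test = train_test_split(samples)
print(len(train), len(test))  # 80 20
```

A stratified split, which keeps the proportion of each suitability class equal in both parts, may be preferable when one class dominates.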
The data in Table 1 were gathered following the standard procedures of soil analysis in the soil laboratory of the University of Rwanda, College of Agriculture, Animal Sciences, and Veterinary Medicine, Busogo Campus.
The proposed scheme was analyzed using both fuzzy logic and a machine learning approach. Fuzzy logic is applied to solve problems characterized by imprecise or vague sets of input information, thereby offering higher flexibility for reasoning in the presence of uncertainties. The proposed scheme implements fuzzy logic because it is robust to the quality of its inputs; it does not require inputs to be noise-free or fixed. Furthermore, it supports user-defined rules, which makes it practical to implement. The fuzzy controller can also be tuned to improve system performance. The fuzzy logic mechanism can generate a user-friendly outcome while processing a reasonable number of inputs. Finally, fuzzy logic does not depend on complex mathematical implementations because it can efficiently model nonlinear systems.
Figure 3 highlights the architecture of the fuzzy logic type-2 used in the proposed scheme for predicting soil quality by considering the uncertain and imprecise information associated with the condition of the soil. The first part of this implementation process involves defining the crisp key inputs that affect soil quality. There are various possibilities for such inputs, such as compaction, nutrient levels, moisture content, and organic matter content. The inputs considered were taken from four different agricultural sites in Rwanda and comprise the pH value, OC, and proportions of N, P, and K. The next processing step, shown in Figure 3, is the construction of a rule base to represent the connection between the input arguments and the soil quality outcome. The study uses “If-Then” rules for constructing the fuzzy set exhibited in Lines 1–4 of the proposed algorithm. The next step is to develop a fuzzy inference system that applies the rule base to the input arguments. The objective is to assess the degree of influence of each rule on the input variables to yield an outcome. The final operation is defuzzification preceded by type reduction, which extends the defuzzification process of legacy type-1 fuzzy logic. The architecture shown in Figure 3 can handle the uncertainty and possible imprecision associated with the soil data.
The proposed scheme to develop soil suitability labels uses fuzzy logic type-2, as shown in Figure 3, where the proposed scheme considers the soil dataset sample in the form of initial data, followed by the activation of the inference engine and mapping performed through rule construction. The outcome is processed by a type reducer, which is then subjected to defuzzification to obtain the final outcome of the type-reduction score and crisp output.
• The first module, the fuzzifier, maps the crisp input to fuzzy sets; the mapping depends solely on the category of fuzzifier being deployed.
• The second module, the rule base, constructs rules over the N, P, and K attributes of soil quality, where the outcome of each rule states the matching predicted quality of soil.
• The third module, the fuzzy inference engine, maps the input to the output considering all the stated rules in the fuzzy rule base.
• The fourth module, the type reducer, is an extended operation of defuzzification in which a type-1 fuzzy set is obtained by transforming a type-2 fuzzy set.
• The fifth and final module, the defuzzifier, generates a quantified numerical outcome from the associated degree of membership, crisp logic, and adopted fuzzy set. This final block maps the fuzzy set into a crisp value, which is essential in a fuzzy control system.
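The five modules above can be sketched for a single input as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the triangular shapes, the `spread` parameter, the pH set boundaries, and the output scores are all hypothetical, and type reduction is replaced by a simple closed-form average of the lower and upper memberships (in the spirit of the Nie-Tan method) instead of an iterative Karnik-Mendel procedure.

```python
def tri(x, a, b, c):
    """Type-1 triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def it2_membership(x, a, b, c, spread):
    """Fuzzifier: interval type-2 membership as (lower, upper) bounds.

    The upper bound widens the triangle's feet by `spread` and the lower
    bound narrows them, so lower <= upper for every x.
    """
    upper = tri(x, a - spread, b, c + spread)
    lower = tri(x, a + spread, b, c - spread)
    return lower, upper

# Rule base (hypothetical): each pH set maps to a suitability class score,
# where 1 = highly suitable and 4 = not suitable.
PH_SETS = {
    "too acidic":   ((3.0, 4.5, 5.5), 4.0),
    "favourable":   ((5.0, 6.0, 7.0), 1.0),
    "too alkaline": ((6.5, 8.0, 9.5), 4.0),
}

def crisp_score(ph, spread=0.2):
    """Inference, simplified type reduction, and defuzzification."""
    num = den = 0.0
    for (a, b, c), score in PH_SETS.values():
        lower, upper = it2_membership(ph, a, b, c, spread)
        weight = (lower + upper) / 2.0  # collapse the interval to one weight
        num += weight * score
        den += weight
    return num / den if den else None   # None when no rule fires

for ph in (4.5, 5.25, 6.0):
    print(ph, crisp_score(ph))
```

The sketch uses pH alone for brevity; the scheme described in the text also takes OC, N, P, and K as inputs.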
The significant benefit of adopting this fuzzy logic technique is that it facilitates modeling of varied levels of uncertainty in predicting soil quality, which cannot be carried out by type-1 fuzzy logic. The algorithmic steps are given in Algorithm 1.
Algorithm 1 is used for labeling soil information using fuzzy logic. It takes the soil data as input and, after processing, yields a quality outcome for the soil. The algorithm constructs a matrix of four columns to retain the information associated with the pH water quality, N quality, P quality, and K quality (Line 1). The next step is to assign quality values (Line 2). The algorithm then constructs a conditional logic to assess whether the quality of water pH and N is equal to 1, while also assessing whether the soil has the lowest quality score for P and K (Line 3). The algorithm performs similar rounds of checks for a pH water quality equal to 2 (Line 4) and for the case in which the quality is equivalent to the pH water quality (Line 5). All the above processing steps yield inferences regarding the quality of the soil. The processing outcome showed that the majority of soil samples from the observation region were marginally suitable for potato cultivation.
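The labeling steps described above (build a four-column quality matrix, assign grades, then apply conditional rules) can be sketched as follows. This is an illustration only: the grade cutoffs, the pH bands, and the rule that the overall class equals the worst of the four grades are assumptions chosen for the example and are not taken from Algorithm 1.

```python
LABELS = {1: "highly suitable", 2: "moderately suitable",
          3: "marginally suitable", 4: "not suitable"}

# Hypothetical cutoffs (NOT from the paper): value >= cutoffs[i] gives grade i + 1.
CUTOFFS = {"N": (0.20, 0.12, 0.08), "P": (20.0, 10.0, 5.0), "K": (0.60, 0.30, 0.15)}

def grade_nutrient(value, cutoffs):
    """Return a grade from 1 (best) to 4 (worst) for one nutrient value."""
    for grade, cutoff in enumerate(cutoffs, start=1):
        if value >= cutoff:
            return grade
    return 4

def grade_ph(ph):
    """Grade water pH by assumed bands around a favourable range."""
    if 5.5 <= ph <= 6.5:
        return 1
    if 5.0 <= ph <= 7.0:
        return 2
    if 4.5 <= ph <= 7.5:
        return 3
    return 4

def quality_row(ph, n, p, k):
    """One row of the quality matrix: [pH, N, P, K] grades."""
    return [grade_ph(ph),
            grade_nutrient(n, CUTOFFS["N"]),
            grade_nutrient(p, CUTOFFS["P"]),
            grade_nutrient(k, CUTOFFS["K"])]

def label(row):
    """Assumed rule: the worst individual grade limits the overall class."""
    return LABELS[max(row)]

row = quality_row(ph=6.0, n=0.08, p=25.0, k=0.7)
print(row, label(row))
```

Because the rules are plain conditionals over a small matrix, an expert can inspect and adjust the cutoffs directly, which is the customization benefit the text attributes to the fuzzy ruleset.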
Table 3. Numerical outcome of accuracy-based comparative analysis
ML approaches | Precision | Recall | F1-score |
---|---|---|---|
Gaussian NB | 0.82 | 0.83 | 0.84 |
ANN | 0.87 | 0.85 | 0.86 |
Logistic regression | 0.79 | 0.75 | 0.77 |
KNN | 0.75 | 0.72 | 0.73 |
The primary reason for adopting fuzzy logic for this ruleset is to develop a decision control system in which soil quality can be ascertained based on rules generated by the user. The advantage of setting up the ruleset in fuzzy logic is that the suitability score can be customized by farmers or agricultural experts based on the present agricultural environment of any farming region in Rwanda. The underlying expectation is that the higher the soil quality, the higher the anticipated potato yield in Rwanda.
The applicability of fuzzy logic was restricted to a few soil samples, and quality predictions were found to be reasonable by soil experts. The soil experts involved in the proposed study were local skilled agriculturists and agronomists with extensive experience in analyzing and interpreting the quality of soil. The experts were consulted regarding the anticipated outcomes of the model. According to them, the lower the error rate of the predictive model, the higher is the possibility of reliable outcomes for a higher degree of soil quality. They also provided a recommendation to assess the individual scope of NPK fertilizer, followed by soil water pH. As experts are not available to assess all the results, the proposed analysis adopted a machine learning approach to increase the analytical coverage of the study. For this purpose, an ANN was adopted to predict soil quality.
An ANN is capable of learning complex problems, thereby making precise decisions regarding the quality of soil based on different quality parameters [43]. It can also process multiple inputs in parallel, thus generating multiple outcomes. This means that the proposed model is capable of predicting soil quality for any geographic region, not only Rwanda. Gaussian naïve Bayes is a simple and powerful algorithm for predictive modeling that assumes each input variable is independent [44]. We used Gaussian naïve Bayes to classify the soil data because the data are continuous, and the Gaussian model only requires the mean and standard deviation of each variable to be calculated from the training data. The naïve Bayes approach often yields accurate and stable models with very small sample sizes, such as our soil sample analysis dataset from Rwanda. Logistic regression can be used to analyze the relationship between multiple independent variables and a categorical dependent variable and to estimate the probability of an event [45]. Because the dependent variable in this study was not dichotomous but comprised four categories, we used multinomial logistic regression for soil quality prediction. The K-nearest neighbors (KNN) machine learning algorithm is a well-known nonparametric classification method. KNN determines the class of a new sample based on the class of its nearest neighbors [46]. For each modeling approach tested in this study, the assessment of the algorithm was based on four measures: accuracy, precision, recall, and F1 score, which were calculated using the following formulas:
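$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

In the above expressions, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.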
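For a four-class problem, the four measures can be computed from a confusion matrix by treating each class in turn as the positive class. The sketch below is illustrative: the macro-averaging choice, the function names, and the toy labels are assumptions, since the source does not state how the per-class values were combined.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def macro_metrics(matrix):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, truly other
        fn = sum(matrix[i]) - tp                       # truly i, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy labels (made up): classes 1-4 are the four suitability classes.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 3, 4, 1]
cm = confusion_matrix(y_true, y_pred, classes=[1, 2, 3, 4])
accuracy, precision, recall, f1 = macro_metrics(cm)
print(accuracy, precision, recall, f1)
```

Note that the macro F1 computed this way is the mean of the per-class F1 scores, which generally differs from the F1 of the macro-averaged precision and recall.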
This section describes the results of the fuzzy logic method for labeling data, followed by a comparison of the outcomes of different machine learning methods as predictive models. The labeling method was verified as reasonable by soil experts with knowledge of Rwandan soils. Machine learning methods were compared using performance metrics.
The outcome of the fuzzy logic was analyzed using kernel density estimation plots. A kernel density estimate is an estimate of the probability density function; it is a variation of the histogram that applies kernel smoothing when plotting the values. Density plots were produced for OC, for N, P, and K individually, and for water pH. The X-axes represent the data points, and the Y-axes represent the probability density. The region of a plot with a higher peak is the region containing the largest number of data points.
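A kernel density estimate of this kind can be written in a few lines. The sketch below uses a Gaussian kernel with a fixed bandwidth; the bandwidth and the sample values are assumptions chosen for illustration and are not the study's data or plotting settings.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function estimating the probability density at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in data)

    return density

# Made-up N proportions clustered around two values.
n_values = [0.07, 0.08, 0.08, 0.09, 0.08, 0.19, 0.20, 0.21]
density = gaussian_kde(n_values, bandwidth=0.01)

for x in (0.08, 0.14, 0.20):
    print(x, round(density(x), 2))
```

Because each observation contributes a smooth bump, the estimated curve can extend slightly beyond the range of the data, including below zero for quantities that are themselves non-negative.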
As shown in Figure 4, the OC percentage was satisfactory throughout the region, and this study evaluated the impact of individual N, P, and K densities on the OC proportion (g g⁻¹). Therefore, the variation in OC did not influence the suitability score, and these data could be ignored. In Figure 5, the density plot of N shows two concentrations of values in the soil: most values were near 0.08, whereas another peak was present at 0.2; hence, N became a deciding factor for overall soil quality. In Figure 6, the P percentage is higher in a particular region but lower in others; in the case of a lower N percentage, the P percentage is the deciding factor, along with the K percentage. Negative values of P represent outliers.
Figures 7 and 8 show the density analysis for the proportion of K and water pH. The results show that the density of K is slightly lower, whereas the density of the water pH is quite good for the given region under observation in the proposed use case of Rwanda. The final outcome of soil quality is shown in Figure 9.
The outcomes shown in Figure 9 were produced using the amended version of fuzzy logic. The original sample comprised 360 data points; a random function was then used to increase the sample size programmatically so that the influence of different samples on the predicted quality could be assessed. Figure 9 was obtained by considering a first sample of 360 data points (Sample-1), a second sample of 720 data points (Sample-2), and a third sample of 1,440 data points (Sample-3). The ruleset was constructed using fuzzy logic as described earlier, and the result was validated by a soil expert. The processing outcome showed that the majority of soil samples were marginally suitable for cultivating potatoes. It should be noted that this outcome is specific to the four sites in Rwanda from which the samples were collected (represented in the graph as SC-1.0, SC-2.0, SC-3.0, and SC-4.0). According to experts who are also co-authors of this work, fuzzy logic can classify soil with an accuracy of 70%–80%; this is consistent with the fact that most of the soil samples came from the Acrisols of Gicumbi and the Lixisols of Rwamagana, where Irish potatoes are only marginally cultivated. Therefore, fuzzy logic offers a reasonable predictive performance for labeling the soil data.
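The programmatic enlargement of the sample can be illustrated with bootstrap resampling, that is, drawing rows with replacement from the original set. The sketch below is an assumption about the general approach; the source states only that a random function was used, and the row contents and seeds here are hypothetical.

```python
import random

def bootstrap(rows, size, seed=0):
    """Draw `size` rows with replacement from `rows`."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(size)]

# Hypothetical original sample of 360 rows in (K, P, N, OC, pH) order.
original = [(0.3, 12.0 + i % 7, 0.08 + 0.001 * (i % 5), 2.1, 5.0 + 0.01 * (i % 60))
            for i in range(360)]

sample_1 = original                                   # 360 data points
sample_2 = bootstrap(original, size=720, seed=1)      # 720 data points
sample_3 = bootstrap(original, size=1440, seed=2)     # 1,440 data points

print(len(sample_1), len(sample_2), len(sample_3))  # 360 720 1440
```

Resampling with replacement adds no new measurements, so the enlarged samples reproduce the distribution of the original 360 points and cannot represent soil conditions that the original sample did not cover.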
The outcomes of the machine learning approach were assessed using performance metrics.
The proposed scheme was assessed by comparing a set of frequently used machine learning methods: ANN, Gaussian naïve Bayes, logistic regression, and KNN. The results shown in Figure 10 and Table 3 indicate that the ANN offers better predictive accuracy than the other approaches, which suggests that predictive modeling of soil quality using ANNs offers higher reliability. Machine learning contributes to the development of a predictive model that farmers and agriculturists can use to identify suitable environments for potato farming. The model is scalable in the sense that it can predict soil quality for any dataset that provides the same input variables. Based on the predicted outcomes, farmers can fine-tune the concentration of various fertilizers to ensure better crop cultivation in Rwanda, which saves time and effort in decision-making for a better yield. From the perspective of novelty, the core idea of the proposed scheme is to develop a cost-effective and simplified computational predictive model that can assess soil quality suitable for potato cultivation in Rwanda. The model is built on data acquisition and the design of a fuzzy processor; combining these with a machine learning approach is novel in the context of Rwanda. The model is simple in design and implementation and offers reliable outcomes in a few steps. It deliberately refrains from adopting advanced machine learning or deep learning schemes, in which accuracy is achieved at the cost of a higher computational burden. However, future work can extend the proposed machine learning predictive scheme to a more advanced version while keeping the computational cost low and without affecting accuracy. The next section discusses the results of the proposed study.
From the highlights of the outcome obtained in the previous section, it is noted that the proposed scheme introduces a mechanism that not only can identify the optimal fertilizers required for potato cultivation but also offers a robust classification mechanism to determine the soil quality required for this purpose. There is no doubt that the proposed scheme is implemented considering the use case of potatoes in Rwanda; however, the same model can also be applied to different agricultural crops in different countries. This is possible by altering the input variables of this learning model to predict soil quality, which, at present, is considered only for four different sites in Rwanda.
To understand the contribution of the proposed method to potato production, it is necessary to understand the actual agricultural and cultivation factors required for optimal potato production. The amount of fertilizer to be provided in the soil for potato cultivation depends on the soil test data. The proposed scheme considered standard input data from four agricultural sites in Rwanda to determine the optimal distribution of nutrients. It is also known that NPK fertilizers are administered in the same proportion at the same time during planting; however, the demand for these fertilizers continues to change over a period of cultivation. Fuzzy logic type-2 was applied in the proposed scheme considering the input of soil data to predict soil quality with respect to water quality and quality with respect to individual N, P, and K fertilizers. Further accuracy of this predictive analysis was achieved by deploying machine learning schemes using an ANN. Hence, the model is capable of predicting the change in the quantity of fertilizer required to reach the optimal soil quality to ensure better production.
Although a machine learning approach has been introduced to overcome the constraints of using fuzzy logic for various types of soil, fuzzy logic should not be deemed a low-performing module. It has its own advantages that cannot be achieved using a machine learning approach. Fuzzy logic offers a better platform for introducing a user-friendly logical ruleset that is not only easy to deploy but also flexible to manage. The ruleset can be redefined based on the problem space considered in the investigation. Machine learning contributes to ensuring a higher predictive accuracy by considering more coverage of the soil area. The fuzzy logic analysis showed a higher proportion of OC density (Figure 4) and a lower proportion of N (Figure 5) and P (Figure 6), while the proportion of K (Figure 7) was found to be better than prior values of N and P, which are highly essential for potato cultivation. Furthermore, fuzzy logic was found to offer a predictive performance, which was validated by a soil expert (Figure 10). The analysis of various machine learning schemes demonstrates that ANNs offer better predictive performance than other machine learning approaches. The prime justifications behind this are as follows: less effective feature management system using the Gaussian naïve Bayes algorithm, shortcomings in considering linearity associated with different categories of variables in logistic regression, and less applicability/scalability toward higher dimensions of data in the KNN algorithm. However, it is not difficult to implement an ANN with parallel processing. This not only speeds up processing but also addresses the minimization of errors at each progressive epoch. Hence, the ANN demonstrated better performance in predicting soil quality in this study.
This study presents a simplified predictive modeling for forecasting the soil quality in selected areas of Rwanda, which has been less explored in existing studies. The contributions of the proposed scheme are as follows: (1) the predictive operation is performed using the fuzzy logic and machine learning approaches; both have unique benefits, unlike any form of sophisticated usage of multiple machine learning schemes found in existing studies; (2) the proposed predictive model is highly flexible in its operation and does not solely depend on the environment, which means that it can be applied for predicting soil quality for other crops or other regions, irrespective of any size or dimension of the data; (3) the outcome of the proposed scheme shows that the proposed fuzzy logic scheme offers better predictive performance; (4) the quantified outcome of the machine learning scheme shows that the proposed artificial neural network offers approximately 25% higher accuracy compared with Gaussian naïve Bayes, approximately 32% higher accuracy compared with logistic regression, and approximately 37% higher accuracy compared with the KNN algorithm; and (5) the overall quantified outcome shows that the proposed model is able to show an accuracy of 89%, whereas the implementation of the fuzzy logic algorithm proposed in this system is able to classify the soil with an accuracy of 70% to 80% with shallow learning algorithms.
The numerical outcome of the proposed scheme offers significant guidelines for potato cultivation in Rwanda. Based on this outcome, an agronomist will need to identify the geographical features of Rwanda, followed by the development of a smart ruleset using fuzzy logic. This could facilitate better proportion identification of NPK fertilizers for improving soil quality. Hence, without much demand for re-engineering, this model can be utilized for any geographical location to exhibit better predictive performance than the frequently exercised schemes. Future work will focus on using soil sensors for soil analysis to overcome the challenges of insufficient soil datasets and soil experts to address geographical differences. Future work will also be conducted in the direction of soil map generation to further assess soil quality in the presence of various challenging agricultural conditions.
The datasets are available from the corresponding author upon request.
The authors declare no conflicts of interest.
Keywords: Soil quality, Fuzzy logic, Artificial intelligence, Rwanda, Machine learning, NPK, Predictive model
Soil quality is an important factor in determining the suitability of sites for specific crop types. Soil quality is primarily a function of the nitrogen, phosphorus, and potassium (NPK) ratio, soil water pH, and quantity of organic matter [1, 2]. Dependency on nitrogen varies from plant to plant; however, it is known that the demand for nitrogen is always high in leafy plants. Therefore, an appropriate ratio of NPK fertilizer is required. Different amounts of chemicals and nutrients that are reported to be soil water soluble are significantly affected by the soil water pH. The availability of soil nutrients differs based on the acidic or alkaline conditions of the soil, and the development of plants can be adversely affected if the soil is highly acidic, with a pH value less than 5.5. This can occur due to various reasons, such as the toxicity associated with calcium, manganese, and aluminum, as well as magnesium deficiency. Furthermore, soil may suffer from deficiencies in manganese, boron, copper, and zinc in cases of high alkalinity. Similarly, a sufficient amount of soil organic matter can increase the capacity of the soil to supply key nutrients and improve soil quality. In addition, a variety of other characteristics such as compaction, soil texture, water-holding capacity, soil biological activity, infiltration rate, sodicity, and cation exchange capacity can affect soil quality. Desertification, biodiversity loss, erosion, deforestation, and agricultural practices also affect soil quality.
Furthermore, site conditions and climatic factors affect potato cultivation [3]. Recent work performed by Coelho et al. [4] suggested that the primary focus should be placed on soil nutrients and organic matter and proposed a scheme emphasizing these attributes. Considering this as an input factor, the proposed scheme uses machine learning to predict the soil quality for potato cultivation. Previous studies [5, 6] noted that machine learning was validated using a segment of an existing dataset with no explicit information required.
The idea of validation is to ensure that appropriate data are considered for predictive analysis. Although the level of nutrients can be altered by farmers, adopting the proposed scheme for predicting soil quality for potato cultivation has an added advantage. The scheme offers an autonomous and reliable score for an appropriate fertilizer along with the pH content, thereby facilitating effective decision-making by farmers. The machine learning process considers multiple inputs, along with an uncertainty factor in the form of the anticipated error rate, to ensure that the predictive outcome is practically applicable to the given test data for the soil of Rwanda. The predictors used in the models are the densities of organic carbon (OC), N, P, and K, together with the water pH. Yields are significantly higher when soil quality is analyzed and used to determine the site suitability for growing specific crops [7, 8]. In Rwanda, relatively little soil variable data are available, and the interpretation of soil quality from these data was performed by an expert.
Therefore, the assessment of soil quality for various crops in Rwanda is limited by the availability of soil data and expert analyses. One method to overcome this limitation is to apply machine learning methods to complement the efforts of agricultural experts [9]. For example, fuzzy logic has been used to interpret the values of NPK obtained from conventional soil tests to assess their levels in the soil and to predict possible NPK inputs. Artificial neural networks (ANNs), support vector machines, cubist regression, random forests, and multiple linear regression are other machine learning methods used to advance the prediction of soil OC content [10]. In [11], the authors showed that a machine learning approach improved the accuracy of soil property prediction based on a comparative analysis of six commonly used techniques: random forest, decision tree, naïve Bayes, support vector machine, least-squares support vector machine, and ANN.
In this study, an analysis was conducted in Rwanda, and a machine learning model was built to classify the type of soil into four different groups: 1) highly suitable, 2) moderately suitable, 3) marginally suitable, and 4) not suitable. Potatoes were the target crop because they are an important commercial crop in Rwanda [12]. Soil quality information must be accessible to every farmer to decide on the type of crop to be grown and to earn more profit. In addition to the NPK ratio of the soil, the water pH plays an important role [13]. Moreover, alkaline soils inhibit micronutrient absorption [14]. Hence, it must be ensured that these plants are not planted in hostile environments. If one type of soil is less suited for a particular plant and more suited for another, this result can be further used to deduce yield predictions, and thereby, profitability can be predicted. With input from soil scientists, soil data were processed using fuzzy logic to create labels for a small set of soil data. The experts validated the results to be reasonable, and then machine learning was applied to expand the results to compensate for the fact that there are not enough experts in Rwanda. The fuzzy logic model contains the values for the classification of soil into the values used to predict soil quality. In summary, a combination of soft computing and supervised machine learning was used to predict the soil suitability for potatoes in different regions of Rwanda.
This section outlines the existing approaches developed for soil quality prediction. Udutalapally et al. [15] developed a device that can explore the health status of crops by assessing any form of disease using Internet of Things (IoT). The study inferred the inclusion of devices for the evaluation of crop health, which is a recent trend in the inclusion of IoT in agriculture. Existing studies have also included the Bayesian machine learning model, and a classified ensemble was reported in the work of Mosavi et al. [16] for identifying the salinity content of ground water. Zia et al. [17] developed a model that could predict the deficiency of nitrates in soil using a machine learning approach. These studies have focused mainly on salinity factors or single nutrient entities without the inclusion of other essential nutrient factors. In addition to machine learning, deep learning is slowly gaining pace in the predictive analysis of agricultural issues, as reported by Elavarasan and Vincent [18]. They used a deep reinforcement model to perform the predictions. Irrespective of the improved form of the model, benchmarking is lacking to prove its applicability on practical grounds. A unique study presented by de Lima Neto et al. [19] investigated the availability of nutrients in a specific form of banana. However, the model does not include any specific nutrients nor is it benchmarked.
The model presented by Zhang et al. [20] used multiple sets of machine learning methods to identify the carbon content in the soil in a specific region. Similarly, various other work was conducted to assess various problems that directly relate to assessment of soil quality; the studies included averaging method for soil properties [21], forecasting carbon in soil using multiple sets of machine learning [22], yield prediction [23], assessing effectiveness of machine learning [24], growth prediction using deep learning [25], assessing CO2 fluxes [26], prediction of texture [27], reliability forecasting model [28], and carbon variability assessment [29]. A fuzzy-logic-based approach for assessing soil quality has also been investigated by Ogunleye et al. [30], Nooriman et al. [31], Hoseini [32], Chen et al. [33], and Atijosan et al. [34]. However, the accuracy of the model largely depends on the massive size of the training data, which is sometimes unavailable. All the aforementioned studies considered various crops and different locations to predict soil quality. These studies aimed to determine the optimal amount of NPK fertilizer required for better soil quality. However, apart from the beneficial features and claims of these studies, there is also an open scope for further development toward predicting soil quality.
Studies have focused on determining the optimal amount of NPK fertilizer required for better soil quality. Additionally, various predictive mechanisms have been reported for investigating soil quality. Wu et al. [35] presented a fuzzy logic model to evaluate the soil quality based on soil samples. They also applied random forest, cubist mathematical models, and hybrid models (regression kriging) to 201 composite surface soil samples to predict soil chemical properties, such as soil OC and available phosphorus content. However, the study did not facilitate assessment based on individual nutrients, which led to unanswered questions about its applicability.
Kaya et al. [36] compared model-averaging techniques for predicting the spatial distribution of soil properties. Their results showed that ANNs and random forest-based learners were the most effective in predicting the soil properties. However, the applicability of learning-based models was found to differ in different use cases, and the study did not offer an inference of soil properties for optimal soil quality retention. Li et al. [37] used ANNs, multiple linear regression, and support vector machines to model the effects of nanomaterials in compacted clay soil. The applicability of this study is restricted to specific forms of soil. Ozcoban et al. [38] developed a model that can predict nitrate deficiency in soil using a machine learning approach. Arciniegas-Ortega et al. [39] used logistic regression to generate an index based on the land-use order. However, it uses a sophisticated approach that makes it difficult to gather data on a daily basis.
The arguments for the need for this study are as follows. The primary motive was to explore predictive modeling of soil quality and, through it, the scope for improving potato productivity in Rwanda. The literature review shows that most studies employed sophisticated machine learning techniques to design their predictive frameworks. It has also been observed that most existing predictive models for soil quality lack flexibility, which restricts their applicability to a particular crop, region, or data size. However, few studies have focused on computing the optimal amount of NPK fertilizer, which is a key indicator for soil prediction. Very few studies have explored the strength of the ruleset of fuzzy logic type-2, despite its potential to enhance the performance of machine learning models. A review of the existing approaches shows that more work is required to consider real-time soil data, and no similar study has been conducted in Rwanda.
The primary reason for conducting the proposed study was to address the current challenges faced by existing soil quality prediction techniques. The first challenge addressed by the proposed scheme is to develop a simplified fuzzy-logic-based scheme that balances ruleset deployment as well as effective decision-making performance verified by a soil expert. The second challenge addressed in this study was to consider the use case of potato cultivation in Rwanda, which has never been explored before. The third challenge addressed in the current study was the deployment of a simplified learning model to perform predictive tasks for determining soil quality. The primary purpose of the present work was to develop an effective machine learning model capable of the predictive analysis of soil quality in Rwanda. A review of existing approaches shows that little work has been performed to consider real-time soil data, and no similar study has been conducted in Rwanda. The next section discusses the materials and methodology used to achieve the aims of this study.
This section emphasizes the methodologies and formulated algorithms for predicting soil quality. This study’s model was developed considering the underlying principles of agricultural science as well as data science. The study model aggregates environmental information using agricultural science, and a similar concept is used to configure the thresholds of pH and NPK values.
Prior to discussing the data collection, it is essential to understand the complete structure of the proposed implementation. Figure 1 highlights the structure of the proposed implementation scheme, which exhibits the area from which soil data were collected, followed by a series of processes to analyze soil quality.
The soil in Rwanda is naturally fragile [40]. Agricultural development is governed by crop intensification programs [41]. Crops are selected for cultivation based on their potential to meet the food security demands of the region, the suitability of the climatic conditions of the ecological region, and comparative advantage. As shown in Figure 2, the Buberuka and Birunga regions, which include the Burera and Rubavu districts in the northern and western provinces of Rwanda, are preferred for cultivating potato [42]. Both districts have high rainfall, low temperatures, high altitude, and steeply sloping hills, and their soils are mainly volcanic. In contrast, the Gicumbi and Rwamagana districts are in the northeastern and eastern parts of Rwanda, with altitudes ranging from medium to high, generally drier conditions than Burera and Rubavu, and nonvolcanic soils.
For the investigation of the proposed scheme, soil samples were collected from two districts, Gicumbi and Rwamagana, during two seasons: September 2017 to February 2018 and March 2018 to August 2018. Soil sampling was performed at a depth of 30 cm, and samples were collected from 16 plots. The study also included soil samples collected during the rainy season of 2017 from the Rubavu and Burera districts, where 12 randomly selected plots were sampled at a depth of 0–30 cm. The study was conducted in a controlled research environment following the standard procedure for soil analysis in the laboratory. The final dataset consisted of 6,051 soil samples from four locations: Gicumbi, Rwamagana, Rubavu, and Burera. The five soil variables (K, P, N, OC, and pH) used to build our models are shown in Table 1. The dataset was split into 80% for training and 20% for testing.
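The 80/20 partition described above can be sketched as a simple shuffled split. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the fixed `seed`, and the placeholder records are assumptions for the example and do not reproduce the study's actual data or tooling.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(rows)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for (K, P, N, OC, pH) rows; not measured values.
samples = [(i, i, i, i, i) for i in range(6051)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

With four suitability classes, a stratified split that preserves the class proportions in both partitions may be preferable; the source does not state which variant was used.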
The data in Table 1 were gathered following the standard procedures of soil analysis in the soil laboratory of the University of Rwanda, College of Agriculture, Animal Sciences, and Veterinary Medicine, Busogo Campus.
The proposed scheme was analyzed using both fuzzy logic and a machine learning approach. Fuzzy logic is applied to problems characterized by imprecise or vague input information, thereby offering greater flexibility for reasoning in the presence of uncertainty. The proposed scheme implements fuzzy logic because it is robust to input quality; it does not require inputs to be noise-free or fixed. Furthermore, it supports user-defined rules, which makes it practical to implement. The fuzzy controller can be tuned to optimize system performance. The fuzzy logic mechanism can also generate a user-friendly outcome while processing a reasonable number of inputs. Finally, fuzzy logic does not depend on complex mathematical formulations because it can efficiently model nonlinear systems.
Figure 3 highlights the architecture of the fuzzy logic type-2 used in the proposed scheme for predicting soil quality by considering the uncertain and imprecise information associated with the condition of the soil. The first part of this implementation process involves defining the crisp key inputs that affect soil quality. There are various possibilities for such inputs, such as compaction, nutrient levels, moisture content, and organic matter content. The inputs considered were taken from four different agricultural sites in Rwanda and comprise the pH value, OC, and proportions of N, P, and K fertilizer. The next processing step, shown in Figure 3, is the construction of a rule base that represents the connection between the input arguments and the soil quality outcome. The study uses “If-Then” rules for constructing the fuzzy set exhibited in Lines 1–4 of the proposed algorithm. The next step is to develop a fuzzy inference system that applies the rule base to the input arguments. The objective is to assess the degree of influence of each rule on the input variables to yield an outcome. The final operation is defuzzification preceded by a type-reduction operation, which extends the defuzzification process used in conventional type-1 fuzzy logic. The architecture shown in Figure 3 can handle the uncertainty and possible imprecision associated with the soil data.
The proposed scheme to develop soil suitability labels uses fuzzy logic type-2, as shown in Figure 3, where the proposed scheme considers the soil dataset sample in the form of initial data, followed by the activation of the inference engine and mapping performed through rule construction. The outcome is processed by a type reducer, which is then subjected to defuzzification to obtain the final outcome of the type-reduction score and crisp output.
• The first module, the fuzzifier, maps the crisp input to fuzzy sets; the mapping depends solely on the category of fuzzifier deployed.
• The second module, the rule base, constructs rules from the N, P, and K attributes of soil quality; the consequent of each rule states the matching predicted soil quality.
• The third module, the fuzzy inference engine, maps the input to the output by considering all the rules in the fuzzy rule base.
• The fourth module, the type reducer, extends the defuzzification operation by transforming the type-2 fuzzy set into a type-1 fuzzy set.
• The fifth and final module, the defuzzifier, generates a quantified numerical outcome from the associated degree of membership, crisp logic, and adopted fuzzy set. This final block maps the fuzzy set into a crisp set, which is essential in a fuzzy control system.
The significant benefit of adopting this fuzzy logic technique is that it facilitates modeling of varied levels of uncertainty in predicting soil quality, which cannot be achieved with type-1 fuzzy logic. The algorithmic steps are given in Algorithm 1.
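The five modules listed above can be illustrated for a single input with a minimal sketch. It assumes an interval type-2 formulation with triangular membership functions, singleton rule consequents, and a Nie-Tan-style type reduction that averages the lower and upper membership bounds before a weighted-average defuzzification. The pH set parameters, the `spread` value, and the consequent scores are hypothetical and are not taken from the study's rule base.

```python
def tri(x, a, b, c):
    """Type-1 triangular membership with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def it2_membership(x, a, b, c, spread):
    """Fuzzifier: return the (lower, upper) membership interval for x.

    The upper bound widens the triangle by `spread` and the lower bound
    narrows it; the region between them models membership uncertainty.
    """
    upper = tri(x, a - spread, b, c + spread)
    lower = tri(x, a + spread, b, c - spread)
    return lower, upper


# Rule base (hypothetical): pH set (a, b, c) -> suitability score, 1 = best, 4 = worst.
RULES = [
    ((3.5, 4.5, 5.5), 4.0),  # IF pH is strongly acidic THEN not suitable
    ((4.5, 5.5, 6.5), 1.0),  # IF pH is moderately acidic THEN highly suitable
    ((5.5, 7.0, 8.5), 3.0),  # IF pH is neutral to alkaline THEN marginally suitable
]


def infer(ph, spread=0.2):
    """Inference, type reduction, and defuzzification for one pH value."""
    weights, scores = [], []
    for (a, b, c), score in RULES:
        lower, upper = it2_membership(ph, a, b, c, spread)
        weights.append((lower + upper) / 2.0)  # type reduction: average the bounds
        scores.append(score)
    total = sum(weights)
    if total == 0.0:
        return None  # no rule fires for this input
    # Defuzzification: firing-strength-weighted average of the consequents.
    return sum(w * s for w, s in zip(weights, scores)) / total


print(infer(5.5), infer(4.5), infer(2.0))
```

A full system would combine several inputs (pH, N, P, K) in each rule antecedent, typically with a minimum or product operator, and might use an iterative type reducer such as Karnik-Mendel instead of the simple averaging shown here.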
The algorithm described above is used for labeling soil information using fuzzy logic. The algorithm takes the input of the soil data, which, after processing, yields an outcome for the soil. The algorithm constructs a matrix of four columns to retain the information associated with the pH water quality, N quality, P quality, and K quality (Line 1). The next step is to assign quality values (Line 2). Furthermore, the algorithm constructs a conditional logic to assess whether the quality of water pH and N is equivalent to 1, while it also assesses whether the quality of soil has the lowest quality score for P and K (Line 3). The algorithm performs similar rounds of checks for a pH water quality equivalent to 2 (Line 4), and the quality is equivalent to the pH water quality (Line 5). All the above processing steps yield inferences regarding the quality of the soil. The processing outcome showed that the majority of soil samples from the observation region were marginally suitable for potato cultivation.
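The labeling flow described above (a four-column quality matrix for pH, N, P, and K, followed by conditional rules that yield one of four classes) can be sketched as follows. This is not a reproduction of Algorithm 1: the cutoffs, the pH optimum, and the choice to let the worst individual score decide the overall label are all assumptions made only to keep the example runnable.

```python
LABELS = {1: "highly suitable", 2: "moderately suitable",
          3: "marginally suitable", 4: "not suitable"}

# Hypothetical cutoffs (descending); real thresholds would come from soil experts.
CUTOFFS = {"N": (0.20, 0.12, 0.08), "P": (30.0, 20.0, 10.0), "K": (0.5, 0.3, 0.1)}
PH_OPTIMUM = 5.5                  # assumed target pH for the illustration
PH_STEPS = (-0.5, -1.0, -1.5)     # allowed deviation for scores 1, 2, and 3


def grade(value, cutoffs):
    """Return a quality score from 1 (best) to 4 (worst).

    A value at or above the first cutoff scores 1, at or above the
    second scores 2, at or above the third scores 3, otherwise 4.
    """
    for score, cutoff in enumerate(cutoffs, start=1):
        if value >= cutoff:
            return score
    return 4


def quality_row(sample):
    """Build one row of the quality matrix: [pH, N, P, K] scores."""
    # pH is graded by its distance from the optimum, negated so that
    # "larger is better" matches the convention used by grade().
    ph_score = grade(-abs(sample["pH"] - PH_OPTIMUM), PH_STEPS)
    return [ph_score] + [grade(sample[k], CUTOFFS[k]) for k in ("N", "P", "K")]


def label(sample):
    """Overall label: the worst individual score decides (assumed rule)."""
    return LABELS[max(quality_row(sample))]


print(label({"pH": 5.6, "N": 0.08, "P": 35.0, "K": 0.6}))
```

Because the cutoffs live in plain dictionaries, an agronomist could adjust them per region without touching the rule logic, which mirrors the customizable ruleset the text describes.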
Table 3. Numerical outcome of accuracy-based comparative analysis.
ML approach | Precision | Recall | F1-score |
---|---|---|---|
Gaussian NB | 0.82 | 0.83 | 0.84 |
ANN | 0.87 | 0.85 | 0.86 |
Logistic regression | 0.79 | 0.75 | 0.77 |
KNN | 0.75 | 0.72 | 0.73 |
The primary reason for the adoption of fuzzy logic for setting up this ruleset is mainly to develop a decision control system in which soil quality can be ascertained based on the rules generated by the user. The advantage of this rule, set up in fuzzy logic, is that the suitability score can be customized by farmers or agricultural experts based on the present agricultural environment of any farming region in Rwanda. Therefore, the higher the soil quality, the higher the anticipated potato yield in Rwanda.
The applicability of fuzzy logic was restricted to a few soil samples, and quality predictions were found to be reasonable by soil experts. The soil experts involved in the proposed study were local skilled agriculturists and agronomists with extensive experience in analyzing and interpreting the quality of soil. The experts were consulted regarding the anticipated outcomes of the model. According to them, the lower the error rate of the predictive model, the higher is the possibility of reliable outcomes for a higher degree of soil quality. They also provided a recommendation to assess the individual scope of NPK fertilizer, followed by soil water pH. As experts are not available to assess all the results, the proposed analysis adopted a machine learning approach to increase the analytical coverage of the study. For this purpose, an ANN was adopted to predict soil quality.
An ANN is capable of learning complex problems, thereby making precise decisions regarding the quality of soil based on different quality parameters [43]. It can also process multiple inputs in parallel, thus generating multiple outcomes. This means that the proposed model is capable of predicting the soil quality for any geographic region, not only Rwanda. Gaussian naïve Bayes is a simple and powerful algorithm for predictive modeling that assumes each input variable is independent [44]. We used Gaussian naïve Bayes to classify the soil data because the data are continuous, and the Gaussian model requires only the mean and standard deviation of each variable, which are easy to calculate from the training data. The naïve Bayes approach often yields accurate and stable models with very small sample sizes, such as our soil sample dataset from Rwanda. Logistic regression can be used to analyze the relationship between multiple independent variables and a categorical dependent variable and to estimate the probability of an event [45]. Because the dependent variable in this study was not dichotomous but comprised four categories, we used multinomial logistic regression for soil quality prediction. The K-nearest neighbors (KNN) algorithm is a well-known nonparametric classification method that determines the class of a new sample based on the classes of its nearest neighbors [46]. For each modeling approach tested in this study, the assessment was based on four measures: accuracy, precision, recall, and F1 score, which were calculated using the following formulas:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
In the above expressions, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
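The four measures named above can be computed directly from true and predicted labels. The sketch below is a minimal standard-library illustration for the multi-class case; it assumes macro averaging (an unweighted mean of the per-class values), which the source does not specify, and the function name `classification_report` and the toy labels are hypothetical.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(o, c)] for o in labels if o != c)  # predicted c, truly other
        fn = sum(pairs[(c, o)] for o in labels if o != c)  # truly c, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {"accuracy": accuracy,
            "precision": sum(precisions) / n,
            "recall": sum(recalls) / n,
            "f1": sum(f1s) / n}


# Toy labels for two of the four suitability classes.
report = classification_report([1, 1, 2, 2], [1, 2, 2, 2])
print(report)
```

Under macro averaging, the reported F1 is the mean of the per-class F1 values, so it need not equal the harmonic mean of the macro precision and macro recall.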
This section describes the results of the fuzzy logic method for labeling data, followed by a comparison of the outcomes of different machine learning methods as predictive models. The labeling method was verified as reasonable by soil experts with knowledge of Rwandan soils. Machine learning methods were compared using performance metrics.
The outcome of the fuzzy logic was analyzed using a kernel density estimation plot, which is a probability density function that is a variation of the histogram that uses kernel smoothing while plotting the values. Density associated with OC, proportion of NPK individually, and proportion of pH. The X-axes represent the data points, and the Y-axes represent the probability density function. The region of the plot with a higher peak is the region with the maximum number of data points between the values.
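A kernel density estimate of this kind can be computed with a few lines of standard-library Python. The sketch below assumes a Gaussian kernel and, when no bandwidth is supplied, the normal reference rule h = 1.06 σ n^(-1/5); the source does not state which kernel or bandwidth was used, and the `n_values` list is invented purely for illustration.

```python
import math
import statistics


def gaussian_kde(data, bandwidth=None):
    """Return a function estimating the probability density of `data`."""
    n = len(data)
    # Normal reference rule for the bandwidth when none is supplied.
    h = bandwidth or 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    norm = n * h * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump centred on each observation.
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

    return density


# Invented values with most mass near 0.08 and a smaller group near 0.20.
n_values = [0.07, 0.08, 0.08, 0.09, 0.08, 0.07, 0.09, 0.19, 0.20, 0.21]
density = gaussian_kde(n_values)
for x in (0.08, 0.14, 0.20):
    print(x, round(density(x), 3))
```

Because each Gaussian bump extends in both directions, the estimated curve can show nonzero density slightly outside the range of the observed values; a smaller bandwidth gives sharper peaks at the cost of a noisier curve.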
As shown in Figure 4, the OC percentage was satisfactory throughout the region; this study also evaluated the impact of the individual N, P, and K densities on the OC proportion (g g⁻¹). Because OC was satisfactory everywhere, its variation did not influence the suitability score, and these data could be ignored. In Figure 5, the density plot of N shows two concentrations of values in the soil: most values were near 0.08, with a second peak at 0.2; hence, N became a deciding factor for overall soil quality. In Figure 6, the P percentage is higher in a particular region but lower in others; where the N percentage is low, the P percentage is the deciding factor, along with the K percentage. Negative values of P represent outliers.
Figures 7 and 8 show the density analysis for the proportion of K and water pH. The results show that the density of K is slightly lower, whereas the density of the water pH is quite good for the given region under observation in the proposed use case of Rwanda. The final outcome of soil quality is shown in Figure 9.
The outcomes shown in Figure 9 were produced using the amended version of fuzzy logic. The original dataset comprised 360 samples; a random function was used to increase the sample size programmatically to assess the influence of different samples on the predicted quality. Figure 9 was obtained by considering a first sample of 360 data points (Sample-1), a second sample of 720 data points (Sample-2), and a third sample of 1,440 data points (Sample-3). The labels were produced by the ruleset constructed using fuzzy logic, and the result was validated by a soil expert. The outcome showed that the majority of the soil samples were marginally suitable for cultivating potatoes. It should be noted that this outcome is highly specific to the use case of the four sites in Rwanda from which the samples were collected (represented in the graph as SC-1.0, SC-2.0, SC-3.0, and SC-4.0). According to the experts who are also co-authors of this work, fuzzy logic can classify soil with an accuracy of 70%–80%; the prevalence of the marginally suitable class is consistent with the fact that most of the soil samples came from Acrisols of Gicumbi and Lixisols of Rwamagana, where Irish potatoes are marginally cultivated. Therefore, fuzzy logic was considered to offer reliable predictive performance for labeling the data.
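The enlargement of the dataset can be sketched as bootstrap resampling, that is, drawing rows at random with replacement until the target size is reached. The Python example below is an illustration under stated assumptions: the function name `bootstrap_sample`, the fixed seeds, and the use of the six Table 1 rows in place of the full 360-sample dataset are all hypothetical, and the source does not describe the exact random function that was used.

```python
import random


def bootstrap_sample(rows, target_size, seed=0):
    """Draw target_size rows from rows at random, with replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.choices(rows, k=target_size)


# Rows from Table 1: (pH, OC %, N %, P ppm, K ppm).
rows = [
    (5.0, 2.88, 0.09, 18.8, 80.3),
    (4.89, 2.93, 0.07, 29.3, 73.5),
    (4.94, 2.91, 0.07, 8.37, 45.1),
    (5.3, 2.78, 0.10, 18.8, 88.0),
    (5.0, 2.71, 0.07, 13.6, 51.0),
    (4.94, 2.65, 0.077, 8.37, 87.0),
]

# Mirror the doubling used in the study (360 -> 720 -> 1,440) on toy data.
sample_2 = bootstrap_sample(rows, 2 * len(rows), seed=1)
sample_3 = bootstrap_sample(rows, 4 * len(rows), seed=2)
```

Resampling with replacement only repeats existing measurements, so the enlarged samples contain no new soil information; they mainly show how stable the labeling is under a different mix of the same observations.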
The outcomes of the machine learning approach were assessed using performance metrics.
The proposed scheme was assessed by comparing the results of a set of frequently used machine learning methods, namely ANN, Gaussian naïve Bayes, logistic regression, and KNN. The results shown in Figure 10 and Table 3 indicate that ANN offers better predictive accuracy than the other approaches, which suggests that predictive modeling of soil quality using ANNs offers higher reliability. Machine learning contributes to the development of a predictive model that farmers and agriculturists can use to identify environments suitable for potato farming. The model is scalable in the sense that it can predict soil quality for any dataset that provides the same input properties. Based on the predicted outcomes, farmers can fine-tune the concentration of various fertilizers to ensure better crop cultivation in Rwanda, which saves time and effort in decision-making for a better yield.
From the perspective of novelty, the core idea of the proposed scheme is to develop a cost-effective and simplified computational predictive model that can assess soil quality suitable for potato cultivation in Rwanda. The proposed model is based on data acquisition and the design of a fuzzy processor; applying a machine learning approach for this purpose is a novel notion under the environmental conditions of Rwanda. The model is simple in design and implementation and offers reliable outcomes in easy steps. It deliberately refrains from adopting advanced analytical or predictive schemes based on machine learning or deep learning in which accuracy is achieved at the cost of a higher computational burden. However, future work can extend the proposed machine learning predictive scheme to a more advanced version that keeps the computational cost low without affecting accuracy. The next section presents a discussion of the results of the proposed study.
The outcomes presented in the previous section show that the proposed scheme introduces a mechanism that can not only identify the optimal fertilizers required for potato cultivation but also offers a robust classification mechanism to determine the soil quality required for this purpose. Although the proposed scheme was implemented for the use case of potatoes in Rwanda, the same model can also be applied to different agricultural crops in different countries. This is possible by altering the input variables of the learning model used to predict soil quality, which, at present, are considered only for four different sites in Rwanda.
To understand the contribution of the proposed method to potato production, it is necessary to understand the actual agricultural and cultivation factors required for optimal potato production. The amount of fertilizer to be provided in the soil for potato cultivation depends on the soil test data. The proposed scheme considered standard input data from four agricultural sites in Rwanda to determine the optimal distribution of nutrients. It is also known that NPK fertilizers are administered in the same proportion at the same time during planting; however, the demand for these fertilizers continues to change over a period of cultivation. Fuzzy logic type-2 was applied in the proposed scheme considering the input of soil data to predict soil quality with respect to water quality and quality with respect to individual N, P, and K fertilizers. Further accuracy of this predictive analysis was achieved by deploying machine learning schemes using an ANN. Hence, the model is capable of predicting the change in the quantity of fertilizer required to reach the optimal soil quality to ensure better production.
Although a machine learning approach was introduced to overcome the constraints of using fuzzy logic for various types of soil, fuzzy logic should not be deemed a low-performing module. It has advantages that cannot be achieved using a machine learning approach alone. Fuzzy logic offers a platform for introducing a user-friendly logical ruleset that is both easy to deploy and flexible to manage, and the ruleset can be redefined based on the problem space under investigation. Machine learning, in turn, contributes a higher predictive accuracy by covering more of the soil area. The fuzzy logic analysis showed a higher proportion of OC density (Figure 4) and a lower proportion of N (Figure 5) and P (Figure 6), while the proportion of K (Figure 7) was found to be better than those of N and P, which are highly essential for potato cultivation. Furthermore, the labels produced by fuzzy logic were validated by a soil expert (Figure 9).
The analysis of the machine learning schemes demonstrates that ANNs offer better predictive performance than the other approaches. The main reasons are as follows: less effective feature management in the Gaussian naïve Bayes algorithm, shortcomings of logistic regression in handling linearity across different categories of variables, and the limited applicability and scalability of the KNN algorithm to higher-dimensional data. In contrast, an ANN is straightforward to implement with parallel processing, which not only speeds up processing but also reduces the error at each successive epoch. Hence, the ANN demonstrated better performance in predicting soil quality in this study.
This study presents a simplified predictive model for forecasting soil quality in selected areas of Rwanda, a topic that has been less explored in existing studies. The contributions of the proposed scheme are as follows: (1) the predictive operation is performed using both fuzzy logic and machine learning approaches, each of which has unique benefits, in contrast to the sophisticated combinations of multiple machine learning schemes found in existing studies; (2) the proposed predictive model is highly flexible in its operation and does not depend solely on the environment, which means that it can be applied to predict soil quality for other crops or other regions, irrespective of the size or dimension of the data; (3) the outcome of the proposed scheme shows that the proposed fuzzy logic scheme offers better predictive performance; (4) the quantified outcome of the machine learning scheme shows that the proposed artificial neural network offers approximately 25% higher accuracy compared with Gaussian naïve Bayes, approximately 32% higher accuracy compared with logistic regression, and approximately 37% higher accuracy compared with the KNN algorithm; and (5) the overall quantified outcome shows that the proposed model achieves an accuracy of 89%, whereas the fuzzy logic algorithm proposed in this system classifies the soil with an accuracy of 70% to 80% with shallow learning algorithms.
The numerical outcome of the proposed scheme offers significant guidelines for potato cultivation in Rwanda. Based on this outcome, an agronomist will need to identify the geographical features of Rwanda and then develop a smart ruleset using fuzzy logic. This could facilitate better identification of the proportions of NPK fertilizers needed to improve soil quality. Hence, without much demand for re-engineering, this model can be utilized for any geographical location to exhibit better predictive performance than the frequently exercised schemes. Future work will focus on using soil sensors for soil analysis to overcome the shortage of soil datasets and soil experts and to address geographical differences. Future work will also be conducted in the direction of soil map generation to further assess soil quality in the presence of various challenging agricultural conditions.
The datasets are available from the corresponding author upon request.
Figure 1. Structure of the proposed scheme of implementation.
Figure 2. Study area from which the soil data were derived, i.e., Rubavu, Burera, Gicumbi, and Rwamagana.
Figure 3. Architecture of fuzzy logic type 2.
Figure 4. Histogram of OC percent.
Figure 5. N percentage histogram.
Figure 6. P percentage histogram.
Figure 7. K percentage histogram.
Figure 8. Water pH histogram.
Figure 9. Results of the fuzzy logic labeling for the three different samples. The classes are ordered by prevalence.
Figure 10. Comparison of all algorithms.
Algorithm 1. Algorithm for labeling using fuzzy logic type-2.
1. Add four new columns to S:
   - c1 for pH quality
   - c2 for N quality
   - c3 for P quality
   - c4 for K quality
2. Assign quality values according to the study conducted.
3. If pH quality is 1:
   - if N quality is 1, then quality is 1
   - else quality is the smallest among P and K quality
4. Else if pH quality is 2:
   - if N quality is 1, then quality is 1
   - else quality is 2
5. Else quality is equal to pH quality.
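The decision rules of Algorithm 1 can be sketched in Python as follows. This is a literal transcription under two assumptions: the quality values are integer codes whose mapping to the four suitability classes is not stated in the algorithm, and the last step is read as the fallback branch that applies when the pH quality is neither 1 nor 2. The function name `overall_quality` is hypothetical.

```python
def overall_quality(ph_q, n_q, p_q, k_q):
    """Combine per-property quality codes into one overall quality code."""
    if ph_q == 1:
        # Step 3: N decides first; otherwise take the smallest of P and K.
        return 1 if n_q == 1 else min(p_q, k_q)
    if ph_q == 2:
        # Step 4: N decides first; otherwise the quality is 2.
        return 1 if n_q == 1 else 2
    # Step 5 (assumed fallback): the overall quality follows the pH quality.
    return ph_q


# Hypothetical per-property codes for three soil samples.
labels = [
    overall_quality(1, 2, 3, 4),
    overall_quality(2, 3, 1, 1),
    overall_quality(3, 1, 1, 1),
]
```

Written this way, the rules make the precedence explicit: pH is checked first, N second, and P and K are consulted only when the pH quality is 1 and the N quality is not 1.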
Table 1. Dataset sample presentation of values obtained from soil analysis.
pH | OC (%) | N (%) | P (ppm) | K (ppm) |
---|---|---|---|---|
5 | 2.88 | 0.09 | 18.8 | 80.3 |
4.89 | 2.93 | 0.07 | 29.3 | 73.5 |
4.94 | 2.91 | 0.07 | 8.37 | 45.1 |
5.3 | 2.78 | 0.10 | 18.8 | 88 |
5.0 | 2.71 | 0.07 | 13.6 | 51 |
4.94 | 2.65 | 0.077 | 8.37 | 87 |
Table 2. Classification of selected soil properties values for potato.
Property | Suitable | Moderately suitable | Marginally suitable | Not suitable |
---|---|---|---|---|
K (ppm) | >55 | 35–55 | 15–35 | <15 |
P (ppm) | >10 | 6.5–10 | 2.5–6.5 | <2.5 |
N (%) | >0.30 | 0.225–0.30 | 0.125–0.225 | <0.125 |
OC (%) | >0.7 | 0.5–0.7 | 0.3–0.5 | <0.3 |
pH | 5.5–7 | 5–5.5 or 7–7.5 | 4–5 or 7.5–8 | <4 or >8 |
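As an illustrative sketch, the thresholds of Table 2 can be encoded as a lookup that maps a measured value to its suitability class. Several details here are assumptions rather than statements from the source: the handling of values that fall exactly on a boundary, the assignment of the second pH row (7–7.5, 7.5–8, >8) to the moderately suitable, marginally suitable, and not suitable classes, and the names `classify_nutrient` and `classify_ph`.

```python
CLASSES = ("Suitable", "Moderately suitable",
           "Marginally suitable", "Not suitable")

# Lower bounds from Table 2 for properties where larger values are better.
LOWER_BOUNDS = {
    "K": (55, 35, 15),           # ppm
    "P": (10, 6.5, 2.5),         # ppm
    "N": (0.30, 0.225, 0.125),   # %
    "OC": (0.7, 0.5, 0.3),       # %
}


def classify_nutrient(name, value):
    """Classify K, P, N, or OC; boundary handling is an assumption."""
    suitable, moderate, marginal = LOWER_BOUNDS[name]
    if value > suitable:
        return CLASSES[0]
    if value >= moderate:
        return CLASSES[1]
    if value >= marginal:
        return CLASSES[2]
    return CLASSES[3]


def classify_ph(ph):
    """Classify pH, which is penalised on both the acidic and alkaline side."""
    if 5.5 <= ph <= 7:
        return CLASSES[0]
    if 5 <= ph < 5.5 or 7 < ph <= 7.5:
        return CLASSES[1]
    if 4 <= ph < 5 or 7.5 < ph <= 8:
        return CLASSES[2]
    return CLASSES[3]


# First row of Table 1: pH 5, OC 2.88 %, N 0.09 %, P 18.8 ppm, K 80.3 ppm.
first_row = {
    "pH": classify_ph(5),
    "OC": classify_nutrient("OC", 2.88),
    "N": classify_nutrient("N", 0.09),
    "P": classify_nutrient("P", 18.8),
    "K": classify_nutrient("K", 80.3),
}
```

Under these assumptions, the first row of Table 1 is rated suitable for OC, P, and K, moderately suitable for pH, and not suitable for N.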
Table 3. Numerical outcome of accuracy-based comparative analysis.
ML approaches | Precision | Recall | F1-score |
---|---|---|---|
Gaussian NB | 0.82 | 0.83 | 0.84 |
ANN | 0.87 | 0.85 | 0.86 |
Logistic regression | 0.79 | 0.75 | 0.77 |
KNN | 0.75 | 0.72 | 0.73 |