
## Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(4): 373-381

Published online December 25, 2022

https://doi.org/10.5391/IJFIS.2022.22.4.373

© The Korean Institute of Intelligent Systems

## IdVar4CL: Causal Loop Variable Identification Method for Systems Thinking Based on Text Mining Approach

1Department of Software Engineering, Telkom University, Bandung, Indonesia
2Department of Informatics Business, Telkom University, Bandung, Indonesia

Received: July 10, 2021; Revised: May 1, 2022; Accepted: September 23, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Systems thinking is a discipline for understanding wholeness and frameworks based on the changing patterns of the interconnectedness of the whole system. The storytelling of a system is a description of the mental model of an individual in describing the state of the environment. There are differences in the interpretation of a system description because each individual has a different level of systems thinking in terms of experience, learning process, insight, intuition, and assumptions in understanding system interactions. This study aims to extract data from the storytelling description of a systems thinking case by performing text mining and similarity measurement to identify variables for forming causal loop diagrams. The conclusions of this study are as follows. First, processing the five documents successfully identified the two documents with the highest similarity value, namely d1 and d3. Second, among all cosine similarity results, the value closest to 1 is 0.0913166, located at the d1 and d3 positions. Third, the study produces a variable approach in the form of a group of words used in modeling systems thinking, based on a connectedness value greater than 0.50.

Keywords: Systems thinking, Storytelling, Text mining, Similarity, Causal loop diagrams

### 1. Introduction

Systems thinking is a way for an individual to understand the interrelationship of interactions in the system as a whole [1–3]. According to Sterman [4], systems thinking is a discipline in understanding wholeness and frameworks based on the changing patterns of the interconnectedness of a whole system.

The storytelling of a system is a description of an individual’s mental model in describing the state of the environment. An individual has a difference in interpreting the explanation description of storytelling, and this difference occurs because each individual is influenced by their mental model and systems thinking.

There are differences in the interpretation of the system description. This difference occurs because each individual has a different level of systems thinking in terms of experience, learning process, insight, intuition, and assumptions in understanding system interactions [5, 6]. As a result, if there is the same storyline, each individual will make a difference when determining the variable. This variable is the beginning of understanding systems thinking and is used as a reference in the next stage for modeling systems thinking or system dynamics [7, 8].

Causal loop diagrams present a language for articulating an understanding of the dynamic nature of interrelated systems [9]. In this diagram, sentences are interconnected with key variables, thereby showing a causal relationship. Through several loops, a logical interrelated story can be built regarding a particular problem. According to Kim [10], who referred to guidelines regarding the causal loop diagram design rules, when building this diagram, there is a design focus, such as selecting variable names and loop construction. There are differences in determining the causal loop variable. This difference depends on theme selection, time horizon, behavior over time charts, boundary issues, level of aggregation, and significant delays [11].

In text mining, the stages of text preprocessing can be adjusted depending on the type of text data and the results required. According to Octavially et al. [12], there is an extraction process for text preprocessing consisting of tokenization, stopword removal, and stemming. In addition, there is a semantic similarity measurement process through WordNet Similarity for Java applications. The results of the extraction process, combined with greedy algorithms, constitute an optimal-value solution approach. There is also a method for calculating similarities using the Wu-Palmer and Levenshtein methods [12, 18].

Referring to the description of the concepts above, identifying variables in the system is essential in determining and forming the model using the causal loop diagram. Variable identification is necessary because a single storytelling description can lead to different modeling focuses, owing to differences in individuals' understanding of the description. Through text mining, the description of storytelling can be used to identify variables in the form of a collection of related words that can be represented in the formation of causal loops.

This study aims to extract data in the description of the storytelling of a systems thinking case by performing text preprocessing and similarity to identify and find a variable to be used to form causal loop diagrams.

The contributions and novelties of this study are as follows:

• Through the Python NLTK and based on the text description of storytelling, this activity performs case folding, tokenization, stopword removal, and stemming.

• Generate a text weighting value from the results of the document similarity activity.

• Generate a variable approach in the form of a set of words, which will be used in modeling systems thinking.

### 2. Related Works

This section explains the concepts related to understanding the relationship between systems thinking and causal loops, the relationship between text mining and text preprocessing, and similarity.

### 2.1 Systems Thinking and Causal Loops

Systems thinking is a discipline for understanding wholeness and frameworks based on the changing patterns of the interconnectedness of the whole system. Storytelling, in the form of text descriptions of a case of phenomena, produces different variables in interpreting the situation of a system [4].

A causal loop diagram (CLD) is used to represent the association of variables in the system dynamics. This diagram presents a language for articulating an understanding of the interrelationships between words that have a causal relationship. A story can be created through a series of multiple loops that are logically related to a problem [10]. Additionally, there is an available concept that follows the coding process to obtain the causal relationships of data explicitly using qualitative analysis software to make the relationship between the final causal map and data sources more transparent. The stages are as follows [15]:

• Identifying concepts and discovering themes in the data.

• Categorizing and aggregating themes into variables.

• Identifying causal relationships.

• Transforming the coding dictionary into causal diagrams.

The relationship between systems thinking and the mental models that influence assumptions in understanding a system is related to the fifth discipline for understanding a system [16]. There are several ways to describe a system, among them textual descriptions. Text mining is one of the techniques used in data mining. In simple terms, mining refers to the process of using keywords from a set of words to identify meaningful patterns or to make predictions [17]. To support these activities, there is a method for determining the semantic similarity level between pairs of short texts [19]. This method is based on the similarity between word contexts built using word embeddings and on semantic linkages between concepts drawn from external sources of knowledge [19–21].

### 2.2 Text Mining

In text mining, implementation is required to transform the irregular structure of text data into structured data. This implementation uses several text preprocessing activities, including the following [12, 18]:

• Case folding to convert the text to lowercase.

• Tokenization to break sentences into words.

• Stopword removal to eliminate words that have no meaning in the text mining process.

• Stemming/lemmatization to reduce words to their root forms.
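The four steps above can be sketched in pure Python. This is an illustrative approximation only, not the NLTK pipeline the paper uses; the stopword list and suffix-stripping rules are simplified assumptions.

```python
import re

# Illustrative stopword list and suffix rules (assumptions, far smaller
# than NLTK's actual resources).
STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "has", "from", "that", "on"}
SUFFIXES = ("ing", "ed", "s")

def preprocess(sentence):
    # 1. Case folding: convert to lowercase.
    text = sentence.lower()
    # 2. Tokenization: split the sentence into word tokens.
    tokens = re.findall(r"[a-z0-9']+", text)
    # 3. Stopword removal: drop words with no standalone meaning.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 4. Naive stemming: strip common suffixes (a crude stand-in for an
    #    NLTK stemmer/lemmatizer).
    stemmed = []
    for t in tokens:
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) > len(suf) + 2:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("The death toll from severe flooding has risen to 66."))
```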

### 2.3 Similarity

The term frequency-inverse document frequency (TF-IDF) method determines the weight of each word in each document. It is widely used in natural language processing (NLP), text information retrieval, and text mining. A word is considered more important when it appears frequently in a document but rarely across the corpus. In determining the value of a word, this method uses two elements: TF, the term frequency of term i in document j, and IDF, the inverse document frequency of term i [22–24]. The TF-IDF weight is the product of the two:

$$\mathrm{tf}_i = \frac{\mathrm{freq}_i(d_j)}{\sum_{i=1}^{k}\mathrm{freq}_i(d_j)}, \qquad \mathrm{idf}_i = \log\frac{|D|}{|\{d : t_i \in d\}|}, \qquad (\mathrm{tf\text{-}idf})_{ij} = \mathrm{tf}_i(d_j) \times \mathrm{idf}_i.$$
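A minimal sketch of these formulas, using the natural logarithm and toy three-document corpus (both assumptions for illustration):

```python
import math
from collections import Counter

docs = [
    ["death", "toll", "flood", "jakarta"],
    ["flood", "rain", "jakarta"],
    ["flood", "indonesia", "rain"],
]

def tf(term, doc):
    # Term frequency: count of the term divided by the document length.
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    # Inverse document frequency: log of total documents over the number
    # of documents containing the term (log base is a convention).
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "flood" appears in all three documents, so its idf (hence tf-idf) is 0.
print(tf_idf("flood", docs[0], docs))
print(round(tf_idf("death", docs[0], docs), 4))
```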

To find the similarity of the TF-IDF weighting results, we can use the formula below.

$$\mathrm{Sim} = \cos(a, b) = \frac{a \cdot b}{\|a\|\,\|b\|} = \frac{a_1 b_1 + \cdots + a_n b_n}{\sqrt{a_1^2 + \cdots + a_n^2}\,\sqrt{b_1^2 + \cdots + b_n^2}}.$$
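A direct implementation of the cosine formula, shown on hypothetical toy vectors:

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors sharing one of two nonzero components: similarity ~0.5.
print(cosine_similarity([1, 0, 1], [1, 1, 0]))
```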

The principle of semantic similarity calculation refers to the edge-counting method over a taxonomy with a set of nodes and a root node (R). For two nodes C1 and C2, the similarity is calculated from the distances (N1 and N2) that separate C1 and C2 from the root node, and the distance (N) that separates the closest common subsumer (CS) of C1 and C2 from the root node R [13]. The basis for measuring semantic similarity is defined by the formulation of Wu and Palmer [18]:

$$\mathrm{Sim}_{WP} = \frac{2N}{N_1 + N_2}.$$
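The Wu-Palmer formula in code, applied to hypothetical depth values for two related weather terms (the taxonomy, terms, and depths are assumptions for illustration):

```python
def wu_palmer(n1, n2, n):
    # Sim_WP = 2*N / (N1 + N2), where N1 and N2 are the depths of the two
    # concepts from the root and N is the depth of their closest common
    # subsumer (depth-counting convention assumed).
    return 2 * n / (n1 + n2)

# Hypothetical taxonomy: "flood" and "rain" both at depth 4, with a
# common ancestor "weather event" at depth 3.
print(wu_palmer(4, 4, 3))
```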

Arora et al. [14] describe an NLP-based method for analyzing the change impact of natural language (NL) requirements. Detection in this method considers the phrasal structure of a requirement statement. The input is a requirements document containing NL requirement statements. The steps of the method are as follows:

• Identify the requirement statement phrase.

• Calculate the value of pairwise similarity for all tokens (words) appearing in the identified phrase.

### 3. Fundamental Ideas for the IdVar4CL Method

In Figure 1, the storytelling from systems thinking, in the form of a text description of a phenomenon, produces different variables. This difference can occur because every individual understands the cause and effect occurring in an environment/system differently, depending on experience, learning process, insight, intuition, and assumptions.

The proposed method, called IdVar4CL, implements a process that involves using the text-mining concept approach as a solution to deal with the different identification of storytelling. At the beginning of the formation of the causal loop diagram model, determining variables is important to understand the dynamic system.

The data sources for this research consisted of five documents adopted from the paragraphs of an article (see https://time.com) written by Hillary Leung on January 8, 2020. The paragraph below was processed through text mining, resulting in the variables used in the causal loop diagram model (see Figure 2).

The paragraph consists of five sentences. Therefore, in preparing the dataset, it was divided into five documents (see Table 1).

In practice, the contribution of this study is the combination of systems thinking (system dynamics) and data mining (text mining). The output is a set of words (variables) used in the causal loop diagram model in system dynamics.

In the extraction process, there is a text preprocessing activity for each document, consisting of case folding, tokenization, stopword removal, and stemming. To measure semantic similarity, WordNet Similarity for Java, which includes methods for calculating similarities through the Wu-Palmer and Levenshtein methods, can be used as an alternative (see Figure 3).

This section describes the implementation of all the steps in the method. Some steps explain the processing of the dataset to be extracted using text preprocessing. Then, its similarity is measured using cosine similarity and its semantics through WS4J. After successfully obtaining the variables for the causal loop diagram, the final step was to test its validity.

### 6.1 Extraction Process

Referring to the five documents as a dataset, text preprocessing in this activity is carried out using the Natural Language Toolkit (NLTK), a natural language processing library for the Python 3 programming language. Text preprocessing includes the following steps:

• Case folding to convert the text to lowercase

• Tokenization to break sentences into words

• Stemming/lemmatization to reduce words to their root forms

• Stopword removal to eliminate words that have no meaning in the text mining process

Figure 4 shows an example snippet of the article paragraph processed through case folding, in which all text is converted to lowercase. The paragraph consists of five sentences; the text files “beforecase.txt” and “aftercase.txt” are used for reading the input and writing the results of the case folding process.

The paragraph resulting from the case folding process contains five sentences, each of which becomes one of the five dataset documents: d1, d2, d3, d4, and d5. Figure 5 shows the process of creating the dataset documents.

After preparing the documents, a tokenization process breaks the sentences into words, followed by stemming/lemmatization so that all results are reduced to root words. For lemmatization, the NLTK “wordnet” corpus is used as an English semantic dictionary. The normalized text resulting from the lemmatization of the five documents is then tokenized and indexed. Figure 6 presents a screenshot of the normalization and indexing of all resulting words.
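The normalization-and-indexing step can be sketched as follows; the document contents here are abbreviated stand-ins for the actual dataset, and the index assignment order is an assumption.

```python
# Abbreviated stand-ins for the five dataset documents (illustrative only).
docs = {
    "d1": "death toll severe flooding jakarta heavy rain",
    "d2": "landslides flash floods jakarta west java",
    "d3": "worst floods indonesia torrential rains",
}

# Assign each distinct token a numeric index across the corpus, in order
# of first appearance.
index = {}
for doc in docs.values():
    for token in doc.split():
        if token not in index:
            index[token] = len(index)

print(len(index), index["death"], index["floods"])
```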

### 6.2 Similarity

The indexed lemmatization results were transformed into a TF matrix consisting of five documents and 68 words in the corpus. The Python attribute “tf_matrix.shape”, which produces the output “(5, 68)”, can be used to verify the matrix shape: five documents with 68 word indices. Figure 7 shows the term frequency (TF) matrix.

Using the Python library “sklearn”, the IDF values were calculated from the TF matrix. Based on Figure 8, the IDF values for words appearing in multiple documents can be determined. There were three distinct IDF values:

• An IDF value of 2.09861229 for words that appear in one document

• An IDF value of 1.69314718 for words that appear in two documents

• An IDF value of 1.40546511 for words that appear in three documents

These values can be verified through calculations using Python’s “math” library, as shown in Figure 8 regarding the inverse document frequency (IDF).
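The three IDF values above are consistent with the smoothed IDF formula that scikit-learn applies by default, idf(t) = ln((1 + n)/(1 + df)) + 1, with n = 5 documents. A quick check, assuming that formula, reproduces all three values:

```python
import math

def smoothed_idf(n_docs, df):
    # scikit-learn's default smoothed IDF: ln((1 + n) / (1 + df)) + 1,
    # where n is the corpus size and df the document frequency.
    return math.log((1 + n_docs) / (1 + df)) + 1

for df in (1, 2, 3):
    print(df, round(smoothed_idf(5, df), 8))
```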

Subsequently, the TF-IDF matrix (5 × 68) was calculated. After computing the TF and IDF values, the document weights were obtained using the transformation method: the TF matrix (5 × 68) is multiplied by the IDF matrix (68 × 68, with the IDF of each word on the main diagonal), and each TF-IDF row is divided by its Euclidean norm. The results are shown in Figure 9.
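The transformation can be sketched for a single document row. The TF counts are hypothetical, and the IDF values are the three values reported in the text; multiplying elementwise by the diagonal IDF matrix reduces to a per-term product.

```python
import math

def tfidf_row(tf_row, idf):
    # Multiply each term frequency by its IDF (equivalent to multiplying
    # by a diagonal IDF matrix), then L2-normalize the resulting row.
    weighted = [t * i for t, i in zip(tf_row, idf)]
    norm = math.sqrt(sum(w * w for w in weighted))
    return [w / norm for w in weighted] if norm else weighted

# Hypothetical TF counts for one document over a 3-word vocabulary.
row = tfidf_row([1, 2, 0], [2.09861229, 1.69314718, 1.40546511])
print([round(v, 4) for v in row])
```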

The cosine similarity formula can be used to calculate the similarity between objects, and through this process we searched for similarities between the documents. Based on the document weights from the TF-IDF process, an experiment was carried out with the following steps:

• The two documents with the highest similarity values were selected. This process applies the formulation of cosine similarity using Python. Figure 10 illustrates the process of calculating document similarities.

• Analyze the similarity calculation results. Among all cosine similarity values, the value closest to 1 is 0.0913166, located at the d1 and d3 positions. Table 2 summarizes the similarity values between documents.

• Perform the stopword removal process on documents d1 and d3. The purpose of this process is to eliminate words that have no meaning in the text mining process; the results can be analyzed using Python, as shown in Figure 11. The result for document d1 is “death toll severe flooding around Indonesian capital Jakarta risen 66 parts country continues heavy rain began New Year Eve”. For document d3, the result is “worst floods Indonesia seen since 2013, at least 29 people died aftermath torrential rains.”

• Calculate the semantic linkages between words. The Wu-Palmer similarity concept is used in the “WordNet Similarity for Java” (WS4J) application to calculate the semantics between the words contained in documents d1 and d3. Figure 12 illustrates this calculation of similarities and semantic relatedness.
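Selecting the most similar document pair can be sketched as follows, using the upper-triangle cosine values reported in Table 2:

```python
# Off-diagonal cosine similarities between the five documents (Table 2).
sims = {
    ("d1", "d2"): 0.03620263, ("d1", "d3"): 0.0913166,
    ("d1", "d4"): 0.07917081, ("d1", "d5"): 0.04962767,
    ("d2", "d3"): 0.03308062, ("d2", "d4"): 0.0,
    ("d2", "d5"): 0.03595653, ("d3", "d4"): 0.0,
    ("d3", "d5"): 0.04534792, ("d4", "d5"): 0.0,
}

# Pick the document pair whose similarity is highest (closest to 1).
best_pair = max(sims, key=sims.get)
print(best_pair, sims[best_pair])
```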

### 6.3 Variables for Causal Loops

All words used as variables were identified based on the similarity results: all word pairs with a semantic value higher than 0.50 were selected and paired by semantic value. The identification results are presented in Table 3, listing the variables to be used in the causal loop diagram. The 0.50 limit is the midpoint between 0 and 1, chosen so that the semantic values remain dynamic and balanced.
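The thresholding step can be sketched as follows; the pair list mixes values from Table 3 with one hypothetical below-threshold pair for contrast:

```python
# Candidate word pairs with Wu-Palmer connectedness values: a subset of
# Table 3 plus one hypothetical below-threshold pair.
pairs = [
    ("Death", "Floods", 0.7500),
    ("Country", "People", 0.9091),
    ("Capital", "Floods", 0.5217),
    ("Toll", "Rain", 0.35),  # hypothetical, below the threshold
]

# Keep only pairs whose connectedness exceeds the 0.50 midpoint threshold.
variables = [(a, b) for a, b, v in pairs if v > 0.50]
print(variables)
```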

Based on the results and discussion, data extraction from the storytelling description of the systems thinking case was performed through text preprocessing and similarity measurement, identifying variables to be used in forming causal loop diagrams. Three conclusions form the core of this research:

• Through the Python NLTK and based on the text description of storytelling, this activity performs case folding, tokenization, stopword removal, and stemming/lemmatization, applied to five documents (i.e., d1, d2, d3, d4, and d5). Processing the five documents successfully identified the two documents with the highest similarity value:

• d1 = “death toll, severe flooding around Indonesian capital Jakarta, risen 66 country parts continue to reel heavy rain began New Year Eve.”

• d3 = “worst floods Indonesia seen since 2013, at least 29 people died aftermath torrential rains.”

• Generate a text weighting value from the document similarity activity. Based on the cosine similarity calculation results, the value closest to 1 among all similarity values is 0.0913166, located at the d1 and d3 positions.

• The study produced a variable approach in the form of a group of words to be used in modeling systems thinking, based on a connectedness value greater than 0.50.

For further research, we plan to conduct the following to reach the implementation phase:

• The validity and reliability of all variables generated by the prototype method named “IdVar4CL” will be investigated. This research will continue by applying the agreement coefficient concept to measure the veracity of all variables produced, employing expert agreement as a reference in text mining and system dynamics activities.

• The variable identification method will be tested through implementation in a systems thinking case study using causal loop diagram modeling. All variables produced in this study are expected to follow the way systems analysts determine variables when modeling.

• This activity can be used to initiate software that can be registered for intellectual property rights.

This work was supported by the Directorate of Research and Community Service (PPM Tel-U), Department of Software Engineering (RPL), and Department of Informatics Business at Telkom University, Bandung 40257.

Fig. 1.

Fundamental ideas for the IdVar4CL method.

Fig. 2.

Paragraphs as dataset. Source: https://time.com/5761097/jakarta-indonesia-floods

Fig. 3.

Illustration of causal loop variable identification.

Fig. 4.

Case folding.

Fig. 5.

Dataset documents.

Fig. 6.

Normalization and index.

Fig. 7.

Term frequency (TF) matrix.

Fig. 8.

Inverse document frequency (IDF).

Fig. 9.

Preview of TF-IDF matrix results.

Fig. 10.

Cosine similarity.

Fig. 11.

Stopword removal process.

Fig. 12.

Measures semantic similarity/relatedness between words.

Table 1. Documents.

| Document labeling | Document identification |
|---|---|
| d1 | The death toll from severe flooding in and around the Indonesian capital of Jakarta has risen to 66 as parts of the country continue to reel from heavy rain that began on New Year’s Eve. |
| d2 | Landslides and flash floods have displaced more than 36,000 in Jakarta and the nearby provinces of West Java and Banten, according to the ASEAN Coordinating Center for Humanitarian Assistance (AHA). |
| d3 | These are the worst floods Indonesia has seen since 2013, when at least 29 people died in the aftermath of torrential rains. |
| d4 | The disaster, experts say, underscores the impacts of climate change in a country with a capital city that is sinking so quickly that officials are working to move it to another island. |
| d5 | The floods are also threatening to exacerbate the already severe wealth inequality that plagues the Southeast Asian nation. |

Table 2. Document similarity values.

|  | d1 | d2 | d3 | d4 | d5 |
|---|---|---|---|---|---|
| d1 | 1 | 0.03620263 | 0.0913166 | 0.07917081 | 0.04962767 |
| d2 | 0.03620263 | 1 | 0.03308062 | 0 | 0.03595653 |
| d3 | 0.0913166 | 0.03308062 | 1 | 0 | 0.04534792 |
| d4 | 0.07917081 | 0 | 0 | 1 | 0 |
| d5 | 0.04962767 | 0.03595653 | 0.04534792 | 0 | 1 |

Table 3. Identification of variables for causal loops.

| Word 1 | Word 2 | Value of connectedness |
|---|---|---|
| Death | Floods | 0.7500 |
| Death | People | 0.5455 |
| Death | Aftermath | 0.7059 |
| Death | Rains | 0.6316 |
| Toll | Floods | 0.7059 |
| Toll | Aftermath | 0.7059 |
| Capital | Floods | 0.5217 |
| Capital | People | 0.5333 |
| Risen | Seen | 0.5714 |
| Risen | Died | 0.6667 |
| Parts | Floods | 0.6000 |
| Parts | People | 0.6000 |
| Country | People | 0.9091 |
| Continue | Seen | 0.5714 |
| Rain | Floods | 0.6316 |
| Began | Seen | 0.6667 |
| Year | Floods | 0.5333 |
| Year | People | 0.6667 |
| Eve | Floods | 0.5333 |

1. Garrity, E (2018). Using Systems Thinking to Understand and Enlarge Mental Models: Helping the Transition to a Sustainable World. Systems. 6, 15.
2. Arnold, RD, and Wade, JP (2015). A definition of systems thinking: A systems approach. Procedia Comput Sci. 44, 669-678.
3. Stave, K, and Hopper, M (2007). What Constitutes Systems Thinking? A Proposed Taxonomy. Int Conf Syst Dyn Soc. 235, 245.
4. Sterman, JD (2002). System Dynamics: Systems Thinking and Modeling for a Complex World. MIT Sloan Sch Manag. 147, 248-249.
5. Meadows, DH (2008). Thinking in Systems: A Primer. White River Junction: VT Chelsea Green Publ
6. Senge, P (1990). The Fifth Discipline, the Art and Practice of the Learning Organization.
7. Plate, R, and Monroe, M (2014). A Structure for Asssessing Systems Thinking. Creat Learn Exch. 23, 1-12.
8. Sweeney, LB, and Sterman, JD (2000). Bathtub dynamics: Initial results of a systems thinking inventory. Syst Dyn Rev. 16, 249-286.
9. Plate, R (2010). Assessing individuals’ understanding of nonlinear causal structures in complex systems. Syst Dyn Rev. 26, 19-33.
10. Kim, DH (2011). Guidelines For Drawing Causal Loop Diagrams. Syst Thinker. 22, 5-7.
11. Richmond, B . System Dynamics/Systems Thinking: Let’s Just Get On With It., Proc. of Int. Syst. Dyn. Conf, 1994, pp.1-24.
12. Octavially, RP, Priyadi, Y, and Widowati, S . Extraction of Activity Diagrams Based on Steps Performed in Use Case Description Using Text Mining (Case Study: SRS Myoffice Application)., Proc. of International Conference on Electrical and Electronic Intelegent System (ICE3IS), 2022.
13. Shenoy, KM, Shet, KC, and Acharya, UD (2012). A New Similarity measure for taxonomy based on edge counting. Int J Web Semant Technol. 3, 23-30.
14. Arora, C, Sabetzadeh, M, Goknil, A, Briand, LC, and Zimmer, F . Change impact analysis for Natural Language requirements: An NLP approach., Proc. of 2015 IEEE 23rd Int. Requir. Eng. Conf. RE 2015 - Proc, 2015, pp.6-15.
15. Eker, S, and Zimmermann, N (2016). Using Textual Data in System Dynamics Model Conceptualization. Systems. 4, 28.
16. Coto, R (2012). Five Disciplines Concerning The Dynamics Of Change. International Journal of Arts & Sciences, Cumberland. 5, 259-275.
17. Kotu, V, and Deshpande, B (2019). Data Science Concepts and Practice. Data Handling in Science and Technology. 2, 282-283.
18. Sari, EJ, Priyadi, Y, and Riskiana, RR . Implementation of Semantic Textual Similarity Between Requirement Specification and Use Case Description Using WUP Method (Case Study: Sipjabs Application)., Proc. of 2022 IEEE World AI IoT Congress (AIIoT), 2022.
19. Nguyen, Hien T (2019). Learning Short-Text Semantic Similarity with Word Embeddings and External Knowledge Sources. Knowledge-Based Systems: Elsevier B.V, pp. 104842
20. Saura, Jose Ramon, and Bennett, Dag R (2019). A Three-Stage Method for Data Text Mining: Using UGC in Business Intelligence Analysis. Symmetry. 11.
21. Saif, H, Fernandez, M, He, Y, and Alani, H. (2013) . Evaluation Datasets for Twitter Sentiment Analysis: A Survey and a New Dataset. Available online: http://ceurws.org/Vol-1096/paper1.pdf
22. Trstenjak, Bruno (2014). KNN with TF-IDF Based Framework for Text Categorization. Procedia Engineering: Elsevier B.V, pp. 1356-64
23. Zhang, W, and Yoshida, T (2011). A comparative study of TF-IDF, LSI and multi-words for text classification. Expert Systems with Applications. 38, 2758-2765.
24. Masudaa, K, Matsuzakib, T, and Tsujiic, J (2011). Semantic Search based on the Online Integration of NLP Techniques. Procedia - Social and Behavioral Sciences. 27, 281-290.

Yudi Priyadi is currently active as a researcher and lecturer at the Department of Software Engineering, Telkom University. He has teaching and practitioner competence in Text Mining, Data Management, Web Development, Information System Modeling, Multimedia ActionScript, and Information Technology Risk Management.

Krishna Kusumahadi is currently active as a researcher and lecturer in the Department of Informatics Business, Telkom University. He has teaching and practitioner competence in Systems Thinking, Data Modelling, and Big Data.

Pramoedya Syachrizalhaq Lyanda is currently active as a researcher and student in the Department of Software Engineering, Telkom University. He is competent in the field of Software Requirement Specification.

### Article

#### Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(4): 373-381

Published online December 25, 2022 https://doi.org/10.5391/IJFIS.2022.22.4.373

## IdVar4CL: Causal Loop Variable Identification Method for Systems Thinking Based on Text Mining Approach

1Department of Software Engineering, Telkom University, Bandung, Indonesia
2Department of Informatics Business, Telkom University, Bandung, Indonesia

Received: July 10, 2021; Revised: May 1, 2022; Accepted: September 23, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Systems thinking is a discipline for understanding wholeness and frameworks based on the changing patterns of the interconnectedness of the whole system. The storytelling of a system is a description of the mental model of an individual in describing the state of the environment. There are differences in the interpretation of the system description. This difference occurs because each individual has a different level of systems thinking in terms of experience, learning process, insight, intuition, and assumption in understanding system interactions. This study aims to extract data in the description of the storytelling of a systems thinking case by performing text mining and similarity to identify and find a variable to form causal loop diagrams. Based on the results of this study, there are results in the data extraction from the description of storytelling for the systems thinking case. The conclusions of this study are as follows: First, processing the five documents has successfully identified two documents with the highest similarity value, such as d1 and d3. Second, based on the cosine similarity calculation results and the results of the similarity value, there is a value closest to 1, such as 0.0913166. This value is at the d1 and d3 positions. Third, it produces a variable approach in the form of a group of words used in modeling thinking systems based on a connectedness value greater than 0.50.

Keywords: Systems thinking, Storytelling, Text mining, Similarity, Causal loop diagrams

### 1. Introduction

Systems thinking is a way for an individual to understand the interrelationship of interactions in the system as a whole [13]. According to Sterman [4], systems thinking is a discipline in understanding wholeness and frameworks based on the changing patterns of the interconnectedness of a whole system.

The storytelling of a system is a description of an individual’s mental model in describing the state of the environment. An individual has a difference in interpreting the explanation description of storytelling, and this difference occurs because each individual is influenced by their mental model and systems thinking.

There are differences in the interpretation of the system description. This difference occurs because each individual has a different level of systems thinking in terms of experience, learning process, insight, intuition, and assumptions in understanding system interactions [5, 6]. As a result, if there is the same storyline, each individual will make a difference when determining the variable. This variable is the beginning of understanding systems thinking and is used as a reference in the next stage for modeling systems thinking or system dynamics [7, 8].

Causal loop diagrams present a language for articulating an understanding of the dynamic nature of interrelated systems [9]. In this diagram, sentences are interconnected with key variables, thereby showing a causal relationship. Through several loops, a logical interrelated story can be built regarding a particular problem. According to Kim [10], who referred to guidelines regarding the causal loop diagram design rules, when building this diagram, there is a design focus, such as selecting variable names and loop construction. There are differences in determining the causal loop variable. This difference depends on theme selection, time horizon, behavior over time charts, boundary issues, level of aggregation, and significant delays [11].

In text mining for the activities of text preprocessing, the stages of the process can be adjusted depending on the type of text data and the results required. According to Octavially et al. [12], there is an extraction process for text preprocessing consisting of tokenization, stopword removal, and stemming. In addition, there is a semantic similarity measurement process through the WordNet similarity for Java applications. The results of the extraction process, combined with greedy algorithms, constitute an optimal value solution approach. In addition, there is a method for calculating similarities using the Wu Palmer and Levenshtein method [12, 18].

Referring to the concepts described above, identifying the variables in a system is essential for determining and forming a model using a causal loop diagram. Variable identification matters because a single storytelling description can lead to different modeling focuses, owing to differences in individuals’ understanding of the description. Through text mining, a storytelling description can be used to identify variables in the form of a collection of related words that can be represented in the formation of causal loop diagrams.

This study aims to extract data in the description of the storytelling of a systems thinking case by performing text preprocessing and similarity to identify and find a variable to be used to form causal loop diagrams.

The contributions and novelties of this study are as follows:

• Performing case folding, tokenization, stopword removal, and stemming on the text description of storytelling using the Python NLTK.

• Generating a text-weighting value from the results of the document similarity activity.

• Generating a variable approach in the form of a set of words to be used in modeling systems thinking.

### 2. Related Works

This section explains the concepts related to the relationship between systems thinking and causal loops, the relationship between text mining and text preprocessing, and similarity.

### 2.1 Systems Thinking and Causal Loop

Systems thinking is a discipline for understanding wholeness and frameworks based on the changing patterns of the interconnectedness of the whole system. Storytelling, in the form of text descriptions of a case of phenomena, produces different variables in interpreting the situation of a system [4].

A causal loop diagram (CLD) is used to represent the associations of variables in system dynamics. This diagram presents a language for articulating an understanding of the interrelationships between words that have a causal relationship. A story can be created through a series of multiple loops that are logically related to a problem [10]. Additionally, there is a concept that follows a coding process to explicitly obtain causal relationships from data, using qualitative analysis software to make the relationship between the final causal map and the data sources more transparent. The stages are as follows [15]:

• Identifying concepts and discovering themes in the data.

• Categorizing and aggregating themes into variables.

• Identifying causal relationships.

• Transforming the coding dictionary into causal diagrams.

The relationship between systems thinking and the mental models that influence assumptions in understanding a system is related to the fifth discipline for understanding a system [16]. There are several ways to describe a system, among them textual descriptions. Text mining is one of the techniques used in data mining. In simple terms, mining refers to the process of using keywords from a set of words to identify meaningful patterns or to make predictions [17]. Supporting these activities, there is a method for determining the semantic similarity level between pairs of short texts [19]. This method is based on the similarity between word contexts built using word embeddings and on semantic linkages between concepts drawn from external knowledge sources [19–21].

### 2.2 Text Mining

In text mining, an implementation is required to transform irregular, unstructured text data into structured data. This implementation involves several text preprocessing activities, including the following [12, 18]:

• Case folding to convert text to lowercase.

• Tokenization to break sentences into words.

• Stopword removal to eliminate words that have no meaning in the text mining process.

• Stemming/lemmatization to reduce words to their root form.
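As a rough illustration, the four steps above can be sketched with the Python standard library alone. The stopword list here is a small hypothetical sample, and the suffix-stripping rule is only a crude stand-in for a real stemmer or lemmatizer such as NLTK's:

```python
import re

# Hypothetical, abbreviated stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "to", "has", "is", "and", "from", "that"}

def preprocess(sentence, stopwords=STOPWORDS):
    folded = sentence.lower()                            # 1. case folding
    tokens = re.findall(r"[a-z0-9']+", folded)           # 2. tokenization
    content = [t for t in tokens if t not in stopwords]  # 3. stopword removal
    # 4. crude suffix stripping as a stand-in for stemming/lemmatization
    stems = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in content]
    return stems

print(preprocess("The death toll from severe flooding has risen to 66."))
# ['death', 'toll', 'severe', 'flooding', 'risen', '66']
```

In practice, NLTK's `word_tokenize`, its `stopwords` corpus, and `WordNetLemmatizer` would replace the hand-rolled pieces above.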

### 2.3 Similarity

The term frequency-inverse document frequency (TF-IDF) method determines the weight of each word in each document. It is used in natural language processing (NLP), text information retrieval, and text mining. The more frequently a word appears in a document, the more important it is to that document, while words that appear in many documents are down-weighted. To determine the value of a word, this method uses two elements: TF, the term frequency of term i in document j, and IDF, the inverse document frequency of term i [22–24]. The TF and IDF values are multiplied according to the following formulas:

$$tf_i(d_j) = \frac{freq_i(d_j)}{\sum_{i=1}^{k} freq_i(d_j)},$$
$$idf_i = \log \frac{|D|}{|\{d : t_i \in d\}|},$$
$$(tf\text{-}idf)_{ij} = tf_i(d_j) \times idf_i.$$
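The TF part of the formula can be computed directly from a token list; a minimal sketch on a toy document:

```python
from collections import Counter

def term_frequencies(tokens):
    # tf_i(d_j) = freq_i(d_j) / sum over all terms of freq(d_j)
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

print(term_frequencies(["rain", "flood", "rain", "toll"]))
# {'rain': 0.5, 'flood': 0.25, 'toll': 0.25}
```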

To find the similarity of the TF-IDF weighting results, we can use the formula below.

$$Sim = \cos(a, b) = \frac{a \cdot b}{\|a\| \cdot \|b\|} = \frac{a_1 b_1 + \cdots + a_n b_n}{\sqrt{a_1^2 + \cdots + a_n^2} \cdot \sqrt{b_1^2 + \cdots + b_n^2}}.$$
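A minimal transcription of this cosine formula in plain Python, applied here to toy vectors rather than the paper's TF-IDF vectors:

```python
import math

def cosine_similarity(a, b):
    # Sim = a . b / (||a|| * ||b||), as in the formula above
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0, 1], [1, 1, 0]))  # 0.5 for these toy vectors
```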

The principle of semantic similarity calculation refers to the edge-counting method over a taxonomy consisting of a set of nodes and a root node R. For two nodes C1 and C2, similarity is calculated from the distances N1 and N2 that separate C1 and C2 from the root node, and the distance N that separates CS, the closest common subsumer of C1 and C2, from the root node R [13]. The measure of semantic similarity is defined by the formulation of Wu and Palmer [18]:

$$Semantic\ Sim_{WP} = \frac{2N}{N_1 + N_2}.$$
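A direct transcription of this formula, with hypothetical toy depths for illustration (over WordNet, NLTK's `synset.wup_similarity` computes the same measure):

```python
def wu_palmer(n1, n2, n):
    # Sim_WP = 2N / (N1 + N2), where N1 and N2 are the distances of the
    # two concepts from the root, and N is the distance of their closest
    # common subsumer CS from the root.
    return 2 * n / (n1 + n2)

# Hypothetical example: concepts at depths 4 and 6, common subsumer at depth 3.
print(wu_palmer(4, 6, 3))  # 0.6
```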

Arora et al. [14] describe a method for analyzing the impact of changes in natural language (NL) requirements using NLP. The method performs detection that considers the phrasal structure of a requirement statement. The input is a requirements document containing NL requirement statements. The steps of the process are as follows:

• Identify the requirement statement phrase.

• Calculate the value of pairwise similarity for all tokens (words) appearing in the identified phrase.

### 3. Fundamental Ideas for the IdVar4CL Method

In Figure 1, the storytelling from systems thinking, in the form of a text description of a phenomenon, produces different variables. This difference can occur because every individual understands the cause and effect occurring in an environment/system differently, depending on experience, learning process, insight, intuition, and assumptions.

The proposed method, called IdVar4CL, implements a process that involves using the text-mining concept approach as a solution to deal with the different identification of storytelling. At the beginning of the formation of the causal loop diagram model, determining variables is important to understand the dynamic system.

### 4. Datasets

The data sources for this research consisted of five documents adopted from a paragraph of an article (see https://time.com) written by Hillary Leung on January 8, 2020. The paragraph was processed through text mining, resulting in the variables used in the causal loop diagram model (see Figure 2).

The paragraph consists of five sentences. Therefore, in preparing the dataset, it was divided into five documents (see Table 1).

### 5. Methodology

In practice, the contribution of this study is the combination of systems thinking (system dynamics) and data mining (text mining). The output is a set of words (variables) used in the causal loop diagram model in system dynamics.

In the extraction process, there is a text preprocessing activity for each document, consisting of case folding, tokenization, stopword removal, and stemming. To measure semantic similarity, WordNet Similarity for Java, which includes methods for calculating similarities such as the Wu-Palmer and Levenshtein methods, can be used as an alternative (see Figure 3).

### 6. Result and Discussion

This section describes the implementation of all the steps in the method. Some steps explain the processing of the dataset to be extracted using text preprocessing. Then, its similarity is measured using cosine similarity and its semantics through WS4J. After successfully obtaining the variables for the causal loop diagram, the final step was to test its validity.

### 6.1 Extraction Process

Referring to the five documents as the dataset, text preprocessing in this activity is carried out using the Natural Language Toolkit (NLTK), a tool for natural language processing in the Python 3 programming language. Text preprocessing includes the following steps:

• Case folding to convert text to lowercase.

• Tokenization to break sentences into words.

• Stemming/lemmatization to reduce words to their root form.

• Stopword removal to eliminate words that have no meaning in the text mining process.

As shown in Figure 4, there is an example snippet of the article paragraph processed through case folding, in which all text in the paragraph has been changed to lowercase. The paragraph consists of five sentences; text files named “beforecase.txt” and “aftercase.txt” are used for reading the input and writing the results of the case folding process.

In the paragraph resulting from the case folding process, there are five sentences, each of which becomes one document of the dataset: d1, d2, d3, d4, and d5. Figure 5 shows the process for the dataset documents.

After preparing the documents, a tokenization process is carried out to break the sentences into words, followed by stemming/lemmatization so that all results are reduced to root words. For lemmatization, the NLTK “wordnet” corpus is used as an English semantic dictionary. The normalized text resulting from lemmatization of the five previously tokenized documents is then indexed. An example of this process appears in the screenshot in Figure 6, which shows the normalization and indexing of all word results.

### 6.2 Similarity

The indexed lemmatization results were transformed into a TF matrix consisting of five documents and 68 words in the corpus. The Python attribute “tf_matrix.shape”, which produces the output “(5, 68)”, can be used to verify the form of the matrix: five documents with 68 word indices. Figure 7 shows the term frequency (TF) matrix.

Using the Python library “sklearn”, the IDF value was calculated based on the TF matrix results. Based on Figure 8, the IDF values for words appearing in one or more documents can be determined. There were three different IDF values:

• An IDF value of 2.09861229 for words that appear in one document.

• An IDF value of 1.69314718 for words that appear in two documents.

• An IDF value of 1.40546511 for words that appear in three documents.

These values can be confirmed through calculations using Python’s “math” library, as shown in Figure 8 regarding the inverse document frequency (IDF).
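As a cross-check, the three values above are consistent with scikit-learn's default smoothed IDF (TfidfTransformer with smooth_idf=True), idf = ln((1 + n) / (1 + df)) + 1; this is an assumption about the library settings used, since the paper does not state them explicitly:

```python
import math

# n = number of documents, df = number of documents containing the word.
# With n = 5, the smoothed formula reproduces the three values in Figure 8.
n = 5
idf = {df: math.log((1 + n) / (1 + df)) + 1 for df in (1, 2, 3)}
for df, value in idf.items():
    print(f"appears in {df} document(s): idf = {value:.8f}")
# appears in 1 document(s): idf = 2.09861229
# appears in 2 document(s): idf = 1.69314718
# appears in 3 document(s): idf = 1.40546511
```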

Subsequently, we calculated the TF-IDF matrix (5 × 68). After calculating the TF and IDF values, the document value weights were processed using the transformation method. This method multiplies the TF matrix (5 × 68) by the IDF matrix (68 × 68, with the IDF of each word on the main diagonal) and normalizes each TF-IDF row by its Euclidean norm. The results are shown in Figure 9 for the TF-IDF matrix data processing results.

Cosine similarity can be used to calculate the similarity between objects. Through this process, we searched for similarities between the documents. Based on the document value weights from the TF-IDF process, an experiment was carried out with the following steps:

• The two documents with the highest similarity values were selected. This process applies the formulation of cosine similarity using Python. Figure 10 illustrates the process of calculating document similarities.

• Analysis of similarity calculation results. Based on the cosine similarity calculation results, among all off-diagonal similarity values, the value closest to 1 is 0.0913166, located at the d1 and d3 positions. Table 2 summarizes the similarity values between documents.

• Perform the stopword removal process on documents d1 and d3. The purpose of this process is to eliminate words that have no meaning in the text mining process; the results can be analyzed using Python, as shown in Figure 11. The result for document d1 is “death toll severe flooding around Indonesian capital Jakarta is risky 66 parts country continues heavy rain began New Year Eve”. For document d3, the result is “worst floods Indonesia seen since 2013, at least 29 people edited aftermath torrential rains.”

• Semantic linkages between words were calculated. The Wu-Palmer similarity concept is used in the “WordNet Similarity for Java” application to calculate the semantics between the words contained in documents d1 and d3. An illustration of this process, calculating similarity and semantic relatedness, is presented in Figure 12.
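The pair-selection step above amounts to taking the maximum off-diagonal entry of the similarity matrix; a sketch using the pairwise values reported in Table 2:

```python
# Pairwise cosine similarities between the five documents (from Table 2).
sim = {
    ("d1", "d2"): 0.03620263, ("d1", "d3"): 0.0913166,
    ("d1", "d4"): 0.07917081, ("d1", "d5"): 0.04962767,
    ("d2", "d3"): 0.03308062, ("d2", "d4"): 0.0,
    ("d2", "d5"): 0.03595653, ("d3", "d4"): 0.0,
    ("d3", "d5"): 0.04534792, ("d4", "d5"): 0.0,
}

# The most similar pair is the one with the largest similarity value.
best = max(sim, key=sim.get)
print(best, sim[best])  # ('d1', 'd3') 0.0913166
```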

### 6.3 Variables for Causal Loops

All words used as variables were identified based on the similarity results. All word pairs with a semantic value higher than 0.50 are identified and paired based on that value. The results of this identification are presented in Table 3, which lists the variables to be used in the causal loop diagram. For the 0.50 value limit, this study used the middle point between 0 and 1 so that the semantic value can be more dynamic and balanced.
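The threshold rule can be sketched as a simple filter. The first three scores below come from Table 3, while ("rain", "sinking") is a hypothetical below-threshold pair added only to show a word pair being discarded:

```python
THRESHOLD = 0.50  # middle point between 0 and 1, as chosen in the paper

scores = {
    ("death", "floods"): 0.7500,
    ("country", "people"): 0.9091,
    ("capital", "floods"): 0.5217,
    ("rain", "sinking"): 0.31,  # hypothetical pair, filtered out below
}

# Keep only word pairs whose connectedness exceeds the threshold.
variables = [pair for pair, value in scores.items() if value > THRESHOLD]
print(variables)
```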

### 7. Conclusion and Future Work

Based on the results and discussion, data were extracted from the description of storytelling for the systems thinking case. This extraction was performed through text preprocessing and similarity, identifying variables to be used in forming causal loop diagrams. Three conclusions form the core of this research activity:

• Through the Python NLTK and based on the text description of storytelling, this activity performs case folding, tokenization, stopword removal, and stemming/lemmatization, applied to five documents (i.e., d1, d2, d3, d4, and d5). Processing the five documents successfully identified the two documents with the highest similarity value:

• d1 = “death toll, severe flooding around Indonesian capital Jakarta, 66 risky country parts continue to reel heavy rain began New Year Eve.”

• d3 = “worst floods Indonesia seen since 2013, at least 29 people edited aftermath torrential rains.”

• A text-weighting value is generated from the results of the document similarity activity. Based on the cosine similarity calculation results, among all similarity values, the value closest to 1 is 0.0913166, located at the d1 and d3 positions.

• A variable approach is produced in the form of a group of words, to be used in modeling systems thinking, based on a connectedness value greater than 0.50.

For further research, we plan to conduct the following to reach the implementation phase:

• The validity and reliability of all variables generated by the prototype method named “IdVar4CL” will be investigated. This research will continue by applying the agreement coefficient concept to measure the veracity of all produced variables. This measurement can be carried out by employing expert agreement as a reference in text mining and system dynamics activities.

• This variable identification method will be tested through implementation in a systems thinking case study that uses causal loop diagram modeling. It is expected that all variables produced in this study can follow the way systems analysts think when determining variables for modeling.

• This activity can be used to initiate software that produces Intellectual Property Rights.

### Fig 1.

Figure 1.

Fundamental ideas for the IdVar4CL method.

The International Journal of Fuzzy Logic and Intelligent Systems 2022; 22: 373-381https://doi.org/10.5391/IJFIS.2022.22.4.373

### Fig 2.

Figure 2.

Paragraphs as dataset. Source: https://time.com/5761097/jakarta-indonesia-floods


### Fig 3.

Figure 3.

Illustration of causal loop variable identification.


### Fig 4.

Figure 4.

Case folding.


### Fig 5.

Figure 5.

Dataset documents.


### Fig 6.

Figure 6.

Normalization and index.


### Fig 7.

Figure 7.

Term frequency (TF) matrix.


### Fig 8.

Figure 8.

Inverse document frequency (IDF).


### Fig 9.

Figure 9.

Preview of TF-IDF matrix results.


### Fig 10.

Figure 10.

Cosine similarity.


### Fig 11.

Figure 11.

Stopword removal process.


### Fig 12.

Figure 12.

Measures semantic similarity/relatedness between words.


Table 1. Documents.

| Document labeling | Document identification |
|---|---|
| d1 | The death toll from severe flooding in and around the Indonesian capital of Jakarta has risen to 66 as parts of the country continue to reel from heavy rain that began on New Year’s Eve. |
| d2 | Landslides and flash floods have displaced more than 36,000 in Jakarta and the nearby provinces of West Java and Banten, according to the ASEAN Coordinating Center for Humanitarian Assistance (AHA). |
| d3 | These are the worst floods Indonesia has seen since 2013, when at least 29 people edited in the aftermath of torrential rains. |
| d4 | The disaster, experts say, underscores the impacts of climate change in a country with a capital city that is sinking so quickly that officials are working to move it to another island. |
| d5 | The floods are also threatening to exacerbate the already severe wealth inequality that plagues the Southeast Asian nation. |

Table 2. Document similarity values.

|  | d1 | d2 | d3 | d4 | d5 |
|---|---|---|---|---|---|
| d1 | 1 | 0.03620263 | 0.0913166 | 0.07917081 | 0.04962767 |
| d2 | 0.03620263 | 1 | 0.03308062 | 0 | 0.03595653 |
| d3 | 0.0913166 | 0.03308062 | 1 | 0 | 0.04534792 |
| d4 | 0.07917081 | 0 | 0 | 1 | 0 |
| d5 | 0.04962767 | 0.03595653 | 0.04534792 | 0 | 1 |

Table 3. Identification of variables for causal loops.

| Variable identification results | Value of connectedness |
|---|---|
| Death–Floods | 0.7500 |
| Death–People | 0.5455 |
| Death–Aftermath | 0.7059 |
| Death–Rains | 0.6316 |
| Toll–Floods | 0.7059 |
| Toll–Aftermath | 0.7059 |
| Capital–Floods | 0.5217 |
| Capital–People | 0.5333 |
| Risen–Seen | 0.5714 |
| Risen–Died | 0.6667 |
| Parts–Floods | 0.6000 |
| Parts–People | 0.6000 |
| Country–People | 0.9091 |
| Continue–Seen | 0.5714 |
| Rain–Floods | 0.6316 |
| Began–Seen | 0.6667 |
| Year–Floods | 0.5333 |
| Year–People | 0.6667 |
| Eve–Floods | 0.5333 |

### References

1. Garrity, E (2018). Using Systems Thinking to Understand and Enlarge Mental Models: Helping the Transition to a Sustainable World. Systems. 6, 15.
2. Arnold, RD, and Wade, JP (2015). A definition of systems thinking: A systems approach. Procedia Comput Sci. 44, 669-678.
3. Stave, K, and Hopper, M (2007). What Constitutes Systems Thinking? A Proposed Taxonomy. Int Conf Syst Dyn Soc. 235, 245.
4. Sterman, JD (2002). System Dynamics: Systems Thinking and Modeling for a Complex World. MIT Sloan Sch Manag. 147, 248-249.
5. Meadows, DH (2008). Thinking in Systems: A Primer. White River Junction: VT Chelsea Green Publ
6. Senge, P (1990). The Fifth Discipline, the Art and Practice of the Learning Organization.
7. Plate, R, and Monroe, M (2014). A Structure for Asssessing Systems Thinking. Creat Learn Exch. 23, 1-12.
8. Sweeney, LB, and Sterman, JD (2000). Bathtub dynamics: Initial results of a systems thinking inventory. Syst Dyn Rev. 16, 249-286.
9. Plate, R (2010). Assessing individuals’ understanding of nonlinear causal structures in complex systems. Syst Dyn Rev. 26, 19-33.
10. Kim, DH (2011). Guidelines For Drawing Causal Loop Diagrams. Syst Thinker. 22, 5-7.
11. Richmond, B . System Dynamics/Systems Thinking: Let’s Just Get On With It., Proc. of Int. Syst. Dyn. Conf, 1994, pp.1-24.
12. Octavially, RP, Priyadi, Y, and Widowati, S . Extraction of Activity Diagrams Based on Steps Performed in Use Case Description Using Text Mining (Case Study: SRS Myoffice Application)., Proc. of International Conference on Electrical and Electronic Intelegent System (ICE3IS), 2022.
13. Shenoy, KM, Shet, KC, and Acharya, UD (2012). A New Similarity measure for taxonomy based on edge counting. Int J Web Semant Technol. 3, 23-30.
14. Arora, C, Sabetzadeh, M, Goknil, A, Briand, LC, and Zimmer, F . Change impact analysis for Natural Language requirements: An NLP approach., Proc. of 2015 IEEE 23rd Int. Requir. Eng. Conf. RE 2015 - Proc, 2015, pp.6-15.
15. Eker, S, and Zimmermann, N (2016). Using Textual Data in System Dynamics Model Conceptualization. Systems. 4, 28.
16. Coto, R (2012). Five Disciplines Concerning The Dynamics Of Change. International Journal of Arts & Sciences, Cumberland. 5, 259-275.
17. Kotu, V, and Deshpande, B (2019). Data Science Concepts and Practice. Data Handling in Science and Technology. 2, 282-283.
18. Sari, EJ, Priyadi, Y, and Riskiana, RR . Implementation of Semantic Textual Similarity Between Requirement Specification and Use Case Description Using WUP Method (Case Study: Sipjabs Application)., Proc. of 2022 IEEE World AI IoT Congress (AIIoT), 2022.
19. Nguyen, Hien T (2019). Learning Short-Text Semantic Similarity with Word Embeddings and External Knowledge Sources. Knowledge-Based Systems: Elsevier B.V, pp. 104842
20. Saura, Jose Ramon, and Bennett, Dag R (2019). A Three-Stage Method for Data Text Mining: Using UGC in Business Intelligence Analysis. Symmetry. 11.
21. Saif, H, Fernandez, M, He, Y, and Alani, H. (2013) . Evaluation Datasets for Twitter Sentiment Analysis: A Survey and a New Dataset. Available online: http://ceurws.org/Vol-1096/paper1.pdf
22. Trstenjak, Bruno (2014). KNN with TF-IDF Based Framework for Text Categorization. Procedia Engineering: Elsevier B.V, pp. 1356-64
23. Zhang, W, and Yoshida, T (2011). A comparative study of TF-IDF, LSI and multi-words for text classification. Expert Systems with Applications. 38, 2758-2765.
24. Masudaa, K, Matsuzakib, T, and Tsujiic, J (2011). Semantic Search based on the Online Integration of NLP Techniques. Procedia - Social and Behavioral Sciences. 27, 281-290.