International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(3): 325-338
Published online September 25, 2022
https://doi.org/10.5391/IJFIS.2022.22.3.325
© The Korean Institute of Intelligent Systems
Amit Kumar Sharma1,2, Rakshith Alaham Gangeya1, Harshit Kumar1, Sandeep Chaurasia1, and Devesh Kumar Srivastava3
1Department of Computer Science and Engineering, Manipal University Jaipur, India
2Department of Computer Science and Engineering, ICFAI Tech School, ICFAI University, Jaipur, India
3Department of Information Technology, Manipal University Jaipur, India
Correspondence to :
Sandeep Chaurasia (sandeep.chaurasia@jaipur.manipal.edu)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Today, social media has evolved into user-friendly and useful platforms for spreading messages and receiving information about different activities. Social media users are thus exposed to billions of messages daily, and checking the credibility of this information is a challenging task. False information or rumors are spread to misguide people. Previous approaches relied on users or third parties to flag suspect information, but this was highly inefficient and redundant. Moreover, previous studies have focused on rumor classification using state-of-the-art and deep learning methods with different tweet features. This analysis focused on combining three different feature models of tweets and suggests an ensemble model for classifying tweets as rumor or non-rumor. In this study, the experimental results of four different rumor and non-rumor classification models, each based on a neural network model, were compared. The strength of this research is that the experiments were performed on different tweet features, such as word vectors, user metadata features, reaction features, and ensemble model probabilistic features, and the results were classified using layered neural network architectures. The correlations of the different features determine the importance and selection of useful features for experimental purposes. These findings suggest that the ensemble model performed well, providing better validation accuracy and better results on unseen data.
Keywords: Rumor, Word embedding, User metadata features, Reaction features, Ensemble probabilistic features, Feature scaling, Correlation, Bi-LSTM, Neural network
A rumor is the circulation of false messages about events, objects, and issues to a group of people. Social media is the easiest platform for propagating rumors or false information to the public [1–4]. Twitter has evolved into a significant source of information sharing across all social media platforms: on average, millions of tweets are shared per day, and each user can tweet messages of up to 140 characters or share files, such as pictures or videos. These tweets sometimes contain relevant information, such as public announcements by governing organizations and shared events, personal experiences, beliefs, and thoughts. Camouflaged among these data, rumors spread widely and destroy the credibility of this information. The problem is exacerbated by crowdsourcing, where a rumor can be backed by many users because of their beliefs and lack of knowledge regarding the matter. To address this issue, much work has been done in the research community to find a prominent solution. This research attempts to understand the root causes of rumor generation on Twitter and which features may help classify rumors. In previous studies, machine learning methods, such as logistic regression, decision trees, support vector machines (SVM), naive Bayes, and k-nearest neighbors, have been widely used for text categorization [1,5]. Word-embedding approaches have been widely used for word vector generation [5]. In this context, deep learning methods, including recurrent neural networks (RNN), long short-term memory (LSTM), bidirectional LSTMs (Bi-LSTM), gated recurrent units, and convolutional neural networks, are also being applied in the field of text analysis [6]. Previous research has focused on word vector features; more recently, researchers have turned to other features of tweets. In this manner, user metadata and reaction (sentiment) features are used to obtain accurate results. User metadata have numerical and categorical features, which are important for identifying rumors. The features have different ranges, so their values must be scaled; log transformation and standard scaling are widely used for this purpose. The correlation between features or attributes helps identify those that are useful for classification. In this study, four different feature models that use different features of tweet data and classify the results as rumor or non-rumor using neural network architectures were analyzed. All the models were developed as a supervised learning task on a publicly available Twitter dataset from which tweet IDs and labels were gathered; using the respective tweet IDs, tweets were then collected in JSON format using the Twitter API.
In previous studies, the main focus was text analysis with sequential text features for classifying rumors and non-rumors. These studies left research gaps and open questions, such as how to automatically identify rumor text in large volumes of unstructured data and which aspects of a tweet matter most for doing so. This research addresses these questions by focusing on different features, such as reaction and user metadata features, along with the text features or attributes of the tweets. Previous research was based on word-embedding features of the text, and some researchers suggested sentiment features of the post; in this study, additional features are also used, and all of these features serve as input for the neural network architecture. Because the metadata features, reaction features, and sentences cannot be combined directly, we trained separate models on the different features or attributes, then combined the weights of the different models and trained the final model on the same dataset.
The primary goal of this research is to identify tweet attributes or features, such as tweet text, user information, and other tweet metadata, by which the veracity of a tweet can be determined, and to quantify the numerical importance of these features. To this end, an automated method for accurately identifying rumors was developed and compared with existing models using a variety of combined tweet features, including word-embedding, reaction, and metadata features. This work focuses on finding attribute importance for classification, whereas previous work aimed to find better classification techniques.
The objective of this study was to identify important features to distinguish rumor-spreading tweets from authentic tweets. For this purpose and because the dataset comprised various data formats, four models were developed. (1) Bi-LSTM model for tweet text, which was mapped to the target class to check how much tweet text can classify tweets as rumors. (2) Neural model for tweet metadata trained with attributes such as user information (e.g., user account age and user follower count) and tweet information (such as count, retweet count, share count, and tweet text sentiment), which was also mapped to the target class to check how much tweet metadata can classify tweets as rumors. (3) Bi-LSTM model for comments on tweets; for each tweet, the first 20 comments were collected and used to train the model to understand the impact of comments on tweet rumor classification. (4) Ensemble model to merge the above models and train them with the target class to determine whether combining models trained on different parts of tweets has an effect.
The remainder of this paper is organized as follows: Section 2 contains related works. The conceptual underpinning of this study is presented in Section 3, which also provides a conceptual background for areas related to model development. Section 4 defines the loss function, correlation functions, weight normalization techniques, and other methods used in model development. It includes the mathematical equations of the techniques and the justification of their use in our model development. Finally, the proposed alternative methods of rumor classification are also presented in Section 4. The experimental evaluation is presented in Section 5, and an analysis of the results is presented in Section 6. Section 7 concludes the paper by outlining the future goals of the research.
In previous studies, models for rumor detection using state-of-the-art deep learning approaches with different text features, metadata features, and user profile features were proposed. In [1], future research directions on rumor detection that were data-oriented, feature-oriented, model-oriented, and application-oriented were suggested. To learn hidden representations, researchers [6] suggested a model based on RNNs. Owing to extra hidden layers, experimental results on datasets from two real-world microblog platforms showed that the RNN outperformed state-of-the-art rumor detection models that used handcrafted features and that the RNN-based method detected rumors more quickly and accurately than existing techniques. A unique machine learning-based approach for the automatic detection of users distributing rumor information was proposed in [3]. The study used the concept of believability and the retweet network with word embedding to assess user trust. In another study [7], researchers combined feature engineering and a deep learning model and developed a hierarchical social attention model that improved rumor detection. In [8], a hybrid SVM classifier was proposed based on a graph kernel that captures high-order propagation patterns as well as semantic information, such as topics and sentiments. A deep attention model based on RNNs was presented in [1] to learn hidden representations of consecutive posts for spotting rumors. The proposed model used soft attention to probe recurrence to extract different features with a specific emphasis while simultaneously producing hidden representations that capture contextual fluctuations of relevant posts over time. Another study [9] investigated whether historical data-based knowledge could potentially aid in the detection of freshly formed rumors. The authors found and applied patterns from prior labeled data to assist in detecting emerging rumors, because a disputed factual claim arouses specific behaviors, such as curiosity, skepticism, and astonishment. In [10], experiments were conducted to craft new features representing rumor diffusion and use them to correlate the different features. For the early detection of rumors, the same study [10] proposed a probabilistic model based on eminent features, such as user and linguistic features. In [11], a method for early rumor identification was proposed that used convolutional neural networks to learn the hidden representations of individual rumor-related tweets to gain insights about their believability. Another study [12] presented a unique rumor detection approach that learned from the sequential reporting of news on social media to detect rumors in fresh stories. When compared with a state-of-the-art rumor detection system, the results show that the proposed classifier outperformed the state-of-the-art classifier. Furthermore, researchers [6] presented the propagation tree kernel, a kernel-based technique that captures high-order patterns distinguishing different types of rumors by comparing their propagation tree architectures. State-of-the-art rumor detection algorithms were outperformed by the proposed kernel-based methodology, which detected rumors faster and more accurately. Previous research has shown the importance of different social media features in obtaining more accurate rumor classification results. A study [13] suggested a statistical and automatic approach based on a wide range of different attributes.
In this approach, user profile attributes are used to develop a supervised learning model for rumor detection.
Real-world databases are highly susceptible to noisy, missing, incomplete, and inconsistent data owing to their large size and origin from heterogeneous sources. Low-quality data yield low-quality mining outcomes. Data cleansing, integration, transformation, and reduction are a few of the available data pre-processing techniques. In this research, many tweets contained noise and inconsistencies, for example:
• Many tweet texts and comments were of mixed origin and contained substantial numerical data, emojis, and shortened words that needed to be replaced with the correct full words.
• Many tweets had missing comment sections and metadata.
• Most metadata were skewed.
• Text had to be vectorized using word embeddings.
A tweet has different types of features that are used for information extraction. In previous studies, tweet text features using word-embedding methods were used to generate word vectors and analyze the results. In the present study, the focus was on tweet user features [7], tweet metadata features, and sentimental features (follower count, reply count, retweet count, verified, favorite count, number of symbols, profile image URL, user mentions, number of hashtags, default profile image, number of URLs, polarity, text length, location, post age, status count, friend count, account age, favorite weighted polarity, retweet-weighted polarity, and favorite retweet-weighted polarity) [4,14].
In tweet text analysis, the word-embedding model has been used to find the vectors of tweets, and these vectors have been input for text classification methods. In previous studies, the TF-IDF, word2vec, GloVe, and fastText word-embedding models have been widely used for text classification. In this study, the fastText word-embedding model was used for word vector generation [5,15,17]. This word-embedding method is an extension of the word2vec model; it represents each word as an n-gram of characters instead of learning vectors for words directly.
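As an illustration, a minimal sketch of generating such character n-gram embeddings with the gensim library follows; the tokenized tweets and all hyperparameters other than the 300-dimensional vector size (Table 3) are assumptions, not the authors' exact settings.

```python
# Hedged sketch: fastText character n-gram embeddings with gensim.
from gensim.models import FastText

tweets = [["government", "confirms", "the", "incident"],
          ["this", "story", "is", "a", "hoax"]]  # hypothetical tokenized tweets

model = FastText(
    sentences=tweets,
    vector_size=300,   # matches the 300-dim embedding layer in Table 3
    window=5,
    min_count=1,
    min_n=3, max_n=6,  # each word is decomposed into 3- to 6-character n-grams
)

# Because vectors are built from character n-grams, even unseen or
# misspelled words receive a meaningful vector.
vec = model.wv["goverment"]
print(vec.shape)  # (300,)
```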
For the non-categorical skewed features, the log-transform and standard scaler methods were used. The log-transform method pulls the data toward a normal distribution for more reliable prediction. The standard scaler standardizes a feature by removing the mean and scaling to unit variance; each value is divided by the standard deviation to obtain unit variance.
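A minimal sketch of these two scaling steps, assuming numpy and scikit-learn and a hypothetical skewed count feature:

```python
# Log transform followed by standard scaling for a skewed count feature.
import numpy as np
from sklearn.preprocessing import StandardScaler

follower_count = np.array([[3.0], [120.0], [15000.0], [2000000.0]])

# log1p compresses the heavy right tail and keeps zero counts finite.
logged = np.log1p(follower_count)

# StandardScaler removes the mean and divides by the standard deviation,
# yielding a zero-mean, unit-variance feature.
scaled = StandardScaler().fit_transform(logged)
print(scaled.ravel())
```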
To find the correlation between user metadata features or attributes, the point-biserial form of the Pearson correlation was used. It determines the correlation between a continuous variable and a binary variable:

$$r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{pq},$$

where $M_1$ represents the mean of the positive binary variable group, $M_0$ represents the mean of the negative binary variable group, $s_n$ represents the standard deviation, $p$ represents the proportion of cases in the “0” group, and $q$ represents the proportion of cases in the “1” group. The resultant correlation coefficient ranges between −1 and 1; if $r_{pb} > 0$, there is a positive correlation between the features.
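For example, the coefficient between the binary rumor label and one continuous feature can be computed with scipy; the feature values below are hypothetical:

```python
# Point-biserial correlation between a binary label and a continuous feature.
from scipy.stats import pointbiserialr

labels = [0, 0, 1, 1, 0, 1]                   # 1 = rumor, 0 = non-rumor
account_age = [9.1, 7.5, 2.3, 1.8, 8.0, 3.0]  # hypothetical, e.g., years

r_pb, p_value = pointbiserialr(labels, account_age)
print(f"r_pb = {r_pb:.3f}, p = {p_value:.3f}")  # negative r_pb here: younger
                                                # accounts co-occur with rumors
```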
The activation function of a layer determines the output of the layer given a set of inputs. The rectified linear unit (ReLU) and sigmoid activation functions were used in the implemented neural network models. The ReLU function returns the input directly if it is positive and 0 otherwise, so its output range is [0, ∞):

$$f(x) = \max(0, x).$$
The sigmoid function is a real, bounded, and differentiable function defined for all input values, with output range (0, 1):

$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$
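Both functions are elementary; in numpy they reduce to:

```python
# The two activation functions used in the implemented models.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # output range [0, inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # output range (0, 1)

print(relu(np.array([-2.0, 3.0])))    # [0. 3.]
print(sigmoid(np.array([0.0, 4.0])))  # [0.5 0.982...]
```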
Weight initialization is the procedure of setting the initial weights of a neural network, and it was applied in the implemented neural network models. Two weight-initialization methods were used: He-normal and Glorot-normal. In He-normal initialization, the weights are initialized by considering the size of the previous layer, which helps achieve faster and more accurate gradient descent. Weights are drawn from a normal distribution centered on 0 with a standard deviation calculated from the formula below. For the ReLU function,

$$Y = w_1x_1 + w_2x_2 + \cdots + w_nx_n, \qquad w_i \sim \mathcal{N}\!\left(0, \sqrt{\frac{2}{\mathrm{fan_{in}}}}\right),$$

where $Y$ is the output of the layer before the activation function is applied, $(w_1, w_2, \ldots, w_n)$ are the weights of the respective nodes $(x_1, x_2, \ldots, x_n)$, and $\mathrm{fan_{in}}$ is the number of input units.
In Glorot-normal initialization, both $\mathrm{fan_{in}}$ and $\mathrm{fan_{out}}$ are used to determine the initial weights of the nodes. Weights are drawn from a normal distribution centered on 0 with standard deviation

$$\sigma = \sqrt{\frac{2}{\mathrm{fan_{in}} + \mathrm{fan_{out}}}},$$

where $\mathrm{fan_{in}}$ is the number of input units and $\mathrm{fan_{out}}$ is the number of output units. Glorot-normal initialization was used with the sigmoid activation functions in the implemented neural networks.
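In Keras, the pairing described above amounts to choosing the kernel initializer per layer; a sketch with placeholder layer sizes:

```python
# Pairing He-normal with ReLU layers and Glorot-normal with sigmoid layers.
from tensorflow.keras import layers

hidden = layers.Dense(64, activation="relu",
                      kernel_initializer="he_normal")      # std = sqrt(2/fan_in)
output = layers.Dense(1, activation="sigmoid",
                      kernel_initializer="glorot_normal")  # std = sqrt(2/(fan_in+fan_out))
```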
Batch normalization is a technique that transforms the data so that the mean becomes 0 and the standard deviation becomes 1. Even when the input to the first layer of the neural network is normalized, the outputs of the activation functions may no longer be on the same scale, resulting in an internal covariate shift in the data. To avoid this problem, batch normalization was applied to each input. Batch normalization is a two-step process in which the input is first normalized, and rescaling and offsetting are performed afterward. In this method, the mean is first computed from the inputs of the layer:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} h_i,$$

where $h_i$ are the inputs from layer $h$, and $m$ is the number of neurons in layer $h$. Second, the standard deviation of the hidden activations is calculated:

$$\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(h_i - \mu)^2}.$$

Subsequently, the mean is subtracted from each input, and the result is divided by the standard deviation plus a smoothing term $\epsilon$:

$$\hat{h}_i = \frac{h_i - \mu}{\sqrt{\sigma^2 + \epsilon}}.$$

Finally, a rescaling parameter $\gamma$ and an offset parameter $\beta$ give the normalized output:

$$y_i = \gamma\,\hat{h}_i + \beta.$$
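The full transform, written out in numpy under the assumption of fixed γ and β (in practice both are learned):

```python
# Two-step batch normalization as derived above.
import numpy as np

def batch_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    mu = h.mean(axis=0)                    # mean over the batch
    var = ((h - mu) ** 2).mean(axis=0)     # variance over the batch
    h_hat = (h - mu) / np.sqrt(var + eps)  # normalize with smoothing term eps
    return gamma * h_hat + beta            # rescale and offset

activations = np.array([[1.0, 50.0],
                        [2.0, 60.0],
                        [3.0, 70.0]])
print(batch_norm(activations))  # each column now has mean 0, variance ~1
```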
The loss function is used to determine the error between the predicted and desired outputs. The primary loss function used for classification problems is cross-entropy [2,17]. In this study, tweets were categorized as rumor or non-rumor; hence, binary cross-entropy was used:

$$L = -\left[\hat{y}\,\log(y) + (1 - \hat{y})\,\log(1 - y)\right],$$

where $\hat{y}$ is the desired output and $y$ is the predicted output.
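Written out directly (a numpy sketch; Keras' binary_crossentropy loss implements the same quantity averaged over the batch):

```python
# Binary cross-entropy as defined above, averaged over a batch.
import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    y = np.clip(y, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y_hat * np.log(y) + (1 - y_hat) * np.log(1 - y))

desired = np.array([1, 0, 1])          # true labels (rumor = 1)
predicted = np.array([0.9, 0.2, 0.7])  # model probabilities
print(binary_cross_entropy(desired, predicted))  # ~0.228
```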
A Bi-LSTM is a sequence-processing model that consists of two LSTMs: one that takes the input forward and the other that takes it backward. Bi-LSTMs effectively improve the quantity of data available to the network and provide the algorithm with more context. This strategy is used to determine which words come after and before a word in a sentence. This is an important method in text classification that has been used for sequential learning [7,18].
In this research, the PHEME, Twitter15, and Twitter16 datasets were used to evaluate all models. The total number of tweets in these datasets is listed in Tables 1 and 2.
A collection of rumors and non-rumors reported on Twitter during breaking news events can be found in the PHEME dataset. It includes rumors about nine different events, and each rumor is annotated with its veracity value, which may be true, false, or unverified (Table 2).
The original data were used from the PHEME, Twitter15, and Twitter16 datasets, whose tweets were re-fetched over a period of 2 days to obtain the latest metadata values. No specific keyword was used for crawling; the data were fetched from Twitter using the tweet IDs present in the old dataset.
Previous research has suggested text classification approaches that use different text and metadata features of tweets for rumor detection. In this study, we analyzed the results using four different models of tweet classification, all of which use different features. The first model used word vector features, the second used user and metadata features, the third used tweet reaction features, and the fourth was an ensemble model that combined all three models and extracted the probabilities of each model on complete datasets for tweet text classification. The loss and accuracy of these models were calculated using deep learning classification approaches. The class target in this research was the authenticity of tweets classified into whether a tweet was a rumor or not.
In this model, word vectors were used for text sentence classification. The text was preprocessed by removing hyperlinks, emojis, hashtags, mentions, stop words, and numbers, replacing the contracted words with their expanded forms, and replacing the abbreviated words with their true forms. After these processes, tokenization and fastText word embedding were applied to create the embedding matrix. The generated vectors were input in the deep learning Bi-LSTM sequential methods. The effectiveness of this analysis was that the data input was applied to multiple hidden layers (Figure 1).
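A compressed sketch of this cleaning pipeline; the regular expressions and contraction list are illustrative assumptions rather than the authors' exact rules:

```python
# Illustrative tweet-cleaning pipeline: hyperlinks, mentions, hashtags,
# numbers, and emojis are stripped, and contractions are expanded.
import re

CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is"}

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove hyperlinks
    text = re.sub(r"[@#]\w+", " ", text)       # remove mentions and hashtags
    for short, full in CONTRACTIONS.items():   # expand contracted words
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)      # drop numbers, emojis, symbols
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("BREAKING: it's a hoax!! https://t.co/x #rumor @user"))
# -> "breaking it is a hoax"
```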
In this model, tweet user metadata features were used for sentence classification. The Twitter features were pre-processed by applying the log-transformation method for data transformation and determining the correlation between the features using the Pearson correlation method. The generated features were input into the sequential TensorFlow model. The effectiveness of this analysis was that the data input was applied to multiple hidden layers (Figure 2).
In this model, tweet reaction features were used for sentence classification. In the first step, the tweet texts were cleaned for similar text classification using word vectors, and then the tweet data were lemmatized. In the second step, the polarity of the tweets was found and the weighted polarities were determined by applying multiple retweet counts, favorite counts, and the sum of favorite and retweet count features or attributes. After performing feature selection, it was observed that favorite weighted polarity, retweet-weighted polarity, and favorite retweet-weighted polarity had a similar negative correlation to the label. Finally, in reaction-based feature selection, the favorite count, sentiments (polarity), and the number of hashtag features were used for classification. The generated features were input into the sequential TensorFlow model. The effectiveness of this analysis was that the data input was applied to multiple hidden layers (Figure 3).
The last three models were combined to predict the probabilistic weight of each model on the complete dataset and treated as separate features to train the model. These weights determine the extent to which the rumor classification task is dependent on the models, which in turn provide the weight required to understand the importance of different tweet attributes. These probabilistic features were input into a sequential TensorFlow model for rumor and non-rumor classification. The effectiveness of this analysis was that the data input was applied to multiple hidden layers (Figure 4).
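The feature construction can be pictured as follows, assuming the three trained base models and their respective inputs (all variable names are hypothetical):

```python
# Stacking the three base models: each model scores the full dataset, and
# the three probabilities become the feature matrix of the final model.
import numpy as np

p_text = text_model.predict(text_inputs)              # shape (n, 1)
p_metadata = metadata_model.predict(metadata_inputs)  # shape (n, 1)
p_reaction = reaction_model.predict(reaction_inputs)  # shape (n, 1)

ensemble_features = np.hstack([p_text, p_metadata, p_reaction])  # shape (n, 3)
# ensemble_features is then fed to the final sequential TensorFlow model.
```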
Experimental Setup: In this research, the PHEME [14,15], Twitter15 [17], and Twitter16 [17] datasets were used to evaluate all the models. A total of 6,915 tweets were chosen from the datasets for this experiment, with 85% of the tweets used to train the models and 15% used for testing. These tweets were labeled as rumor or non-rumor: 3,815 tweets were in the non-rumor category, 2,281 were in the rumor category, and 819 tweets were labeled unverified and dropped from the experiment. The rumors comprised true rumors (1,309) and false rumors (972); true rumors were later found to contain correct information but initially started spreading as rumors. False rumors were defined as incorrect information or deliberate misinformation, established after extensive verification either by a third-party organization such as PolitiFact or GossipCop or by flagging such tweets with the help of users. Tweets labeled unverified were those whose veracity was still questionable and under verification.
This experiment was built on a Twitter text categorization model applied to word vectors using deep learning methodologies. The network comprised a dropout layer with a rate of 0.5, four Bi-LSTM layers with 64 nodes each, one Bi-LSTM layer with 32 nodes, and a dense layer with 32 nodes and the ReLU activation function. This was followed by a batch normalization layer, a dropout layer with a rate of 0.5, and finally a dense layer with 1 node and a sigmoid activation function. The model was compiled with the Adam optimizer, and the loss was computed using binary cross-entropy (Figure 5).
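A Keras sketch of this architecture follows; the vocabulary size of 9,082 is inferred from the embedding parameter count in Table 3 (2,724,600 / 300), and the frozen fastText embedding matrix is assumed to come from the preprocessing step:

```python
# Sketch of the word-vector Bi-LSTM model (Figure 5, Table 3).
from tensorflow.keras import Sequential, initializers, layers

model = Sequential([
    layers.Embedding(
        input_dim=9082, output_dim=300, input_length=93,
        embeddings_initializer=initializers.Constant(embedding_matrix),
        trainable=False),                    # frozen fastText vectors
    layers.Dropout(0.5),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),   # last Bi-LSTM, 32 nodes
    layers.Dense(32, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # rumor / non-rumor
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```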
This model was trained using the parameters taken from the various layers of the model (Table 3). Bi-LSTM deep learning models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. The results indicate that the model was well trained, with a training accuracy of 88.6% and a testing accuracy of 80.1% (Table 4).
This experiment was based on a Twitter rumor text classification model using neural network-based approaches applied to Twitter user metadata features. In this process, a total of 27 different features were used for experimental purposes (Table 5). First, log transformation was applied for data transformation. Second, the correlations between these features were computed using the Pearson correlation method, and 18 important features were selected (Table 6).
This model was trained using the parameters retrieved from the various layers of the model (Table 7). Sequential TensorFlow models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. The results show that the model was well trained, with a training accuracy of 77.5% and a testing accuracy of 74.5% (Table 4).
This experiment was based on a Twitter rumor text classification model using neural network-based approaches applied to Twitter reaction features. In the first step, the tweet texts were cleaned in the same way as for the word-vector text classification, and the tweet data were then lemmatized. In the second step, we found the polarity of the tweets and computed weighted polarities by multiplying the polarity by the retweet count, the favorite count, and the sum of the favorite and retweet counts (Table 8). After performing feature selection, it was observed that favorite-weighted polarity, retweet-weighted polarity, and favorite-retweet-weighted polarity had a similar negative correlation with the label. Finally, in reaction-based feature selection, the favorite count, sentiment (polarity), and number of hashtags were used for classification (Table 9). The generated features were input into the sequential TensorFlow model. The model was trained with a first dense layer with 32 nodes and a ReLU activation function, a dropout layer with a rate of 0.3, a dense layer with 16 nodes and a ReLU activation function, a dropout layer with a rate of 0.2, a dense layer with 16 nodes and a ReLU activation function, and a final dense layer with a sigmoid activation function. The model was compiled with the Adam optimizer, and the loss was computed using binary cross-entropy (Figure 7).
This model was trained using the parameters retrieved from the various layers of the model (Table 10). Sequential TensorFlow models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. We found that the model was well trained, with a training accuracy of 63.5% and a testing accuracy of 63.6% (Table 4).
This experiment predicted probabilities from each of the previous three models; these predicted probabilities were treated as separate features to train the model. The experiment was based on a Twitter rumor text classification model using sequential TensorFlow neural network-based approaches applied to the ensemble probabilistic weights of the three proposed models. The ensemble probabilistic weight features were input into the sequential TensorFlow model. The model was trained with a dense layer with 64 nodes and a ReLU activation function, a dropout layer with a rate of 0.3, a dense layer with 64 nodes and a ReLU activation function, a dropout layer with a rate of 0.3, and a dense layer with 1 node, a sigmoid activation function, and a Glorot-uniform weight initializer. The model was compiled with the Adam optimizer, and the loss was computed using binary cross-entropy (Figure 8).
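A Keras sketch consistent with this description and with the parameter counts in Table 11; the three-column input is the probabilistic feature matrix built earlier, and the labels variable is an assumption:

```python
# Sketch of the ensemble classifier over the three probabilistic features.
from tensorflow.keras import Sequential, layers

ensemble = Sequential([
    layers.Dense(64, activation="relu", input_shape=(3,)),  # (3+1)*64 = 256 params
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid",
                 kernel_initializer="glorot_uniform"),
])
ensemble.compile(optimizer="adam", loss="binary_crossentropy",
                 metrics=["accuracy"])
# 85/15 train-test split and 150 epochs, as in the experimental setup.
ensemble.fit(ensemble_features, labels, epochs=150, validation_split=0.15)
```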
This model was trained using the parameters taken from the various layers of the model (Table 11). Sequential TensorFlow models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. The results show that the model was well trained, with a training accuracy of 88.9% and a testing accuracy of 82.5% (Table 4).
This experiment was based on the prediction of unseen test data. In this experiment, a total of 1,038 unseen tweets were selected from the PHEME, Twitter15, and Twitter16 datasets. The results were evaluated with all the models, and their accuracy was determined (Figure 9).
This study compared the results of four different rumor classification models. The results of the first experiment with rumor text classification using the word-embedding model suggested the importance of sequential text, with 80.1% validation accuracy. Experiments 2 and 3 showed the importance of user metadata and reaction features of the tweet with validation accuracies of 74.5% and 63.6%, respectively. Finally, Experiment 4 suggested the importance of all combined probabilistic features of the different models. These ensemble features provided a validation accuracy of 82.5%, which was higher than those of all the other models (Figure 9). In Experiment 5, we used unseen data and obtained the most accurate results from the ensemble probabilistic features model (Figure 10).
All models were combined and trained on the dataset to obtain their final weights. These weights determined the extent to which the rumor classification task was dependent on each model, which in turn provides the weights required to understand the importance of different tweet attributes. The main contributions of this research are as follows: (1) proposing a feature extraction mechanism and preprocessing techniques to make the data more relevant; (2) constructing a hypothesis based on metadata in the user and tweet datasets and mapping the relationship to the problem; (3) correlating derived and extracted features to actualize attribute relationships and reduce dimensionality for effective resource utilization; (4) developing ensemble strategies for merging models as well as training and assessing the final model for a general answer; (5) formulating the relevance of feature variables by understanding their weight distribution; (6) defining loss metrics for the dichotomous rumor categorization problem; and (7) showing that the accuracy of the ensemble model can be improved through live training and manual weight adjustment.
To understand the importance of the user metadata features in an ensemble model, the built-in feature importance function of the decision tree classifier was used to obtain the feature importance scores (Figures 11 and 12).
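A sketch of this step with scikit-learn, assuming X_meta holds the scaled metadata feature matrix, labels the targets, and feature_names the column labels (all hypothetical names):

```python
# Ranking user metadata features with a decision tree's built-in importances.
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=0).fit(X_meta, labels)
ranked = sorted(zip(feature_names, tree.feature_importances_),
                key=lambda pair: -pair[1])
for name, score in ranked:
    print(f"{name:35s} {score:.3f}")  # higher score = larger contribution
```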
This study compared four different models using different features of tweet data. The ensemble probabilistic feature rumor classification model provided better training accuracy, validation accuracy, and loss values. Furthermore, we tested the models on unseen data and concluded that the ensemble model overestimates the importance of the tweet text model because of the limited availability of data. The ensemble probabilistic feature rumor classification model performed better than the other individual feature models. This study demonstrated the importance of the features that played a significant role in the task of identifying tweets as rumors or non-rumors. Future work on improving these models will include live training of the ensemble feature model using real-time data to increase its accuracy.
No potential conflict of interest relevant to this article was reported.
Table 1. Statistics of Twitter15 and Twitter16 datasets.
Statistic | Twitter15 | Twitter16 |
---|---|---|
# of users | 276,663 | 173,487 |
# of source tweets | 1,490 | 818 |
# of threads | 331,612 | 204,820 |
# of non-rumors | 374 | 205 |
# of false rumors | 370 | 205 |
# of true rumors | 372 | 205 |
# of unverified rumors | 374 | 203 |
Avg. time length (hr) | 1,337 | 848 |
Avg. # of posts | 223 | 251 |
Max # of posts | 1,768 | 2,765 |
Min # of posts | 55 | 81 |
Table 2. Statistics of PHEME dataset.
Events | Threads | Tweets | Rumors | Non-rumors | True | False | Unverified |
---|---|---|---|---|---|---|---|
Charlie Hebdo | 2,079 | 38,268 | 458 | 1,621 | 193 | 116 | 149 |
Sydney siege | 1,221 | 23,996 | 522 | 699 | 382 | 86 | 54 |
Ferguson | 1,143 | 24,175 | 284 | 859 | 10 | 8 | 266 |
Ottawa shooting | 890 | 12,284 | 470 | 420 | 329 | 72 | 69 |
German wings crash | 469 | 4,489 | 238 | 231 | 94 | 111 | 33 |
Putin missing | 238 | 835 | 126 | 112 | 0 | 9 | 117 |
Prince Toronto | 233 | 902 | 229 | 4 | 0 | 222 | 7 |
Gurlitt | 138 | 179 | 61 | 77 | 59 | 0 | 2 |
Ebola Essien | 14 | 226 | 14 | 0 | 0 | 14 | 0 |
Table 3. Parameters received from different layers of rumor text classification model using word vectors.
Layer (type) | Output shape | Param# |
---|---|---|
embedding(Embedding) | (None,93,300) | 2,724,600 |
dropout(Dropout) | (None,93,300) | 0 |
bidirectional(Bidirectional LSTM) | (None,93,128) | 186,880 |
Bidirectional_1(Bidirectional LSTM) | (None,93,128) | 98,816 |
bidirectional_2(Bidirectional LSTM) | (None,93,128) | 98,816 |
bidirectional_3(Bidirectional LSTM) | (None,93,128) | 98,816 |
bidirectional_4(Bidirectional LSTM) | (None,64) | 41,216 |
dense(Dense) | (None,32) | 2,080 |
batch_normalization(BatchNormalization) | (None,32) | 128 |
dropout_1(Dropout) | (None,32) | 0 |
Total params: 6,317,985 | ||
Trainable params: 526,721 | ||
Non-trainable params: 5,791,264 |
Table 4. Loss and accuracy comparisons of different rumor text classification models.
Models | Loss | Accuracy (%) | ||
---|---|---|---|---|
Training | Validation | Training | Validation | |
Rumor text classification using word vector features | 0.274 | 0.449 | 88.6 | 80.1 |
Rumor text classification using user metadata features | 0.469 | 0.550 | 77.5 | 74.5 |
Rumor text classification using reaction features | 0.644 | 0.647 | 63.5 | 63.6 |
Rumor text classification using ensemble probabilistic features | 0.270 | 0.430 | 88.9 | 82.5 |
Table 5. Tweet user metadata features.
Numerical features | Categorical features |
---|---|
Follower count | Is_reply |
Retweet count | Verified |
Favorite count | Is_quote_status |
No. of symbols | Profile_image_url |
No. of user mentions | profile_background_image_url |
No. of hashtags | Default profile image |
No. of URLs | Default profile |
Polarity | Profile_use_background_image |
Text length | Has_location |
Post age | Has_url |
Status count | |
Friends count | |
Favorites count of user | |
Listed count | |
Account age | |
Screen name length | |
The time gap between user-created time and tweeted time |
Table 6. Extracted important features of tweet user metadata after applying the standard scaler method.
Numerical features | Categorical features |
---|---|
Text length | Profile_use_background_image |
No. of URLs | Profile_background_image_url |
Post age | Default_profile_image |
Status count | Default_profile |
Listed count | Verified |
No. of symbols | |
Posted_in | |
No. of user mentions | |
Polarity | |
Favorites count of user | |
Favorites count of a tweet | |
Screen name length | |
No. of hashtags |
Table 7. Parameters received from different layers of rumor text classification model using tweet user metadata features.
Layer (type) | Output shape | Param # |
---|---|---|
dense_1(Dense) | (None,64) | 1216 |
dropout_1(Dropout) | (None,64) | 0 |
dense_2(Dense) | (None,64) | 4160 |
dropout_2(Dropout) | (None,64) | 0 |
dense_3(Dense) | (None,1) | 65 |
Total params: 5,441 | ||
Trainable params: 5,441 | ||
Non-trainable params: 0 |
Table 8. Twitter reaction features.
Twitter reaction features |
---|
Retweet count |
Favorite count |
Sentiments |
No. of hashtags |
Favorite weighted polarity |
Retweet-weighted polarity |
Favorite retweet-weighted polarity |
Table 9. Useful Twitter reaction features after applying the feature correlation method.
Twitter reaction features |
---|
Favorite count |
Sentiments (Polarity) |
No. of hashtags |
Table 10. Parameters received from different layers of rumor text classification model using tweet reaction features.
Layer (type) | Output shape | Param # |
---|---|---|
dense_1(Dense) | (4598,32) | 256 |
dropout_1(Dropout) | (4598,32) | 0 |
dense_2(Dense) | (4598,16) | 528 |
dropout_2(Dropout) | (4598,16) | 0 |
dense_3(Dense) | (4598,16) | 272 |
dense_4(Dense) | (4598,1) | 17 |
Total params: 1,073 | ||
Trainable params: 1,073 | ||
Non-trainable params: 0 |
Table 11. Parameters received from different layers of rumor text classification model using ensemble probabilistic features.
Layer (type) | Output shape | Param # |
---|---|---|
dense_1(Dense) | (None,64) | 256 |
dropout_1(Dropout) | (None,64) | 0 |
dense_2(Dense) | (None,64) | 4160 |
dropout_2(Dropout) | (None,64) | 0 |
dense_8(Dense) | (None,1) | 65 |
Total params: 4,481 | ||
Trainable params: 4,481 | ||
Non-trainable params: 0 |
E-mail: amitchandnia@gmail.com
E-mail: devesh988@yahoo.com
International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(3): 325-338
Published online September 25, 2022 https://doi.org/10.5391/IJFIS.2022.22.3.325
Copyright © The Korean Institute of Intelligent Systems.
Amit Kumar Sharma1,2, Rakshith Alaham Gangeya1, Harshit Kumar1, Sandeep Chaurasia1, and Devesh Kumar Srivastava3
1Department of Computer Science and Engineering, Manipal University Jaipur, India
2Department of Computer Science and Engineering, ICFAI Tech School, ICFAI University, Jaipur, India
3Department of Information Technology, Manipal University Jaipur, India
Correspondence to:Sandeep Chaurasia (sandeep.chaurasia@jaipur.manipal.edu)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Today, social media has evolved into user-friendly and useful platforms to spread messages and receive information about different activities. Thus, social media users’ daily approaches to billions of messages, and checking the credibility of such information is a challenging task. False information or rumors are spread to misguide people. Previous approaches utilized the help of users or third parties to flag suspect information, but this was highly inefficient and redundant. Moreover, previous studies have focused on rumor classification using state-ofthe- art and deep learning methods with different tweet features. This analysis focused on combining three different feature models of tweets and suggested an ensemble model for classifying tweets as rumor and non-rumor. In this study, the experimental results of four different rumor and non-rumor classification models, which were based on a neural network model, were compared. The strength of this research is that the experiments were performed on different tweet features, such as word vectors, user metadata features, reaction features, and ensemble model probabilistic features, and the results were classified using layered neural network architectures. The correlations of the different features determine the importance and selection of useful features for experimental purposes. These findings suggest that the ensemble model performed well, and provided better validation accuracy and better results for unseen data.
Keywords: Rumor, Word embedding, User metadata features, Reaction features, Ensemble probabilistic features, Feature scaling, Correlation, Bi-LSTM, Neural network
A rumor is described as the circulation of false messages of events, objects, and issues to a group of people. Social media is the easiest platform for propagating rumors or false information in the public [1–4]. Twitter has evolved as a significant source of information sharing across all social media platforms where, on average, millions of tweets are shared per day, and each user can tweet up to 140-character long messages or share files, such as pictures or videos. These tweets sometimes contain relevant information, such as public announcements by governing organizations and sharing of events, personal experiences, beliefs, and thoughts. Camouflaged among these data, rumors are widely spreading and destroying the credibility of this information. This problem is elevated owing to crowdsourcing, where a rumor can be backed by many users because of their lack of knowledge regarding the matter and beliefs. To address this issue, much work has been done in the research community to find a prominent solution. This research attempts to understand the root causes of rumor generation on Twitter and which features may help classify rumors. In previous studies, machine learning methods, such as logistic regression, decision tree, support vector machine (SVM), naive Bayes, and k-nearest neighbor, have been widely used for text categorization [1,5]. Word-embedding approaches have been widely used for word vector generation [5]. In this context, deep learning methods including recurrent neural networks (RNN), long short-term memory (LSTM), bidirectional LSTMs (Bi-LSTM), gated recurrent units, and convolutional neural networks are also being applied in the field of text analysis [6]. Previous research has focused on word vector features, and recently, researchers have focused on other features of tweets. In this manner, user metadata and reaction (sentiment) features are used to obtain accurate results. User metadata have numerical and categorical features, which are important for identifying rumors. The features have different values, and their values must be scaled. Log transformation and standard feature methods are widely used for feature scaling. The correlation between the features or attributes helps identify the features that can be used to identify the results. In this study, four different feature models that use different features of tweet data and classify the results as rumor and non-rumor using neural network architecture were used for analysis. All the models were developed as a supervised learning task on a publicly available dataset from Twitter from which tweet-id and labels were gathered; then using the respective tweet-id, tweets were collected in JSON format using the Twitter API.
In previous studies, the main focus was text analysis with text sequential features for classifying rumors and non-rumors. In previous studies, there were some research gaps and questions that arose, such as how to automatically identify rumor text from large unstructured data and what is the main focus of the research on automatically identifying rumor texts. The improvement of this study over previous research is as follows. This research has provided solutions to these questions by focusing on different features, such as reaction and user metadata features, along with the text features or attributes of the tweets. Previous research was based on word-embedding features of the text, and some researchers suggested sentiment features of the post; however, in this study, other features have also been used, and these features provide input for the neural network architecture. In this research, we cannot directly combine the metadata feature, reaction features, and sentences; thus, we separately trained all the models with different features or attributes; then, we combined all the weights of different models and trained the model with the same dataset.
The primary goal of this research is to identify tweet attributes or features, such as tweet text, user information, and other tweet metadata, by which we can determine the veracity of a tweet. The objective of this study was to determine the numerical importance of certain features or attributes. The primary goal of this study was to develop an automated method for accurately identifying rumors by comparing it to existing models that employ a variety of combined features from tweets, including word embedding, reaction, and metadata features. This work focuses on finding attribute importance for classification, whereas previous work aimed to find better techniques for classification.
The objective of this study was to identify important features to distinguish rumor-spreading tweets from authentic tweets. For this purpose and because the dataset comprised various data formats, four models were developed. (1) Bi-LSTM model for tweet text, which was mapped to the target class to check how much tweet text can classify tweets as rumors. (2) Neural model for tweet metadata trained with attributes such as user information (e.g., user account age and user follower count) and tweet information (such as count, retweet count, share count, and tweet text sentiment), which was also mapped to the target class to check how much tweet metadata can classify tweets as rumors. (3) Bi-LSTM model for comments on tweets; for each tweet, the first 20 comments were collected and used to train the model to understand the impact of comments on tweet rumor classification. (4) Ensemble model to merge the above models and train them with the target class to determine whether combining models trained on different parts of tweets has an effect.
The remainder of this paper is organized as follows: Section 2 contains related works. The conceptual underpinning of this study is presented in Section 3, which also provides a conceptual background for areas related to model development. Section 4 defines the loss function, correlation functions, weight normalization techniques, and other methods used in model development. It includes the mathematical equations of the techniques and the justification of their use in our model development. Finally, the proposed alternative methods of rumor classification are also presented in Section 4. The experimental evaluation is presented in Section 5, and an analysis of the results is presented in Section 6. Section 7 concludes the paper by outlining the future goals of the research.
In previous studies, models for rumor detection using state-of-the-art deep learning approaches with different text features, metadata features, and user profile features were proposed. In [1], a future research direction on rumor detection that was data-oriented, feature-oriented, model-oriented, and application-oriented was suggested. To learn hidden representations, researchers [6] suggested a model based on RNNs. Owing to extra hidden layers, experimental results on datasets from two real-world microblog platforms showed that the RNN outperformed state-of-the-art rumor detection models that used handcrafted features and that the RNN-based method detected rumors more quickly and accurately than existing techniques. A unique machine learning-based approach for the automatic detection of users distributing rumor information was proposed in [3]. The study used the concept of believability and the retweet network using word embedding to ensure user trust. In another study [7], researchers combined a feature engineering and deep learning model and developed a hierarchical social attention model that led to the improvement of rumor detection. In [8], a hybrid SVM classifier was proposed based on a graph kernel that captures high-order propagation patterns as well as semantic information, such as topics and sentiments. A deep attention model based on RNNs was presented in [1] to learn hidden representations of consecutive posts for spotting rumors. The proposed model used soft attention to probe recurrence to extract different features with a specific emphasis while simultaneously producing hidden representations that capture contextual fluctuations of relevant posts over time. Another study [9] investigated whether historical data-based knowledge could potentially aid in the detection of freshly formed rumors. We found and applied patterns from prior labeled data to assist in detecting emerging rumors because a disputed factual claim arouses specific behaviors, such as curiosity, skepticism, and astonishment. In [10], experiments were conducted to craft new features representing rumor diffusion and use them to correlate the different features. In the early detection of rumors, a study [10] proposed a probabilistic model based on eminent features, such as user and linguistic features. In [11], a method for early rumor identification that used convolutional neural networks to learn the hidden representations of individual rumor-related tweets to gain insights about their believability was proposed. Another study [12] presented a unique rumor detection approach that learned from the sequential reporting of news on social media to detect rumors in fresh stories. When compared with a state-of-the-art rumor detection system, the results show that the proposed classifier outperformed the state-of-the-art classifier. Furthermore, researchers [6] presented the propagation tree kernel, a kernel-based technique that captures high-order patterns distinguishing different types of rumors by comparing their propagation tree architectures. State-of-the-art rumor detection algorithms were outperformed by the proposed kernel-based methodology, which detected rumors faster and more accurately. Previous research has shown the importance of different social media features to help obtain more accurate results of rumor classification. A study [13] suggested a statistical and automatic approach based on a wide range of different attributes. 
In this approach, user profile attributes are used to develop a supervised learning model for rumor detection.
Real-world databases are highly susceptible to noisy, missing, incomplete, and inconsistent data owing to their large size and origin from various heterogeneous sources. Low-quality data result in low-quality mining outcomes. Data cleansing, integration, transformation, and reduction are just a few of the data pre-processing techniques accessible. In this research, many tweets contain noise and inconsistencies, for example,
• Many tweet texts and comments were of different origin, had much numerical data, emoji, and words that were shortened, which needed to be replaced by the right words.
• Many tweets had missing comment sections and metadata.
• Most metadata were skewed.
• Text was vectorized as per word embedding.
A tweet has different types of features that are used for information extraction. In previous studies, tweet text features using word-embedding methods were used to generate word vectors and analyze the results. In the present study, the focus was on tweet user features [7], tweet metadata features, and sentimental features (follower count, reply count, retweet count, verified, favorite count, number of symbols, profile image URL, user mentions, number of hashtags, default profile image, number of URLs, polarity, text length, location, post age, status count, friend count, account age, favorite weighted polarity, retweet-weighted polarity, and favorite retweet-weighted polarity) [4,14].
In tweet text analysis, the word-embedding model has been used to find the vectors of tweets, and these vectors have been input for text classification methods. In previous studies, the TF-IDF, word2vec, GloVe, and fastText word-embedding models have been widely used for text classification. In this study, the fastText word-embedding model was used for word vector generation [5,15,17]. This word-embedding method is an extension of the word2vec model; it represents each word as an n-gram of characters instead of learning vectors for words directly.
For the non-categorical skewed features, the log-transform method, and the standard scalar method were used. The standard scalar method standardizes a feature by removing the mean and scaling it to the unit variance. The log-transform method transforms the data into a normal distribution for more reliable prediction. The standard deviation was used to divide all numbers to obtain the unit variance.
To find the correlation between user metadata features or attributes, the point biserial method of Pearson correlation was used to identify the correlation between the features. It is used to determine the correlation between a continuous variable and a binary variable (
where M1 represents the mean of the positive binary variable group, M0 represents the mean of the group with a negative binary variable, sn represents the standard deviation, p represents the proportion of cases in the “0” group, q represents the proportion of cases in the “1” group, and the resultant correlation coefficient ranges between 0 and 1. If rpb > 0, then it signifies a positive correlation between the features.
The activation function of the layer determines the output of the layer given a set of inputs. The rectified linear unit (ReLU) and sigmoid activation functions were used in the implemented neural network models, and the ReLU function gives the output directly if the output is positive; otherwise, it gives 0, and the output range is between (0, ∞) (
The sigmoid function is a real, bounded, and differentiable function that defines all the input values, and the output range is (0, 1) (
Weight initialization is a procedure for setting the initial weights of a neural network using different methods, and it is used in the implemented neural network models. There are two methods of weight initialization: He-normal and Glorot-normal. In He-normal, the weights are initialized by considering the size of the previous layer. This helps achieve a faster and more accurate gradient descent. It draws weights from a normal distribution centered on 0 and with a standard deviation, as calculated from the formula below. For ReLU function,
where Y is the output of the layer before applying the activation function, (w1, w2, w3, ..., wn) are the weights of the respective nodes (x1, x2, x3, ..., xn), and fanin is the number of input units.
In Glorot normal, we used fanin and fanout to determine the initial weights of the nodes. It draws weights from a normal distribution centered on 0 and with a standard deviation, as calculated from the formula below:
where fanin is the number of input units and fanout is the number of output units. A Glorot-normal function was used with sigmoid activation functions in the implemented neural networks.
Batch normalization is a technique that transforms the data in such a way that the mean becomes 0 and the standard deviation becomes 1. When the input to the first layer of the neural network is already normalized, the output from the activation functions may no longer be on the same scale, resulting in internal covariate shifts in the data. To avoid this problem, batch normalization was applied to each input. Batch normalization is a two-step process in which the input is normalized, and rescaling and offsetting are performed later. In this method, we first computed the mean using the input from the
where hi are the inputs from layer h, and m is the number of neurons in layer h. Second, the standard deviation of hidden activations is calculated using (
Subsequently, the mean is subtracted from each input and divided by the sum of the standard deviations and smoothing term
Finally, rescaling parameter
The loss function is used to determine the error between the predicted and desired outputs. The primary loss function used for classification problems was cross-entropy [2,17]. In this study, we categorized tweets into rumor or non-rumor data; hence, binary cross-entropy was used (
where ŷ is the desired output, and y is the predicted output.
A Bi-LSTM is a sequence-processing model that consists of two LSTMs: one that takes the input forward and the other that takes it backward. Bi-LSTMs effectively improve the quantity of data available to the network and provide the algorithm with more context. This strategy is used to determine which words come after and before a word in a sentence. This is an important method in text classification that has been used for sequential learning [7,18].
In this research, the PHEME, Twitter15, and Twitter16 datasets were used to evaluate all models. The total number of tweets in these datasets is listed in Tables 1 and 2.
A collection of rumors and non-rumors reported on Twitter during breaking news events can be found in the PHEME dataset. It includes rumors about nine different events, and each rumor is annotated with its veracity value, which may be true, false, or unverified (Table 2).
The original data were used from the PHEME, Twitter15, and Twitter16 datasets, whose tweets were re-fetched over a period of 2 days to obtain the latest metadata values. No specific keyword was used for crawling; the data were fetched from Twitter using the tweet IDs present in the old dataset.
Previous research has suggested text classification approaches that use different text and metadata features of tweets for rumor detection. In this study, we analyzed the results using four different tweet classification models, each of which uses different features. The first model used word vector features, the second used user metadata features, the third used tweet reaction features, and the fourth was an ensemble model that combined all three models and extracted the probabilities of each model on the complete datasets for tweet text classification. The loss and accuracy of these models were calculated using deep learning classification approaches. The classification target was the authenticity of a tweet, that is, whether or not it was a rumor.
In this model, word vectors were used for text sentence classification. The text was preprocessed by removing hyperlinks, emojis, hashtags, mentions, stop words, and numbers; replacing contracted words with their expanded forms; and replacing abbreviated words with their true forms. After these steps, tokenization and fastText word embedding were applied to create the embedding matrix. The generated vectors were input into the deep learning Bi-LSTM sequential model. The strength of this analysis is that the input data were passed through multiple hidden layers (Figure 1).
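A minimal sketch of this preprocessing pipeline; the cleaning rules are simplified, the tweet list and fastText vector file name are illustrative placeholders, and the sequence length of 93 follows the embedding output shape in Table 3:

```python
import re
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from gensim.models import KeyedVectors

def clean_tweet(text):
    """Simplified cleaning: strip hyperlinks, mentions, hashtags, and numbers."""
    text = re.sub(r"https?://\S+", " ", text)   # hyperlinks
    text = re.sub(r"[@#]\w+", " ", text)        # mentions and hashtags
    text = re.sub(r"\d+", " ", text)            # numbers
    return text.lower().strip()

raw_tweets = ["Breaking: 3 suspects arrested #news http://t.co/x", "..."]  # placeholder
texts = [clean_tweet(t) for t in raw_tweets]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=93)

# Pretrained 300-d fastText vectors; the file name is illustrative
ft = KeyedVectors.load_word2vec_format("wiki-news-300d-1M.vec")
vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, 300))
for word, idx in tokenizer.word_index.items():
    if word in ft:
        embedding_matrix[idx] = ft[word]
```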
In this model, tweet user metadata features were used for classification. The Twitter features were preprocessed by applying the log-transformation method for data transformation, and the correlations between the features were determined using the Pearson correlation method. The generated features were input into the sequential TensorFlow model. The strength of this analysis is that the input data were passed through multiple hidden layers (Figure 2).
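A minimal sketch of this feature preparation, assuming the metadata has been flattened into a DataFrame with a binary `label` column; the file name, column names, and correlation threshold are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.read_json("tweet_metadata.json")   # illustrative file; one row per tweet

# Log-transform the skewed count features (log1p handles zero counts)
for col in ["follower_count", "retweet_count", "favorite_count", "status_count"]:
    df[col] = np.log1p(df[col])

# Pearson correlation of each numeric feature with the binary rumor label
corr = df.corr(numeric_only=True)["label"].drop("label")
selected = corr[corr.abs() > 0.05].index.tolist()   # threshold is illustrative
```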
In this model, tweet reaction features were used for classification. In the first step, the tweet texts were cleaned in the same manner as in the word vector classification model, and the tweet data were then lemmatized. In the second step, the polarity of each tweet was found, and weighted polarities were determined by multiplying the polarity by the retweet count, the favorite count, and the sum of the favorite and retweet counts. After performing feature selection, it was observed that the favorite-weighted polarity, retweet-weighted polarity, and favorite-retweet-weighted polarity had similar negative correlations with the label. Finally, in reaction-based feature selection, the favorite count, sentiment (polarity), and number-of-hashtags features were used for classification. The generated features were input into the sequential TensorFlow model. The strength of this analysis is that the input data were passed through multiple hidden layers (Figure 3).
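A minimal sketch of the weighted-polarity derivation; TextBlob is an assumed sentiment library, since the paper does not name one, and the column names are illustrative:

```python
import pandas as pd
from textblob import TextBlob

def add_reaction_features(df):
    """Derive the reaction features described above for a tweet DataFrame."""
    df["polarity"] = df["text"].apply(lambda t: TextBlob(t).sentiment.polarity)
    df["favorite_weighted_polarity"] = df["polarity"] * df["favorite_count"]
    df["retweet_weighted_polarity"] = df["polarity"] * df["retweet_count"]
    df["favorite_retweet_weighted_polarity"] = df["polarity"] * (
        df["favorite_count"] + df["retweet_count"]
    )
    return df
```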
The previous three models were combined to predict the probabilistic weight of each model on the complete dataset, and these weights were treated as separate features to train the model. The weights indicate the extent to which the rumor classification task depends on each model, which in turn helps to understand the importance of the different tweet attributes. These probabilistic features were input into a sequential TensorFlow model for rumor and non-rumor classification. The strength of this analysis is that the input data were passed through multiple hidden layers (Figure 4).
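A minimal sketch of how the probabilistic features can be assembled; the model and array names are illustrative placeholders for the three trained base models and their input matrices:

```python
import numpy as np

# Class probabilities predicted by the three trained base models
p_text = word_vec_model.predict(X_text)      # word vector model
p_meta = metadata_model.predict(X_meta)      # user metadata model
p_react = reaction_model.predict(X_react)    # reaction feature model

# Stack the per-model probabilities into the ensemble feature matrix
X_ensemble = np.hstack([p_text, p_meta, p_react])   # shape: (n_samples, 3)
```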
Experimental Setup: In this research, the PHEME [14,15], Twitter15 [17], and Twitter16 [17] datasets were used to evaluate all the models. A total of 6,915 tweets were chosen from the datasets for this experiment, with 85% of the tweets used to train the model and 15% used for testing. These tweets were labeled into rumor and non-rumor categories, with 3,815 tweets in the non-rumor category, 2,281 in the rumor category, and 819 tweets carrying an unverified label; the unverified tweets were dropped from the experiment. The rumors were labeled as true rumors (1,309) and false rumors (972); true rumors initially spread as rumors but were later found to contain correct information. False rumors were defined as incorrect information or deliberate misinformation identified after extensive verification, either by a third-party organization such as PolitiFact or GossipCop or by users flagging such tweets. Tweets labeled unverified are those whose veracity remains questionable and is still being verified.
This experiment was built on a Twitter text categorization model applied to word vectors using deep learning methodologies. The model consisted of a dropout layer with a value of 0.5, four Bi-LSTM layers with 64 nodes each, one Bi-LSTM layer with 32 nodes, and a dense layer with 32 nodes and the ReLU activation function. This was followed by a batch normalization layer, a dropout layer with a value of 0.5, and finally a dense layer with 1 node and a sigmoid activation function. The model was compiled with the Adam optimizer, and the loss was computed using binary cross-entropy (Figure 5).
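A minimal Keras sketch of the described architecture, whose layer output shapes and parameter counts match Table 3; `vocab_size` and `embedding_matrix` come from the preprocessing sketch above:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(vocab_size, 300, weights=[embedding_matrix],
                     input_length=93, trainable=False),
    layers.Dropout(0.5),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(32, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```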
This model was trained using the parameters taken from the various layers of the model (Table 3). Bi-LSTM deep learning models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. The results indicate that the model was well trained, with a training accuracy of 88.6% and a testing accuracy of 80.1% (Table 4).
This experiment was based on a Twitter rumor text classification model using neural network-based approaches applied to Twitter user metadata features. A total of 27 different features were used for experimental purposes (Table 5). First, log transformation was applied for data transformation. Second, the correlations between these features were computed using the Pearson correlation method, and 18 important features were selected (Table 6). The selected features were input into the sequential TensorFlow model (Figure 6).
This model was trained using the parameters retrieved from the various layers of the model (Table 7). Sequential TensorFlow models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. The results show that the model was well trained, with a training accuracy of 77.5% and a testing accuracy of 74.5% (Table 4).
This experiment was based on a Twitter rumor text classification model using neural network-based approaches applied to Twitter reaction features. In the first step, the tweet texts were cleaned in the same manner as in the word vector classification model, and the tweet data were then lemmatized. In the second step, we found the polarity of each tweet and computed weighted polarities by multiplying the polarity by the retweet count, the favorite count, and the sum of the favorite and retweet counts (Table 8). After performing feature selection, it was observed that the favorite-weighted polarity, retweet-weighted polarity, and favorite-retweet-weighted polarity had similar negative correlations with the label. Finally, in reaction-based feature selection, the favorite count, sentiment (polarity), and number-of-hashtags features were used for classification (Table 9). The generated features were input into the sequential TensorFlow model. The model was trained with a first dense layer with 32 nodes and a ReLU activation function, a dropout layer with a value of 0.3, a dense layer with 16 nodes and a ReLU activation function, a dropout layer with a value of 0.2, a dense layer with 16 nodes and a ReLU activation function, and a final dense layer with a sigmoid activation function. The model was compiled with the Adam optimizer, and the loss was computed using binary cross-entropy (Figure 7).
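A minimal Keras sketch of this architecture; the input dimension of 3 reflects the selected reaction features in Table 9 and is an assumption, as the parameter counts in Table 10 are compatible with a larger input:

```python
from tensorflow.keras import layers, models

reaction_model = models.Sequential([
    layers.Dense(32, activation="relu", input_shape=(3,)),
    layers.Dropout(0.3),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
reaction_model.compile(optimizer="adam", loss="binary_crossentropy",
                       metrics=["accuracy"])
```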
This model was trained using the parameters retrieved from the various layers of the model (Table 10). Sequential TensorFlow models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. The model achieved a training accuracy of 63.5% and a testing accuracy of 63.6% (Table 4).
This experiment predicted the probabilities from each of the previous three models, and these predicted probabilities were treated as separate features to train the model. The experiment was based on a Twitter rumor text classification model using sequential TensorFlow neural network-based approaches applied to the ensemble probabilistic weights of the three proposed models. The ensemble probabilistic weight features were input into the sequential TensorFlow model. The model was trained with a dense layer with 64 nodes and a ReLU activation function, a dropout layer with a value of 0.3, a dense layer with 64 nodes and a ReLU activation function, a dropout layer with a value of 0.3, and a dense layer with 1 node, a sigmoid activation function, and a Glorot-uniform weight initializer. The model was compiled with the Adam optimizer, and the loss was computed using binary cross-entropy (Figure 8).
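A minimal Keras sketch of this meta-classifier, whose parameter counts match Table 11; applying the Glorot-uniform initializer to every layer is an assumption, and `X_ensemble` and `y` refer to the stacked probabilities and labels from the sketch above:

```python
from tensorflow.keras import layers, models

ensemble = models.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_initializer="glorot_uniform", input_shape=(3,)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu", kernel_initializer="glorot_uniform"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid", kernel_initializer="glorot_uniform"),
])
ensemble.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
ensemble.fit(X_ensemble, y, epochs=150, validation_split=0.15)
```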
This model was trained using the parameters taken from the various layers of the model (Table 11). Sequential TensorFlow models were used to assess the results. We calculated the loss values and the model training and testing accuracies after 150 epochs. The results show that the model was well trained, with a training accuracy of 88.9% and a testing accuracy of 82.5% (Table 4).
This experiment was based on the prediction of unseen test data. A total of 1,038 unseen tweets were selected from the PHEME, Twitter15, and Twitter16 datasets. The results were evaluated with all the models, and their accuracies were determined (Figure 9).
This study compared the results of four different rumor classification models. The results of the first experiment, rumor text classification using the word-embedding model, suggested the importance of sequential text, with 80.1% validation accuracy. Experiments 2 and 3 showed the importance of the user metadata and reaction features of a tweet, with validation accuracies of 74.5% and 63.6%, respectively. Finally, Experiment 4 demonstrated the importance of combining the probabilistic features of the different models: these ensemble features provided a validation accuracy of 82.5%, which was higher than that of all the other models (Figure 10). In Experiment 5, we used unseen data and obtained the most accurate results from the ensemble probabilistic features model (Figure 9).
All models were combined and trained on the dataset to obtain their final weights. These weights determined the extent to which the rumor classification task depended on each model, which in turn provides the weights required to understand the importance of different tweet attributes. The main contributions of this research are as follows: (1) proposing a feature extraction mechanism and preprocessing techniques to make the data more relevant; (2) constructing a hypothesis based on the metadata in the user and tweet datasets and mapping their relationship to the problem; (3) correlating derived and extracted features to capture attribute relationships and reduce dimensionality for effective resource utilization; (4) developing ensemble strategies for merging models as well as training and assessing the final model for a generalized solution; (5) formulating the relevance of the feature variables by understanding their weight distribution; (6) defining loss metrics for the dichotomous rumor categorization problem; and (7) demonstrating that the accuracy of the ensemble model can be improved through live training and manual weight adjustment.
To understand the importance of the user metadata features in an ensemble model, the built-in feature importance function of the decision tree classifier was used to obtain the feature importance scores (Figures 11 and 12).
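A minimal scikit-learn sketch of this analysis; `X_meta`, `y`, and `feature_names` are illustrative placeholders for the user metadata feature matrix, the rumor labels, and the column names:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Fit a decision tree and read off its impurity-based importance scores
tree = DecisionTreeClassifier(random_state=0).fit(X_meta, y)
importance = pd.Series(tree.feature_importances_, index=feature_names)
print(importance.sort_values(ascending=False))
```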
This study compared four different models using different features of tweet data. The ensemble probabilistic features rumor classification model provided better training accuracy, validation accuracy, and loss values. Furthermore, we tested the models on unseen data and found that the ensemble model overestimates the importance of the tweet text model because of the limited availability of data. Nevertheless, the ensemble probabilistic feature rumor classification model performed better than the other individual feature models. This study demonstrated the significance of the features that played a major role in identifying tweets as rumors or non-rumors. Future work on improving these models will include live training of the ensemble feature model using real-time data to increase its accuracy.
Figure 1. Rumor text classification using word vector features.
Figure 2. Rumor text classification using user metadata features.
Figure 3. Rumor text classification using reaction features.
Figure 4. Rumor text classification using ensemble features.
Figure 5. Architecture for rumor text classification using word vectors.
Figure 6. Architecture for rumor text classification using Twitter user metadata features.
Figure 7. Architecture for rumor text classification using Twitter reaction features.
Figure 8. Architecture for rumor text classification using ensemble probabilistic features.
Figure 9. Accuracy analysis of unseen data.
Figure 10. Training accuracy and validation accuracy comparisons of all models.
Figure 11. User metadata feature importance analysis.
Figure 12. Feature importance analysis of the ensemble model.
Table 1. Statistics of Twitter15 and Twitter16 datasets.
Statistic | Twitter15 | Twitter16 |
---|---|---|
# of users | 276,663 | 173,487 |
# of source tweets | 1,490 | 818 |
# of threads | 331,612 | 204,820 |
# of non-rumors | 374 | 205 |
# of false rumors | 370 | 205 |
# of true rumors | 372 | 205 |
# of unverified rumors | 374 | 203 |
Avg. time length (hr) | 1,337 | 848 |
Avg. # of posts | 223 | 251 |
Max # of posts | 1,768 | 2,765 |
Min # of posts | 55 | 81 |
Table 2. Statistics of PHEME dataset.
Events | Threads | Tweets | Rumors | Non-rumors | True | False | Unverified |
---|---|---|---|---|---|---|---|
Charlie Hebdo | 2,079 | 38,268 | 458 | 1,621 | 193 | 116 | 149 |
Sydney siege | 1,221 | 23,996 | 522 | 699 | 382 | 86 | 54 |
Ferguson | 1,143 | 24,175 | 284 | 859 | 10 | 8 | 266 |
Ottawa shooting | 890 | 12,284 | 470 | 420 | 329 | 72 | 69 |
Germanwings crash | 469 | 4,489 | 238 | 231 | 94 | 111 | 33 |
Putin missing | 238 | 835 | 126 | 112 | 0 | 9 | 117 |
Prince Toronto | 233 | 902 | 229 | 4 | 0 | 222 | 7 |
Gurlitt | 138 | 179 | 61 | 77 | 59 | 0 | 2 |
Ebola Essien | 14 | 226 | 14 | 0 | 0 | 14 | 0 |
Table 3. Parameters received from different layers of the rumor text classification model using word vectors.
Layer (type) | Output shape | Param # |
---|---|---|
embedding(Embedding) | (None,93,300) | 2,724,600 |
dropout(Dropout) | (None,93,300) | 0 |
bidirectional(Bidirectional LSTM) | (None,93,128) | 186,880 |
bidirectional_1(Bidirectional LSTM) | (None,93,128) | 98,816 |
bidirectional_2(Bidirectional LSTM) | (None,93,128) | 98,816 |
bidirectional_3(Bidirectional LSTM) | (None,93,128) | 98,816 |
bidirectional_4(Bidirectional LSTM) | (None,64) | 41,216 |
dense(Dense) | (None,32) | 2,080 |
batch_normalization(BatchNormalization) | (None,32) | 128 |
dropout_1(Dropout) | (None,32) | 0 |
Total params: 6,317,985 | ||
Trainable params: 526,721 | ||
Non-trainable params: 5,791,264 |
Table 4. Loss and accuracy comparisons of different rumor text classification models.
Models | Training loss | Validation loss | Training accuracy (%) | Validation accuracy (%) |
---|---|---|---|---|
Rumor text classification using word vector features | 0.274 | 0.449 | 88.6 | 80.1 |
Rumor text classification using user metadata features | 0.469 | 0.550 | 77.5 | 74.5 |
Rumor text classification using reaction features | 0.644 | 0.647 | 63.5 | 63.6 |
Rumor text classification using ensemble probabilistic features | 0.270 | 0.430 | 88.9 | 82.5 |
Table 5. Tweet user metadata features.
Numerical features | Categorical features |
---|---|
Follower count | Is_reply |
Retweet count | Verified |
Favorite count | Is_quote_status |
No. of symbols | Profile_image_url |
No. of user mentions | profile_background_image_url |
No. of hashtags | Default profile image |
No. of URLs | Default profile |
Polarity | Profile_use_background_image |
Text length | Has_location |
Post age | Has_url |
Status count | |
Friends count | |
Favorites count of user | |
Listed count | |
Account age | |
Screen name length | |
The time gap between user-created time and tweeted time |
Table 6. Extracted important features of tweet user metadata after applying the standard scaler method.
Numerical features | Categorical features |
---|---|
Text length | Profile_use_background_image |
No. of URLs | Profile_background_image_url |
Post age | Default_profile_image |
Status count | Default_profile |
Listed count | Verified |
No. of symbols | |
Posted_in | |
No. of user mentions | |
Polarity | |
Favorites count of user | |
Favorites count of a tweet | |
Screen name length | |
No. of hashtags |
Table 7. Parameters received from different layers of the rumor text classification model using tweet user metadata features.
Layer (type) | Output shape | Param # |
---|---|---|
dense_1(Dense) | (None,64) | 1216 |
dropout_1(Dropout) | (None,64) | 0 |
dense_2(Dense) | (None,64) | 4160 |
dropout_2(Dropout) | (None,64) | 0 |
dense_3(Dense) | (None,1) | 65 |
Total params: 5,441 | ||
Trainable params: 5,441 | ||
Non-trainable params: 0 |
Table 8. Twitter reaction features.
Twitter reaction features |
---|
Retweet count |
Favorite count |
Sentiments |
No. of hashtags |
Favorite weighted polarity |
Retweet-weighted polarity |
Favorite retweet-weighted polarity |
Table 9. Useful Twitter reaction features after applying the feature correlation method.
Twitter reaction features |
---|
Favorite count |
Sentiments (Polarity) |
No. of hashtags |
Table 10. Parameters received from different layers of the rumor text classification model using tweet reaction features.
Layer (type) | Output shape | Param # |
---|---|---|
dense_1(Dense) | (4598,32) | 256 |
dropout_1(Dropout) | (4598,32) | 0 |
dense_2(Dense) | (4598,16) | 528 |
dropout_2(Dropout) | (4598,16) | 0 |
dense_3(Dense) | (4598,16) | 272 |
dense_4(Dense) | (4598,1) | 17 |
Total params: 1,073 | ||
Trainable params: 1,073 | ||
Non-trainable params: 0 |
Table 11. Parameters received from different layers of the rumor text classification model using ensemble probabilistic features.
Layer (type) | Output shape | Param # |
---|---|---|
dense_1(Dense) | (None,64) | 256 |
dropout_1(Dropout) | (None,64) | 0 |
dense_2(Dense) | (None,64) | 4160 |
dropout_2(Dropout) | (None,64) | 0 |
dense_8(Dense) | (None,1) | 65 |
Total params: 4,481 | ||
Trainable params: 4,481 | ||
Non-trainable params: 0 |