International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(3): 325-338
Published online September 25, 2022
https://doi.org/10.5391/IJFIS.2022.22.3.325
© The Korean Institute of Intelligent Systems
Amit Kumar Sharma1,2, Rakshith Alaham Gangeya1, Harshit Kumar1, Sandeep Chaurasia1, and Devesh Kumar Srivastava3
1Department of Computer Science and Engineering, Manipal University Jaipur, India
2Department of Computer Science and Engineering, ICFAI Tech School, ICFAI University, Jaipur, India
3Department of Information Technology, Manipal University Jaipur, India
Correspondence to: Sandeep Chaurasia (sandeep.chaurasia@jaipur.manipal.edu)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Today, social media has evolved into a set of user-friendly and useful platforms for spreading messages and receiving information about different activities. Social media users are thus exposed to billions of messages daily, and checking the credibility of this information is a challenging task. False information or rumors are spread to misguide people. Previous approaches relied on users or third parties to flag suspect information, but this was highly inefficient and redundant. Moreover, previous studies have focused on rumor classification using state-of-the-art and deep learning methods with different tweet features. This analysis focused on combining three different feature models of tweets and suggested an ensemble model for classifying tweets as rumor or non-rumor. In this study, the experimental results of four different rumor and non-rumor classification models, all based on neural network architectures, were compared. The strength of this research is that the experiments were performed on different tweet features, such as word vectors, user metadata features, reaction features, and ensemble-model probabilistic features, and the results were classified using layered neural network architectures. The correlations of the different features determine the importance and selection of useful features for experimental purposes. These findings suggest that the ensemble model performed well and provided better validation accuracy and better results for unseen data.
Keywords: Rumor, Word embedding, User metadata features, Reaction features, Ensemble probabilistic features, Feature scaling, Correlation, Bi-LSTM, Neural network
No potential conflict of interest relevant to this article was reported.
E-mail: amitchandnia@gmail.com
E-mail: devesh988@yahoo.com
Rumor text classification using word vector features.
Rumor text classification using user metadata features.
Rumor text classification using reaction features.
Rumor text classification using ensemble features.
Architecture for rumor text classification using word vectors.
Architecture for rumor text classification using Twitter user metadata features.
Architecture for rumor text classification using Twitter reaction features.
Architecture for rumor text classification using ensemble probabilistic features.
Accuracy analysis of unseen data.
Training accuracy and validation accuracy comparisons of all models.
User metadata feature importance analysis.
Feature importance analysis of the ensemble model.
Table 1. Statistics of Twitter15 and Twitter16 datasets.

| Statistic | Twitter15 | Twitter16 |
|---|---|---|
| # of users | 276,663 | 173,487 |
| # of source tweets | 1,490 | 818 |
| # of threads | 331,612 | 204,820 |
| # of non-rumors | 374 | 205 |
| # of false rumors | 370 | 205 |
| # of true rumors | 372 | 205 |
| # of unverified rumors | 374 | 203 |
| Avg. time length (hr) | 1,337 | 848 |
| Avg. # of posts | 223 | 251 |
| Max # of posts | 1,768 | 2,765 |
| Min # of posts | 55 | 81 |
Table 2. Statistics of the PHEME dataset.

| Event | Threads | Tweets | Rumors | Non-rumors | True | False | Unverified |
|---|---|---|---|---|---|---|---|
| Charlie Hebdo | 2,079 | 38,268 | 458 | 1,621 | 193 | 116 | 149 |
| Sydney siege | 1,221 | 23,996 | 522 | 699 | 382 | 86 | 54 |
| Ferguson | 1,143 | 24,175 | 284 | 859 | 10 | 8 | 266 |
| Ottawa shooting | 890 | 12,284 | 470 | 420 | 329 | 72 | 69 |
| Germanwings crash | 469 | 4,489 | 238 | 231 | 94 | 111 | 33 |
| Putin missing | 238 | 835 | 126 | 112 | 0 | 9 | 117 |
| Prince Toronto | 233 | 902 | 229 | 4 | 0 | 222 | 7 |
| Gurlitt | 138 | 179 | 61 | 77 | 59 | 0 | 2 |
| Ebola Essien | 14 | 226 | 14 | 0 | 0 | 14 | 0 |
Table 3. Parameters received from different layers of the rumor text classification model using word vectors.

| Layer (type) | Output shape | Param # |
|---|---|---|
| embedding (Embedding) | (None, 93, 300) | 2,724,600 |
| dropout (Dropout) | (None, 93, 300) | 0 |
| bidirectional (Bidirectional LSTM) | (None, 93, 128) | 186,880 |
| bidirectional_1 (Bidirectional LSTM) | (None, 93, 128) | 98,816 |
| bidirectional_2 (Bidirectional LSTM) | (None, 93, 128) | 98,816 |
| bidirectional_3 (Bidirectional LSTM) | (None, 93, 128) | 98,816 |
| bidirectional_4 (Bidirectional LSTM) | (None, 64) | 41,216 |
| dense (Dense) | (None, 32) | 2,080 |
| batch_normalization (BatchNormalization) | (None, 32) | 128 |
| dropout_1 (Dropout) | (None, 32) | 0 |

Total params: 6,317,985
Trainable params: 526,721
Non-trainable params: 5,791,264
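The layer stack in Table 3 can be reproduced with a minimal Keras sketch. The vocabulary size (about 9,082 tokens) is inferred from the 2,724,600 embedding parameters divided by the 300-dimensional word vectors; the dropout rates and the final sigmoid output layer (omitted from Table 3) are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the word-vector Bi-LSTM model described in Table 3.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Dropout, Bidirectional,
                                     LSTM, Dense, BatchNormalization)

MAX_LEN = 93        # padded tweet length (Table 3)
EMBED_DIM = 300     # pretrained word-vector dimension
VOCAB_SIZE = 9082   # inferred: 2,724,600 embedding params / 300 dims

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN, trainable=False),
    Dropout(0.5),                                      # dropout rate assumed
    Bidirectional(LSTM(64, return_sequences=True)),    # (None, 93, 128), 186,880 params
    Bidirectional(LSTM(64, return_sequences=True)),    # 98,816 params
    Bidirectional(LSTM(64, return_sequences=True)),
    Bidirectional(LSTM(64, return_sequences=True)),
    Bidirectional(LSTM(32)),                           # (None, 64), 41,216 params
    Dense(32, activation='relu'),                      # 2,080 params
    BatchNormalization(),                              # 128 params
    Dropout(0.5),
    Dense(1, activation='sigmoid'),                    # assumed output layer
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```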
Table 4. Loss and accuracy comparisons of different rumor text classification models.

| Model | Training loss | Validation loss | Training accuracy (%) | Validation accuracy (%) |
|---|---|---|---|---|
| Rumor text classification using word vector features | 0.274 | 0.449 | 88.6 | 80.1 |
| Rumor text classification using user metadata features | 0.469 | 0.550 | 77.5 | 74.5 |
| Rumor text classification using reaction features | 0.644 | 0.647 | 63.5 | 63.6 |
| Rumor text classification using ensemble probabilistic features | 0.270 | 0.430 | 88.9 | 82.5 |
Table 5. Tweet user metadata features.

| Numerical features | Categorical features |
|---|---|
| Follower count | Is_reply |
| Retweet count | Verified |
| Favorite count | Is_quote_status |
| No. of symbols | Profile_image_url |
| No. of user mentions | Profile_background_image_url |
| No. of hashtags | Default_profile_image |
| No. of URLs | Default_profile |
| Polarity | Profile_use_background_image |
| Text length | Has_location |
| Post age | Has_url |
| Status count | |
| Friends count | |
| Favorites count of user | |
| Listed count | |
| Account age | |
| Screen name length | |
| Time gap between user-created time and tweeted time | |
Table 6. Extracted important tweet user metadata features after applying the standard scaler method.

| Numerical features | Categorical features |
|---|---|
| Text length | Profile_use_background_image |
| No. of URLs | Profile_background_image_url |
| Post age | Default_profile_image |
| Status count | Default_profile |
| Listed count | Verified |
| No. of symbols | |
| Posted_in | |
| No. of user mentions | |
| Polarity | |
| Favorites count of user | |
| Favorites count of a tweet | |
| Screen name length | |
| No. of hashtags | |
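A minimal sketch of how the numerical metadata features of Table 5 can be standard-scaled and ranked by importance is given below. The column names and the random-forest-based ranking are illustrative assumptions standing in for the paper's scaling and feature-importance analysis, not its exact procedure.

```python
# Sketch: scale the numerical user-metadata features and rank them by importance.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names following Table 5.
numerical = ['follower_count', 'retweet_count', 'favorite_count', 'no_of_symbols',
             'no_of_user_mentions', 'no_of_hashtags', 'no_of_urls', 'polarity',
             'text_length', 'post_age', 'status_count', 'friends_count',
             'favorites_count_of_user', 'listed_count', 'account_age',
             'screen_name_length', 'created_tweeted_time_gap']

def scale_and_rank(df: pd.DataFrame, label_col: str = 'is_rumor') -> pd.Series:
    """Standard-scale the numerical features and return them sorted by importance."""
    X = df[numerical].copy()
    X[numerical] = StandardScaler().fit_transform(X[numerical])
    rf = RandomForestClassifier(n_estimators=200, random_state=42)
    rf.fit(X, df[label_col])
    return pd.Series(rf.feature_importances_, index=numerical).sort_values(ascending=False)
```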
Table 7. Parameters received from different layers of the rumor text classification model using tweet user metadata features.

| Layer (type) | Output shape | Param # |
|---|---|---|
| dense_1 (Dense) | (None, 64) | 1,216 |
| dropout_1 (Dropout) | (None, 64) | 0 |
| dense_2 (Dense) | (None, 64) | 4,160 |
| dropout_2 (Dropout) | (None, 64) | 0 |
| dense_3 (Dense) | (None, 1) | 65 |

Total params: 5,441
Trainable params: 5,441
Non-trainable params: 0
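Table 7 implies an 18-dimensional input, since the 1,216 parameters of dense_1 equal 64 × (18 + 1), which matches the 13 numerical and 5 categorical features of Table 6. A minimal Keras sketch of this model follows; the activations and dropout rates are assumptions.

```python
# Minimal sketch of the user-metadata model described in Table 7.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

N_FEATURES = 18  # 13 numerical + 5 categorical features (Table 6)

metadata_model = Sequential([
    Dense(64, activation='relu', input_shape=(N_FEATURES,)),  # 1,216 params
    Dropout(0.3),                                              # rate assumed
    Dense(64, activation='relu'),                              # 4,160 params
    Dropout(0.3),
    Dense(1, activation='sigmoid'),                            # 65 params
])
metadata_model.compile(optimizer='adam', loss='binary_crossentropy',
                       metrics=['accuracy'])
```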
Table 8. Twitter reaction features.

| Twitter reaction features |
|---|
| Retweet count |
| Favorite count |
| Sentiments |
| No. of hashtags |
| Favorite-weighted polarity |
| Retweet-weighted polarity |
| Favorite retweet-weighted polarity |
Table 9. Useful Twitter reaction features after applying the feature correlation method.

| Twitter reaction features |
|---|
| Favorite count |
| Sentiments (polarity) |
| No. of hashtags |
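A minimal sketch of correlation-based filtering over the reaction features of Table 8 is shown below; the 0.9 threshold and the column names are illustrative assumptions rather than the exact procedure used in the paper.

```python
# Sketch: drop one feature from each highly correlated pair of reaction features.
import pandas as pd

# Hypothetical column names following Table 8.
reaction_cols = ['retweet_count', 'favorite_count', 'sentiments', 'no_of_hashtags',
                 'favorite_weighted_polarity', 'retweet_weighted_polarity',
                 'favorite_retweet_weighted_polarity']

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Greedily keep features whose absolute correlation with all kept features
    stays below the threshold."""
    corr = df[reaction_cols].corr().abs()
    keep = []
    for col in reaction_cols:
        if not any(corr.loc[col, k] > threshold for k in keep):
            keep.append(col)
    return keep
```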
Table 10. Parameters received from different layers of the rumor text classification model using tweet reaction features.

| Layer (type) | Output shape | Param # |
|---|---|---|
| dense_1 (Dense) | (4598, 32) | 256 |
| dropout_1 (Dropout) | (4598, 32) | 0 |
| dense_2 (Dense) | (4598, 16) | 528 |
| dropout_2 (Dropout) | (4598, 16) | 0 |
| dense_3 (Dense) | (4598, 16) | 272 |
| dense_4 (Dense) | (4598, 1) | 17 |

Total params: 1,073
Trainable params: 1,073
Non-trainable params: 0
Table 11. Parameters received from different layers of the rumor text classification model using ensemble probabilistic features.

| Layer (type) | Output shape | Param # |
|---|---|---|
| dense_1 (Dense) | (None, 64) | 256 |
| dropout_1 (Dropout) | (None, 64) | 0 |
| dense_2 (Dense) | (None, 64) | 4,160 |
| dropout_2 (Dropout) | (None, 64) | 0 |
| dense_8 (Dense) | (None, 1) | 65 |

Total params: 4,481
Trainable params: 4,481
Non-trainable params: 0
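Table 11 implies a three-dimensional input, since the 256 parameters of dense_1 equal 64 × (3 + 1), i.e., the predicted probabilities of the three base models stacked together. A minimal Keras sketch, with assumed activations and dropout rates, follows.

```python
# Minimal sketch of the ensemble probabilistic-feature model described in Table 11.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

ensemble_model = Sequential([
    Dense(64, activation='relu', input_shape=(3,)),  # 256 params: 3 base-model probabilities
    Dropout(0.3),                                    # rate assumed
    Dense(64, activation='relu'),                    # 4,160 params
    Dropout(0.3),
    Dense(1, activation='sigmoid'),                  # 65 params
])
ensemble_model.compile(optimizer='adam', loss='binary_crossentropy',
                       metrics=['accuracy'])

# Hypothetical usage: stack the base models' probabilities column-wise.
# p_text, p_meta, p_react = base-model predictions on the same tweets
# X_ens = np.column_stack([p_text, p_meta, p_react])
# ensemble_model.fit(X_ens, y_train, validation_split=0.2, epochs=30)
```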