
Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(3): 325-338

Published online September 25, 2022

https://doi.org/10.5391/IJFIS.2022.22.3.325

© The Korean Institute of Intelligent Systems

Ensemble Rumor Text Classification Model Applied to Different Tweet Features

Amit Kumar Sharma1,2, Rakshith Alaham Gangeya1, Harshit Kumar1, Sandeep Chaurasia1, and Devesh Kumar Srivastava3

1Department of Computer Science and Engineering, Manipal University Jaipur, India
2Department of Computer Science and Engineering, ICFAI Tech School, ICFAI University, Jaipur, India
3Department of Information Technology, Manipal University Jaipur, India

Correspondence to: Sandeep Chaurasia (sandeep.chaurasia@jaipur.manipal.edu)

Received: April 21, 2022; Revised: July 11, 2022; Accepted: August 11, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Today, social media has evolved into a set of user-friendly and useful platforms for spreading messages and receiving information about different activities. Social media users therefore encounter billions of messages daily, and checking the credibility of this information is a challenging task. False information, or rumors, is spread to mislead people. Previous approaches relied on users or third parties to flag suspect information, but this was highly inefficient and redundant. Other studies have addressed rumor classification using state-of-the-art machine learning and deep learning methods with different tweet features. This analysis focused on combining three different feature models of tweets and proposes an ensemble model for classifying tweets as rumor or non-rumor. In this study, the experimental results of four different rumor and non-rumor classification models, each based on a neural network, were compared. The strength of this research is that the experiments were performed on different tweet features, such as word vectors, user metadata features, reaction features, and ensemble-model probabilistic features, and the results were classified using layered neural network architectures. The correlations among the different features determine the importance and selection of useful features for the experiments. These findings suggest that the ensemble model performed well, providing better validation accuracy and better results on unseen data.

Keywords: Rumor, Word embedding, User metadata features, Reaction features, Ensemble probabilistic features, Feature scaling, Correlation, Bi-LSTM, Neural network

No potential conflict of interest relevant to this article was reported.

Amit Kumar Sharma received his M.Tech degree in computer science and engineering with a specialization in information security from the Central University of Rajasthan, India. He is currently pursuing a Ph.D. in the Department of Computer Science & Engineering, Manipal University Jaipur, India. His research interests are in the fields of machine learning, deep learning, and text analysis.

E-mail: amitchandnia@gmail.com

Rakshith Alaham Gangeya is currently pursuing a B.Tech in computer science at Manipal University Jaipur, Rajasthan, India. His research interests include data science, machine learning, deep learning, web development, blockchain, applied statistics and probability, quantitative analysis, natural language processing, and game theory.

E-mail: rakshith.189302117@muj.manipal.edu

Harshit Kumar is currently pursuing a B.Tech in computer science at Manipal University Jaipur, Rajasthan, India. His research interests include natural language processing, computer vision, blockchain, machine learning, deep learning, cyber security, quantum computing, and cloud computing.

E-mail: harshit.189302122@muj.manipal.edu

Sandeep Chaurasia (Senior Member, IEEE) is currently a Professor with the Department of CSE, School of Computing and I.T., Manipal University Jaipur, India. He has more than 12 years of rich experience in academics and one year in the industry. He has over 30 papers published in international and national journals, as well as in conference proceedings. His research interests are in the fields of machine learning, soft computing, deep learning, and AI.

E-mail: sandeep.chaurasia@jaipur.manipal.edu

Devesh Kumar Srivastava is a Professor in the Department of Information Technology, School of Computing & Information Technology, Manipal University Jaipur, India. His current research interests include software engineering, operating systems, data mining, big data, DBMS, web technologies, computer architecture, computer networks, and object-oriented programming.

E-mail: devesh988@yahoo.com


Figure 1. Rumor text classification using word vector features.

Figure 2. Rumor text classification using user metadata features.

Figure 3. Rumor text classification using reaction features.

Figure 4. Rumor text classification using ensemble features.

Figure 5. Architecture for rumor text classification using word vectors.

Figure 6. Architecture for rumor text classification using Twitter user metadata features.

Figure 7. Architecture for rumor text classification using Twitter reaction features.

Figure 8. Architecture for rumor text classification using ensemble probabilistic features.

Figure 9. Accuracy analysis of unseen data.

Figure 10. Training accuracy and validation accuracy comparisons of all models.

Figure 11. User metadata feature importance analysis.

Figure 12. Feature importance analysis of the ensemble model.

Table 1. Statistics of the Twitter15 and Twitter16 datasets.

Statistic                  Twitter15    Twitter16
# of users                 276,663      173,487
# of source tweets         1,490        818
# of threads               331,612      204,820
# of non-rumors            374          205
# of false rumors          370          205
# of true rumors           372          205
# of unverified rumors     374          203
Avg. time length (hr)      1,337        848
Avg. # of posts            223          251
Max # of posts             1,768        2,765
Min # of posts             55           81

Table 2. Statistics of the PHEME dataset.

Event                Threads    Tweets     Rumors    Non-rumors    True     False    Unverified
Charlie Hebdo        2,079      38,268     458       1,621         193      116      149
Sydney siege         1,221      23,996     522       699           382      86       54
Ferguson             1,143      24,175     284       859           10       8        266
Ottawa shooting      890        12,284     470       420           329      72       69
Germanwings crash    469        4,489      238       231           94       111      33
Putin missing        238        835        126       112           0        9        117
Prince Toronto       233        902        229       4             0        222      7
Gurlitt              138        179        61        77            59       0        2
Ebola Essien         14         226        14        0             0        14       0
Total                6,425      105,354    2,402     4,023         1,067    638      697

Table 3. Parameters received from different layers of the rumor text classification model using word vectors.

Layer (type)                               Output shape       Param #
embedding (Embedding)                      (None, 93, 300)    2,724,600
dropout (Dropout)                          (None, 93, 300)    0
bidirectional (Bidirectional LSTM)         (None, 93, 128)    186,880
bidirectional_1 (Bidirectional LSTM)       (None, 93, 128)    98,816
bidirectional_2 (Bidirectional LSTM)       (None, 93, 128)    98,816
bidirectional_3 (Bidirectional LSTM)       (None, 93, 128)    98,816
bidirectional_4 (Bidirectional LSTM)       (None, 64)         41,216
dense (Dense)                              (None, 32)         2,080
batch_normalization (BatchNormalization)   (None, 32)         128
dropout_1 (Dropout)                        (None, 32)         0

Total params: 6,317,985
Trainable params: 526,721
Non-trainable params: 5,791,264
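The Bi-LSTM rows in Table 3 can be reproduced from the standard LSTM parameter formula: four gates per direction, each with input weights, recurrent weights, and a bias, doubled for the two directions. A short check (the helper name is ours):

```python
def bilstm_params(units, input_dim):
    """Parameter count of a bidirectional LSTM layer:
    2 directions x 4 gates x units x (input weights + recurrent weights + bias)."""
    return 2 * 4 * units * (input_dim + units + 1)

# Reproduces the Bi-LSTM rows of Table 3:
assert bilstm_params(64, 300) == 186_880  # first Bi-LSTM over 300-dim embeddings
assert bilstm_params(64, 128) == 98_816   # stacked Bi-LSTMs over 128-dim inputs
assert bilstm_params(32, 128) == 41_216   # last Bi-LSTM, output shape (None, 64)
```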

Table 4. Loss and accuracy comparisons of different rumor text classification models.

Model                                                          Training loss   Validation loss   Training accuracy (%)   Validation accuracy (%)
Rumor text classification using word vector features           0.274           0.449             88.6                    80.1
Rumor text classification using user metadata features         0.469           0.550             77.5                    74.5
Rumor text classification using reaction features              0.644           0.647             63.5                    63.6
Rumor text classification using ensemble probabilistic features 0.270          0.430             88.9                    82.5

Table 5. Tweet user metadata features.

Numerical features                                      Categorical features
Follower count                                          Is_reply
Retweet count                                           Verified
Favorite count                                          Is_quote_status
No. of symbols                                          Profile_image_url
No. of user mentions                                    Profile_background_image_url
No. of hashtags                                         Default_profile_image
No. of URLs                                             Default_profile
Polarity                                                Profile_use_background_image
Text length                                             Has_location
Post age                                                Has_url
Status count
Friends count
Favorites count of user
Listed count
Account age
Screen name length
Time gap between user-created time and tweeted time

Table 6. Important tweet user metadata features extracted after applying the standard scaler method.

Numerical features            Categorical features
Text length                   Profile_use_background_image
No. of URLs                   Profile_background_image_url
Post age                      Default_profile_image
Status count                  Default_profile
Listed count                  Verified
No. of symbols
Posted_in
No. of user mentions
Polarity
Favorites count of user
Favorites count of a tweet
Screen name length
No. of hashtags
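Table 6 lists the features retained after standard scaling. A minimal sketch of the transform itself, applied to a hypothetical "status count" column (the data is invented for illustration):

```python
def standard_scale(values):
    """Standardize a feature column to zero mean and unit variance:
    z = (x - mean) / std, using the population standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sd = (sum((v - mu) ** 2 for v in values) / n) ** 0.5
    return [(v - mu) / sd for v in values]

# Hypothetical raw "status count" values for five users;
# after scaling, the column has mean ~0 and variance ~1.
scaled = standard_scale([10, 250, 4000, 120, 87])
```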

Table 7. Parameters received from different layers of the rumor text classification model using tweet user metadata features.

Layer (type)          Output shape   Param #
dense_1 (Dense)       (None, 64)     1,216
dropout_1 (Dropout)   (None, 64)     0
dense_2 (Dense)       (None, 64)     4,160
dropout_2 (Dropout)   (None, 64)     0
dense_3 (Dense)       (None, 1)      65

Total params: 5,441
Trainable params: 5,441
Non-trainable params: 0
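The counts in Table 7 follow the dense-layer formula (inputs + 1 bias) × units; in particular, 1,216 = (18 + 1) × 64 is consistent with the 18 metadata features (13 numerical + 5 categorical) of Table 6. A quick check (the helper name is ours):

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer:
    one weight per input per unit, plus one bias per unit."""
    return (n_in + 1) * n_out

assert dense_params(18, 64) == 1_216  # dense_1: 18 metadata features in
assert dense_params(64, 64) == 4_160  # dense_2
assert dense_params(64, 1) == 65      # dense_3: single rumor/non-rumor output
```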

Table 8. Twitter reaction features.

Retweet count
Favorite count
Sentiments
No. of hashtags
Favorite-weighted polarity
Retweet-weighted polarity
Favorite retweet-weighted polarity
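The table does not give explicit formulas for the three weighted-polarity features. One plausible reading, sketched below, scales the sentiment polarity by engagement counts; the formulas and input values are our assumptions, not the paper's definitions:

```python
def weighted_polarities(polarity, favorites, retweets):
    """Assumed definitions of Table 8's weighted-polarity reaction
    features: sentiment polarity scaled by favorite and/or retweet counts."""
    return {
        "favorite_weighted_polarity": polarity * favorites,
        "retweet_weighted_polarity": polarity * retweets,
        "favorite_retweet_weighted_polarity": polarity * (favorites + retweets),
    }

# A hypothetical mildly negative reaction with 120 favorites and 35 retweets
feats = weighted_polarities(polarity=-0.5, favorites=120, retweets=35)
```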

Table 9. Useful Twitter reaction features after applying the feature correlation method.

Favorite count
Sentiments (polarity)
No. of hashtags
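The correlation filter behind Table 9 can be sketched with a plain Pearson coefficient; the sample data and the 0.3 cutoff below are illustrative assumptions:

```python
def pearson(x, y):
    """Pearson correlation coefficient between two feature columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical favorite counts against rumor labels (1 = rumor);
# a feature is kept when |r| with the label clears the cutoff.
labels = [1, 0, 1, 1, 0]
favorites = [320, 12, 280, 150, 30]
keep_favorite_count = abs(pearson(favorites, labels)) >= 0.3  # illustrative cutoff
```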

Table 10. Parameters received from different layers of the rumor text classification model using tweet reaction features.

Layer (type)          Output shape   Param #
dense_1 (Dense)       (4598, 32)     256
dropout_1 (Dropout)   (4598, 32)     0
dense_2 (Dense)       (4598, 16)     528
dropout_2 (Dropout)   (4598, 16)     0
dense_3 (Dense)       (4598, 16)     272
dense_4 (Dense)       (4598, 1)      17

Total params: 1,073
Trainable params: 1,073
Non-trainable params: 0

Table 11. Parameters received from different layers of the rumor text classification model using ensemble probabilistic features.

Layer (type)          Output shape   Param #
dense_1 (Dense)       (None, 64)     256
dropout_1 (Dropout)   (None, 64)     0
dense_2 (Dense)       (None, 64)     4,160
dropout_2 (Dropout)   (None, 64)     0
dense_8 (Dense)       (None, 1)      65

Total params: 4,481
Trainable params: 4,481
Non-trainable params: 0
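The abstract describes feeding the base models' probabilistic outputs into this meta network, and Table 11's first layer has 256 = (3 + 1) × 64 parameters, consistent with a three-dimensional input of per-tweet probabilities. A sketch of that feature construction (the function name and probability values are hypothetical):

```python
def ensemble_features(p_word_vectors, p_metadata, p_reaction):
    """Stack the rumor probabilities predicted by the three base
    models into the meta-classifier's input vector."""
    return [p_word_vectors, p_metadata, p_reaction]

# Hypothetical per-tweet probabilities from the three base classifiers
x = ensemble_features(0.91, 0.62, 0.55)
assert (len(x) + 1) * 64 == 256  # matches dense_1's Param # in Table 11
```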
