Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 310-316

Published online September 25, 2021

https://doi.org/10.5391/IJFIS.2021.21.3.310

© The Korean Institute of Intelligent Systems

Personality Prediction Based on Text Analytics Using Bidirectional Encoder Representations from Transformers from English Twitter Dataset

Joshua Evan Arijanto, Steven Geraldy, Cyrena Tania, and Derwin Suhartono

Department of Computer Science, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia

Correspondence to :
Derwin Suhartono (dsuhartono@binus.edu)

Received: March 17, 2021; Revised: June 28, 2021; Accepted: July 8, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Personality traits can be inferred from a person’s behavioral patterns. One example is when writing posts on social media. Extracting information about individual personalities can yield enormous benefits for various applications such as recommendation systems, marketing, or hiring employees. The objective of this research is to build a personality prediction system that uses English texts from Twitter as a dataset to predict personality traits. This research uses the Big Five personality traits theory to analyze personality traits, which consist of openness, conscientiousness, extraversion, agreeableness, and neuroticism. Several classifiers were used in this research, such as support vector machine, convolutional neural network, and variants of bidirectional encoder representations from transformers (BERT). To improve the performance, we implemented several feature extraction techniques, such as N-gram, linguistic inquiry and word count (LIWC), word embedding, and data augmentation. The best results were obtained by fine-tuning the BERT model and using it as the main classifier of the personality prediction system. We conclude that the BERT performance could be improved by using individual tweets instead of concatenated ones.

Keywords: Personality prediction, Twitter, Big Five personality traits, BERT
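The abstract's closing observation — that BERT performs better on individual tweets than on one concatenated document per user — implies that per-tweet predictions must be aggregated back into a single user-level label for each trait. The paper's exact aggregation rule is not stated here; a minimal sketch assuming simple majority voting (function name and tie-breaking rule are illustrative):

```python
from collections import Counter

def aggregate_user_label(tweet_predictions):
    """Majority vote over per-tweet binary predictions (1 = high, 0 = low)
    to obtain a single user-level label for one Big Five trait."""
    votes = Counter(tweet_predictions)
    # Ties are broken toward "high" here; the paper's actual rule may differ.
    return 1 if votes[1] >= votes[0] else 0

# Example: seven tweets by one user, each classified for one trait
print(aggregate_user_label([1, 0, 1, 1, 0, 1, 0]))  # prints 1
```

Averaging the classifier's per-tweet probabilities before thresholding is an equally plausible variant.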

We would like to express our deep gratitude to Fabio Celli, Ph.D., for providing the dataset and constructive advice on this research.

No potential conflict of interest relevant to this article was reported.

Joshua Evan Arijanto is an undergraduate at Bina Nusantara University (BINUS). His research interests include natural language processing and personality prediction.

E-mail: joshua.arijanto@binus.ac.id


Steven Geraldy is an undergraduate at Bina Nusantara University (BINUS). His research interests include natural language processing and personality prediction.

E-mail: steven.geraldy@binus.ac.id


Cyrena Tania is an undergraduate at Bina Nusantara University (BINUS). Her research interests include natural language processing and personality prediction.

E-mail: cyrena.tania@binus.ac.id


Derwin Suhartono is a faculty member of Bina Nusantara University, Indonesia. He received his Ph.D. degree in computer science from Universitas Indonesia in 2018. His research field is natural language processing, and he is currently doing research in argumentation mining and personality recognition. He is actively involved in the Indonesia Association of Computational Linguistics (INACL), a national scientific association in Indonesia, and holds professional memberships in ACM, INSTICC, and IACT. He also serves as a reviewer for several international conferences and journals.

E-mail: dsuhartono@binus.edu



Figure 1. Twitter worldwide monthly active users, 2010–2019 [3].

Figure 2. Experiment flowchart for each single scenario.

Figure 3. Cumulative frequencies of the WordPiece-tokenized concatenated tweets.

Figure 4. Cumulative frequencies of the WordPiece-tokenized individual tweets.

Figure 5. Working procedure of the proposed method.
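Figures 3 and 4 plot cumulative frequencies of WordPiece token counts, the usual way to pick a maximum sequence length for BERT (which is capped at 512 tokens). The sketch below is illustrative only: it uses a toy greedy longest-match-first tokenizer with a made-up vocabulary rather than BERT's real 30k-entry WordPiece vocabulary, plus a helper that reads off the shortest length covering a target fraction of samples.

```python
import math

# Toy vocabulary; continuation pieces are prefixed with "##" as in WordPiece
VOCAB = {"i", "love", "tweet", "##ing", "this", "is", "fun", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]  # word cannot be covered by the vocabulary
        pieces.append(cur)
        start = end
    return pieces

def token_length(text):
    return sum(len(wordpiece(w)) for w in text.lower().split())

def length_covering(texts, fraction=0.95):
    """Smallest token length L such that `fraction` of texts have <= L tokens."""
    lengths = sorted(token_length(t) for t in texts)
    idx = min(len(lengths) - 1, math.ceil(fraction * len(lengths)) - 1)
    return lengths[idx]

tweets = ["i love tweeting", "this is fun", "tweeting this is fun"]
print([token_length(t) for t in tweets])  # prints [4, 3, 5]
```

With individual tweets (Figure 4) the covering length is far below 512, which is one practical argument for splitting concatenated tweets.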

Table 1. Dataset label distribution.

Trait   Original (High/Low)   Final (High/Low)
AGR     456 / 529             401 / 390
CON     529 / 369             468 / 323
EXT     457 / 441             420 / 371
NEU     449 / 449             395 / 396
OPN     477 / 421             401 / 390

AGR, agreeableness; CON, conscientiousness; EXT, extraversion; NEU, neuroticism; OPN, openness.


Table 2. Experimented feature sets (scenarios) from the dataset.

Name    Description                                                                 Classifier
FS0     (previous research, multilingual) binary character N-gram, LIWC, metadata   SVM, CNN
FS1     (multilingual) TF-IDF character N-gram (N=3,4,5)
FS2     TF-IDF word N-gram (N=1,2,3)
FS3     TF-IDF word N-gram (N=2)
FS4     TF-IDF word N-gram (N=1,2,3), ANOVA
FS5     TF-IDF word N-gram (N=1,2,3), LIWC, ANOVA
FSW     Text                                                                        CNN, BERT, BERT-LR, BERT-SVM
FSWA    Text + data augmentation
FSWS    Text (split)
FSWSA   Text (split) + data augmentation
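Scenarios FS2–FS5 build on TF-IDF word N-gram features (N = 1, 2, 3). In practice a library implementation such as scikit-learn's TfidfVectorizer would be used; the self-contained sketch below spells out the computation with a basic tf × log(N/df) weighting (whitespace tokenization and the unsmoothed IDF form are simplifying assumptions):

```python
import math
from collections import Counter

def word_ngrams(text, n_values=(1, 2, 3)):
    """All word N-grams of a text for each N in n_values."""
    tokens = text.lower().split()
    grams = []
    for n in n_values:
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tfidf(corpus, n_values=(1, 2, 3)):
    """Return one {ngram: tf-idf weight} dict per document."""
    docs = [Counter(word_ngrams(t, n_values)) for t in corpus]
    df = Counter()
    for d in docs:
        df.update(d.keys())  # document frequency of each N-gram
    n_docs = len(docs)
    return [{g: tf * math.log(n_docs / df[g]) for g, tf in d.items()}
            for d in docs]

corpus = ["i love cats", "i love dogs"]
weights = tfidf(corpus)
# "cats" appears in one of two documents, so it gets a positive weight;
# "i love" appears in both, so its IDF (and weight) is zero.
```

The resulting sparse weight vectors are what a downstream SVM or CNN consumes, optionally after ANOVA feature selection as in FS4 and FS5.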

Table 3. Evaluation metric results of the fine-tuned BERT variants.

Scenario   A-AVG   F-AVG   P-AVG   R-AVG
FSW        0.55    0.55    0.56    0.59
FSWA       0.536   0.55    0.55    0.55
FSWS       0.60    0.59    0.60    0.646
FSWSA      0.58    0.55    0.57    0.57

A-AVG, F-AVG, P-AVG, and R-AVG denote accuracy, F1 score, precision, and recall, respectively, each averaged over all five trait scores.
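Since each trait is a separate binary high/low classification, the averaged scores in Table 3 come from computing accuracy, precision, recall, and F1 per trait and then averaging across the five traits. A plain-Python sketch of that bookkeeping (function and variable names are illustrative, not from the paper):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, F1, precision, and recall for one binary trait classifier."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"A": acc, "F": f1, "P": prec, "R": rec}

def trait_averages(per_trait_results):
    """Average each metric over the per-trait results (five for Big Five)."""
    n = len(per_trait_results)
    return {k: sum(r[k] for r in per_trait_results) / n
            for k in ("A", "F", "P", "R")}

# Toy example with two traits (the real system averages five)
results = [
    binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]),
    binary_metrics([0, 1, 0, 1], [0, 1, 1, 1]),
]
avg = trait_averages(results)
```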


Table 4. Accuracy and F1 score metric results of all the classifiers.

Model            A-AVG   F-AVG
Celli_SVM        0.591   -
Celli_AutoWEKA   0.67    -
SVM              0.624   0.576
CNN              0.533   0.58
BERT             0.60    0.59
BERT-LR          0.564   0.532
BERT-SVM         0.606   0.516

The average was calculated over all five trait scores.

