Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 310-316

Published online September 25, 2021

https://doi.org/10.5391/IJFIS.2021.21.3.310

© The Korean Institute of Intelligent Systems

Personality Prediction Based on Text Analytics Using Bidirectional Encoder Representations from Transformers from English Twitter Dataset

Joshua Evan Arijanto, Steven Geraldy, Cyrena Tania, and Derwin Suhartono

Department of Computer Science, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia

Correspondence to: Derwin Suhartono (dsuhartono@binus.edu)

Received: March 17, 2021; Revised: June 28, 2021; Accepted: July 8, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Personality traits can be inferred from a person’s behavioral patterns. One example is when writing posts on social media. Extracting information about individual personalities can yield enormous benefits for various applications such as recommendation systems, marketing, or hiring employees. The objective of this research is to build a personality prediction system that uses English texts from Twitter as a dataset to predict personality traits. This research uses the Big Five personality traits theory to analyze personality traits, which consist of openness, conscientiousness, extraversion, agreeableness, and neuroticism. Several classifiers were used in this research, such as support vector machine, convolutional neural network, and variants of bidirectional encoder representations from transformers (BERT). To improve the performance, we implemented several feature extraction techniques, such as N-gram, linguistic inquiry and word count (LIWC), word embedding, and data augmentation. The best results were obtained by fine-tuning the BERT model and using it as the main classifier of the personality prediction system. We conclude that the BERT performance could be improved by using individual tweets instead of concatenated ones.

Keywords: Personality prediction, Twitter, Big Five personality traits, BERT

1. Introduction

Billions of people around the world use social media to interact with others and share information. Through social media, people tend to express their feelings in words and characters. However, it is challenging to infer a person’s personality from the words and characters they post on social media.

Personality is a key factor that affects people’s behavior. Personality is defined as a characteristic set of behaviors, cognitions, and emotional patterns formed by biological and environmental factors [1]. When encountering a certain situation, a person’s reaction, expression, and emotion vary depending on their personality. Research shows that a person’s personality information could help management teams decide on a working team because personality can affect work performance [2].

Social media platforms are full of personal information, as they record people’s behaviors and interactions. Twitter is one of the largest social media platforms in the world, registering 330 million monthly active users as of 2019 [3] (Figure 1). As a microblogging platform, Twitter enables people to share information and communicate with each other in real time.

In this research, we aim to build a personality prediction system that uses English texts from Twitter as a dataset to predict a user’s personality. We used the dataset from previous research [4], which contains multilingual tweets from 900 users labeled with the Big Five personality traits. To build the text classification model, we propose fine-tuned bidirectional encoder representations from transformers (BERT) as the main model.

2. Related Work

There has been much research related to personality prediction using text datasets. Celli and Lepri [4] compared the Big Five traits and MBTI personality theories from a computing perspective using Twitter datasets. Using a support vector machine (SVM) and AutoWEKA as classifiers, their results show that prediction using MBTI classes achieved better accuracy than prediction using the Big Five traits. Although the Big Five traits are much more informative, the variability of the performance also depends on the algorithm used for the prediction. Kazameini et al. [5] presented a deep learning model that outperformed the state of the art on the stream-of-consciousness Essays dataset. Their best model was a Bagged-SVM built over BERT word-embedding ensembles, implemented using 10 SVM classifiers that predict in parallel. Another study, by Aung and Myint [6], predicted personality based on the content posted by Facebook users. Several classifiers were compared for predicting the Big Five personality traits, including SVM, convolutional neural networks (CNN), XGBoost, and multi-layer perceptron, evaluated using the mean absolute error (MAE). Moreover, linguistic features from different approaches were applied when building the prediction system. The best performance was achieved using the CNN model with the addition of linguistic inquiry and word count (LIWC) linguistic features and feature selection using the Pearson correlation coefficient. Rahman et al. [7] proposed a CNN approach implementing several linguistic features, such as the Mairesse baseline feature set, the NRC Emotion Lexicon (EmoLex), and Word2vec word embeddings. Activation functions such as sigmoid, tanh, and leaky rectified linear units (ReLUs) were compared using the proposed model, with the best performance achieved using leaky ReLU. Leonardi et al. [8] presented a multilingual personality trait estimator using the Transformer model, exploiting its ability to work at the sentence level.
The model was built using sentence embeddings from the transformer encoder model, which were used as the input of a neural network that performs continuous regression. The reported results show better scores than the previous state-of-the-art methods on the myPersonality dataset. Carducci et al. [9] predicted personality traits from tweets by building a model trained on Facebook text datasets. The model was built using an SVM classifier and FastText word embeddings, and then trained with hyperparameter tuning to find the best model configuration. BERT is an open-source language representation model developed by Google in 2018. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both the left and right contexts in all layers [10]. In this manner, BERT considers the same word in different sentences to have different meanings.

3. Dataset

The dataset used was obtained from a previous study [4]. It consists of 900 Twitter users with binary Big Five labels, y (high) or n (low), for each of the five personality traits (Table 1). Each user was given various Big Five personality tests, ranging from 10 to 44 items per test. Up to 40 tweets were collected from each user through Twitter advanced search queries. The dataset contains multilingual tweets in languages such as English, Italian, Spanish, and Dutch. For each user, the collected tweets were concatenated without a delimiter. Most users (794 of 900) had more than 17 sampled tweets; thus, in this study, we only used records with more than 17 sampled tweets. In addition, some records whose tweets consisted of only links and repeated words were discarded, resulting in a final dataset of 791 users.

4. Methodology

Figure 2 presents an overview of the experiment. The dataset was preprocessed into ten different scenarios. Each scenario was tailored to suit specific classifiers. To validate the model during training, stratified 10-fold cross-validation was used. The classifiers were then evaluated in terms of accuracy, precision, recall, and F1 score. To utilize BERT as a classifier in some scenarios, we translated the multilingual dataset into English.
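The evaluation protocol above can be sketched with scikit-learn. The features, labels, and the LinearSVC stand-in classifier below are toy placeholders, not the actual feature sets or models from the experiments.

```python
# Sketch of stratified 10-fold cross-validation with the four reported
# metrics. X and y are synthetic stand-ins for the real tweet features
# and binary Big Five trait labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))      # placeholder feature vectors
y = rng.integers(0, 2, size=200)    # placeholder high/low labels for one trait

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = LinearSVC().fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    scores.append((accuracy_score(y[test_idx], pred),
                   precision_score(y[test_idx], pred, zero_division=0),
                   recall_score(y[test_idx], pred, zero_division=0),
                   f1_score(y[test_idx], pred, zero_division=0)))

acc, prec, rec, f1 = np.mean(scores, axis=0)
print(f"A-AVG={acc:.2f} P-AVG={prec:.2f} R-AVG={rec:.2f} F-AVG={f1:.2f}")
```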

4.1 Preprocessing

Several preprocessing steps need to be executed before extracting features from the tweet data.

  • 1) Number of tweets filtration

We removed the text data of users with fewer than 17 sampled tweets from the dataset. Bot-generated tweet data consisting of the same recurring sentences and URLs were removed as well.

  • 2) Multilingual text translation

    The multilingual text in the dataset was translated into English for some experiment scenarios.

  • 3) Text cleaning

    We performed several text cleaning methods to ensure that the model learned appropriate data. Several steps were performed, including retweet removal, URL removal, removal of more than three consecutive characters, changing the username to “user” token, changing numbers to “number” token, and acronym elaboration (such as changing “I’ve” to “I have”).
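The cleaning steps above can be sketched with regular expressions. The exact patterns, their order, and the abbreviated contraction table are our assumptions for illustration, not the authors’ implementation.

```python
# Illustrative text cleaning: retweet and URL removal, "user"/"number"
# tokens, trimming runs of more than three characters, and contraction
# expansion.
import re

CONTRACTIONS = {"i've": "i have", "i'm": "i am", "don't": "do not"}  # excerpt

def clean_tweet(text: str) -> str:
    text = re.sub(r"^RT\s+", "", text)            # retweet marker removal
    text = re.sub(r"https?://\S+", "", text)      # URL removal
    text = re.sub(r"@\w+", "user", text)          # username -> "user" token
    text = re.sub(r"\d+", "number", text)         # digits -> "number" token
    text = re.sub(r"(.)\1{3,}", r"\1\1\1", text)  # cap repeats at three chars
    for short, full in CONTRACTIONS.items():      # acronym elaboration
        text = re.sub(short, full, text, flags=re.IGNORECASE)
    return text.strip()

cleaned = clean_tweet("RT @bob I've waited soooooo long!!! see https://t.co/x 42")
print(cleaned)
```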

4.2 Feature Extraction

We applied several feature extraction methods to find the best linguistic features that suit the data and model.

  • 1) N-gram

As stated in [11], texts in multiple languages are much more comparable through character N-grams. Thus, character N-grams were extracted from the multilingual dataset. For the English-translated dataset, word N-grams were extracted. All N-grams were weighted by their TF-IDF values.

  • 2) Linguistic inquiry and word count

In some scenarios, 88 LIWC features were extracted from the dataset. These features map words to psycholinguistic categories as ratios.

  • 3) Word embedding

We applied word embeddings in every deep learning approach; they convert text into meaningful vectors of numbers. For the CNN, 300-dimensional pre-trained global vectors (GloVe) [12] word embeddings were used. For the BERT model, we used WordPiece embeddings.

  • 4) Analysis of variance (ANOVA)

    In some scenarios, ANOVA was used as the univariate feature selection method. The top 2,000 best F-value features were selected.

  • 5) Text split

    The BERT classifier had a maximum input sequence length of 512. To handle this, in some scenarios, the concatenated tweets of each user were split into single tweets. The binary labels of the concatenated tweets were then spread to all the single tweets.

  • 6) Data augmentation

Easy data augmentation (EDA) was used to increase the amount of training data in some scenarios. EDA yields strong results for smaller datasets [13].

4.3 Classifier

Various classifiers were tested and compared. For the BERT classifier, we compared the fine-tuning approach and the feature-based approach [10]. The classifiers were SVM, CNN, BERT, BERT-Logistic Regression (BERT-LR), and BERT-SVM.

The WordPiece tokenizer converted the concatenated tweets into token sequences of various lengths. Of the 791 users’ concatenated tweets, 525 had fewer than 400 tokens (Figure 3). Thus, a maximum input sequence length of 400 was used for the BERT variants.
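The cutoff can be read off the cumulative token-length distribution. The per-user token counts below are synthetic placeholders standing in for the WordPiece lengths summarized in Figure 3.

```python
# Sketch: choose a maximum sequence length by checking how many users'
# concatenated tweets fall under candidate cutoffs.
import numpy as np

rng = np.random.default_rng(1)
token_counts = rng.integers(50, 700, size=791)  # placeholder per-user lengths

for cutoff in (200, 400, 512):
    covered = int((token_counts < cutoff).sum())
    print(f"under {cutoff} tokens: {covered} of 791 users")
```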

The text split method separates the concatenated tweets into individual tweets for each user. For scenarios that included the text split method, most WordPiece-tokenized tweets had around 80 tokens or fewer (Figure 4): 7,547 of the 7,690 total tweets had fewer than 80 tokens. Therefore, for BERT variants that used text splits, a maximum input sequence length of 80 was used.
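The text split and label-spreading steps can be sketched as follows; the user records and trait names here are illustrative stand-ins.

```python
# Sketch: split each user's tweets into single-tweet rows and copy the
# user's binary trait labels to every row.
users = [
    {"tweets": ["great day at work", "coffee with friends"],
     "labels": {"EXT": 1, "NEU": 0}},
    {"tweets": ["staying home again"],
     "labels": {"EXT": 0, "NEU": 1}},
]

split_rows = [{"text": tweet, **user["labels"]}
              for user in users
              for tweet in user["tweets"]]

print(len(split_rows))  # one row per single tweet
```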

In this research, we experimented with ten different scenarios to find the best method for use in the classifiers. The scenarios have different preprocessing methods. Table 2 describes the ten experimental scenarios with their corresponding classifiers.

The previous research conducted on the same dataset is described as scenario FS0 with the SVM and AutoWEKA classifiers in Table 2. Apart from FS0, this research experimented with different text preprocessing, feature extraction, classifier, and data manipulation methods (as stated in Section 4.2). We focused primarily on the BERT classifier, which used the different text preprocessing methods stated in Section 4.1 and the feature extraction method included in the BERT model.

5. Results

All predictive models were evaluated primarily using the F1 measure, but we also report accuracy, recall, and precision. Each predictive model was trained as five trait-specific models, one for each of the Big Five traits. The five trait-specific results were then averaged for the comparisons in Tables 3 and 4.

The fine-tuned BERT classifier had the highest F1 score among all classifiers (Table 4). Table 3 shows the average of each metric: accuracy (A-AVG), precision (P-AVG), recall (R-AVG), and F1 score (F-AVG). The bold value represents the best scenario for the fine-tuned BERT model.

Based on Table 3, the fine-tuned BERT achieved the highest score in all metrics in the FSWS scenario, which uses the text split and word embedding. Moreover, the text split method increased the performance of the fine-tuned BERT model. The performance of FSWS is higher than that of FSW, and FSWSA is higher than that of FSWA.

Although data augmentation could increase the size of the dataset, the results showed that it decreased the performance of the fine-tuned BERT: the performance of FSWA is lower than that of FSW, and that of FSWSA is lower than that of FSWS.

For each classifier, several scenarios were tested and compared. Table 4 lists the best performance achieved by each classifier. The bold values represent the highest values for each metric. In addition, Celli SVM and Celli AutoWEKA are the results of the previous research conducted on the same dataset. The average accuracy (A-AVG) was used as an evaluation metric for comparison with the previous research. The fine-tuned BERT, BERT-SVM, and SVM performed better than the Celli SVM model, but worse than the Celli AutoWEKA model.

The F1 score was used as the main evaluation metric to avoid the accuracy paradox. The highest average F1 score (F-AVG) was achieved by the fine-tuned BERT model. However, the feature-based BERT approaches, BERT-LR and BERT-SVM, performed poorly in comparison with fine-tuned BERT and SVM. This indicates that, for the current dataset, the fine-tuning approach is better than the feature-based approach for BERT. The fine-tuned BERT also surpassed the performance of the CNN with GloVe word embeddings and the SVM.
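The accuracy paradox can be seen in a small worked example; the macro-averaged F1 below is our assumption of how per-class scores are combined, not the paper’s stated formula.

```python
# A degenerate classifier that always predicts the majority class gets
# high accuracy on imbalanced labels but a poor macro-averaged F1.
def f1_for(cls, y_true, y_pred):
    tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / sum(p == cls for p in y_pred)
    recall = tp / sum(t == cls for t in y_true)
    return 2 * precision * recall / (precision + recall)

y_true = [1] * 9 + [0]   # 90% "high" labels
y_pred = [1] * 10        # always predict "high"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = (f1_for(1, y_true, y_pred) + f1_for(0, y_true, y_pred)) / 2
print(accuracy, round(macro_f1, 3))  # 0.9 vs 0.474
```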

The fine-tuned BERT model was chosen as the main model because it achieved the best F1 score.

5.1 Working Procedure

The final working procedure for predicting personality is based on the fine-tuned BERT model, as shown in Figure 5. The method is composed of four chronological main steps: text preprocessing, text tokenization, feature extraction, and classification. The text preprocessing step removes URLs and retweets, changes usernames and digits to tokens, and elaborates acronyms. The preprocessed tweets are then tokenized using the trained BERT tokenizer from the experiment. The maximum token length used was 80, based on the dataset on which the BERT was trained. Feature extraction is performed by the trained BERT, which produces WordPiece embeddings. These embeddings are then passed to the trained single-layer perceptron, which classifies each trait of each tweet as a binary value. To obtain a single binary class across all tweets, a voting mechanism chooses the dominant classified binary class. There are five fine-tuned BERT models, one for each of the Big Five traits. Finally, the five binary classes, one per trait, represent the five dimensions of personality expressed in the analyzed tweets.
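The voting step at the end of the procedure can be sketched as a simple majority vote over per-tweet predictions; the prediction values below are hypothetical.

```python
# Majority vote: the dominant binary class among a user's per-tweet
# predictions becomes the user-level class for that trait.
from collections import Counter

def majority_vote(per_tweet_preds):
    """Return the most common binary class among per-tweet predictions."""
    return Counter(per_tweet_preds).most_common(1)[0][0]

preds = [1, 0, 1, 1, 0]   # hypothetical per-tweet outputs of one trait model
print(majority_vote(preds))  # -> 1
```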

6. Conclusion

The aim of this study was to find the best approach to predicting personality based on the Big Five personality theory using Twitter as the data source. To do so, we compared several classifiers using preprocessed feature sets: SVM, CNN, and BERT. Based on the results, the deep learning approach performed better with the BERT model, which attained the highest performance score. In the BERT model, applying the text split method to the feature set increased the performance score, while applying data augmentation reduced it. For this dataset, using pre-trained BERT with a fine-tuning approach yields better results than the feature-based approach. The LIWC linguistic feature was also applied and showed better results than the feature sets that did not use it. The small size of the dataset is the main limitation of this study, as deep learning models require large amounts of data. In future work, it is advisable to implement a semi-supervised approach to obtain a larger dataset or to choose a larger dataset in advance.

Fig. 1.

Twitter worldwide monthly active users, 2010–2019 [3].


Fig. 2.

Experiment flowchart for each single scenario.


Fig. 3.

Cumulative frequencies of the WordPiece-tokenized concatenated tweets.


Fig. 4.

Cumulative frequencies of the WordPiece-tokenized single tweets.


Fig. 5.

Working procedure of the proposed method.


Table 1. Dataset label distribution.

Trait   Original        Final
        High    Low     High    Low
AGR     456     529     401     390
CON     529     369     468     323
EXT     457     441     420     371
NEU     449     449     395     396
OPN     477     421     401     390

AGR, agreeableness; CON, conscientiousness; EXT, extraversion; NEU, neuroticism; OPN, openness.


Table 2. Experimented feature sets (scenarios) from the dataset.

Name    Description                                                                Classifier
FS0     (previous research, multilingual) Binary character N-gram, LIWC, metadata  SVM, CNN
FS1     (multilingual) TF-IDF character N-gram (N=3,4,5)
FS2     TF-IDF word N-gram (N=1,2,3)
FS3     TF-IDF word N-gram (N=2)
FS4     TF-IDF word N-gram (N=1,2,3), ANOVA
FS5     TF-IDF word N-gram (N=1,2,3), LIWC, ANOVA
FSW     Text                                                                       CNN, BERT, BERT-LR, BERT-SVM
FSWA    Text + data augmentation
FSWS    Text (split)
FSWSA   Text (split) + data augmentation

Table 3. Evaluation metric results of the fine-tuned BERT variants.

Scenario  A-AVG   F-AVG   P-AVG   R-AVG
FSW       0.55    0.55    0.56    0.59
FSWA      0.536   0.55    0.55    0.55
FSWS      0.60    0.59    0.60    0.646
FSWSA     0.58    0.55    0.57    0.57

The average was calculated over all five trait scores.


Table 4. Accuracy and F1 score metric results of all the classifiers.

Model           A-AVG   F-AVG
Celli_SVM       0.591   -
Celli_AutoWEKA  0.67    -
SVM             0.624   0.576
CNN             0.533   0.58
BERT            0.60    0.59
BERT-LR         0.564   0.532
BERT-SVM        0.606   0.516

The average was calculated over all five trait scores.


  1. Corr, PJ, and Matthews, G (2009). The Cambridge Handbook of Personality Psychology. Cambridge, UK: Cambridge University Press.
  2. Lambiotte, R, and Kosinski, M (2014). Tracking the digital footprints of personality. Proceedings of the IEEE. 102, 1934-1939. https://doi.org/10.1109/JPROC.2014.2359054
  3. Statista (2021). Number of monthly active Twitter users worldwide from 1st quarter 2010 to 1st quarter 2019. Available: https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
  4. Celli, F, and Lepri, B (2018). Is Big Five better than MBTI? A personality computing challenge using Twitter data. Proceedings of the 5th Italian Conference on Computational Linguistics (CLiC-it), Torino, Italy.
  5. Kazameini, A, Fatehi, S, Mehta, Y, Eetemadi, S, and Cambria, E (2020). Personality trait detection using bagged SVM over BERT word embedding ensembles. Proceedings of the 4th Widening Natural Language Processing Workshop, Seattle, WA.
  6. Aung, ZMM, and Myint, PH (2019). Personality prediction based on content of Facebook users: a literature review. Proceedings of the 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Toyama, Japan, pp. 34-38. https://doi.org/10.1109/SNPD.2019.8935692
  7. Rahman, MA, Al Faisal, A, Khanam, T, Amjad, M, and Siddik, MS (2019). Personality detection from text using convolutional neural network. Proceedings of the 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, pp. 1-6. https://doi.org/10.1109/ICASERT.2019.8934548
  8. Leonardi, S, Monti, D, Rizzo, G, and Morisio, M (2020). Multilingual transformer-based personality traits estimation. Information. 11, article no. 179.
  9. Carducci, G, Rizzo, G, Monti, D, Palumbo, E, and Morisio, M (2018). TwitPersonality: computing personality traits from tweets using word embeddings and supervised learning. Information. 9, article no. 127.
  10. Devlin, J, Chang, MW, Lee, K, and Toutanova, K (2018). BERT: pre-training of deep bidirectional transformers for language understanding. Available: https://arxiv.org/abs/1810.04805
  11. Lecluze, C, Rigouste, L, Giguet, E, and Lucas, N (2013). Which granularity to bootstrap a multilingual method of document alignment: character N-grams or word N-grams?. Procedia - Social and Behavioral Sciences. 95, 473-481. https://doi.org/10.1016/j.sbspro.2013.10.671
  12. Pennington, J, Socher, R, and Manning, CD (2014). GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1532-1543.
  13. Wei, J, and Zou, K (2019). EDA: easy data augmentation techniques for boosting performance on text classification tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 6381-6387.

Article

Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 310-316

Published online September 25, 2021 https://doi.org/10.5391/IJFIS.2021.21.3.310

Copyright © The Korean Institute of Intelligent Systems.

Personality Prediction Based on Text Analytics Using Bidirectional Encoder Representations from Transformers from English Twitter Dataset

Joshua Evan Arijanto, Steven Geraldy, Cyrena Tania, and Derwin Suhartono

Department of Computer Science, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia

Correspondence to:Derwin Suhartono (dsuhartono@binus.edu)

Received: March 17, 2021; Revised: June 28, 2021; Accepted: July 8, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Personality traits can be inferred from a person’s behavioral patterns. One example is when writing posts on social media. Extracting information about individual personalities can yield enormous benefits for various applications such as recommendation systems, marketing, or hiring employees. The objective of this research is to build a personality prediction system that uses English texts from Twitter as a dataset to predict personality traits. This research uses the Big Five personality traits theory to analyze personality traits, which consist of openness, conscientiousness, extraversion, agreeableness, and neuroticism. Several classifiers were used in this research, such as support vector machine, convolutional neural network, and variants of bidirectional encoder representations from transformers (BERT). To improve the performance, we implemented several feature extraction techniques, such as N-gram, linguistic inquiry and word count (LIWC), word embedding, and data augmentation. The best results were obtained by fine-tuning the BERT model and using it as the main classifier of the personality prediction system. We conclude that the BERT performance could be improved by using individual tweets instead of concatenated ones.

Keywords: Personality prediction, Twitter, Big Five personality traits, BERT

1. Introduction

Billions of people around the world use social media to interact with others and share information. Through social media, people tend to express their feelings in words and characters. However, it is challenging to infer a person’s personality through words and characters on social media.

Personality is a key factor that affects people’s behavior. Personality is defined as a characteristic set of behaviors, cognitions, and emotional patterns formed by biological and environmental factors [1]. When encountering a certain situation, a person’s reaction, expression, and emotion vary depending on their personality. Research shows that a person’s personality information could help management teams decide on a working team because personality can affect work performance [2].

Social media platforms are full of personal information, as it records people’s behaviors and interactions. Twitter is one of the largest social media platforms in the world, registering 330 million active users as of 2019 [3] (Figure 1). As a microblogging platform, Twitter enables people to share information and communicate with each other in real time.

In this research, we aim to build a personality prediction system that uses English texts from Twitter as a dataset to predict a user’s personality. We used the dataset from a previous research [4], which contains multilingual tweets from 900 users labeled using Big Five personality traits. To build a text classification model, we propose a fine-tuned bidirectional encoding representation from transformers (BERT) as the main model.

2. Related Work

There has been much research related to personality prediction using text datasets. Celli and Lepri [4] compared Big Five Traits and MBTI personality theory from a computing perspective using Twitter datasets. Using a support vector machine (SVM) and AutoWEKA as classifiers, the results show that prediction using MBTI classes achieved better accuracy than the one using Big Five Traits. Although Big Five Traits are much more informative, the variability of the performance also depends on the algorithm used for the prediction. Kazameini et al. [5] presented a deep learning model that outperformed the state-of-the-art on the stream of consciousness Essays dataset. The best model was built using a Bagged-SVM model over BERT word-embedding ensembles. The Bagged-SVM model was implemented using 10 SVM classifiers to predict in parallel. Another study by Aung and Mint [6] predicted personality based on the content of Facebook users. Several classifiers were compared with predict Big Five personality traits, including SVM, convolutional neural networks (CNN), XGBoost, multi-layer perceptron, and mean absolute error (MAE). Moreover, the linguistic features of the different approaches were also applied when building the prediction system. The best performance was achieved using the CNN model with the addition of linguistic inquiry and word count (LIWC) linguistic features and feature selection using the Pearson correlation coefficient. Rahman et al. [7] proposed a CNN approach by implementing several linguistic features, such as Mairesse baseline feature set, NRC Emotion Lexicon (EmoLex), andWord2vec word embedding. Activation functions such as sigmoid, tanh, and leaky rectified linear units (ReLUs) were compared using the proposed model with the best performance achieved using leaky ReLU. Leonardi et al. [8] presented a multilingual personality trait estimator using the Transformers model by exploiting its capabilities of working at the sentence-level model. 
The model was built using sentence embeddings from the transformer encoder model and used as the input of a neural network model that performs a continuous regression. The reported result shows a better score than the previous state-of-the-art methods using the myPersonality dataset. An approach by Carducci et al. [9] to predict personality traits from tweets by building a model trained using Facebook text datasets. The model was built using an SVM classifier and Fast-Text word embedding, and then trained with hyperparameter tuning to find the best model configuration. BERT are an open-source language representation model developed by Google in 2018. BERT is designed to pre-train deep bidirectional representations from the unlabeled text by jointly conditioning on both the left and right contexts in all layers [10]. In this manner, BERT considers the same word within a different sentence to have a different meaning.

3. Dataset

The dataset used was obtained from previous studies [4]. It consists of 900 Twitter users with binary Big Five labels as y (high) or n (low) for each of the five personality traits (Table 1). Each user is given various Big Five personality tests, ranging from 10 to 44 items for each test. For a maximum of 40 tweets, data from each user were collected through Twitter advanced search queries. The dataset contains multilingual tweets in languages such as English, Italian, Spanish, and Dutch. For each user, the collected tweets were concatenated without a delimiter. The majority of sampled tweets were more than 17, with 794 of 900 users. Thus, in this study, we only used records with more than 17 sampled tweets. In addition, some records with tweets consisting of only links and repeated words were discarded, resulting in final datasets consisting of only 791 users.

4. Methodology

Figure 2 presents an overview of the experiment. The dataset was preprocessed into ten different scenarios. Each scenario was tailored to suit specific classifiers. To validate the model during training, stratified 10-fold cross-validation was used. The classifiers were then evaluated in terms of accuracy, precision, recall, and F1 score. To utilize BERT as a classifier in some scenarios, we translated the multilingual dataset into English.

4.1 Preprocessing

Several preprocessing steps need to execute before extracting features from the tweets data.

  • 1) Number of tweets filtration

    We removed text data from less than 17 sampled tweets from the dataset. Bot-generated tweet data that consist of the same recurrence sentence and URL are removed as well.

  • 2) Multilingual text translation

    The multilingual text in the dataset was translated into English for some experiment scenarios.

  • 3) Text cleaning

    We performed several text cleaning methods to ensure that the model learned appropriate data. Several steps were performed, including retweet removal, URL removal, removal of more than three consecutive characters, changing the username to “user” token, changing numbers to “number” token, and acronym elaboration (such as changing “I’ve” to “I have”).

4.2 Feature Extraction

We applied several feature extraction methods to find the best linguistic features that suit the data and model.

  • 1) N-gram

    As stated in [10], multiple languages are much more comparable through character N-grams. Thus, character N-grams were extracted from the multilingual dataset. For the English-translated dataset, word N-grams were extracted. All N-grams were associated with their TF-IDF values.

  • 2) Linguistic inquiry and word count

    In some scenarios, 88 features of the LIWC were extracted from the dataset. These features map words to the psycholinguistic categories as a ratio.

  • 3) Word embedding

    We apply word embedding to every deep learning approach. It is used to convert text into a meaningful vector of numbers. For CNN, 300 dimensions global vectors (GloVe) [11] pre-trained word embeddings. For the BERT model, we used WordPiece word embedding.

  • 4) Analysis of variance (ANOVA)

    In some scenarios, ANOVA was used as the univariate feature selection method. The top 2,000 best F-value features were selected.

  • 5) Text split

    The BERT classifier had a maximum input sequence length of 512. To handle this, in some scenarios, the concatenated tweets of each user were split into single tweets. The binary labels of the concatenated tweets were then spread to all the single tweets.

  • 6) Data augmentation

    Easy data augmentation (EDA) was used to increase the number of training data in some scenarios. EDA yields strong results for smaller datasets [12].
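Two of EDA's four operations (random swap and random deletion) can be sketched without external resources; the other two (synonym replacement, random insertion) require a synonym lexicon such as WordNet and are omitted here:

```python
import random

def random_swap(words, n=1):
    """Randomly swap two word positions n times (one of the four EDA operations)."""
    words = words[:]
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word with probability p; keep at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

random.seed(42)
tweet = "just finished a long run in the park".split()
augmented = [" ".join(random_swap(tweet)), " ".join(random_deletion(tweet))]
print(augmented)
```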

4.3 Classifier

Various classifiers were tested and compared. For the BERT classifier, we compared the fine-tuning approach and the feature-based approach [13]. The classifiers were SVM, CNN, BERT, BERT-Logistic Regression (BERT-LR), and BERT-SVM.

The WordPiece embedding tokenized the concatenated tweets into token sequences of various lengths. Of the 791 concatenated tweets, 525 had fewer than 400 tokens (Figure 3). Thus, a maximum input sequence length of 400 was used for the BERT variants.

The text split method separates the concatenated tweets into the individual tweets of each user. For scenarios that included the text split method, most of the WordPiece-tokenized tweets had around 80 tokens (Figure 4); 7,547 of the 7,690 total tweets had fewer than 80 tokens. Therefore, a maximum input sequence length of 80 was used for the BERT variants with text splits.
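The two length choices above (400 for concatenated tweets, 80 for single tweets) amount to picking the smallest cutoff that covers nearly all examples. A sketch of that heuristic, with an invented toy distribution; the coverage threshold and candidate lengths are assumptions:

```python
def choose_max_length(token_lengths, coverage=0.98, candidates=(80, 128, 256, 400, 512)):
    """Pick the smallest candidate length covering at least `coverage` of the examples."""
    n = len(token_lengths)
    for cand in candidates:
        covered = sum(1 for length in token_lengths if length <= cand) / n
        if covered >= coverage:
            return cand
    return candidates[-1]  # fall back to the BERT hard limit

# Toy distribution mimicking Figure 4: most tweets well under 80 tokens.
lengths = [20] * 900 + [60] * 80 + [150] * 20
print(choose_max_length(lengths))  # -> 80
```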

In this research, we experimented with ten different scenarios to find the best method for use in the classifiers. The scenarios differ in their preprocessing methods; Table 2 describes the ten experimental scenarios with their corresponding classifiers.

The previous research conducted on the same dataset corresponds to scenario FS0 in Table 2, with the SVM and AutoWEKA classifiers. Apart from FS0, this research experimented with different text preprocessing, feature extraction, classifier, and data manipulation methods (as stated in Section 4.2). We focused primarily on the BERT classifier, using the text preprocessing methods stated in Section 4.1 and the feature extraction built into the BERT model.

5. Results

All predictive models were evaluated primarily using the F1 score, although we also report accuracy, recall, and precision. Each predictive model was trained as five trait-specific models, one for each of the Big Five traits. The five trait-specific results were then averaged for comparison in Tables 3 and 4.

The fine-tuned BERT classifier had the highest F1 score among all classifiers (Table 4). Table 3 shows the average of each metric: accuracy (A-AVG), precision (P-AVG), recall (R-AVG), and F1 score (F-AVG). The bold value represents the best scenario for the fine-tuned BERT model.

Based on Table 3, the fine-tuned BERT achieved the highest score on all metrics in the FSWS scenario, which uses the text split method and word embeddings. Moreover, the text split method increased the performance of the fine-tuned BERT model: FSWS outperformed FSW, and FSWSA outperformed FSWA.

Although data augmentation increases the size of the training set, the results showed that it decreased the performance of the fine-tuned BERT: FSWA performed worse than FSW, and FSWSA worse than FSWS.

For each classifier, several scenarios were tested and compared. Table 4 lists the best performance achieved by each classifier; the bold values represent the highest values for each metric. In addition, Celli_SVM and Celli_AutoWEKA are the performances reported by previous research on the same dataset. The average accuracy (A-AVG) was used as an evaluation metric for comparison with that research. The fine-tuned BERT, BERT-SVM, and SVM performed better than the Celli_SVM model, but worse than the Celli_AutoWEKA model.

The F1 score was used as the main evaluation metric to avoid the accuracy paradox. The highest average F1 score (F-AVG) was achieved by the fine-tuned BERT model. However, the feature-based BERT variants, BERT-LR and BERT-SVM, performed poorly compared with fine-tuned BERT and SVM. This indicates that, for this dataset, fine-tuning BERT is better than the feature-based approach. The fine-tuned BERT also surpassed the CNN with GloVe word embeddings and the SVM.

The fine-tuned BERT model was chosen as the main model because it achieved the best F1 score.

5.1 Working Procedure

The final working procedure for predicting personality is based on the fine-tuned BERT model, as shown in Figure 5. The method is composed of four chronological main steps: text preprocessing, text tokenization, feature extraction, and classification. The text preprocessing step removes URLs and retweets, changes usernames and digits to tokens, and expands acronyms. The preprocessed tweets are then tokenized using the BERT tokenizer trained in the experiment, with a maximum token length of 80, based on the dataset on which the BERT was trained. Feature extraction is performed by the trained BERT, which produces WordPiece embeddings. These embeddings are then passed to a trained single-layer perceptron that classifies each trait of each tweet as a binary value. To obtain a single binary class over all of a user’s tweets, a voting mechanism chooses the dominant predicted class. There are five fine-tuned BERT models, one for each of the Big Five traits; the resulting five binary classes represent the five dimensions of personality expressed in the analyzed tweets.
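The voting step at the end of this procedure can be sketched as follows; the tie-breaking rule is an assumption, since the paper does not specify one:

```python
from collections import Counter

def vote(per_tweet_predictions):
    """Majority vote over per-tweet binary predictions for one trait.
    Ties are broken toward the positive class here (an assumption)."""
    counts = Counter(per_tweet_predictions)
    return 1 if counts[1] >= counts[0] else 0

def predict_user(per_trait_predictions):
    """Aggregate per-tweet predictions into one binary class per Big Five trait."""
    return {trait: vote(preds) for trait, preds in per_trait_predictions.items()}

# Hypothetical per-tweet outputs of the five trait-specific models for one user.
preds = {"OPN": [1, 1, 0], "CON": [0, 0, 1], "EXT": [1, 0, 1],
         "AGR": [1, 1, 1], "NEU": [0, 0, 0]}
print(predict_user(preds))
# -> {'OPN': 1, 'CON': 0, 'EXT': 1, 'AGR': 1, 'NEU': 0}
```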

6. Conclusion

The aim of this study was to find the best approach to predicting personality based on the Big Five personality theory, using Twitter as the data source. To do so, we compared several classifiers using preprocessed feature sets. The classifiers used in this research were SVM, CNN, and BERT. Based on the results, the deep learning approach performed best with the BERT model, which attained the highest performance score. For the BERT model, applying the text split method to the feature set increased the performance score, whereas applying data augmentation reduced it. For this dataset, using pre-trained BERT with a fine-tuning approach yields better results than the feature-based approach. The LIWC linguistic features were also applied and produced better results than the feature sets without them. The limited size of the dataset is the main limitation of this study, as deep learning models require large amounts of data. In future work, it is advisable to implement a semi-supervised approach to obtain a larger dataset, or to choose a larger dataset in advance.

Figure 1. Twitter worldwide monthly active users, 2010–2019 [3].

The International Journal of Fuzzy Logic and Intelligent Systems 2021; 21: 310-316https://doi.org/10.5391/IJFIS.2021.21.3.310

Figure 2. Experiment flowchart for each single scenario.


Figure 3. Cumulative frequencies of the WordPiece-tokenized concatenated tweets.


Figure 4. Cumulative frequencies of the WordPiece-tokenized single tweets.


Figure 5. Working procedure of the proposed method.


Table 1. Dataset label distribution.

Trait    Original High    Original Low    Final High    Final Low
AGR      456              529             401           390
CON      529              369             468           323
EXT      457              441             420           371
NEU      449              449             395           396
OPN      477              421             401           390

AGR, agreeableness; CON, conscientiousness; EXT, extraversion; NEU, neuroticism; OPN, openness.


Table 2. Experimented feature sets (scenarios) from the dataset.

Name     Description                                                              Classifier
FS0      (previous research, multilingual) binary character N-gram,               SVM, CNN
         LIWC, metadata
FS1      (multilingual) TF-IDF character N-gram (N = 3, 4, 5)                     SVM, CNN
FS2      TF-IDF word N-gram (N = 1, 2, 3)                                         SVM, CNN
FS3      TF-IDF word N-gram (N = 2)                                               SVM, CNN
FS4      TF-IDF word N-gram (N = 1, 2, 3), ANOVA                                  SVM, CNN
FS5      TF-IDF word N-gram (N = 1, 2, 3), LIWC, ANOVA                            SVM, CNN
FSW      Text                                                                     CNN, BERT, BERT-LR, BERT-SVM
FSWA     Text + data augmentation                                                 CNN, BERT, BERT-LR, BERT-SVM
FSWS     Text (split)                                                             CNN, BERT, BERT-LR, BERT-SVM
FSWSA    Text (split) + data augmentation                                         CNN, BERT, BERT-LR, BERT-SVM

Table 3. Evaluation metric results of the fine-tuned BERT variants.

Scenario    A-AVG    F-AVG    P-AVG    R-AVG
FSW         0.55     0.55     0.56     0.59
FSWA        0.536    0.55     0.55     0.55
FSWS        0.60     0.59     0.60     0.646
FSWSA       0.58     0.55     0.57     0.57

The average was calculated over all five trait scores.


Table 4. Accuracy and F1 score metric results of all the classifiers.

Model             A-AVG    F-AVG
Celli_SVM         0.591    -
Celli_AutoWEKA    0.67     -
SVM               0.624    0.576
CNN               0.533    0.58
BERT              0.60     0.59
BERT-LR           0.564    0.532
BERT-SVM          0.606    0.516

The average was calculated over all five trait scores.


References

  1. Corr, PJ, and Matthews, G (2009). The Cambridge Handbook of Personality Psychology. Cambridge, UK: Cambridge University Press.
  2. Lambiotte, R, and Kosinski, M (2014). Tracking the digital footprints of personality. Proceedings of the IEEE. 102, 1934-1939. https://doi.org/10.1109/JPROC.2014.2359054
  3. Statista (2021). Number of monthly active Twitter users worldwide from 1st quarter 2010 to 1st quarter 2019. Available: https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
  4. Celli, F, and Lepri, B (2018). Is big five better than MBTI? A personality computing challenge using Twitter data. Proceedings of the 5th Italian Conference on Computational Linguistics (CLiC-it), Torino, Italy.
  5. Kazameini, A, Fatehi, S, Mehta, Y, Eetemadi, S, and Cambria, E (2020). Personality trait detection using bagged SVM over BERT word embedding ensembles. Proceedings of the 4th Widening Natural Language Processing Workshop, Seattle, WA.
  6. Aung, ZMM, and Myint, PH (2019). Personality prediction based on content of Facebook users: a literature review. Proceedings of the 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Toyama, Japan, pp. 34-38. https://doi.org/10.1109/SNPD.2019.8935692
  7. Rahman, MA, Al Faisal, A, Khanam, T, Amjad, M, and Siddik, MS (2019). Personality detection from text using convolutional neural network. Proceedings of the 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, pp. 1-6. https://doi.org/10.1109/ICASERT.2019.8934548
  8. Leonardi, S, Monti, D, Rizzo, G, and Morisio, M (2020). Multilingual transformer-based personality traits estimation. Information. 11, article no. 179.
  9. Carducci, G, Rizzo, G, Monti, D, Palumbo, E, and Morisio, M (2018). TwitPersonality: computing personality traits from tweets using word embeddings and supervised learning. Information. 9, article no. 127.
  10. Lecluze, C, Rigouste, L, Giguet, E, and Lucas, N (2013). Which granularity to bootstrap a multilingual method of document alignment: character N-grams or word N-grams?. Procedia - Social and Behavioral Sciences. 95, 473-481. https://doi.org/10.1016/j.sbspro.2013.10.671
  11. Pennington, J, Socher, R, and Manning, CD (2014). GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1532-1543.
  12. Wei, J, and Zou, K (2019). EDA: easy data augmentation techniques for boosting performance on text classification tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 6381-6387.
  13. Devlin, J, Chang, MW, Lee, K, and Toutanova, K (2018). BERT: pre-training of deep bidirectional transformers for language understanding. Available: https://arxiv.org/abs/1810.04805
