Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2019; 19(4): 283-289

Published online December 25, 2019

https://doi.org/10.5391/IJFIS.2019.19.4.283

© The Korean Institute of Intelligent Systems

Identifying Personality Traits for Indonesian User from Twitter Dataset

Nicholaus Hendrik Jeremy, Cristian Prasetyo, and Derwin Suhartono

Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, 11480, Indonesia

Correspondence to: Derwin Suhartono (dsuhartono@binus.edu)

Received: August 13, 2019; Revised: December 18, 2019; Accepted: December 21, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Social media lets users convey their actual selves and share their life experiences in numerous ways. This behavior in turn reflects the user's personality. In this paper, we experiment with automatically predicting a Twitter user's personality according to the Big Five personality traits, focusing on Indonesian users. In addition to word n-grams, Twitter metadata is used in various combinations to determine which features should be used for prediction. We also search for the optimal setting in terms of the number of word n-grams, the classifier, and the Twitter metadata. Our best experiment yields an F-measure of 0.7482. We conclude that, across all scenarios, Twitter metadata is the least impactful feature, while word n-grams have the greatest impact.

Keywords: Social media, Personality prediction, Twitter, Big five
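
The feature construction described in the abstract (a fixed number of word n-grams, optionally extended with Twitter metadata) can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: n-grams are selected by raw frequency, tokenization is a plain whitespace split, and the metadata values are appended unscaled. The function names, the toy texts, and the two metadata values are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int) -> List[NGram]:
    """Lowercase, split on whitespace, and return all word n-grams of size n."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text: str, max_n: int) -> Counter:
    """Count every word n-gram of size 1..max_n in one user's text."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


def build_vocabulary(texts: Sequence[str], max_n: int, top_k: int) -> List[NGram]:
    """Keep the top_k most frequent n-grams over all users.

    Selecting by raw frequency is an assumption made for this sketch.
    """
    totals: Counter = Counter()
    for text in texts:
        totals.update(count_ngrams(text, max_n))
    return [gram for gram, _ in totals.most_common(top_k)]


def featurize(
    text: str,
    vocabulary: Sequence[NGram],
    max_n: int,
    metadata: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return n-gram counts in vocabulary order, followed by any metadata."""
    counts = count_ngrams(text, max_n)
    features = [float(counts[gram]) for gram in vocabulary]
    if metadata is not None:
        # Hypothetical metadata, e.g. number of followers and of following.
        features.extend(float(value) for value in metadata)
    return features


# Toy example: two invented users, unigrams and bigrams, three n-grams kept.
user_texts = ["saya suka kopi", "saya suka teh"]
vocabulary = build_vocabulary(user_texts, max_n=2, top_k=3)
features = featurize(user_texts[0], vocabulary, max_n=2, metadata=[120, 80])
```

Here `top_k` plays the role of the n-gram amount that the experiments vary, and passing `metadata=None` corresponds to the setting in which no metadata is appended.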

Nicholaus Hendrik Jeremy is an undergraduate at Bina Nusantara University (BINUS). From March 2019 until February 2020, he is a research assistant at the BINUS School of Computer Science. His research interests include natural language processing and personality prediction.

E-mail: nicholaus.jeremy@binus.ac.id


Cristian Prasetyo is a Computer Science student at Bina Nusantara University (BINUS). From March 2019 until February 2020, he is a research assistant at the BINUS School of Computer Science. His research interests include artificial intelligence, natural language processing, and linguistics.

E-mail: nicholaushendrik@gmail.com


Derwin Suhartono is a faculty member of Bina Nusantara University, Indonesia. He received his PhD degree in computer science from Universitas Indonesia in 2018. His research field is natural language processing. Recently, he has been conducting research in argumentation mining and personality recognition. He is actively involved in the Indonesia Association of Computational Linguistics (INACL), a national scientific association in Indonesia. He holds professional memberships in ACM, INSTICC, and IACT. He also serves as a reviewer for several international conferences and journals.

E-mail: dsuhartono@binus.edu



Figure 1. Number of social media users compared with the total population of Indonesia, in millions, according to Hootsuite and We Are Social [24-28]. All data were retrieved in January of each year.

Figure 2. Flow of the experiment.

Table 1. Big Five personality traits and their descriptions for high and low scores.

| Personality | High | Low |
|---|---|---|
| OPN | Adventurous, abstract | Prefers regularity, conventional |
| CON | Disciplined, reliable, strict | Disorganized, impulsive, laid back |
| EXT | Friendly, joyous | Solitary, independent |
| AGR | Cooperative, honest | Sceptical, suspicious |
| NEU | Self-conscious, prone to negativity | Contained, calm |

Table 2. Class distribution.

| Personality | High | Low |
|---|---|---|
| AGR | 278 (54.7%) | 230 (45.3%) |
| CON | 131 (25.8%) | 376 (74.2%) |
| EXT | 363 (71.5%) | 145 (28.5%) |
| NEU | 221 (43.5%) | 287 (56.5%) |
| OPN | 272 (53.5%) | 236 (45.5%) |

Table 3. Twitter metadata used in this work compared with related works.

| Our metadata | List of metadata from [4] | List of metadata from [3] | List of metadata from [17] |
|---|---|---|---|
| Number of followers | Followers tweets ratio | Number of tweets | Number of followers |
| Number of following | Favorite tweets to tweets ratio | Number of followers | Number of following |
| Number of tweets | Hashtag to words ratio | Total of tweets and retweets | Number of mentions (b) |
| Number of favorites | Retweets to retweeted ratio | Number of favorites | Number of replies (b) |
| Number of retweets | Listed count (a) | User's gender (a) | Number of hashtags (b) |
| Number of retweeted | Link color (a) | Listed count (a) | Number of URLs (b) |
| Number of mentions | Text color (a) | - | Average words per tweet |
| Number of quotes | Border color (a) | - | Density of social network |
| Number of replies | Background color (a) | - | - |
| Number of hashtags | Default profile picture (a) | - | - |

(a) Color hex code, listed count, profile picture, and user gender are not stored in the dataset. The user may have changed any of them, or the account may have been suspended. Revisiting the accounts would not only require more resources but would also require revising the manual personality labelling done by the expert, as the users' personalities may have changed [18, 19].

(b) These metadata are used both as a sum and as an average per tweet [17].


Table 4. Results for different numbers of word n-grams, with and without appended metadata.

| Dataset | P-AVG | R-AVG | F-AVG |
|---|---|---|---|
| 1000 | 0.7684 | 0.725 | 0.718 |
| 1500 | 0.7728 | 0.736 | 0.7306 |
| 2000 | 0.7808 | 0.7424 | 0.7372 |
| 2500 | 0.7874 | 0.749 | 0.7436 |
| 3000 | 0.7876 | 0.749 | 0.7428 |
| 3500 | 0.7932 | 0.7542 | 0.7468 |
| 4000 | 0.7932 | 0.7526 | 0.7452 |
| 4500 | 0.7926 | 0.7506 | 0.7426 |
| 5000 | 0.7912 | 0.7506 | 0.7426 |
| 1000 + metadata | 0.745 | 0.7164 | 0.7152 |
| 1500 + metadata | 0.7458 | 0.7216 | 0.7192 |
| 2000 + metadata | 0.7536 | 0.729 | 0.727 |
| 2500 + metadata | 0.7738 | 0.7454 | 0.7414 |
| 3000 + metadata | 0.7712 | 0.7446 | 0.7408 |
| 3500 + metadata | 0.782 | 0.7524 | 0.7482 |
| 4000 + metadata | 0.7814 | 0.7478 | 0.7422 |
| 4500 + metadata | 0.783 | 0.7466 | 0.7399 |
| 5000 + metadata | 0.778 | 0.7404 | 0.7332 |

Table 5. Results for different combinations of metadata.

| Dataset | P-AVG | R-AVG | F-AVG |
|---|---|---|---|
| Not appended | 0.7684 | 0.725 | 0.718 |
| Appended with our own list of metadata | 0.745 | 0.7164 | 0.7152 |
| Appended based on [4] | 0.7642 | 0.73 | 0.722 |
| Appended based on [17] | 0.7526 | 0.7234 | 0.722 |
| Appended based on [3] | 0.7534 | 0.7206 | 0.7178 |

Table 6. Results for different classifiers.

| Classifier | P-AVG | R-AVG | F-AVG |
|---|---|---|---|
| J48 | 0.7002 | 0.7032 | 0.6994 |
| k-NN | 0.6996 | 0.7004 | 0.6996 |
| Naïve Bayes | 0.745 | 0.7164 | 0.7152 |
| Random Forest | 0.744 | 0.7484 | 0.744 |
| SMO | 0.7662 | 0.747 | 0.7218 |
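
The P-AVG, R-AVG, and F-AVG columns in the result tables can be read with the help of a small sketch. It assumes that each value is the arithmetic mean of a per-trait score over the five traits; this page does not state how the per-trait scores were computed, so the sketch simply scores one class per trait. The function names and the toy labels are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F-measure)


def precision_recall_f1(y_true: List[str], y_pred: List[str], positive: str) -> Scores:
    """Precision, recall, and F-measure for one class of one trait."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_over_traits(per_trait: Dict[str, Scores]) -> Scores:
    """Average each metric over traits (assumed meaning of the -AVG columns)."""
    n = len(per_trait)
    p_avg = sum(s[0] for s in per_trait.values()) / n
    r_avg = sum(s[1] for s in per_trait.values()) / n
    f_avg = sum(s[2] for s in per_trait.values()) / n
    return p_avg, r_avg, f_avg


# Toy labels for one trait: one of the two "High" users is missed.
y_true = ["High", "High", "Low", "Low"]
y_pred = ["High", "Low", "Low", "Low"]
opn_scores = precision_recall_f1(y_true, y_pred, positive="High")

# Two invented traits whose precision and recall are mirror images.
averaged = average_over_traits({"OPN": opn_scores, "CON": (0.5, 1.0, 2 / 3)})
```

Under this assumption, F-AVG is the mean of the per-trait F-measures and not the harmonic mean of P-AVG and R-AVG. That would be consistent with rows such as the first row of Table 4, where the reported F-AVG is lower than both P-AVG and R-AVG.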
