
## Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(3): 261-266

Published online September 25, 2022

https://doi.org/10.5391/IJFIS.2022.22.3.261

© The Korean Institute of Intelligent Systems

## Efficient Neural Network Design for Identifying Potential Game Item Buyers

Soohyung Lee and Heesung Lee

Railroad Electrical and Electronic Engineering, Korea National University of Transportation, Uiwang, Korea

Correspondence to :
Heesung Lee (hslee0717@ut.ac.kr)

Received: August 26, 2021; Revised: April 5, 2022; Accepted: June 28, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

As competition in the mobile game market continues to intensify, the number of users accessing specific games is decreasing, and marketing costs are rising significantly. Marketing methods to increase average revenue per user (ARPU) are being improved accordingly. In this study, we present a method for predicting the future purchases of new users who start playing mobile games. The proposed system enables effective marketing based on the purchasing propensity of a user over a short period of time. In the game data, the proportion of non-purchasers compared to purchasers is extremely large because the number of purchasers is small when the game is first released. Therefore, this study uses an under-sampling method to handle the unbalanced data. The loss functions used in the neural networks are customized to train a network with a high probability that a predicted purchaser will be an actual purchaser. Experiments show that the proposed method can predict the purchasing propensity over a month-long period using the 7-day data of a new user.

Keywords: Mobile game, Purchasing behavior prediction, Imbalanced data, Neural network, ARPU

### 1. Introduction

As competition in the mobile game market intensifies, the number of users accessing specific games decreases and marketing costs increase significantly. Accordingly, marketing methods for increasing the average revenue per user (ARPU), which is the average amount paid by game users over a certain period, are being developed [1]. Current marketing systems save user game behaviors and purchase patterns over long periods. They then learn from the stored data and recommend effective items to the user. A long timeframe is needed to accumulate data and train an artificial intelligence (AI)-based model. However, games start to promote the purchase of items from the moment the user starts the game for the first time. Therefore, these methods have the disadvantage of marketing even to users who do not plan to purchase game items during the data accumulation and learning periods, because time is required to accumulate the data; they also require a large amount of data storage space because they store both item purchaser and non-purchaser data [2]. To reduce these shortcomings, in this paper, we propose a system that analyzes the user’s initial game behavior and classifies whether the user is a future game item purchaser or non-purchaser within a short timeframe. At the beginning of the game, non-purchasers form a significantly higher proportion of total users than purchasers; thus, the data to be processed by the item recommendation system is highly skewed. In this study, we resampled the data to handle this imbalance and customized the loss functions used to train the neural networks. To effectively classify purchasers and non-purchasers, the performance is evaluated using various AI-based algorithms, such as the multilayer perceptron (MLP) [3,4], convolutional neural networks (CNN) [5–7], and long short-term memory (LSTM) [8,9]. The remainder of this paper is organized as follows. In Section 2, we present a new item purchaser identification system that includes data preprocessing, resampling, and a customized loss function. In Section 3, the proposed scheme is applied to actual game data and its effectiveness is demonstrated by comparing it with various neural networks. Finally, concluding remarks are presented in Section 4.

### 2.1 Preprocessing

This study used game data obtained from 95,761 users playing SuperPlanet’s mobile game [10], “Guardians of the Video Game,” as shown in Figure 1.

We observed that, immediately after the game's release, the number of item non-purchasers was much larger than the number of item purchasers; of the 95,761 players, only 2,743 (2.9%) were item purchasers. Therefore, if all users are classified as non-purchasers, the overall recognition rate becomes 97.1%. When an AI system learns from a database in which the amount of data belonging to each class is unbalanced, the system is highly likely to make biased predictions. To solve this problem, we used resampling to make the number of training data entries equal in the purchaser and non-purchaser classes, as shown in Figure 2.

Undersampling is a method of randomly removing instances of the majority class [2]. Through undersampling, the non-purchaser data, which dominate the dataset, are randomly sampled so that their number matches that of the purchaser data. The game item purchaser data were divided in a ratio of 8:2 for training and testing, respectively. For both the non-purchaser and purchaser classes, 2,194 users were assigned to the training data. The actual percentage of non-purchasers is 97.1%, approximately 34 times the percentage of purchasers (2.9%). Therefore, the number of non-purchaser test data points is 18,666, which is 34 times the number of purchaser test data points.
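The resampling step above can be sketched as follows. This is an illustrative sketch, not the authors' exact pipeline: the function name, the fixed random seed, and the `test_ratio` parameter are our assumptions; the 8:2 split and the ~34:1 test-set ratio come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def undersample_and_split(X, y, train_frac=0.8, test_ratio=34):
    """Balance the training set by undersampling non-purchasers,
    while keeping the real ~34:1 class ratio in the test set.

    X: (n_samples, n_features) feature matrix; y: 0 = non-purchaser, 1 = purchaser.
    """
    buyers = rng.permutation(np.flatnonzero(y == 1))
    others = rng.permutation(np.flatnonzero(y == 0))

    # 8:2 split of the (minority) purchaser class
    n_train = int(train_frac * len(buyers))
    train_buy, test_buy = buyers[:n_train], buyers[n_train:]

    # undersampling: draw the same number of non-purchasers for training
    train_other = others[:n_train]
    # test set preserves the real imbalance (34 non-purchasers per purchaser)
    n_test_other = test_ratio * len(test_buy)
    test_other = others[n_train:n_train + n_test_other]

    train_idx = np.concatenate([train_buy, train_other])
    test_idx = np.concatenate([test_buy, test_other])
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])
```

With 2,743 purchasers this yields 2,194 purchasers (and 2,194 non-purchasers) for training and 549 purchasers with 18,666 non-purchasers for testing, matching the counts reported above.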

The data used consisted of 25 features, such as goods, grades, and the number of wins achieved by each user, which were stored every day for 6 days after the user joined the game. User-purchased items and their purchase amounts were also added to the database for 30 days after registration. These data were obtained from 95,761 new users and saved each time they logged into the game. Missing values occurred when a user did not log in for a day or longer. In this study, we investigated how many days of data are needed to predict a user's purchasing behavior. From each user's 6 × 25 feature matrix, the input features were obtained by averaging the data from days 1 to N (N = 1, ..., 6). We also applied the Pearson correlation coefficient (PCC) [11,12] to determine the correlation between each feature and the purchase label. PCC is an analysis tool that can be used to understand the actual influence and characteristics of the variables. If the absolute value of the correlation coefficient is 1, the observed values lie exactly on a straight line; as it approaches zero, they deviate from the line. Figure 3 shows the PCC between each feature and whether the item was purchased. In this study, features with coefficients greater than 0.1 are selected and used.
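The PCC-based feature selection can be sketched as below. The function name is ours, and filtering on the absolute coefficient is our assumption (the text says "greater than 0.1"; strongly negative correlations are likely also informative).

```python
import numpy as np

def select_by_pcc(X, y, threshold=0.1):
    """Keep features whose Pearson correlation with the purchase label
    exceeds the 0.1 threshold used in the paper (in absolute value).

    X: (n_users, n_features) averaged feature matrix; y: purchase labels.
    Returns the reduced matrix and a boolean mask over the features.
    """
    # np.corrcoef(a, b) returns the 2x2 correlation matrix of a and b;
    # the off-diagonal entry [0, 1] is the PCC between them
    pcc = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = np.abs(pcc) > threshold
    return X[:, keep], keep
```

A constant feature would produce an undefined (NaN) coefficient and be dropped by the comparison, which is the desired behavior here.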

### 2.2 Custom Loss Functions

In a binary classification problem, the actual value and the value predicted by the model are represented by a 2 × 2 matrix, as shown in Table 1.

The terms “true” and “false” indicate whether the predicted result is correct [13], while the terms “positive” and “negative” indicate whether the model classifies a user as a purchaser or a non-purchaser, respectively. Therefore, we can interpret the entries as follows:

- True Positive (TP): number of purchasers correctly estimated as purchasers
- True Negative (TN): number of non-purchasers correctly estimated as non-purchasers
- False Positive (FP): number of non-purchasers estimated as purchasers
- False Negative (FN): number of purchasers estimated as non-purchasers

The classification error is the most widely used index for data learning in neural networks and is defined as

$\text{error} = 1 - \frac{TP + TN}{TP + TN + FP + FN}. \quad (1)$
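The four counts and the classification error of Eq. (1) can be computed directly; a minimal sketch (function names are ours):

```python
def confusion_counts(y_true, y_pred):
    """Hard confusion-matrix counts for binary labels (1 = purchaser)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_error(tp, tn, fp, fn):
    # Eq. (1): one minus the overall accuracy
    return 1 - (tp + tn) / (tp + tn + fp + fn)
```

For example, with true labels [1, 1, 0, 0, 0] and predictions [1, 0, 1, 0, 0], the counts are TP = 1, TN = 2, FP = 1, FN = 1, giving an error of 0.4.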

The goal of this study is to create a model with a high probability that a predicted purchaser is an actual purchaser. However, when training with the loss function in Eq. (1), many non-purchasers end up among the predicted purchasers. When training the model, we focus more on the purchaser class than on overall accuracy; we therefore propose four new loss functions and compare them with Eq. (1). The first loss function is customized to increase the precision and is defined by

$\text{error} = 1 - \text{precision} = 1 - \frac{TP}{TP + FP}. \quad (2)$

Precision is the percentage of predicted purchasers who are actual purchasers. The smaller this loss function is, the more likely a user predicted as a purchaser is to actually be one. The second loss function is defined as follows:

$\text{error} = 2 - (\text{precision} + \text{recall}) = 2 - \left( \frac{TP}{TP + FP} + \frac{TP}{TP + FN} \right). \quad (3)$

Because the loss function in Eq. (2) considers only the precision, it is likely to be biased toward non-purchasers. To solve this problem, Eq. (3) also considers the recall, which is the percentage of actual purchasers that the model predicts as purchasers; this keeps the predictions from becoming biased. The third loss function is defined as follows:

$\text{error} = \frac{FP + FN}{TP + TN + FP + FN}. \quad (4)$

Eq. (4) is a simplified version of Eq. (3) that penalizes incorrect answers instead of rewarding correct ones. This loss function is intended to reduce unexpected side effects. The last function is a generalized version of Eq. (4) and is represented as

$\text{error} = \frac{n_1 FP + n_2 FN}{TP + TN + FP + FN}, \quad (5)$

where $n_1$ and $n_2$ are coefficients with $n_1 + n_2 = 1$. We weight FP and FN differently to improve the classification performance; the optimal values were selected experimentally.
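Note that the counts TP, FP, FN, and TN are not differentiable, so a gradient-trained network needs a surrogate. One common option, sketched below under our own assumptions (the paper defines the loss only in terms of hard counts), is to replace the hard counts with "soft" counts built from predicted probabilities:

```python
import numpy as np

def soft_weighted_loss(y_true, y_prob, n1=0.75, n2=0.25):
    """Differentiable surrogate of Eq. (5).

    Each sample contributes fractionally to TP/FP/FN/TN according to its
    predicted purchase probability, so the four soft counts always sum to
    the number of samples and the loss is smooth in y_prob.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    tp = np.sum(y_true * y_prob)
    fp = np.sum((1.0 - y_true) * y_prob)
    fn = np.sum(y_true * (1.0 - y_prob))
    tn = np.sum((1.0 - y_true) * (1.0 - y_prob))
    return (n1 * fp + n2 * fn) / (tp + tn + fp + fn)
```

With $n_1 > n_2$, false positives (non-purchasers on the predicted-purchaser list) are penalized more heavily than false negatives, which is exactly the precision-first behavior the study aims for.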

### 3. Experiment

In this section, we evaluate the experimental results using the precision and recall values. We focus more on precision because we want as few non-purchasers as possible in the predicted-purchaser list. The test data consisted of 549 purchasers and 18,666 non-purchasers. The experimental results for determining the coefficients in Eq. (5) are listed in Table 2. In this experiment, data from the first day of game access were used.

We used the MLP [14] shown in Figure 4, whose layers and experimental parameters are listed in Table 3.
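The Table 3 architecture (Dense 128-128-64-64 with ReLU and a sigmoid output) can be illustrated as a plain NumPy forward pass. This is a sketch, not the authors' implementation: the weights are random, and a simple input standardization stands in for the BatchNormalization layer.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(n_features, hidden=(128, 128, 64, 64)):
    """Random weights/biases for the Table 3 layout."""
    sizes = [n_features, *hidden, 1]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    # input standardization stands in for the BatchNormalization layer
    h = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    for W, b in params[:-1]:
        h = relu(h @ W + b)            # hidden Dense layers, ReLU
    W, b = params[-1]
    return sigmoid(h @ W + b).ravel()  # purchase probability per user
```

The single sigmoid output is thresholded (e.g., at 0.5) to decide purchaser versus non-purchaser.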

When $n_1$ is 0.1–0.3, the model predicts every user as a purchaser, so meaningful precision and recall values cannot be reported. The precision reaches 100 when $n_1 \geq 0.75$, with the best recall obtained when $n_1 = 0.75$ and $n_2 = 0.25$. We then compared our loss functions (2), (3), (4), and (5) with the classification error (CE) and binary cross-entropy (BCE), defined by Eqs. (1) and (6), respectively:

$\text{error} = -\sum_{i=1}^{2} t_i \log(s_i), \quad (6)$

where $t_i$ is the ground truth and $s_i$ is the output of the MLP. The data from the first day after connecting to the game (1) and the average of the data from the first and second days (1–2) were used.

In addition, the average of the data from the first to the sixth day (1–6) was used. A comparison of the results is presented in Table 4.

The results for Eqs. (2) and (5) show that the best precision is 100. This means that the list of users that the MLP model predicts as purchasers consists entirely of actual purchasers. We can observe that the proportion of actual purchasers among the predicted purchasers is much higher than when a traditional loss function is used. The results of the deep neural networks, that is, CNN and LSTM, are compared with those of the MLP in Table 5. All networks were trained with the loss function in Eq. (5), with $n_1 = 0.75$.

The comparison in Table 5 shows that the MLP and LSTM achieve the highest precision. The CNN has a convolution layer to extract meaningful features from the data, whereas the MLP and LSTM classify without a feature extraction stage. We believe that the MLP and LSTM achieve excellent precision because meaningful features are already selected from the game data using the PCC. Because the MLP has far fewer parameters than the deep learning-based networks, it is an excellent choice for game item recommendation systems. Experiments show that the MLP can predict the purchasing propensity over a month-long period using only the first-day data of new users. The layers and parameters of the CNN and LSTM are listed in Tables 6 and 7, respectively.

### 4. Conclusion

In this study, we presented a method for predicting in-game item purchases by new users who have only recently started playing the game. The proposed system enables effective marketing based on a user’s purchasing propensity over a short period of time. In our game data, only 2,743 of 95,761 users were purchasers, so the proportion of non-purchasers was extremely large. This is a general trend for game data, because the number of purchasers is small when a game is first made available. Therefore, this study used an undersampling method to handle the unbalanced data. The loss functions used in the neural networks were customized to create a model with a high probability that a predicted purchaser becomes an actual purchaser. In the future, we will study the application of the latest deep learning-based classifiers, as well as ways to improve their performance. We also plan to include in-game tasks that can estimate the user’s psychological or mental state as external factors. Subsequently, we will study how to combine these with internal factors to predict item purchases and increase ARPU.

### Conflict of Interest

Fig. 1. Splash screen for the game “Guardians of the Video Game.”

Fig. 2. Data resampling process.

Fig. 3. Pearson correlation coefficient with purchase.

Fig. 4. Multi-layer perceptron.

Table 1. TP, FP, FN, and TN.

|                         | Purchaser           | Non-purchaser       |
|-------------------------|---------------------|---------------------|
| Estimated purchaser     | True Positive (TP)  | False Positive (FP) |
| Estimated non-purchaser | False Negative (FN) | True Negative (TN)  |

Table 2. Comparison of weights based on custom loss function.

| n1   | n2   | Precision                  | Recall |
|------|------|----------------------------|--------|
| 0.1  | 0.9  | Predict all as purchasers  | –      |
| 0.2  | 0.8  | Predict all as purchasers  | –      |
| 0.3  | 0.7  | Predict all as purchasers  | –      |
| 0.4  | 0.6  | 4.38                       | 91.99  |
| 0.5  | 0.5  | 7.36                       | 83.79  |
| 0.6  | 0.4  | 16.73                      | 64.26  |
| 0.7  | 0.3  | 80.97                      | 45.70  |
| 0.75 | 0.25 | 100                        | 47.27  |
| 0.8  | 0.2  | 100                        | 46.68  |
| 0.9  | 0.1  | 100                        | 46.09  |

Table 3. MLP parameters.

| Layer              | Number of nodes | Activation function |
|--------------------|-----------------|---------------------|
| BatchNormalization | -               | -                   |
| Dense              | 128             | ReLU                |
| Dense              | 128             | ReLU                |
| Dense              | 64              | ReLU                |
| Dense              | 64              | ReLU                |
| Dense (output)     | 1               | Sigmoid             |

Table 4. Comparison of results based on the loss function.

| Loss           | Metric    | Day 1 | Days 1–2 | Days 1–3 | Days 1–4 | Days 1–5 | Days 1–6 |
|----------------|-----------|-------|----------|----------|----------|----------|----------|
| CE (1)         | Precision | 10.76 | 8.57     | 15.81    | 14.24    | 14.27    | 16.26    |
|                | Recall    | 75    | 84.85    | 79.23    | 83.43    | 87.01    | 84.10    |
| BCE (6)        | Precision | 13.30 | 10.11    | 16.33    | 14.72    | 12.33    | 15.23    |
|                | Recall    | 65.43 | 83.14    | 79.42    | 81.56    | 86.09    | 87.43    |
| (2)            | Precision | 100   | 100      | 100      | 100      | 100      | 100      |
|                | Recall    | 20.51 | 7.20     | 24.59    | 25.14    | 7.24     | 15.34    |
| (3)            | Precision | 2.95  | 7.85     | 11.02    | 8.48     | 12.84    | 12.34    |
|                | Recall    | 100   | 88.45    | 86.16    | 89.01    | 89.42    | 91.68    |
| (4)            | Precision | 7.35  | 7.16     | 10.62    | 12.26    | 11.01    | 11.92    |
|                | Recall    | 81.84 | 87.12    | 83.79    | 83.61    | 90.72    | 87.25    |
| (5), n1 = 0.75 | Precision | 100   | 50.19    | 30.95    | 32.60    | 29.92    | 27.29    |
|                | Recall    | 48.05 | 50.38    | 66.12    | 66.48    | 71.32    | 76.52    |

Table 5. Comparison of the results when using a neural network.

| Network | Metric    | Day 1 | Days 1–2 | Days 1–3 | Days 1–4 | Days 1–5 | Days 1–6 |
|---------|-----------|-------|----------|----------|----------|----------|----------|
| MLP     | Precision | 100   | 50.19    | 30.95    | 32.60    | 29.92    | 27.29    |
|         | Recall    | 48.05 | 50.38    | 66.12    | 66.48    | 71.32    | 76.52    |
| CNN     | Precision | 38.91 | 27.75    | 29.55    | 25.10    | 34.70    | 24.71    |
|         | Recall    | 51.37 | 57.77    | 46.45    | 59.40    | 58.26    | 71.53    |
| LSTM    | Precision | 100   | 52.80    | 26.63    | 22.03    | 26.11    | 29.60    |
|         | Recall    | 46.48 | 42.80    | 55.01    | 48.42    | 43.78    | 57.86    |

Table 6. CNN parameters.

| Layer              | Number of nodes | Activation function |
|--------------------|-----------------|---------------------|
| BatchNormalization | -               | -                   |
| Conv1D             | 256             | ReLU                |
| GlobalMaxPooling1D | -               | -                   |
| Dense (output)     | 1               | Sigmoid             |

Table 7. LSTM parameters.

| Layer              | Number of nodes | Activation function |
|--------------------|-----------------|---------------------|
| BatchNormalization | -               | -                   |
| LSTM               | 128             | -                   |
| Dense (output)     | 1               | Sigmoid             |

1. Hamari, J, and Lehdonvirta, V (2010). Game design as marketing: how game mechanics create demand for virtual goods. International Journal of Business Science & Applied Management. 5, 9.
2. Lee, S, Kim, K, and Lee, H (2021). Artificial intelligent-based game item purchaser identification system using imbalanced data. Journal of Korean Institute of Intelligent System. 32, 158-163. https://doi.org/10.5391/JKIIS.2021.31.2.158
3. Lee, H, Hong, S, and Kim, E (2009). A new genetic feature selection with neural network ensemble. International Journal of Computer Mathematics. 86, 1105-1117. https://doi.org/10.1080/00207160701724760
4. Lee, H, Hong, S, and Kim, E (2009). Neural network ensemble with probabilistic fusion and its application to gait recognition. Neurocomputing. 72, 1557-1564. https://doi.org/10.1016/j.neucom.2008.09.009
5. Kim, K, Hong, S, Choi, B, and Kim, E (2018). Probabilistic ship detection and classification using deep learning. Applied Sciences. 8, article no. 936
6. Kim, J, Hong, S, and Kim, E (2021). Novel on-road vehicle detection system using multi-stage convolutional neural network. IEEE Access. 9, 94371-94385. https://doi.org/10.1109/ACCESS.2021.3093698
7. Hyun, J, Seong, H, and Kim, E (2021). Universal pooling: a new pooling method for convolutional neural networks. Expert Systems with Applications. 180, article no. 115084
8. Hong, S, Lee, S, and Lee, H (2020). A study on gait recognition using deep neural network ensemble. The Transactions of the Korean Institute of Electrical Engineers. 69, 1125-1130. https://doi.org/10.5370/KIEE.2020.69.7.1125
9. Adil, M, Javaid, N, Qasim, U, Ullah, I, Shafiq, M, and Choi, JG (2020). LSTM and bat-based RUSBoost approach for electricity theft detection. Applied Sciences. 10, article no. 4378
10. Google Play (2022). Guardians of the Video Game. Available: http://play.google.com/store/apps/details?id=com.superplanet.videogameguardians
11. Benesty, J, Chen, J, Huang, Y, and Cohen, I (2009). Pearson correlation coefficient. Noise Reduction in Speech Processing. Heidelberg, Germany: Springer, pp. 1-4 https://doi.org/10.1007/978-3-642-00296-0_5
12. Mu, Y, Liu, X, and Wang, L (2018). A Pearson’s correlation coefficient based decision tree and its parallel implementation. Information Sciences. 435, 40-58. https://doi.org/10.1016/j.ins.2017.12.059
13. Jain, S, White, M, and Radivojac, P (2017). Recovering true classifier performance in positive-unlabeled learning. Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, pp. 2066-2072. https://doi.org/10.1609/aaai.v31i1.10937
14. Lee, H, Lee, H, and Kim, E (2014). A new gait recognition system based on hierarchical fair competition-based parallel genetic algorithm and selective neural network ensemble. International Journal of Control, Automation and Systems. 12, 202-207. https://doi.org/10.1007/s12555-012-0154-6

Soohyung Lee received the B.S. degree in railroad electrical and electronics engineering from Korea National University of Transportation (KNUT), Uiwang-si, Gyeonggi-do, Korea, in 2020, and is currently pursuing an M.S. degree in railroad electrical and electronics engineering at KNUT. His current research interests include deep learning, generative models, and intelligent railroad systems.

Heesung Lee received the B.S., M.S., and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2003, 2005, and 2010, respectively. From 2011 to 2014, he was a managing researcher with S1 Corporation, Seoul, Korea. Since 2015, he has been with Railroad Electrical and Electronics Engineering, Korea National University of Transportation (KNUT), Uiwang-si, Gyeonggi-do, Korea, where he is currently an associate professor. His current research interests include computational intelligence, biometrics, and intelligent railroad systems.

### Article

#### Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(3): 261-266

Published online September 25, 2022 https://doi.org/10.5391/IJFIS.2022.22.3.261

## Efficient Neural Network Design for Identifying Potential Game Item Buyers

Soohyung Lee and Heesung Lee

Railroad Electrical and Electronic Engineering, Korea National University of Transportation, Uiwang, Korea

Correspondence to:Heesung Lee (hslee0717@ut.ac.kr)

Received: August 26, 2021; Revised: April 5, 2022; Accepted: June 28, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

As competition in the mobile game market continues to intensify, the number of users accessing specific games is decreasing, and marketing costs are rising significantly. Marketing methods to increase average revenue per user (ARPU) are being improved accordingly. In this study, we present a method for predicting the future purchases of new users who start playing mobile games. The proposed system enables effective marketing based on the purchasing propensity of a user over a short period of time. In the game data, the proportion of nonpurchasers compared to purchasers is extremely large because the number of purchasers is small when the game is first released. Therefore, this study use an under-sampling method to handle the unbalanced data. The loss functions used in neural networks are customized to learn a neural network with a high probability that a predicted purchaser will be an actual purchaser. Experiments show that proposed method can predict the purchasing propensity over a month-long period using the 7-day data of a new user.

Keywords: Mobile game, Purchasing behavior prediction, Imbalanced data, Neural network, ARPU

### 1. Introduction

As competition in the mobile game market intensifies, the number of users accessing specific games decreases and marketing costs increase significantly. Accordingly, marketing methods for increasing the average revenue per user (ARPU), which is the average amount paid by game users over a certain period, are being developed [1]. Current marketing systems save user game behaviors and purchase patterns over long periods. They then learn the stored data and recommend effective items to the user. A long timeframe is needed to accumulate data and train an artificial intelligence (AI)-based model. However, games start to promote the purchase of items from the moment the user starts the game for the first time. Therefore, these methods have the disadvantage of providing marketing even to users who do not plan to purchase game items during the data accumulation and learning periods because time is required to accumulate the data; they also require a large amount of data storage space because they store both item purchaser and non-purchaser data [2]. To reduce these shortcomings, in this paper, we propose a system that analyzes the user’s initial game behavior and classifies whether the user is a future game item purchaser or non-purchaser within a short timeframe. At the beginning of the game, non-purchasers form a significantly higher proportion of total users than purchasers; thus, the data to be processed by the item recommendation system is highly skewed. In this study, we resampled the data to handle this imbalance, and customized function to train the data using neural networks. To effectively classify purchasers and non-purchasers, the performance is evaluated using various AI-based algorithms, such as multilayer perceptron (MLP) [3,4], convolutional neural networks (CNN) [57], and long short-term memory (LSTM) [8,9]. The remainder of this paper is organized as follows. 
In Section 2, we provide a new item purchaser identification system that includes data preprocessing, resampling, and a customized loss function. In Section 3, the proposed scheme is applied to actual game data and its effectiveness is demonstrated by comparing it with various neural networks. Finally, concluding remarks are presented in Section 4.

### 2.1 Preprocessing

This study used game data obtained from 95,761 users playing SuperPlanet’s mobile game [10], “Guardians of the Video Game,” as shown in Figure 1.

Immediately after first playing the game, it was observed that the number of item non-purchasers was much larger than the number of item purchasers; of the 95,761 players, 2,743 (2.9%) were item purchasers. Therefore, if all other users are classified as non-purchasers, the overall recognition rate becomes 97.1%. When an AI system learns using a database in which the amount of data belonging to each class is unbalanced, the system is highly likely to make biased predictions. To solve this problem, we have used resampling to make the number of training data entries equal in both the purchaser and non-purchaser classes, as shown in Figure 2.

Undersampling is a method of removing more visible majority class instances [2]. Through undersampling, non-purchaser data with a large weight are randomly extracted at the same number as the purchaser data with a small weight. The game item purchaser data were divided in a ratio of 8:2 for training and testing, respectively. For both the non-purchaser and purchaser data, 2,194 users were applied to the training data. The actual percentage of non-purchasers is 97.1%, approximately 34 times the percentage of purchasers at 2.9%. Therefore, the number of test data points is 18,666, which is 34 times the number of purchaser test data points.

The data used consisted of 25 features, such as goods, grades, and the number of wins achieved by each user, which were stored every day for 6 days after the user joined the game. User-purchased items and their purchase amounts were also added to the database for 30 days after registration. These data were obtained from 95,761 new users and saved each time they logged into the game. Missing values occurred when the user did not log in for a day or longer. In this study, we looked for dates on which we could predict a user’s purchasing behavior. Therefore, we created a 6 × 25 feature matrix using the average data from days 1 to N (1–6). We also applied the Pearson correlation coefficient (PCC) [11,12] to determine the correlation between variables. PCC is an analysis tool that can be used to understand the actual influence and characteristics of the variables applied. If the absolute value of the correlation coefficient is 1, the observed value appears exactly along a straight line, and as it approaches zero, it deviates from the straight line. Figure 3 shows the value of the PCC calculated using each feature and whether the item was purchased. In this study, features with coefficients greater than 0.1 are selected and used.

### 2.2 Custom Loss Functions

In a binary classification problem, the actual value and the value predicted by the model are represented by a 2 × 2 matrix, as shown in Table 1.

The terms “true” and “false” indicate whether the predicted result is correct [13], while the terms “positive” and “negative” indicate those cases in which the model classifies users as purchasers or non-purchasers. Therefore, we can interpret this as follows:

• • True Positive (TP): Number of purchasers correctly estimated as purchasers

• • True Negative (TN): Number of non-purchasers correctly estimated as non-purchasers

• • False Positive (FP): Number of non-purchasers estimated as purchasers

• • False Negative (FN): Number of purchasers estimated as non-purchasers

The classification error is the most widely used index for data learning in neural networks and is defined as

$error=1-TP+TNTP+TN+FP+FN.$

The goal of this study is to create a model with a high probability that the predicted purchaser is the actual purchaser. However, when learning using a loss function, as indicated in Eq. (1), there is a problem in that the predicted purchasers include many non-purchasers. When training the model, we focus more on predicting the purchaser class than on predicting all classes and propose four new loss functions and compare them with Eq. (1). The first loss function is customized to increase the precision, and is defined by

$error=1-precision=1-TPTP+FP.$

Precision is the percentage of actual purchasers that the model predicts as purchasers. The smaller the loss function is, the more likely the model is to correctly predict whether the user is a purchaser. The second loss function is defined as follows:

$error=2-(precision+recall)=2-(TPTP+FP+TPTP+FN).$

Because loss function (2) only considers the precision, it is likely to be biased toward non-purchasers. To solve this problem, the predicted value, which is the percentage of actual purchasers predicted by the model, is not biased when the recall is considered. The third loss function is defined as follows:

$error=FP+FNTP+TN+FP+FN.$

Eq. (4) is a simplified version of Eq. (3) which carries out simplification by penalizing incorrect answers instead of providing extra points for correct answers. This loss function is intended to simplify the unexpected effects. The last function is a generalized version of Eq. (4), and is represented as:

$error=n1FP+n2FNTP+TN+FP+FN,$

where n1 and n2 are coefficients and n1+n2 = 1. We weighted FP and FN to improve the classification performance; the optimal values were selected experimentally.

### 3. Experiment

In this section, we evaluate the experimental results using the precision and recall values. We focused more on precision because we needed fewer non-purchasers in our predictive purchaser list. The test data consisted of 549 purchasers and 18,666 non-purchasers. The experimental results for determining the coefficients mentioned in (5) are listed in Table 2. In this experiment, data from the first day of game access were used.

We used the MLP [14] shown in Figure 4, whose layers and experimental parameters are listed in Table 3.

When n1 is 0.1–0.3 and 0.9, all predictions are biased to non-purchasers and cannot be displayed. We obtained the highest precision value when n1 was 0.75, and n2 was 0.25. We then compared our loss functions (2), (3), (4), and (5) with the classification error (CE) and binary cross-entropy (BCE), defined by Eqs. (1) and (6), respectively.

$error=-∫i=12tilog(si),$

where ti is the ground truth and si is the output of the MLP. The data on the first day after connecting to game (1) and the average of the data on the first and second days (1–2) was used.

In addition, the average of the data from the first to the sixth day (1–6) was used. A comparison of the results is presented in Table 4.

The results in Eqs. (2) and (5) show that the best precision is 100. This means that the list of users who predict that the MLP model is a purchaser consists of all the actual purchasers. We can observe that the proportion of actual purchasers among the predicted purchasers is much higher than when the traditional loss function is used. The results of the deep neural networks, that is, CNN and LSTM, are compared with those of MLP and are listed in Table 5. The loss function for all neural networks uses Eq. (5), where n1 is 0.75.

The comparison in Table 5 shows that MLP and LSTM have the highest precision. CNN has a convolution layer to extract meaningful features from data, but MLP and LSTM start classification without a feature extraction process. We believe that MLP and LSTM have excellent precision because meaningful features are selected from the game data using PCC. Because the number of parameters of MLP is much smaller than the number of parameters used by deep learning-based neural networks, MLP is an excellent choice for game item selection systems. Experiments show that MLP can predict the purchasing propensity over a month-long period using the 1-day data of new users. The layers and parameters of CNN and LSTM are listed in Tables 6 and 7, respectively.

### 4. Conclusion

In this study, we presented a method for predicting in-game item purchases by new users who have very recently started playing the game. The proposed system enables effective marketing based on a user’s purchasing propensity over a short period of time. In our game data, 2,743 out of 95,761 users were purchasers, and the proportion of non-purchasers compared to purchasers was extremely large. This is a general trend for all game data, because the number of purchasers is small when the game is first made available. Therefore, this study used an undersampling method to handle the unbalanced data. The loss functions used in neural networks are customized to create a model with a high probability of a predicted purchaser becoming an actual purchaser. In the future, we will proceed with research on applying the latest deep learning-based classifiers, as well as with studies to improve their performance. We also plan to include tasks in the game that can predict the user’s psychological or mental state as external factors. Subsequently, a study will be conducted to predict whether a game item is purchased by combining it with internal factors to increase ARPU.

### Figure 1.

Splash screen for the game “Guardians of the Video Game.”

The International Journal of Fuzzy Logic and Intelligent Systems 2022; 22: 261-266. https://doi.org/10.5391/IJFIS.2022.22.3.261

### Figure 2.

Data resampling process.


### Figure 3.

Pearson correlation coefficient with purchase.


### Figure 4.

Multi-layer perceptron.


TP, FP, FN, and TN.

| | Purchaser | Non-purchaser |
|---|---|---|
| Estimated purchaser | True Positive (TP) | False Positive (FP) |
| Estimated non-purchaser | False Negative (FN) | True Negative (TN) |
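From this confusion matrix, precision is TP/(TP + FP), the share of predicted purchasers who actually purchased, and recall is TP/(TP + FN), the share of actual purchasers the model finds. A minimal helper (percentages, matching the tables in this paper):

```python
def precision_recall(tp, fp, fn):
    """Precision: share of predicted purchasers that really purchased.
    Recall: share of actual purchasers the model identified.
    Both are returned as percentages."""
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return precision, recall
```

A precision of 100 with a recall near 47, as in the n1 = 0.75 row of Table 2, thus corresponds to zero false positives while roughly half of the actual purchasers go undetected.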

Comparison of weights based on the custom loss function.

| n1 | n2 | Precision | Recall |
|---|---|---|---|
| 0.1 | 0.9 | Predicts all as purchasers | |
| 0.2 | 0.8 | Predicts all as purchasers | |
| 0.3 | 0.7 | Predicts all as purchasers | |
| 0.4 | 0.6 | 4.38 | 91.99 |
| 0.5 | 0.5 | 7.36 | 83.79 |
| 0.6 | 0.4 | 16.73 | 64.26 |
| 0.7 | 0.3 | 80.97 | 45.70 |
| 0.75 | 0.25 | 100 | 47.27 |
| 0.8 | 0.2 | 100 | 46.68 |
| 0.9 | 0.1 | 100 | 46.09 |

MLP parameters.

| Layer | Number of nodes | Activation function |
|---|---|---|
| BatchNormalization | - | - |
| Dense | 128 | ReLU |
| Dense | 128 | ReLU |
| Dense | 64 | ReLU |
| Dense | 64 | ReLU |
| Dense (output) | 1 | Sigmoid |
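The architecture in this table can be sketched as a numpy forward pass; a trained model would of course use learned weights, so the random weights below are purely for illustrating the layer shapes, and batch standardization stands in for the BatchNormalization layer.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, sizes=(128, 128, 64, 64)):
    """Forward pass matching Table 3: batch standardization, four ReLU
    dense layers (128, 128, 64, 64), and a sigmoid output unit.
    Weights are random here, purely to illustrate the shapes."""
    h = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-7)  # in place of BatchNormalization
    for n in sizes:
        W = rng.normal(0.0, 0.1, size=(h.shape[1], n))
        h = relu(h @ W)
    w_out = rng.normal(0.0, 0.1, size=(h.shape[1], 1))
    return sigmoid(h @ w_out)             # purchase probability in (0, 1)
```

Thresholding the output at 0.5 yields the purchaser / non-purchaser decision used in the precision and recall calculations.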

Comparison of results based on the loss function.

| Loss | Metric | Day 1 | Day 1–2 | Day 1–3 | Day 1–4 | Day 1–5 | Day 1–6 |
|---|---|---|---|---|---|---|---|
| CE (1) | Precision | 10.76 | 8.57 | 15.81 | 14.24 | 14.27 | 16.26 |
| | Recall | 75.00 | 84.85 | 79.23 | 83.43 | 87.01 | 84.10 |
| BCE (6) | Precision | 13.30 | 10.11 | 16.33 | 14.72 | 12.33 | 15.23 |
| | Recall | 65.43 | 83.14 | 79.42 | 81.56 | 86.09 | 87.43 |
| (2) | Precision | 100 | 100 | 100 | 100 | 100 | 100 |
| | Recall | 20.51 | 7.20 | 24.59 | 25.14 | 7.24 | 15.34 |
| (3) | Precision | 2.95 | 7.85 | 11.02 | 8.48 | 12.84 | 12.34 |
| | Recall | 100 | 88.45 | 86.16 | 89.01 | 89.42 | 91.68 |
| (4) | Precision | 7.35 | 7.16 | 10.62 | 12.26 | 11.01 | 11.92 |
| | Recall | 81.84 | 87.12 | 83.79 | 83.61 | 90.72 | 87.25 |
| (5), n1 = 0.75 | Precision | 100 | 50.19 | 30.95 | 32.60 | 29.92 | 27.29 |
| | Recall | 48.05 | 50.38 | 66.12 | 66.48 | 71.32 | 76.52 |

Comparison of the results when using a neural network.

| Network | Metric | Day 1 | Day 1–2 | Day 1–3 | Day 1–4 | Day 1–5 | Day 1–6 |
|---|---|---|---|---|---|---|---|
| MLP | Precision | 100 | 50.19 | 30.95 | 32.60 | 29.92 | 27.29 |
| | Recall | 48.05 | 50.38 | 66.12 | 66.48 | 71.32 | 76.52 |
| CNN | Precision | 38.91 | 27.75 | 29.55 | 25.10 | 34.70 | 24.71 |
| | Recall | 51.37 | 57.77 | 46.45 | 59.40 | 58.26 | 71.53 |
| LSTM | Precision | 100 | 52.80 | 26.63 | 22.03 | 26.11 | 29.60 |
| | Recall | 46.48 | 42.80 | 55.01 | 48.42 | 43.78 | 57.86 |

CNN parameters.

| Layer | Number of nodes | Activation function |
|---|---|---|
| BatchNormalization | - | - |
| Conv1D | 256 | ReLU |
| GlobalMaxPooling1D | - | - |
| Dense (output) | 1 | Sigmoid |

LSTM parameters.

| Layer | Number of nodes | Activation function |
|---|---|---|
| BatchNormalization | - | - |
| LSTM | 128 | - |
| Dense (output) | 1 | Sigmoid |
