International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(3): 245-251
Published online September 25, 2022
https://doi.org/10.5391/IJFIS.2022.22.3.245
© The Korean Institute of Intelligent Systems
Jin Hee Yoon^{1} and Seung Hoe Choi^{2}
^{1}Department of Mathematics and Statistics, Sejong University, Seoul, Korea
^{2}Liberal Arts and Sciences, Korea Aerospace University, Goyang, Korea
Correspondence to: Seung Hoe Choi (shchoi@kau.ac.kr)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study introduces two kinds of Pythagorean exponent and their applications. We present a mathematical and a statistical analysis of the exponent and show that both can be applied to sports data related to winning percentage. The first, M_{pe}, is obtained by mathematical analysis and calculated from a deterministic model. The second, S_{pe}, is obtained by statistical analysis and derived from a probabilistic model. As an application, we compare the two results using the records of teams participating in Major League Baseball (MLB) from 1901 to the present. The value of M_{pe} calculated by the mathematical method is approximately 2.039, and the value of S_{pe} inferred by the statistical method using a regression model of the winning percentage is 1.854. Furthermore, the value of M_{pe} for the 16 teams that have participated in MLB since 1901 varies statistically with post-season advances and actual winning percentage, and the adequacy of the two results is compared using the mean absolute percentage error (MAPE) and the correlation between the predicted and observed winning percentages.
Keywords: Pythagorean exponent, MLB, Winning percentage, Regression, MAPE
In ball games played with the body, such as soccer, basketball, volleyball, American football, or handball, and in ball games played with equipment, such as baseball, table tennis, cricket, water polo, or lacrosse, points scored and points conceded are the basis for victory [1,2]. Because the winner of such a game must score more points than it concedes, both scoring and conceding points affect the winning percentage. Consequently, predicting and assessing athletic performance has been widely studied [3,4].
Bill James [5], a founder of sabermetrics, proposed the Pythagorean expectation, which can be compared with the actual winning percentage of the 16 teams in Major League Baseball (MLB). He estimated the Pythagorean winning rate
the ratio of runs scored
This study introduces the Pythagorean exponent related to the winning percentage
In order to directly calculate the Pythagorean exponent
Since all of the teams involved in the MLB have recorded a positive number of runs scored every season, their winning percentage, runs scored and runs allowed satisfy
From the above equation, we can directly calculate the Pythagorean exponent
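The equation referred to here did not survive in this copy of the text. As a minimal sketch, assuming the generalized Pythagorean form W = RS^γ / (RS^γ + RA^γ), the exponent has a closed form obtained by taking logarithms of W / (1 − W) = (RS / RA)^γ. The function name and argument names below are illustrative and do not come from the paper.

```python
import math


def pythagorean_exponent(win_pct, runs_scored, runs_allowed):
    """Solve W = RS**g / (RS**g + RA**g) for g (assumed model form).

    win_pct is a fraction in (0, 1); runs_scored and runs_allowed are
    positive totals or per-game averages measured over the same period.
    """
    if not 0.0 < win_pct < 1.0:
        raise ValueError("win_pct must lie strictly between 0 and 1")
    if runs_scored <= 0 or runs_allowed <= 0:
        raise ValueError("runs scored and runs allowed must be positive")
    if runs_scored == runs_allowed:
        # ln(RS / RA) = 0, so the exponent cannot be identified.
        raise ValueError("exponent is undefined when RS equals RA")
    # W / (1 - W) = (RS / RA)**g, so g = ln(W / (1 - W)) / ln(RS / RA).
    log_odds = math.log(win_pct / (1.0 - win_pct))
    return log_odds / math.log(runs_scored / runs_allowed)
```

The ratio becomes unstable when runs scored and runs allowed are nearly equal, because the denominator approaches zero. That sensitivity is one plausible reason why a directly calculated exponent can vary widely from team to team.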
Runs scored and runs allowed are recorded in every game and can be treated as independent variables, while a team’s winning percentage can be treated as a statistical variable [4]. On this basis, we designed a statistical model in which runs scored and runs allowed affect the winning percentage. A team wins a game when it scores more runs than it allows. However, a higher number of runs scored does not by itself guarantee a higher winning percentage, because a low number of runs allowed is another key to winning. Runs scored and runs allowed can therefore be said to affect the winning percentage independently.
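One simple way to turn this idea into a regression is sketched below. It assumes the generalized Pythagorean form, under which the log-odds of winning are proportional to ln(RS / RA), and estimates a single exponent by least squares through the origin. This is only an illustration. The paper's own regression equation is not recoverable from this copy of the text, and the function name is hypothetical.

```python
import math


def fit_exponent(win_pcts, runs_scored, runs_allowed):
    """Least-squares slope through the origin of logit(W) on ln(RS / RA).

    Assumed model (not taken from the paper):
        ln(W / (1 - W)) = g * ln(RS / RA) + error
    Each argument is a sequence with one entry per team-season.
    """
    xs = [math.log(rs / ra) for rs, ra in zip(runs_scored, runs_allowed)]
    ys = [math.log(w / (1.0 - w)) for w in win_pcts]
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("all seasons have RS equal to RA; slope is undefined")
    # Closed-form least-squares estimate for a no-intercept line.
    return sum(x * y for x, y in zip(xs, ys)) / sxx
```

Unlike the direct calculation, this estimate pools all seasons, so a single season with nearly equal runs scored and runs allowed has little influence on the result.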
From 1901 to 2019, the runs scored and runs allowed by all teams in MLB did not have a statistically linear relationship with one another (
Also, the regression model that indicates the effects that the difference of runs scored and allowed is
where,
Since the Cincinnati Red Stockings, the first professional baseball team, were founded in 1869, many baseball teams have been established, and 30 teams currently play in MLB. The modern form of the major leagues began in 1901 with the birth of the American League (AL), and the first World Series between the two league champions took place in 1903 [18,19]. In 1969, when the mound was lowered to 10 inches, MLB was divided into western and eastern divisions, and in 1994 it was reorganized into western, central, and eastern divisions. This study uses the records of 16 teams: 15 teams that have participated in MLB since 1901 and the New York Yankees, which joined in 1903. The whole dataset used in this paper was offered by
Table 1 shows the regression coefficients estimated by the least squares method according to the regression model (
Table 1 shows the average runs scored and runs allowed by the MLB teams from 1901 to 2019. The Pythagorean exponent can be calculated using the average of winning percentage, runs scored, and runs allowed. The Pythagorean exponent value (2.039) is calculated using each team’s average runs scored and runs allowed according to
The variables that explain the winning percentage may be statistically related to each other. First, we examined the linear relationship of the calculated Pythagorean exponents according to each team’s runs scored and runs allowed. The correlation between the Pythagorean exponent calculated using the deterministic model and the one estimated using the stochastic model {(
It is meaningful to compare the regression coefficients of runs scored and runs allowed in the regression model (
The calculated Pythagorean exponent
This shows that each team’s winning percentage can be predicted from its runs scored, its runs allowed, and the estimated Pythagorean exponent, because the statistical Pythagorean exponent of each team is not affected by its winning percentage or its number of post-season advances. In contrast, the Pythagorean exponent calculated by numerical analysis differs statistically depending on the team’s performance level. Therefore, when a directly calculated Pythagorean exponent is used, it may be better to predict winning percentages separately for groups of teams defined by their post-season advances.
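The prediction step described above can be sketched as follows, again assuming the generalized Pythagorean form W = RS^γ / (RS^γ + RA^γ). The function name is illustrative and the exponent is whichever estimate (statistical or numerical) one chooses to plug in.

```python
def predicted_win_pct(runs_scored, runs_allowed, exponent):
    """Predicted winning percentage under the assumed Pythagorean form."""
    if runs_scored <= 0 or runs_allowed <= 0:
        raise ValueError("runs scored and runs allowed must be positive")
    rs_term = runs_scored ** exponent
    ra_term = runs_allowed ** exponent
    return rs_term / (rs_term + ra_term)


# Example with the Atlanta Braves averages and statistical exponent
# listed in Table 1 (4.180 runs scored, 4.310 runs allowed, 1.844).
atlanta = predicted_win_pct(4.180, 4.310, 1.844)
```

With these inputs the function returns about 0.486, which is close to the average winning percentage reported for Atlanta in Table 1.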
Since Bill James [5] suggested the relationship between the Pythagorean exponent and the expectation, it is necessary to compare the winning percentage of each team using the
Efficiency of prediction can be explained with two metrics: (1) using an error between predicted value
A metric using the linearity between predicted value
where,
Table 2 shows the efficiency of the predicted values based on the numerical method and the statistical method.
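The two efficiency measures named in the abstract, MAPE and the correlation between predicted and observed winning percentages, can be computed as in the sketch below. It uses the standard textbook definitions (mean of absolute relative errors times 100, and the Pearson correlation coefficient). The paper's own formulas are not visible in this copy, so the exact scaling is an assumption, and the function names are illustrative.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def pearson_corr(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A lower MAPE means the predictions are closer to the observed values, while a correlation near 1 means the predictions track the observed values linearly even if they are offset in level.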
The difference of
This study investigated the relationship between the actual winning percentage and the Pythagorean exponent proposed by the baseball statistician Bill James [5]. Using the runs scored and runs allowed from 1901 to 2019 by the 16 teams participating in MLB, the Pythagorean exponent was determined with both a deterministic and a stochastic model. As a result, the value calculated by the numerical analysis method
Since this paper demonstrated that the statistically estimated Pythagorean expectation can predict winning percentage, it may be meaningful in future studies to compare the results of other prediction methods with those of the Pythagorean exponent.
No potential conflict of interest relevant to this article was reported.
Table 1. Coefficient of regression and Pythagorean exponent of each team.
Team | Average winning percentage | Average runs scored | Average runs allowed | | | | Statistical Pythagorean exponent | Numerical Pythagorean exponent | Post-season advances |
---|---|---|---|---|---|---|---|---|---|
Atlanta Braves | 0.486 | 4.180 | 4.310 | 0.964 | 0.664 | 0.742 | 1.844 | 2.016 | 25 |
Baltimore Orioles | 0.474 | 4.309 | 4.594 | 0.957 | 0.748 | 0.948 | 1.898 | 1.901 | 14 |
Boston Red Sox | 0.519 | 4.664 | 4.472 | 0.956 | 0.82 | 0.798 | 1.903 | 2.085 | 24 |
Chicago Cubs | 0.505 | 4.351 | 4.292 | 0.941 | 0.65 | 0.845 | 1.808 | 1.197 | 20 |
Chicago White Sox | 0.502 | 4.358 | 4.340 | 0.94 | 0.847 | 1.043 | 1.746 | 1.889 | 9 |
Cincinnati Reds | 0.500 | 4.304 | 4.326 | 0.911 | 0.741 | 0.788 | 1.859 | 2.239 | 15 |
Cleveland Indians | 0.513 | 4.536 | 4.417 | 0.93 | 0.964 | 0.916 | 1.788 | 1.958 | 14 |
Detroit Tigers | 0.504 | 4.613 | 4.583 | 0.942 | 0.787 | 0.909 | 1.837 | 2.191 | 16 |
LA Dodgers | 0.526 | 4.307 | 4.076 | 0.943 | 0.968 | 0.8 | 1.823 | 1.991 | 33 |
Minnesota Twins | 0.480 | 4.389 | 4.578 | 0.951 | 0.891 | 0.801 | 1.882 | 1.865 | 16 |
New York Yankees | 0.570 | 4.864 | 4.173 | 0.964 | 1.047 | 0.653 | 1.883 | 2.814 | 55 |
Oakland Athletics | 0.488 | 4.433 | 4.568 | 0.965 | 0.715 | 0.758 | 1.915 | 2.477 | 28 |
Philadelphia Phillies | 0.465 | 4.226 | 4.585 | 0.964 | 0.709 | 0.948 | 1.882 | 1.923 | 14 |
Pittsburgh Pirates | 0.508 | 4.362 | 4.274 | 0.941 | 0.735 | 0.791 | 1.875 | 1.848 | 17 |
San Francisco Giants | 0.534 | 4.469 | 4.149 | 0.939 | 0.765 | 0.793 | 1.792 | 1.276 | 26 |
St. Louis Cardinals | 0.521 | 4.442 | 4.249 | 0.94 | 0.855 | 0.743 | 1.818 | 3.297 | 29 |
Table 2. Efficiencies of the predicted values.
Team | MAPE (S_{pe}) | MAPE (M_{pe}) | CORR (S_{pe}) | CORR (M_{pe}) |
---|---|---|---|---|
Atlanta Braves | 4.548 | 4.766 | 0.959 | 0.959 |
Baltimore Orioles | 4.864 | 4.864 | 0.951 | 0.951 |
Boston Red Sox | 3.606 | 3.795 | 0.961 | 0.961 |
Chicago Cubs | 4.465 | 6.056 | 0.946 | 0.945 |
Chicago White Sox | 4.062 | 4.190 | 0.941 | 0.941 |
Cincinnati Reds | 4.503 | 5.024 | 0.923 | 0.923 |
Cleveland Indians | 4.296 | 4.269 | 0.928 | 0.928 |
Detroit Tigers | 4.166 | 4.716 | 0.943 | 0.942 |
LA Dodgers | 3.879 | 3.933 | 0.941 | 0.941 |
Minnesota Twins | 4.036 | 4.042 | 0.955 | 0.955 |
New York Yankees | 3.717 | 7.265 | 0.943 | 0.945 |
Oakland Athletics | 4.414 | 6.669 | 0.970 | 0.970 |
Philadelphia Phillies | 4.878 | 4.885 | 0.952 | 0.952 |
Pittsburgh Pirates | 4.390 | 4.400 | 0.948 | 0.948 |
San Francisco Giants | 4.072 | 5.118 | 0.933 | 0.933 |
St. Louis Cardinals | 4.082 | 9.785 | 0.945 | 0.946 |
MLB League | 4.280 | 5.236 | 0.946 | 0.946 |