The Power of Nations Code
Checking whether Michael Beckley's GDP x GDP Per Capita measure can predict war outcomes. Excerpts from the article are here. Data comes from Harvard Dataverse.
reiterwars.tab has wars from the past few centuries; power1.tab carries GDP, CINC, total population tpop, and MB's measure y. We could reconstruct the measure from GDP and tpop (double-checked, it works, same as y). The code is based on the Stata wars do.do code in the same repo.
My zipped version of the data is here.
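As a rough illustration of the reconstruction mentioned above, GDP x GDP per capita works out to GDP squared divided by population. The sketch below uses made-up figures rather than rows from power1.tab, the function name beckley_measure is hypothetical, and the y column in the data may be on a different scale.

```python
# Toy figures, NOT taken from power1.tab; units are arbitrary here.
def beckley_measure(gdp, tpop):
    """GDP x GDP per capita, i.e. gdp * (gdp / tpop)."""
    return gdp * (gdp / tpop)

# Two hypothetical countries with equal GDP but different populations.
small_rich = beckley_measure(gdp=1000.0, tpop=10.0)
large_poor = beckley_measure(gdp=1000.0, tpop=100.0)

# Equal GDP, but the less populous country scores higher on the measure.
print(small_rich, large_poor)
```

With equal GDP, the measure penalizes the larger population, which is the point of multiplying by GDP per capita.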
To run the regression, MB converts each measure into a fraction between 0 and 1 that compares the two sides of a war.
These fractions are then used to predict the outcome win, which is 1 if the initiator (side a) won the conflict and 0 if it did not.
The fractions take the form
$$ y_{frac} = \frac{y_A}{y_A + y_B} $$
The same approach is used for CINC and GDP.
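A small worked example of the fraction, using invented values for the two sides (the helper name side_a_fraction is hypothetical, not from the original code):

```python
def side_a_fraction(a, b):
    """Share of the combined measure held by side a: a / (a + b)."""
    return a / (a + b)

# Hypothetical values of the measure for initiator (a) and target (b).
stronger_initiator = side_a_fraction(3.0, 1.0)   # a holds three quarters
even_match = side_a_fraction(2.0, 2.0)           # both sides equal
weaker_initiator = side_a_fraction(1.0, 3.0)     # a holds one quarter

print(stronger_initiator, even_match, weaker_initiator)
```

A value above 0.5 means the initiator is the stronger side on that measure, and a value below 0.5 means the target is.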
import pandas as pd
dfp = pd.read_csv('power1.tab',sep='\t')
dfw = pd.read_csv('reiterwars.tab',sep='\t')
# keep only rows that are not flagged as joiners
dfw = dfw[dfw.joiner==0]
# join in the reference data, twice, once for side a, other for side b
dfj1 = dfw.merge(dfp, left_on=['year','init_ccode'], right_on=['year','ccode'],how='left')
dfj1['tpopa'] = dfj1['tpop']
dfj1['cinca'] = dfj1['cinc']
dfj1['gdpa'] = dfj1['gdp']
dfj1['ya'] = dfj1['y']
cols = ['init_ccode','init_name','larger_war_name','target_ccode','target_name','year','tpopa','cinca','gdpa','ya','annual_outcome']
dfj1 = dfj1[cols]
dfj2 = dfj1.merge(dfp, left_on=['year','target_ccode'], right_on=['year','ccode'],how='left')
dfj2['tpopb'] = dfj2['tpop']
dfj2['cincb'] = dfj2['cinc']
dfj2['gdpb'] = dfj2['gdp']
dfj2['yb'] = dfj2['y']
dfj2 = dfj2[cols + ['tpopb','gdpb','cincb','yb']]
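# win = 1 when the initiator (side a) won, 0 when it lost
dfj2.loc[dfj2.annual_outcome==1,'win'] = 1
dfj2.loc[dfj2.annual_outcome==2,'win'] = 0
# drop rows where annual_outcome is 0
dfj2 = dfj2[dfj2.annual_outcome != 0]
# side a's share of each measure, a / (a + b)
dfj2['cincfrac']=dfj2.cinca/(dfj2.cinca+dfj2.cincb)
dfj2['gdpfrac']=dfj2.gdpa/(dfj2.gdpa+dfj2.gdpb)
dfj2['yfrac']=dfj2.ya/(dfj2.ya+dfj2.yb)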
dfj2.to_csv('beckley-wars.csv')
import statsmodels.formula.api as smf
# one single-predictor OLS per measure, to compare their R^2
results = smf.ols('win ~ cincfrac', data=dfj2).fit()
print ('%0.2f' % results.rsquared)
results = smf.ols('win ~ gdpfrac', data=dfj2).fit()
print ('%0.2f' % results.rsquared)
results = smf.ols('win ~ yfrac', data=dfj2).fit()
print ('%0.2f' % results.rsquared)
0.07
0.12
0.26
The $R^2$ of the regression that predicts war outcome from the new measure is 0.26 (the maximum possible being 1), higher than for CINC (0.07) or GDP (0.12).
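For readers who want to see what the $R^2$ figures measure, here is a minimal pure-Python sketch on invented data (four hypothetical wars, not rows from beckley-wars.csv). For a one-predictor OLS with an intercept, $R^2$ equals the squared Pearson correlation between predictor and outcome, and the sketch checks that against $1 - SS_{res}/SS_{tot}$.

```python
# Hypothetical toy data: side a's fraction and whether side a won.
frac = [0.2, 0.4, 0.6, 0.8]
win = [0.0, 0.0, 1.0, 1.0]

n = len(frac)
mean_x = sum(frac) / n
mean_y = sum(win) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(frac, win))
sxx = sum((x - mean_x) ** 2 for x in frac)
syy = sum((y - mean_y) ** 2 for y in win)

# Least-squares line win = intercept + slope * frac.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 as one minus residual over total sum of squares.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(frac, win))
r2_from_fit = 1.0 - ss_res / syy

# R^2 as the squared correlation; the two should agree.
r2_from_corr = sxy ** 2 / (sxx * syy)

print(round(r2_from_fit, 2), round(r2_from_corr, 2))
```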
Preprocessed data for the regression is here.
Additional Metrics
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
df = pd.read_csv('beckley-wars.csv')
df = df.dropna()
# predict an initiator win when side a holds more than half of the combined measure
predicted = df.yfrac > 0.5
report = classification_report(df.win, predicted)
print(report)
print (confusion_matrix(df.win, predicted))
              precision    recall  f1-score   support
         0.0       0.60      0.60      0.60        42
         1.0       0.83      0.83      0.83        99
    accuracy                           0.76       141
   macro avg       0.71      0.71      0.71       141
weighted avg       0.76      0.76      0.76       141
[[25 17]
[17 82]]
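The report figures can be recovered by hand from the confusion matrix printed above. This sketch assumes scikit-learn's convention that rows are true classes and columns are predicted classes. The last line adds a comparison that is not in the original analysis: the accuracy of always predicting an initiator win.

```python
# Confusion matrix as printed above (rows: true class, columns: predicted).
cm = [[25, 17],
      [17, 82]]

tn, fp = cm[0]   # true losses: predicted loss, predicted win
fn, tp = cm[1]   # true wins:   predicted loss, predicted win
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision_win = tp / (tp + fp)    # of predicted wins, how many were wins
recall_win = tp / (tp + fn)       # of actual wins, how many were predicted
precision_loss = tn / (tn + fn)
recall_loss = tn / (tn + fp)

# Comparison point (an addition here): always predict "initiator wins".
always_win_accuracy = (tp + fn) / total

print(round(accuracy, 2), round(precision_win, 2), round(recall_win, 2))
print(round(always_win_accuracy, 2))
```

On these 141 rows the yfrac > 0.5 rule reaches 0.76 accuracy, against roughly 0.70 for always predicting an initiator win.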