Codebook 1
Run this first; after that you can skip ahead to any of the other parts.
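import numpy as np  # np.log is used inside the model formulas below
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('../bdm2s2_nation_year_data_may2002.csv')
# sort by country code, then year, so that shift() later moves within a country's series
df = df.sort_values(by=['ccode', 'year'], ascending=True)
# take out all W,S effects from Polity's 'democracy'; the residuals
# are what remains in 'democracy' after W,S is accounted for
resdem = smf.ols('democ ~ W + S', data=df).fit()
# rows the fit dropped for missing values get NaN (assignment aligns on the index)
df['demres'] = resdem.resid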
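A minimal sketch of what the residual step does, run on made-up data rather than the real file. The frame `toy` and its values are assumptions for illustration; only the column names follow the notebook. After regressing democ on W and S, the residual column should be uncorrelated with both regressors, which is the sense in which the W,S effects have been "taken out".

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (hypothetical values, notebook column names)
rng = np.random.RandomState(0)
toy = pd.DataFrame({'W': rng.uniform(0, 1, 200), 'S': rng.uniform(0, 1, 200)})
toy['democ'] = 10 * toy.W - 3 * toy.S + rng.normal(0, 1, 200)

# Regress democ on W and S and keep the residuals
toy['demres'] = smf.ols('democ ~ W + S', data=toy).fit().resid

# OLS residuals are orthogonal to the regressors, so both correlations
# are zero up to floating-point error
corr_W = toy['demres'].corr(toy['W'])
corr_S = toy['demres'].corr(toy['S'])
print(corr_W, corr_S)
```

Using demres next to W in a later regression therefore asks whether the part of democ not linearly explained by W and S carries any additional signal.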
Growth and lagged change in W
If ccode at t-2 differs from the current row's ccode, the two rows belong to different countries, so the change in W for that row is meaningless; drop those rows.
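df2 = df.copy()
# change in W relative to two rows back (t-2 when a country's years are consecutive)
df2['DW20'] = df2.W - df2.shift(2).W
# squared change: the size of the shift regardless of its direction
df2['polchange'] = df2['DW20']**2
# nonzero (or NaN) when the row two positions back belongs to another country
df2['dummy'] = df2.ccode - df2.shift(2).ccode
df2 = df2[df2.dummy == 0]
print(len(df2))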
19059
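The same within-country lag can also be written with a grouped shift, which never reaches across a country boundary and so needs no helper column. This is only a sketch on a small made-up frame (`toy` and its values are hypothetical), not the notebook's own method. Like the code above, it assumes each country's rows are consecutive years.

```python
import pandas as pd

# Two hypothetical countries with a few years each
toy = pd.DataFrame({
    'ccode': [2, 2, 2, 2, 20, 20, 20],
    'year':  [1990, 1991, 1992, 1993, 1990, 1991, 1992],
    'W':     [0.25, 0.25, 0.50, 0.75, 1.00, 1.00, 0.75],
})
toy = toy.sort_values(['ccode', 'year'])

# Lag W by two rows *within* each country; the first two rows of every
# country get NaN instead of borrowing a value from the previous country
toy['DW20'] = toy['W'] - toy.groupby('ccode')['W'].shift(2)
toy['polchange'] = toy['DW20'] ** 2

# Drop rows with no valid two-year change
toy = toy.dropna(subset=['DW20'])
print(toy)
```

One difference from the dummy-column filter: dropna here also removes rows where W itself is missing, whereas the filter above keeps them and leaves it to the regression to drop them.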
import statsmodels.formula.api as smf
results = smf.ols('WB_growth ~ W + S + DW20 + polchange + np.log(pop)', data=df2).fit()
print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:              WB_growth   R-squared:                       0.005
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     3.315
Date:                Tue, 25 Aug 2015   Prob (F-statistic):            0.00545
Time:                        20:18:07   Log-Likelihood:                -10866.
No. Observations:                3457   AIC:                         2.174e+04
Df Residuals:                    3451   BIC:                         2.178e+04
Df Model:                           5
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept       4.6960      0.550      8.532      0.000         3.617     5.775
W               0.3292      0.382      0.861      0.390        -0.421     1.079
S              -0.7225      0.341     -2.121      0.034        -1.390    -0.055
DW20            1.8791      0.646      2.910      0.004         0.613     3.145
polchange      -2.4816      1.047     -2.371      0.018        -4.534    -0.429
np.log(pop)    -0.0261      0.056     -0.468      0.640        -0.136     0.083
==============================================================================
Omnibus:                      408.738   Durbin-Watson:                   1.512
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4176.816
Skew:                          -0.006   Prob(JB):                         0.00
Kurtosis:                       8.385   Cond. No.                         99.7
==============================================================================
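Because polchange is the square of DW20, the two coefficients above act together: the fitted shift in WB_growth for a two-year change dw in W is 1.8791*dw - 2.4816*dw**2. The sketch below just evaluates that expression at the point estimates. The helper name and the dw values are illustrative assumptions, and with an R-squared of 0.005 and wide confidence intervals the numbers should be read as rough at best.

```python
# Point estimates copied from the summary table above
B_DW20 = 1.8791
B_POLCHANGE = -2.4816

def implied_growth_shift(dw):
    """Fitted change in WB_growth for a two-year change dw in W,
    holding the level of W and the other regressors fixed."""
    return B_DW20 * dw + B_POLCHANGE * dw ** 2

# Illustrative changes in W (hypothetical values)
for dw in (-0.5, -0.25, 0.25, 0.5):
    print(dw, round(implied_growth_shift(dw), 3))

# Value of dw at which the fitted quadratic peaks
peak = -B_DW20 / (2 * B_POLCHANGE)
print(round(peak, 3))
```

At these estimates a decrease in W is associated with lower growth through both terms, while for an increase the squared term offsets part of the linear gain.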
%load_ext rpy2.ipython
%R library(lme4)
import pandas as pd
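# Klepto: absolute gap between TAXGDP and Expenditure
df['Klepto'] = (df['TAXGDP']-df['Expenditure']).abs()
# keep only the columns used in this part; this overwrites df, so rerun the
# first cell before running a part that needs other columns (e.g. Construction)
df = df[['regyr','ccode','aid_gdp','lrgdpc','pop','year','W','S','Klepto','laglrgdpc']]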
import statsmodels.formula.api as smf
results = smf.ols('Klepto ~ W + S + aid_gdp + np.log(pop)', data=df).fit()
print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Klepto   R-squared:                       0.205
Model:                            OLS   Adj. R-squared:                  0.202
Method:                 Least Squares   F-statistic:                     65.82
Date:                Mon, 06 Apr 2015   Prob (F-statistic):           1.43e-49
Time:                        20:29:15   Log-Likelihood:                -3297.9
No. Observations:                1027   AIC:                             6606.
Df Residuals:                    1022   BIC:                             6630.
Df Model:                           4
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept       4.8089      1.223      3.933      0.000         2.410     7.208
W              -3.9128      0.879     -4.453      0.000        -5.637    -2.189
S               3.9783      0.675      5.892      0.000         2.653     5.303
aid_gdp        39.3581      2.858     13.770      0.000        33.750    44.967
np.log(pop)    -0.1965      0.115     -1.702      0.089        -0.423     0.030
==============================================================================
Omnibus:                      510.462   Durbin-Watson:                   0.657
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             3672.715
Skew:                           2.179   Prob(JB):                         0.00
Kurtosis:                      11.176   Cond. No.                         141.
==============================================================================
import statsmodels.formula.api as smf
results = smf.ols('lrgdpc ~ W + S + aid_gdp + np.log(pop) + laglrgdpc', data=df).fit()
print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                 lrgdpc   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                 7.123e+04
Date:                Mon, 06 Apr 2015   Prob (F-statistic):               0.00
Time:                        22:08:39   Log-Likelihood:                 3221.6
No. Observations:                2541   AIC:                            -6431.
Df Residuals:                    2535   BIC:                            -6396.
Df Model:                           5
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept       0.0560      0.020      2.861      0.004         0.018     0.094
W               0.0262      0.007      3.992      0.000         0.013     0.039
S              -0.0086      0.004     -1.916      0.055        -0.017     0.000
aid_gdp        -0.0706      0.021     -3.339      0.001        -0.112    -0.029
np.log(pop)    -0.0012      0.001     -1.323      0.186        -0.003     0.001
laglrgdpc       0.9954      0.002    482.672      0.000         0.991     0.999
==============================================================================
Omnibus:                      476.268   Durbin-Watson:                   1.702
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4091.930
Skew:                          -0.635   Prob(JB):                         0.00
Kurtosis:                       9.086   Cond. No.                         221.
==============================================================================
%R -i df
%R resp_lmer <- lmer(Klepto ~ W + S + log(pop) + aid_gdp + ( 1 | regyr), data = df)
%R -o res res = summary(resp_lmer)
print(res)
Linear mixed model fit by REML ['lmerMod']
Formula: Klepto ~ W + S + log(pop) + aid_gdp + (1 | regyr)
   Data: df

REML criterion at convergence: 6569.8

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.9820 -0.6057 -0.1908  0.3338  7.5056

Random effects:
 Groups   Name        Variance Std.Dev.
 regyr    (Intercept)  5.245   2.290
 Residual             32.296   5.683
Number of obs: 1027, groups:  regyr, 122

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.1930     1.2722   4.868
W            -4.0960     0.8626  -4.748
S             3.5409     0.6541   5.414
log(pop)     -0.2682     0.1175  -2.284
aid_gdp      42.9981     2.9760  14.448

Correlation of Fixed Effects:
         (Intr) W      S      lg(pp)
W        -0.256
S        -0.171 -0.651
log(pop) -0.903  0.195 -0.033
aid_gdp  -0.417  0.071  0.043  0.329
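A random-intercept model of the same shape as the lmer call above, `(1 | regyr)`, can also be fitted without leaving Python through statsmodels' `mixedlm`. The sketch below is an assumption-laden illustration on synthetic data: the frame `toy`, the columns `y` and `x`, and all values are made up. statsmodels and lme4 use different optimizers, so estimates on real data may differ slightly from the R output.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: 30 groups of 20 rows, each group with its own intercept shift
rng = np.random.RandomState(1)
n_groups, n_per = 30, 20
g = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0, 2.0, n_groups)[g]          # group-level random intercepts
x = rng.normal(size=n_groups * n_per)
y = 1.0 + 0.5 * x + u + rng.normal(0, 1.0, n_groups * n_per)
toy = pd.DataFrame({'y': y, 'x': x, 'regyr': g})

# Random intercept per regyr, fitted by REML (as lmer does by default)
fit = smf.mixedlm('y ~ x', data=toy, groups=toy['regyr']).fit(reml=True)
print(fit.summary())

slope = fit.fe_params['x']          # fixed-effect slope
group_var = fit.cov_re.iloc[0, 0]   # variance of the random intercept
```

Passing the real data would mean replacing the formula with the one used in the R cell and `toy['regyr']` with the matching column of df, after dropping rows with missing values.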
%R -i df
%R resp_lmer <- lmer(lrgdpc ~ W + S + log(pop) + aid_gdp + laglrgdpc + ( 1 | regyr), data = df)
%R -o res res = summary(resp_lmer)
print(res)
Linear mixed model fit by REML ['lmerMod']
Formula: lrgdpc ~ W + S + log(pop) + aid_gdp + laglrgdpc + (1 | regyr)
   Data: df

REML criterion at convergence: -6453.3

Scaled residuals:
    Min      1Q  Median      3Q     Max
-9.6023 -0.4385  0.0367  0.5086  4.5498

Random effects:
 Groups   Name        Variance  Std.Dev.
 regyr    (Intercept) 0.0003723 0.01930
 Residual             0.0042945 0.06553
Number of obs: 2541, groups:  regyr, 217

Fixed effects:
              Estimate Std. Error t value
(Intercept)  0.0593032  0.0229343     2.6
W            0.0184700  0.0065761     2.8
S           -0.0042964  0.0043957    -1.0
log(pop)    -0.0017439  0.0009528    -1.8
aid_gdp     -0.0347754  0.0226003    -1.5
laglrgdpc    0.9957743  0.0023887   416.9

Correlation of Fixed Effects:
          (Intr) W      S      lg(pp) ad_gdp
W          0.152
S         -0.138 -0.644
log(pop)  -0.678  0.086 -0.044
aid_gdp   -0.580 -0.058  0.044  0.453
laglrgdpc -0.918 -0.296  0.122  0.370  0.472
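One way to read the "Random effects" blocks of the two lmer fits above is the share of unexplained variance that sits at the regyr level, i.e. the intraclass correlation: group variance divided by group plus residual variance. The sketch below only does that arithmetic on the printed variance components; the helper name `icc` is an illustrative assumption.

```python
def icc(var_group, var_resid):
    """Share of the remaining variance attributable to the grouping factor."""
    return var_group / (var_group + var_resid)

# Variance components copied from the two lmer summaries above
icc_klepto = icc(5.245, 32.296)          # Klepto model
icc_lrgdpc = icc(0.0003723, 0.0042945)   # lrgdpc model

print(round(icc_klepto, 3), round(icc_lrgdpc, 3))
```

Under this reading roughly 14% of the residual variation in Klepto and about 8% in lrgdpc is shared within a regyr group, which gives a sense of how much the random intercept is doing compared with plain OLS.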
Construction
import statsmodels.formula.api as smf
results = smf.ols('build ~ W + np.log(pop) + rgdpch + demres', data=df).fit()
print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                  build   R-squared:                       0.047
Model:                            OLS   Adj. R-squared:                  0.045
Method:                 Least Squares   F-statistic:                     19.00
Date:                Thu, 03 Sep 2015   Prob (F-statistic):           2.91e-15
Time:                        09:54:14   Log-Likelihood:                -7501.5
No. Observations:                1534   AIC:                         1.501e+04
Df Residuals:                    1529   BIC:                         1.504e+04
Df Model:                           4
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------
Intercept     109.3877      5.666     19.306      0.000        98.274   120.502
W             -10.4276      3.533     -2.952      0.003       -17.357    -3.498
np.log(pop)     0.8733      0.552      1.583      0.114        -0.209     1.956
rgdpch          0.0018      0.000      7.663      0.000         0.001     0.002
demres         -0.1573      0.084     -1.880      0.060        -0.321     0.007
==============================================================================
Omnibus:                      444.538   Durbin-Watson:                   0.092
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1421.627
Skew:                           1.436   Prob(JB):                    1.98e-309
Kurtosis:                       6.741   Cond. No.                     5.21e+04
==============================================================================