thirdwave

Github Mirror

Patent Codebook

File 1, File 2, File 3, File 4, File 5, File 6, File 7

TFP is total factor productivity - the portion of output not explained by the amount of inputs used in production. As such, its level is determined by how efficiently and intensely the inputs are utilized in production.

Needs Python 3

Do pip install linearmodels

import pandas as pd
dfcomp = pd.read_stata('compinn_BLS.dta')
dfpi = pd.read_stata('CapitalRentalPriceIndex2000.dta')
dfcs = pd.read_stata('PatentsCompustatImportsRPI.dta')
print (dfpi.columns)
print (dfcomp.columns)
print (dfcs.columns)
Index([u'naics', u'yr1987', u'yr1988', u'yr1989', u'yr1990', u'yr1991',
       u'yr1992', u'yr1993', u'yr1994', u'yr1995', u'yr1996', u'yr1997',
       u'yr1998', u'yr1999', u'yr2000', u'yr2001', u'yr2002', u'yr2003',
       u'yr2004', u'yr2005', u'yr2006', u'yr2007'],
      dtype='object')
Index([u'NAICS', u'year', u'lab_hrs', u'cap', u'cap_sh', u'cap_ind', u'mat',
       u'mat_sh', u'def', u'lab', u'lab_sh', u'tfp', u'lab_pr', u'out_ind',
       u'output', u'import', u'imp_problem', u'cap_stock', u'imp_pen',
       u'impdef', u'lnimp', u'lnimpdef', u'lntfp', u'lnlab_pr'],
      dtype='object')
Index([u'year', u'conm', u'oiadp', u'oibdp', u'ppegt', u'ppent', u'sale',
       u'xad', u'xrd', u'sich', u'sic', u'gvkey', u'allpats', u'allcites',
       u'allcites_cor', u'allnscites', u'allnscites_cor', u'gvkeyag',
       u'gallpats', u'gallcites', u'gallcites_cor', u'gallnscites',
       u'gallnscites_cor', u'gmtchflg', u'sic4', u'imports', u'merge_comp_imp',
       u'emp', u'pay', u'prode', u'prodh', u'prodw', u'vship', u'matcost',
       u'vadd', u'invest', u'invent', u'energy', u'cap', u'equip', u'plant',
       u'piship', u'pimat', u'piinv', u'pien', u'dtfp5', u'tfp5', u'dtfp4',
       u'tfp4', u'share', u'merge_compimp_nber', u'naics6', u'naics4',
       u'naics3', u'naics2', u'crp4', u'crp3', u'crp2', u'crp_index'],
      dtype='object')
dfcs2 = dfcs[['year','allpats','tfp4','naics4']].dropna()
from linearmodels import PanelOLS
dfcs3 = dfcs2.set_index(['year','naics4'])
mod = PanelOLS(dfcs3.tfp4, dfcs3[['allpats']], entity_effects=True)
res = mod.fit(cov_type='clustered', cluster_entity=True)
print (res)
                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                   tfp4   R-squared:                        0.0058
Estimator:                   PanelOLS   R-squared (Between):              0.0439
No. Observations:               57314   R-squared (Within):               0.0058
Date:                Mon, Oct 15 2018   R-squared (Overall):              0.0140
Time:                        21:55:07   Log-likelihood                -1.569e+05
Cov. Estimator:             Clustered                                           
                                        F-statistic:                      334.66
Entities:                          37   P-value                           0.0000
Avg Obs:                       1549.0   Distribution:                 F(1,57276)
Min Obs:                       76.000                                           
Max Obs:                       2328.0   F-statistic (robust):             18.063
                                        P-value                           0.0000
Time periods:                      81   Distribution:                 F(1,57276)
Avg Obs:                       707.58                                           
Min Obs:                       2.0000                                           
Max Obs:                       8456.0                                           

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
allpats        0.0044     0.0010     4.2501     0.0000      0.0023      0.0064
==============================================================================

F-test for Poolability: 114.44
P-value: 0.0000
Distribution: F(36,57276)
print (dfcs3.allpats.mean())
print (dfcs3.tfp4.mean())
print (dfcs3.tfp4.std())
10.96653522699515
1.6352721
3.8797197