thirdwave

Github Mirror

Temperature Increase

The feared 1.5 C increase is reported relative to a global mean temperature between years 1850-1890 which was roughly 13.6°C.

Berkeley Data

The most basic plot looks at Earth's average temparature. We use data from Berkeley, this data is as raw as its gets, looked at the "Land + Ocean (1850 – Recent)" section and used the "Monthly Global Average Temperature (annual summary)" data.

import pandas as pd

anoms = [12.23, 12.45, 13.06, 13.98, 14.95, 15.67, 15.96, 15.79, 15.20, 14.26, 13.24, 12.50]
anoms = np.array(anoms)
df = pd.read_csv('http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt',delim_whitespace=True,comment='%',header=None)
df = df[[0,1,2]]
df.columns = ['year','month','anom']
# data reprsents temp as a combo of base val plus an 'anomaly'
df['temp'] = df.anom + anoms[df.month.astype(int) - 1]
df['dt'] = df.apply(lambda x: pd.to_datetime("%d-%02d-01" %(x.year,x.month)), axis=1)
df1 = df.set_index('dt')
df1 = df1.rolling(70).mean()
df1 = df1.dropna()
df1 = df1[df1.index > '1901-01-01']  

df1.temp.plot()
print (df1.tail(5)[['temp','tempyoy']])
df1.temp.tail(1).plot(style='r.',markersize=10)
df1['tempyoy'] = (df1.temp - df1.temp.shift(12)) / df1.temp.shift(12) * 100.0
df1['temp'].to_csv('global_temperature.csv')
plt.savefig('berkeley-temp.png')
                 temp   tempyoy
dt                             
2022-08-01  14.927229 -0.137335
2022-09-01  14.955557 -0.126215
2022-10-01  14.981657 -0.100308
2022-11-01  14.991786 -0.106611
2022-12-01  14.988243 -0.097030

Increase is quite visible. The Berkeley dataset is updated monthly.

Let's continue with statistical tests. Is the temparature increase real? There are many methods we can use here; Within the context of pure time series methods, the best chance of climate change deniers had was arguing that temperature time series data could be random walk. Example for RW is stock price movement where values "can go up or down, in an unpredictable fashion". So deniers try to chalk up the temperature increase in the past 70 years to this kind of movement. They cannot even argue variations of random walk BTW, it has to be pure random walk, if there is a drift, or a trend there (indicating up movement) they are screwed.

It turns out they are statistical tests for that. We used dataset from GISS, temperature anomalies between 1880-2010 are captured here (anomalies are relative to the 1951-80 base period means). We used ADF test, the implementation we used is able to test for combination of hypothesis' -- pure RW, no pure RW no drift no trend, etc. Well, the test shows GISS data is not pure random walk, it is random walk with a trend (trend stationarity).

At this point deniers can try to shift gears and argue for mean-reversion (opposite of random walk), "what goes up must come down and vice versa" surely there is a certain amount of mean-reversion in this data. Traders love mean-reversion by the way, and if you could trade on temperature data wouldn't you? Hell yeah! Buy low in the winter, sell high in the summer. But the deniers can never prove full-mean reversion on this data, and since the ADF test blew through all threshold values all indicators are screaming out a trend.

There are other methods as well, such as cointegration, that took care of the attribution part of the equation. That final analysis was the one that truly sealed the deal. It is game-over for the deniers.

ACF / PACF

Here we use temparature anomaly data from GISS, anomalies are relative to the 1951-80 base period means.

import statsmodels.api as sm
import pandas as pd
dfclim = pd.read_csv('climate-giss.csv',index_col=0,parse_dates=True)
plt.hold(True)
sm.graphics.tsa.plot_acf(dfclim.Temp.values.squeeze(), lags=50)
plt.savefig('climate_02.png')
plt.hold(False)

import statsmodels.api as sm
plt.hold(True)
sm.graphics.tsa.plot_pacf(dfclim.Temp, lags=50)
plt.savefig('climate_03.png')
plt.hold(False)

Dickey Fuller Tests

%load_ext rpy2.ipython
%R library(urca)
series = dfclim.Temp
%R -i series
%R  adf <- ur.df(series, type = 'trend',selectlags="AIC")
%R -o adfout adfout <- summary(adf)
print adfout

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression trend 


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-80.967  -9.868   0.214  10.370  85.279 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -10.578706   1.163642  -9.091   <2e-16 ***
z.lag.1      -0.277517   0.020894 -13.282   <2e-16 ***
tt            0.014425   0.001428  10.105   <2e-16 ***
z.diff.lag   -0.224780   0.024657  -9.116   <2e-16 ***
---

Residual standard error: 16.81 on 1562 degrees of freedom
Multiple R-squared:  0.2205,    Adjusted R-squared:  0.219 
F-statistic: 147.3 on 3 and 1562 DF,  p-value: < 2.2e-16


Value of test-statistic is: -13.2819 58.8243 88.2169 

Critical values for test statistics: 
      1pct  5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2  6.09  4.68  4.03
phi3  8.27  6.25  5.34

import statsmodels.api as sm
arima_mod1 = sm.tsa.ARIMA(dfclim.Temp, (1,1,2)).fit()
print arima_mod1.aic
13238.7766703
arima_mod2 = sm.tsa.ARIMA(df.Temp, (2,0,1)).fit()
print arima_mod2.aic
13277.4918251
import statsmodels.api as sm
dfclim = pd.read_csv('climate.csv',sep='\s*',header=None,names=['Temp'])
df = dfclim.copy()
df.index = pd.Index(sm.tsa.datetools.dates_from_range('1880m1', '2010m8'))
df.to_csv('climate2.csv')
print dfclim.head(4)
   Temp
0   -42
1   -17
2   -21
3   -33
from numpy import log, polyfit, sqrt, std, subtract

def hurst(ts):
    lags = range(2, 100)
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]
    poly = polyfit(log(lags), log(tau), 1)
    return poly[0]*2.0

print hurst(dfclim.Temp)
0.040611546891

Carbon Levels In the the Atmosphere

Units are mole fraction in ppm, per month

import urllib.request as urllib2, io
import pandas as pd

url = "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt"
r = urllib2.urlopen(url).read()
file = io.BytesIO(r)
df = pd.read_csv(file,comment='#',header=None,sep='\s*')
df['Date'] = df.apply(lambda x: str(int(x[0])) + "-" + str(int(x[1])) + "-1", axis=1)
df['Date'] = pd.to_datetime(df.Date)
df['ppm'] = df[3]
df = df.set_index('Date')
df[df.index > "2018-01-01"]['ppm'].rolling(10).mean().plot(title='Carbon Level (mole fraction in ppm)')
plt.savefig('carbon.png')
df['ppm'].rolling(10).mean().plot(color='blue',title='Carbon Levels (mole fraction in ppm)')
print (df.ppm.tail(5))
plt.savefig('carbon2.png')
Date
2023-11-01    420.46
2023-12-01    421.86
2024-01-01    422.80
2024-02-01    424.55
2024-03-01    425.38
Name: ppm, dtype: float64

Longer time span, since the 50s

Monthly Carbon YoY Increase

df['ppmyoy'] = (df.ppm - df.ppm.shift(12)) / df.ppm.shift(12) * 100.0
print (df['ppmyoy'].dropna().head(4))
print (df['ppmyoy'].tail(4))
print (df['ppmyoy'].dropna().mean())
Date
1959-03-01    0.300919
1959-04-01    0.085053
1959-05-01    0.245662
1959-06-01    0.286849
Name: ppmyoy, dtype: float64
Date
2023-12-01    0.684981
2024-01-01    0.791456
2024-02-01    1.011182
2024-03-01    1.042780
Name: ppmyoy, dtype: float64
0.44902725303018143
df['ppmyoy'].plot(title="Carbon Level Increase (YoY)")
plt.savefig('carbon3.png')

Methane Levels In the the Atmosphere

Data comes from NOAA units are nanomol/mol, abbreviated as ppb.

cols = ['year','month','decimal','average','average_unc','trend','trend_unc']
url = "https://gml.noaa.gov/webdata/ccgg/trends/ch4/ch4_mm_gl.txt"
import pandas as pd
df = pd.read_csv(url,comment='#',sep='\s*',header=None)
df.columns = cols
df = df.set_index('decimal')
df['dt'] = df.apply(lambda x: pd.to_datetime("%d-%02d-01" %(x.year,x.month)), axis=1)
df[['dt','average']].to_csv('methane.csv',index=None)
df1 = df.set_index('dt')
df1.average.plot(title='Methane Level (nanomol/mol)')
print (df1.average.tail(6))
plt.legend(['nanomol/mol (ppb)'])
plt.savefig('methane.png')
dt
2023-07-01    1914.10
2023-08-01    1918.07
2023-09-01    1925.64
2023-10-01    1931.03
2023-11-01    1932.05
2023-12-01    1932.23
Name: average, dtype: float64

Carbon and Temperature

Plotted carbon levels in the atmosphere and global temperature, trying to gauge a relation between the two. Carbon data comes from here.

import urllib.request as urllib2, io
import pandas as pd

url = "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt"
r = urllib2.urlopen(url).read()
file = io.BytesIO(r)
df = pd.read_csv(file,comment='#',header=None,sep='\s*')
df['dt'] = df.apply(lambda x: pd.to_datetime("%d-%02d-01" %(x[0],x[1])), axis=1)
df[['dt',3]].to_csv('carbon.csv',index=None)
g1 = df.groupby(0)[3].mean()
dfc = pd.read_csv('climate-giss.csv',index_col=0,parse_dates=True)
dfc['year'] = dfc.apply(lambda x: x.name.year,axis=1)
dfc['mon'] = dfc.apply(lambda x: x.name.month,axis=1)
dfc['TempC'] = ((dfc.Temp-32)*5.0)/9.0
g2 = dfc.groupby('year')['TempC'].mean()
g = pd.concat([g1, g2], axis=1).dropna()
g.columns = ['Carbon','Temparature']

ax1 = g['Carbon'].plot(color='blue', grid=True, label='Carbon (ppm)')
ax2 = g['Temparature'].plot(color='red', grid=True, label='Temparature (C)',secondary_y=True)
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
plt.legend(h1+h2, l1+l2, loc=2)
plt.savefig('carbontemp.png')

Strong correlation, but does that mean causation?

Running a Granger causality test, which tries to reject the hypothesis that second time series (carbon) does not cause the first series (temperature).

import statsmodels.tsa.stattools as t
res = t.grangercausalitytests(g[['Temparature','Carbon']],maxlag=3)
print (res)

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=34.1256 , p=0.0000  , df_denom=49, df_num=1
ssr based chi2 test:   chi2=36.2149 , p=0.0000  , df=1
likelihood ratio test: chi2=27.4837 , p=0.0000  , df=1
parameter F test:         F=34.1256 , p=0.0000  , df_denom=49, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=11.4685 , p=0.0001  , df_denom=46, df_num=2
ssr based chi2 test:   chi2=25.4301 , p=0.0000  , df=2
likelihood ratio test: chi2=20.6321 , p=0.0000  , df=2
parameter F test:         F=11.4685 , p=0.0001  , df_denom=46, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=5.3519  , p=0.0032  , df_denom=43, df_num=3
ssr based chi2 test:   chi2=18.6695 , p=0.0003  , df=3
likelihood ratio test: chi2=15.8641 , p=0.0012  , df=3
parameter F test:         F=5.3519  , p=0.0032  , df_denom=43, df_num=3
{1: ({'ssr_ftest': (34.12560659900744, 4.0997667759058264e-07, 49.0, 1), 'ssr_chi2test': (36.214929452007894, 1.7671160136200755e-09, 1), 'lrtest': (27.483689910062537, 1.584249217215468e-07, 1), 'params_ftest': (34.12560659900759, 4.099766775905632e-07, 49.0, 1.0)}, [<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f54b4946518>, <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f54b4946630>, array([[0., 1., 0.]])]), 2: ({'ssr_ftest': (11.468479812558117, 9.099787320501968e-05, 46.0, 2), 'ssr_chi2test': (25.43010741045496, 3.005538798970397e-06, 2), 'lrtest': (20.632104155556192, 3.309752413988568e-05, 2), 'params_ftest': (11.468479812556222, 9.099787320513459e-05, 46.0, 2.0)}, [<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f54b4946898>, <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f54b49469b0>, array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])]), 3: ({'ssr_ftest': (5.351917020399335, 0.003196324450636187, 43.0, 3), 'ssr_chi2test': (18.669477978137216, 0.0003199702464963456, 3), 'lrtest': (15.864090764221487, 0.0012091064117637019, 3), 'params_ftest': (5.351917020398325, 0.0031963244506395256, 43.0, 3.0)}, [<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f54b4946c50>, <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f54b4946d68>, array([[0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0.]])])}

The hypothesis is rejected at a very strong level. Carbon content in atmo did cause a global increase in temperature.

Carbon, Methane Effects

Carbon or methane.. which one caused more global temperature increase? Below is the correlation matrix between all three time series', temperature, carbon, and methane levels.

import pandas as pd

df1 = pd.read_csv('global_temperature.csv',index_col='dt',parse_dates=True)
df1 = df1.rolling(50).mean()
df2 = pd.read_csv('carbon.csv',index_col='dt',parse_dates=True)
df3 = pd.read_csv('methane.csv',index_col='dt',parse_dates=True)

df4 = pd.merge(df1,df2,left_index=True, right_index=True,how='left')
df5 = pd.merge(df4,df3,left_index=True, right_index=True,how='left')
df5 = df5.interpolate(method='linear',limit_direction='backward')
df5.columns = ['temp','carbon','methane']

print (df5.corr())
             temp    carbon   methane
temp     1.000000  0.497317  0.489216
carbon   0.497317  1.000000  0.974705
methane  0.489216  0.974705  1.000000

Carbon and methane equally contributed to global warming.