pyg.timeseries.ewma

The ewm functions support a notion of time (a clock), which we think is worth explaining. We start with an example.

[1]:
from pyg import *; import numpy as np; import pandas as pd
rtn = pd.Series(np.random.normal(0.01,1,5218), drange(2000,2020, '1b'))
price = cumsum(rtn); price.name = 'price'
smooth = ewma(price, 50); smooth.name = 'smooth'
pd.concat([price, smooth], axis = 1).plot()
[1]:
<AxesSubplot:>
../_images/lab_tutorial_ts_ewma_time_1_1.png
[2]:
## now suppose somehow we lost 4 years of data...
bad = price.copy()
bad[dt(2008):dt(2012)] = np.nan
smooth_bad = ewma(bad, 50); smooth_bad.name = 'smooth_bad'
pd.concat([bad, smooth_bad], axis = 1).plot()
[2]:
<AxesSubplot:>
../_images/lab_tutorial_ts_ewma_time_2_1.png
[8]:
smooth_with_time = ewma(bad, 50, time = 'b')
smooth_with_time.name = 'smooth_with_business_day_clock'
pd.concat([bad, smooth_bad, smooth_with_time], axis = 1).plot()
[8]:
<AxesSubplot:>
../_images/lab_tutorial_ts_ewma_time_3_1.png

What happened here? How can the smooth with a clock track better? The answer is that when you provide a clock, ewma can recognise that 4 years have passed. The old data is no longer relevant, so ewma forgets the old position and starts with most of the weight on the more recent observations.
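The forgetting effect can be sketched in plain Python. This is only an illustration of the idea, not pyg's implementation: the helper name `ewma_with_clock` is hypothetical, and the per-tick decay factor `w = 1 - 1/(1 + n)` is an assumption that may not match pyg's convention. The clock is assumed to be strictly increasing here.

```python
def ewma_with_clock(values, clock, n):
    """Weighted average whose weights decay per unit of clock time.

    Hypothetical helper for illustration only.
    Assumption: the decay per clock tick is w = 1 - 1/(1 + n).
    """
    w = 1.0 - 1.0 / (1.0 + n)
    num = den = 0.0          # running weighted sum and sum of weights
    prev_t = None
    out = []
    for x, t in zip(values, clock):
        if prev_t is not None:
            decay = w ** (t - prev_t)   # the longer the gap, the stronger the decay
            num *= decay
            den *= decay
        num += x
        den += 1.0
        prev_t = t
        out.append(num / den)
    return out

# 200 observations at level 100, then 5 observations at level 200
values = [100.0] * 200 + [200.0] * 5

# clock that ticks once per observation: the gap is invisible
clock_by_observation = list(range(205))
# clock that knows 1000 extra ticks passed before the last 5 observations
clock_by_time = list(range(200)) + [1200 + i for i in range(5)]

without_gap = ewma_with_clock(values, clock_by_observation, 50)
with_gap = ewma_with_clock(values, clock_by_time, 50)
print(without_gap[-1], with_gap[-1])
```

With the observation clock the average is still dominated by the old level, while with the time clock the old observations carry almost no weight and the average sits at the new level.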

What happens if the clock does not move at all?

Suppose we now want to calculate a daily ewma, but using intraday data. If the number of data points per day is constant and known, this is easy. Using the time parameter, we can do this even for an irregularly spaced timeseries. We first create some fake data:

[4]:
import datetime
bar = datetime.timedelta(minutes = 5)

## 5-minute bars restricted to trading hours (08:00 to 17:55)
bars = np.array([t for t in drange(2020, 2021, bar) if 8 <= t.hour <= 17])
## keep each bar with probability 1/2, giving an irregularly spaced index
irregular = bars[np.random.normal(0,1,len(bars))>0]

ts = pd.Series(np.random.normal(0.01/24, 1, len(bars)), bars)
price = cumsum(ts); price.name = 'intraday'
price = price[irregular] ## roughly half the bars remain

days = drange(2020,2021,1)

daily = price.reindex(days, method = 'ffill'); daily.name = 'daily'
pd.concat([price, daily], axis = 1).ffill().plot()
[4]:
<AxesSubplot:>
../_images/lab_tutorial_ts_ewma_time_5_1.png
  • By setting the clock to daily, we tell ewma that all the intraday data within the same day are ‘on the same clock’. It will use historic end-of-day prices and keep updating today's last point until the clock moves on to tomorrow’s reading

  • If we set the clock to fraction of day, it will update continuously throughout the day
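The daily-clock behaviour described above can be sketched in plain Python. Again this is an illustration rather than pyg's code: `ewma_stepped_clock` is a hypothetical name and the decay factor `w = 1 - 1/(1 + n)` is an assumption. Observations that share a clock value overwrite each other, and only the last one of each tick is committed to history once the clock moves.

```python
def ewma_stepped_clock(values, clock, n):
    """EWMA on a non-decreasing clock that may repeat values.

    Hypothetical helper for illustration only.
    Assumption: the decay per clock tick is w = 1 - 1/(1 + n).
    """
    w = 1.0 - 1.0 / (1.0 + n)
    num = den = 0.0              # history from completed ticks only
    last_x = last_t = None       # latest observation of the current tick
    out = []
    for x, t in zip(values, clock):
        if last_t is not None and t > last_t:
            # the clock moved: commit the previous tick's last point, then decay
            decay = w ** (t - last_t)
            num = (num + last_x) * decay
            den = (den + 1.0) * decay
        last_x, last_t = x, t
        # the current point is provisional until the clock moves again
        out.append((num + x) / (den + 1.0))
    return out

# three intraday points on each of two days, all sharing the day's clock value
intraday_values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
day_of_each_point = [0, 0, 0, 1, 1, 1]
intraday = ewma_stepped_clock(intraday_values, day_of_each_point, 20)

# the same calculation using end-of-day values only
daily = ewma_stepped_clock([3.0, 30.0], [0, 1], 20)
print(intraday, daily)
```

In this sketch the intraday series moves with every new point, yet its value at the last point of each day coincides with the daily calculation.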

[5]:
smooth_intra = ewma(price, 20, time = 'f'); smooth_intra.name = 'smooth_intraday' ## roughly matching using irregular bars
smooth_intra_using_d = ewma(price, 20, time = 'd'); smooth_intra_using_d.name = 'smooth_intraday_using_daily_clock'
smooth_daily = ewma(daily, 20); smooth_daily.name = 'smooth_daily'
pd.concat([smooth_daily, smooth_intra_using_d, smooth_intra], axis = 1).ffill()[dt(2020,12,15):].plot()
[5]:
<AxesSubplot:>
../_images/lab_tutorial_ts_ewma_time_7_1.png
  • smooth_daily is calculated on a daily basis: it is constant within the day and jumps at end-of-day

  • The time = 'd' option front-runs daily, but at the price of being more volatile intraday. At end-of-day the two versions agree

  • time = 'f' is a smoother version of daily. It is leading, but not by much

[7]:
pd.concat([smooth_intra_using_d.reindex(days, method = 'ffill'), smooth_daily], axis = 1)
## on end-of-day we have an exact match between time = 'd' and daily smooth
[7]:
            smooth_intraday_using_daily_clock  smooth_daily
2020-01-01                                NaN           NaN
2020-01-02                          16.752182     16.752182
2020-01-03                           9.725194      9.725194
2020-01-04                           5.864732      5.864732
2020-01-05                           0.578778      0.578778
...                                       ...           ...
2020-12-28                         280.375383    280.375383
2020-12-29                         282.254090    282.254090
2020-12-30                         283.934041    283.934041
2020-12-31                         285.343263    285.343263
2021-01-01                         286.681044    286.681044

367 rows × 2 columns

What are valid time parameters?

  • None: every non-nan observation counts as one tick of the clock

  • i: the index of the timeseries. The clock also ticks for nan observations. This is the default for pandas

  • f: fraction of day

  • b/d/w/m/q/y: business day/daily/weekly/monthly/quarterly or yearly

  • Calendar: the business day as defined by the calendar provided

  • For full control, you can provide a timeseries of non-decreasing times matching the original array
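As a rough illustration of what a custom clock might look like, the sketch below builds two non-decreasing numeric clocks from timestamps using only the standard library. The helper names `day_clock` and `fraction_clock` are hypothetical, and the exact types pyg accepts for the time argument are not verified here; the only requirement taken from the text above is that the times are non-decreasing and match the original array in length.

```python
import datetime

def day_clock(timestamps):
    """One clock value per calendar day: all points in a day share a tick.

    Hypothetical helper for illustration only.
    """
    return [t.toordinal() for t in timestamps]

def fraction_clock(timestamps):
    """Day number plus the fraction of the day that has elapsed.

    Hypothetical helper for illustration only.
    """
    return [
        t.toordinal() + (t.hour * 3600 + t.minute * 60 + t.second) / 86400.0
        for t in timestamps
    ]

stamps = [
    datetime.datetime(2020, 1, 1, 8, 0),
    datetime.datetime(2020, 1, 1, 12, 0),
    datetime.datetime(2020, 1, 2, 8, 0),
]
by_day = day_clock(stamps)
by_fraction = fraction_clock(stamps)
print(by_day, by_fraction)
```

The day clock stands still within a day and steps by one overnight, while the fractional clock advances with every timestamp.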
