pyg.timeseries decorators
There are a few decorators that are relevant to timeseries analysis.
pd2np and compiled
We write most of our underlying functions assuming the function parameters are 1-d numpy arrays. If you want them numba.jit compiled, please use the compiled decorator.
[1]:
from pyg import *
import pandas as pd; import numpy as np
@pd2np
@compiled
def sumsq(a, total = 0.0):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        if np.isnan(a[i]):
            res[i] = np.nan
        else:
            total += a[i]**2
            res[i] = total
    return res
It is not surprising that sumsq works for arrays. Note the np.isnan check: nan entries stay nan in the output and are left out of the running total.
[2]:
a = np.arange(5)
sumsq(a)
[2]:
array([ 0, 1, 5, 14, 30])
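To see the nan handling in isolation, the sketch below runs a plain, uncompiled copy of the same loop on a float array containing a nan. It assumes only numpy; the name sumsq_plain is ours, and neither pyg nor numba is needed to run it.

```python
import numpy as np

def sumsq_plain(a, total=0.0):
    """Uncompiled copy of sumsq (hypothetical name), for illustration only."""
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        if np.isnan(a[i]):
            res[i] = np.nan      # nan in, nan out
        else:
            total += a[i] ** 2   # only valid observations update the running total
            res[i] = total
    return res

x = np.array([1.0, np.nan, 2.0, 3.0])
res = sumsq_plain(x)
print(res)  # the running total resumes after the nan: 1, nan, 5, 14
```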
pd2np converts a pandas Series to a numpy array, runs the function and converts the result back to pandas. This only works for 1-dimensional objects, so neither a DataFrame nor a 2-d np.ndarray is supported.
[3]:
s = pd.Series(a, drange(-4))
sumsq(s)
[3]:
2021-02-27 0
2021-02-28 1
2021-03-01 5
2021-03-02 14
2021-03-03 30
dtype: int32
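The round trip can be pictured with a minimal decorator written in plain pandas. This is a sketch of the idea only, not pyg's implementation: series_to_array and cumulative_sumsq are hypothetical names, and the real pd2np may handle more cases.

```python
import functools
import numpy as np
import pandas as pd

def series_to_array(function):
    """Hypothetical stand-in for pd2np: unwrap a Series, call, then rewrap."""
    @functools.wraps(function)
    def wrapped(a, *args, **kwargs):
        if isinstance(a, pd.Series):
            res = function(a.values, *args, **kwargs)
            # restore the original index and name on the way out
            return pd.Series(res, index=a.index, name=a.name)
        return function(a, *args, **kwargs)
    return wrapped

@series_to_array
def cumulative_sumsq(a):
    # written for a 1-d numpy array only
    return np.cumsum(a ** 2)

s = pd.Series(np.arange(5), pd.date_range("2021-02-27", periods=5))
out = cumulative_sumsq(s)
```

The wrapped function never sees the index, which is why it can be written (and compiled) against bare arrays.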
loop
We now decorate sumsq with the loop decorator. Once we introduce loop, the function will loop over the columns of a DataFrame or a 2-d numpy array:
[4]:
@loop(pd.DataFrame, dict, list, np.ndarray)
@pd2np
@compiled
def sumsq(a, total = 0):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        if np.isnan(a[i]):
            res[i] = np.nan
        else:
            total += a[i]**2
            res[i] = total
    return res
df = pd.DataFrame(dict(a = a, b = a+1), drange(-4))
df
[4]:
| | a | b |
|---|---|---|
| 2021-02-27 | 0 | 1 |
| 2021-02-28 | 1 | 2 |
| 2021-03-01 | 2 | 3 |
| 2021-03-02 | 3 | 4 |
| 2021-03-03 | 4 | 5 |
[5]:
sumsq(df)
[5]:
| | a | b |
|---|---|---|
| 2021-02-27 | 0 | 1 |
| 2021-02-28 | 1 | 5 |
| 2021-03-01 | 5 | 14 |
| 2021-03-02 | 14 | 30 |
| 2021-03-03 | 30 | 55 |
The same holds for dict, list and 2-d numpy arrays, since we asked loop to handle those types as well:
[6]:
sumsq(dict(a = a, b = a+1))
[6]:
{'a': array([ 0, 1, 5, 14, 30]), 'b': array([ 1, 5, 14, 30, 55])}
[7]:
sumsq(df.values)
[7]:
array([[ 0, 1],
[ 1, 5],
[ 5, 14],
[14, 30],
[30, 55]])
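The column-by-column dispatch shown above can be sketched without pyg. The decorator below is a simplified illustration under our own assumptions (loop_columns and cumulative_sumsq are hypothetical names, and the real loop takes the types to loop over as arguments); it applies a 1-d function to each column of a DataFrame, each value of a dict or list, and each column of a 2-d array.

```python
import functools
import numpy as np
import pandas as pd

def loop_columns(function):
    """Hypothetical stand-in for loop: apply a 1-d function column by column."""
    @functools.wraps(function)
    def wrapped(a, *args, **kwargs):
        if isinstance(a, pd.DataFrame):
            columns = {col: function(a[col].values, *args, **kwargs) for col in a.columns}
            return pd.DataFrame(columns, index=a.index)
        if isinstance(a, dict):
            return {key: wrapped(value, *args, **kwargs) for key, value in a.items()}
        if isinstance(a, list):
            return [wrapped(value, *args, **kwargs) for value in a]
        if isinstance(a, np.ndarray) and a.ndim == 2:
            results = [function(a[:, j], *args, **kwargs) for j in range(a.shape[1])]
            return np.column_stack(results)
        return function(a, *args, **kwargs)  # already 1-d
    return wrapped

@loop_columns
def cumulative_sumsq(a):
    return np.cumsum(a ** 2)

a = np.arange(5)
df = pd.DataFrame({"a": a, "b": a + 1}, index=pd.date_range("2021-02-27", periods=5))
by_frame = cumulative_sumsq(df)
by_dict = cumulative_sumsq({"a": a, "b": a + 1})
by_array = cumulative_sumsq(df.values)
```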
presync: manage indexing and date stamps
Suppose the function takes two (or more) timeseries.
[8]:
@presync(index = 'inner')
@loop(pd.DataFrame, np.ndarray)
@pd2np
def product(a, b):
    return a * b
[9]:
a = np.arange(5); b = np.arange(5)
product(a,b)
[9]:
array([ 0, 1, 4, 9, 16])
What happens when the two timeseries are unsynchronized?
[10]:
a_ = pd.Series(a, drange(-4)) ; a_.name = 'a'
b_ = pd.Series(b, drange(-3,1)); b_.name = 'b'
pd.concat([a_, b_], axis=1)
[10]:
| | a | b |
|---|---|---|
| 2021-02-27 | 0.0 | NaN |
| 2021-02-28 | 1.0 | 0.0 |
| 2021-03-01 | 2.0 | 1.0 |
| 2021-03-02 | 3.0 | 2.0 |
| 2021-03-03 | 4.0 | 3.0 |
| 2021-03-04 | NaN | 4.0 |
[11]:
product(a_, b_) ## just the inner values
[11]:
2021-02-28 0
2021-03-01 2
2021-03-02 6
2021-03-03 12
Freq: D, dtype: int32
[12]:
product.oj(a_, b_) ## outer join
[12]:
2021-02-27 NaN
2021-02-28 0.0
2021-03-01 2.0
2021-03-02 6.0
2021-03-03 12.0
2021-03-04 NaN
Freq: D, dtype: float64
[13]:
product.oj.ffill(a_, b_) ## outer join and forward-fill
[13]:
2021-02-27 NaN
2021-02-28 0.0
2021-03-01 2.0
2021-03-02 6.0
2021-03-03 12.0
2021-03-04 16.0
Freq: D, dtype: float64
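For readers without pyg at hand, the three behaviours above (inner join, outer join, outer join with forward-fill) can be reproduced with plain pandas. This is only a sketch of the alignment step using pd.concat; it makes no claim about how presync is implemented internally.

```python
import numpy as np
import pandas as pd

a_ = pd.Series(np.arange(5), pd.date_range("2021-02-27", periods=5), name="a")
b_ = pd.Series(np.arange(5), pd.date_range("2021-02-28", periods=5), name="b")

# Align first, then multiply the aligned columns.
inner = pd.concat([a_, b_], axis=1, join="inner")   # dates present in both
outer = pd.concat([a_, b_], axis=1, join="outer")   # union of dates, gaps are nan
filled = outer.ffill()                              # carry the last value forward

inner_product = inner["a"] * inner["b"]
outer_product = outer["a"] * outer["b"]
filled_product = filled["a"] * filled["b"]
```

The first date stays nan even after forward-filling because b has no earlier value to carry forward, while the last date becomes 4 * 4 = 16 once a is filled.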
presync and numpy arrays
When we deal with thousands of equities, one way of speeding calculations is by stacking them all onto huge dataframes. This does work but one is always busy fiddling with ‘the universe’ one is trading. We took a slightly different approach:
1. We define a global timestamp.
2. We then sample each timeseries to that global timestamp, dropping the early history where the data is all nan: df_fillna(ts, index, method = 'fnna').
3. We then do our research on these numpy arrays.
4. Finally, once we are done, we resample back to the global timestamp.
While we are in numpy arrays, we can ‘inner join’ by recognising that the ‘end’ of each array shares the same date. Indeed df_index, df_reindex and presync all work seamlessly on np.ndarray as well as DataFrames, under the assumption that the ends of all arrays are in sync.
We find this approach saves on memory and on computation time. It also lends itself to being able to retrieve and create specific universes for specific trading ideas. It is not without its own issues but that is a separate discussion.
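The workflow described above can be sketched for a single series using plain pandas. The calls below stand in for the pyg helpers (df_fillna, np_reindex) and are assumptions about the idea rather than a description of what those functions do in detail; the ten-day index and the toy series are made up for illustration.

```python
import numpy as np
import pandas as pd

# Step 1: a global timestamp (here just ten business days).
dates = pd.bdate_range("2021-01-04", periods=10)

# A series that starts late and has gaps relative to the global timestamp.
ts = pd.Series([1.0, 2.0, 3.0], index=dates[[4, 6, 9]])

# Step 2: sample onto the global timestamp and drop the leading all-nan history.
sampled = ts.reindex(dates)
values = sampled.loc[sampled.first_valid_index():].to_numpy()

# Step 3: do the research on the bare array, whose last row is dates[-1].
doubled = 2 * values

# Step 4: go back to the global timestamp, labelling rows from the end.
tail = dates[len(dates) - len(doubled):]
result = pd.Series(doubled, index=tail).reindex(dates)
```

Because every array ends on the same date, its length alone is enough to recover the dates it covers.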
[14]:
a = np.arange(5); b = np.arange(1,5)
a, b
[14]:
(array([0, 1, 2, 3, 4]), array([1, 2, 3, 4]))
[15]:
product(a, b)
[15]:
array([ 1, 4, 9, 16])
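The result above is what one gets by trimming both arrays to the length of the shorter one, counting from the end. A minimal sketch of that end-aligned inner join, with align_ends being a hypothetical helper of ours rather than a pyg function:

```python
import numpy as np

def align_ends(*arrays):
    """Hypothetical helper: keep the last n rows of each array, n = shortest length."""
    n = min(len(arr) for arr in arrays)
    # slice from len(arr) - n rather than -n, so that n == 0 yields an empty array
    return tuple(arr[len(arr) - n:] for arr in arrays)

a = np.arange(5)
b = np.arange(1, 5)
a_tail, b_tail = align_ends(a, b)
prod = a_tail * b_tail   # the leading 0 of a is dropped before multiplying
```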
[16]:
us = calendar('US')
dates = pd.Index(us.drange('-40y', 0 ,'1b'))
[17]:
universe = dictable(stock = ['msft', 'appl', 'tsla'], n = [10000, 8000, 7000])
universe = universe(ts = lambda n: pd.Series(np.random.normal(0,1,n+1), us.drange('-%ib'%n, 0, '1b'))[np.random.normal(0,1,n+1)>-1])
universe
[17]:
dictable[3 x 3]
stock|n |ts
msft |10000|1982-11-03 -1.309868
| |1982-11-04 -0.737816
| |1982-11-05 0.460173
| |1982-11-08 -0.895898
| |1982-11-09 -0.813305
appl |8000 |1990-07-04 0.040855
| |1990-07-05 -1.327995
| |1990-07-06 0.114328
| |1990-07-09 -1.626176
| |1990-07-10 -0.031428
tsla |7000 |1994-05-04 -1.259911
| |1994-05-05 1.014304
| |1994-05-09 -0.035104
| |1994-05-10 -1.265964
| |1994-05-11 -0.001664
[18]:
universe = universe(rtn = lambda ts: ts.values)
universe = universe(price = lambda rtn : cumsum(rtn))
universe = universe(vol = lambda rtn: ewmstd(rtn, 30))
universe
[18]:
dictable[3 x 6]
stock|n |ts |rtn |price |vol
msft |10000|1982-11-03 -1.309868|[-1.3098679 -0.73781612 0.4601727 ... -0.327291|[-1.3098679 -2.04768402 -1.58751132 ... 4.750977|[ nan nan nan ... 1.02923517 1
| |1982-11-04 -0.737816| 0.67289106] | 5.89220017] |
| |1982-11-05 0.460173| | |
| |1982-11-08 -0.895898| | |
| |1982-11-09 -0.813305| | |
appl |8000 |1990-07-04 0.040855|[ 0.04085499 -1.32799499 0.11432766 ... -1.017795|[ 4.08549924e-02 -1.28714000e+00 -1.17281234e+00 .|[ nan nan nan ... 0.88535052 0
| |1990-07-05 -1.327995| -0.82540937] | 7.67908570e+01 7.59654476e+01] |
| |1990-07-06 0.114328| | |
| |1990-07-09 -1.626176| | |
| |1990-07-10 -0.031428| | |
tsla |7000 |1994-05-04 -1.259911|[-1.25991126 1.01430418 -0.0351036 ... -0.174814|[-1.25991126 -0.24560708 -0.28071068 ... 24.331768|[ nan nan nan ... 0.94944115 0
| |1994-05-05 1.014304| -0.69279468] | 23.93415613] |
| |1994-05-09 -0.035104| | |
| |1994-05-10 -1.265964| | |
| |1994-05-11 -0.001664| | |
[19]:
presync(lambda tss: np.array(tss).T)(universe.vol)
[19]:
array([[1.01584217, 0.95105069, nan],
[1.02939552, 0.99139701, nan],
[1.01323584, 0.97982437, nan],
...,
[1.02923517, 0.88535052, 0.94944115],
[1.018515 , 0.91252795, 0.93434464],
[1.01216505, 0.91053212, 0.93155713]])
[20]:
universe = universe.do(lambda value: np_reindex(value, dates), 'rtn', 'price', 'vol')
universe
[20]:
dictable[3 x 6]
stock|n |ts |rtn |price |vol
msft |10000|1982-11-03 -1.309868|1988-11-21 -1.309868|1988-11-21 -1.309868 |1988-11-21 NaN
| |1982-11-04 -0.737816|1988-11-22 -0.737816|1988-11-22 -2.047684 |1988-11-22 NaN
| |1982-11-05 0.460173|1988-11-23 0.460173|1988-11-23 -1.587511 |1988-11-23 NaN
| |1982-11-08 -0.895898|1988-11-24 -0.895898|1988-11-24 -2.483409 |1988-11-24 NaN
| |1982-11-09 -0.813305|1988-11-25 -0.813305|1988-11-25 -3.296714 |1988-11-25 NaN
appl |8000 |1990-07-04 0.040855|1995-04-20 0.040855|1995-04-20 0.040855|1995-04-20 NaN
| |1990-07-05 -1.327995|1995-04-21 -1.327995|1995-04-21 -1.287140|1995-04-21 NaN
| |1990-07-06 0.114328|1995-04-24 0.114328|1995-04-24 -1.172812|1995-04-24 NaN
| |1990-07-09 -1.626176|1995-04-25 -1.626176|1995-04-25 -2.798988|1995-04-25 NaN
| |1990-07-10 -0.031428|1995-04-26 -0.031428|1995-04-26 -2.830417|1995-04-26 NaN
tsla |7000 |1994-05-04 -1.259911|1998-09-11 -1.259911|1998-09-11 -1.259911|1998-09-11 NaN
| |1994-05-05 1.014304|1998-09-14 1.014304|1998-09-14 -0.245607|1998-09-14 NaN
| |1994-05-09 -0.035104|1998-09-15 -0.035104|1998-09-15 -0.280711|1998-09-15 NaN
| |1994-05-10 -1.265964|1998-09-16 -1.265964|1998-09-16 -1.546674|1998-09-16 NaN
| |1994-05-11 -0.001664|1998-09-17 -0.001664|1998-09-17 -1.548338|1998-09-17 NaN
[21]:
vol = pd.concat(universe.vol, axis = 1); vol.columns = universe.stock
vol
[21]:
| | msft | appl | tsla |
|---|---|---|---|
| 1988-11-21 | NaN | NaN | NaN |
| 1988-11-22 | NaN | NaN | NaN |
| 1988-11-23 | NaN | NaN | NaN |
| 1988-11-24 | NaN | NaN | NaN |
| 1988-11-25 | NaN | NaN | NaN |
| ... | ... | ... | ... |
| 2021-02-25 | 1.063016 | 0.890791 | 0.931185 |
| 2021-02-26 | 1.045735 | 0.880376 | 0.963182 |
| 2021-03-01 | 1.029235 | 0.885351 | 0.949441 |
| 2021-03-02 | 1.018515 | 0.912528 | 0.934345 |
| 2021-03-03 | 1.012165 | 0.910532 | 0.931557 |
8423 rows × 3 columns
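The last two cells turn end-aligned arrays of different lengths back into one DataFrame. A small sketch of that step in plain pandas, assuming only that each array's final row corresponds to the final date; label_from_end and the toy arrays are hypothetical and stand in for np_reindex and the universe columns.

```python
import numpy as np
import pandas as pd

def label_from_end(values, index):
    """Hypothetical helper: attach the last len(values) labels of index to values."""
    return pd.Series(values, index=index[len(index) - len(values):])

dates = pd.bdate_range("2021-02-22", periods=6)
arrays = {"long": np.arange(6.0), "short": np.arange(3.0)}

# Each array gets its own tail of the global index; concat then pads with nan.
frame = pd.concat([label_from_end(v, dates) for v in arrays.values()], axis=1)
frame.columns = list(arrays)
```

The shorter history shows up as leading nan rows, as in the vol table above.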