pyg.timeseries decorators¶

There are a few decorators that are relevant to timeseries analysis.

pd2np and compiled¶

We write most of our underlying functions assuming the function parameters are 1-d numpy arrays. If you want them numba.jit compiled, please use the compiled decorator.

[1]:

from pyg import *
import pandas as pd; import numpy as np
@pd2np
@compiled
def sumsq(a, total = 0.0):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        if np.isnan(a[i]):
            res[i] = np.nan
        else:
            total += a[i]**2
            res[i] = total
    return res

It is not surprising that sumsq works for arrays. Notice how np.isnan is used so that nans are skipped: a nan input yields a nan output and leaves the running total unchanged.

[2]:

a = np.arange(5)
sumsq(a)

[2]:

array([ 0,  1,  5, 14, 30])
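The array above has no missing values, so the nan branch is never exercised. As a minimal sketch, here is the same loop in plain numpy, without the pyg decorators, applied to a float array with a gap. The name sumsq_plain is ours and is used for illustration only.

```python
import numpy as np

def sumsq_plain(a, total=0.0):
    # Undecorated copy of the loop above; illustration only, not part of pyg.
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        if np.isnan(a[i]):
            res[i] = np.nan      # the gap stays a gap in the output
        else:
            total += a[i] ** 2   # the running total ignores the nan
            res[i] = total
    return res

out = sumsq_plain(np.array([1.0, np.nan, 2.0, 3.0]))
print(out)
```

The running total carries across the gap: the value after the nan is 1 + 4 = 5, not 4.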

pd2np converts a pandas Series to a numpy array, runs the function and converts the result back to pandas. This only works for 1-dimensional objects, so neither a DataFrame nor a 2-d np.ndarray is supported.

[3]:

s = pd.Series(a, drange(-4))
sumsq(s)

[3]:

2021-02-27     0
2021-02-28     1
2021-03-01     5
2021-03-02    14
2021-03-03    30
dtype: int32
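To make the round trip concrete, here is a rough sketch of what such a decorator might look like. pd2np_sketch is a hypothetical stand-in written for this page, not pyg's implementation: it converts only the first argument and assumes the function returns an array of the same length.

```python
import functools
import numpy as np
import pandas as pd

def pd2np_sketch(function):
    # Hypothetical stand-in for pd2np; handles the first argument only.
    @functools.wraps(function)
    def wrapped(a, *args, **kwargs):
        if isinstance(a, pd.Series):
            # Strip the index, run on the raw values, then put the index back.
            res = function(a.values, *args, **kwargs)
            return pd.Series(res, index=a.index, name=a.name)
        return function(a, *args, **kwargs)
    return wrapped

@pd2np_sketch
def cumsq(a):
    # Cumulative sum of squares; sees a numpy array in both cases.
    return np.cumsum(a ** 2)

s = pd.Series(np.arange(5), pd.date_range('2021-02-27', periods=5))
out = cumsq(s)                    # Series in, Series out
out_array = cumsq(np.arange(3))   # arrays pass straight through
```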

loop¶

We now decorate sumsq with the loop decorator. Once we introduce loop, the function will loop over the columns of a DataFrame or of a 2-d numpy array:

[4]:

@loop(pd.DataFrame, dict, list, np.ndarray)
@pd2np
@compiled
def sumsq(a, total = 0):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        if np.isnan(a[i]):
            res[i] = np.nan
        else:
            total += a[i]**2
            res[i] = total
    return res

df = pd.DataFrame(dict(a = a, b = a+1), drange(-4))
df

[4]:

	a	b
2021-02-27	0	1
2021-02-28	1	2
2021-03-01	2	3
2021-03-02	3	4
2021-03-03	4	5

[5]:

sumsq(df)

[5]:

	a	b
2021-02-27	0	1
2021-02-28	1	5
2021-03-01	5	14
2021-03-02	14	30
2021-03-03	30	55

Since we also asked it to loop over dict, list and (2-d) numpy arrays, these work too:

[6]:

sumsq(dict(a = a, b = a+1))

[6]:

{'a': array([ 0,  1,  5, 14, 30]), 'b': array([ 1,  5, 14, 30, 55])}

[7]:

sumsq(df.values)

[7]:

array([[ 0,  1],
       [ 1,  5],
       [ 5, 14],
       [14, 30],
       [30, 55]])
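The dispatching shown in the cells above can be sketched in a few lines. loop_sketch below is a hypothetical, simplified stand-in for loop(pd.DataFrame, dict, list, np.ndarray). It is not pyg's code, and it assumes a single-argument function.

```python
import functools
import numpy as np
import pandas as pd

def loop_sketch(function):
    # Hypothetical stand-in for loop; recurses into containers column by column.
    @functools.wraps(function)
    def wrapped(a):
        if isinstance(a, pd.DataFrame):
            return pd.DataFrame({col: wrapped(a[col]) for col in a.columns})
        if isinstance(a, dict):
            return {key: wrapped(value) for key, value in a.items()}
        if isinstance(a, list):
            return [wrapped(value) for value in a]
        if isinstance(a, np.ndarray) and a.ndim == 2:
            # Apply to each column, then stack the results back as columns.
            return np.array([wrapped(a[:, j]) for j in range(a.shape[1])]).T
        return function(a)  # 1-d input: call the function directly
    return wrapped

@loop_sketch
def cumsq(a):
    return (a ** 2).cumsum()

a = np.arange(5)
out_dict = cumsq(dict(a=a, b=a + 1))
out_array = cumsq(np.array([a, a + 1]).T)
out_df = cumsq(pd.DataFrame(dict(a=a, b=a + 1)))
```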

presync: manage indexing and date stamps¶

Suppose the function takes two (or more) timeseries.

[8]:

@presync(index = 'inner')
@loop(pd.DataFrame, np.ndarray)
@pd2np
def product(a, b):
    return a * b

[9]:

a = np.arange(5); b = np.arange(5)
product(a,b)

[9]:

array([ 0,  1,  4,  9, 16])

What happens when the two timeseries are unsynchronized?

[10]:

a_ = pd.Series(a, drange(-4)) ; a_.name = 'a'
b_ = pd.Series(b, drange(-3,1)); b_.name = 'b'
pd.concat([a_, b_], axis=1)

[10]:

	a	b
2021-02-27	0.0	NaN
2021-02-28	1.0	0.0
2021-03-01	2.0	1.0
2021-03-02	3.0	2.0
2021-03-03	4.0	3.0
2021-03-04	NaN	4.0

[11]:

product(a_, b_) ## just the inner values

[11]:

2021-02-28     0
2021-03-01     2
2021-03-02     6
2021-03-03    12
Freq: D, dtype: int32

[12]:

product.oj(a_, b_) ## outer join

[12]:

2021-02-27     NaN
2021-02-28     0.0
2021-03-01     2.0
2021-03-02     6.0
2021-03-03    12.0
2021-03-04     NaN
Freq: D, dtype: float64

[13]:

product.oj.ffill(a_, b_) ## outer join and forward-fill

[13]:

2021-02-27     NaN
2021-02-28     0.0
2021-03-01     2.0
2021-03-02     6.0
2021-03-03    12.0
2021-03-04    16.0
Freq: D, dtype: float64
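For comparison, the three results above can be reproduced by hand in plain pandas. This is only a sketch of the alignment step that presync takes care of, using Series.align; it says nothing about how presync is implemented.

```python
import numpy as np
import pandas as pd

a_ = pd.Series(np.arange(5), pd.date_range('2021-02-27', periods=5))
b_ = pd.Series(np.arange(5), pd.date_range('2021-02-28', periods=5))

# Inner join: keep only the dates present in both series.
a_in, b_in = a_.align(b_, join='inner')
inner = a_in * b_in

# Outer join: keep the union of dates; missing values become nan.
a_out, b_out = a_.align(b_, join='outer')
outer = a_out * b_out

# Outer join, then forward-fill each input before multiplying.
filled = a_out.ffill() * b_out.ffill()
```

Forward-filling cannot help on the first date, since b has no earlier value to carry forward, which is why the first row stays nan.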

presync and numpy arrays¶

When we deal with thousands of equities, one way of speeding up calculations is to stack them all into huge dataframes. This does work, but one is always busy fiddling with 'the universe' one is trading. We took a slightly different approach:

We define a global timestamp.
We then sample each timeseries to that global timestamp, dropping the early history where the data is all nan (df_fillna(ts, index, method = 'fnna')).
We then do our research on these numpy arrays.
Finally, once we are done, we resample back to the global timestamp.

While we are in numpy arrays, we can 'inner join' by recognising that the 'end' of each array shares the same date. Indeed df_index, df_reindex and presync all work seamlessly on np.ndarray as well as DataFrames, under the assumption that the ends of all arrays are in sync.

We find this approach saves on memory and on computation time. It also lends itself to being able to retrieve and create specific universes for specific trading ideas. It is not without its own issues but that is a separate discussion.

[14]:

a = np.arange(5); b = np.arange(1,5)
a, b

[14]:

(array([0, 1, 2, 3, 4]), array([1, 2, 3, 4]))

[15]:

product(a, b)

[15]:

array([ 1,  4,  9, 16])
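The result above follows from aligning the two arrays at their ends: the first element of a is dropped and the remaining four are multiplied by b. A minimal sketch of that trimming step, with a hypothetical helper inner_join_ends that is not part of pyg:

```python
import numpy as np

def inner_join_ends(*arrays):
    # Hypothetical helper: keep the last n observations of every array,
    # where n is the length of the shortest one.
    n = min(len(x) for x in arrays)
    return [x[len(x) - n:] for x in arrays]

a = np.arange(5)
b = np.arange(1, 5)
a_tail, b_tail = inner_join_ends(a, b)
out = a_tail * b_tail
```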

[16]:

us = calendar('US')
dates = pd.Index(us.drange('-40y', 0 ,'1b'))

[17]:

universe = dictable(stock = ['msft', 'appl', 'tsla'], n = [10000, 8000, 7000])
universe = universe(ts = lambda n: pd.Series(np.random.normal(0,1,n+1), us.drange('-%ib'%n, 0, '1b'))[np.random.normal(0,1,n+1)>-1])
universe

[17]:

dictable[3 x 3]
stock|n    |ts
msft |10000|1982-11-03   -1.309868
     |     |1982-11-04   -0.737816
     |     |1982-11-05    0.460173
     |     |1982-11-08   -0.895898
     |     |1982-11-09   -0.813305
appl |8000 |1990-07-04    0.040855
     |     |1990-07-05   -1.327995
     |     |1990-07-06    0.114328
     |     |1990-07-09   -1.626176
     |     |1990-07-10   -0.031428
tsla |7000 |1994-05-04   -1.259911
     |     |1994-05-05    1.014304
     |     |1994-05-09   -0.035104
     |     |1994-05-10   -1.265964
     |     |1994-05-11   -0.001664

[18]:

universe = universe(rtn = lambda ts: ts.values)
universe = universe(price = lambda rtn : cumsum(rtn))
universe = universe(vol = lambda rtn: ewmstd(rtn, 30))
universe

[18]:

dictable[3 x 6]
stock|n    |ts                    |rtn                                               |price                                             |vol
msft |10000|1982-11-03   -1.309868|[-1.3098679  -0.73781612  0.4601727  ... -0.327291|[-1.3098679  -2.04768402 -1.58751132 ...  4.750977|[       nan        nan        nan ... 1.02923517 1
     |     |1982-11-04   -0.737816|  0.67289106]                                     |  5.89220017]                                     |
     |     |1982-11-05    0.460173|                                                  |                                                  |
     |     |1982-11-08   -0.895898|                                                  |                                                  |
     |     |1982-11-09   -0.813305|                                                  |                                                  |
appl |8000 |1990-07-04    0.040855|[ 0.04085499 -1.32799499  0.11432766 ... -1.017795|[ 4.08549924e-02 -1.28714000e+00 -1.17281234e+00 .|[       nan        nan        nan ... 0.88535052 0
     |     |1990-07-05   -1.327995| -0.82540937]                                     |  7.67908570e+01  7.59654476e+01]                 |
     |     |1990-07-06    0.114328|                                                  |                                                  |
     |     |1990-07-09   -1.626176|                                                  |                                                  |
     |     |1990-07-10   -0.031428|                                                  |                                                  |
tsla |7000 |1994-05-04   -1.259911|[-1.25991126  1.01430418 -0.0351036  ... -0.174814|[-1.25991126 -0.24560708 -0.28071068 ... 24.331768|[       nan        nan        nan ... 0.94944115 0
     |     |1994-05-05    1.014304| -0.69279468]                                     | 23.93415613]                                     |
     |     |1994-05-09   -0.035104|                                                  |                                                  |
     |     |1994-05-10   -1.265964|                                                  |                                                  |
     |     |1994-05-11   -0.001664|                                                  |                                                  |

[19]:

presync(lambda tss: np.array(tss).T)(universe.vol)

[19]:

array([[1.01584217, 0.95105069,        nan],
       [1.02939552, 0.99139701,        nan],
       [1.01323584, 0.97982437,        nan],
       ...,
       [1.02923517, 0.88535052, 0.94944115],
       [1.018515  , 0.91252795, 0.93434464],
       [1.01216505, 0.91053212, 0.93155713]])

[20]:

universe = universe.do(lambda value: np_reindex(value, dates), 'rtn', 'price', 'vol')
universe

[20]:

dictable[3 x 6]
stock|n    |ts                    |rtn                   |price                  |vol
msft |10000|1982-11-03   -1.309868|1988-11-21   -1.309868|1988-11-21   -1.309868 |1988-11-21         NaN
     |     |1982-11-04   -0.737816|1988-11-22   -0.737816|1988-11-22   -2.047684 |1988-11-22         NaN
     |     |1982-11-05    0.460173|1988-11-23    0.460173|1988-11-23   -1.587511 |1988-11-23         NaN
     |     |1982-11-08   -0.895898|1988-11-24   -0.895898|1988-11-24   -2.483409 |1988-11-24         NaN
     |     |1982-11-09   -0.813305|1988-11-25   -0.813305|1988-11-25   -3.296714 |1988-11-25         NaN
appl |8000 |1990-07-04    0.040855|1995-04-20    0.040855|1995-04-20     0.040855|1995-04-20         NaN
     |     |1990-07-05   -1.327995|1995-04-21   -1.327995|1995-04-21    -1.287140|1995-04-21         NaN
     |     |1990-07-06    0.114328|1995-04-24    0.114328|1995-04-24    -1.172812|1995-04-24         NaN
     |     |1990-07-09   -1.626176|1995-04-25   -1.626176|1995-04-25    -2.798988|1995-04-25         NaN
     |     |1990-07-10   -0.031428|1995-04-26   -0.031428|1995-04-26    -2.830417|1995-04-26         NaN
tsla |7000 |1994-05-04   -1.259911|1998-09-11   -1.259911|1998-09-11    -1.259911|1998-09-11         NaN
     |     |1994-05-05    1.014304|1998-09-14    1.014304|1998-09-14    -0.245607|1998-09-14         NaN
     |     |1994-05-09   -0.035104|1998-09-15   -0.035104|1998-09-15    -0.280711|1998-09-15         NaN
     |     |1994-05-10   -1.265964|1998-09-16   -1.265964|1998-09-16    -1.546674|1998-09-16         NaN
     |     |1994-05-11   -0.001664|1998-09-17   -0.001664|1998-09-17    -1.548338|1998-09-17         NaN
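Under the assumption stated earlier, that the end of every array coincides with the end of the global timestamp, turning an array back into a timeseries amounts to attaching the last len(values) dates to it. The sketch below illustrates the idea with a hypothetical helper reindex_from_end. It is not pyg's np_reindex and it assumes the array is no longer than the index.

```python
import numpy as np
import pandas as pd

def reindex_from_end(values, index):
    # Hypothetical helper: label the array with the tail of the global index.
    # Assumes len(values) <= len(index).
    n = len(values)
    return pd.Series(values, index[len(index) - n:])

dates = pd.bdate_range('2021-02-22', '2021-03-03')   # 8 business days
ts = reindex_from_end(np.array([1.0, 2.0, 3.0]), dates)
```

Here the three values are stamped with the last three business days, 2021-03-01 to 2021-03-03.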

[21]:

vol = pd.concat(universe.vol, axis = 1); vol.columns = universe.stock
vol

[21]:

	msft	appl	tsla
1988-11-21	NaN	NaN	NaN
1988-11-22	NaN	NaN	NaN
1988-11-23	NaN	NaN	NaN
1988-11-24	NaN	NaN	NaN
1988-11-25	NaN	NaN	NaN
...	...	...	...
2021-02-25	1.063016	0.890791	0.931185
2021-02-26	1.045735	0.880376	0.963182
2021-03-01	1.029235	0.885351	0.949441
2021-03-02	1.018515	0.912528	0.934345
2021-03-03	1.012165	0.910532	0.931557

8423 rows × 3 columns
