pyg.timeseries¶
Given pandas, why do we need this timeseries library? pandas is amazing, but pyg.timeseries offers a few features designed to enhance it.
There are three issues with pandas that pyg.timeseries tries to address:
pandas works on pandas objects (obviously) but not on numpy arrays.
pandas handles nan within a timeseries inconsistently across its functions. This makes your results sensitive to reindexing/resampling (a short illustration follows this list). E.g.:
a.expanding() and a.ewm() ignore nans in the calculation and then forward-fill the result.
a.diff() and a.rolling() include any nans in the calculation, leading to nan propagation.
pandas is great if you have the full timeseries. However, if you now want to run the same calculations in a live environment, on recent data, you have to append the recent data to the end of the DataFrame and rerun the whole history.
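A minimal illustration of the nan inconsistency, using only pandas (the toy series below is made up for the example):

import pandas as pd; import numpy as np
x = pd.Series([1.0, np.nan, 2.0])
x.expanding().sum()   # 1.0, 1.0, 3.0 -> the nan is skipped and the result is forward-filled
x.ewm(2).mean()       # 1.0, 1.0, ... -> same: the nan is ignored and the result forward-filled
x.diff()              # nan, nan, nan -> the nan propagates into both neighbouring differences
x.rolling(2).sum()    # nan, nan, nan -> any window containing a nan returns nan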
pyg.timeseries tries to address this:
pyg.timeseries agrees with pandas 100% (if there are no nans in the DataFrame) while being of comparable speed.
pyg.timeseries works seamlessly on pandas objects and on numpy arrays, with no code change.
pyg.timeseries handles nan consistently across all its functions, 'ignoring' all nans, making your results consistent regardless of reindexing/resampling.
pyg.timeseries exposes the internal state of each calculation. Exposing this state allows us to calculate the output on additional data without re-running the history. This speeds up two very common problems in finance:
risk calculations and Monte Carlo scenarios: we can run a trading strategy up to today, then generate multiple what-if scenarios without having to rerun the full history.
live versus history: pandas is designed to run a full historical simulation. However, once we reach 'today', speed is of the essence, and running a full historical simulation every time we ingest a new price is just too slow. That is why most fast trading is built around fast state machines. Of course, making sure the research and live versions do the same thing is tricky. pyg gives you the ability to run two systems in parallel with almost the same code base: run the full history overnight, then run today's code instantly, instantiated with the output of the historical simulation.
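As a short preview of the pattern developed in the 'Using pyg.timeseries to manage state' section below (both ewma_ and the state= keyword are demonstrated there in full):

from pyg import *; import pandas as pd; import numpy as np
history = pd.Series(np.random.normal(0,1,1000), drange(-1000,-1))   # the long back-history
live = pd.Series(np.random.normal(0,1,10), drange(9))               # today's new data
history_signal = ewma_(history, 10)                                 # overnight run: returns dict(data = ..., state = ...)
live_signal = ewma(live, 10, state = history_signal.state)          # live run: resumed from the saved state, no history needed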
Agreement between pyg.timeseries and pandas¶
[1]:
from pyg import *; import pandas as pd; import numpy as np
s = pd.Series(np.random.normal(0,1,10000), drange(-9999)); a = s.values
t = pd.Series(np.random.normal(0,1,10000), drange(-9999))
[2]:
assert abs(s.count() - ts_count(s))< 1e-10
assert abs(s.mean() - ts_mean(s)) < 1e-10
assert abs(s.sum() - ts_sum(s)) < 1e-10
assert abs(s.std() - ts_std(s)) < 1e-10
assert abs(s.skew() - ts_skew(s)) < 1e-10
[3]:
assert abs(ewma(s, 10) - s.ewm(10).mean()).max() < 1e-10
assert abs(ewmstd(s, 10) - s.ewm(10).std()).max() < 1e-10
assert abs(ewmvar(s, 10) - s.ewm(10).var()).max() < 1e-10
assert abs(ewmcor(s, t, 10) - s.ewm(10).corr(t)).max() < 1e-10
[4]:
assert abs(expanding_sum(s) - s.expanding().sum()).max() < 1e-10
assert abs(expanding_mean(s) - s.expanding().mean()).max() < 1e-10
assert abs(expanding_std(s) - s.expanding().std()).max() < 1e-10
assert abs(expanding_skew(s) - s.expanding().skew()).max() < 1e-10
assert abs(expanding_min(s) - s.expanding().min()).max() < 1e-10
assert abs(expanding_max(s) - s.expanding().max()).max() < 1e-10
assert abs(expanding_median(s) - s.expanding().median()).max() < 1e-10
[5]:
assert abs(rolling_sum(s,10) - s.rolling(10).sum()).max() < 1e-10
assert abs(rolling_mean(s,10) - s.rolling(10).mean()).max() < 1e-10
assert abs(rolling_std(s,10) - s.rolling(10).std()).max() < 1e-10
assert abs(rolling_skew(s,10) - s.rolling(10).skew()).max() < 1e-10
assert abs(rolling_min(s,10) - s.rolling(10).min()).max() < 1e-10
assert abs(rolling_max(s,10) - s.rolling(10).max()).max() < 1e-10
assert abs(rolling_median(s,10) - s.rolling(10).median()).max() < 1e-10
assert abs(rolling_quantile(s,10,0.3)[0.3] - s.rolling(10).quantile(0.3)).max() < 1e-10 ## The rolling_quantile returns the quantile as the header, since it supports multiple quantiles calculations: e.g. rolling_quantile(s,10,[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9])
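For reference, a minimal sketch of the multi-quantile form mentioned in the comment above, assuming (as the single-quantile call suggests) that each requested quantile becomes a column header:

q = rolling_quantile(s, 10, [0.25, 0.75])   # one column per requested quantile
assert abs(q[0.25] - s.rolling(10).quantile(0.25)).max() < 1e-10
assert abs(q[0.75] - s.rolling(10).quantile(0.75)).max() < 1e-10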
Quick performance comparison¶
pyg, when run on pandas DataFrames rather than numpy arrays, is of comparable speed to pandas:
[6]:
compare = dictable(op = ['rolling_sum', 'rolling_mean', 'rolling_std', 'rolling_min', 'rolling_median'],
pyg = [rolling_sum, rolling_mean, rolling_std, rolling_min, rolling_median],
pandas = [s.rolling(10).sum, s.rolling(10).mean, s.rolling(10).std, s.rolling(10).min, s.rolling(10).median]).do(lambda v: timer(v, n = 100, time = True), 'pyg', 'pandas')(pyg = lambda pyg: pyg(s, 10))(pandas = lambda pandas: pandas())
compare += dictable(op = ['expanding_sum', 'expanding_mean', 'expanding_std', 'expanding_min', 'expanding_median'],
pyg = [expanding_sum, expanding_mean, expanding_std, expanding_min, expanding_median],
pandas = [s.expanding().sum, s.expanding().mean, s.expanding().std, s.expanding().min, s.expanding().median]).do(lambda v: timer(v, n = 100, time = True), 'pyg', 'pandas')(pyg = lambda pyg: pyg(s))(pandas = lambda pandas: pandas())
compare += dictable(op = ['ewma', 'ewmstd', 'ewmvar'],
pyg = [ewma, ewmstd, ewmvar],
pandas = [s.ewm(10).mean, s.ewm(10).std, s.ewm(10).var]).do(lambda v: timer(v, n = 100, time = True), 'pyg', 'pandas')(pyg = lambda pyg: pyg(s, 10))(pandas = lambda pandas: pandas())
print(compare(winner = lambda pyg, pandas: 'pyg' if pyg<pandas * 0.8 else 'pandas' if pyg > 1.2 * pandas else 'draw'))
2021-03-06 22:41:35,805 - pyg - INFO - TIMER:'rolling_sum' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.109934 sec
2021-03-06 22:41:35,898 - pyg - INFO - TIMER:'rolling_mean' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.087950 sec
2021-03-06 22:41:35,995 - pyg - INFO - TIMER:'rolling_std' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.090944 sec
2021-03-06 22:41:36,116 - pyg - INFO - TIMER:'rolling_min' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.119930 sec
2021-03-06 22:41:36,269 - pyg - INFO - TIMER:'rolling_median' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.153483 sec
2021-03-06 22:41:36,351 - pyg - INFO - TIMER:'sum' args:[[], []] (100 runs) took 0:00:00.079685 sec
2021-03-06 22:41:36,465 - pyg - INFO - TIMER:'mean' args:[[], []] (100 runs) took 0:00:00.112599 sec
2021-03-06 22:41:36,585 - pyg - INFO - TIMER:'std' args:[[], []] (100 runs) took 0:00:00.119877 sec
2021-03-06 22:41:36,687 - pyg - INFO - TIMER:'min' args:[[], []] (100 runs) took 0:00:00.100942 sec
2021-03-06 22:41:37,378 - pyg - INFO - TIMER:'median' args:[[], []] (100 runs) took 0:00:00.688107 sec
2021-03-06 22:41:37,467 - pyg - INFO - TIMER:'expanding_sum' args:[["<class 'pandas.core.series.Series'>[10000]"], []] (100 runs) took 0:00:00.086951 sec
2021-03-06 22:41:37,528 - pyg - INFO - TIMER:'expanding_mean' args:[["<class 'pandas.core.series.Series'>[10000]"], []] (100 runs) took 0:00:00.059966 sec
2021-03-06 22:41:37,651 - pyg - INFO - TIMER:'expanding_std' args:[["<class 'pandas.core.series.Series'>[10000]"], []] (100 runs) took 0:00:00.120938 sec
2021-03-06 22:41:37,695 - pyg - INFO - TIMER:'expanding_min' args:[["<class 'pandas.core.series.Series'>[10000]"], []] (100 runs) took 0:00:00.040991 sec
2021-03-06 22:41:37,903 - pyg - INFO - TIMER:'expanding_median' args:[["<class 'pandas.core.series.Series'>[10000]"], []] (100 runs) took 0:00:00.206892 sec
2021-03-06 22:41:37,939 - pyg - INFO - TIMER:'sum' args:[[], []] (100 runs) took 0:00:00.033967 sec
2021-03-06 22:41:38,002 - pyg - INFO - TIMER:'mean' args:[[], []] (100 runs) took 0:00:00.059981 sec
2021-03-06 22:41:38,075 - pyg - INFO - TIMER:'std' args:[[], []] (100 runs) took 0:00:00.071959 sec
2021-03-06 22:41:38,245 - pyg - INFO - TIMER:'min' args:[[], []] (100 runs) took 0:00:00.168553 sec
2021-03-06 22:41:39,523 - pyg - INFO - TIMER:'median' args:[[], []] (100 runs) took 0:00:01.277246 sec
2021-03-06 22:41:39,620 - pyg - INFO - TIMER:'ewma' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.094945 sec
2021-03-06 22:41:39,732 - pyg - INFO - TIMER:'ewmstd' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.110924 sec
2021-03-06 22:41:39,827 - pyg - INFO - TIMER:'ewmvar' args:[["<class 'pandas.core.series.Series'>[10000]", '10'], []] (100 runs) took 0:00:00.093965 sec
2021-03-06 22:41:39,855 - pyg - INFO - TIMER:'mean' args:[[], []] (100 runs) took 0:00:00.026971 sec
2021-03-06 22:41:39,953 - pyg - INFO - TIMER:'std' args:[[], []] (100 runs) took 0:00:00.096954 sec
2021-03-06 22:41:39,995 - pyg - INFO - TIMER:'var' args:[[], []] (100 runs) took 0:00:00.039983 sec
op |pandas |pyg |winner
rolling_sum |0:00:00.079685|0:00:00.109934|pandas
rolling_mean |0:00:00.112599|0:00:00.087950|pyg
rolling_std |0:00:00.119877|0:00:00.090944|pyg
rolling_min |0:00:00.100942|0:00:00.119930|draw
rolling_median |0:00:00.688107|0:00:00.153483|pyg
expanding_sum |0:00:00.033967|0:00:00.086951|pandas
expanding_mean |0:00:00.059981|0:00:00.059966|draw
expanding_std |0:00:00.071959|0:00:00.120938|pandas
expanding_min |0:00:00.168553|0:00:00.040991|pyg
expanding_median|0:00:01.277246|0:00:00.206892|pyg
ewma |0:00:00.026971|0:00:00.094945|pandas
ewmstd |0:00:00.096954|0:00:00.110924|draw
ewmvar |0:00:00.039983|0:00:00.093965|pandas
pyg and numpy arrays¶
pyg supports numpy arrays natively. Indeed, pyg is 3-5 times faster when run on numpy arrays (a quick timing check follows the equivalence cells below).
[7]:
a = s.values
assert abs(ts_count(a) - ts_count(s))< 1e-10
assert abs(ts_mean(a) - ts_mean(s)) < 1e-10
assert abs(ts_sum(a) - ts_sum(s)) < 1e-10
assert abs(ts_std(a) - ts_std(s)) < 1e-10
assert abs(ts_skew(a) - ts_skew(s)) < 1e-10
[8]:
assert abs(ewma(s, 10) - ewma(a,10)).max() < 1e-10
assert abs(ewmstd(s, 10) - ewmstd(a,10)).max() < 1e-10
assert abs(ewmvar(s, 10) - ewmvar(a,10)).max() < 1e-10
assert abs(ewmcor(s, t, 10) - ewmcor(a, t.values, 10)).max() < 1e-10
[9]:
assert abs(expanding_sum(s) - expanding_sum(a)).max() < 1e-10
assert abs(expanding_min(s) - expanding_min(a)).max() < 1e-10
assert abs(expanding_max(s) - expanding_max(a)).max() < 1e-10
assert abs(expanding_mean(s) - expanding_mean(a)).max() < 1e-10
assert abs(expanding_std(s) - expanding_std(a)).max() < 1e-10
assert abs(expanding_skew(s) - expanding_skew(a)).max() < 1e-10
assert abs(expanding_median(s) - expanding_median(a)).max() < 1e-10
[10]:
assert abs(rolling_sum(s,10) - rolling_sum(a,10)).max() < 1e-10
assert abs(rolling_min(s,10) - rolling_min(a,10)).max() < 1e-10
assert abs(rolling_max(s,10) - rolling_max(a,10)).max() < 1e-10
assert abs(rolling_mean(s,10) - rolling_mean(a,10)).max() < 1e-10
assert abs(rolling_std(s,10) - rolling_std(a,10)).max() < 1e-10
assert abs(rolling_skew(s,10) - rolling_skew(a,10)).max() < 1e-10
assert abs(rolling_median(s,10) - rolling_median(a,10)).max() < 1e-10
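A quick check of the 3-5x claim above, reusing the timer utility from the performance comparison (a sketch only; exact timings will vary by machine):

series_time = timer(ewma, n = 100, time = True)(s, 10)   # 100 runs of ewma on the pandas Series
array_time = timer(ewma, n = 100, time = True)(a, 10)    # 100 runs of ewma on the underlying numpy array
series_time, array_time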
pandas treatment of nan¶
Suppose we have weekly data that at some point we resample to daily… The two look the same…
[11]:
t0 = dt_bump('20210301', '-999w')
days = drange(t0,'20210301','1b')
weekly = pd.Series(np.random.normal(0,1,1000), drange(t0,None,'1w')); weekly.name = 'weekly'
daily = weekly.reindex(days); daily.name = 'daily'
pd.concat([weekly,daily], axis = 1)
[11]:
 | weekly | daily |
---|---|---|
2002-01-07 | 0.423187 | 0.423187 |
2002-01-08 | NaN | NaN |
2002-01-09 | NaN | NaN |
2002-01-10 | NaN | NaN |
2002-01-11 | NaN | NaN |
... | ... | ... |
2021-02-23 | NaN | NaN |
2021-02-24 | NaN | NaN |
2021-02-25 | NaN | NaN |
2021-02-26 | NaN | NaN |
2021-03-01 | 1.408439 | 1.408439 |
4996 rows × 2 columns
… but any calculation using the daily will yield a different result from a calculation on the weekly which is then resampled to daily:
[12]:
pd.concat([weekly.ewm(4).mean().reindex(days), daily.ewm(4).mean()], axis = 1) ## The result depends on what is done first...
[12]:
 | weekly | daily |
---|---|---|
2002-01-07 | 0.423187 | 0.423187 |
2002-01-08 | NaN | 0.423187 |
2002-01-09 | NaN | 0.423187 |
2002-01-10 | NaN | 0.423187 |
2002-01-11 | NaN | 0.423187 |
... | ... | ... |
2021-02-23 | NaN | 0.178687 |
2021-02-24 | NaN | 0.178687 |
2021-02-25 | NaN | 0.178687 |
2021-02-26 | NaN | 0.178687 |
2021-03-01 | 0.655222 | 1.005474 |
4996 rows × 2 columns
[13]:
pd.concat([weekly.diff().reindex(days), daily.diff()], axis = 1) ## The result depends on what is done first...
[13]:
 | weekly | daily |
---|---|---|
2002-01-07 | NaN | NaN |
2002-01-08 | NaN | NaN |
2002-01-09 | NaN | NaN |
2002-01-10 | NaN | NaN |
2002-01-11 | NaN | NaN |
... | ... | ... |
2021-02-23 | NaN | NaN |
2021-02-24 | NaN | NaN |
2021-02-25 | NaN | NaN |
2021-02-26 | NaN | NaN |
2021-03-01 | 1.644159 | NaN |
4996 rows × 2 columns
Indeed, for diff, daily.diff() is all nan: every daily difference involves at least one nan, so the nans propagate throughout.
pyg.timeseries treatment of nans¶
pyg treats nans as if they are not there, so the fact that we resampled the data and introduced lots of nans does not affect the calculations. We find this to be a more logical (and less error-prone) approach.
[14]:
nona(pd.concat([ewma(weekly, 4).reindex(days), ewma(daily,4)], axis = 1)) ## The two match exactly
[14]:
 | 0 | 1 |
---|---|---|
2002-01-07 | 0.423187 | 0.423187 |
2002-01-14 | -0.105302 | -0.105302 |
2002-01-21 | -0.019371 | -0.019371 |
2002-01-28 | 0.332137 | 0.332137 |
2002-02-04 | 0.559419 | 0.559419 |
... | ... | ... |
2021-02-01 | 0.369931 | 0.369931 |
2021-02-08 | 0.526351 | 0.526351 |
2021-02-15 | 0.642578 | 0.642578 |
2021-02-22 | 0.466918 | 0.466918 |
2021-03-01 | 0.655222 | 0.655222 |
1000 rows × 2 columns
[15]:
nona(pd.concat([diff(weekly).reindex(days), diff(daily)], axis = 1)) ## The two match exactly
[15]:
 | 0 | 1 |
---|---|---|
2002-01-14 | -0.951280 | -0.951280 |
2002-01-21 | 0.632462 | 0.632462 |
2002-01-28 | 0.913911 | 0.913911 |
2002-02-04 | 0.077887 | 0.077887 |
2002-02-11 | -2.180086 | -2.180086 |
... | ... | ... |
2021-02-01 | -0.678079 | -0.678079 |
2021-02-08 | 1.049093 | 1.049093 |
2021-02-15 | -0.044543 | -0.044543 |
2021-02-22 | -1.343206 | -1.343206 |
2021-03-01 | 1.644159 | 1.644159 |
999 rows × 2 columns
Using pyg.timeseries to manage state¶
One of the problems in timeseries analysis is writing research code that works well for analysing past data but can, ideally, also be used unchanged in a live application. One easy approach is to "stick the extra data point at the end and run it again from 1980". This leaves us with a single code base, but for many live applications (e.g. live trading) it is simply not viable.
Further, given our positions today, we may want to run simulations of "what happens next?" to understand what the system is likely to do should various events occur. Risk calculations are costly, and re-running 10k Monte Carlo scenarios, each one starting from 1980, is prohibitively slow.
Conversely, we can run research and live systems on two separate code bases. This makes the live system responsive, but six months down the line we realise the research and live code bases did not do quite the same thing.
pyg approaches this problem by exposing the internal state of each of its calculations. Each function has two versions:
function(…) returns the calculation as performed by pandas.
function_(…) returns a dictionary dict(data = ..., state = ...). The data agrees with function(…), while the state is a dict with which we can instantiate new calculations.
[16]:
from pyg import *
history = pd.Series(np.random.normal(0,1,1000), drange(-1000,-1))
history_signal = ewma_(history, 10)
history_signal # The output consists of 'data' and 'state' where data matches a normal ewma calculation
[16]:
{'data': 2018-06-10 -0.511500
2018-06-11 0.445609
2018-06-12 -0.065606
2018-06-13 -0.358735
2018-06-14 -0.069188
...
2021-03-01 -0.144503
2021-03-02 -0.066708
2021-03-03 -0.141431
2021-03-04 -0.122797
2021-03-05 -0.051610
Length: 1000, dtype: float64,
'state': {'t': nan, 't0': 0.9999999999999994, 't1': -0.05161000819451757}}
[17]:
live = pd.Series(np.random.normal(0,1,10), drange(9))
live_signal = ewma(live, 10, state = history_signal.state) ## we feed in only the live timeseries
'live: from today onwards', live_signal
[17]:
('live: from today onwards',
2021-03-06 -0.059815
2021-03-07 -0.165151
2021-03-08 -0.104525
2021-03-09 -0.160978
2021-03-10 -0.224791
2021-03-11 -0.325723
2021-03-12 -0.207468
2021-03-13 -0.233642
2021-03-14 -0.228141
2021-03-15 -0.244483
dtype: float64)
[18]:
joint_data = pd.concat([history, live])
joint_signal = ewma(joint_data, 10)
assert eq(live_signal, joint_signal[dt(0):]) # The live signal is the same, even though it only received live data for its calculation.
joint_signal[dt(0):]
[18]:
2021-03-06 -0.059815
2021-03-07 -0.165151
2021-03-08 -0.104525
2021-03-09 -0.160978
2021-03-10 -0.224791
2021-03-11 -0.325723
2021-03-12 -0.207468
2021-03-13 -0.233642
2021-03-14 -0.228141
2021-03-15 -0.244483
dtype: float64
This allows us to set up three parallel pipelines that share a virtually identical codebase:
workflow | historic data | live data | risk analysis
---|---|---|---
when run? | research/overnight | live | overnight
data source? | ts = long timeseries | a = short ts/array | 1000's of sims
speed? | slow, non-critical | instantaneous | quick
apply f to data | x_ = f_(ts) | x = f(a, **x_) | same as live
apply g | y_ = g_(ts, x_) | y = g(a, x, **y_) | same as live
final result h | z_ = h_(ts, x_, y_) | z = h(a, x, y, **z_) | same as live
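As a concrete sketch of the table above, using ewma as f and ewmrms as g, and passing the saved states via the state= keyword shown earlier rather than the **x_ shorthand (this assumes ewmrms accepts state= in the same way ewma does):

ts = pd.Series(np.random.normal(0,1,1000), drange(-1000,-1))   # long history, run overnight
a = np.random.normal(0,1,10)                                   # short live array, run today

x_ = ewma_(ts, 10)                     # overnight: full history, keep both data and state
y_ = ewmrms_(x_.data, 50)              # chain g on f's historical output

x = ewma(a, 10, state = x_.state)      # live: new data only, instantiated with f's state
y = ewmrms(x, 50, state = y_.state)    # assumed: ewmrms supports state= like ewma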
Note that for live trading or risk analysis we tend to switch and run on numpy arrays rather than pandas objects. This speeds up the calculations while introducing no code change. In the example below we explore how to create state-aware functions within pyg. The paradigm is that for most functions, function_ returns not just the timeseries output but also the state.
Example: creating a function exposing its state¶
Suppose we want to write an ewma crossover function (the difference of two ewmas), normalized by its own volatility. Traditionally we would write:
[19]:
def pandas_crossover(a, fast, slow, vol):
    fast_ewma = a.ewm(fast).mean()
    slow_ewma = a.ewm(slow).mean()
    raw_signal = fast_ewma - slow_ewma
    signal_rms = (raw_signal**2).ewm(vol).mean()**0.5
    signal_rms[signal_rms==0] = np.nan
    normalized = raw_signal/signal_rms
    return normalized
a = pd.Series(np.random.normal(0,1,10000), drange(-9999)); fast = 10; slow = 30; vol = 50
pandas_x = pandas_crossover(a, fast, slow, vol)
pandas_x
[19]:
1993-10-20 NaN
1993-10-21 -1.407264
1993-10-22 -1.714259
1993-10-23 1.177760
1993-10-24 -1.220600
...
2021-03-02 -1.767405
2021-03-03 -1.183420
2021-03-04 -1.764486
2021-03-05 -2.458497
2021-03-06 -2.242366
Length: 10000, dtype: float64
We can quickly rewrite it using pyg:
[28]:
def crossover(a, fast, slow, vol):
    fast_ewma = ewma(a, fast)
    slow_ewma = ewma(a, slow)
    raw_signal = fast_ewma - slow_ewma
    signal_rms = ewmrms(raw_signal, vol)
    signal_rms = v2na(signal_rms)
    normalized = raw_signal/signal_rms
    return normalized
x = crossover(a, fast, slow, vol)
assert abs(x-pandas_x).max()<1e-10
x
[28]:
1993-10-20 -1.000000
1993-10-21 -1.407264
1993-10-22 -1.714259
1993-10-23 1.177760
1993-10-24 -1.220600
...
2021-03-02 -1.767405
2021-03-03 -1.183420
2021-03-04 -1.764486
2021-03-05 -2.458497
2021-03-06 -2.242366
Length: 10000, dtype: float64
And with very little additional effort, we can write a new function that also exposes the internal state:
[29]:
_data = 'data'
def crossover_(a, fast, slow, vol, instate = None):
    state = Dict(fast = {}, slow = {}, vol = {}) if instate is None else instate
    fast_ewma_ = ewma_(a, fast, instate = state.fast)
    slow_ewma_ = ewma_(a, slow, instate = state.slow)
    raw_signal = fast_ewma_.data - slow_ewma_.data
    signal_rms = ewmrms_(raw_signal, vol, instate = state.vol)
    normalized = raw_signal/v2na(signal_rms.data)
    return Dict(data = normalized, state = Dict(fast = fast_ewma_.state, slow = slow_ewma_.state, vol = signal_rms.state))
crossover_.output = ['data', 'state'] # output declares the function to have a dict output and is used by cell

def crossover(a, fast, slow, vol, state = None):
    return crossover_(a, fast, slow, vol, instate = state).data
x_ = crossover_(a, fast, slow, vol)
assert eq(x, x_.data) and eq(x, crossover(a, fast, slow, vol))
x_.data
[29]:
1993-10-20 -1.000000
1993-10-21 -1.407264
1993-10-22 -1.714259
1993-10-23 1.177760
1993-10-24 -1.220600
...
2021-03-02 -1.767405
2021-03-03 -1.183420
2021-03-04 -1.764486
2021-03-05 -2.458497
2021-03-06 -2.242366
Length: 10000, dtype: float64
The three give identical results, and we can also verify that crossover_ allows us to split the evaluation into the long history and the new data:
[45]:
history = a[:9900]
live = a[9900:].values
x_history = crossover_(history, 10, 30, 50)
x_live = crossover(live, 10, 30, 50, state = x_history.state)
x_ = crossover_(a, fast, slow, vol)
assert eq(x_live , x_.data[9900:].values)
Have we gained anything?
[46]:
pandas_old = timer(pandas_crossover, 100, time = True)(history, 10, 30, 50)
x_history = crossover_(history, 10, 30, 50)
x_history_time = timer(crossover_, 100, time = True)(history, 10, 30, 50)
x_live = timer(crossover, 100, time = True)(live, 10, 30, 50, state = x_history.state)
'pandas: ', pandas_old.microseconds//1000, 'pyg history:', x_history_time.microseconds//1000, 'pyg_live:', x_live.microseconds//1000
2021-03-06 23:55:39,746 - pyg - INFO - TIMER:'pandas_crossover' args:[["<class 'pandas.core.series.Series'>[9900]", '10', '30', '50'], []] (100 runs) took 0:00:00.373514 sec
2021-03-06 23:55:39,953 - pyg - INFO - TIMER:'crossover_' args:[["<class 'pandas.core.series.Series'>[9900]", '10', '30', '50'], []] (100 runs) took 0:00:00.202883 sec
2021-03-06 23:55:40,004 - pyg - INFO - TIMER:'crossover' args:[["<class 'numpy.ndarray'>[100]", '10', '30', '50'], ["state=<class 'pyg.base._dict.Dict'>[3]"]] (100 runs) took 0:00:00.049972 sec
[46]:
('pandas: ', 373, 'pyg history:', 202, 'pyg_live:', 49)
We see that pyg is already faster than pandas (about 2ms vs 3.7ms per run of the full history). Running just the new data as a numpy array is about 4-5 times faster still (about 0.5ms per run). Indeed, running 10k 100-day forward scenarios takes about 2 seconds at most.
[48]:
scenarios = np.random.normal(0,1,(100,10000))
x_scenarios = timer(crossover)(scenarios , 10, 30, 50, state = x_history.state)
2021-03-06 23:56:10,252 - pyg - INFO - TIMER:'crossover' args:[["<class 'numpy.ndarray'>[100]", '10', '30', '50'], ["state=<class 'pyg.base._dict.Dict'>[3]"]] (1 runs) took 0:00:01.605710 sec
Using cells, our code looks like this, with the live and historical code bases looking pretty similar:
[49]:
x_history = cell(crossover_, a = history, fast = 10, slow = 30, vol = 50)()
x_live = cell(crossover, a = live, fast = 10, slow = 30, vol = 50, state = x_history)()
x_history
[49]:
cell
a:
1993-10-20 0.463739
1993-10-21 0.429161
1993-10-22 -0.342095
1993-10-23 1.192557
1993-10-24 -0.448828
...
2020-11-22 -0.272184
2020-11-23 0.121197
2020-11-24 -0.581223
2020-11-25 -0.682961
2020-11-26 -1.084583
Length: 9900, dtype: float64
fast:
10
slow:
30
vol:
50
function:
<function crossover_ at 0x000001CF9B58BA60>
instate:
None
data:
1993-10-20 -1.000000
1993-10-21 -1.407264
1993-10-22 -1.714259
1993-10-23 1.177760
1993-10-24 -1.220600
...
2020-11-22 -2.091785
2020-11-23 -1.765958
2020-11-24 -1.796933
2020-11-25 -1.853106
2020-11-26 -2.044795
Length: 9900, dtype: float64
state:
Dict
fast:
{'t': nan, 't0': 0.9999999999999994, 't1': -0.4251894284980144}
slow:
{'t': nan, 't0': 0.9999999999999983, 't1': -0.14408421908740027}
vol:
{'t': nan, 't0': 0.9999999999999972, 't2': 0.01889897942675779}
[50]:
pd.concat([pd.Series(x_live.data, pandas_x.index[-100:]), pandas_x.iloc[-100:]], axis = 1)
[50]:
 | 0 | 1 |
---|---|---|
2020-11-27 | -2.466036 | -2.466036 |
2020-11-28 | -1.899795 | -1.899795 |
2020-11-29 | -1.573653 | -1.573653 |
2020-11-30 | -1.473624 | -1.473624 |
2020-12-01 | -1.978180 | -1.978180 |
... | ... | ... |
2021-03-02 | -1.767405 | -1.767405 |
2021-03-03 | -1.183420 | -1.183420 |
2021-03-04 | -1.764486 | -1.764486 |
2021-03-05 | -2.458497 | -2.458497 |
2021-03-06 | -2.242366 | -2.242366 |
100 rows × 2 columns