{
"cells": [
{
"cell_type": "markdown",
"id": "blocked-diamond",
"metadata": {},
"source": [
"# pyg.timeseries decorators\n",
"There are a few decorators that are relevant to timeseries analysis\n",
"## pd2np and compiled\n",
"We write most of our underlying functions assuming the function parameters are 1-d numpy arrays.\n",
"If you want them numba.jit compiled, please use the compiled operator."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "super-exclusion",
"metadata": {},
"outputs": [],
"source": [
"from pyg import *\n",
"import pandas as pd; import numpy as np\n",
"@pd2np\n",
"@compiled\n",
"def sumsq(a, total = 0.0):\n",
" res = np.empty_like(a)\n",
" for i in range(a.shape[0]):\n",
" if np.isnan(a[i]):\n",
" res[i] = np.nan\n",
" else:\n",
" total += a[i]**2\n",
" res[i] = total\n",
" return res\n"
]
},
{
"cell_type": "markdown",
"id": "mounted-university",
"metadata": {},
"source": [
"It is not surpising that sumsq works for arrays. Notice how np.isnan is handled to ensure nans are skipped."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "younger-washer",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 5, 14, 30])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.arange(5)\n",
"sumsq(a)"
]
},
{
"cell_type": "markdown",
"id": "spare-sixth",
"metadata": {},
"source": [
"**pd2np** will convert a pandas Series to arrays, run the function and convert back to pandas. This will only work for a 1-dimensional objects, so no df nor 2-d np.ndarray."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "configured-medicaid",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2021-02-27 0\n",
"2021-02-28 1\n",
"2021-03-01 5\n",
"2021-03-02 14\n",
"2021-03-03 30\n",
"dtype: int32"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = pd.Series(a, drange(-4))\n",
"sumsq(s)"
]
},
{
"cell_type": "markdown",
"id": "surprising-beaver",
"metadata": {},
"source": [
"## loop\n",
"We decorate sumsq with the **loop** decorator. Once we introduce loop, The function will loop over columns of a DataFrame or a numpy array:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "biblical-thirty",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" 2021-02-27 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2021-02-28 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 2021-03-01 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" 2021-03-02 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" 2021-03-03 | \n",
" 4 | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b\n",
"2021-02-27 0 1\n",
"2021-02-28 1 2\n",
"2021-03-01 2 3\n",
"2021-03-02 3 4\n",
"2021-03-03 4 5"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@loop(pd.DataFrame, dict, list, np.ndarray)\n",
"@pd2np\n",
"@compiled\n",
"def sumsq(a, total = 0):\n",
" res = np.empty_like(a)\n",
" for i in range(a.shape[0]):\n",
" if np.isnan(a[i]):\n",
" res[i] = np.nan\n",
" else:\n",
" total += a[i]**2\n",
" res[i] = total\n",
" return res\n",
"\n",
"df = pd.DataFrame(dict(a = a, b = a+1), drange(-4))\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "diverse-estate",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" 2021-02-27 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2021-02-28 | \n",
" 1 | \n",
" 5 | \n",
"
\n",
" \n",
" 2021-03-01 | \n",
" 5 | \n",
" 14 | \n",
"
\n",
" \n",
" 2021-03-02 | \n",
" 14 | \n",
" 30 | \n",
"
\n",
" \n",
" 2021-03-03 | \n",
" 30 | \n",
" 55 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b\n",
"2021-02-27 0 1\n",
"2021-02-28 1 5\n",
"2021-03-01 5 14\n",
"2021-03-02 14 30\n",
"2021-03-03 30 55"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sumsq(df)"
]
},
{
"cell_type": "markdown",
"id": "bronze-commodity",
"metadata": {},
"source": [
"Indeed, since we asked it to loop over dict, list and numpy array (2d)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "charged-british",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': array([ 0, 1, 5, 14, 30]), 'b': array([ 1, 5, 14, 30, 55])}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sumsq(dict(a = a, b = a+1))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "rational-algeria",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1],\n",
" [ 1, 5],\n",
" [ 5, 14],\n",
" [14, 30],\n",
" [30, 55]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sumsq(df.values)"
]
},
{
"cell_type": "markdown",
"id": "exact-agreement",
"metadata": {},
"source": [
"## presync: manage indexing and date stamps\n",
"Suppose the function takes two (or more) timeseries."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "amazing-perspective",
"metadata": {},
"outputs": [],
"source": [
"@presync(index = 'inner')\n",
"@loop(pd.DataFrame, np.ndarray)\n",
"@pd2np\n",
"def product(a, b):\n",
" return a * b"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "cognitive-reliance",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 4, 9, 16])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.arange(5); b = np.arange(5)\n",
"product(a,b)"
]
},
{
"cell_type": "markdown",
"id": "stretch-cleanup",
"metadata": {},
"source": [
"What happens when the weights and the timeseries are unsynchronized?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "distributed-cache",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" 2021-02-27 | \n",
" 0.0 | \n",
" NaN | \n",
"
\n",
" \n",
" 2021-02-28 | \n",
" 1.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 2021-03-01 | \n",
" 2.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 2021-03-02 | \n",
" 3.0 | \n",
" 2.0 | \n",
"
\n",
" \n",
" 2021-03-03 | \n",
" 4.0 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 2021-03-04 | \n",
" NaN | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b\n",
"2021-02-27 0.0 NaN\n",
"2021-02-28 1.0 0.0\n",
"2021-03-01 2.0 1.0\n",
"2021-03-02 3.0 2.0\n",
"2021-03-03 4.0 3.0\n",
"2021-03-04 NaN 4.0"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_ = pd.Series(a, drange(-4)) ; a_.name = 'a'\n",
"b_ = pd.Series(b, drange(-3,1)); b_.name = 'b'\n",
"pd.concat([a_, b_], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "through-rochester",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2021-02-28 0\n",
"2021-03-01 2\n",
"2021-03-02 6\n",
"2021-03-03 12\n",
"Freq: D, dtype: int32"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(a_, b_) ## just the inner values"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "natural-trail",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2021-02-27 NaN\n",
"2021-02-28 0.0\n",
"2021-03-01 2.0\n",
"2021-03-02 6.0\n",
"2021-03-03 12.0\n",
"2021-03-04 NaN\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product.oj(a_, b_) ## outer join"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "weekly-pulse",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2021-02-27 NaN\n",
"2021-02-28 0.0\n",
"2021-03-01 2.0\n",
"2021-03-02 6.0\n",
"2021-03-03 12.0\n",
"2021-03-04 16.0\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product.oj.ffill(a_, b_) ## outer join and forward-fill"
]
},
{
"cell_type": "markdown",
"id": "alike-electron",
"metadata": {},
"source": [
"### presync and numpy arrays\n",
"When we deal with thousands of equities, one way of speeding calculations is by stacking them all onto huge dataframes. \n",
"This does work but one is always busy fiddling with 'the universe' one is trading. We took a slightly different approach: "
]
},
{
"cell_type": "markdown",
"id": "secondary-pitch",
"metadata": {},
"source": [
"- We define a global timestamp.\n",
"- We then sample each timeseries to that global timestamp, dropping the early history where the data is all nan. (df_fillna(ts, index, method = 'fnna')).\n",
"- We then do our research on these numpy arrays.\n",
"- Finally, once we are done, we resample back to the global timestamp.\n"
]
},
{
"cell_type": "markdown",
"id": "juvenile-cleanup",
"metadata": {},
"source": [
"While we are in numpy arrays, we can 'inner join' by recognising the 'end' of each array shares the same date.\n",
"Indeed df_index, df_reindex and presync all work seemlessly on np.ndarray as well as DataFrames, under that assumption that **the end of all arrays are in sync**.\n",
"\n",
"We find this approach saves on memory and on computation time. It also lends itself to being able to retrieve and create specific universes for specific trading ideas.\n",
"It is not without its own issues but that is a separate discussion."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "returning-light",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([0, 1, 2, 3, 4]), array([1, 2, 3, 4]))"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.arange(5); b = np.arange(1,5)\n",
"a, b"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "reflected-hormone",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 4, 9, 16])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(a, b)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "modified-polyester",
"metadata": {},
"outputs": [],
"source": [
"us = calendar('US')\n",
"dates = pd.Index(us.drange('-40y', 0 ,'1b'))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "accessible-pierre",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dictable[3 x 3]\n",
"stock|n |ts \n",
"msft |10000|1982-11-03 -1.309868\n",
" | |1982-11-04 -0.737816\n",
" | |1982-11-05 0.460173\n",
" | |1982-11-08 -0.895898\n",
" | |1982-11-09 -0.813305\n",
"appl |8000 |1990-07-04 0.040855\n",
" | |1990-07-05 -1.327995\n",
" | |1990-07-06 0.114328\n",
" | |1990-07-09 -1.626176\n",
" | |1990-07-10 -0.031428\n",
"tsla |7000 |1994-05-04 -1.259911\n",
" | |1994-05-05 1.014304\n",
" | |1994-05-09 -0.035104\n",
" | |1994-05-10 -1.265964\n",
" | |1994-05-11 -0.001664"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"universe = dictable(stock = ['msft', 'appl', 'tsla'], n = [10000, 8000, 7000])\n",
"universe = universe(ts = lambda n: pd.Series(np.random.normal(0,1,n+1), us.drange('-%ib'%n, 0, '1b'))[np.random.normal(0,1,n+1)>-1])\n",
"universe"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "continued-render",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dictable[3 x 6]\n",
"stock|n |ts |rtn |price |vol \n",
"msft |10000|1982-11-03 -1.309868|[-1.3098679 -0.73781612 0.4601727 ... -0.327291|[-1.3098679 -2.04768402 -1.58751132 ... 4.750977|[ nan nan nan ... 1.02923517 1\n",
" | |1982-11-04 -0.737816| 0.67289106] | 5.89220017] | \n",
" | |1982-11-05 0.460173| | | \n",
" | |1982-11-08 -0.895898| | | \n",
" | |1982-11-09 -0.813305| | | \n",
"appl |8000 |1990-07-04 0.040855|[ 0.04085499 -1.32799499 0.11432766 ... -1.017795|[ 4.08549924e-02 -1.28714000e+00 -1.17281234e+00 .|[ nan nan nan ... 0.88535052 0\n",
" | |1990-07-05 -1.327995| -0.82540937] | 7.67908570e+01 7.59654476e+01] | \n",
" | |1990-07-06 0.114328| | | \n",
" | |1990-07-09 -1.626176| | | \n",
" | |1990-07-10 -0.031428| | | \n",
"tsla |7000 |1994-05-04 -1.259911|[-1.25991126 1.01430418 -0.0351036 ... -0.174814|[-1.25991126 -0.24560708 -0.28071068 ... 24.331768|[ nan nan nan ... 0.94944115 0\n",
" | |1994-05-05 1.014304| -0.69279468] | 23.93415613] | \n",
" | |1994-05-09 -0.035104| | | \n",
" | |1994-05-10 -1.265964| | | \n",
" | |1994-05-11 -0.001664| | | "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"universe = universe(rtn = lambda ts: ts.values)\n",
"universe = universe(price = lambda rtn : cumsum(rtn))\n",
"universe = universe(vol = lambda rtn: ewmstd(rtn, 30))\n",
"universe"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "dietary-handbook",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1.01584217, 0.95105069, nan],\n",
" [1.02939552, 0.99139701, nan],\n",
" [1.01323584, 0.97982437, nan],\n",
" ...,\n",
" [1.02923517, 0.88535052, 0.94944115],\n",
" [1.018515 , 0.91252795, 0.93434464],\n",
" [1.01216505, 0.91053212, 0.93155713]])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"presync(lambda tss: np.array(tss).T)(universe.vol)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "improved-deputy",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dictable[3 x 6]\n",
"stock|n |ts |rtn |price |vol \n",
"msft |10000|1982-11-03 -1.309868|1988-11-21 -1.309868|1988-11-21 -1.309868 |1988-11-21 NaN\n",
" | |1982-11-04 -0.737816|1988-11-22 -0.737816|1988-11-22 -2.047684 |1988-11-22 NaN\n",
" | |1982-11-05 0.460173|1988-11-23 0.460173|1988-11-23 -1.587511 |1988-11-23 NaN\n",
" | |1982-11-08 -0.895898|1988-11-24 -0.895898|1988-11-24 -2.483409 |1988-11-24 NaN\n",
" | |1982-11-09 -0.813305|1988-11-25 -0.813305|1988-11-25 -3.296714 |1988-11-25 NaN\n",
"appl |8000 |1990-07-04 0.040855|1995-04-20 0.040855|1995-04-20 0.040855|1995-04-20 NaN\n",
" | |1990-07-05 -1.327995|1995-04-21 -1.327995|1995-04-21 -1.287140|1995-04-21 NaN\n",
" | |1990-07-06 0.114328|1995-04-24 0.114328|1995-04-24 -1.172812|1995-04-24 NaN\n",
" | |1990-07-09 -1.626176|1995-04-25 -1.626176|1995-04-25 -2.798988|1995-04-25 NaN\n",
" | |1990-07-10 -0.031428|1995-04-26 -0.031428|1995-04-26 -2.830417|1995-04-26 NaN\n",
"tsla |7000 |1994-05-04 -1.259911|1998-09-11 -1.259911|1998-09-11 -1.259911|1998-09-11 NaN\n",
" | |1994-05-05 1.014304|1998-09-14 1.014304|1998-09-14 -0.245607|1998-09-14 NaN\n",
" | |1994-05-09 -0.035104|1998-09-15 -0.035104|1998-09-15 -0.280711|1998-09-15 NaN\n",
" | |1994-05-10 -1.265964|1998-09-16 -1.265964|1998-09-16 -1.546674|1998-09-16 NaN\n",
" | |1994-05-11 -0.001664|1998-09-17 -0.001664|1998-09-17 -1.548338|1998-09-17 NaN"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"universe = universe.do(lambda value: np_reindex(value, dates), 'rtn', 'price', 'vol')\n",
"universe"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "instant-fossil",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" msft | \n",
" appl | \n",
" tsla | \n",
"
\n",
" \n",
" \n",
" \n",
" 1988-11-21 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1988-11-22 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1988-11-23 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1988-11-24 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1988-11-25 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 2021-02-25 | \n",
" 1.063016 | \n",
" 0.890791 | \n",
" 0.931185 | \n",
"
\n",
" \n",
" 2021-02-26 | \n",
" 1.045735 | \n",
" 0.880376 | \n",
" 0.963182 | \n",
"
\n",
" \n",
" 2021-03-01 | \n",
" 1.029235 | \n",
" 0.885351 | \n",
" 0.949441 | \n",
"
\n",
" \n",
" 2021-03-02 | \n",
" 1.018515 | \n",
" 0.912528 | \n",
" 0.934345 | \n",
"
\n",
" \n",
" 2021-03-03 | \n",
" 1.012165 | \n",
" 0.910532 | \n",
" 0.931557 | \n",
"
\n",
" \n",
"
\n",
"
8423 rows × 3 columns
\n",
"
"
],
"text/plain": [
" msft appl tsla\n",
"1988-11-21 NaN NaN NaN\n",
"1988-11-22 NaN NaN NaN\n",
"1988-11-23 NaN NaN NaN\n",
"1988-11-24 NaN NaN NaN\n",
"1988-11-25 NaN NaN NaN\n",
"... ... ... ...\n",
"2021-02-25 1.063016 0.890791 0.931185\n",
"2021-02-26 1.045735 0.880376 0.963182\n",
"2021-03-01 1.029235 0.885351 0.949441\n",
"2021-03-02 1.018515 0.912528 0.934345\n",
"2021-03-03 1.012165 0.910532 0.931557\n",
"\n",
"[8423 rows x 3 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vol = pd.concat(universe.vol, axis = 1); vol.columns = universe.stock\n",
"vol"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}