{ "cells": [ { "cell_type": "markdown", "id": "touched-slide", "metadata": {}, "source": [ "# pyg.base.join\n", "Only read this if you are a seasoned dictable user. In data science, we usually have data in multiple tables and we want to pull specific columns together for an analysis.\n", "We will first look at **join** function and then examine the **perdictable** decorator." ] }, { "cell_type": "markdown", "id": "anonymous-modification", "metadata": {}, "source": [ "## Join\n", "\n", "### Example: Using join function to transfer money to a bank\n", "We begin by setting up a mini database:" ] }, { "cell_type": "code", "execution_count": 22, "id": "tribal-planet", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Customers\n", " customer|address |bank \n", "alan |1 Abba Avenue |allied \n", "barbara |2 Beatles Lane|barclays\n", "charles |3 Corrs Close |chase \n", "\n", "Products\n", " product|price|supplier \n", "apple |1 |grove limited \n", "banana |2 |go banabas \n", "cherry |3 |cherry pickers \n", "\n", "Customer_products\n", " customer|product|amount|purchase_date \n", "alan |apple |1 |2021-02-23 00:00:00\n", "alan |banana |2 |2021-02-24 00:00:00\n", "charles |cherry |3 |2021-02-25 00:00:00\n", "charles |apple |4 |2021-02-26 00:00:00 \n", "\n", "Banks\n", " bank |account\n", "allied |5556 \n", "barclays|2461 \n" ] } ], "source": [ "from pyg import *\n", "customers = dictable(customer = ['alan', 'barbara', 'charles'], address = ['1 Abba Avenue', '2 Beatles Lane', '3 Corrs Close'], bank = ['allied', 'barclays', 'chase'])\n", "products = dictable(product = ['apple', 'banana', 'cherry'], price = [1,2,3], supplier = ['grove limited', 'go banabas', 'cherry pickers'])\n", "customer_products = dictable(customer = ['alan', 'alan', 'charles', 'charles'], product = ['apple', 'banana', 'cherry', 'apple'], amount = [1,2,3,4], purchase_date = drange(-2,1))\n", "banks = dictable(bank = ['allied', 'barclays'], account = [5556, 2461])\n", "\n", "print('Customers\\n', customers, '\\n\\nProducts\\n', products, '\\n\\nCustomer_products\\n', customer_products, '\\n\\nBanks\\n', banks)" ] }, { "cell_type": "markdown", "id": "persistent-springfield", "metadata": {}, "source": [ "### Simple join: inner join between tables\n", "Suppose we want to know how much money is to be transferred from each bank. \n", "- We only care about the fields 'bank', 'amount' and 'price' \n", "- each field is pulled from different tables, \n", "- need to specify customer & product as the keys we will join on:" ] }, { "cell_type": "code", "execution_count": 23, "id": "weird-typing", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 5]\n", "product|customer|bank |amount|price\n", "apple |alan |allied|1 |1 \n", "banana |alan |allied|2 |2 \n", "apple |charles |chase |4 |1 \n", "cherry |charles |chase |3 |3 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "join(dict(bank = customers, amount = customer_products, price = products), on = ['customer', 'product'])" ] }, { "cell_type": "markdown", "id": "fiscal-luxury", "metadata": {}, "source": [ "### Defaults for fields we want to left-join on...\n", "The function we need to run to transfer money looks like this, so actually, we would like to have account details too." ] }, { "cell_type": "code", "execution_count": 24, "id": "noticed-convenience", "metadata": {}, "outputs": [], "source": [ "def transfer_money(bank, amount, price, account = 'default'):\n", " ## if account == 'default' transfer money slowly, else transfer quickly\n", " ## return\n", " pass" ] }, { "cell_type": "markdown", "id": "extra-solid", "metadata": {}, "source": [ "We can grab the account details from the 'banks' table: " ] }, { "cell_type": "code", "execution_count": 25, "id": "guilty-nitrogen", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 6]\n", "bank |product|customer|amount|price|account\n", "allied|apple |alan |1 |1 |5556 \n", "allied|banana |alan |2 |2 |5556 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "join(dict(bank = customers, amount = customer_products, price = products, account = banks), on = ['customer', 'product', 'bank'])" ] }, { "cell_type": "markdown", "id": "cloudy-jordan", "metadata": {}, "source": [ "but we just **lost** Chase transactions as we dont have its account details. However, money is transfered perfectly (albeit slowly) even without account id. So instead.... " ] }, { "cell_type": "code", "execution_count": 26, "id": "consolidated-surfing", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 6]\n", "account|amount|bank |customer|price|product\n", "5556 |1 |allied|alan |1 |apple \n", "5556 |2 |allied|alan |2 |banana \n", "default|4 |chase |charles |1 |apple \n", "default|3 |chase |charles |3 |cherry " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "join(dict(bank = customers, amount = customer_products, price = products, account = banks), \n", " on = ['customer', 'product', 'bank'], \n", " defaults = dict(account = 'default'))" ] }, { "cell_type": "markdown", "id": "hidden-survey", "metadata": {}, "source": [ "### Renaming & calculating fields\n", "We also want to ensure we don't transfer money that we already transferred... so we need to grab an expiry column based on purchase_date in customer_product table\n" ] }, { "cell_type": "code", "execution_count": 27, "id": "superior-durham", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 7]\n", "account|amount|bank |customer|expiry |price|product\n", "5556 |1 |allied|alan |2021-02-24 00:00:00|1 |apple \n", "5556 |2 |allied|alan |2021-02-25 00:00:00|2 |banana \n", "default|4 |chase |charles |2021-03-01 00:00:00|1 |apple \n", "default|3 |chase |charles |2021-02-26 00:00:00|3 |cherry " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "join(dict(bank = customers, amount = customer_products, price = products, account = banks, expiry = customer_products), \n", " on = ['customer', 'product', 'bank'], \n", " renames = dict(expiry = lambda purchase_date: dt(purchase_date, '1b')), ## it takes 1 business day to transfer money \n", " defaults = dict(account = 'default'))" ] }, { "cell_type": "markdown", "id": "joint-jamaica", "metadata": {}, "source": [ "## Perdictable\n", "perdictable takes the same operation one steps further and actually runs the function. We also use the function signature to determine the defaults parameter.\n", "Here is another example: \n", "### Example: Oil prices\n", "In Finance, there are contracts called Futures, each Future contract has an expiry. E.g. Futures contracts for Oil are contracts agreeing the delivery of oil to a particular place in a particular month. Once that month is gone, that contract is no longer traded and the oil needs to be delivered.\n" ] }, { "cell_type": "code", "execution_count": 28, "id": "rocky-block", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[8 x 3]\n", "m |y |ticker \n", "3 |2020|OIL_2020_03\n", "6 |2020|OIL_2020_06\n", "9 |2020|OIL_2020_09\n", "...8 rows...\n", "6 |2021|OIL_2021_06\n", "9 |2021|OIL_2021_09\n", "12|2021|OIL_2021_12" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyg import *\n", "oil = dictable(y = dt().year-1, m = range(3, 13, 3)) + dictable(y = dt().year, m = range(3, 13, 3))\n", "oil = oil(ticker = lambda y, m: 'OIL_%i_%s'%(y, m if m>9 else '0%i'%m))\n", "oil" ] }, { "cell_type": "markdown", "id": "cultural-cannon", "metadata": {}, "source": [ "y,m and ticker will form our primary keys" ] }, { "cell_type": "code", "execution_count": 29, "id": "assisted-regression", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[8 x 4]\n", "y |m |ticker |data \n", "2020|3 |OIL_2020_03|2020-04-01 00:00:00\n", "2020|6 |OIL_2020_06|2020-07-01 00:00:00\n", "2020|9 |OIL_2020_09|2020-10-01 00:00:00\n", "...8 rows...\n", "2021|6 |OIL_2021_06|2021-07-01 00:00:00\n", "2021|9 |OIL_2021_09|2021-10-01 00:00:00\n", "2021|12|OIL_2021_12|2022-01-01 00:00:00" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pk = ['y', 'm', 'ticker']\n", "expiry = perdictable(lambda y, m: dt(y,m+1,1), on = pk)(y = oil, m = oil)\n", "expiry" ] }, { "cell_type": "code", "execution_count": 30, "id": "split-auditor", "metadata": {}, "outputs": [], "source": [ "def fake_ts(ticker, expiry):\n", " return 500 + pd.Series(np.random.normal(0,1,100), drange(dt_bump(expiry,-99), expiry)).cumsum()" ] }, { "cell_type": "markdown", "id": "second-kitty", "metadata": {}, "source": [ "To add a price for each of the futures, we first wrap fake_ts and then run it:" ] }, { "cell_type": "code", "execution_count": 31, "id": "public-expense", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[8 x 4]\n", "y |m |ticker |data \n", "2020|3 |OIL_2020_03|2019-12-24 499.000139\n", " | | |2019-12-25 500.904180\n", " | | |2019-12-26 501.792007\n", " | | |2019-12-27 502.410313\n", " | | |2019-12-28 502.843697\n", "2020|6 |OIL_2020_06|2020-03-24 500.575052\n", " | | |2020-03-25 499.504860\n", " | | |2020-03-26 500.558506\n", " | | |2020-03-27 500.599754\n", " | | |2020-03-28 500.704313\n", "2020|9 |OIL_2020_09|2020-06-24 500.333677\n", " | | |2020-06-25 500.974220\n", " | | |2020-06-26 499.882500\n", " | | |2020-06-27 500.342359\n", " | | |2020-06-28 501.423622\n", "...8 rows...\n", "2021|6 |OIL_2021_06|2021-03-24 501.437903\n", " | | |2021-03-25 500.808820\n", " | | |2021-03-26 499.478861\n", " | | |2021-03-27 499.203311\n", " | | |2021-03-28 498.270609\n", "2021|9 |OIL_2021_09|2021-06-24 499.950777\n", " | | |2021-06-25 500.458993\n", " | | |2021-06-26 498.564582\n", " | | |2021-06-27 497.988147\n", " | | |2021-06-28 498.193692\n", "2021|12|OIL_2021_12|2021-09-24 500.320503\n", " | | |2021-09-25 499.915132\n", " | | |2021-09-26 498.200049\n", " | | |2021-09-27 497.805575\n", " | | |2021-09-28 497.879972" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "price = perdictable(fake_ts, on = pk)(ticker = oil, expiry = expiry) \n", "price" ] }, { "cell_type": "markdown", "id": "sensitive-operator", "metadata": {}, "source": [ "We have wrapped a function so that we get a price for **each** of these contracts.
\n", "This allows us to move from operating on single timeseries, to run it on multiple rows from multiple tables " ] }, { "cell_type": "code", "execution_count": 32, "id": "advised-exposure", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[8 x 4]\n", "y |m |ticker |data \n", "2020|3 |OIL_2020_03|2019-12-24 NaN\n", " | | |2019-12-25 0.003816\n", " | | |2019-12-26 0.001772\n", " | | |2019-12-27 0.001232\n", " | | |2019-12-28 0.000863\n", "2020|6 |OIL_2020_06|2020-03-24 NaN\n", " | | |2020-03-25 -0.002138\n", " | | |2020-03-26 0.002109\n", " | | |2020-03-27 0.000082\n", " | | |2020-03-28 0.000209\n", "2020|9 |OIL_2020_09|2020-06-24 NaN\n", " | | |2020-06-25 0.001280\n", " | | |2020-06-26 -0.002179\n", " | | |2020-06-27 0.000920\n", " | | |2020-06-28 0.002161\n", "...8 rows...\n", "2021|6 |OIL_2021_06|2021-03-24 NaN\n", " | | |2021-03-25 -0.001255\n", " | | |2021-03-26 -0.002656\n", " | | |2021-03-27 -0.000552\n", " | | |2021-03-28 -0.001868\n", "2021|9 |OIL_2021_09|2021-06-24 NaN\n", " | | |2021-06-25 0.001017\n", " | | |2021-06-26 -0.003785\n", " | | |2021-06-27 -0.001156\n", " | | |2021-06-28 0.000413\n", "2021|12|OIL_2021_12|2021-09-24 NaN\n", " | | |2021-09-25 -0.000810\n", " | | |2021-09-26 -0.003431\n", " | | |2021-09-27 -0.000792\n", " | | |2021-09-28 0.000149" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rtn = perdictable(diff, on = pk)(a = price, expiry = expiry)\n", "yesterday_price = perdictable(shift, on = pk)(a = price, expiry = expiry)\n", "percentage_return = perdictable(div_, on = pk)(a = rtn, b = yesterday_price, expiry = expiry)\n", "percentage_return" ] }, { "cell_type": "markdown", "id": "smooth-sponsorship", "metadata": {}, "source": [ "### perdictable and caching \n", "This is nice but (a) what have we gained? and (b) why do we keep using expiry as a variable? \n", "The answer is to do with caching actually.
If we rerun prices, we should get brand new data, since fake_ts just generates random prices...\n", "
perdictable identifies rows that have been run and are now 'expired' It uses provided old data and does not recalculate.
If either expiry or old values are not provided then it calculates everything." ] }, { "cell_type": "code", "execution_count": 37, "id": "theoretical-subject", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[8 x 5]\n", "y |m |ticker |new |old \n", "2020|3 |OIL_2020_03|2019-12-24 499.000139|2019-12-24 499.000139\n", " | | |2019-12-25 500.904180|2019-12-25 500.904180\n", " | | |2019-12-26 501.792007|2019-12-26 501.792007\n", " | | |2019-12-27 502.410313|2019-12-27 502.410313\n", " | | |2019-12-28 502.843697|2019-12-28 502.843697\n", "2020|6 |OIL_2020_06|2020-03-24 500.575052|2020-03-24 500.575052\n", " | | |2020-03-25 499.504860|2020-03-25 499.504860\n", " | | |2020-03-26 500.558506|2020-03-26 500.558506\n", " | | |2020-03-27 500.599754|2020-03-27 500.599754\n", " | | |2020-03-28 500.704313|2020-03-28 500.704313\n", "2020|9 |OIL_2020_09|2020-06-24 500.333677|2020-06-24 500.333677\n", " | | |2020-06-25 500.974220|2020-06-25 500.974220\n", " | | |2020-06-26 499.882500|2020-06-26 499.882500\n", " | | |2020-06-27 500.342359|2020-06-27 500.342359\n", " | | |2020-06-28 501.423622|2020-06-28 501.423622\n", "...8 rows...\n", "2021|6 |OIL_2021_06|2021-03-24 500.429724|2021-03-24 501.437903\n", " | | |2021-03-25 501.537890|2021-03-25 500.808820\n", " | | |2021-03-26 501.167511|2021-03-26 499.478861\n", " | | |2021-03-27 502.611689|2021-03-27 499.203311\n", " | | |2021-03-28 501.820261|2021-03-28 498.270609\n", "2021|9 |OIL_2021_09|2021-06-24 499.911914|2021-06-24 499.950777\n", " | | |2021-06-25 497.451472|2021-06-25 500.458993\n", " | | |2021-06-26 498.190816|2021-06-26 498.564582\n", " | | |2021-06-27 498.015362|2021-06-27 497.988147\n", " | | |2021-06-28 497.224958|2021-06-28 498.193692\n", "2021|12|OIL_2021_12|2021-09-24 498.129511|2021-09-24 500.320503\n", " | | |2021-09-25 498.739546|2021-09-25 499.915132\n", " | | |2021-09-26 499.321094|2021-09-26 498.200049\n", " | | |2021-09-27 498.491587|2021-09-27 497.805575\n", " | | |2021-09-28 497.057529|2021-09-28 497.879972" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_price = perdictable(fake_ts, on = pk)(ticker = oil, data = price, expiry = expiry)\n", "(new_price.relabel(data = 'new') * price.relabel(data = 'old')).sort('y', 'm')" ] }, { "cell_type": "markdown", "id": "final-halloween", "metadata": {}, "source": [ "### perdictable with the cell framework\n", "We can run the function and use a cell to store the output..." ] }, { "cell_type": "code", "execution_count": 40, "id": "medieval-teacher", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[8 x 4]\n", "y |m |ticker |data \n", "2020|3 |OIL_2020_03|2019-12-24 501.331417\n", " | | |2019-12-25 500.332873\n", " | | |2019-12-26 500.160526\n", " | | |2019-12-27 496.688779\n", " | | |2019-12-28 497.774215\n", "2020|6 |OIL_2020_06|2020-03-24 500.899756\n", " | | |2020-03-25 500.830490\n", " | | |2020-03-26 501.829020\n", " | | |2020-03-27 501.875464\n", " | | |2020-03-28 503.241949\n", "2020|9 |OIL_2020_09|2020-06-24 500.395880\n", " | | |2020-06-25 500.311780\n", " | | |2020-06-26 499.817331\n", " | | |2020-06-27 499.780468\n", " | | |2020-06-28 497.550235\n", "...8 rows...\n", "2021|6 |OIL_2021_06|2021-03-24 501.291426\n", " | | |2021-03-25 499.592175\n", " | | |2021-03-26 499.104934\n", " | | |2021-03-27 497.698320\n", " | | |2021-03-28 497.868177\n", "2021|9 |OIL_2021_09|2021-06-24 499.978264\n", " | | |2021-06-25 500.784927\n", " | | |2021-06-26 501.212177\n", " | | |2021-06-27 501.852472\n", " | | |2021-06-28 502.035097\n", "2021|12|OIL_2021_12|2021-09-24 500.282482\n", " | | |2021-09-25 501.077309\n", " | | |2021-09-26 501.005312\n", " | | |2021-09-27 501.173168\n", " | | |2021-09-28 502.098126" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = cell(perdictable(fake_ts, on = pk), ticker = oil, expiry = expiry)()\n", "c.data" ] }, { "cell_type": "code", "execution_count": 43, "id": "abstract-lincoln", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[8 x 4]\n", "y |m |ticker |data \n", "2020|3 |OIL_2020_03|2019-12-24 501.331417\n", " | | |2019-12-25 500.332873\n", " | | |2019-12-26 500.160526\n", " | | |2019-12-27 496.688779\n", " | | |2019-12-28 497.774215\n", "2020|6 |OIL_2020_06|2020-03-24 500.899756\n", " | | |2020-03-25 500.830490\n", " | | |2020-03-26 501.829020\n", " | | |2020-03-27 501.875464\n", " | | |2020-03-28 503.241949\n", "2020|9 |OIL_2020_09|2020-06-24 500.395880\n", " | | |2020-06-25 500.311780\n", " | | |2020-06-26 499.817331\n", " | | |2020-06-27 499.780468\n", " | | |2020-06-28 497.550235\n", "...8 rows...\n", "2021|6 |OIL_2021_06|2021-03-24 499.383712\n", " | | |2021-03-25 498.812289\n", " | | |2021-03-26 498.995159\n", " | | |2021-03-27 499.504985\n", " | | |2021-03-28 498.453581\n", "2021|9 |OIL_2021_09|2021-06-24 499.310726\n", " | | |2021-06-25 500.248325\n", " | | |2021-06-26 500.458067\n", " | | |2021-06-27 498.482094\n", " | | |2021-06-28 499.704684\n", "2021|12|OIL_2021_12|2021-09-24 501.359202\n", " | | |2021-09-25 501.178162\n", " | | |2021-09-26 501.846290\n", " | | |2021-09-27 502.217393\n", " | | |2021-09-28 500.996568" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "recalculated_cell = c.go(1) ## force a recalculation\n", "recalculated_cell.data" ] }, { "cell_type": "markdown", "id": "matched-cooling", "metadata": {}, "source": [ "We observe that the cell, when recalculates, automatically caches the history and does not recalculate fake_ts.
\n", "This is not magic. When a cell calculates its function, it provides the function with the variables it needs.\n", "Once calculated, it stores the output in **data** and will be able to provide **data** to the function next time, allowing it to avoid re-running expired calculations. Then cell will store the functions's result back in the **data** key for later use and this is repeated.\n", "\n", "### perdictable API\n", "Parameters **on**, **renames** and **defaults** parameters determine the way the data is joined. If defaults is missing, the defaults from the function are used:\n" ] }, { "cell_type": "code", "execution_count": 45, "id": "billion-summary", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 2]\n", "product|data\n", "apple |2 \n", "banana |6 \n", "cherry |3 " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function = lambda price, quantity = 1: price * quantity\n", "price = dictable(product = ['apple', 'banana', 'cherry'], price = [1,2,3])\n", "quantity = dictable(product = ['apple', 'banana', 'damson'], quantity = [2,3,4])\n", "perdictable(function, on = 'product')(price = price, quantity = quantity) ## cherry should appear with default quantity" ] }, { "cell_type": "markdown", "id": "embedded-plaintiff", "metadata": {}, "source": [ "If you want to see the full calculations and inputs to the function set **include_inputs**=True:" ] }, { "cell_type": "code", "execution_count": 46, "id": "confident-fruit", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "price|product|quantity|expiry|data\n", "1 |apple |2 |None |2 \n", "2 |banana |3 |None |6 \n", "3 |cherry |1 |None |3 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perdictable(function, on = 'product', include_inputs = True)(price = price, quantity = quantity)" ] }, { "cell_type": "markdown", "id": "angry-inspection", "metadata": {}, "source": [ "If you want output column to be not data, use **col**:" ] }, { "cell_type": "code", "execution_count": 47, "id": "infinite-shoot", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "price|product|quantity|expiry|cost\n", "1 |apple |2 |None |2 \n", "2 |banana |3 |None |6 \n", "3 |cherry |1 |None |3 " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perdictable(function, on = 'product', include_inputs = True, col = 'cost')(price = price, quantity = quantity)" ] }, { "cell_type": "markdown", "id": "portable-somerset", "metadata": {}, "source": [ "The **if_none** parameter determines how data is calculated for rows that have expired but their data is None:" ] }, { "cell_type": "code", "execution_count": 52, "id": "conscious-grain", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "expiry |price|product|quantity|data \n", "2021-02-24 00:00:00|1 |apple |2 |None \n", "2021-02-25 00:00:00|2 |banana |3 |some value that will be kept\n", "2021-02-27 00:00:00|3 |cherry |1 |3 " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "expiry = dictable(product = ['apple', 'banana', 'cherry'], expiry = [dt(-2), dt(-1), dt(1)])\n", "previous_data = dictable(product = ['apple', 'banana', 'cherry'], data = [None, 'some value that will be kept', 'this value will be recalculated'])\n", "perdictable(function, on = 'product', include_inputs = True, if_none = False)(price = price, quantity = quantity, expiry = expiry, data = previous_data)" ] }, { "cell_type": "code", "execution_count": 53, "id": "indonesian-today", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "expiry |price|product|quantity|data \n", "2021-02-24 00:00:00|1 |apple |2 |2 \n", "2021-02-25 00:00:00|2 |banana |3 |some value that will be kept\n", "2021-02-27 00:00:00|3 |cherry |1 |3 " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perdictable(function, on = 'product', include_inputs = True, if_none = True)(price = price, quantity = quantity, expiry = expiry, data = previous_data)" ] }, { "cell_type": "markdown", "id": "abandoned-scanner", "metadata": {}, "source": [ "Some function want to receive historic data and they use it themselves. Parameter **output_is_input** controls this.
\n", "For example: If your function is pulling historic prices from yahoo finance, you can use existing data to ask yahoo for only recent ones." ] }, { "cell_type": "code", "execution_count": 54, "id": "growing-wallpaper", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "price|product|quantity|expiry|data\n", "1 |apple |2 |None |12 \n", "2 |banana |3 |None |26 \n", "3 |cherry |1 |None |33 " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def running_total_costs(price, quantity=1, data=0):\n", " return data + price * quantity\n", "\n", "previous_data = dictable(product = ['apple', 'banana', 'cherry', 'damson'], data = [10, 20, 30, 40])\n", "perdictable(running_total_costs, on = 'product', include_inputs = True)(price = price, quantity = quantity, data = previous_data)\n" ] }, { "cell_type": "code", "execution_count": 57, "id": "interesting-throat", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "price|product|quantity|expiry|data\n", "1 |apple |2 |None |2 \n", "2 |banana |3 |None |6 \n", "3 |cherry |1 |None |3 " ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## if you don't want existing data to be presented to the function:\n", "perdictable(running_total_costs, on = 'product', include_inputs = True, output_is_input = False)(price = price, quantity = quantity, data = previous_data)" ] }, { "cell_type": "markdown", "id": "contrary-median", "metadata": {}, "source": [ "## Conclusions\n", "pyg.base.join allows us to create joined table with the variables we need. This is leveraged by perdictable so that the 'atomic' data we work with is not a single timeseries but a whole table of timeseries data indexed by some keys. We can use various perdictable parameters to control cache policy. All this is done with very little additional code, allowing us to manage quite a lot of data items with very little effort while managing caching expired items." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }