{ "cells": [ { "cell_type": "markdown", "id": "stupid-think", "metadata": {}, "source": [ "# pyg.base.dictable\n", "\n", "dictable is a table, a collection of iterable records. It is also a dict with each key's value being a column. \n", "Why not use a pandas.DataFrame? pd.DataFrame leads a dual life: " ] }, { "cell_type": "markdown", "id": "encouraging-driver", "metadata": {}, "source": [ "* by day an index-based optimized numpy array supporting e.g. timeseries analytics etc.\n", "* by night, a table with keys supporting filtering, aggregating, pivoting on keys as well as inner/outer joining on keys." ] }, { "cell_type": "markdown", "id": "architectural-cursor", "metadata": {}, "source": [ "As a result, the pandas interface is somewhat cumbersome. Further, the DataFrame isn't really designed for containing more complicated objects within it.\n", "
Conversely, dictable only tries to do the latter and is designed precisely for holding entire research process in one place. You can think of dictable as 'one level up' on a DataFrame: a dictable will handle thousands of data frames within it with ease.\n", "Indeed, dictable should be thought of as an 'organiser of research flow' rather than as an array of primitives.\n", "
In general, each row will contain some keys indexing the experiment, while some keys will contain complicated objects: a pd.DataFrame, a timeseries, yield_curves, machine-learning experiments etc.\n", "
The interface is succinct and extremely intuitive, allowing the user to concentrate on logic of the calculations rather than boilerplate.\n", "\n", "## Motivation: dictable as an organiser of research flow\n", "We start with a simple motivating example. Here is a typical workflow:" ] }, { "cell_type": "code", "execution_count": 3, "id": "roman-congo", "metadata": {}, "outputs": [], "source": [ "from pyg import *; import pandas as pd; import numpy as np\n", "import yfinance as yf" ] }, { "cell_type": "code", "execution_count": 4, "id": "middle-recorder", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "\n", "1 Failed download:\n", "- BAD_SYMBOL: No data found, symbol may be delisted\n", "[*********************100%***********************] 1 of 1 completed\n" ] } ], "source": [ "symbols = ['MSFT', 'WMT', 'TSLA', 'AAPL', 'BAD_SYMBOL', 'C']\n", "history = [yf.download(symbol) for symbol in symbols]\n", "prices = [h['Adj Close'] for h in history]\n", "rtns = [p.diff() for p in prices]\n", "vols = [r.ewm(30).std() for r in rtns]\n", "zscores = [r/v for r,v in zip(rtns, vols)]\n", "zavgs = [z.mean() for z in zscores]" ] }, { "cell_type": "code", "execution_count": 5, "id": "speaking-matrix", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.06238896915574035,\n", " 0.045634555332148996,\n", " 0.0676156301672513,\n", " 0.053189575669227614,\n", " nan,\n", " 0.027297252361386543]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zavgs" ] }, { "cell_type": "markdown", "id": "universal-trace", "metadata": {}, "source": [ "At this point we ask ourselves: Why do we have a **nan**? Which ticker was it, and when did it go wrong? " ] }, { "cell_type": "code", "execution_count": 6, "id": "circular-outline", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['BAD_SYMBOL']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bad_symbols = [s for s, z in zip(symbols, zavgs) if np.isnan(z)]; bad_symbols" ] }, { "cell_type": "markdown", "id": "pending-trailer", "metadata": {}, "source": [ "Great, how do we remove bad symbols from all our other variables?" ] }, { "cell_type": "code", "execution_count": 7, "id": "thick-research", "metadata": {}, "outputs": [], "source": [ "vols = [v for s, v in zip(symbols, vols) if s not in bad_symbols]" ] }, { "cell_type": "markdown", "id": "strange-fence", "metadata": {}, "source": [ "Now we can calculate some stuff with rtns and vols perhaps?" ] }, { "cell_type": "code", "execution_count": 8, "id": "great-colony", "metadata": {}, "outputs": [], "source": [ "ewmas = [r.ewm(n).mean()/v for r,v in zip(rtns, vols) for n in [10, 20, 30]]" ] }, { "cell_type": "markdown", "id": "accepted-placement", "metadata": {}, "source": [ "Things went wrong and went wrong silently too: " ] }, { "cell_type": "markdown", "id": "greek-wildlife", "metadata": {}, "source": [ "* We forgot to remove bad data from rtns as well as from vols so our zip function is zipping the wrong stocks together\n", "* It is nearly impossible to discover what item in the list belong to what n and what stock" ] }, { "cell_type": "markdown", "id": "systematic-bruce", "metadata": {}, "source": [ "If you ever dealt with real data, the mess described above must be familiar." ] }, { "cell_type": "markdown", "id": "severe-projector", "metadata": {}, "source": [ "## Same code, in dictable" ] }, { "cell_type": "code", "execution_count": 9, "id": "breathing-colony", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n", "\n", "1 Failed download:\n", "- BAD_SYMBOL: No data found, symbol may be delisted\n", "[*********************100%***********************] 1 of 1 completed\n" ] } ], "source": [ "from pyg import *\n", "import yfinance as yf\n", "s = dictable(symbol = ['MSFT', 'WMT', 'TSLA', 'AAPL', 'BAD_SYMBOL', 'C'])\n", "s = s(history = lambda symbol: yf.download(symbol))\n", "s = s(price = lambda history: history['Adj Close'])\n", "s = s(rtn = lambda price: price.diff())\n", "s = s(vol = lambda rtn: rtn.ewm(30).std())\n", "s = s(zscore = lambda rtn, vol: rtn/vol)\n", "s = s(zavg = lambda zscore: zscore.mean())" ] }, { "cell_type": "markdown", "id": "sexual-period", "metadata": {}, "source": [ "dictable __s__ contains all our data. \n" ] }, { "cell_type": "markdown", "id": "interesting-hindu", "metadata": {}, "source": [ "* each row contains all the variables associated with a specific symbol\n", "* each column corresponds to a variable\n", "* adding a new variable is declarative and free of boiler-plate loop and zip" ] }, { "cell_type": "code", "execution_count": 10, "id": "likely-anniversary", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[6 x 4]\n", "symbol |history |vol |zavg \n", "MSFT | Open High Low |Date |0.06238896915574035 \n", " |Date |1986-03-13 NaN | \n", " |1986-03-13 0.088542 0.101563 0.088542 |1986-03-14 NaN | \n", " |1986-03-14 0.097222 0.102431 0.097222 |1986-03-17 0.000779 | \n", " |1986-03-17 0.100694 0.103299 0.100694 |1986-03-18 0.001997 | \n", "WMT | Open High Low |Date |0.045634555332148996\n", " |Date |1972-08-25 NaN | \n", " |1972-08-25 0.063477 0.064697 0.063477 |1972-08-28 NaN | \n", " |1972-08-28 0.064453 0.064941 0.064209 |1972-08-29 0.000198 | \n", " |1972-08-29 0.063965 0.063965 0.063477 |1972-08-30 0.000215 | \n", "TSLA | Open High Low |Date |0.0676156301672513 \n", " |Date |2010-06-29 NaN | \n", " |2010-06-29 3.800000 5.000000 3.508000 |2010-06-30 NaN | \n", " |2010-06-30 5.158000 6.084000 4.660000 |2010-07-01 0.255972 | \n", " |2010-07-01 5.000000 5.184000 4.054000 |2010-07-02 0.274164 | \n", "AAPL | Open High Low |Date |0.053189575669227614\n", " |Date |1980-12-12 NaN | \n", " |1980-12-12 0.128348 0.128906 0.128348 |1980-12-15 NaN | \n", " |1980-12-15 0.122210 0.122210 0.121652 |1980-12-16 0.001242 | \n", " |1980-12-16 0.113281 0.113281 0.112723 |1980-12-17 0.004938 | \n", "BAD_SYMBOL|Empty DataFrame |Series([], Name: Adj Close, dtype: float64)|nan \n", " |Columns: [Open, High, Low, Close, Adj Close, Volum| | \n", " |Index: [] | | \n", "C | Open High Low C|Date |0.027297252361386543\n", " |Date |1977-01-03 NaN | \n", " |1977-01-03 16.133125 16.236876 16.133125 16.23|1977-01-04 NaN | \n", " |1977-01-04 16.236876 16.288750 16.184999 16.28|1977-01-05 0.053720 | \n", " |1977-01-05 16.288750 16.288750 16.133125 16.18|1977-01-06 0.043500 | " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[['symbol', 'history', 'vol', 'zavg']]" ] }, { "cell_type": "code", "execution_count": 11, "id": "wicked-fairy", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.06238896915574035,\n", " 0.045634555332148996,\n", " 0.0676156301672513,\n", " 0.053189575669227614,\n", " nan,\n", " 0.027297252361386543]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.zavg" ] }, { "cell_type": "markdown", "id": "literary-eligibility", "metadata": {}, "source": [ "### Oh, no, we have a bad symbol, how do we remove it?" ] }, { "cell_type": "code", "execution_count": 12, "id": "dated-vintage", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.06238896915574035,\n", " 0.045634555332148996,\n", " 0.0676156301672513,\n", " 0.053189575669227614,\n", " 0.027297252361386543]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = s.exc(zavg = np.nan); s.zavg" ] }, { "cell_type": "markdown", "id": "sound-bowling", "metadata": {}, "source": [ "### Now if we want to calculate something per symbol and window...\n", "We want to create a new table, now keyed on two values: symbol and window n, so we create a bigger table using cross product:" ] }, { "cell_type": "code", "execution_count": 13, "id": "legislative-segment", "metadata": {}, "outputs": [], "source": [ "sn = s * dict(n = [10,20,30]) ## each row is now unique per symbol and window n" ] }, { "cell_type": "code", "execution_count": 14, "id": "injured-floor", "metadata": {}, "outputs": [], "source": [ "sn = sn(ewma = lambda rtn, n, vol: rtn.ewm(n).mean()/vol)" ] }, { "cell_type": "markdown", "id": "mounted-convention", "metadata": {}, "source": [ "And here is Citibank's three ewma..." ] }, { "cell_type": "code", "execution_count": 15, "id": "single-plaza", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 2]\n", "n |ewma \n", "10|Date \n", " |1977-01-03 NaN\n", " |1977-01-04 NaN\n", " |1977-01-05 -0.269415\n", " |1977-01-06 -0.636750\n", "20|Date \n", " |1977-01-03 NaN\n", " |1977-01-04 NaN\n", " |1977-01-05 -0.252990\n", " |1977-01-06 -0.610388\n", "30|Date \n", " |1977-01-03 NaN\n", " |1977-01-04 NaN\n", " |1977-01-05 -0.247336\n", " |1977-01-06 -0.601208" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sn.inc(symbol = 'C')[['n', 'ewma']]" ] }, { "cell_type": "markdown", "id": "racial-dining", "metadata": {}, "source": [ "Here is a pivot table of the average of each ewma per symbol and window... Note that again, we can access functions of variables and not just the existing keys in the dictable" ] }, { "cell_type": "code", "execution_count": 16, "id": "latest-minority", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[5 x 4]\n", "symbol|10 |20 |30 \n", "AAPL |[0.05186739363086048] |[0.048216725636102103]|[0.044663941035992756]\n", "C |[0.027514106874753336]|[0.026447710541888325]|[0.02513228496004374] \n", "MSFT |[0.061144190254947106]|[0.058806269926441355]|[0.056647557919386575]\n", "TSLA |[0.05809192519536144] |[0.0513719449570986] |[0.0461212842741293] \n", "WMT |[0.04534668336293025] |[0.04406462686679985] |[0.0426962176790935] " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sn.pivot('symbol', 'n', lambda ewma: ewma.mean())" ] }, { "cell_type": "markdown", "id": "first-engine", "metadata": {}, "source": [ "## dictable functionality\n", "### construction\n", "dictable is quite flexible on constuctions. " ] }, { "cell_type": "code", "execution_count": 17, "id": "efficient-atmosphere", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 2]\n", "a|b\n", "1|a\n", "2|b\n", "3|c\n", "4|d" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dictable(a = [1,2,3,4], b = ['a', 'b', 'c', 'd']); d" ] }, { "cell_type": "code", "execution_count": 18, "id": "existing-cedar", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 4]\n", "symbol|exchange|a|b\n", "MSFT |NYSE |1|a\n", "AAPL |NYSE |2|b\n", "APA |NYSE |3|c\n", "MMM |NYSE |4|d" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dictable(dict(a = [1,2,3,4], b = ['a', 'b', 'c', 'd']), symbol = ['MSFT', 'AAPL', 'APA', 'MMM'], exchange = 'NYSE'); d" ] }, { "cell_type": "code", "execution_count": 19, "id": "direct-security", "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(d) # can instantiate a DataFrame from a dictable with no code and vice versa..." ] }, { "cell_type": "code", "execution_count": 20, "id": "civic-reality", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 4]\n", "symbol|exchange|a|b\n", "MSFT |NYSE |1|a\n", "AAPL |NYSE |2|b\n", "APA |NYSE |3|c\n", "MMM |NYSE |4|d" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dictable(df); d" ] }, { "cell_type": "code", "execution_count": 21, "id": "hydraulic-tuition", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 2]\n", "a|b\n", "1|3\n", "2|4\n", "3|5" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dictable([(1,3), (2,4), (3,5)], ['a', 'b']); d # construction from records as tuples" ] }, { "cell_type": "code", "execution_count": 22, "id": "sophisticated-variation", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 4]\n", "b|c |a|d \n", "3|None |1|None \n", "4|None |2|new column\n", "5|also here|3|None " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dictable([dict(a = 1, b = 3), dict(a = 2, b = 4, d = 'new column'), dict(a = 3, b = 5, c = 'also here')]); d # construction from records as dicts, mismatching on keys is fine" ] }, { "cell_type": "code", "execution_count": 23, "id": "understanding-material", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[6 x 3]\n", "Symbol|Name |Sector \n", "MMM |3M Company |Industrials \n", "AOS |A.O. Smith Corp |Industrials \n", "ABT |Abbott Laboratories|Health Care \n", "ABBV |AbbVie Inc. |Health Care \n", "ABMD |ABIOMED Inc |Health Care \n", "ACN |Accenture plc |Information Technology" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dictable(read_csv('d:/dropbox/yoav/python/pyg/docs/constituents_csv.csv')); d = d[:6]; d" ] }, { "cell_type": "markdown", "id": "formed-pitch", "metadata": {}, "source": [ "### row access" ] }, { "cell_type": "code", "execution_count": 24, "id": "aware-cincinnati", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Symbol': 'MMM', 'Name': '3M Company', 'Sector': 'Industrials'}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d[0] #returns a record" ] }, { "cell_type": "code", "execution_count": 25, "id": "recreational-checkout", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 3]\n", "Symbol|Name |Sector \n", "MMM |3M Company |Industrials\n", "AOS |A.O. Smith Corp|Industrials" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d[:2] ## subset rows using slice" ] }, { "cell_type": "code", "execution_count": 26, "id": "sharp-webcam", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Symbol': 'MMM', 'Name': '3M Company', 'Sector': 'Industrials'}\n", "{'Symbol': 'AOS', 'Name': 'A.O. Smith Corp', 'Sector': 'Industrials'}\n", "{'Symbol': 'ABT', 'Name': 'Abbott Laboratories', 'Sector': 'Health Care'}\n", "{'Symbol': 'ABBV', 'Name': 'AbbVie Inc.', 'Sector': 'Health Care'}\n", "{'Symbol': 'ABMD', 'Name': 'ABIOMED Inc', 'Sector': 'Health Care'}\n", "{'Symbol': 'ACN', 'Name': 'Accenture plc', 'Sector': 'Information Technology'}\n" ] } ], "source": [ "for row in d: # iteration is by row\n", " print(row)" ] }, { "cell_type": "markdown", "id": "scheduled-weekly", "metadata": {}, "source": [ "### column access" ] }, { "cell_type": "code", "execution_count": 27, "id": "printable-moment", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['3M Company',\n", " 'A.O. Smith Corp',\n", " 'Abbott Laboratories',\n", " 'AbbVie Inc.',\n", " 'ABIOMED Inc',\n", " 'Accenture plc']" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.Name" ] }, { "cell_type": "code", "execution_count": 28, "id": "corporate-study", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['3M Company',\n", " 'A.O. Smith Corp',\n", " 'Abbott Laboratories',\n", " 'AbbVie Inc.',\n", " 'ABIOMED Inc',\n", " 'Accenture plc']" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['Name']" ] }, { "cell_type": "code", "execution_count": 29, "id": "subtle-bacon", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('3M Company', 'Industrials'),\n", " ('A.O. Smith Corp', 'Industrials'),\n", " ('Abbott Laboratories', 'Health Care'),\n", " ('AbbVie Inc.', 'Health Care'),\n", " ('ABIOMED Inc', 'Health Care'),\n", " ('Accenture plc', 'Information Technology')]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['Name', 'Sector']" ] }, { "cell_type": "code", "execution_count": 30, "id": "advised-price", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[6 x 2]\n", "Name |Sector \n", "3M Company |Industrials \n", "A.O. Smith Corp |Industrials \n", "Abbott Laboratories|Health Care \n", "AbbVie Inc. |Health Care \n", "ABIOMED Inc |Health Care \n", "Accenture plc |Information Technology" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d[['Name', 'Sector']]" ] }, { "cell_type": "markdown", "id": "broke-foundation", "metadata": {}, "source": [ "### d is a dict so supports the usual keys(), values() and items():" ] }, { "cell_type": "code", "execution_count": 31, "id": "olympic-uzbekistan", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Symbol : ['MMM', 'AOS', 'ABT', 'ABBV', 'ABMD', 'ACN']\n", "Name : ['3M Company', 'A.O. Smith Corp', 'Abbott Laboratories', 'AbbVie Inc.', 'ABIOMED Inc', 'Accenture plc']\n", "Sector : ['Industrials', 'Industrials', 'Health Care', 'Health Care', 'Health Care', 'Information Technology']\n" ] } ], "source": [ "for key, column in d.items():\n", " print(key, ':', column)" ] }, { "cell_type": "markdown", "id": "freelance-bahrain", "metadata": {}, "source": [ "access via __function__ of variables is also supported" ] }, { "cell_type": "code", "execution_count": 32, "id": "friendly-impression", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['MMM, Industrials',\n", " 'AOS, Industrials',\n", " 'ABT, Health Care',\n", " 'ABBV, Health Care',\n", " 'ABMD, Health Care',\n", " 'ACN, Information Technology']" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d[lambda Symbol, Sector: '%s, %s'%(Symbol, Sector)]" ] }, { "cell_type": "markdown", "id": "romance-tragedy", "metadata": {}, "source": [ "### column and row access are commutative" ] }, { "cell_type": "code", "execution_count": 33, "id": "false-gauge", "metadata": {}, "outputs": [], "source": [ "assert d[0].Name == d.Name[0] == '3M Company'\n", "assert d[0][lambda Symbol, Sector: '%s, %s'%(Symbol, Sector)] == d[lambda Symbol, Sector: '%s, %s'%(Symbol, Sector)][0] == 'MMM, Industrials'\n", "assert d[0]['Name'] == d['Name'][0]\n", "assert d[:2]['Name', 'Sector'] == d['Name', 'Sector'][:2]\n", "assert d[:2][['Name', 'Sector']] == d[['Name', 'Sector']][:2]\n" ] }, { "cell_type": "markdown", "id": "outstanding-halloween", "metadata": {}, "source": [ "### adding records" ] }, { "cell_type": "code", "execution_count": 34, "id": "bridal-desperate", "metadata": {}, "outputs": [], "source": [ "d = dictable(name = ['alan', 'barbara', 'chris'], surname = ['abramson', 'brown', 'cohen'], age = [1,2,3])" ] }, { "cell_type": "code", "execution_count": 35, "id": "brilliant-subscription", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 3]\n", "age|name |surname \n", "1 |alan |abramson \n", "2 |barbara|brown \n", "3 |chris |cohen \n", "4 |david |donaldson" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d + dict(name = 'david', surname = 'donaldson', age = 4) ## adding a single record" ] }, { "cell_type": "code", "execution_count": 36, "id": "wired-findings", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[5 x 3]\n", "age|name |surname \n", "1 |alan |abramson \n", "2 |barbara|brown \n", "3 |chris |cohen \n", "4 |david |donaldson\n", "5 |evan |emmerson " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d + [dict(name = 'david', surname = 'donaldson', age = 4), dict(name = 'evan', surname = 'emmerson', age = 5)]" ] }, { "cell_type": "code", "execution_count": 37, "id": "arbitrary-affect", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[5 x 3]\n", "age|name |surname \n", "1 |alan |abramson \n", "2 |barbara|brown \n", "3 |chris |cohen \n", "4 |david |donaldson\n", "5 |evan |emmerson " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d + dict(name = ['david', 'evan'], surname = ['donaldson', 'emmerson'], age = [4,5])" ] }, { "cell_type": "code", "execution_count": 38, "id": "median-correlation", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[5 x 3]\n", "age|name |surname \n", "1 |alan |abramson \n", "2 |barbara|brown \n", "3 |chris |cohen \n", "4 |david |donaldson\n", "5 |evan |emmerson " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d + pd.DataFrame(dict(name = ['david', 'evan'], surname = ['donaldson', 'emmerson'], age = [4,5]))" ] }, { "cell_type": "markdown", "id": "micro-breakdown", "metadata": {}, "source": [ "### adding/modifying columns\n", "You can add a column or a constant by simply calling the dictable with the values:" ] }, { "cell_type": "code", "execution_count": 39, "id": "stupid-regulation", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "name |surname |age|gender|school \n", "alan |abramson|1 |m |St Paul\n", "barbara|brown |2 |f |St Paul\n", "chris |cohen |3 |m |St Paul" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d(gender = ['m', 'f', 'm'])(school = 'St Paul')" ] }, { "cell_type": "markdown", "id": "careful-chambers", "metadata": {}, "source": [ "More interestingly, it can be a callable function using the other variables..." ] }, { "cell_type": "code", "execution_count": 40, "id": "emotional-description", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 4]\n", "name |surname |age|initials\n", "alan |abramson|1 |aa \n", "barbara|brown |2 |bb \n", "chris |cohen |3 |cc " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = d(initials = lambda name, surname: name[0] + surname[0]); d" ] }, { "cell_type": "markdown", "id": "primary-pride", "metadata": {}, "source": [ "Given d is a dict, a more traditional way of setting a new key is by simple assignment:" ] }, { "cell_type": "code", "execution_count": 41, "id": "passing-delaware", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 4]\n", "name |surname |age|initials\n", "alan |abramson|1 |aa \n", "barbara|brown |2 |bb \n", "chris |cohen |3 |cc " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['initials'] = d[lambda name, surname: name[0] + surname[0]]; d" ] }, { "cell_type": "markdown", "id": "difficult-zealand", "metadata": {}, "source": [ "Or you can use the dict.update method:" ] }, { "cell_type": "code", "execution_count": 42, "id": "liquid-bulgaria", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "name |surname |age|initials|gender\n", "alan |abramson|1 |aa |m \n", "barbara|brown |2 |bb |f \n", "chris |cohen |3 |cc |m " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.update(dict(gender = ['m', 'f', 'm'])); d" ] }, { "cell_type": "markdown", "id": "excessive-village", "metadata": {}, "source": [ "### do\n", "Sometime we want to apply the same function(s) to a collection of columns. For this, 'do' will do nicely:" ] }, { "cell_type": "code", "execution_count": 43, "id": "explicit-muscle", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "name |surname |age|initials|gender\n", "Alan |Abramson|1 |AA |M \n", "Barbara|Brown |2 |BB |F \n", "Chris |Cohen |3 |CC |M " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = d.do(upper, 'initials', 'gender').do(proper, 'name', 'surname'); d" ] }, { "cell_type": "markdown", "id": "proprietary-affair", "metadata": {}, "source": [ "### removing columns " ] }, { "cell_type": "code", "execution_count": 44, "id": "indoor-electricity", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 4]\n", "name |surname |age|gender\n", "Alan |Abramson|1 |M \n", "Barbara|Brown |2 |F \n", "Chris |Cohen |3 |M " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = d - 'initials'; d" ] }, { "cell_type": "markdown", "id": "velvet-collect", "metadata": {}, "source": [ "### removing rows" ] }, { "cell_type": "code", "execution_count": 45, "id": "significant-flavor", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 4]\n", "age|gender|name |surname\n", "2 |F |Barbara|Brown \n", "3 |M |Chris |Cohen " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.exc(name = 'Alan')" ] }, { "cell_type": "code", "execution_count": 46, "id": "clean-winner", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 4]\n", "age|gender|name |surname \n", "1 |M |Alan |Abramson\n", "3 |M |Chris|Cohen " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.inc(name = ['Alan', 'Chris'])" ] }, { "cell_type": "code", "execution_count": 47, "id": "expanded-phase", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 4]\n", "age|gender|name |surname\n", "2 |F |Barbara|Brown \n", "3 |M |Chris |Cohen " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.inc(lambda age: age>1)" ] }, { "cell_type": "code", "execution_count": 48, "id": "disabled-scroll", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[1 x 4]\n", "name |surname|age|gender\n", "Barbara|Brown |2 |F " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.exc(lambda gender: gender == 'M')" ] }, { "cell_type": "code", "execution_count": 49, "id": "likely-privacy", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 4]\n", "age|gender|name |surname \n", "1 |M |Alan |Abramson\n", "3 |M |Chris|Cohen " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.exc(lambda name, surname: len(name)>len(surname))" ] }, { "cell_type": "markdown", "id": "annual-seafood", "metadata": {}, "source": [ "### sort" ] }, { "cell_type": "code", "execution_count": 50, "id": "pointed-fetish", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 4]\n", "name |surname |age|gender\n", "Alan |Abramson|1 |M \n", "Barbara|Brown |2 |F \n", "Chris |Cohen |3 |M " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.sort('name', 'surname')" ] }, { "cell_type": "code", "execution_count": 51, "id": "impaired-power", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 4]\n", "name |surname |age|gender\n", "Barbara|Brown |2 |F \n", "Alan |Abramson|1 |M \n", "Chris |Cohen |3 |M " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.sort(lambda name: name[::-1]) # can sort on functions of variables too" ] }, { "cell_type": "markdown", "id": "destroyed-prophet", "metadata": {}, "source": [ "### listby(keys)\n", "listby is like groupby except it returns a dictable with unique keys and the other columns are returned as a list. We find that MUCH more useful usually than groupby" ] }, { "cell_type": "code", "execution_count": 52, "id": "binary-diversity", "metadata": {}, "outputs": [], "source": [ "grades = dictable(name = ['alan', 'barbara', 'chris'], grades = [30,90,80], subject = 'english', teacher = 'mr bennet') \\\n", " + dictable(name = ['alan', 'david', 'esther'], grades = [40,50,70], subject = 'math', teacher = 'mrs ruler') \\\n", " + dictable(name = ['barbara', 'chris', 'esther'], grades = [90,60,80], subject = 'french', teacher = 'dr francois')" ] }, { "cell_type": "code", "execution_count": 53, "id": "understanding-cooper", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 4]\n", "teacher |grades |name |subject \n", "dr francois|[90, 60, 80]|['barbara', 'chris', 'esther']|['french', 'french', 'french'] \n", "mr bennet |[30, 90, 80]|['alan', 'barbara', 'chris'] |['english', 'english', 'english']\n", "mrs ruler |[40, 50, 70]|['alan', 'david', 'esther'] |['math', 'math', 'math'] " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades.listby('teacher')" ] }, { "cell_type": "code", "execution_count": 54, "id": "blessed-brief", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[3 x 5]\n", "teacher |grades |name |subject |avg_grade \n", "dr francois|[90, 60, 80]|['barbara', 'chris', 'esther']|['french', 'french', 'french'] |76.66666666666667 \n", "mr bennet |[30, 90, 80]|['alan', 'barbara', 'chris'] |['english', 'english', 'english']|66.66666666666667 \n", "mrs ruler |[40, 50, 70]|['alan', 'david', 'esther'] |['math', 'math', 'math'] |53.333333333333336" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades.listby('teacher')(avg_grade = lambda grades: np.mean(grades))" ] }, { "cell_type": "markdown", "id": "steady-rebate", "metadata": {}, "source": [ "### unlist\n", "unlist undoes listby() assuming it is possible..." ] }, { "cell_type": "code", "execution_count": 55, "id": "worldwide-cathedral", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[9 x 4]\n", "grades|name |subject|teacher \n", "90 |barbara|french |dr francois\n", "60 |chris |french |dr francois\n", "80 |esther |french |dr francois\n", "...9 rows...\n", "40 |alan |math |mrs ruler \n", "50 |david |math |mrs ruler \n", "70 |esther |math |mrs ruler " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades.listby('teacher').unlist()" ] }, { "cell_type": "markdown", "id": "useful-murray", "metadata": {}, "source": [ "### groupby(keys) and ungroup\n", "This is similar to DatFrame groupby except that instead of a new object, a dictable is returned: The name of the grouped column is given by 'grp'. ungroup allows us to get back to original." ] }, { "cell_type": "code", "execution_count": 56, "id": "surprised-mexico", "metadata": {}, "outputs": [], "source": [ "classes = grades.groupby(['teacher', 'subject'], grp = 'class')" ] }, { "cell_type": "code", "execution_count": 57, "id": "compatible-pizza", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'teacher': 'dr francois',\n", " 'subject': 'french',\n", " 'class': dictable[3 x 2]\n", " grades|name \n", " 90 |barbara\n", " 60 |chris \n", " 80 |esther }" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "classes[0]" ] }, { "cell_type": "code", "execution_count": 58, "id": "acknowledged-roommate", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[9 x 4]\n", "grades|name |subject|teacher \n", "90 |barbara|french |dr francois\n", "60 |chris |french |dr francois\n", "80 |esther |french |dr francois\n", "...9 rows...\n", "40 |alan |math |mrs ruler \n", "50 |david |math |mrs ruler \n", "70 |esther |math |mrs ruler " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "classes.ungroup('class')" ] }, { "cell_type": "markdown", "id": "mineral-psychology", "metadata": {}, "source": [ "### inner join\n", "The multiplication operation is overloaded for the join method. By default, if two dictables share keys, the join is an inner join on the keys " ] }, { "cell_type": "code", "execution_count": 59, "id": "imported-enemy", "metadata": {}, "outputs": [], "source": [ "students = dictable(name = ['alan', 'barbara', 'chris', 'david', 'esthar', 'fabian'], surname = ['abramsom', 'brown', 'cohen', 'drummond', 'ecklestone', 'fox'])" ] }, { "cell_type": "code", "execution_count": 60, "id": "australian-citation", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shared keys: ['name']\n" ] }, { "data": { "text/plain": [ "dictable[7 x 5]\n", "name |grades|subject|teacher |surname \n", "alan |30 |english|mr bennet |abramsom\n", "alan |40 |math |mrs ruler |abramsom\n", "barbara|90 |english|mr bennet |brown \n", "...7 rows...\n", "chris |80 |english|mr bennet |cohen \n", "chris |60 |french |dr francois|cohen \n", "david |50 |math |mrs ruler |drummond" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('shared keys:', grades.keys() & students.keys())\n", "grades * students" ] }, { "cell_type": "markdown", "id": "fitted-muslim", "metadata": {}, "source": [ "Are there students with no surname? We can do a xor or use division which is overloaded for xor: " ] }, { "cell_type": "code", "execution_count": 61, "id": "about-norway", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 4]\n", "grades|name |subject|teacher \n", "70 |esther|math |mrs ruler \n", "80 |esther|french |dr francois" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades / students" ] }, { "cell_type": "markdown", "id": "intimate-lighting", "metadata": {}, "source": [ "Are there students with no grades?" ] }, { "cell_type": "code", "execution_count": 62, "id": "cordless-peeing", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[2 x 2]\n", "name |surname \n", "esthar|ecklestone\n", "fabian|fox " ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students / grades" ] }, { "cell_type": "code", "execution_count": 63, "id": "acting-founder", "metadata": {}, "outputs": [], "source": [ "students = dictable(name = ['Alan', 'Barbara', 'Chris', 'David', 'Esther', 'Fabian'], surname = ['abramsom', 'brown', 'cohen', 'drummond', 'ecklestone', 'fox'])" ] }, { "cell_type": "markdown", "id": "classified-launch", "metadata": {}, "source": [ "We fixed Esther's spelling but introduced capitalization, that is OK, we are allowed to inner join on functions of keys too." ] }, { "cell_type": "code", "execution_count": 64, "id": "curious-orlando", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[9 x 5]\n", "name |grades|subject|teacher |surname \n", "alan |30 |english|mr bennet |abramsom \n", "alan |40 |math |mrs ruler |abramsom \n", "barbara|90 |english|mr bennet |brown \n", "...9 rows...\n", "david |50 |math |mrs ruler |drummond \n", "esther |70 |math |mrs ruler |ecklestone\n", "esther |80 |french |dr francois|ecklestone" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades.join(students, 'name', lambda name: name.lower())" ] }, { "cell_type": "code", "execution_count": 65, "id": "progressive-phoenix", "metadata": {}, "outputs": [], "source": [ "students = dictable(first_name = ['alan', 'barbara', 'chris', 'david', 'esther', 'fabian'], surname = ['abramsom', 'brown', 'cohen', 'drummond', 'ecklestone', 'fox'])" ] }, { "cell_type": "markdown", "id": "naked-trial", "metadata": {}, "source": [ "You can inner join on different column names and both columns will be populated: " ] }, { "cell_type": "code", "execution_count": 66, "id": "defined-vocabulary", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[9 x 6]\n", "name |grades|subject|teacher |first_name|surname \n", "alan |30 |english|mr bennet |alan |abramsom \n", "alan |40 |math |mrs ruler |alan |abramsom \n", "barbara|90 |english|mr bennet |barbara |brown \n", "...9 rows...\n", "david |50 |math |mrs ruler |david |drummond \n", "esther |70 |math |mrs ruler |esther |ecklestone\n", "esther |80 |french |dr francois|esther |ecklestone" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades.join(pd.DataFrame(students), 'name', 'first_name')" ] }, { "cell_type": "markdown", "id": "pretty-cheese", "metadata": {}, "source": [ "### inner join (with other columns that match names)\n", "By default, if columns are shared but are not in the join, they will be returned with a tuple containing both values" ] }, { "cell_type": "code", "execution_count": 67, "id": "exterior-receptor", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[5 x 4]\n", "key|y|z|x \n", "a |4|2|(1, 4)\n", "b |5|8|(2, 1)\n", "b |5|9|(2, 2)\n", "c |6|1|(3, 3)\n", "c |7|1|(4, 3)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = dictable(key = ['a', 'b', 'c', 'c'], x = [1,2,3,4], y = [4,5,6,7])\n", "y = dictable(key = ['b', 'b', 'c', 'a'], x = [1,2,3,4], z = [8,9,1,2])\n", "x.join(y, 'key', 'key') ## ignore x column for joining" ] }, { "cell_type": "code", "execution_count": 68, "id": "transsexual-highway", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[5 x 4]\n", "key|y|z|x\n", "a |4|2|1\n", "b |5|8|2\n", "b |5|9|2\n", "c |6|1|3\n", "c |7|1|4" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.join(y, 'key', 'key', mode = 'left') ## grab left value" ] }, { "cell_type": "markdown", "id": "incoming-victoria", "metadata": {}, "source": [ "### cross join\n", "If no columns are shared, then a cross join is returned. " ] }, { "cell_type": "code", "execution_count": 69, "id": "present-roads", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[12 x 2]\n", "x|y\n", "1|1\n", "1|2\n", "1|3\n", "...12 rows...\n", "4|1\n", "4|2\n", "4|3" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = dictable(x = [1,2,3,4])\n", "y = dict(y = [1,2,3])\n", "x * y" ] }, { "cell_type": "code", "execution_count": 70, "id": "apart-bullet", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[12 x 2]\n", "x|y\n", "1|1\n", "1|2\n", "1|3\n", "...12 rows...\n", "4|1\n", "4|2\n", "4|3" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.join(y, [], []) ## you can force a full outer join" ] }, { "cell_type": "code", "execution_count": 71, "id": "favorite-master", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x / y == x" ] }, { "cell_type": "markdown", "id": "killing-pillow", "metadata": {}, "source": [ "### xor (versus left and right join)\n", "We find left/right join actually not very useful. There is usually a genuine reason for records for which there is a match and for records for which there isn't. And the treatment of these is distinct, which means a left-join operation that joins the two outcomes together is positively harmful.\n", "\n", "The xor operator is much more useful and you can use it to recreate left/right join if we really must. Here is an example" ] }, { "cell_type": "code", "execution_count": 80, "id": "described-remedy", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[9 x 5]\n", "grades|name |teacher |surname |subject\n", "30 |alan |mr bennet |abramsom|english\n", "40 |alan |mrs ruler |abramsom|math \n", "90 |barbara|mr bennet |brown |english\n", "...9 rows...\n", "50 |david |mrs ruler |None |math \n", "70 |esther |mrs ruler |None |math \n", "80 |esther |dr francois|None |french " ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students = dictable(name = ['alan', 'barbara', 'chris'], surname = ['abramsom', 'brown', 'cohen',])\n", "new_students = dictable(name = ['david', 'esther', 'fabian'], surname = ['drummond', 'ecklestone', 'fox'])\n", "\n", "inner_join = grades * students ## grades with students\n", "left_xor = grades / students ## grades without sudents\n", "\n", "# you can...\n", "left_join = grades * students + grades / students ## grades for which no surname is available will have None surname\n", "left_join" ] }, { "cell_type": "code", "execution_count": 73, "id": "earned-playback", "metadata": {}, "outputs": [], "source": [ "# but really you want to do:\n", "student_grades = grades * students\n", "unmapped_grades = grades / students ## we treat this one separately...\n", "new_student_grades = unmapped_grades * new_students ## and grab surnames from the new students table..." ] }, { "cell_type": "code", "execution_count": 74, "id": "norwegian-heating", "metadata": {}, "outputs": [], "source": [ "assert len(unmapped_grades / new_student_grades) == 0, 'students must exist either in the students table or in the new students table'" ] }, { "cell_type": "code", "execution_count": 75, "id": "peripheral-reggae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[9 x 5]\n", "grades|name |subject|surname |teacher \n", "30 |alan |english|abramsom |mr bennet \n", "40 |alan |math |abramsom |mrs ruler \n", "90 |barbara|english|brown |mr bennet \n", "...9 rows...\n", "50 |david |math |drummond |mrs ruler \n", "70 |esther |math |ecklestone|mrs ruler \n", "80 |esther |french |ecklestone|dr francois" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_grades = student_grades + new_student_grades; all_grades" ] }, { "cell_type": "markdown", "id": "serious-sierra", "metadata": {}, "source": [ "### pivot" ] }, { "cell_type": "code", "execution_count": 76, "id": "dynamic-programmer", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[16 x 2]\n", "x|y\n", "1|1\n", "1|2\n", "1|3\n", "...16 rows...\n", "4|2\n", "4|3\n", "4|4" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = dictable(x = [1,2,3,4])\n", "y = dictable(y = [1,2,3,4])\n", "xy = (x * y)\n", "xy" ] }, { "cell_type": "code", "execution_count": 77, "id": "actual-scholarship", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 5]\n", "x|1 |2 |3 |4 \n", "1|[1]|[2]|[3] |[4] \n", "2|[2]|[4]|[6] |[8] \n", "3|[3]|[6]|[9] |[12]\n", "4|[4]|[8]|[12]|[16]" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xy.pivot('x', 'y', lambda x, y: x*y)" ] }, { "cell_type": "markdown", "id": "plain-width", "metadata": {}, "source": [ "### a few observations:" ] }, { "cell_type": "markdown", "id": "prime-novel", "metadata": {}, "source": [ "* as per usual, can provide a function for values in table (indeed columns y) and not just keys\n", "* the output in the cells come back as a list. This is because sometimes there are more than one row with given x and y, and sometimes there are none:" ] }, { "cell_type": "code", "execution_count": 78, "id": "opponent-inclusion", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 5]\n", "x|1 |2 |3 |4 \n", "1|[1, 1]|[2, 2]|[3, 3] |None \n", "2|[2, 2]|[4, 4]|None |[8, 8] \n", "3|[3, 3]|None |[9, 9] |[12, 12]\n", "4|None |[8, 8]|[12, 12]|[16, 16]" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(xy + xy).exc(lambda x,y: x+y == 5).pivot('x', 'y', lambda x, y: x*y)" ] }, { "cell_type": "markdown", "id": "intensive-crash", "metadata": {}, "source": [ "You can apply a sequence of aggregate functions:" ] }, { "cell_type": "code", "execution_count": 79, "id": "expressed-wellington", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dictable[4 x 5]\n", "x|1 |2 |3 |4 \n", "1|2 |2 |2 |None\n", "2|2 |2 |None|2 \n", "3|2 |None|2 |2 \n", "4|None|2 |2 |2 " ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(xy + xy).exc(lambda x,y: x+y == 5).pivot('x', 'y', lambda x, y: x*y, lambda v: len(v))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }