pyg.base
extensions to dict
dictattr
-
class pyg.base._dictattr.dictattr
A simple dict with extended member manipulation:
- access using d.key
- access multiple elements using d[key1, key2]
- Example
members access
>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert isinstance(d, dict)
>>> assert d.a == 1
>>> assert d['a','b'] == [1,2]
>>> assert d[['a','b']] == dictattr(a = 1, b = 2)
In addition, it has extended key selection/subsetting
- Example
subsetting
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d - 'a' == dictattr(b = 2, c = 3)
>>> assert d & ['b', 'c', 'not in keys'] == dictattr(b = 2, c = 3)
dictattr supports a non-in-place ‘update’ that returns a new dict:
- Example
updating via adding another dict
>>> d = dictattr(a = 1, b = 2) + dict(b = 'replacing old value', c = 'new key')
>>> assert d == dictattr(a = 1, b = 'replacing old value', c = 'new key')
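The access rules above can be approximated in plain Python. This is a minimal sketch, not pyg's implementation: `MiniDictAttr` is a hypothetical name, and it returns plain lists where dictattr returns a ulist.

```python
class MiniDictAttr(dict):
    """Hypothetical stand-in for dictattr; illustrates the access rules only."""

    def __getattr__(self, key):
        # d.key falls back to d['key'] when normal attribute lookup fails
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __getitem__(self, key):
        if isinstance(key, tuple):   # d['a', 'b'] -> list of values
            return [dict.__getitem__(self, k) for k in key]
        if isinstance(key, list):    # d[['a', 'b']] -> sub-dict
            return type(self)((k, dict.__getitem__(self, k)) for k in key)
        return dict.__getitem__(self, key)

    def __sub__(self, keys):
        # remove key(s) without raising when a key is missing
        drop = [keys] if isinstance(keys, str) else list(keys)
        return type(self)((k, v) for k, v in self.items() if k not in drop)

    def __and__(self, keys):
        # keep only the requested keys that actually exist
        return type(self)((k, v) for k, v in self.items() if k in keys)

    def __add__(self, other):
        # copy + update, leaving self untouched
        res = type(self)(self)
        res.update(other)
        return res
```

Every operator returns a new object, so the original dict is never modified.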
-
copy
() → a shallow copy of D¶
-
keys
() dictattr returns an actual list of keys rather than a view. Since keys are necessarily unique, the list is a ulist, which also supports set operations.
- Returns
- ulist
list of keys of dictattr.
- Example
>>> from pyg import *
>>> d = dictattr(a = 1, b = 2)
>>> assert d.keys() == ulist(['a', 'b'])
>>> assert d.keys() & ['a', 'c', 'd'] == ['a']
-
relabel
(*args, **relabels) Easy relabel/rename of keys.
- Parameters
- *args: str or callable
a string ending with _ is used as a prefix for all keys; a string starting with _ is used as a suffix
a callable is applied to each key to produce the new key
- **relabels: strings
individual relabeling of keys
- Returns
- dictattr
new dict with renamed keys.
- Example
suffix/prefix
>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d.relabel('x_') == dictattr(x_a = 1, x_b = 2, x_c = 3) # prefixing
>>> assert d.relabel('_x') == dictattr(a_x = 1, b_x = 2, c_x = 3) # suffixing
- Example
callable
>>> assert d.rename(upper) == dictattr(A = 1, B = 2, C = 3)
- Example
individual relabelling
>>> assert d.rename(a = 'A') == dictattr(A = 1, b = 2, c = 3)
>>> assert d.rename(['A', 'B', 'C']) == d.relabel(upper)
-
rename
(*args, **relabels)¶ Identical to relabel. See relabel for full docs
-
values
() → an object providing a view on D’s values¶
-
pyg.base._dictattr.dictattr.
__add__
(self, other) dictattr uses addition as copy + update, returning a new dict. This is similar to the dict union operator | introduced in Python 3.9.
- Example
>>> from pyg import *
>>> d = dictattr(a = 1, b = 2)
>>> assert d + dict(b = 3, c = 5) == dictattr(a = 1, b = 3, c = 5)
- Parameters
- other: dict
a dict used to update current dict.
-
pyg.base._dictattr.dictattr.
__sub__
(self, key, copy=True) dictattr uses subtraction to remove key(s). No exception is raised if a key is not there.
- Returns
updated dictattr
- Example
>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d - ['b','c'] == dictattr(a = 1)
>>> assert d - 'c' == dictattr(a = 1, b = 2)
>>> assert d - 'key not there' == d
>>> #commutative
>>> assert (d - 'c').keys() == d.keys() - 'c'
-
pyg.base._dictattr.dictattr.
__and__
(self, other)¶ dictattr uses & as a set operator for key filtering
- Returns
updated dictattr
- Example
>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d & ['a', 'b', 'not_there'] == dictattr(a = 1, b = 2)
>>> #commutative
>>> assert (d & ['a', 'b', 'x']).keys() == d.keys() & ['a', 'b', 'x']
ulist
The dictattr.keys() method returns a ulist: a list with unique elements:
-
class
pyg.base._ulist.
ulist
(*args, unique=False) A list whose members are unique. It overloads the +/- operators and also supports the set operations & and |.
- Example
>>> assert ulist([1,3,2,1]) == list([1,3,2])
- Example
addition adds element(s)
>>> assert ulist([1,3,2,1]) + 4 == list([1,3,2,4])
>>> assert ulist([1,3,2,1]) + [4,1] == list([1,3,2,4])
>>> assert ulist([1,3,2,1]) + [4,1,5] == list([1,3,2,4,5])
- Example
subtraction removes element(s)
>>> assert ulist([1,3,2,1]) - 1 == [3,2]
>>> assert ulist([1,3,2,1]) - [1,3,4] == [2]
- Example
set operations
>>> assert ulist([1,3,2,1]) & 1 == [1]
>>> assert ulist([1,3,2,1]) & [1,3,4] == [1,3]
>>> assert ulist([1,3,2,1]) | 1 == [1,3,2]
>>> assert ulist([1,3,2,1]) | 4 == [1,3,2,4]
>>> assert ulist([1,3,2,1]) | [1,3,4] == [1,3,2,4]
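The operators above can be approximated by a small list subclass. This is a sketch under stated assumptions: `MiniUList` is a hypothetical name, and it ignores the `unique` argument of the real constructor.

```python
class MiniUList(list):
    """Hypothetical stand-in for ulist: a list that keeps first occurrences only."""

    def __init__(self, values=()):
        super().__init__()
        for v in values:
            if v not in self:
                self.append(v)

    @staticmethod
    def _as_list(other):
        # a scalar is treated as a one-element list
        return list(other) if isinstance(other, (list, tuple, set)) else [other]

    def __add__(self, other):      # union, preserving order
        return MiniUList(list(self) + self._as_list(other))

    __or__ = __add__

    def __sub__(self, other):      # drop the given element(s)
        drop = self._as_list(other)
        return MiniUList(v for v in self if v not in drop)

    def __and__(self, other):      # keep only the given element(s)
        keep = self._as_list(other)
        return MiniUList(v for v in self if v in keep)
```

Membership is checked with `in`, so the sketch also works for unhashable elements, at the cost of quadratic time.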
-
copy
()¶ Return a shallow copy of the list.
Dict
-
class
pyg.base._dict.
Dict
Dict extends dictattr to allow access to functions of members
- Example
>>> from pyg import *
>>> d = Dict(a = 1, b=2)
>>> assert d[lambda a, b: a+b] == 3
>>> assert d['a','b', lambda a,b: a+b] == [1,2,3]
Dict is also callable: the keyword arguments are used to add or update members, and a callable value is evaluated on the existing members
- Example
>>> from pyg import *
>>> d = Dict(a = 1, b=2)
>>> assert d(c = 3) == Dict(a = 1, b = 2, c = 3)
>>> assert d(c = lambda a,b: a+b) == Dict(a = 1, b = 2, c = 3)
>>> assert d(c = 3) == Dict(a = 1, b = 2) + Dict(c = 3)
>>> assert Dict(a = 1)(b = lambda a: a+1)(c = lambda a,b: a+b) == Dict(a = 1,b = 2,c = 3)
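One plausible way to evaluate "functions of members" is to match the function's argument names to keys. The sketch below assumes that mechanism and uses `inspect.signature`; `evaluate` and `extend` are hypothetical helpers, not part of pyg.

```python
import inspect

def evaluate(d, function):
    """Call `function`, taking each argument from the key of the same name in d."""
    names = inspect.signature(function).parameters
    return function(**{name: d[name] for name in names})

def extend(d, **kwargs):
    """Return a copy of d with new keys; callables are evaluated against the copy."""
    res = dict(d)
    for key, value in kwargs.items():
        res[key] = evaluate(res, value) if callable(value) else value
    return res
```

Because `extend` returns a copy, calls can be chained so that each new key may depend on the keys added before it.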
-
do
(function, *keys)¶ applies a function(s) on multiple keys at the same time
- Parameters
- functioncallable or list of callables
function to be applied per column
- *keysstring/list of strings
list of columns to be applied. If missing, applied to all columns
- Returns
res : Dict
- Example
>>> from pyg import *
>>> d = Dict(name = 'adam', surname = 'atkins')
>>> assert d.do(proper) == Dict(name = 'Adam', surname = 'Atkins')
- Example
using another key in the calculation
>>> from pyg import *
>>> d = Dict(a = 1, b = 5, denominator = 10)
>>> d = d.do(lambda value, denominator: value/denominator, 'a', 'b')
>>> assert d == Dict(a = 0.1, b = 0.5, denominator = 10)
-
pyg.base._dict.Dict.
__call__
(self, **kwargs)¶ Call self as a function.
dictable
-
class
pyg.base._dictable.
dictable
(data=None, columns=None, **kwargs)
- What is dictable?
dictable is a table, a collection of iterable records. It is also a dict with each key being a column. Why not use a pandas.DataFrame? pd.DataFrame leads a dual life:
- by day, an index-based optimized numpy array supporting e.g. timeseries analytics.
- by night, a table with keys supporting filtering, aggregating and pivoting on keys, as well as inner/outer joining on keys.
dictable only tries to do the latter. dictable should be thought of as a ‘container for complicated objects’ rather than just an array of primitive floats. In general, each cell may contain timeseries, yield_curves, machine-learning experiments etc. The interface is very succinct and allows the user to concentrate on the logic of the calculations rather than boilerplate.
dictable supports quite a flexible construction:
- Example
construction using records
>>> from pyg import *; import pandas as pd
>>> d = dictable([dict(name = 'alan', surname = 'atkins', age = 39, country = 'UK'),
...               dict(name = 'barbara', surname = 'brown', age = 29, country = 'UK')])
- Example
construction using columns and constants
>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
- Example
construction using pandas.DataFrame
>>> original = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> df_from_dictable = pd.DataFrame(original)
>>> dictable_from_df = dictable(df_from_dictable)
>>> assert original == dictable_from_df
- Example
construction rows and columns
>>> d = dictable([['alan', 'atkins', 39, 'UK'], ['barbara', 'brown', 29, 'UK']], columns = ['name', 'surname', 'age', 'country'])
- Access
column access
>>> assert d.keys() == ['name', 'surname', 'age', 'country']
>>> assert d.name == ['alan', 'barbara']
>>> assert d['name'] == ['alan', 'barbara']
>>> assert d['name', 'surname'] == [('alan', 'atkins'), ('barbara', 'brown')]
>>> assert d[lambda name, surname: '%s %s'%(name, surname)] == ['alan atkins', 'barbara brown']
- Access
row access & iteration
>>> assert d[0] == {'name': 'alan', 'surname': 'atkins', 'age': 39, 'country': 'UK'}
>>> assert [row for row in d] == [{'name': 'alan', 'surname': 'atkins', 'age': 39, 'country': 'UK'},
...                               {'name': 'barbara', 'surname': 'brown', 'age': 29, 'country': 'UK'}]
Note that members access is commutative:
>>> assert d.name[0] == d[0].name == 'alan'
>>> assert d[lambda name, surname: name + surname][0] == d[0][lambda name, surname: name + surname]
>>> assert sum([row for row in d], dictable()) == d
- Example
adding records
>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> d = d + {'name': 'charlie', 'surname': 'chocolate', 'age': 49} # can add a record directly
>>> assert d[-1] == {'name': 'charlie', 'surname': 'chocolate', 'age': 49, 'country': None}
>>> d += dictable(name = ['dana', 'ender'], surname = ['deutch', 'esterhase'], age = [10, 20], country = ['Germany', 'Hungary'])
>>> assert d.name == ['alan', 'barbara', 'charlie', 'dana', 'ender']
>>> assert len(dictable.concat([d,d])) == len(d) * 2
- Example
adding columns
>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> ### all of the below are ways of adding columns ####
>>> d.gender = ['m', 'f']
>>> d = d(gender = ['m', 'f'])
>>> d['gender'] = ['m', 'f']
>>> d2 = dictable(gender = ['m', 'f'], profession = ['astronaut', 'barber'])
>>> d = d(**d2)
- Example
adding derived columns
>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> d = d(full_name = lambda name, surname: proper('%s %s'%(name, surname)))
>>> d['full_name'] = d[lambda name, surname: proper('%s %s'%(name, surname))]
>>> assert d.full_name == ['Alan Atkins', 'Barbara Brown']
- Example
dropping columns
>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> del d.country # in place
>>> del d['age'] # in place
>>> assert (d - 'name')[0] == {'surname': 'atkins'} and d[0] == {'name': 'alan', 'surname': 'atkins'}
- Example
row selection, inc/exc
>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> assert len(d.exc(name = 'alan')) == 1
>>> assert len(d.exc(lambda age: age<30)) == 1 # can filter on *functions* of members, not just members.
>>> assert d.inc(name = 'alan').surname == ['atkins']
>>> assert d.inc(lambda age: age<30).name == ['barbara']
>>> assert d.exc(lambda age: age<30).name == ['alan']
- dictable supports:
sort
group-by/ungroup
list-by/ unlist
pivot/unpivot
inner join, outer join and xor
Full details are below.
-
classmethod
concat
(*others)¶ adds together multiple dictables. equivalent to sum(others, self) but a little faster
- Parameters
- *othersdictables
records to be added to current table
- Returns
- mergeddictable
sum of all records
- Example
>>> from pyg import *
>>> d1 = dictable(a = [1,2,3])
>>> d2 = dictable(a = [4,5,6])
>>> d3 = dictable(a = [7,8,9])
>>> assert dictable.concat(d1,d2,d3) == dictable(a = range(1,10))
>>> assert dictable.concat([d1,d2,d3]) == dictable(a = range(1,10))
-
do
(function, *keys)¶ applies a function(s) on multiple keys at the same time
- Parameters
- functioncallable or list of callables
function to be applied per column
- *keysstring/list of strings
list of columns to be applied. If missing, applied to all columns
- Returns
res : dictable
- Example
>>> from pyg import *
>>> d = dictable(name = ['adam', 'barbara', 'chris'], surname = ['atkins', 'brown', 'cohen'])
>>> assert d.do(proper) == dictable(name = ['Adam', 'Barbara', 'Chris'], surname = ['Atkins', 'Brown', 'Cohen'])
- Example
using another column in the calculation
>>> from pyg import *
>>> d = dictable(a = [1,2,3,4], b = [5,6,9,8], denominator = [10,20,30,40])
>>> d = d.do(lambda value, denominator: value/denominator, 'a', 'b')
>>> assert d == dictable(a = 0.1, b = [0.5,0.3,0.3,0.2], denominator = [10,20,30,40])
-
exc
(*functions, **filters)¶ performs a filter on what rows to exclude
- Parameters
- *functionscallables or a dict
filters based on functions of each row
- **filtersvalue or list of values
filters per each column
- Returns
- dictable
table with rows that satisfy all conditions excluded.
- Example
filtering on keys
>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.exc(x = np.nan) == dictable(x = [1,2,3], y = [0,4,3])
>>> assert d.exc(x = 1) == dictable(x = [2,3,np.nan], y = [4,3,5])
>>> assert d.exc(x = [1,2]) == dictable(x = [3,np.nan], y = [3,5])
- Example
filtering on callables
>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.exc(lambda x,y: x>y) == dictable(x = [2,3,np.nan], y = [4,3,5])
-
get
(key, default=None)¶ Return the value for key if key is in the dictionary, else default.
-
groupby
(*by, grp='grp')¶ Similar to pandas groupby but returns a dictable of dictables with a new column ‘grp’
- Example
>>> x = dictable(a = [1,2,3,4], b= [1,0,1,0])
>>> res = x.groupby('b')
>>> assert res.keys() == ['b', 'grp']
>>> assert is_dictable(res[0].grp) and res[0].grp.keys() == ['a']
- Parameters
*by : str or list of strings
column(s) to group by.
- grpstr, optional
The name of the column for the dictables per each key. The default is ‘grp’.
- Returns
- dictable
A dictable containing the original keys and a dictable per unique key.
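The shape of the groupby result, and how ungroup reverses it, can be illustrated with plain lists of records. This is a sketch only: it groups on a single column, and the function names mirror the methods above but are not pyg's code.

```python
def groupby(rows, by, grp='grp'):
    """Group a list of records by one column; each group keeps the other columns."""
    groups = {}
    for row in rows:
        rest = {k: v for k, v in row.items() if k != by}
        groups.setdefault(row[by], []).append(rest)
    return [{by: key, grp: members} for key, members in sorted(groups.items())]

def ungroup(grouped, by, grp='grp'):
    """Flatten the nested records back into one list."""
    return [{by: g[by], **row} for g in grouped for row in g[grp]]
```

Ungrouping returns the rows in group order, so the original order is only recovered after sorting.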
-
inc
(*functions, **filters)¶ performs a filter on what rows to include
- Parameters
- *functionscallables or a dict
filters based on functions of each row
- **filtersvalue or list of values
filters per each column
- Returns
- dictable
table with rows that satisfy all conditions.
- Example
filtering on keys
>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.inc(x = np.nan) == dictable(x = np.nan, y = 5)
>>> assert d.inc(x = 1) == dictable(x = 1, y = 0)
>>> assert d.inc(x = [1,2]) == dictable(x = [1,2], y = [0,4])
- Example
filtering on regex
>>> import re
>>> d = dictable(text = ['once', 'upon', 'a', 'time', 'in', 'the', 'west', 1, 2, 3])
>>> assert d.inc(text = re.compile('o')) == dictable(text = ['once', 'upon'])
>>> assert d.exc(text = re.compile('e')) == dictable(text = ['upon', 'a', 'in', 1, 2, 3])
- Example
filtering on callables
>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.inc(lambda x,y: x>y) == dictable(x = 1, y = 0)
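The inc/exc semantics (column filters, lists meaning "any of", NaN matching NaN) can be sketched over a plain list of records. One simplifying assumption is labelled in the code: functions here receive the whole row, whereas the examples above match function arguments to column names.

```python
import math

def _matches(value, target):
    # a list target means "any of these"; NaN is treated as equal to NaN
    targets = target if isinstance(target, list) else [target]
    for t in targets:
        if isinstance(t, float) and math.isnan(t):
            if isinstance(value, float) and math.isnan(value):
                return True
        elif value == t:
            return True
    return False

def inc(rows, *functions, **filters):
    """Keep the rows that satisfy every function and every column filter."""
    def ok(row):
        by_column = all(_matches(row[k], v) for k, v in filters.items())
        by_function = all(f(row) for f in functions)   # assumption: f takes the row
        return by_column and by_function
    return [row for row in rows if ok(row)]

def exc(rows, *functions, **filters):
    """Drop exactly the rows that inc would keep."""
    kept = inc(rows, *functions, **filters)
    return [row for row in rows if not any(row is k for k in kept)]
```

Defining exc as the complement of inc keeps the two consistent: every row lands in exactly one of the two results.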
-
join
(other, lcols=None, rcols=None, mode=None)¶ Performs either an inner join or a cross join between two dictables
- Example
inner join
>>> from pyg import *
>>> x = dictable(a = ['a','b','c','a'])
>>> y = dictable(a = ['a','y','z'])
>>> assert x.join(y) == dictable(a = ['a', 'a'])
- Example
cross join (the tables share no columns)
>>> from pyg import *
>>> x = dictable(a = ['a','b'])
>>> y = dictable(b = ['x','y'])
>>> assert x.join(y) == dictable(a = ['a', 'a', 'b', 'b'], b = ['x', 'y', 'x', 'y'])
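Both examples follow from one rule: match on the shared columns, and when nothing is shared every pair matches. A minimal sketch over lists of records, ignoring the lcols/rcols/mode arguments (this is not pyg's implementation):

```python
from itertools import product

def join(left, right):
    """Inner join on the shared columns; a cross join when no column is shared."""
    shared = sorted(set(left[0]) & set(right[0])) if left and right else []
    out = []
    for l, r in product(left, right):
        # with no shared columns, all() of an empty sequence is True
        if all(l[k] == r[k] for k in shared):
            out.append({**l, **r})
    return out
```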
-
pivot
(x, y, z, agg=None)¶ pivot table functionality.
- Parameters
- xstr/list of str
unique keys per each row
- ystr
unique key per each column
- zstr/callable
A column in the table or an evaluated quantity per each row
- aggNone/callable or list of callables, optional
Each (x,y) cell can potentially contain multiple z values. so if agg = None, a list is returned If you want the data aggregated in any way, then supply an aggregating function(s)
- Returns
A dictable which is a pivot table of the original data
- Example
>>> from pyg import *
>>> timetable_as_list = dictable(x = [1,2,3]) * dictable(y = [1,2,3])
>>> timetable = timetable_as_list.xyz('x','y',lambda x, y: x * y)
>>> assert timetable = dictable(x = [1,2,3], )
- Example
>>> self = dictable(x = [1,2,3]) * dictable(y = [1,2,3])
>>> x = 'x'; y = 'y'; z = lambda x, y: x * y
>>> self.exc(lambda x, y: x+y==5).xyz(x,y,z, len)
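The role of agg can be shown with a small pivot over a list of records. This is a sketch under assumptions: z callables receive the whole row, and y values become string column names (as the unpivot example suggests); the real output layout may differ.

```python
def pivot(rows, x, y, z, agg=None):
    """Pivot records: one output row per x value, one column per (stringified) y value."""
    table = {}
    for row in rows:
        value = z(row) if callable(z) else row[z]
        table.setdefault(row[x], {}).setdefault(str(row[y]), []).append(value)
    out = []
    for key, cells in table.items():
        record = {x: key}
        for column, values in cells.items():
            # without agg each cell holds the list of all z values for that (x, y)
            record[column] = values if agg is None else agg(values)
        out.append(record)
    return out
```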
-
sort
(*by)¶ Sorts the table either using a key, list of keys or functions of members
- Example
>>> import numpy as np
>>> self = dictable(a = [_ for _ in 'abracadabra'], b=range(11), c = range(0,33,3))
>>> self.d = list(np.array(self.c) % 11)
>>> res = self.sort('a', 'd')
>>> print(res)
>>> assert ''.join(res.a) == 'aaaaabbcdrr' and list(res.d) == [0,4,8,9,10] + [2,3] + [1] + [7] + [5,6]
>>> res = self.sort(lambda b: b*3 % 11) ## sorting by d, but using a function of b
>>> assert list(res.d) == list(range(11))
>>> mixed = dictable(a = ['a', 1, 'c', 0, 'b', 2]).sort('a') ## a column of mixed types
-
ungroup
(grp='grp')¶ Undoes groupby
- Example
>>> x = dictable(a = [1,2,3,4], b= [1,0,1,0]) >>> self = x.groupby('b')
- Parameters
- grpstr, optional
column name where dictables are. The default is ‘grp’.
- Returns
dictable.
-
unlist
()¶ undoes listby…
- Example
>>> x = dictable(a = [1,2,3,4], b= [1,0,1,0])
>>> x.listby('b')
dictable[2 x 2]
b|a
0|[2, 4]
1|[1, 3]
>>> assert x.listby('b').unlist().sort('a') == x
- Returns
- dictable
a dictable where all rows with list in them have been ‘expanded’.
-
unpivot
(x, y, z)¶ undoes self.xyz / self.pivot
- Example
>>> from pyg import *
>>> orig = (dictable(x = [1,2,3,4]) * dict(y = [1,2,3,4,5]))(z = lambda x, y: x*y)
>>> pivot = orig.xyz('x', 'y', 'z', last)
>>> unpivot = pivot.unpivot('x','y','z').do(int, 'y') # the conversion to column names means y is now a string, so we convert back to int
>>> assert orig == unpivot
- Parameters
- xstr/list of strings
list of keys in the pivot table.
- ystr
name of the column that will be used for the values that are currently column headers.
- zstr
name of the column that describes the data currently within the pivot table.
- Returns
dictable
-
update
([E, ]**F) → None. Update D from dict/iterable E and F.¶ If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
-
xor
(other, lcols=None, rcols=None, mode='l') Returns what is in lhs but NOT in rhs (or vice versa if mode = ‘r’). Together with inner joining, this can be used to build a left/right join.
- Examples
>>> from pyg import *
>>> self = dictable(a = [1,2,3,4])
>>> other = dictable(a = [1,2,3,5])
>>> assert self.xor(other) == dictable(a = 4) # this is in lhs but not in rhs
>>> assert self.xor(other, lcols = lambda a: a * 2, rcols = 'a') == dictable(a = [2,3,4]) # matching can be done using formulae rather than actual columns
The XOR functionality can also be performed using the quotient (division) operator:
>>> lhs = self; rhs = other
>>> assert lhs/rhs == dictable(a = 4)
>>> assert rhs/lhs == dictable(a = 5)
>>> rhs = dictable(a = [1,2], b = [3,4])
>>> left_join_can_be_done_simply_as = lhs * rhs + lhs/rhs
- Parameters
- otherdictable (or something that can be turned to one)
what we exclude with.
- lcolsstr/list of strs, optional
the left columns/formulae on which we match. The default is None.
- rcolsstr/list of strs, optional
the right columns/formulae on which we match. The default is None.
- modestring, optional
When set to ‘r’, performs xor the other way. The default is ‘l’.
- Returns
- dictable
a dictable containing what is in self but not in the other dictable.
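The "inner join plus xor gives a left join" remark can be checked with a short sketch over lists of records. The helper names are hypothetical, the join is on a single column, and unmatched rows simply lack the right-hand columns rather than having them filled.

```python
def xor(lhs, rhs, on):
    """Rows of lhs whose `on` value has no match in rhs."""
    seen = {row[on] for row in rhs}
    return [row for row in lhs if row[on] not in seen]

def inner(lhs, rhs, on):
    """Rows of lhs merged with every rhs row sharing the same `on` value."""
    return [{**l, **r} for l in lhs for r in rhs if l[on] == r[on]]

def left_join(lhs, rhs, on):
    # matched rows first, then the unmatched lhs rows, as in lhs * rhs + lhs / rhs
    return inner(lhs, rhs, on) + xor(lhs, rhs, on)
```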
-
xyz
(x, y, z, agg=None)¶ pivot table functionality.
- Parameters
- xstr/list of str
unique keys per each row
- ystr
unique key per each column
- zstr/callable
A column in the table or an evaluated quantity per each row
- aggNone/callable or list of callables, optional
Each (x,y) cell can potentially contain multiple z values. so if agg = None, a list is returned If you want the data aggregated in any way, then supply an aggregating function(s)
- Returns
A dictable which is a pivot table of the original data
- Example
>>> from pyg import *
>>> timetable_as_list = dictable(x = [1,2,3]) * dictable(y = [1,2,3])
>>> timetable = timetable_as_list.xyz('x','y',lambda x, y: x * y)
>>> assert timetable = dictable(x = [1,2,3], )
- Example
>>> self = dictable(x = [1,2,3]) * dictable(y = [1,2,3])
>>> x = 'x'; y = 'y'; z = lambda x, y: x * y
>>> self.exc(lambda x, y: x+y==5).xyz(x,y,z, len)
-
pyg.base._dictable.dictable.
__call__
(self, **kwargs)¶ Call self as a function.
perdictable
-
pyg.base._perdictable.
perdictable
() A decorator that makes a function work per dictable and not just on the original values
- Example
>>> f = lambda a, b: a+b
>>> p = perdictable(f, on = ['key'])
The new modified function p now works the same on old values:
- Parameters
- functioncallable
A function
- on: str/list of str
perform join based on these keys
- renames: dict
This tells us which column to grab from which table
- defaults: dict
If a default is provided for a parameter, we will perform a left join, substituting missing values with the defaults
- if_none: bool, list of keys
If historic data is None while the row has expired, should we force a recalculation? if True, will be done.
- output_is_input: bool, list of keys
Some functions want their own output to be presented to them. If set to True and cached values exist for these columns, these are provided to the function
- include_inputs:
When we return the outputs, do you want the inputs to be included as well in the dictable.
- col: str
the name of the variable output.
- Example
>>> f = lambda a, b: a+b
>>> p = perdictable(f, include_inputs = True)
>>> assert p(a = 1, b = 2) == 3
>>> assert p(a = dictable(a = [1,2,3]), b = 3) == dictable(a = [1,2,3], b = 3, expiry = None, data = [4,5,6])
# some parameters are constant, some are tables…
>>> assert p(a = 1, b = dictable(key = ['a','b','c'], b = [1,2,3])) == dictable(key = ['a', 'b', 'c'], data = [2,3,4])
# multiple tables… some unkeyed
>>> assert p(a = dictable(a = [1,2]), b = dictable(key = ['a','b','c'], b = [1,2,3])) == dictable(key = ['a','a', 'b', 'b', 'c','c'], data = [2,3,3,4,4,5])
# multiple tables… all keyed
>>> a = dictable(key = ['x', 'y'], data = [1,2])
>>> b = dictable(key = ['y', 'z'], data = [3,4])
>>> assert p(a = a, b = b) == dictable(key = ['y'], data = [5])
- Example
existing data provided using data and expiry
>>> a = dictable(key = ['x', 'y', 'z'], data = [1,2,3])
>>> b = dictable(key = ['x', 'y', 'z'], data = [1,3,4])
>>> data = dictable(key = ['x', 'y'], data = ['we calculated this before', 'we calculated before but hasnt expired'])
>>> expiry = dictable(key = ['x', 'y'], data = [dt(2000,1,1), dt(3000,1,1)])
>>> inputs = dict(a = a, b = b)
>>> res = p(a = a, b = b, data = data, expiry = expiry)
>>> assert res.find_data(key = 'x').data == 'we calculated this before'
>>> assert res.find_data(key = 'y').data == 5 # although calculated before, we recalculate as its expiry is in the future
join
-
pyg.base._perdictable.
join
(inputs, on=None, renames=None, defaults=None)¶ Suppose we have a function which is defined on simple numbers
- Example
>>> from pyg import *
>>> profit = lambda amount, price: amount * price
The amounts sold are available in one table and prices in another
- Example
>>> amounts = dictable(product = ['apple', 'orange', 'pear'], amount = [1,2,3])
>>> prices = dictable(product = ['apple', 'orange', 'pear', 'banana'], price = [4,5,6,8])
>>> join(dict(amount = amounts, price = prices), on = 'product')(profit = profit)
dictable[3 x 4]
product|amount|price|profit
apple  |1     |4    |4
orange |2     |5    |10
pear   |3     |6    |18
- Parameters
- inputsdict
a dict of input parameters, some of them may be dictables.
- onstr/list of str
the key column(s) on which the dictables are joined
- renames: dict, optional
remapping. If a dataset contains multiple columns, you can say renames = dict(price = ‘price_in_dollar’) to tell the algorithm which column to use. The default is None.
- defaultsdict, optional
Normally, an inner join is performed. However, if there is a default value/formula for when e.g. a price is missing, use this. The default is None.
- Returns
- dictable
a dictable of an inner join.
- Example
how column mapping is done
>>> on = 'a'
>>> ## if there is only one column apart from keys, then it is selected:
>>> assert join(dict(x = dictable(a = [1,2], data = [2,3])), on = on) == dictable(a = [1,2], x = [2,3])
>>> assert join(dict(x = dictable(a = [1,2], random_name = [2,3])), on = on) == dictable(a = [1,2], x = [2,3])
>>> ## if there are multiple columns, if variable name is there, we use it:
>>> assert join(dict(x = dictable(a = [1,2], z = [2,3], x = [4,5])), on) == dictable(a = [1,2], x = [4,5])
>>> ## if there are multiple columns, and 'data' is one of the columns, we use it:
>>> assert join(dict(x = dictable(a = [1,2], z = [2,3], data = [4,5])), on) == dictable(a = [1,2], x = [4,5])
- Example
how column mapping is done with rename
>>> import pytest
>>> with pytest.raises(KeyError):
...     join(dict(x = dictable(a = [1,2], b = [2,3], c = [4,5])), on = 'a') ## pick b or c?
>>> assert join(dict(x = dictable(a = [1,2], b = [2,3], c = [4,5])), on = 'a', renames = dict(x = 'c')) == dictable(a = [1,2,], x = [4, 5])
- Example
joins with partial columns in some tables
>>> on = ['a', 'b', 'c']
>>> a = dictable(a = [1,2,3,4], x = [1,2,3,4]) ## only column a here
>>> b = dictable(b = [1,2,3,4], y = [1,2,3,4]) ## only column b here
>>> c = dictable(a = [1,2,3,4], b = [1,2,3,4], c = [1,2,3,4], z = [1,2,3,4])
>>> j = join(dict(x = a, y = b, z = c), on = ['a', 'b', 'c'])
>>> assert len(j) == 4 and sorted(j.keys()) == ['a', 'b', 'c', 'x', 'y', 'z']
- Example
join with defaults
If no defaults are provided, we need all variables to be present. However, if we specify defaults, we left-join on that variable and insert the default value
>>> x = dictable(a = [1,2,4], x = [1,2,4])
>>> y = dictable(a = [1,2,3], x = [5,6,7])
>>> on = 'a'
>>> assert join(dict(x = x, y = y), on = on) == dictable(a = [1,2,], x = [1,2], y = [5,6])
>>> assert join(dict(x = x, y = y), on = 'a', defaults = dict(x = None)) == dictable(a = [1,2,3], x = [1,2,None], y = [5,6,7])
>>> assert join(dict(x = x, y = y), on = 'a', defaults = dict(y = 0)) == dictable(a = [1,2,4], x = [1,2,4], y = [5,6,0])
>>> assert join(dict(x = x, y = y), on = 'a', defaults = dict(x = None, y = 0)) == dictable(a = [1,2,3,4], x = [1,2,None,4], y = [5,6,7,0])
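The effect of defaults on which keys survive can be reproduced with plain dicts. In this sketch each table is simplified to a `{key: value}` mapping, and `join_inputs` is a hypothetical helper rather than pyg's join.

```python
_MISSING = object()   # sentinel, so that None can be a legitimate default

def join_inputs(inputs, on, defaults=None):
    """Join {name: {key: value}} tables on their keys, filling gaps from defaults."""
    defaults = defaults or {}
    keys = sorted(set().union(*[set(table) for table in inputs.values()]))
    rows = []
    for key in keys:
        row = {on: key}
        for name, table in inputs.items():
            value = table.get(key, defaults.get(name, _MISSING))
            if value is _MISSING:
                break              # no value and no default: drop the key (inner join)
            row[name] = value
        else:
            rows.append(row)       # reached only if no variable was missing
    return rows
```

A key is kept only if every variable either has a value for it or has a default, which matches the four cases in the example above.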
named_dict
-
pyg.base._named_dict.
named_dict
(name, keys, defaults={}, types={}, casts={}, basedict='pyg.base.dictattr', debug=False) This forms a base for all classes. It is similar to collections.namedtuple but:
- supports additional features such as casting/type checking.
- supports default values
The resulting class is a dict so can be stored in MongoDB, sent to json or be used to construct a pd.Series automatically.
- Example
Simple construction
>>> Customer = named_dict('Customer', ['name', 'date', 'balance'])
>>> james = Customer('james', 'today', 10)
>>> assert james['balance'] == 10
>>> assert james.date == 'today'
- Example
How named_dict works with json/pandas/other named_dicts
>>> import json; import pandas as pd
>>> class Customer(named_dict('Customer', ['name', 'date', 'balance'])):
...     def add_to_balance(self, value):
...         res = self.copy()
...         res.balance += value
...         return res
>>> james = Customer('james', 'date', 10)
>>> assert james.add_to_balance(10).balance == 20
>>> assert pd.Series(james).date == 'date'
>>> assert dict(james) == {'name': 'james', 'date': 'date', 'balance': 10}
>>> assert json.dumps(james) == '{"name": "james", "date": "date", "balance": 10}'
>>> class VIP(named_dict('VIP', ['name', 'date'])):
...     def some_method(self):
...         return 'inheritance between classes works as long as members can share'
>>> vip = VIP(james)
>>> assert vip.name == 'james' ## members moved seamlessly
>>> assert vip.some_method() == 'inheritance between classes works as long as members can share'
- Example
Adding defaults
>>> Customer = named_dict('Customer', ['name', 'date', 'balance'], defaults = dict(balance = 0))
>>> james = Customer('james', 'today')
>>> assert james['balance'] == 0
- Example
types checking
>>> import datetime
>>> Customer = named_dict('Customer', ['name', 'date', 'balance'], defaults = dict(balance = 0), types = dict(date = 'datetime.datetime'))
>>> james = Customer('james', datetime.datetime.now())
>>> assert james['balance'] == 0
- Example
casting
>>> Customer = named_dict('Customer', ['name', 'date', 'balance'], defaults = dict(balance = 0), types = dict(date = 'datetime.datetime'), casts = dict(balance = 'float'))
>>> james = Customer('james', datetime.datetime.now(), balance = '10.3')
>>> assert james['balance'] == 10.3
- Parameters
- namestr
name of new class.
- keyslist
list of keys that the class must have as members.
- defaultsdict, optional
default values for the keys. The default is {}.
- types: dict, optional
A test to be applied per key, given either as a type or as a callable. The default is {}.
- casts: dict, optional
A function per key used to cast the value, e.g. dict(balance = 'float'). The default is {}.
- basedict: str, optional
name of the dict class to inherit from. The default is ‘pyg.base.dictattr’.
- debugbool, optional
output the construction text if set to True. The default is False.
- ValueError
- Returns
result : new class that inherits from a dict
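A class factory of this kind can be sketched with `type`. The version below is illustrative only: `make_named_dict` is a hypothetical name, casts are given as callables rather than strings, and type checking is omitted.

```python
def make_named_dict(name, keys, defaults=None, casts=None):
    """Build a dict subclass with fixed keys, optional defaults and casts."""
    defaults, casts = defaults or {}, casts or {}

    def __init__(self, *args, **kwargs):
        values = dict(defaults)
        values.update(zip(keys, args))     # positional arguments follow key order
        values.update(kwargs)
        missing = [k for k in keys if k not in values]
        if missing:
            raise ValueError('missing keys: %s' % missing)
        dict.__init__(self, {k: casts.get(k, lambda v: v)(values[k]) for k in keys})

    def __getattr__(self, key):
        # attribute access falls back to the dict content
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    return type(name, (dict,), {'__init__': __init__, '__getattr__': __getattr__})
```

Because the result is an ordinary dict subclass, instances serialise with `json.dumps` without any extra work.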
decorators
wrapper
-
class
pyg.base._decorators.
wrapper
(function=None, *args, **kwargs) A base class for all decorators. It is similar to functools.wraps but better. See below why wrapt cannot be used. You basically need to define the wrapped method and everything else is handled for you:
- You can then use it either directly to decorate functions
- Or use it to create parameterized decorators
- The __name__, __wrapped__, __doc__ and the getargspec will all be taken care of.
- Example
>>> class and_add(wrapper):
...     def wrapped(self, *args, **kwargs):
...         return self.function(*args, **kwargs) + self.add ## note that we are assuming self.add exists
>>> @and_add(add = 3) ## create a decorator and decorate the function
... def f(a,b):
...     return a+b
>>> assert f.add == 3
>>> assert f(1,2) == 6
Alternatively, you can also use it directly:
>>> def f(a,b):
...     return a+b
>>> assert and_add(f, add = 3)(1,2) == 6
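The dual use shown above (direct and parameterized) can be obtained with a small class-based decorator. The sketch below assumes nothing about pyg's internals: `MiniWrapper` is a hypothetical base class built on `functools.update_wrapper`.

```python
import functools

class MiniWrapper:
    """Hypothetical base class: subclasses only define `wrapped`."""

    def __init__(self, function=None, **kwargs):
        self.function = function
        self.__dict__.update(kwargs)          # parameters become attributes
        if function is not None:
            functools.update_wrapper(self, function)

    def __call__(self, *args, **kwargs):
        if self.function is None:
            # parameterized use: @and_add(add=3) receives the function here
            params = {k: v for k, v in self.__dict__.items() if k != 'function'}
            return type(self)(args[0], **params)
        return self.wrapped(*args, **kwargs)

class and_add(MiniWrapper):
    def wrapped(self, *args, **kwargs):
        return self.function(*args, **kwargs) + self.add
```

Keeping the parameters as plain instance attributes is what makes such a decorator easy to inspect or serialise later.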
- Example
Explicit parameter construction
You can make the init more explicit, also adding defaults for the parameters:
>>> class and_add_version_2(wrapper):
...     def __init__(self, function = None, add = 3):
...         super(and_add_version_2, self).__init__(function = function, add = add)
...     def wrapped(self, *args, **kwargs):
...         return self.function(*args, **kwargs) + self.add
>>> @and_add_version_2
... def f(a,b):
...     return a+b
>>> assert f(1,2) == 6
- Example
No recursion
The decorator is designed to have a single instance of a specific wrapper
>>> f = lambda a, b: a+b
>>> assert and_add(and_add(f)) == and_add(f)
This holds even for multiple levels of wrapping:
>>> x = try_none(and_add(f))
>>> y = try_none(and_add(x))
>>> assert x == y
>>> assert x(1, 'no can add') is None
- Example
wrapper vs wrapt
wrapt (wrapt.readthedocs.io) is an awesome wrapping tool. If you have static library functions, none is better. The problem we face is that wrapt is too good at pretending the wrapped object is the same as the original function:
>>> import wrapt
>>> def add_value(value):
...     @wrapt.decorator
...     def wrapper(wrapped, instance, args, kwargs):
...         return wrapped(*args, **kwargs) + value
...     return wrapper
>>> def f(x,y):
...     return x*y
>>> add_three = add_value(value = 3)(f)
>>> add_four = add_value(value = 4)(f)
>>> assert add_four(3,4) == 16 and add_three(3,4) == 15
>>> ## but here is the problem:
>>> assert encode(add_three) == encode(add_four) == encode(f)
So if we ever encode the function and send it across json/Mongo, the wrapping is lost and the user, when she receives it, cannot use it
>>> class add_value(wrapper):
...     def wrapped(self, *args, **kwargs):
...         return self.function(*args, **kwargs) + self.value
>>> add_three = add_value(value = 3)(f)
>>> add_four = add_value(value = 4)(f)
>>> encode(add_three)
{'value': 3, 'function': '{"py/function": "__main__.f"}', '_obj': '{"py/type": "__main__.add_value"}'}
>>> encode(add_four)
{'value': 4, 'function': '{"py/function": "__main__.f"}', '_obj': '{"py/type": "__main__.add_value"}'}
timer
-
class
pyg.base._decorators.
timer
(function, n=1, time=False)¶ timer is similar to timeit but rather than execution of a Python statement, timer wraps a function to make it log its evaluation time before returning output
- Parameters
- function: callable
The function to be wrapped
- n: int, optional
Number of times the function is to be evaluated. Default is 1
- time: bool, optional
If set to True, function will return the TIME it took to evaluate rather than the original function output.
- Example
>>> from pyg import *; import datetime
>>> f = lambda a, b: a+b
>>> evaluate_100 = timer(f, n = 100, time = True)(1,2)
>>> evaluate_10000 = timer(f, n = 10000, time = True)(1,2)
>>> assert evaluate_10000 > evaluate_100
>>> assert isinstance(evaluate_100, datetime.timedelta)
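For readers who want to see the mechanics, here is a minimal sketch of a timing wrapper with the same three parameters. `sketch_timer` is a hypothetical name and the body is an assumption about how such a wrapper could work, not pyg's implementation; it prints rather than using a logger.

```python
import datetime
import functools


def sketch_timer(function, n=1, time=False):
    """Illustrative timing wrapper (assumes n >= 1); not pyg's timer."""

    @functools.wraps(function)
    def wrapped(*args, **kwargs):
        start = datetime.datetime.now()
        for _ in range(n):
            result = function(*args, **kwargs)
        elapsed = datetime.datetime.now() - start
        print(f"{function.__name__} evaluated {n} time(s) in {elapsed}")
        # Return the elapsed time instead of the output when time=True.
        return elapsed if time else result

    return wrapped


add = lambda a, b: a + b
elapsed = sketch_timer(add, n=100, time=True)(1, 2)
value = sketch_timer(add, n=100)(1, 2)
```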
try_value¶
-
pyg.base._decorators.
try_value
()¶ wraps a function to try an evaluation. If an exception is thrown, returns a preset fallback value instead
- Parameters
- function: callable
The function we want to decorate
- value:
If the function fails, it will return value instead. Default is None
- verbose: bool
If set to True, the logger will warn with the error message.
There are various convenience functions with specific values try_zero, try_false, try_true, try_nan and try_none will all return specific values if function fails.
- Example
>>> from pyg import * >>> f = lambda a: a[0] >>> assert try_none(f)(4) is None >>> assert try_none(f, 'failed')(4) == 'failed'
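The behaviour described above can be sketched with a plain closure. `sketch_try_value` is a hypothetical name, and the use of the standard `logging` module for the verbose branch is an assumption; pyg's own decorator may differ in detail.

```python
import functools
import logging


def sketch_try_value(function, value=None, verbose=False):
    """Illustrative fallback wrapper; not pyg's try_value."""

    @functools.wraps(function)
    def wrapped(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except Exception as error:
            if verbose:
                logging.getLogger(__name__).warning(
                    "%s failed: %s", function.__name__, error
                )
            # Any exception is swallowed and the preset value returned.
            return value

    return wrapped


first = lambda a: a[0]
```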
try_back¶
-
pyg.base._decorators.
try_back
()¶ wraps a function to try an evaluation. If an exception is thrown, returns first argument
- Example
>>> f = lambda a: a[0] >>> assert try_back(f)('hello') == 'h' and try_back(f)(5) == 5
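A comparable sketch for the "return the first argument" variant follows. `sketch_try_back` is a hypothetical name, and returning None when there is no positional argument is an assumption made only for this example.

```python
import functools


def sketch_try_back(function):
    """Illustrative wrapper that hands back its first argument on failure."""

    @functools.wraps(function)
    def wrapped(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except Exception:
            # Fall back to the first positional argument, if there is one.
            return args[0] if args else None

    return wrapped


first = lambda a: a[0]
```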
loops¶
-
class
pyg.base._loop.
loops
(function=None, types=None)¶ converts a function to loop over the arguments, depending on the type of the first argument
- Examples
>>> @loop(dict, list, pd.DataFrame, pd.Series) >>> def f(a,b): >>> return a+b
>>> assert f(1,2) == 3 >>> assert f([1,2,3],2) == [3,4,5] >>> assert f([1,2,3], [4,5,6]) == [5,7,9]
>>> assert f(dict(x=1,y=2), 3) == dict(x = 4, y = 5) >>> assert f(dict(x=1,y=2), dict(x = 3, y = 4)) == dict(x = 4, y = 6)
>>> a = pd.Series(dict(x=1,y=2)) >>> b = dict(x=3,y=4) >>> assert np.all(f(a,b) == pd.Series(dict(x=4,y=6)))
>>> a = pd.DataFrame(dict(x=[1,1],y=[2,2])); a.index = [5,10] >>> b = dict(x=3,y=4) >>> res = f(a,b) >>> assert np.all(res == pd.DataFrame(dict(x=[4,4],y=[6,6]), index = [5,10]))
>>> a = pd.DataFrame(dict(x=[1,1],y=[2,2])); a.index = [5,10] >>> res = f(a,[3,4]) >>> assert np.all( res == pd.DataFrame(dict(x=[4,4],y=[6,6]), index = [5,10]))
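The dispatch on the type of the first argument can be illustrated for lists and dicts alone. This is a simplified sketch under stated assumptions: `sketch_loop` is a hypothetical name, pandas objects are not handled, and other positional arguments are indexed only when they are the same kind of container as the first one (otherwise they are broadcast).

```python
import functools


def sketch_loop(*types):
    """Illustrative looping decorator for dict/list first arguments."""

    def decorator(function):
        @functools.wraps(function)
        def wrapped(first, *args, **kwargs):
            if not isinstance(first, types):
                return function(first, *args, **kwargs)
            if isinstance(first, dict):
                # Loop over keys; dict arguments are matched by key.
                return type(first)(
                    (key, wrapped(item, *[a[key] if isinstance(a, dict) else a for a in args], **kwargs))
                    for key, item in first.items()
                )
            # Loop over positions; list/tuple arguments are matched by index.
            return type(first)(
                wrapped(item, *[a[i] if isinstance(a, (list, tuple)) else a for a in args], **kwargs)
                for i, item in enumerate(first)
            )

        return wrapped

    return decorator


@sketch_loop(dict, list)
def add(a, b):
    return a + b
```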
graphs & cells¶
cell¶
-
class
pyg.base._cell.
cell
(function=None, output=None, **kwargs)¶ cell is a Dict that can be thought of as a node in a calculation graph. The nearest parallel is actually an Excel cell:
cell contains both its function and its output. cell.output defines the keys where the output is supposed to be
cell contains reference to all the function outputs
cell contains its locations and the means to manage its own persistency
- Parameters
function is the function to be called
** kwargs are the function named key value args. NOTE: NO SUPPORT for *args nor **kwargs in function
output: where should the function output go?
- Example
simple construction
>>> from pyg import * >>> c = cell(lambda a, b: a+b, a = 1, b = 2) >>> assert c.a == 1 >>> c = c.go() >>> assert c.output == ['data'] and c.data == 3
- Example
make output go to ‘value’ key
>>> c = cell(lambda a, b: a+b, a = 1, b = 2, output = 'value') >>> assert c.go().value == 3
- Example
multiple outputs by function
>>> f = lambda a, b: dict(sum = a+b, prod = a*b) >>> c = cell(f, a = 1, b = 2, output = ['sum', 'prod']) >>> c = c.go() >>> assert c.sum == 3 and c.prod == 2
- Methods
cell.run() returns a bool: True if the cell needs to be run
cell.go() calculates the cell and returns the cell with its cell.output keys now populated.
cell.load()/cell.save() interface for self load/save persistence
-
copy
() → a shallow copy of D¶
-
go
(go=1, mode=0, **kwargs)¶ calculates the cell (if needed). By default, will then run cell.save() to save the cell. If you don’t want to save the output (perhaps you want to check it first), use cell._go()
- Parameters
- go: int, optional
a parameter that forces calculation. The default is 1, as in the signature above.
go = 0: calculate cell only if cell.run() is True
go = 1: calculate THIS cell regardless. calculate the parents only if their cell.run() is True
go = 2: calculate THIS cell and PARENTS cell regardless, calculate grandparents if cell.run() is True etc.
go = -1: calculate the entire tree again.
- **kwargs: parameters
You can actually allocate the variables to the function at runtime
Note that by default, cell.go() will default to go = 1 and force a calculation on cell while cell() is lazy and will default to assuming go = 0
- Returns
- cell
the cell, calculated
- Example
different values of go
>>> from pyg import * >>> f = lambda x=None,y=None: max([dt(x), dt(y)]) >>> a = cell(f)() >>> b = cell(f, x = a)() >>> c = cell(f, x = b)() >>> d = cell(f, x = c)()
>>> e = d.go() >>> e0 = d.go(0) >>> e1 = d.go(1) >>> e2 = d.go(2) >>> e_1 = d.go(-1)
>>> assert not d.run() and e.data == d.data >>> assert e0.data == d.data >>> assert e1.data > d.data and e1.x.data == d.x.data >>> assert e2.data > d.data and e2.x.data > d.x.data and e2.x.x.data == d.x.x.data >>> assert e_1.data > d.data and e_1.x.data > d.x.data and e_1.x.x.data > d.x.x.data
- Example
adding parameters on the run
>>> c = cell(lambda a, b: a+b) >>> d = c(a = 1, b =2) >>> assert d.data == 3
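The depth semantics of the go parameter can be made concrete with a toy node class. This is a sketch under stated assumptions: `SketchCell` is a hypothetical stand-in rather than pyg's cell, it has a single output key, no persistence, and its go() defaults to 0 so that each level is requested explicitly.

```python
class SketchCell(dict):
    """Toy calculation node; not pyg's cell."""

    def __init__(self, function, output="data", **inputs):
        super().__init__(function=function, output=output, **inputs)

    def run(self):
        # The node needs calculating while its output key is missing.
        return self["output"] not in self

    def go(self, go=0):
        if go == 0 and not self.run():
            return self
        out = self["output"]
        # Parents are forced one level less; -1 forces the whole tree.
        parent_go = go if go < 0 else max(go - 1, 0)
        updated = dict(self)
        arguments = {}
        for key, value in self.items():
            if key in ("function", "output", out):
                continue
            if isinstance(value, SketchCell):
                value = value.go(parent_go)
                updated[key] = value
                arguments[key] = value[value["output"]]
            else:
                arguments[key] = value
        updated[out] = self["function"](**arguments)
        function = updated.pop("function")
        return SketchCell(function, updated.pop("output"), **updated)


calls = []

def add(a, b):
    calls.append((a, b))  # record every evaluation
    return a + b

parent = SketchCell(add, a=1, b=2)
child = SketchCell(add, a=parent, b=10).go()
```

Calling `child.go(0)` afterwards evaluates nothing, `child.go(1)` re-evaluates only the child, and `child.go(2)` or `child.go(-1)` re-evaluates the parent as well.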
-
load
(mode=0)¶ Loads the cell from the database, based on the primary keys of the cell. Not implemented for a simple cell; see db_cell
- Returns
- cell
self, updated with values from database.
-
run
()¶ checks if the cell needs calculation. This depends on the nature of the cell. By default (for cell and db_cell), if the cell is already calculated so that cell._output exists, then returns False. otherwise True
- Returns
- bool
run cell?
- Example
>>> c = cell(lambda x: x+1, x = 1) >>> assert c.run() >>> c = c() >>> assert c.data == 2 and not c.run()
-
save
()¶ Saves the cell for persistency. Not implemented for simple cell. see db_cell
- Returns
- cell
self, saved.
cell_go¶
-
pyg.base._cell.
cell_go
(value, go=0, mode=0)¶ cell_go makes a cell run (using cell.go(go)) and returns the calculated cell. If value is not a cell, value is returned.
- Parameters
- value: cell
The cell (or anything else).
- go: int
same inputs as per cell.go(go).
0: run if cell.run() is True
1: run this cell regardless, run parent cells only if they need to calculate too
n: run this cell & its nth parents regardless.
- Returns
The calculated cell
- Example
calling non-cells
>>> assert cell_go(1) == 1 >>> assert cell_go(dict(a=1,b=2)) == dict(a=1,b=2)
- Example
calling cells
>>> c = cell(lambda a, b: a+b, a = 1, b = 2) >>> assert cell_go(c) == c(data = 3)
cell_item¶
-
pyg.base._cell.
cell_item
(value, key=None)¶ returns an item from a cell (if not cell, returns back the value). If no key is provided, will return the output of the cell
- Parameters
- value: cell or object or list of cells/objects
cell
- key: str, optional
The key within cell we are interested in. Note that key is treated as GUIDANCE only. Our strong preference is to return valid output from cell_output(cell)
- Example
non cells
>>> assert cell_item(1) == 1 >>> assert cell_item(dict(a=1,b=2)) == dict(a=1,b=2)
- Example
cells, simple
>>> c = cell(lambda a, b: a+b, a = 1, b = 2) >>> assert cell_item(c) is None >>> assert cell_item(c.go()) == 3
cell_func¶
-
pyg.base._cell.
cell_func
()¶ cell_func is a wrapper: it wraps a function so that it acts on cells rather than just on values
When called, it returns not just the function's result, but also the args and kwargs used to call it.
- Example
>>> from pyg import * >>> a = cell(lambda x: x**2, x = 3) >>> b = cell(lambda y: y**3, y = 2) >>> function = lambda a, b: a+b >>> self = cell_func(function) >>> result, args, kwargs = self(a,b)
>>> assert result == 8 + 9 >>> assert args[0].data == 3 ** 2 >>> assert args[1].data == 2 ** 3
cell_clear¶
-
pyg.base._cell.
cell_clear
(value)¶ cell_clear clears a cell of its output so that it contains only the essential stuff to do its calculations. This will be used when we save the cell or we want to recalculate it.
- Example
>>> from pyg import * >>> a = cell(add_, a = 1, b = 2) >>> b = cell(add_, a = 2, b = 3) >>> c = cell(add_, a = a, b = b)() >>> assert c.data == 8 >>> assert c.a.data == 3
>>> bare = cell_clear(c) >>> assert 'data' not in bare and 'data' not in bare.a >>> assert bare() == c
- Parameters
- value: obj
cell (or list/dict of) to be cleared of output
encode and decode/save and load¶
encode¶
-
pyg.base._encode.
encode
(value)¶ encode/decode are performed prior to sending to mongodb or after retrieval from db. The idea is to make object embedding in Mongo transparent to the user.
We use jsonpickle package to embed general objects. These are encoded as strings and can be decoded as long as the original library exists when decoding.
pandas.DataFrame are encoded to bytes using pickle while numpy arrays are encoded using the faster array.tobytes() with arrays’ shape & type exposed and searchable.
- Example
>>> from pyg import *; import numpy as np >>> value = Dict(a=1,b=2) >>> assert encode(value) == {'a': 1, 'b': 2, '_obj': '{"py/type": "pyg.base._dict.Dict"}'} >>> assert decode({'a': 1, 'b': 2, '_obj': '{"py/type": "pyg.base._dict.Dict"}'}) == Dict(a = 1, b=2) >>> value = dictable(a=[1,2,3], b = 4) >>> assert encode(value) == {'a': [1, 2, 3], 'b': [4, 4, 4], '_obj': '{"py/type": "pyg.base._dictable.dictable"}'} >>> assert decode(encode(value)) == value >>> assert encode(np.array([1,2])) == {'data': bytes, >>> 'shape': (2,), >>> 'dtype': '{"py/reduce": [{"py/type": "numpy.dtype"}, {"py/tuple": ["i4", false, true]}, {"py/tuple": [3, "<", null, null, null, -1, -1, 0]}]}', >>> '_obj': '{"py/function": "pyg.base._encode.bson2np"}'}
- Example
functions and objects
>>> from pyg import *; import numpy as np >>> assert encode(ewma) == '{"py/function": "pyg.timeseries._ewm.ewma"}' >>> assert encode(Calendar) == '{"py/type": "pyg.base._drange.Calendar"}'
- Parameters
- value: obj
An object to be encoded
- Returns
A pre-json object
decode¶
-
pyg.base._encode.
decode
(value, date=None)¶ decodes a string or an object dict
- Parameters
- value: str or dict
usually a json
- date: None, bool or a regex expression, optional
date format to be decoded
- Returns
- obj
the json decoded.
- Examples
>>> from pyg import * >>> class temp(dict): >>> pass
>>> orig = temp(a = 1, b = dt(0)) >>> encoded = encode(orig) >>> assert eq(decode(encoded), orig) # type matching too...
pd_to_parquet¶
-
pyg.base._parquet.
pd_to_parquet
(value, path, compression='GZIP')¶ a small utility to save a df to parquet, extending support to both pd.Series and non-string column names
- Example
>>> from pyg import * >>> import pandas as pd >>> import pytest
>>> df = pd.DataFrame([[1,2],[3,4]], drange(-1), columns = [0, dt(0)]) >>> s = pd.Series([1,2,3], drange(-2))
>>> with pytest.raises(ValueError): ## must have string column names df.to_parquet('c:/temp/test.parquet')
>>> with pytest.raises(AttributeError): ## pd.Series has no to_parquet s.to_parquet('c:/temp/test.parquet')
>>> df_path = pd_to_parquet(df, 'c:/temp/df.parquet') >>> series_path = pd_to_parquet(s, 'c:/temp/series.parquet')
>>> df2 = pd_read_parquet(df_path) >>> s2 = pd_read_parquet(series_path)
>>> assert eq(df, df2) >>> assert eq(s, s2)
pd_read_parquet¶
-
pyg.base._parquet.
pd_read_parquet
(path)¶ a small utility to read a df/series from parquet, extending support to both pd.Series and non-string column names
- Example
>>> from pyg import * >>> import pandas as pd >>> import pytest
>>> df = pd.DataFrame([[1,2],[3,4]], drange(-1), columns = [0, dt(0)]) >>> s = pd.Series([1,2,3], drange(-2))
>>> with pytest.raises(ValueError): ## must have string column names df.to_parquet('c:/temp/test.parquet')
>>> with pytest.raises(AttributeError): ## pd.Series has no to_parquet s.to_parquet('c:/temp/test.parquet')
>>> df_path = pd_to_parquet(df, 'c:/temp/df.parquet') >>> series_path = pd_to_parquet(s, 'c:/temp/series.parquet')
>>> df2 = pd_read_parquet(df_path) >>> s2 = pd_read_parquet(series_path)
>>> assert eq(df, df2) >>> assert eq(s, s2)
parquet_encode¶
-
pyg.mongo._encoders.
parquet_encode
(value, path, compression='GZIP')¶ encodes a single DataFrame or a document containing dataframes into an object that can be decoded
>>> from pyg import * >>> path = 'c:/temp' >>> value = dict(key = 'a', n = np.random.normal(0,1, 10), data = dictable(a = [pd.Series([1,2,3]), pd.Series([4,5,6])], b = [1,2]), other = dict(df = pd.DataFrame(dict(a=[1,2,3], b= [4,5,6])))) >>> encoded = parquet_encode(value, path) >>> assert encoded['n']['file'] == 'c:/temp/n.npy' >>> assert encoded['data'].a[0]['path'] == 'c:/temp/data/a/0.parquet' >>> assert encoded['other']['df']['path'] == 'c:/temp/other/df.parquet'
>>> decoded = decode(encoded) >>> assert eq(decoded, value)
csv_encode¶
-
pyg.mongo._encoders.
csv_encode
(value, path)¶ encodes a single DataFrame or a document containing dataframes into an object that can be decoded, while saving dataframes into csv
>>> path = 'c:/temp' >>> value = dict(key = 'a', data = dictable(a = [pd.Series([1,2,3]), pd.Series([4,5,6])], b = [1,2]), other = dict(df = pd.DataFrame(dict(a=[1,2,3], b= [4,5,6])))) >>> encoded = csv_encode(value, path) >>> assert encoded['data'].a[0]['path'] == 'c:/temp/data/a/0.csv' >>> assert encoded['other']['df']['path'] == 'c:/temp/other/df.csv'
>>> decoded = decode(encoded) >>> assert eq(decoded, value)
convertors to bytes¶
-
pyg.base._encode.
pd2bson
(value)¶ converts a value (usually a pandas.DataFrame/Series) to bytes using pickle
-
pyg.base._encode.
np2bson
(value)¶ converts a numpy array to bytes using value.tobytes(). This is much faster than pickle but does not save shape/type info which we save separately.
-
pyg.base._encode.
bson2np
(data, dtype, shape)¶ converts bytes, together with dtype and shape information, into a numpy array.
-
pyg.base._encode.
bson2pd
(data)¶ converts a pickled object back to an object. We insist that new object has .shape to ensure we did not unpickle gibberish.
dates and calendar¶
dt¶
-
pyg.base._dates.
dt
(*args, dialect='uk', none=<built-in method now of type object>)¶ A more generic constructor for datetime.datetime.
- Example
Simple construction
>>> assert dt(2000,1 ,1) == datetime.datetime(2000, 1, 1, 0, 0) # year, month, day
>>> assert dt(2000,'jan',1) == datetime.datetime(2000, 1, 1, 0, 0) # name of month
>>> assert dt(2000,'f',1) == datetime.datetime(2000, 1, 1, 0, 0) # future month code
>>> assert dt('01-02-2002') == datetime.datetime(2002, 2, 1)
>>> assert dt('01-02-2002', dialect = 'US') == datetime.datetime(2002, 1, 2)
>>> assert dt('01 March 2002') == datetime.datetime(2002, 3, 1)
>>> assert dt('01 March 2002', dialect = 'US') == datetime.datetime(2002, 3, 1)
>>> assert dt('01 March 2002 10:20:30') == datetime.datetime(2002, 3, 1, 10, 20, 30)
>>> assert dt(20020301) == datetime.datetime(2002, 3, 1) >>> assert dt(37316) == datetime.datetime(2002, 3, 1) # excel date >>> assert dt(730180) == datetime.datetime(2000,3,1) # ordinal for 1/3/2000 >>> assert dt(2000,3,1).timestamp() == 951868800.0 >>> assert dt(951868800.0) == datetime.datetime(2000,3,1) # utc timestamp >>> assert dt(np.datetime64(dt(2000,3,1))) == dt(2000,3,1) ## numpy.datetime64 object
>>> assert dt(2000) == datetime.datetime(2000,1,1) >>> assert dt(2000,3) == datetime.datetime(2000,3,1) >>> assert dt(2000,3, 1) == datetime.datetime(2000,3,1) >>> assert dt(2000,3, 1, 10,20,30) == datetime.datetime(2000,3,1,10,20,30) >>> assert dt(2000,'march', 1) == datetime.datetime(2000,3,1) >>> assert dt(2000,'h', 1) == datetime.datetime(2000,3,1) # future codes
- Example
date as offset from today
>>> today = dt(0); >>> import datetime >>> day = datetime.timedelta(1) >>> assert dt(-3) == today - 3 * day >>> assert dt('-10b') == today - 14 * day
- Example
datetime arithmetic:
dt has an interesting logic in implementing datetime arithmetic:
day and month parameters can be negative or bigger than the days of month
dt() will roll back/forward from the date which is valid
>>> assert dt(2000,4,1) == datetime.datetime(2000, 4, 1, 0, 0) >>> assert dt(2000,4,0) == datetime.datetime(2000, 3, 31, 0, 0) # a day before dt(2000,4,1)
and rolling back months:
>>> assert dt(2000,0,1) == datetime.datetime(1999, 12, 1, 0, 0) # a month before dt(2000,1,1) >>> assert dt(2000,13,1) == datetime.datetime(2001, 1, 1, 0, 0) # a month after dt(2000,12,1)
This may feel unnatural at first, but does allow for much nicer code, e.g.: [dt(2000,i,1) for i in range(-10,10)]
- Parameters
- *args: str, int or dates
argument to be converted into dates
- dialect: str, optional
How ambiguous dates are parsed: is 01/02/2020 the 1st of Feb or the 2nd of Jan? The default is ‘uk’, i.e. dd/mm/yyyy
- none: callable, optional
What dt() returns when called with no arguments. The default is datetime.datetime.now
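The month and day roll-over described for dt can be reproduced with the standard library alone. The helper below is a sketch: `rolling_date` is a hypothetical name and covers only the integer year/month/day case, not dt's string, Excel or timestamp parsing.

```python
import datetime


def rolling_date(year, month=1, day=1):
    """Build a datetime, letting out-of-range months and days roll over."""
    # Normalise the month first: month 0 is December of the previous year,
    # month 13 is January of the next one.
    extra_years, month_index = divmod(month - 1, 12)
    first_of_month = datetime.datetime(year + extra_years, month_index + 1, 1)
    # Then let timedelta absorb days that fall outside the month.
    return first_of_month + datetime.timedelta(days=day - 1)


firsts = [rolling_date(2000, i, 1) for i in range(-10, 10)]
```

The list comprehension produces twenty consecutive first-of-month dates spanning three calendar years, without any special-casing of the year boundary.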
ymd¶
-
pyg.base._dates.
ymd
(*args, dialect='uk', none=<built-in method now of type object>)¶ just like dt() but always returns the date only (year/month/day), with the time of day dropped. See dt() for full documentation
datetime.datetime
dt_bump¶
-
pyg.base._dates.
dt_bump
(t, *bumps)¶ - Example
>>> from pyg import * >>> t = pd.Series([1,2,3], drange(dt(2000,1,1),2)) >>> assert eq(dt_bump(t, 1), pd.Series([1,2,3], drange(dt(2000,1,2),2)))
drange¶
-
pyg.base._drange.
drange
(t0=None, t1=None, bump=None)¶ A quick and happy wrapper for dateutil.rrule
- Examples
>>> drange(2000, 10, 1) # 10 days starting from dt(2000,1,1) >>> drange(2000, '10b', '1b') # weekdays between dt(2000,1,1) and dt(2000,1,17) >>> drange('-10b', 0, '1b') # business days since 10 bdays ago >>> drange('-10b', '10b', '1w') # starting 10b days ago, to 10b from now, counting in weekly jumps
- Parameters
- t0: date, optional
start date. The default is None.
- t1: date, optional
end date. The default is None.
- bump: timedelta, int, string, optional
bump period. The default is None.
- Returns
list of dates
- Example
>>> t0 = 2000; t1 = 1999 >>> bump = '-1b'
- Example
>>> t0 = dt(2020); t1 = dt(2021); bump = datetime.timedelta(hours = 4)
Calendar¶
-
class
pyg.base._drange.
Calendar
(key=None, holidays=None, weekend=None, t0=None, t1=None, adj='m', day_start=0, day_end=235959)¶ - Calendar is
a dict
containing holiday dates
implementing business day arithmetic
Calendar is restricted to operate between cal.t0 and cal.t1 which default to TMIN = 1900 and TMAX = 2300
- Calendar does this by having two key members:
dt2int: a mapping from all business dates to their integer ‘clock’
int2dt: a mapping from integer value to the date
Since a Calendar is ‘expensive’ memory-wise, we assign a key to the calendar, and the Calendar is stored in the singleton calendars under this key
- Example
>>> from pyg import * >>> holidays = dictable([[1,'2012-01-02','New Year Day',], [2,'2012-01-16','Martin Luther King Jr. Day',], [3,'2012-02-20','Presidents Day (Washingtons Birthday)',], [4,'2012-05-28','Memorial Day',], [5,'2012-07-04','Independence Day',], [6,'2012-09-03','Labor Day',], [7,'2012-10-08','Columbus Day',], [8,'2012-11-12','Veterans Day',], [9,'2012-11-22','Thanksgiving Day',], [10,'2012-12-25','Christmas Day',], [11,'2013-01-01','New Year Day',], [12,'2013-01-21','Martin Luther King Jr. Day',], [13,'2013-02-18','Presidents Day (Washingtons Birthday)',], [14,'2013-05-27','Memorial Day',], [15,'2013-07-04','Independence Day',], [16,'2013-09-02','Labor Day',], [17,'2013-10-14','Columbus Day',], [18,'2013-11-11','Veterans Day',], [19,'2013-11-28','Thanksgiving Day',], [20,'2013-12-25','Christmas Day',], [21,'2014-01-01','New Year Day',], [22,'2014-01-20','Martin Luther King Jr. Day',], [23,'2014-02-17','Presidents Day (Washingtons Birthday)',], [24,'2014-05-26','Memorial Day',], [25,'2014-07-04','Independence Day',], [26,'2014-09-01','Labor Day',], [27,'2014-10-13','Columbus Day',], [28,'2014-11-11','Veterans Day',], [29,'2014-11-27','Thanksgiving Day',],], ['i', 'date', 'name']).do(dt, 'date')
>>> cal = calendar('US', holidays.date, t0 = 2012, t1 = 2015) >>> assert not cal.is_bday(dt(2013,9,2)) # Labor day
>>> cached_calendar = calendar('US') >>> assert not cached_calendar.is_bday(dt(2013,9,2)) # Labor day
>>> assert cal.adjust(dt(2013,9,2)) == dt(2013,9,3) >>> assert cal.drange(dt(2013,9,0), dt(2013,9,7), '1b') == [dt(2013,8,30), dt(2013,9,3), dt(2013,9,4), dt(2013,9,5), dt(2013,9,6),] ## skipped labour day and weekend prior
>>> assert cal.bdays(dt(2013,9,0), dt(2013,9,7)) == 5
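The two-mapping idea (dt2int and int2dt) can be sketched with the standard library. Everything here is an illustrative assumption: `SketchCalendar` is a hypothetical class, it supports only the 'following' adjustment, and its bdays() counts business days in the half-open interval [start, end), which need not match pyg's counting convention.

```python
import datetime


class SketchCalendar:
    """Toy business-day calendar built on two lookup tables."""

    def __init__(self, holidays, t0, t1, weekend=(5, 6)):
        self.holidays = set(holidays)
        self.weekend = set(weekend)  # Monday is 0, so 5 and 6 are Sat/Sun
        self.int2dt = []
        day = t0
        while day <= t1:
            if self.is_bday(day):
                self.int2dt.append(day)
            day += datetime.timedelta(days=1)
        # Each business date maps to its integer position on the 'clock'.
        self.dt2int = {date: i for i, date in enumerate(self.int2dt)}

    def is_bday(self, date):
        return date.weekday() not in self.weekend and date not in self.holidays

    def adjust(self, date):
        # 'Following' convention: roll forward to the next business day.
        while not self.is_bday(date):
            date += datetime.timedelta(days=1)
        return date

    def bdays(self, start, end):
        # Business days d with start <= d < end, as a difference of clocks.
        return self.dt2int[self.adjust(end)] - self.dt2int[self.adjust(start)]


labor_day = datetime.date(2013, 9, 2)
cal = SketchCalendar([labor_day], datetime.date(2013, 8, 1), datetime.date(2013, 9, 30))
```

Once the tables are built, business-day arithmetic reduces to integer subtraction, which is why the construction cost is worth caching.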
-
adjust
(date, adj=None)¶ adjusts a non-business day to the previous/following business date
- Parameters
- date: datetime
- adj: None or p/f/m
adjustment convention: ‘prev/following/modified following’
- Returns
- datetime
nearby business day
-
dt_bump
(t, bump, adj=None)¶ adds a bump to a date
- Parameters
- t: datetime
date to bump.
- bump: int, str
bump e.g. ‘-1y’ or ‘1b’ or 3
- adj: adjustment type
The default is None.
- Returns
- datetime
bumped date.
-
is_trading
(date=None)¶ calculates if we are within a trading session
- Parameters
- date: datetime, optional
the time & date we want to check. The default is None (i.e. now)
- Returns
- bool:
are we within a trading session
-
trade_date
(date=None, adj=None)¶ This is very similar to adjust, but it also takes into account the time of day. If day_start = 0 and day_end = 23:59:59 then this is exactly adjust.
- Parameters
- date: datetime, optional
date (with time). The default is None.
- adj: f/p, optional
If date isn’t within trading day, which direction to adjust to? The default is None.
- Example
>>> from pyg import *; import datetime
>>> uk = calendar('UK', day_start = 8, day_end = 17) >>> assert uk.trade_date(dt(2021,2,9,5), 'f') == dt(2021, 2, 9) # Tuesday morning rolls into Tuesday >>> assert uk.trade_date(dt(2021,2,9,5), 'p') == dt(2021, 2, 8) # Tuesday morning back into Monday >>> assert uk.trade_date(dt(2021,2,7,5), 'f') == dt(2021, 2, 8) # Sunday rolls into Monday >>> assert uk.trade_date(dt(2021,2,7,5), 'p') == dt(2021, 2, 5) # Sunday rolls back to Friday
>>> assert uk.trade_date(date = dt(2021,2,9,23), adj = 'f') == dt(2021, 2, 10) # Tuesday eve rolls into Wed >>> assert uk.trade_date(date = dt(2021,2,9,23), adj = 'p') == dt(2021, 2, 9) # Tuesday eve back into Tuesday >>> assert uk.trade_date(date = dt(2021,2,7,23), adj = 'f') == dt(2021, 2, 8) # Sunday rolls into Monday >>> assert uk.trade_date(date = dt(2021,2,7,23), adj = 'p') == dt(2021, 2, 5) # Sunday rolls back to Friday
>>> assert uk.trade_date(date = dt(2021,2,9,12), adj = 'f') == dt(2021, 2, 9) # Tuesday is Tuesday >>> assert uk.trade_date(date = dt(2021,2,9,12), adj = 'p') == dt(2021, 2, 9) # Tuesday is Tuesday
>>> au = calendar('AU', day_start = 2230, day_end = 1300) >>> assert au.trade_date(dt(2021,2,9,5), 'f') == dt(2021, 2, 9) # Tuesday morning in session >>> assert au.trade_date(dt(2021,2,9,5), 'p') == dt(2021, 2, 9) # Tuesday morning in session >>> assert au.trade_date(dt(2021,2,7,5), 'f') == dt(2021, 2, 8) # Sunday rolls into Monday >>> assert au.trade_date(dt(2021,2,7,5), 'p') == dt(2021, 2, 5) # Sunday rolls back to Friday
>>> assert au.trade_date(date = dt(2021,2,9,23), adj = 'f') == dt(2021, 2, 10) # Tuesday eve rolls into Wed >>> assert au.trade_date(date = dt(2021,2,9,23), adj = 'p') == dt(2021, 2, 10) # Already in Wed >>> assert au.trade_date(date = dt(2021,2,7,23), adj = 'f') == dt(2021, 2, 8) # Sunday rolls into Monday >>> assert au.trade_date(date = dt(2021,2,7,23), adj = 'p') == dt(2021, 2, 8) # Already on Monday >>> assert au.trade_date(date = dt(2021,2,5,23), adj = 'f') == dt(2021, 2, 8) # Friday afternoon rolls into Monday
>>> assert au.trade_date(date = dt(2021,2,9,14), adj = 'f') == dt(2021, 2, 10) # Tuesday is over, roll to Wed >>> assert au.trade_date(date = dt(2021,2,9,14), adj = 'p') == dt(2021, 2, 9) # roll back to Tues
calendar¶
-
pyg.base._drange.
calendar
(key=None, holidays=None, weekend=None, t0=None, t1=None, day_start=0, day_end=235959)¶ A function that either returns an existing calendar or constructs a new one.
- calendar(‘US’) will return a US calendar if that is already cached
- calendar(‘US’, us_holiday_dates) will construct a calendar with holiday dates and then cache it
as_time¶
-
pyg.base._drange.
as_time
(t=None)¶ parses t into a datetime.time object
- Example
>>> assert as_time('10:30:40') == datetime.time(10, 30, 40) >>> assert as_time('103040') == datetime.time(10, 30, 40) >>> assert as_time('10:30') == datetime.time(10, 30) >>> assert as_time('1030') == datetime.time(10, 30) >>> assert as_time('05') == datetime.time(5) >>> assert as_time(103040) == datetime.time(10, 30, 40) >>> assert as_time(13040) == datetime.time(1, 30, 40) >>> assert as_time(130) == datetime.time(1, 30) >>> assert as_time(datetime.time(1, 30)) == datetime.time(1, 30) >>> assert as_time(datetime.datetime(2000, 1, 1, 1, 30)) == datetime.time(1, 30)
- Parameters
- t: str/int/datetime.time/datetime.datetime
time of day
- Returns
datetime.time
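One way to get the parsing behaviour shown in the example is to reduce every input to a zero-padded HHMMSS digit string. The function below is a sketch of that idea; `sketch_as_time` is a hypothetical name and the padding rule is an assumption inferred from the documented examples, not pyg's actual code.

```python
import datetime


def sketch_as_time(t):
    """Parse str/int/time/datetime into datetime.time via an HHMMSS string."""
    if isinstance(t, datetime.datetime):
        return t.time()
    if isinstance(t, datetime.time):
        return t
    digits = str(t).replace(":", "")
    if len(digits) % 2:
        digits = "0" + digits  # 130 -> '0130', 13040 -> '013040'
    digits = digits.ljust(6, "0")  # missing minutes/seconds default to zero
    return datetime.time(int(digits[0:2]), int(digits[2:4]), int(digits[4:6]))
```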
clock¶
-
pyg.base._drange.
clock
(ts, time=None, t=None)¶ returns a vector marking the passage of time.
- Parameters
- ts: timeseries
- time: None, a string or a Calendar, or already a timeseries of times
None: Will increment by 1 every non-nan observation
‘i’ : increment by 1 every date in index (nan or not)
‘b’ : weekdays distance
‘d’ : day-distance (ignore intraday stamp)
‘f’ : fraction-of-day-distance (do not ignore intraday stamp)
‘m’ : month-distance
‘q’ : quarter-distance
‘y’ : year-distance
calendar: uses the business-days distance between any two dates
- t: starting time in the past.
- Returns
- an array
an increasing array of time such that distance between points match the above.
- Example
>>> from pyg import * >>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9))), np.arange(1,11)) >>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9)), t = 5), np.arange(6,16)) >>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9)), 'i'), np.arange(1,11)) >>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9)), 'b'), np.array([26090, 26090, 26090, 26091, 26092, 26093, 26094, 26095, 26095, 26095])) >>> assert eq(clock(pd.Series(np.arange(10), drange(2000, '9b', '1b')), 'b'), np.arange(26090, 26100))
>>> assert eq(clock(np.arange(10)), np.arange(1,11)) >>> assert eq(clock(pd.Series(np.arange(10)), t = 5), np.arange(6,16)) >>> assert eq(clock(np.arange(10), 'i'), np.arange(1,11))
text manipulation¶
lower¶
-
pyg.base._txt.
lower
(value)¶ - equivalent to txt.lower() but:
does not throw on non-string
supports lists/dicts
- Example
>>> assert lower(['The Brown Fox',1]) == ['the brown fox',1] >>> assert lower(dict(a = 'The Brown Fox', b = 3.0)) == {'a': 'the brown fox', 'b': 3.0}
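The "does not throw on non-string, supports lists/dicts" pattern shared by the text helpers in this section can be sketched as one recursive function. `apply_to_text` is a hypothetical helper introduced only for illustration; pyg may structure this differently.

```python
def apply_to_text(function, value):
    """Apply a str -> str function to every string inside lists and dicts."""
    if isinstance(value, str):
        return function(value)
    if isinstance(value, dict):
        return type(value)((k, apply_to_text(function, v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return type(value)(apply_to_text(function, v) for v in value)
    # Non-strings pass through untouched instead of raising.
    return value
```

With this in hand, a tolerant lower is `apply_to_text(str.lower, value)`, and the same wrapper works for `str.upper`, `str.strip` or `str.capitalize`.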
upper¶
-
pyg.base._txt.
upper
(value)¶ - equivalent to txt.upper() but:
does not throw on non-string
supports lists/dicts
- Example
>>> assert upper(['The Brown Fox',1]) == ['THE BROWN FOX',1] >>> assert upper(dict(a = 'The Brown Fox', b = 3.0)) == {'a': 'THE BROWN FOX', 'b': 3.0}
proper¶
-
pyg.base._txt.
proper
(value)¶ - equivalent to Excel’s PROPER(txt) but:
does not throw on non-string
supports lists/dicts
- Example
>>> assert proper(['THE BROWN FOX',1]) == ['The Brown Fox',1] >>> assert proper(dict(a = 'THE BROWN FOX', b = 3.0)) == {'a': 'The Brown Fox', 'b': 3.0}
capitalize¶
-
pyg.base._txt.
capitalize
(value)¶ - equivalent to text.capitalize() but:
does not throw on non-string
supports lists/dicts
- Example
>>> assert capitalize('alan howard') == 'Alan howard' # use proper to get Alan Howard >>> assert capitalize(['alan howard', 'donald trump']) == ['Alan howard', 'Donald trump'] # use proper?
strip¶
-
pyg.base._txt.
strip
(value)¶ - equivalent to txt.strip() but:
does not throw on non-string
supports lists/dicts
- Example
>>> assert strip([' whatever you say ',' whatever you do.. ']) == ['whatever you say', 'whatever you do..'] >>> assert strip(dict(a = ' whatever you say ', b = 3.0)) == {'a': 'whatever you say', 'b': 3.0}
split¶
-
pyg.base._txt.
split
(text, sep=' ', dedup=False)¶ - equivalent to txt.split(sep) but supports:
does not throw on non-string
removal of multiple seps
ensuring there is a unique single separator
- Parameters
- text: str
text to be split.
- sep: str, list of str, optional
separator(s) used to split. The default is ‘ ‘.
- dedup: bool, optional
If True, will remove duplicated instances of seps. The default is False.
- Returns
- list of str
the split text
- Example
>>> text = ' The quick... brown .. fox... ' >>> assert split(text) == ['', '', '', 'The', 'quick...', 'brown', '..', 'fox...', ''] >>> assert split(text, [' ', '.'], True) == ['The', 'quick', 'brown', 'fox'] >>> text = dict(a = 'Can split this', b = '..and split this too') >>> assert split(text, [' ', '.'], True) == {'a': ['Can', 'split', 'this'], 'b': ['and', 'split', 'this', 'too']}
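Splitting on several separators with optional de-duplication can be sketched as follows. `sketch_split` is a hypothetical name; it handles plain strings only (no dict or non-string inputs), and treating dedup as "drop empty pieces" is an assumption consistent with the example above.

```python
def sketch_split(text, sep=" ", dedup=False):
    """Split text on one or more separators, optionally dropping empties."""
    seps = [sep] if isinstance(sep, str) else list(sep)
    # Normalise every separator to the first one, then split once.
    for other in seps[1:]:
        text = text.replace(other, seps[0])
    parts = text.split(seps[0])
    # Repeated separators produce empty strings; dedup removes them.
    return [part for part in parts if part] if dedup else parts
```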
replace¶
-
pyg.base._txt.
replace
(text, old, new=None)¶ A souped up version of text.replace(old, new)
- Example
replace continues to replace until no-more is found
>>> assert replace('this has lots of double spaces', ' '*2, ' ') == 'this has lots of double spaces' >>> assert replace('this, sentence? has! too, many, punctuations!', list(',?!.')) == 'this sentence has too many punctuations' >>> assert replace(dict(a = 1, b = [' text within a list ', 'and within a dict']), ' ') == {'a': 1, 'b': ['textwithinalist', 'andwithinadict']}
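"Replace until no more is found" can be sketched as a loop around str.replace. `sketch_replace` is a hypothetical name, and the guard that skips any old value contained in new is an assumption added here to avoid an obvious infinite loop; the sketch handles plain strings only.

```python
def sketch_replace(text, old, new=None):
    """Repeatedly replace old value(s) with new until none remain."""
    new = "" if new is None else new
    olds = [old] if isinstance(old, str) else list(old)
    changed = True
    while changed:
        changed = False
        for item in olds:
            # Skip empty items and items that the replacement would recreate.
            if item and item not in new and item in text:
                text = text.replace(item, new)
                changed = True
    return text
```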
common_prefix¶
-
pyg.base._txt.
common_prefix
(*values)¶ - Parameters
- *values: list of iterables
values for which we want to find common prefix
- Returns
- iterable
the common prefix.
- Example
>>> assert common_prefix(['abra', 'abba', 'abacus']) == 'ab' >>> assert common_prefix('abra', 'abba', 'abacus') == 'ab' >>> assert common_prefix() is None >>> assert common_prefix([1,2,3,4], [1,2,3,5,8]) == [1,2,3]
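A common-prefix search over arbitrary sliceable sequences can be written with zip. `sketch_common_prefix` is a hypothetical name; accepting either several arguments or a single list of values mirrors the two call styles in the example, and is an assumption about the intended interface.

```python
def sketch_common_prefix(*values):
    """Longest shared prefix of strings or lists; None when given nothing."""
    if len(values) == 1 and not isinstance(values[0], str):
        values = tuple(values[0])  # a single list of values was passed
    if not values:
        return None
    length = 0
    for items in zip(*values):
        if any(item != items[0] for item in items):
            break
        length += 1
    # Slicing keeps the type of the first value (str stays str, list stays list).
    return values[0][:length]
```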
files & directory¶
tree manipulation¶
Trees are dicts of dicts. Just as an item in a dict is (key, value), tree items are simply longer tuples: (key1, key2, key3, value). We deliberately avoid creating a tree class so that the functionality is available on ordinary tree-like structures.
tree_keys¶
-
pyg.base._dict.
tree_keys
(tree, types=None)¶ returns the keys (branches) of a tree as a list of tuples
- Example
>>> tree = dict(a = 1, b = dict(c = 2, d = 3, e = dict(f = 4))) >>> assert tree_keys(tree) == [('a',), ('b', 'c'), ('b', 'd'), ('b', 'e', 'f')]
- Parameters
- tree: tree (dict of dicts)
- types: types of dicts, optional
tree_values¶
-
pyg.base._dict.
tree_values
(tree, types=None)¶ returns the values (leaves) of a tree as a list
- Example
>>> tree = dict(a = 1, b = dict(c = 2, d = 3, e = dict(f = 4))) >>> assert tree_values(tree) == [1,2,3,4]
- Parameters
- tree: tree (dict of dicts)
- types: types of dicts, optional
tree_items¶
-
pyg.base._dict.
tree_items
(tree, types=None)¶ An extension of dict.items(), returning a list of tuples but of varying length, each a branch of a tree
- Parameters
- tree: dict of dicts
a tree of data.
- types: dict or a list of dict-types, optional
The types that we consider as ‘branches’ of the tree. Default is (dict, Dict, dictattr).
- Returns
- a list of tuples
these are an extension of dict.items() and are of varying length
- Example
>>> school = dict(pupils = dict(id1 = dict(name = 'james', surname = 'maxwell', gender = 'm'), id2 = dict(name = 'adam', surname = 'smith', gender = 'm'), id3 = dict(name = 'michell', surname = 'obama', gender = 'f'), ), teachers = dict(math = dict(name = 'albert', surname = 'einstein', grade = 3), english = dict(name = 'william', surname = 'shakespeare', grade = 3), physics = dict(name = 'richard', surname = 'feyman', grade = 4) ))
>>> items = tree_items(school) >>> items
>>> [('pupils', 'id1', 'name', 'james'), >>> ('pupils', 'id1', 'surname', 'maxwell'), >>> ('pupils', 'id1', 'gender', 'm'), >>> ('pupils', 'id2', 'name', 'adam'), >>> ('pupils', 'id2', 'surname', 'smith'), >>> ('pupils', 'id2', 'gender', 'm'), >>> ('pupils', 'id3', 'name', 'michell'), >>> ('pupils', 'id3', 'surname', 'obama'), >>> ('pupils', 'id3', 'gender', 'f'), >>> ('teachers', 'math', 'name', 'albert'), >>> ('teachers', 'math', 'surname', 'einstein'), >>> ('teachers', 'math', 'grade', 3), >>> ('teachers', 'english', 'name', 'william'), >>> ('teachers', 'english', 'surname', 'shakespeare'), >>> ('teachers', 'english', 'grade', 3), >>> ('teachers', 'physics', 'name', 'richard'), >>> ('teachers', 'physics', 'surname', 'feyman'), >>> ('teachers', 'physics', 'grade', 4)]
#To reverse this, we call:
>>> assert items_to_tree(items) == school
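The round trip between a tree and its items can be sketched in plain Python. This is a minimal illustration and not pyg's implementation: flatten and unflatten are hypothetical names, and only plain dict is treated as a branch.

```python
def flatten(tree, types=(dict,)):
    """Return every root-to-leaf branch of a nested dict as a tuple."""
    branches = []
    for key, value in tree.items():
        if isinstance(value, types):
            # Prefix each sub-branch with the current key.
            branches.extend((key,) + branch for branch in flatten(value, types))
        else:
            branches.append((key, value))
    return branches


def unflatten(branches):
    """Rebuild the nested dict; the last element of each branch is the leaf."""
    tree = {}
    for *path, last_key, leaf in branches:
        node = tree
        for key in path:
            node = node.setdefault(key, {})
        node[last_key] = leaf
    return tree


tree = dict(a=1, b=dict(c=2, d=dict(e=3)))
branches = flatten(tree)
assert branches == [('a', 1), ('b', 'c', 2), ('b', 'd', 'e', 3)]
assert unflatten(branches) == tree
```

Branches have varying length because leaves sit at different depths, which is why the result is a list of tuples rather than a list of pairs.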
tree_update¶
-
pyg.base._dict.
tree_update
(tree, update, types=(<class 'dict'>, <class 'pyg.base._dict.Dict'>, <class 'pyg.base._dictattr.dictattr'>), ignore=None)¶ equivalent to dict.update() except: not in-place and also updates further down the tree
- Example
>>> ranking = dict(cambridge = dict(trinity = 1, stjohns = 2, christ = 3), oxford = dict(trinity = 1, jesus = 2, magdalene = 3)) >>> new_ranking = dict(oxford = dict(wolfson = 3, magdalene = 4))
>>> print(tree_repr(tree_update(ranking, new_ranking)))
>>> cambridge: >>> {'trinity': 1, 'stjohns': 2, 'christ': 3} >>> oxford: >>> {'trinity': 1, 'jesus': 2, 'magdalene': 4, 'wolfson': 3}
Note how values for magdalene in Oxford were overwritten even though they are further down the tree
- Example
using ignore
>>> update = dict(a = None, b = np.nan, c = 0) >>> tree = dict(a = 1, b = 2, c = 3) >>> assert tree_update(tree, update) == update >>> assert tree_update(tree, update, ignore = [None]) == dict(a = 1, b = np.nan, c = 0) >>> assert tree_update(tree, update, ignore = [None, np.nan]) == dict(a = 1, b = 2, c = 0) >>> assert tree_update(tree, update, ignore = [None, np.nan, 0]) == tree
- Parameters
- treetree
existing tree.
- updatetree
new information.
- typestypes, optional
see tree_items. The default is (dict, Dict, dictattr).
- Returns
- tree
updated tree.
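The behaviour described above (a recursive update that is not in-place and honours ignore) can be sketched as follows. This is an assumption-laden illustration rather than pyg's code: merge is a hypothetical name, only plain dict counts as a branch, and an ignored value is skipped only when the key already exists.

```python
def merge(tree, update, ignore=()):
    """Return a new tree with `update` merged into `tree`, recursing into dicts."""
    result = dict(tree)  # new outer dict, so the input tree is not modified
    for key, new in update.items():
        old = result.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            result[key] = merge(old, new, ignore)  # update further down the tree
        elif key in result and any(new is bad or new == bad for bad in ignore):
            continue  # keep the existing value
        else:
            result[key] = new
    return result


ranking = dict(cambridge=dict(trinity=1), oxford=dict(trinity=1, magdalene=3))
new_ranking = dict(oxford=dict(wolfson=3, magdalene=4))
merged = merge(ranking, new_ranking)
assert merged['oxford'] == dict(trinity=1, magdalene=4, wolfson=3)
assert ranking['oxford']['magdalene'] == 3  # original left untouched

old = dict(a=1, b='keep_old_value')
assert merge(old, dict(a='new', b=None, c=None), ignore=[None]) == dict(a='new', b='keep_old_value', c=None)
```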
tree_setitem¶
-
pyg.base._dict.
tree_setitem
(tree, key, value, ignore=None, types=None)¶ sets an item of a tree
- Parameters
tree : tree (dict of dicts)
key : a dot-separated string or a tuple of values
the branch to hang value on
- valueobject
the leaf at the end of the branch
- ignoreNone or list, optional
what values of leaf will be ignored and not overwrite existing data. The default is None.
- typestypes, optional
As we go down the tree, when do we stop and say: what we have is a leaf already?
- Example
>>> tree = dict() >>> tree_setitem(tree, 'a', 1) >>> assert tree == dict(a = 1) >>> tree_setitem(tree, 'b.c', 2) >>> assert tree == {'a': 1, 'b': {'c': 2}} >>> tree_setitem(tree, ('b','c','d'), 2) >>> tree_setitem(tree, ('b','c','e'), 3) >>> assert tree == {'a': 1, 'b': {'c': {'d': 2, 'e': 3}}}
- Example
types
>>> from pyg import * >>> tree = dict(mycell = cell(lambda a, b: a * b, b = 2, a = cell(lambda x: x**2, x = cell(lambda y: y*3)))) >>> # We are missing y.... >>> tree_setitem(tree, 'mycell.a.x.y', 3, types = (dict,cell)) ## drill into cell >>> assert tree['mycell'].a.x.y == 3 >>> tree_setitem(tree, 'mycell.a.x.y', 1) ## stop when you hit cell >>> assert tree['mycell'].a.x == dict(y = 1)
- Returns
None. The tree is modified in place.
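Setting a leaf at the end of a branch can be sketched in a few lines. This is a minimal illustration under stated assumptions, not pyg's implementation: set_branch is a hypothetical name, only plain dict is treated as a branch, and the ignore and types parameters are left out.

```python
def set_branch(tree, key, value):
    """Hang `value` on the branch named by a dot-separated string or a tuple."""
    path = key.split('.') if isinstance(key, str) else list(key)
    node = tree
    for step in path[:-1]:
        child = node.get(step)
        if not isinstance(child, dict):
            # A missing key, or an existing leaf, is replaced by a new branch.
            child = node[step] = {}
        node = child
    node[path[-1]] = value


tree = {}
set_branch(tree, 'a', 1)
set_branch(tree, 'b.c', 2)
assert tree == {'a': 1, 'b': {'c': 2}}
set_branch(tree, ('b', 'c', 'd'), 2)
set_branch(tree, ('b', 'c', 'e'), 3)
assert tree == {'a': 1, 'b': {'c': {'d': 2, 'e': 3}}}
```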
tree_repr¶
-
pyg.base._tree_repr.
tree_repr
(value, offset=0)¶ a cleaner representation of a tree
- Example
>>> school = dict(pupils = dict(id1 = dict(name = 'james', surname = 'maxwell', gender = 'm'), >>> id2 = dict(name = 'adam', surname = 'smith', gender = 'm'), >>> id3 = dict(name = 'michell', surname = 'obama', gender = 'f'), >>> ), >>> teachers = dict(math = dict(name = 'albert', surname = 'einstein', grade = 3), >>> english = dict(name = 'william', surname = 'shakespeare', grade = 3), >>> physics = dict(name = 'richard', surname = 'feyman', grade = 4) >>> ))
>>> print(tree_repr(school, 4)) >>> pupils: >>> id1: >>> {'name': 'james', 'surname': 'maxwell', 'gender': 'm'} >>> id2: >>> {'name': 'adam', 'surname': 'smith', 'gender': 'm'} >>> id3: >>> {'name': 'michell', 'surname': 'obama', 'gender': 'f'} >>> teachers: >>> math: >>> {'name': 'albert', 'surname': 'einstein', 'grade': 3} >>> english: >>> {'name': 'william', 'surname': 'shakespeare', 'grade': 3} >>> physics: >>> {'name': 'richard', 'surname': 'feyman', 'grade': 4}
- Parameters
value : a tree
- offsetint, optional
offset from the left for printing. The default is 0.
- Returns
- string
a tree-like string representation of a dict-of-dicts.
items_to_tree¶
-
pyg.base._dict.
items_to_tree
(items, tree=None, raise_if_duplicate=True, ignore=None, types=None)¶ converts items to branches of a tree. If an original tree is provided, the additional branches are hung on the existing tree. If ignore is provided as a list of values, branches whose last value (the leaf) is in that list will not overwrite existing branches.
- Example
>>> items = [('cambridge', 'smith', 'economics',), ('cambridge', 'keynes', 'economics'), ('cambridge', 'lyons', 'maths'), ('cambridge', 'maxwell', 'maths'), ('oxford', 'penrose', 'maths'), ]
>>> tree = items_to_tree(items) >>> print(tree_repr(tree))
>>> cambridge: >>> smith: >>> economics >>> keynes: >>> economics >>> lyons: >>> maths >>> maxwell: >>> maths >>> oxford: >>> {'penrose': 'maths'}
We can add to tree:
- Parameters
- itemslist of tuples,
items are just like dict items, only longer,
- treetree, optional
a pre-existing tree of trees. The default is None.
- raise_if_duplicatebool, optional
judging by the parameter name, whether to raise an error when an item duplicates an existing branch. The default is True.
- ignorelist, optional
list of leaf values that should be ignored when overwriting an existing tree. The default is None.
- Example
using ignore
>>> tree = dict(a = 1, b = 'keep_old_value') >>> update = dict(a = 'valid_new_value', b = None, c = None) >>> tree_update(tree, update, ignore = [None]) >>> {'a': 'valid_new_value', 'b': 'keep_old_value', 'c': None}
a is over-ridden as the new value is valid
b is not over-ridden since the update b = None is considered invalid
c is added as it did not exist before, even though c = None is an invalid value
- Returns
tree : dict of dicts
tree_to_table¶
-
pyg.base._tree.
tree_to_table
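(tree, pattern)¶ Finds all branches of the tree that match a pattern and returns one dict per match, keyed by the %-prefixed names in the pattern. This is best understood through an example: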
- Examples
>>> school = dict(pupils = dict(id1 = dict(name = 'james', surname = 'maxwell', gender = 'm'), id2 = dict(name = 'adam', surname = 'smith', gender = 'm'), id3 = dict(name = 'michell', surname = 'obama', gender = 'f'), ), teachers = dict(math = dict(name = 'albert', surname = 'einstein', grade = 3), english = dict(name = 'william', surname = 'shakespeare', grade = 3), physics = dict(name = 'richard', surname = 'feyman', grade = 4) ))
Suppose we wanted to identify all male students:
>>> res = tree_to_table(school, 'pupils/%id/gender/m') >>> assert res == [dict(id = 'id1'), dict(id = 'id2')]
or grades:
>>> res = tree_to_table(school, 'teachers/%subject/grade/%grade') >>> assert res == [{'grade': 3, 'subject': 'math'}, {'grade': 3, 'subject': 'english'}, {'grade': 4, 'subject': 'physics'}]
- Parameters
- treetree (dict of dicts)
tree is a yaml-like structure
- patternstring
The pattern whose instances we wish to find in tree
- Returns
list of dicts
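The matching rule suggested by the two examples can be sketched as a recursive walk. The semantics here are inferred from those examples only and are an assumption: a %name token captures a key (or the leaf value, when last), and any other token must match literally. match_pattern is a hypothetical name, not part of pyg.

```python
def match_pattern(tree, pattern):
    """Return one dict of captured values per branch that matches pattern."""
    rows = []

    def walk(node, tokens, captured):
        token, rest = tokens[0], tokens[1:]
        if not rest:
            # The last token is compared with, or bound to, the leaf value.
            if isinstance(node, dict):
                return
            if token.startswith('%'):
                rows.append({**captured, token[1:]: node})
            elif str(node) == token:
                rows.append(dict(captured))
            return
        if not isinstance(node, dict):
            return
        for key, child in node.items():
            if token.startswith('%'):
                walk(child, rest, {**captured, token[1:]: key})
            elif key == token:
                walk(child, rest, captured)

    walk(tree, pattern.split('/'), {})
    return rows


school = dict(pupils=dict(id1=dict(gender='m'), id2=dict(gender='f')),
              teachers=dict(math=dict(grade=3), physics=dict(grade=4)))
assert match_pattern(school, 'pupils/%id/gender/m') == [dict(id='id1')]
assert match_pattern(school, 'teachers/%subject/grade/%grade') == [
    dict(subject='math', grade=3), dict(subject='physics', grade=4)]
```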
list functions¶
as_list¶
-
pyg.base._as_list.
as_list
(value, none=False)¶ returns a list of the original object.
- Example
>>> assert as_list(None) == [] >>> assert as_list(4) == [4] >>> assert as_list((1,2,)) == [1,2] >>> assert as_list([1,2,]) == [1,2] >>> assert eq(as_list(np.array([1,2,])) , [np.array([1,2,])]) >>> assert as_list(dict(a = 1)) == [dict(a=1)]
In practice, this function has an incredibly useful application:
- Example
using as_list to give flexibility on *args
>>> def my_sum(*values): >>> values = as_list(values) >>> return sum(values)
>>> assert my_sum(1,2,3) == 6 >>> assert my_sum([1,2,3]) == 6 ## This is nice... wasn't possible before
- Parameters
value : anything
none : bool, optional
Shall I return None as a value? The default is False and we return [], if True, returns [None]
- Returns
- list
a list of original objects.
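The *args flexibility shown above relies on a single list or tuple inside the argument tuple being unwrapped. A minimal sketch of that idea follows; the unwrapping rule is inferred from the my_sum example and is an assumption, and to_list is a hypothetical name rather than pyg's as_list.

```python
def to_list(value):
    """Wrap value in a list unless it is already a list or tuple."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        # Assumed rule: a lone list/tuple inside the container is unwrapped,
        # which is what lets f(1, 2, 3) and f([1, 2, 3]) behave the same.
        if len(value) == 1 and isinstance(value[0], (list, tuple)):
            return list(value[0])
        return list(value)
    return [value]


def my_sum(*values):
    return sum(to_list(values))


assert my_sum(1, 2, 3) == my_sum([1, 2, 3]) == 6
assert to_list(None) == [] and to_list(4) == [4] and to_list((1, 2)) == [1, 2]
assert to_list(dict(a=1)) == [dict(a=1)]
```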
as_tuple¶
-
pyg.base._as_list.
as_tuple
(value, none=False)¶ returns a tuple of the original object.
- Example
>>> assert as_tuple(None) == () >>> assert as_tuple(4) == (4,) >>> assert as_tuple((1,2,)) == (1,2) >>> assert as_tuple([1,2,]) == (1,2) >>> assert eq(as_tuple(np.array([1,2,])) , (np.array([1,2,]),)) >>> assert as_tuple(dict(a = 1)) == (dict(a=1),)
In practice, this function has an incredibly useful application:
- Example
using as_tuple to give flexibility on *args
>>> def my_sum(*values): >>> values = as_tuple(values) >>> return sum(values)
>>> assert my_sum(1,2,3) == 6 >>> assert my_sum([1,2,3]) == 6 ## This is nice... wasn't possible before
- Parameters
value : anything
none : bool, optional
Shall I return None as a value? The default is False, in which case as_tuple(None) returns (); if True, it returns (None,)
- Returns
- tuple
a tuple of original objects.
first¶
-
pyg.base._as_list.
first
(value)¶ returns the first value in a list (None if empty list) or the original if value not a list
- Example
>>> assert first(5) == 5 >>> assert first([5,5]) == 5 >>> assert first([]) is None >>> assert first([1,2]) == 1
last¶
-
pyg.base._as_list.
last
(value)¶ returns the last value in a list (None if empty list) or the original if value not a list
- Example
>>> assert last(5) == 5 >>> assert last([5,5]) == 5 >>> assert last([]) is None >>> assert last([1,2]) == 2
unique¶
-
pyg.base._as_list.
unique
(value)¶ returns the asserted unique value in a list (None if empty list) or the original if value not a list. Throws an exception if list non-unique
- Example
>>> assert unique(5) == 5 >>> assert unique([5,5]) == 5 >>> assert unique([]) is None >>> with pytest.raises(ValueError): >>> unique([1,2])
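The contract shared by first, last and unique (pass non-lists through, map an empty list to None) can be sketched for the unique case as below. pick_unique is a hypothetical name and this is an illustration of the documented behaviour, not pyg's implementation.

```python
def pick_unique(value):
    """Return the single distinct value of a list, or value itself if not a list."""
    if not isinstance(value, list):
        return value
    if not value:
        return None
    head = value[0]
    if any(item != head for item in value[1:]):
        raise ValueError(f'values are not unique: {value}')
    return head


assert pick_unique(5) == 5
assert pick_unique([5, 5]) == 5
assert pick_unique([]) is None
try:
    pick_unique([1, 2])
except ValueError:
    pass  # two distinct values, so no unique answer exists
else:
    raise AssertionError('expected ValueError')
```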
Comparing and Sorting¶
cmp¶
-
pyg.base._sort.
cmp
(x, y)¶ Implements lexcompare while allowing for comparison of different types. First compares by type, then by length, then by keys and finally on values
- Parameters
- xobj
1st object to be compared.
- yobj
2nd object to be compared.
- Returns
- int
returns -1 if x<y else 1 if x>y else 0
- Examples
>>> assert cmp('2', 2) == 1 >>> assert cmp(np.int64(2), 2) == 0 >>> assert cmp(None, 2.0) == -1 # None is smallest >>> assert cmp([1,2,3], [4,5]) == 1 # [1,2,3] is longer >>> assert cmp([1,2,3], [1,2,0]) == 1 # lexical sorting >>> assert cmp(dict(a = 1, b = 2), dict(a = 1, c = 2)) == -1 # lexical sorting on keys >>> assert cmp(dict(a = 1, b = 2), dict(b = 2, a = 1)) == 0 # order does not matter
Cmp¶
-
pyg.base._sort.
Cmp
(x)¶ class wrapper of cmp, allowing us to compare objects of different types
- Example
>>> with pytest.raises(TypeError): >>> sorted([1,2,3,None])
>>> # but this is fine: >>> assert sorted([1,3,2,None], key = Cmp) == [None, 1, 2, 3]
sort¶
-
pyg.base._sort.
sort
(iterable)¶ implements sorting allowing for comparing of not-same-type objects
- Parameters
- iterableiterable
values to be sorted
- Returns
- list
sorted list.
- Example
>>> with pytest.raises(TypeError): >>> sorted([1,2,3,None]) >>> # but this is fine: >>> sort([1,3,2,None]) == [None, 1, 2, 3]
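Sorting values of mixed types needs a total order that never raises. One way to get that with the standard library is functools.cmp_to_key. The sketch below is a simplification and an assumption: it places None first and falls back to type names only when native comparison fails, whereas cmp above compares by type first. compare is a hypothetical name.

```python
from functools import cmp_to_key


def compare(x, y):
    """Three-way comparison that tolerates None and mismatched types."""
    if x is None or y is None:
        return (x is not None) - (y is not None)  # None is smallest
    try:
        return (x > y) - (x < y)
    except TypeError:
        # Not comparable natively: order by type name instead.
        tx, ty = type(x).__name__, type(y).__name__
        return (tx > ty) - (tx < ty)


assert sorted([1, 3, 2, None], key=cmp_to_key(compare)) == [None, 1, 2, 3]
assert sorted(['b', 2, None, 'a', 1], key=cmp_to_key(compare)) == [None, 1, 2, 'a', 'b']
```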
eq¶
-
pyg.base._eq.
eq
(x, y)¶ A better nan-handling equality comparison. Here is the problem:
>>> import numpy as np >>> assert not np.nan == np.nan ## What?
The nan issue extends to np.arrays…
>>> assert list(np.array([np.nan,2]) == np.array([np.nan,2])) == [False, True]
but not to lists…
>>> assert [np.nan] == [np.nan]
But wait, if the lists are derived from np.arrays, then no equality…
>>> assert not list(np.array([np.nan])) == list(np.array([np.nan]))
The other issue is inheritance:
>>> class FunnyDict(dict): >>> def __getitem__(self, key): >>> return 5 >>> assert dict(a = 1) == FunnyDict(a=1) ## equality seems to ignore any type mismatch >>> assert not dict(a = 1)['a'] == FunnyDict(a = 1)['a']
There are also issues with partial
>>> from functools import partial >>> f = lambda a: a + 1 >>> x = partial(f, a = 1) >>> y = partial(f, a = 1) >>> assert not x == y
>>> import pandas as pd >>> import pytest >>> from pyg import eq
>>> assert eq(np.nan, np.nan) ## That's better >>> assert eq(x = np.array([np.nan,2]), y = np.array([np.nan,2])) >>> assert eq(np.array([np.array([1,2]),2], dtype = 'object'), np.array([np.array([1,2]),2], dtype = 'object')) >>> assert eq(np.array([np.nan,2]),np.array([np.nan,2])) >>> assert eq(dict(a = np.array([np.array([1,2]),2], dtype = 'object')) , dict(a = np.array([np.array([1,2]),2], dtype = 'object'))) >>> assert eq(dict(a = np.array([np.array([1,np.nan]),np.nan])) , dict(a = np.array([np.array([1,np.nan]),np.nan]))) >>> assert eq(np.array([np.array([1,2]),dict(a = np.array([np.array([1,2]),2]))]), np.array([np.array([1,2]),dict(a = np.array([np.array([1,2]),2]))]))
>>> assert not eq(dict(a = 1), FunnyDict(a=1)) >>> assert eq(1, 1.0) >>> assert eq(x = pd.DataFrame([1,2]), y = pd.DataFrame([1,2])) >>> assert eq(pd.DataFrame([np.nan,2]), pd.DataFrame([np.nan,2])) >>> assert eq(pd.DataFrame([1,np.nan], columns = ['a']), pd.DataFrame([1,np.nan], columns = ['a'])) >>> assert not eq(pd.DataFrame([1,np.nan], columns = ['a']), pd.DataFrame([1,np.nan], columns = ['b']))
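The two ideas behind eq, treating nan as equal to nan and refusing to equate containers of different types, can be sketched with the standard library alone. This is a reduced illustration that covers floats, dicts, lists and tuples only (no numpy or pandas); nan_eq is a hypothetical name, not pyg's eq.

```python
import math


def nan_eq(x, y):
    """Equality where nan == nan and containers must have the same type."""
    if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
        return True
    if isinstance(x, dict) or isinstance(y, dict):
        return (type(x) is type(y) and x.keys() == y.keys()
                and all(nan_eq(x[k], y[k]) for k in x))
    if isinstance(x, (list, tuple)) or isinstance(y, (list, tuple)):
        return (type(x) is type(y) and len(x) == len(y)
                and all(nan_eq(a, b) for a, b in zip(x, y)))
    return x == y


class FunnyDict(dict):
    def __getitem__(self, key):
        return 5


nan = float('nan')
assert nan_eq(nan, nan)
assert nan_eq([1.0, float('nan')], [1.0, float('nan')])
assert nan_eq(dict(a=[nan]), dict(a=[nan]))
assert not nan_eq(dict(a=1), FunnyDict(a=1))  # type mismatch is not ignored
assert nan_eq(1, 1.0)
```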
bits and pieces¶
type functions¶
-
pyg.base._types.
is_arr
(value)¶ is value a numpy array of non-zero-size
-
pyg.base._types.
is_date
(value)¶ is value a date type: either datetime.date, datetime.datetime or np.datetime64
-
pyg.base._types.
is_df
(value)¶ is value a pd.DataFrame
-
pyg.base._types.
is_dict
(value)¶ is value a dict
-
pyg.base._types.
is_float
(value)¶ is value a float, or any variant of np.float
-
pyg.base._types.
is_int
(value)¶ is value an int, or any variant of np.intN type
-
pyg.base._types.
is_iterable
(value)¶ is value Iterable excluding a string
-
pyg.base._types.
is_len
(value)¶ is value of zero length (or has no len at all)
-
pyg.base._types.
is_list
(value)¶ is value a list
-
pyg.base._types.
is_nan
(value)¶ is value a nan or an inf. Unlike np.isnan, works for non numeric
-
pyg.base._types.
is_none
(value)¶ is value None
-
pyg.base._types.
is_num
(value)¶ is_int(value) or is_float(value)
-
pyg.base._types.
is_pd
(value)¶ is value a pd.DataFrame/pd.Series
-
pyg.base._types.
is_series
(value)¶ is value a pd.Series
-
pyg.base._types.
is_ts
(value)¶ is value a pandas DataFrame which is indexed by datetimes
-
pyg.base._types.
is_tuple
(value)¶ is value a tuple
-
pyg.base._types.
nan2none
(value)¶ convert np.nan/np.inf to None
zipper¶
-
pyg.base._zip.
zipper
(*values)¶ a safer version of zip
- Examples
zipper works with single values as well as full list:
>>> assert list(zipper([1,2,3], 4)) == [(1, 4), (2, 4), (3, 4)] >>> assert list(zipper([1,2,3], [4,5,6])) == [(1, 4), (2, 5), (3, 6)] >>> assert list(zipper([1,2,3], [4,5,6], [7])) == [(1, 4, 7), (2, 5, 7), (3, 6, 7)] >>> assert list(zipper([1,2,3], [4,5,6], None)) == [(1, 4, None), (2, 5, None), (3, 6, None)] >>> assert list(zipper((1,2,3), np.array([4,5,6]), None)) == [(1, 4, None), (2, 5, None), (3, 6, None)]
- Examples
zipper rejects multi-length lists
>>> import pytest >>> with pytest.raises(ValueError): >>> zipper([1,2,3], [4,5])
- Parameters
- *valueslists
values to be zipped
- Returns
zipped values
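The behaviour shown in the examples (single values are repeated, mismatched lengths are rejected) can be sketched as below. This is a minimal illustration that handles lists and tuples only; broadcast_zip is a hypothetical name and not pyg's zipper.

```python
def broadcast_zip(*values):
    """zip that repeats scalars and length-1 sequences, and rejects mixed lengths."""
    as_lists = [list(v) if isinstance(v, (list, tuple)) else [v] for v in values]
    lengths = {len(v) for v in as_lists if len(v) != 1}
    if len(lengths) > 1:
        raise ValueError(f'cannot zip sequences of lengths {sorted(lengths)}')
    n = lengths.pop() if lengths else 1
    # Length-1 entries are stretched to the common length n.
    return zip(*[v * n if len(v) == 1 else v for v in as_lists])


assert list(broadcast_zip([1, 2, 3], 4)) == [(1, 4), (2, 4), (3, 4)]
assert list(broadcast_zip([1, 2, 3], [4, 5, 6], [7])) == [(1, 4, 7), (2, 5, 7), (3, 6, 7)]
assert list(broadcast_zip([1, 2, 3], None)) == [(1, None), (2, None), (3, None)]
```

Unlike the built-in zip, which silently truncates to the shortest input, the length check here raises before any pairs are produced.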
reducer¶
-
pyg.base._reducer.
reducer
(function, sequence, default=None)¶ Like functools.reduce, but returns default for an empty sequence instead of raising, so no initial value (such as zero) needs to be supplied.
- Parameters
- functioncallable
binary function.
- sequenceiterable
list of inputs to be applied iteratively to reduce.
- defaultobject, optional
A default value to be returned with an empty sequence
- Example
>>> from operator import add, mul >>> from functools import reduce >>> import pytest
>>> assert reducer(add, [1,2,3,4]) == 10 >>> assert reducer(mul, [1,2,3,4]) == 24 >>> assert reducer(add, [1]) == 1
>>> assert reducer(add, []) is None >>> with pytest.raises(TypeError): >>> reduce(add, [])
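The difference from functools.reduce shown above can be sketched in two lines. safe_reduce is a hypothetical name used for illustration; it is not pyg's reducer.

```python
from functools import reduce
from operator import add, mul


def safe_reduce(function, sequence, default=None):
    """reduce without an initial value; an empty sequence gives default."""
    items = list(sequence)
    return reduce(function, items) if items else default


assert safe_reduce(add, [1, 2, 3, 4]) == 10
assert safe_reduce(mul, [1, 2, 3, 4]) == 24
assert safe_reduce(add, [1]) == 1
assert safe_reduce(add, []) is None
assert safe_reduce(add, [], default=0) == 0
```

Avoiding an initial value matters when the elements have no natural zero, for example when joining DataFrames or concatenating custom objects.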
reducing¶
-
class
pyg.base._reducer.
reducing
(function=None, *args, **kwargs)¶ Makes a bivariate function able to act on a sequence of elements using reduction
- Example
>>> from operator import mul >>> assert reducing(mul)([1,2,3,4]) == 24 >>> assert reducing(mul)(6,4) == 24
Since a.join(b).join(c).join(d) is also very common, we provide a simple interface for that:
- Example
chaining
>>> assert reducing('__add__')([1,2,3,4]) == 10 >>> assert reducing('__add__')(6,4) == 10
>>> d = dictable(a = [1,2,3,5,4]) >>> reducing('inc')(d, dict(a=1))
logger and get_logger¶
-
pyg.base._logger.
get_logger
(name='pyg', level='info', fmt='%(asctime)s - %(name)s - %(levelname)s - %(message)s', file=False, console=True)¶ quick utility to simplify logger creation, ensuring loggers are cached and do not accumulate too many handlers
- Parameters
- namestr, optional
name of logger. The default is ‘pyg’.
- levelstr, optional
DEBUG/INFO/WARN etc. The default is ‘info’.
- fmtstr, optional
string formatting for messages. The default is ‘%(asctime)s - %(name)s - %(levelname)s - %(message)s’.
- filebool/str, optional
the name of the file to log to. The default is False = do not log to file.
- consolebool, optional
log to console? The default is True.
- Returns
logging.Logger
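The "too many handlers" problem arises when a setup function attaches a new handler on every call, so each message is printed several times. A minimal sketch of the guard, using only the standard logging module, follows. cached_logger is a hypothetical name, the file option is omitted, and this is not pyg's implementation.

```python
import logging

FMT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'


def cached_logger(name='pyg_sketch_demo', level='info', fmt=FMT):
    """Return a console logger, attaching its handler only once."""
    logger = logging.getLogger(name)  # logging caches loggers by name
    logger.setLevel(level.upper())    # setLevel accepts level names such as 'INFO'
    if not logger.handlers:           # guard against duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(fmt))
        logger.addHandler(handler)
    return logger


first = cached_logger()
second = cached_logger()
assert first is second
assert len(first.handlers) == 1
```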
access functions¶
These are useful to convert object-oriented code to declarative functions
-
pyg.base._getitem.
callattr
(value, attr, args=None, kwargs=None)¶ gets the attribute(s) from a value and calls it
- Example
>>> from pyg import * >>> value = Dict(function = lambda a, b: a + b) >>> assert callattr(value, 'function', kwargs = dict(a = 1, b = 2)) == 3 >>> assert callattr(value, attr = 'function', args = (1, 2), kwargs = None) == 3
>>> ts = pd.Series(np.random.normal(0,1,1000)) >>> assert ts.std() == callattr(ts, 'std') >>> assert eq(ts.ewm(com = 10).mean(), callattr(ts, ['ewm','mean'], kwargs = [{'com':10}, {}]))
>>> d = dictable(a = [1,2,3,4,1,2], b = list('abcdef')) >>> assert callattr(d, ['inc', 'exc'], kwargs = [dict(a = 2), dict(b = 'f')]) == d.inc(a = 2).exc(b = 'f')
- valueobj
object that contains the attribute.
- attrstring(s)
key within object.
- argstuple, optional
tuple of values to be fed to function. The default is None.
- kwargsdict, optional
kwargs to be fed to the method. The default is None.
-
pyg.base._getitem.
callitem
(value, key, args=None, kwargs=None)¶ gets an item and calls it
- Example
>>> c = dict(function = lambda a, b: a + b) >>> assert callitem(c, 'function', kwargs = dict(a = 1, b = 2)) == 3 >>> assert callitem(c, 'function', args = (1, 2)) == 3
- valueobj
object that contains an item.
- keystring
key within object.
- argstuple, optional
tuple of values to be fed to function. The default is None.
- kwargsdict, optional
kwargs to be fed to the method. The default is None.
-
pyg.base._getitem.
getitem
(value, key, *default)¶ gets an item, like getattr
- Example
>>> a = dict(a = 1) >>> assert getitem(a, 'a') == 1 >>> assert getitem(a, 'b', 2) == 2
>>> import pytest >>> with pytest.raises(KeyError): >>> getitem(a, 'b')
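The *default signature mirrors the built-in getattr: the default is optional, and without it a missing key raises. A minimal sketch of that pattern follows; get_item is a hypothetical name and not pyg's getitem.

```python
def get_item(value, key, *default):
    """value[key], or default[0] if supplied and the key is missing."""
    if len(default) > 1:
        raise TypeError('get_item takes at most one default')
    try:
        return value[key]
    except (KeyError, IndexError):
        if default:
            return default[0]
        raise  # no default given: propagate the original error


a = dict(a=1)
assert get_item(a, 'a') == 1
assert get_item(a, 'b', 2) == 2
assert get_item(a, 'b', None) is None  # None is a valid explicit default
```

Using *default rather than default=None lets the caller pass None as a genuine default while still getting an error when no default is given.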
inspection¶
There are a few functions extending the inspect module.
-
pyg.base._inspect.
argspec_add
(fullargspec, **update)¶ adds new args with default values at the end of the existing args
- Parameters
- fullargspecFullArgSpec
the existing argspec to which the new args are added.
- **updatedict
parameter names with their default values.
- Returns
FullArgSpec
- Example
>>> f = lambda b : b >>> argspec = getargspec(f) >>> updated = argspec_add(argspec, axis = 0) >>> assert updated.args == ['b', 'axis'] and updated.defaults == (0,)
>>> f = lambda b, axis : None ## axis already exists without a default >>> argspec = getargspec(f) >>> updated = argspec_add(argspec, axis = 0) >>> assert updated == argspec
>>> f = lambda b, axis =1 : None ## axis already exists with a different default >>> argspec = getargspec(f) >>> updated = argspec_add(argspec, axis = 0) >>> assert updated == argspec
-
pyg.base._inspect.
argspec_defaults
(function)¶ - Returns
the function defaults as a dict rather than using the inspect structure
- Example
>>> f = lambda a, b = 1: a+b >>> assert argspec_defaults(f) == dict(b=1)
>>> g = partial(f, b = 2) >>> assert argspec_defaults(g) == dict(b=2)
- Parameters
function : callable
- Returns
defaults as a dict.
-
pyg.base._inspect.
argspec_required
(function)¶ - Parameters
function : callable
- Returns
- list
parameters that must be provided in order to run the function
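Both the defaults and the required parameters of a callable can be read from the standard inspect.signature. The sketch below is an illustration using that route and not pyg's implementation; defaults_of and required_of are hypothetical names.

```python
import inspect
from functools import partial


def defaults_of(function):
    """Parameters that have a default, as a dict of name -> default."""
    params = inspect.signature(function).parameters
    return {name: p.default for name, p in params.items()
            if p.default is not inspect.Parameter.empty}


def required_of(function):
    """Named parameters that have no default and so must be provided."""
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    params = inspect.signature(function).parameters
    return [name for name, p in params.items()
            if p.default is inspect.Parameter.empty and p.kind not in skip]


f = lambda a, b=1: a + b
assert defaults_of(f) == dict(b=1)
assert required_of(f) == ['a']

g = partial(f, b=2)  # signature() sees the new default bound by partial
assert defaults_of(g) == dict(b=2)
assert required_of(g) == ['a']
```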
-
pyg.base._inspect.
argspec_update
(argspec, **kwargs)¶ generic function to create new FullArgSpec (python 3) or normal ArgSpec (python 2)
- Parameters
- argspecFullArgSpec
The argspec of the function
- **kwargsdict
updates
- Returns
FullArgSpec
- Example
>>> f = lambda a, b =1 : a + b >>> argspec = getargspec(f) >>> assert argspec_update(argspec, args = ['a', 'b', 'c']) == inspect.FullArgSpec(**{'annotations': {}, 'args': ['a', 'b', 'c'], 'defaults': (1,), 'kwonlyargs': [], 'kwonlydefaults': None, 'varargs': None, 'varkw': None})
-
pyg.base._inspect.
call_with_callargs
(function, callargs)¶ calls function using the callargs dict produced by getcallargs, with support for functions within decorators
- Example
>>> function = lambda a, b, *args, **kwargs: 1+b+len(args)+10*len(kwargs) >>> args = (1,2,3,4,5); kwargs = dict(c = 6, d = 7) >>> assert function(*args, **kwargs) == 26 >>> callargs = getcallargs(function, *args, **kwargs) >>> assert call_with_callargs(function, callargs) == 26
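Calling a function from a callargs dict means splitting that dict back into positional and keyword arguments according to each parameter's kind. A minimal sketch with the standard inspect module follows; call_with is a hypothetical name and this is not pyg's implementation.

```python
import inspect


def call_with(function, callargs):
    """Call function with a dict that maps every parameter name to its value."""
    args, kwargs = [], {}
    for name, param in inspect.signature(function).parameters.items():
        if param.kind is param.VAR_POSITIONAL:
            args.extend(callargs.get(name, ()))    # unpack *args
        elif param.kind is param.VAR_KEYWORD:
            kwargs.update(callargs.get(name, {}))  # unpack **kwargs
        elif param.kind is param.KEYWORD_ONLY:
            kwargs[name] = callargs[name]
        else:
            args.append(callargs[name])
    return function(*args, **kwargs)


function = lambda a, b, *args, **kwargs: 1 + b + len(args) + 10 * len(kwargs)
bound = inspect.signature(function).bind(1, 2, 3, 4, 5, c=6, d=7)
callargs = dict(bound.arguments)
assert callargs == {'a': 1, 'b': 2, 'args': (3, 4, 5), 'kwargs': {'c': 6, 'd': 7}}
assert call_with(function, callargs) == function(1, 2, 3, 4, 5, c=6, d=7) == 26
```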
-
pyg.base._inspect.
getargs
(function, n=0)¶ - Parameters
- functioncallable
The function for which we want the args
- nint optional
get the names of the args after allowing for n args to be set by *args. The default is 0.
- Returns
None or a list of args
-
pyg.base._inspect.
getargspec
(function)¶ Extends inspect.getfullargspec to allow us to decorate functions with a signature.
- Parameters
- functioncallable
function for which we want to know argspec.
- Returns
inspect.FullArgSpec
-
pyg.base._inspect.
getcallargs
(function, *args, **kwargs)¶ replicates inspect.getcallargs with support for functions within decorators
- Example
>>> from pyg import *; import inspect >>> function = lambda a, b, *myargs, **mykwargs: 1 >>> args = (1,2,3,4,5); kwargs = dict(c = 6, d = 7) >>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == {'a': 1, 'b': 2, 'myargs': (3, 4, 5), 'mykwargs': {'c': 6, 'd': 7}}
>>> function = lambda a: a + 1 >>> args = (); kwargs = dict(a=1) >>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1)
>>> function = lambda a, b = 1: 1 >>> args = (); kwargs = dict(a=1) >>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 1) >>> args = (); kwargs = dict(a=1, b = 2) >>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 2) >>> args = (1,); kwargs = {} >>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 1) >>> args = (1,2); kwargs = {} >>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 2) >>> args = (1,); kwargs = {'b' : 2} >>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 2)
-
pyg.base._inspect.
kwargs2args
(function, args, kwargs)¶ converts a list of parameters that were provided as kwargs into args
- Example
>>> assert kwargs2args(lambda a, b: a+b, (), dict(a = 1, b=2)) == ([1,2], {})
- Parameters
function : callable
args : tuple
parameters of function.
- kwargsdict
key-word parameters of function.
- Returns
- tuple
a pair of a function args, kwargs.
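Moving keyword arguments into positional slots amounts to walking the function's parameter names in order and stopping at the first one that was not supplied. The sketch below uses inspect.getfullargspec; the stop-at-first-gap rule is an assumption, and kwargs_to_args is a hypothetical name, not pyg's kwargs2args.

```python
import inspect


def kwargs_to_args(function, args, kwargs):
    """Shift leading keyword arguments into the positional list."""
    names = inspect.getfullargspec(function).args
    args, kwargs = list(args), dict(kwargs)  # copies, inputs are left untouched
    for name in names[len(args):]:
        if name not in kwargs:
            break  # a gap: later parameters must stay as keywords
        args.append(kwargs.pop(name))
    return args, kwargs


assert kwargs_to_args(lambda a, b: a + b, (), dict(a=1, b=2)) == ([1, 2], {})
assert kwargs_to_args(lambda a, b: a + b, (1,), dict(b=2)) == ([1, 2], {})
assert kwargs_to_args(lambda a, b, c=0: a, (), dict(a=1, c=3)) == ([1], {'c': 3})
```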