pyg.base

extensions to dict

dictattr

class pyg.base._dictattr.dictattr
A simple dict with extended member manipulation
  1. access using d.key

  2. access multiple elements using d[key1, key2]

Example

members access

>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert isinstance(d, dict)
>>> assert d.a == 1
>>> assert d['a','b'] == [1,2]
>>> assert d[['a','b']] == dictattr(a = 1, b = 2)
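The access rules above can be sketched on a plain dict subclass. AttrDict is a hypothetical stand-in written for illustration only; it is not pyg's implementation and assumes nothing beyond the behaviour shown in the example.

```python
class AttrDict(dict):
    """Hypothetical stand-in for dictattr; not pyg's implementation."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __getitem__(self, key):
        if isinstance(key, tuple):   # d['a', 'b'] -> list of values
            return [dict.__getitem__(self, k) for k in key]
        if isinstance(key, list):    # d[['a', 'b']] -> sub-dictionary
            return type(self)((k, dict.__getitem__(self, k)) for k in key)
        return dict.__getitem__(self, key)


d = AttrDict(a=1, b=2, c=3)
```

A tuple key returns the values, while a list key returns a smaller dict of the same type.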

In addition, it has extended key selection/subsetting

Example

subsetting

>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d - 'a' == dictattr(b = 2, c = 3)
>>> assert d & ['b', 'c', 'not in keys'] == dictattr(b = 2, c = 3)

dictattr supports a non-in-place ‘update’ that returns a new dict:

Example

updating via adding another dict

>>> d = dictattr(a = 1, b = 2) + dict(b = 'replacing old value', c = 'new key')
>>> assert d == dictattr(a = 1, b = 'replacing old value', c = 'new key')
copy() → a shallow copy of D
keys()

dictattr.keys() returns an actual list rather than a view. Because the keys are necessarily unique, the list is a ulist, which also supports set operations

Returns

ulist

list of keys of dictattr.

Example

>>> from pyg import *
>>> d = dictattr(a = 1, b = 2)
>>> assert d.keys() == ulist(['a', 'b'])
>>> assert d.keys() & ['a', 'c', 'd'] == ['a']
relabel(*args, **relabels)

easy relabel/rename of keys

Parameters

*args : str or callable
  • a string ending/starting with _ will trigger a prefix/suffix to all keys

  • callable function will be applied to the keys to update them

**relabels : strings

individual relabeling of keys

Returns

dictattr

new dict with renamed keys.

Example

suffix/prefix

>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d.relabel('x_') == dictattr(x_a = 1, x_b = 2, x_c = 3) # prefixing
>>> assert d.relabel('_x') == dictattr(a_x = 1, b_x = 2, c_x = 3) # suffixing
Example

callable

>>> assert d.rename(upper) == dictattr(A = 1, B = 2, C = 3)
Example

individual relabelling

>>> assert d.rename(a = 'A') == dictattr(A = 1, b = 2, c = 3)
>>> assert d.rename(['A', 'B', 'C']) == d.relabel(upper)
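A rough sketch of the relabelling rules on a plain dict, assuming the conventions shown above (a trailing underscore means prefix, a leading underscore means suffix). The function below is hypothetical and does not cover the list form of relabelling.

```python
def relabel(d, *args, **relabels):
    """Hypothetical sketch of the relabelling rules; returns a new dict."""
    mapping = dict(relabels)
    for arg in args:
        if callable(arg):
            rule = arg
        elif isinstance(arg, str) and arg.endswith('_'):
            rule = lambda key, prefix=arg: prefix + key      # 'x_' -> prefix
        elif isinstance(arg, str) and arg.startswith('_'):
            rule = lambda key, suffix=arg: key + suffix      # '_x' -> suffix
        else:
            raise ValueError('unsupported relabel argument: %r' % (arg,))
        # Explicit keyword relabels take precedence over the general rule.
        mapping.update({key: rule(key) for key in d if key not in relabels})
    return {mapping.get(key, key): value for key, value in d.items()}


d = dict(a=1, b=2, c=3)
```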
rename(*args, **relabels)

Identical to relabel. See relabel for full docs

values() → an object providing a view on D’s values
pyg.base._dictattr.dictattr.__add__(self, other)

dictattr uses add as a copy + update. Similar to the dict | operator introduced in Python 3.9

Example

>>> from pyg import *
>>> d = dictattr(a = 1, b = 2)
>>> assert d + dict(b = 3, c = 5) == dictattr(a = 1, b = 3, c = 5)
Parameters

other: dict

a dict used to update current dict.

pyg.base._dictattr.dictattr.__sub__(self, key, copy=True)

dictattr uses subtraction to remove key(s). It deletes an item but does not throw an exception if the key is not there

Returns

updated dictattr

Example

>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d - ['b','c'] == dictattr(a = 1)
>>> assert d - 'c' == dictattr(a = 1, b = 2)
>>> assert d - 'key not there' == d
>>> #commutative
>>> assert (d - 'c').keys() == d.keys() - 'c'
pyg.base._dictattr.dictattr.__and__(self, other)

dictattr uses & as a set operator for key filtering

Returns

updated dictattr

Example

>>> from pyg import *
>>> d = dictattr(a = 1, b = 2, c = 3)
>>> assert d & ['a', 'b', 'not_there'] == dictattr(a = 1, b = 2)
>>> #commutative
>>> assert (d & ['a', 'b', 'x']).keys() == d.keys() & ['a', 'b', 'x']
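The three operators above can be sketched on a plain dict subclass. OpDict is a hypothetical illustration, not pyg's code; each operator returns a new object and leaves the original untouched.

```python
class OpDict(dict):
    """Hypothetical illustration of +, - and & on a dict; not pyg's code."""

    @staticmethod
    def _as_keys(keys):
        # Treat a single string as one key, anything else as a collection of keys.
        return [keys] if isinstance(keys, str) else list(keys)

    def __add__(self, other):
        res = type(self)(self)   # shallow copy, so self is left untouched
        res.update(other)
        return res

    def __sub__(self, keys):
        drop = set(self._as_keys(keys))
        return type(self)((k, v) for k, v in self.items() if k not in drop)

    def __and__(self, keys):
        keep = set(self._as_keys(keys))
        return type(self)((k, v) for k, v in self.items() if k in keep)


d = OpDict(a=1, b=2, c=3)
added = d + dict(b=20, d=4)
```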

ulist

The dictattr.keys() method returns a ulist: a list with unique elements:

class pyg.base._ulist.ulist(*args, unique=False)

A list whose members are unique. It has +/- operations overloaded while also supporting set operations &/|

Example

>>> assert ulist([1,3,2,1]) == list([1,3,2])
Example

addition adds element(s)

>>> assert ulist([1,3,2,1]) + 4  == list([1,3,2,4])
>>> assert ulist([1,3,2,1]) + [4,1] == list([1,3,2,4])
>>> assert ulist([1,3,2,1]) + [4,1,5] == list([1,3,2,4,5])
Example

subtraction removes element(s)

>>> assert ulist([1,3,2,1]) - 1 == [3,2]
>>> assert ulist([1,3,2,1]) - [1,3,4] == [2]
Example

set operations

>>> assert ulist([1,3,2,1]) & 1 == [1]
>>> assert ulist([1,3,2,1]) & [1,3,4] == [1,3]
>>> assert ulist([1,3,2,1]) | 1 == [1,3,2]
>>> assert ulist([1,3,2,1]) | 4 == [1,3,2,4]
>>> assert ulist([1,3,2,1]) | [1,3,4] == [1,3,2,4]
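A small sketch of an order-preserving unique list with the same operators. UniqueList is a hypothetical name used for illustration; it uses a linear membership test, so it assumes short lists.

```python
def unique(values):
    """Drop duplicates while keeping first-seen order."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


def as_list(value):
    # A bare element is treated like a one-element list.
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


class UniqueList(list):
    """Hypothetical stand-in for ulist; not pyg's implementation."""

    def __init__(self, values=()):
        super().__init__(unique(values))

    def __add__(self, other):
        return type(self)(list(self) + as_list(other))

    __or__ = __add__   # union and addition coincide once duplicates are dropped

    def __sub__(self, other):
        drop = as_list(other)
        return type(self)(v for v in self if v not in drop)

    def __and__(self, other):
        keep = as_list(other)
        return type(self)(v for v in self if v in keep)


u = UniqueList([1, 3, 2, 1])
```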
copy()

Return a shallow copy of the list.

Dict

class pyg.base._dict.Dict

Dict extends dictattr to allow access to functions of members

Example

>>> from pyg import *
>>> d = Dict(a = 1, b=2)
>>> assert d[lambda a, b: a+b] == 3
>>> assert d['a','b', lambda a,b: a+b] == [1,2,3]

Dict is also callable: the keyword arguments are used to add/update members, and callable values are evaluated using the existing members

Example

>>> from pyg import *
>>> d = Dict(a = 1, b=2)
>>> assert d(c = 3) == Dict(a = 1, b = 2, c = 3)
>>> assert d(c = lambda a,b: a+b) == Dict(a = 1, b = 2, c = 3)
>>> assert d(c = 3) == Dict(a = 1, b = 2) + Dict(c = 3)
>>> assert Dict(a = 1)(b = lambda a: a+1)(c = lambda a,b: a+b) == Dict(a = 1,b = 2,c = 3)
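One way to picture how a function of members can be evaluated is to match the function's parameter names against the keys. The helpers evaluate and extend below are hypothetical and work on plain dicts; they are a sketch of the idea, not pyg's code.

```python
import inspect


def evaluate(d, function):
    """Call function with the members of d that match its parameter names."""
    names = inspect.signature(function).parameters
    return function(**{name: d[name] for name in names})


def extend(d, **kwargs):
    """Return a copy of d with new members; callables are evaluated against d."""
    res = dict(d)
    for key, value in kwargs.items():
        res[key] = evaluate(res, value) if callable(value) else value
    return res


d = dict(a=1, b=2)
chained = extend(extend(dict(a=1), b=lambda a: a + 1), c=lambda a, b: a + b)
```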
do(function, *keys)

applies a function(s) on multiple keys at the same time

Parameters

function : callable or list of callables

function to be applied per column

*keys : string/list of strings

list of columns to be applied. If missing, applied to all columns

Returns

res : Dict

Example

>>> from pyg import *
>>> d = Dict(name = 'adam', surname = 'atkins')
>>> assert d.do(proper) == Dict(name = 'Adam', surname = 'Atkins')
Example

using another key in the calculation

>>> from pyg import *
>>> d = Dict(a = 1, b = 5, denominator = 10)
>>> d = d.do(lambda value, denominator: value/denominator, 'a', 'b')
>>> assert d == Dict(a = 0.1, b = 0.5, denominator = 10)
pyg.base._dict.Dict.__call__(self, **kwargs)

Call self as a function.

dictable

class pyg.base._dictable.dictable(data=None, columns=None, **kwargs)
What is dictable?

dictable is a table, a collection of iterable records. It is also a dict with each key being a column. Why not use a pandas.DataFrame? pd.DataFrame leads a dual life:

  • by day an index-based optimized numpy array supporting e.g. timeseries analytics etc.

  • by night, a table with keys supporting filtering, aggregating, pivoting on keys as well as inner/outer joining on keys.

dictable only tries to do the latter. dictable should be thought of as a ‘container for complicated objects’ rather than just an array of primitive floats. In general, each cell may contain timeseries, yield_curves, machine-learning experiments etc. The interface is very succinct and allows the user to concentrate on logic of the calculations rather than boilerplate.

dictable supports quite a flexible construction:

Example

construction using records

>>> from pyg import *; import pandas as pd
>>> d = dictable([dict(name = 'alan', surname = 'atkins', age = 39, country = 'UK'), 
>>>               dict(name = 'barbara', surname = 'brown', age = 29, country = 'UK')])
Example

construction using columns and constants

>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
Example

construction using pandas.DataFrame

>>> original = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> df_from_dictable = pd.DataFrame(original)
>>> dictable_from_df = dictable(df_from_dictable)
>>> assert original == dictable_from_df
Example

construction using rows and columns

>>> d = dictable([['alan', 'atkins', 39, 'UK'], ['barbara', 'brown', 29, 'UK']], columns = ['name', 'surname', 'age', 'country'])
Access

column access

>>> assert d.keys() ==  ['name', 'surname', 'age', 'country']
>>> assert d.name == ['alan', 'barbara']
>>> assert d['name'] == ['alan', 'barbara']
>>> assert d['name', 'surname'] == [('alan', 'atkins'), ('barbara', 'brown')]
>>> assert d[lambda name, surname: '%s %s'%(name, surname)] == ['alan atkins', 'barbara brown']
Access

row access & iteration

>>> assert d[0] == {'name': 'alan', 'surname': 'atkins', 'age': 39, 'country': 'UK'}
>>> assert [row for row in d] == [{'name': 'alan', 'surname': 'atkins', 'age': 39, 'country': 'UK'},
>>>                               {'name': 'barbara', 'surname': 'brown', 'age': 29, 'country': 'UK'}]

Note that member access is commutative:

>>> assert d.name[0] == d[0].name == 'alan'
>>> assert d[lambda name, surname: name + surname][0] == d[0][lambda name, surname: name + surname]
>>> assert sum([row for row in d], dictable()) == d
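The column/row duality can be sketched with a plain dict of equal-length lists. The helper names below are hypothetical; the sketch only shows that reading a column then a row gives the same cell as reading a row then a column.

```python
columns = {
    'name': ['alan', 'barbara'],
    'surname': ['atkins', 'brown'],
    'age': [39, 29],
}


def row(table, i):
    """Row i of a column-oriented table, as a dict."""
    return {key: values[i] for key, values in table.items()}


def rows(table):
    """All rows, in order."""
    n = len(next(iter(table.values()))) if table else 0
    return [row(table, i) for i in range(n)]


def from_rows(records):
    """Rebuild the column-oriented table from a list of row dicts."""
    keys = list(records[0]) if records else []
    return {key: [record[key] for record in records] for key in keys}
```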
Example

adding records

>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> d = d + {'name': 'charlie', 'surname': 'chocolate', 'age': 49} # can add a record directly
>>> assert d[-1] == {'name': 'charlie', 'surname': 'chocolate', 'age': 49, 'country': None}
>>> d += dictable(name = ['dana', 'ender'], surname = ['deutch', 'esterhase'], age = [10, 20], country = ['Germany', 'Hungary'])
>>> assert d.name == ['alan', 'barbara', 'charlie', 'dana', 'ender']
>>> assert len(dictable.concat([d,d])) == len(d) * 2
Example

adding columns

>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> ### all of the below are ways of adding columns ####
>>> d.gender = ['m', 'f']
>>> d = d(gender = ['m', 'f'])
>>> d['gender'] = ['m', 'f']
>>> d2 = dictable(gender = ['m', 'f'], profession = ['astronaut', 'barber'])
>>> d = d(**d2)
Example

adding derived columns

>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> d = d(full_name = lambda name, surname: proper('%s %s'%(name, surname))) 
>>> d['full_name'] = d[lambda name, surname: proper('%s %s'%(name, surname))]
>>> assert d.full_name == ['Alan Atkins', 'Barbara Brown']
Example

dropping columns

>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> del d.country # in place
>>> del d['age'] # in place
>>> assert (d - 'name')[0] ==  {'surname': 'atkins'} and d[0] == {'name': 'alan', 'surname': 'atkins'}
Example

row selection, inc/exc

>>> d = dictable(name = ['alan', 'barbara'], surname = ['atkins', 'brown'], age = [39, 29], country = 'UK')
>>> assert len(d.exc(name = 'alan')) == 1
>>> assert len(d.exc(lambda age: age<30)) == 1 # can filter on *functions* of members, not just members.
>>> assert d.inc(name = 'alan').surname == ['atkins']
>>> assert d.inc(lambda age: age<30).name == ['barbara']
>>> assert d.exc(lambda age: age<30).name == ['alan']
dictable supports:
  • sort

  • group-by/ungroup

  • list-by/ unlist

  • pivot/unpivot

  • inner join, outer join and xor

Full details are below.

classmethod concat(*others)

adds together multiple dictables. equivalent to sum(others, self) but a little faster

Parameters

*others : dictables

records to be added to current table

Returns

merged : dictable

sum of all records

Example

>>> from pyg import *
>>> d1 = dictable(a = [1,2,3])
>>> d2 = dictable(a = [4,5,6])
>>> d3 = dictable(a = [7,8,9])
>>> assert dictable.concat(d1,d2,d3) == dictable(a = range(1,10))
>>> assert dictable.concat([d1,d2,d3]) == dictable(a = range(1,10))
do(function, *keys)

applies a function(s) on multiple keys at the same time

Parameters

function : callable or list of callables

function to be applied per column

*keys : string/list of strings

list of columns to be applied. If missing, applied to all columns

Returns

res : dictable

Example

>>> from pyg import *
>>> d = dictable(name = ['adam', 'barbara', 'chris'], surname = ['atkins', 'brown', 'cohen'])
>>> assert d.do(proper) == dictable(name = ['Adam', 'Barbara', 'Chris'], surname = ['Atkins', 'Brown', 'Cohen'])
Example

using another column in the calculation

>>> from pyg import *
>>> d = dictable(a = [1,2,3,4], b = [5,6,9,8], denominator = [10,20,30,40])
>>> d = d.do(lambda value, denominator: value/denominator, 'a', 'b')
>>> assert d == dictable(a = 0.1, b = [0.5,0.3,0.3,0.2], denominator = [10,20,30,40])
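A sketch of the per-column application on a dict of lists. It assumes, based only on the example above, that the first parameter receives the cell value and the remaining parameters are looked up by column name; the helper do below is hypothetical.

```python
import inspect


def do(table, function, *keys):
    """Apply function to each chosen column, row by row."""
    keys = keys or tuple(table)
    # Assumption: parameters after the first are other columns, matched by name.
    extra = list(inspect.signature(function).parameters)[1:]
    n = len(next(iter(table.values())))
    res = dict(table)
    for key in keys:
        res[key] = [function(table[key][i], *[table[name][i] for name in extra])
                    for i in range(n)]
    return res


table = dict(a=[1, 2, 3, 4], b=[5, 6, 9, 8], denominator=[10, 20, 30, 40])
scaled = do(table, lambda value, denominator: value / denominator, 'a', 'b')
```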
exc(*functions, **filters)

performs a filter on what rows to exclude

Parameters

*functions : callables or a dict

filters based on functions of each row

**filters : value or list of values

filters per each column

Returns

dictable

table with rows that satisfy all conditions excluded.

Example

filtering on keys

>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.exc(x = np.nan) == dictable(x = [1,2,3], y = [0,4,3])         
>>> assert d.exc(x = 1) == dictable(x = [2,3,np.nan], y = [4,3,5])
>>> assert d.exc(x = [1,2]) == dictable(x = [3,np.nan], y = [3,5])
Example

filtering on callables

>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.exc(lambda x,y: x>y) == dictable(x = [2,3,np.nan], y = [4,3,5])
get(key, default=None)

Return the value for key if key is in the dictionary, else default.

groupby(*by, grp='grp')

Similar to pandas groupby but returns a dictable of dictables with a new column ‘grp’

Example

>>> x = dictable(a = [1,2,3,4], b= [1,0,1,0])
>>> res = x.groupby('b')
>>> assert res.keys() == ['b', 'grp']
>>> assert is_dictable(res[0].grp) and res[0].grp.keys() == ['a']
Parameters

*by : str or list of strings

column(s) to group by.

grp : str, optional

The name of the column for the dictables per each key. The default is ‘grp’.

Returns

dictable

A dictable containing the original keys and a dictable per unique key.
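The grouping idea can be sketched on a list of row dicts. The functions below are hypothetical illustrations; sorting the groups by key is an assumption made here for a deterministic result.

```python
def groupby(records, by, grp='grp'):
    """Group a list of row dicts by one key; each group keeps the other columns."""
    groups = {}
    for record in records:
        rest = {k: v for k, v in record.items() if k != by}
        groups.setdefault(record[by], []).append(rest)
    return [{by: key, grp: rows} for key, rows in sorted(groups.items())]


def ungroup(grouped, by, grp='grp'):
    """Flatten the grouped rows back into plain records."""
    return [{by: g[by], **row} for g in grouped for row in g[grp]]


x = [dict(a=1, b=1), dict(a=2, b=0), dict(a=3, b=1), dict(a=4, b=0)]
res = groupby(x, 'b')
```

Flattening the groups and sorting on the remaining column recovers the original records.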

inc(*functions, **filters)

performs a filter on what rows to include

Parameters

*functions : callables or a dict

filters based on functions of each row

**filters : value or list of values

filters per each column

Returns

dictable

table with rows that satisfy all conditions.

Example

filtering on keys

>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.inc(x = np.nan) == dictable(x = np.nan, y = 5)            
>>> assert d.inc(x = 1) == dictable(x = 1, y = 0)            
>>> assert d.inc(x = [1,2]) == dictable(x = [1,2], y = [0,4]) 
Example

filtering on regex

>>> import re
>>> d = dictable(text = ['once', 'upon', 'a', 'time', 'in', 'the', 'west', 1, 2, 3])
>>> assert d.inc(text = re.compile('o')) == dictable(text = ['once', 'upon'])
>>> assert d.exc(text = re.compile('e')) == dictable(text = ['upon', 'a', 'in', 1, 2, 3])
Example

filtering on callables

>>> from pyg import *; import numpy as np
>>> d = dictable(x = [1,2,3,np.nan], y = [0,4,3,5])
>>> assert d.inc(lambda x,y: x>y) == dictable(x = 1, y = 0)
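A sketch of include/exclude filtering on a list of row dicts, with keyword filters (a value or a list of values, where NaN matches NaN) and predicates on named members. The helpers are hypothetical and regex filters are left out.

```python
import inspect
import math


def matches(value, target):
    """True if value equals target (NaN matches NaN) or is in a list of targets."""
    if isinstance(target, list):
        return any(matches(value, t) for t in target)
    both_nan = (isinstance(value, float) and isinstance(target, float)
                and math.isnan(value) and math.isnan(target))
    return both_nan or value == target


def keep(record, functions, filters):
    """Does a row (a dict) satisfy every keyword filter and every predicate?"""
    for key, target in filters.items():
        if not matches(record[key], target):
            return False
    for function in functions:
        names = inspect.signature(function).parameters
        if not function(**{name: record[name] for name in names}):
            return False
    return True


def inc(records, *functions, **filters):
    return [r for r in records if keep(r, functions, filters)]


def exc(records, *functions, **filters):
    return [r for r in records if not keep(r, functions, filters)]


nan = float('nan')
records = [dict(x=1, y=0), dict(x=2, y=4), dict(x=3, y=3), dict(x=nan, y=5)]
```

In this sketch exc is the exact complement of inc: every row lands in one or the other.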
join(other, lcols=None, rcols=None, mode=None)

Performs either an inner join or a cross join between two dictables

Example

inner join

>>> from pyg import *
>>> x = dictable(a = ['a','b','c','a']) 
>>> y = dictable(a = ['a','y','z'])
>>> assert x.join(y) == dictable(a = ['a', 'a'])
Example

cross join (no shared columns)

>>> from pyg import *
>>> x = dictable(a = ['a','b']) 
>>> y = dictable(b = ['x','y'])
>>> assert x.join(y) == dictable(a = ['a', 'a', 'b', 'b'], b = ['x', 'y', 'x', 'y'])
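The two behaviours above follow from one rule: match on the shared columns, and when there are none every pair matches. A hypothetical sketch on lists of row dicts:

```python
def join(left, right):
    """Inner join on shared column names; a cross join when none are shared."""
    shared = [k for k in (left[0] if left else {}) if right and k in right[0]]
    return [{**l, **r}
            for l in left
            for r in right
            if all(l[k] == r[k] for k in shared)]


x = [dict(a='a'), dict(a='b'), dict(a='c'), dict(a='a')]
y = [dict(a='a'), dict(a='y'), dict(a='z')]
p = [dict(a='a'), dict(a='b')]
q = [dict(b='x'), dict(b='y')]
```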
pivot(x, y, z, agg=None)

pivot table functionality.

Parameters

x : str/list of str

unique keys per each row

y : str

unique key per each column

z : str/callable

A column in the table or an evaluated quantity per each row

agg : None/callable or list of callables, optional

Each (x,y) cell can potentially contain multiple z values, so if agg = None, a list is returned. If you want the data aggregated in any way, then supply an aggregating function(s).

Returns

A dictable which is a pivot table of the original data

Example

>>> from pyg import *
>>> timetable_as_list = dictable(x = [1,2,3]) * dictable(y = [1,2,3]) 
>>> timetable = timetable_as_list.xyz('x','y',lambda x, y: x * y)
>>> assert len(timetable) == 3 # one row per unique value of x
Example

>>> self = dictable(x = [1,2,3]) * dictable(y = [1,2,3]) 
>>> x = 'x'; y = 'y'; z = lambda x, y: x * y
>>> self.exc(lambda x, y: x+y==5).xyz(x,y,z, len)
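A sketch of the pivot mechanics on a list of row dicts. The function is hypothetical; here z is a function of the whole row, and column headers are converted to strings, mirroring the remark in the unpivot example that y values become strings.

```python
def pivot(records, x, y, z, agg=None):
    """Pivot row dicts: one output row per x value, one column per y value."""
    cells = {}
    for record in records:
        cell = cells.setdefault(record[x], {}).setdefault(str(record[y]), [])
        cell.append(z(record))
    columns = sorted({col for row in cells.values() for col in row})
    table = []
    for key in sorted(cells):
        row = {x: key}
        for col in columns:
            values = cells[key].get(col, [])
            row[col] = values if agg is None else agg(values)
        table.append(row)
    return table


grid = [dict(x=i, y=j) for i in (1, 2, 3) for j in (1, 2, 3)]
timetable = pivot(grid, 'x', 'y', lambda r: r['x'] * r['y'], agg=sum)
as_lists = pivot(grid, 'x', 'y', lambda r: r['x'] * r['y'])
```

Without an aggregator each cell holds the list of z values that fell into it.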
sort(*by)

Sorts the table either using a key, list of keys or functions of members

Example

>>> from pyg import *; import numpy as np
>>> self = dictable(a = [_ for _ in 'abracadabra'], b=range(11), c = range(0,33,3))
>>> self.d = list(np.array(self.c) % 11)
>>> res = self.sort('a', 'd')
>>> assert ''.join(res.a) == 'aaaaabbcdrr' and list(res.d) == [0,4,8,9,10] + [2,3] + [1] + [7] + [5,6]
>>> res = self.sort(lambda b: b*3 % 11) ## sorting by d again, but computed as a function of b
>>> assert list(res.d) == list(range(11))
>>> mixed = dictable(a = ['a', 1, 'c', 0, 'b', 2]).sort('a') ## columns of mixed types can be sorted too
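The same two sorting styles, by columns and by a function of members, can be reproduced on plain row dicts with the built-in sorted. This is an illustration only and does not use pyg.

```python
letters = list('abracadabra')
records = [dict(a=a, b=b, d=(3 * b) % 11) for b, a in enumerate(letters)]

# Sort by column 'a', breaking ties with column 'd'.
by_columns = sorted(records, key=lambda r: (r['a'], r['d']))

# Sort by a function of a member instead of a stored column.
by_function = sorted(records, key=lambda r: (3 * r['b']) % 11)
```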
ungroup(grp='grp')

Undoes groupby

Example

>>> x = dictable(a = [1,2,3,4], b= [1,0,1,0])
>>> self = x.groupby('b')
Parameters

grp : str, optional

column name where dictables are. The default is ‘grp’.

Returns

dictable.

unlist()

undoes listby…

Example

>>> x = dictable(a = [1,2,3,4], b= [1,0,1,0])
>>> x.listby('b')

dictable[2 x 2]
b|a
0|[2, 4]
1|[1, 3]

>>> assert x.listby('b').unlist().sort('a') == x
Returns

dictable

a dictable where all rows with list in them have been ‘expanded’.
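A sketch of the list-by/unlist round trip on a list of row dicts. Both helpers are hypothetical, and unlist assumes that all lists within a row have the same length.

```python
def listby(records, by):
    """Collect the other columns into lists, one output row per value of `by`."""
    out = {}
    for record in records:
        row = out.setdefault(record[by], {by: record[by]})
        for key, value in record.items():
            if key != by:
                row.setdefault(key, []).append(value)
    return [out[key] for key in sorted(out)]


def unlist(records):
    """Expand rows whose cells are lists into one row per list element."""
    expanded = []
    for record in records:
        lengths = [len(v) for v in record.values() if isinstance(v, list)]
        for i in range(max(lengths, default=1)):
            expanded.append({k: v[i] if isinstance(v, list) else v
                             for k, v in record.items()})
    return expanded


x = [dict(a=1, b=1), dict(a=2, b=0), dict(a=3, b=1), dict(a=4, b=0)]
listed = listby(x, 'b')
```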

unpivot(x, y, z)

undoes self.xyz / self.pivot

Example

>>> from pyg import *
>>> orig = (dictable(x = [1,2,3,4]) * dict(y = [1,2,3,4,5]))(z = lambda x, y: x*y)
>>> pivot = orig.xyz('x', 'y', 'z', last)
>>> unpivot = pivot.unpivot('x','y','z').do(int, 'y') # the conversion to column names mean y is now string... so we convert back to int
>>> assert orig == unpivot
Parameters

x : str/list of strings

list of keys in the pivot table.

y : str

name of the column that will be used for the values that are currently column headers.

z : str

name of the column that describes the data currently within the pivot table.

Returns

dictable

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

xor(other, lcols=None, rcols=None, mode='l')

returns what is in lhs but NOT in rhs (or vice versa if mode = ‘r’). Together with inner joining, can be used as left/right join

Examples

>>> from pyg import *
>>> self = dictable(a = [1,2,3,4])
>>> other = dictable(a = [1,2,3,5])
>>> assert self.xor(other) == dictable(a = 4) # this is in lhs but not in rhs
>>> assert self.xor(other, lcols = lambda a: a * 2, rcols = 'a') == dictable(a = [2,3,4]) # fit can be done using formulae rather than actual columns

The XOR functionality can also be performed using the quotient (divide) operator:

>>> lhs = dictable(a = [1,2,3,4])
>>> rhs = dictable(a = [1,2,3,5])
>>> assert lhs/rhs == dictable(a = 4)
>>> assert rhs/lhs == dictable(a = 5)

>>> rhs = dictable(a = [1,2], b = [3,4])
>>> left_join_can_be_done_simply_as = lhs * rhs + lhs/rhs
Parameters

other : dictable (or something that can be turned to one)

what we exclude with.

lcols : str/list of strs, optional

the left columns/formulae on which we match. The default is None.

rcols : str/list of strs, optional

the right columns/formulae on which we match. The default is None.

mode : string, optional

When set to ‘r’, performs xor the other way. The default is ‘l’.

Returns

dictable

a dictable containing what is in self but not in the other dictable.
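The relationship between xor, inner join and left join can be sketched on lists of row dicts. The helpers below are hypothetical and match on a single column only.

```python
def xor(left, right, on):
    """Rows of `left` whose `on` value has no match in `right` (an anti-join)."""
    seen = {r[on] for r in right}
    return [l for l in left if l[on] not in seen]


def inner(left, right, on):
    """Inner join of two lists of row dicts on a single column."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


lhs = [dict(a=1), dict(a=2), dict(a=3), dict(a=4)]
rhs = [dict(a=1), dict(a=2), dict(a=3), dict(a=5)]
prices = [dict(a=1, b=3), dict(a=2, b=4)]

# A left join keeps the matched rows plus the unmatched rows of the left table.
left_join = inner(lhs, prices, 'a') + xor(lhs, prices, 'a')
```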

xyz(x, y, z, agg=None)

pivot table functionality.

Parameters

x : str/list of str

unique keys per each row

y : str

unique key per each column

z : str/callable

A column in the table or an evaluated quantity per each row

agg : None/callable or list of callables, optional

Each (x,y) cell can potentially contain multiple z values, so if agg = None, a list is returned. If you want the data aggregated in any way, then supply an aggregating function(s).

Returns

A dictable which is a pivot table of the original data

Example

>>> from pyg import *
>>> timetable_as_list = dictable(x = [1,2,3]) * dictable(y = [1,2,3]) 
>>> timetable = timetable_as_list.xyz('x','y',lambda x, y: x * y)
>>> assert len(timetable) == 3 # one row per unique value of x
Example

>>> self = dictable(x = [1,2,3]) * dictable(y = [1,2,3]) 
>>> x = 'x'; y = 'y'; z = lambda x, y: x * y
>>> self.exc(lambda x, y: x+y==5).xyz(x,y,z, len)
pyg.base._dictable.dictable.__call__(self, **kwargs)

Call self as a function.

perdictable

pyg.base._perdictable.perdictable()

A decorator that makes a function work per dictable and not just on the original values

Example

>>> f = lambda a, b: a+b
>>> p = perdictable(f, on = ['key'])     

The new modified function p works the same as f on plain values.

Parameters

function: callable

A function

on: str/list of str

perform join based on these keys

renames: dict

This tells us which column to grab from which table

defaults: dict

If a default is provided for a parameter, we will perform a left join, substituting missing values with the defaults

if_none: bool, list of keys

If historic data is None while the row has expired, should we force a recalculation? if True, will be done.

output_is_input: bool, list of keys

Some functions want their own output to be presented to them. If set to True, and cached values exist for these columns, these are provided to the function

include_inputs: bool

If True, the inputs are included alongside the outputs in the returned dictable.

col: str

the name of the variable output.

Example

>>> f = lambda a, b: a+b
>>> p = perdictable(f, include_inputs = True)     
>>> assert p(a = 1, b = 2) == 3
>>> assert p(a = dictable(a = [1,2,3]), b = 3) == dictable(a = [1,2,3], b = 3, expiry = None, data = [4,5,6])

# some parameters are constant, some are tables…

>>> assert p(a = 1, b = dictable(key = ['a','b','c'], b = [1,2,3])) == dictable(key  = ['a', 'b', 'c'], data = [2,3,4])  

# multiple tables… some unkeyed

>>> assert p(a = dictable(a = [1,2]), b = dictable(key = ['a','b','c'], b = [1,2,3])) == dictable(key  = ['a','a', 'b', 'b', 'c','c'], data = [2,3,3,4,4,5])

# multiple tables… all keyed

>>> a = dictable(key = ['x', 'y'], data = [1,2])
>>> b = dictable(key = ['y', 'z'], data = [3,4])
>>> assert p(a = a, b = b) == dictable(key  = ['y'], data = [5])
Example

existing data provided using data and expiry

>>> a = dictable(key = ['x', 'y', 'z'], data = [1,2,3])
>>> b = dictable(key = ['x', 'y', 'z'], data = [1,3,4])
>>> data = dictable(key = ['x', 'y'], data = ['we calculated this before', 'we calculated before but hasnt expired'])
>>> expiry = dictable(key = ['x', 'y'], data = [dt(2000,1,1), dt(3000,1,1)])
>>> inputs = dict(a = a, b = b)
>>> res = p(a = a, b = b, data = data, expiry = expiry)
>>> assert res.find_data(key = 'x').data == 'we calculated this before'
>>> assert res.find_data(key = 'y').data == 5  # although calculated before, we recalculate as its expiry is in the future

join

pyg.base._perdictable.join(inputs, on=None, renames=None, defaults=None)

Suppose we have a function which is defined on simple numbers

Example

>>> from pyg import *
>>> profit = lambda amount, price: amount  * price    

The amounts sold are available in one table and prices in another

Example

>>> amounts = dictable(product = ['apple', 'orange', 'pear'], amount = [1,2,3])
>>> prices = dictable(product = ['apple', 'orange', 'pear', 'banana'], price = [4,5,6,8])
>>> join(dict(amount = amounts, price = prices), on = 'product')(profit = profit)    
dictable[3 x 4]
product|amount|price|profit
apple  |1     |4    |4     
orange |2     |5    |10    
pear   |3     |6    |18    
Parameters

inputs : dict

a dict of input parameters, some of them may be dictables.

on : str/list of str

column(s) to join on when some of the inputs are dictables.

renames : dict, optional

Column remapping. If a dataset contains multiple columns, you can say renames = dict(price = ‘price_in_dollar’) to tell the algorithm which column to use. The default is None.

defaults : dict, optional

Normally, an inner join is performed. However, if there is a default value/formula for when e.g. a price is missing, use this. The default is None.

Returns

dictable

a dictable of an inner join.

Example

how column mapping is done

>>> on = 'a'
>>> ## if there is only one column apart from keys, then it is selected:
>>> assert join(dict(x = dictable(a = [1,2], data = [2,3])), on = on) == dictable(a = [1,2], x = [2,3])
>>> assert join(dict(x = dictable(a = [1,2], random_name = [2,3])), on = on) == dictable(a = [1,2], x = [2,3])
>>> ## if there are multiple columns, if variable name is there, we use it:
>>> assert join(dict(x = dictable(a = [1,2], z = [2,3], x = [4,5])), on) == dictable(a = [1,2], x = [4,5])
>>> ## if there are multiple columns, and 'data' is one of the columns, we use it:
>>> assert join(dict(x = dictable(a = [1,2], z = [2,3], data = [4,5])), on) == dictable(a = [1,2], x = [4,5])
Example

how column mapping is done with rename

>>> import pytest
>>> with pytest.raises(KeyError):
>>>     join(dict(x = dictable(a = [1,2], b = [2,3], c = [4,5])), on = 'a') ## pick b or c?
>>> assert join(dict(x = dictable(a = [1,2], b = [2,3], c = [4,5])), on = 'a', renames = dict(x = 'c')) == dictable(a = [1,2,], x = [4, 5])
Example

joins with partial columns in some tables

>>> on = ['a', 'b', 'c']
>>> a = dictable(a = [1,2,3,4], x = [1,2,3,4]) ## only column a here
>>> b = dictable(b = [1,2,3,4], y = [1,2,3,4]) ## only column b here
>>> c = dictable(a = [1,2,3,4], b = [1,2,3,4], c = [1,2,3,4], z = [1,2,3,4])
>>> j = join(dict(x = a, y = b, z = c), on = ['a', 'b', 'c'])    
>>> assert len(j) == 4 and sorted(j.keys()) == ['a', 'b', 'c', 'x', 'y', 'z']
Example

join with defaults

If no defaults are provided, we need all variables to be present. However, if we specify defaults, we left-join on that variable and insert the default value

>>> x = dictable(a = [1,2,4], x = [1,2,4])
>>> y = dictable(a = [1,2,3], x = [5,6,7])
>>> on = 'a'
>>> assert join(dict(x = x, y = y), on = on) == dictable(a = [1,2,], x = [1,2], y = [5,6])
>>> assert join(dict(x = x, y = y), on = 'a', defaults = dict(x = None)) == dictable(a = [1,2,3], x = [1,2,None], y = [5,6,7])
>>> assert join(dict(x = x, y = y), on = 'a', defaults = dict(y = 0)) == dictable(a = [1,2,4], x = [1,2,4], y = [5,6,0])
>>> assert join(dict(x = x, y = y), on = 'a', defaults = dict(x = None, y = 0)) == dictable(a = [1,2,3,4], x = [1,2,None,4], y = [5,6,7,0])
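The effect of defaults on which keys survive can be sketched with plain mappings from join key to value. The function join_on_key is hypothetical: variables without a default must all be present, while variables with a default are filled in when missing.

```python
def join_on_key(inputs, defaults=None):
    """inputs maps variable name -> {key: value}; returns {key: {name: value}}."""
    defaults = defaults or {}
    required = [name for name in inputs if name not in defaults]
    optional = [name for name in inputs if name in defaults]
    if required:
        # Variables without a default must all be present: intersect their keys.
        keys = set.intersection(*(set(inputs[name]) for name in required))
    else:
        # Every variable has a default: any key seen anywhere qualifies.
        keys = set().union(*(set(inputs[name]) for name in optional))
    return {key: {name: inputs[name].get(key, defaults.get(name))
                  for name in inputs}
            for key in sorted(keys)}


x = {1: 1, 2: 2, 4: 4}
y = {1: 5, 2: 6, 3: 7}
```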

named_dict

pyg.base._named_dict.named_dict(name, keys, defaults={}, types={}, casts={}, basedict='pyg.base.dictattr', debug=False)

This forms a base for all classes. It is similar to collections.namedtuple but:

  • supports additional features such as casting/type checking.

  • supports default values

The resulting class is a dict so can be stored in MongoDB, sent to json or be used to construct a pd.Series automatically.

Example

Simple construction

>>> Customer = named_dict('Customer', ['name', 'date', 'balance'])
>>> james = Customer('james', 'today', 10)
>>> assert james['balance'] == 10
>>> assert james.date == 'today'
Example

How named_dict works with json/pandas/other named_dicts

>>> class Customer(named_dict('Customer', ['name', 'date', 'balance'])):
>>>     def add_to_balance(self, value):
>>>         res = self.copy()
>>>         res.balance += value
>>>         return res
>>> james = Customer('james', 'date', 10)    
>>> assert james.add_to_balance(10).balance == 20
>>> import json; import pandas as pd
>>> assert pd.Series(james).date == 'date'
>>> assert dict(james) == {'name': 'james', 'date': 'date', 'balance': 10}
>>> assert json.dumps(james) == '{"name": "james", "date": "date", "balance": 10}'
>>> class VIP(named_dict('VIP', ['name', 'date'])):
>>>     def some_method(self):
>>>         return 'inheritance between classes works as long as members can share'
>>> vip = VIP(james)
>>> assert vip.name == 'james' ## members moved seamlessly
>>> assert vip.some_method() == 'inheritance between classes works as long as members can share' 
Example

Adding defaults

>>> Customer = named_dict('Customer', ['name', 'date', 'balance'], defaults = dict(balance = 0))
>>> james = Customer('james', 'today')
>>> assert james['balance'] == 0
Example

type checking

>>> import datetime
>>> Customer = named_dict('Customer', ['name', 'date', 'balance'], defaults = dict(balance = 0), types = dict(date = 'datetime.datetime'))
>>> james = Customer('james', datetime.datetime.now())
>>> assert james['balance'] == 0
Example

casting

>>> Customer = named_dict('Customer', ['name', 'date', 'balance'], defaults = dict(balance = 0), types = dict(date = 'datetime.datetime'), casts = dict(balance = 'float'))
>>> james = Customer('james', datetime.datetime.now(), balance = '10.3')
>>> assert james['balance'] == 10.3
Parameters

name : str

name of new class.

keys : list

list of keys that the class must have as members.

defaults : dict, optional

default values for the keys. The default is {}.

types : dict, optional

A dict mapping keys to a test, given either as a callable or as a type. The default is {}.

casts : dict, optional

A dict mapping keys to the function used to cast their values. The default is {}.

basedict : str, optional

name of the dict class to inherit from. The default is ‘pyg.base.dictattr’.

debug : bool, optional

output the construction text if set to True. The default is False.

Raises

ValueError

Returns

result : new class that inherits from a dict
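A compact sketch of a class factory in the same spirit, producing a dict subclass with fixed keys, defaults and casts. make_named_dict is a hypothetical name; unlike the examples above, casts are passed as callables here rather than as strings, and type checking is omitted.

```python
import json


def make_named_dict(name, keys, defaults=None, casts=None):
    """Hypothetical class factory: a dict subclass with fixed, ordered keys."""
    defaults = dict(defaults or {})
    casts = dict(casts or {})

    def __init__(self, *args, **kwargs):
        values = dict(defaults)
        values.update(zip(keys, args))      # positional arguments follow key order
        values.update(kwargs)
        missing = [key for key in keys if key not in values]
        if missing:
            raise ValueError('missing keys: %s' % missing)
        for key, cast in casts.items():
            values[key] = cast(values[key])
        dict.__init__(self, {key: values[key] for key in keys})

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    return type(name, (dict,), {'__init__': __init__, '__getattr__': __getattr__})


Customer = make_named_dict('Customer', ['name', 'date', 'balance'],
                           defaults=dict(balance=0), casts=dict(balance=float))
james = Customer('james', 'today', balance='10.3')
as_json = json.dumps(james)
```

Because the result is still a dict, it serialises to JSON without any extra work.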

decorators

wrapper

class pyg.base._decorators.wrapper(function=None, *args, **kwargs)

A base class for all decorators. It is similar to functools.wraps but better. See below why wrapt cannot be used… You basically need to define the wrapped method and everything else is handled for you.

  • You can then use it either directly to decorate functions

  • Or use it to create parameterized decorators

  • the __name__, __wrapped__, __doc__ and the getargspec will all be taken care of.

Example

>>> class and_add(wrapper):
>>>     def wrapped(self, *args, **kwargs):
>>>         return self.function(*args, **kwargs) + self.add ## note that we are assuming self.add exists
>>> @and_add(add = 3) ## create a decorator and decorate the function
>>> def f(a,b):
>>>     return a+b
>>> assert f.add == 3
>>> assert f(1,2) == 6

Alternatively, you can also use it directly:

>>> def f(a,b):
>>>     return a+b
>>> 
>>> assert and_add(f, add = 3)(1,2) == 6
Example

Explicit parameter construction

You can make the init more explicit, also adding defaults for the parameters:

>>> class and_add_version_2(wrapper):
>>>     def __init__(self, function = None, add = 3):
>>>         super(and_add_version_2, self).__init__(function = function, add = add)
>>>     def wrapped(self, *args, **kwargs):
>>>         return self.function(*args, **kwargs) + self.add
>>> @and_add_version_2
>>> def f(a,b):
>>>     return a+b
>>> assert f(1,2) == 6
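The pattern of a decorator base class whose subclasses only define wrapped can be sketched as follows. Wrapper and AndAdd are hypothetical names; this is a simplified illustration built on functools.update_wrapper, not pyg's wrapper class.

```python
import functools


class Wrapper:
    """Hypothetical base class: subclasses only define `wrapped`."""

    def __init__(self, function=None, **params):
        self.function = function
        self.params = params
        self.__dict__.update(params)          # expose parameters as attributes
        if function is not None:
            functools.update_wrapper(self, function)

    def __call__(self, *args, **kwargs):
        if self.function is None:
            # Used as @Decorator(param=...): the first call receives the function.
            (function,) = args
            return type(self)(function, **self.params)
        return self.wrapped(*args, **kwargs)


class AndAdd(Wrapper):
    def wrapped(self, *args, **kwargs):
        return self.function(*args, **kwargs) + self.add


@AndAdd(add=3)
def f(a, b):
    """Add two numbers."""
    return a + b


def g(a, b):
    return a * b


direct = AndAdd(g, add=1)
```

Because the parameters live on the instance, they remain inspectable after decoration.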
Example

No recursion

The decorator is designed to have a single instance of a specific wrapper

>>> f = lambda a, b: a+b
>>> assert and_add(and_add(f)) == and_add(f)

This holds even for multiple levels of wrapping:

>>> x = try_none(and_add(f))
>>> y = try_none(and_add(x))
>>> assert x == y        
>>> assert x(1, 'no can add') is None        
Example

wrapper vs wrapt

wrapt (wrapt.readthedocs.io) is an awesome wrapping tool. If you have static library functions, none is better. The problem we face is that wrapt is too good in pretending the wrapped up object is the same as original function:

>>> import wrapt    
>>> def add_value(value):
>>>     @wrapt.decorator
>>>     def wrapper(wrapped, instance, args, kwargs):
>>>         return wrapped(*args, **kwargs) + value
>>>     return wrapper
>>> def f(x,y):
>>>     return x*y
>>> add_three = add_value(value = 3)(f)
>>> add_four = add_value(value = 4)(f)
>>> assert add_four(3,4) == 16 and add_three(3,4) == 15
>>> ## but here is the problem:
>>> assert encode(add_three) == encode(add_four) == encode(f)

So if we ever encode the function and send it across json/Mongo, the wrapping is lost and the user when she receives it cannot use it

>>> class add_value(wrapper):
>>>     def wrapped(self, *args, **kwargs):
>>>         return self.function(*args, **kwargs) + self.value
>>> add_three = add_value(value = 3)(f)
>>> add_four = add_value(value = 4)(f)
>>> encode(add_three)
{'value': 3, 'function': '{"py/function": "__main__.f"}', '_obj': '{"py/type": "__main__.add_value"}'}
>>> encode(add_four)
{'value': 4, 'function': '{"py/function": "__main__.f"}', '_obj': '{"py/type": "__main__.add_value"}'}

timer

class pyg.base._decorators.timer(function, n=1, time=False)

timer is similar to timeit but rather than execution of a Python statement, timer wraps a function to make it log its evaluation time before returning output

Parameters

function: callable

The function to be wrapped

n: int, optional

Number of times the function is to be evaluated. Default is 1

time: bool, optional

If set to True, function will return the TIME it took to evaluate rather than the original function output.

Example

>>> from pyg import *; import datetime
>>> f = lambda a, b: a+b
>>> evaluate_100 = timer(f, n = 100, time = True)(1,2)
>>> evaluate_10000 = timer(f, n = 10000, time = True)(1,2)
>>> assert evaluate_10000> evaluate_100
>>> assert isinstance(evaluate_100, datetime.timedelta)
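
As a rough illustration of the behaviour described above (not pyg's implementation), a timing wrapper can be sketched with the standard library alone. The name simple_timer is hypothetical, and n is assumed to be at least 1.

```python
import datetime
import functools
import logging


def simple_timer(function, n=1, time=False):
    """Hypothetical sketch: evaluate function n times and log the elapsed time."""

    @functools.wraps(function)
    def timed(*args, **kwargs):
        start = datetime.datetime.now()
        for _ in range(n):
            result = function(*args, **kwargs)
        elapsed = datetime.datetime.now() - start
        logging.getLogger(__name__).info("%s x %s took %s", function.__name__, n, elapsed)
        # Return the elapsed timedelta instead of the output when time=True.
        return elapsed if time else result

    return timed
```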

try_value

pyg.base._decorators.try_value()

wraps a function to try an evaluation. If an exception is thrown, returns a preset value instead

Parameters

function: callable

The function we want to decorate

value:

If the function fails, it will return value instead. Default is None

verbose: bool

If set to True, the logger will warn with the error message.

There are various convenience functions with specific values: try_zero, try_false, try_true, try_nan and try_none each return their specific value if the function fails.

Example

>>> from pyg import *
>>> f = lambda a: a[0]
>>> assert try_none(f)(4) is None
>>> assert try_none(f, 'failed')(4) == 'failed'
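
The pattern itself is small. A minimal sketch, assuming nothing about pyg's internals (try_with_default and first are hypothetical names):

```python
import functools
import logging


def try_with_default(function, value=None, verbose=False):
    """Hypothetical sketch: call function, fall back to value on any exception."""

    @functools.wraps(function)
    def guarded(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except Exception as error:
            if verbose:
                logging.getLogger(__name__).warning("%s failed: %s", function.__name__, error)
            return value

    return guarded


first = lambda a: a[0]
```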

try_back

pyg.base._decorators.try_back()

wraps a function to try an evaluation. If an exception is thrown, returns the first argument

Example

>>> f = lambda a: a[0]
>>> assert try_back(f)('hello') == 'h' and try_back(f)(5) == 5

loops

class pyg.base._loop.loops(function=None, types=None)

converts a function to loop over the arguments, depending on the type of the first argument

Examples

>>> @loop(dict, list, pd.DataFrame, pd.Series)
>>> def f(a,b):
>>>     return a+b
>>> assert f(1,2) == 3
>>> assert f([1,2,3],2) == [3,4,5]
>>> assert f([1,2,3], [4,5,6]) == [5,7,9]    
>>> assert f(dict(x=1,y=2), 3) == dict(x = 4, y = 5)
>>> assert f(dict(x=1,y=2), dict(x = 3, y = 4)) == dict(x = 4, y = 6)
>>> a = pd.Series(dict(x=1,y=2))
>>> b = dict(x=3,y=4)
>>> assert np.all(f(a,b) == pd.Series(dict(x=4,y=6)))    
>>> a = pd.DataFrame(dict(x=[1,1],y=[2,2])); a.index = [5,10]
>>> b = dict(x=3,y=4)
>>> res =  f(a,b)
>>> assert np.all(res == pd.DataFrame(dict(x=[4,4],y=[6,6]), index = [5,10]))    
>>> a = pd.DataFrame(dict(x=[1,1],y=[2,2])); a.index = [5,10]
>>> res =  f(a,[3,4])
>>> assert np.all( res == pd.DataFrame(dict(x=[4,4],y=[6,6]), index = [5,10]))    
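
For the list and dict cases, the looping logic can be sketched in plain Python. This is an illustrative reduction, not pyg's loops: loop_over and _item are hypothetical names and pandas/numpy inputs are not handled.

```python
def _item(arg, key, container_type):
    """Pick arg[key] when arg is the same kind of container, else broadcast arg."""
    return arg[key] if isinstance(arg, container_type) else arg


def loop_over(*types):
    """Hypothetical sketch: loop a function over its first argument if it is one of types."""

    def decorator(function):
        def looped(first, *args, **kwargs):
            if not isinstance(first, types):
                return function(first, *args, **kwargs)
            if isinstance(first, dict):
                # Loop over keys; other dict arguments are matched by key.
                return {key: looped(value, *(_item(a, key, dict) for a in args), **kwargs)
                        for key, value in first.items()}
            if isinstance(first, (list, tuple)):
                # Loop over positions; other list arguments are matched by index.
                return type(first)(
                    looped(value, *(_item(a, i, (list, tuple)) for a in args), **kwargs)
                    for i, value in enumerate(first))
            raise TypeError(f"cannot loop over {type(first)}")

        return looped

    return decorator


@loop_over(dict, list)
def add(a, b):
    return a + b
```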

loop

pyg.base._loop.loop(*types)

returns an instance of loops(types = types)

loop_all is an instance of loops that loops over dict, list, tuple, np.ndarray and pandas.DataFrame/Series

kwargs_support

pyg.base._decorators.kwargs_support()

Extends a function to support **kwargs inputs

Example

>>> from pyg import *
>>> @kwargs_support
>>> def f(a,b):
>>>     return a+b
>>> assert f(1,2, what_is_this = 3, not_used = 4, ignore_this_too = 5) == 3
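
One way such a decorator could work is to inspect the signature and drop keywords the function does not declare. The sketch below is an assumption about the approach, not pyg's code; ignore_extra_kwargs is a hypothetical name.

```python
import functools
import inspect


def ignore_extra_kwargs(function):
    """Hypothetical sketch: silently drop keyword arguments the function does not declare."""
    parameters = inspect.signature(function).parameters
    accepts_any = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values())

    @functools.wraps(function)
    def tolerant(*args, **kwargs):
        if not accepts_any:
            kwargs = {key: value for key, value in kwargs.items() if key in parameters}
        return function(*args, **kwargs)

    return tolerant


@ignore_extra_kwargs
def add(a, b):
    return a + b
```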

graphs & cells

cell

class pyg.base._cell.cell(function=None, output=None, **kwargs)

cell is a Dict that can be thought of as a node in a calculation graph. The nearest parallel is actually an Excel cell:

  • cell contains both its function and its output. cell.output defines the keys where the output is supposed to be

  • cell contains reference to all the function outputs

  • cell contains its locations and the means to manage its own persistency

Parameters

  • function is the function to be called

  • ** kwargs are the function named key value args. NOTE: NO SUPPORT for *args nor **kwargs in function

  • output: where should the function output go?

Example

simple construction

>>> from pyg import *
>>> c = cell(lambda a, b: a+b, a = 1, b = 2)
>>> assert c.a == 1
>>> c = c.go()
>>> assert c.output == ['data'] and c.data == 3
Example

make output go to ‘value’ key

>>> c = cell(lambda a, b: a+b, a = 1, b = 2, output = 'value')
>>> assert c.go().value == 3
Example

multiple outputs by function

>>> f = lambda a, b: dict(sum = a+b, prod = a*b)
>>> c = cell(f, a = 1, b = 2, output  = ['sum', 'prod'])
>>> c = c.go()
>>> assert c.sum == 3 and c.prod == 2
Methods

  • cell.run() returns a bool: True if the cell needs to be run

  • cell.go() calculates the cell and returns the cell with cell.output keys now populated.

  • cell.load()/cell.save() interface for self load/save persistence

copy() → a shallow copy of D
go(go=1, mode=0, **kwargs)

calculates the cell (if needed). By default, will then run cell.save() to save the cell. If you don’t want to save the output (perhaps you want to check it first), use cell._go()

Parameters

go: int, optional

a parameter that forces calculation. The default is 0 when the cell is called as cell() and 1 for cell.go() (see the note below).

  • go = 0: calculate cell only if cell.run() is True

  • go = 1: calculate THIS cell regardless. Calculate the parents only if their cell.run() is True

  • go = 2: calculate THIS cell and its PARENT cells regardless. Calculate grandparents only if their cell.run() is True, etc.

  • go = -1: calculate the entire tree again.

**kwargs: parameters

You can actually allocate the variables to the function at runtime

Note that cell.go() defaults to go = 1 and forces a calculation of the cell, while cell() is lazy and defaults to go = 0

Returns

cell

the cell, calculated

Example

different values of go

>>> from pyg import *
>>> f = lambda x=None,y=None: max([dt(x), dt(y)])
>>> a = cell(f)()
>>> b = cell(f, x = a)()
>>> c = cell(f, x = b)()
>>> d = cell(f, x = c)()
>>> e = d.go()
>>> e0 = d.go(0)
>>> e1 = d.go(1)
>>> e2 = d.go(2)
>>> e_1 = d.go(-1)
>>> assert not d.run() and e.data == d.data 
>>> assert e0.data == d.data 
>>> assert e1.data > d.data and e1.x.data == d.x.data
>>> assert e2.data > d.data and e2.x.data > d.x.data and e2.x.x.data == d.x.x.data
>>> assert e_1.data > d.data and e_1.x.data > d.x.data and e_1.x.x.data > d.x.x.data
Example

adding parameters on the run

>>> c = cell(lambda a, b: a+b)
>>> d = c(a = 1, b =2)
>>> assert d.data == 3
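
The meaning of the go levels can be illustrated with a toy graph. The sketch below is a simplified model under stated assumptions, not pyg's cell: Node is a hypothetical class, it mutates itself in place, and it counts its evaluations in runs so the effect of each level is visible.

```python
class Node:
    """Hypothetical toy cell: a function, its inputs (values or Nodes) and a cached output."""

    def __init__(self, function, **inputs):
        self.function = function
        self.inputs = inputs
        self.data = None
        self.calculated = False
        self.runs = 0  # number of times this node was evaluated

    def run(self):
        # The node needs running only if it has never been calculated.
        return not self.calculated

    def go(self, go=0):
        if go == 0 and not self.run():
            return self
        # Positive levels shrink by one per generation; 0 and -1 are passed on unchanged.
        parent_go = go - 1 if go > 0 else go
        values = {key: value.go(parent_go).data if isinstance(value, Node) else value
                  for key, value in self.inputs.items()}
        self.data = self.function(**values)
        self.calculated = True
        self.runs += 1
        return self


a = Node(lambda: 1)
b = Node(lambda x: x + 1, x=a)
c = Node(lambda x: x + 1, x=b)
c.go()  # first evaluation: every node runs once
```

After this first evaluation, c.go(0) does nothing, c.go(1) re-runs only c, c.go(2) re-runs c and b, and c.go(-1) re-runs all three.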
load(mode=0)

Loads the cell from the database based on the primary keys of the cell. Not implemented for simple cell; see db_cell

Returns

cell

self, updated with values from database.

run()

checks if the cell needs calculation. This depends on the nature of the cell. By default (for cell and db_cell), if the cell is already calculated so that cell._output exists, returns False; otherwise returns True.

Returns

bool

run cell?

Example

>>> c = cell(lambda x: x+1, x = 1)
>>> assert c.run()
>>> c = c()
>>> assert c.data == 2 and not c.run()
save()

Saves the cell for persistency. Not implemented for simple cell. see db_cell

Returns

cell

self, saved.

cell_go

pyg.base._cell.cell_go(value, go=0, mode=0)

cell_go makes a cell run (using cell.go(go)) and returns the calculated cell. If value is not a cell, value is returned.

Parameters

value: cell

The cell (or anything else).

go: int

same inputs as per cell.go(go):

  • 0: run if cell.run() is True

  • 1: run this cell regardless, run parent cells only if they need to calculate too

  • n: run this cell & its nth parents regardless.

Returns

The calculated cell

Example

calling non-cells

>>> assert cell_go(1) == 1
>>> assert cell_go(dict(a=1,b=2)) == dict(a=1,b=2)
Example

calling cells

>>> c = cell(lambda a, b: a+b, a = 1, b = 2)
>>> assert cell_go(c) == c(data = 3)

cell_item

pyg.base._cell.cell_item(value, key=None)

returns an item from a cell (if not cell, returns back the value). If no key is provided, will return the output of the cell

Parameters

value: cell or object or list of cells/objects

cell

key: str, optional

The key within cell we are interested in. Note that key is treated as GUIDANCE only. Our strong preference is to return valid output from cell_output(cell)

Example

non cells

>>> assert cell_item(1) == 1
>>> assert cell_item(dict(a=1,b=2)) == dict(a=1,b=2)
Example

cells, simple

>>> c = cell(lambda a, b: a+b, a = 1, b = 2)
>>> assert cell_item(c) is None
>>> assert cell_item(c.go()) == 3

cell_func

pyg.base._cell.cell_func()

cell_func is a wrapper that wraps a function to act on cells rather than just on values

When called, it returns not just the function's result, but also the args and kwargs used to call it.

Example

>>> from pyg import *
>>> a = cell(lambda x: x**2, x  = 3)
>>> b = cell(lambda y: y**3, y  = 2)
>>> function = lambda a, b: a+b
>>> self = cell_func(function)
>>> result, args, kwargs = self(a,b)
>>> assert result == 8 + 9
>>> assert args[0].data == 3 ** 2
>>> assert args[1].data == 2 ** 3

cell_clear

pyg.base._cell.cell_clear(value)

cell_clear clears a cell of its output so that it contains only the essential stuff to do its calculations. This will be used when we save the cell or we want to recalculate it.

Example

>>> from pyg import *    
>>> a = cell(add_, a = 1, b = 2)
>>> b = cell(add_, a = 2, b = 3)
>>> c = cell(add_, a = a, b = b)()
>>> assert c.data == 8    
>>> assert c.a.data == 3
>>> bare = cell_clear(c)
>>> assert 'data' not in bare and 'data' not in bare.a
>>> assert bare() == c
Parameters

value: obj

cell (or list/dict of) to be cleared of output

encode and decode/save and load

encode

pyg.base._encode.encode(value)

encode/decode are performed prior to sending to mongodb or after retrieval from db. The idea is to make object embedding in Mongo transparent to the user.

  • We use jsonpickle package to embed general objects. These are encoded as strings and can be decoded as long as the original library exists when decoding.

  • pandas.DataFrame are encoded to bytes using pickle while numpy arrays are encoded using the faster array.tobytes() with arrays’ shape & type exposed and searchable.

Example

>>> from pyg import *; import numpy as np
>>> value = Dict(a=1,b=2)
>>> assert encode(value) == {'a': 1, 'b': 2, '_obj': '{"py/type": "pyg.base._dict.Dict"}'}
>>> assert decode({'a': 1, 'b': 2, '_obj': '{"py/type": "pyg.base._dict.Dict"}'}) == Dict(a = 1, b=2)
>>> value = dictable(a=[1,2,3], b = 4)
>>> assert encode(value) == {'a': [1, 2, 3], 'b': [4, 4, 4], '_obj': '{"py/type": "pyg.base._dictable.dictable"}'}
>>> assert decode(encode(value)) == value
>>> assert encode(np.array([1,2])) ==  {'data': bytes,
>>>                                     'shape': (2,),
>>>                                     'dtype': '{"py/reduce": [{"py/type": "numpy.dtype"}, {"py/tuple": ["i4", false, true]}, {"py/tuple": [3, "<", null, null, null, -1, -1, 0]}]}',
>>>                                     '_obj': '{"py/function": "pyg.base._encode.bson2np"}'}
Example

functions and objects

>>> from pyg import *; import numpy as np
>>> assert encode(ewma) == '{"py/function": "pyg.timeseries._ewm.ewma"}'
>>> assert encode(Calendar) == '{"py/type": "pyg.base._drange.Calendar"}'
Parameters

value: obj

An object to be encoded

Returns

A pre-json object
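
The '{"py/function": "..."}' strings in the examples above store a function by its dotted name. A stdlib-only sketch of that idea follows. It mimics the format for module-level functions only and is not jsonpickle or pyg's encode; encode_function and decode_function are hypothetical names.

```python
import importlib
import json


def encode_function(function):
    """Hypothetical sketch: store a module-level function as its dotted name."""
    return json.dumps({"py/function": f"{function.__module__}.{function.__qualname__}"})


def decode_function(text):
    """Re-import the module and look the function up by name."""
    # Lambdas and nested functions cannot be found this way.
    module_name, _, name = json.loads(text)["py/function"].rpartition(".")
    return getattr(importlib.import_module(module_name), name)


encoded = encode_function(json.dumps)
```

This also shows why decoding only works "as long as the original library exists": the name is resolved by importing the module again.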

decode

pyg.base._encode.decode(value, date=None)

decodes a string or an object dict

Parameters

value: str or dict

usually a json

date: None, bool or a regex expression, optional

date format to be decoded

Returns

obj

the json decoded.

Examples

>>> from pyg import *
>>> class temp(dict):
>>>    pass
>>> orig = temp(a = 1, b = dt(0))
>>> encoded = encode(orig)
>>> assert eq(decode(encoded), orig) # type matching too...

pd_to_parquet

pyg.base._parquet.pd_to_parquet(value, path, compression='GZIP')

a small utility to save df to parquet, extending support to both pd.Series and non-string column names

Example

>>> from pyg import *
>>> import pandas as pd
>>> import pytest
>>> df = pd.DataFrame([[1,2],[3,4]], drange(-1), columns = [0, dt(0)])
>>> s = pd.Series([1,2,3], drange(-2))
>>> with pytest.raises(ValueError): ## must have string column names
        df.to_parquet('c:/temp/test.parquet')
>>> with pytest.raises(AttributeError): ## pd.Series has no to_parquet
        s.to_parquet('c:/temp/test.parquet')
>>> df_path = pd_to_parquet(df, 'c:/temp/df.parquet')
>>> series_path = pd_to_parquet(s, 'c:/temp/series.parquet')
>>> df2 = pd_read_parquet(df_path)
>>> s2 = pd_read_parquet(series_path)
>>> assert eq(df, df2)
>>> assert eq(s, s2)

pd_read_parquet

pyg.base._parquet.pd_read_parquet(path)

a small utility to read df/series from parquet, extending support to both pd.Series and non-string column names

Example

>>> from pyg import *
>>> import pandas as pd
>>> import pytest
>>> df = pd.DataFrame([[1,2],[3,4]], drange(-1), columns = [0, dt(0)])
>>> s = pd.Series([1,2,3], drange(-2))
>>> with pytest.raises(ValueError): ## must have string column names
        df.to_parquet('c:/temp/test.parquet')
>>> with pytest.raises(AttributeError): ## pd.Series has no to_parquet
        s.to_parquet('c:/temp/test.parquet')
>>> df_path = pd_to_parquet(df, 'c:/temp/df.parquet')
>>> series_path = pd_to_parquet(s, 'c:/temp/series.parquet')
>>> df2 = pd_read_parquet(df_path)
>>> s2 = pd_read_parquet(series_path)
>>> assert eq(df, df2)
>>> assert eq(s, s2)

parquet_encode

pyg.mongo._encoders.parquet_encode(value, path, compression='GZIP')

encodes a single DataFrame or a document containing dataframes into an object that can be decoded

>>> from pyg import *     
>>> path = 'c:/temp'
>>> value = dict(key = 'a', n = np.random.normal(0,1, 10), data = dictable(a = [pd.Series([1,2,3]), pd.Series([4,5,6])], b = [1,2]), other = dict(df = pd.DataFrame(dict(a=[1,2,3], b= [4,5,6]))))
>>> encoded = parquet_encode(value, path)
>>> assert encoded['n']['file'] == 'c:/temp/n.npy'
>>> assert encoded['data'].a[0]['path'] == 'c:/temp/data/a/0.parquet'
>>> assert encoded['other']['df']['path'] == 'c:/temp/other/df.parquet'
>>> decoded = decode(encoded)
>>> assert eq(decoded, value)

csv_encode

pyg.mongo._encoders.csv_encode(value, path)

encodes a single DataFrame or a document containing dataframes into an object that can be decoded, while saving dataframes into csv

>>> path = 'c:/temp'
>>> value = dict(key = 'a', data = dictable(a = [pd.Series([1,2,3]), pd.Series([4,5,6])], b = [1,2]), other = dict(df = pd.DataFrame(dict(a=[1,2,3], b= [4,5,6]))))
>>> encoded = csv_encode(value, path)
>>> assert encoded['data'].a[0]['path'] == 'c:/temp/data/a/0.csv'
>>> assert encoded['other']['df']['path'] == 'c:/temp/other/df.csv'
>>> decoded = decode(encoded)
>>> assert eq(decoded, value)

convertors to bytes

pyg.base._encode.pd2bson(value)

converts a value (usually a pandas.DataFrame/Series) to bytes using pickle

pyg.base._encode.np2bson(value)

converts a numpy array to bytes using value.tobytes(). This is much faster than pickle but does not save shape/type info which we save separately.

pyg.base._encode.bson2np(data, dtype, shape)

converts a byte with dtype and shape information into a numpy array.

pyg.base._encode.bson2pd(data)

converts a pickled object back to an object. We insist that new object has .shape to ensure we did not unpickle gibberish.

dates and calendar

dt

pyg.base._dates.dt(*args, dialect='uk', none=<built-in method now of type object>)

A more generic constructor for datetime.datetime.

Example

Simple construction

>>> assert dt(2000,1 ,1) == datetime.datetime(2000, 1, 1, 0, 0) # month as a number
>>> assert dt(2000,'jan',1) == datetime.datetime(2000, 1, 1, 0, 0) # name of month
>>> assert dt(2000,'f',1) == datetime.datetime(2000, 1, 1, 0, 0) # future month code
>>> assert dt('01-02-2002') == datetime.datetime(2002, 2, 1)
>>> assert dt('01-02-2002', dialect = 'US') == datetime.datetime(2002, 1, 2)
>>> assert dt('01 March 2002') == datetime.datetime(2002, 3, 1)
>>> assert dt('01 March 2002', dialect = 'US') == datetime.datetime(2002, 3, 1)
>>> assert dt('01 March 2002 10:20:30') == datetime.datetime(2002, 3, 1, 10, 20, 30)
>>> assert dt(20020301) == datetime.datetime(2002, 3, 1)
>>> assert dt(37316) == datetime.datetime(2002, 3, 1) # excel date
>>> assert dt(730180) == datetime.datetime(2000,3,1) # ordinal for 1/3/2000
>>> assert dt(2000,3,1).timestamp() == 951868800.0
>>> assert dt(951868800.0) == datetime.datetime(2000,3,1) # utc timestamp
>>> assert dt(np.datetime64(dt(2000,3,1))) == dt(2000,3,1) ## numpy.datetime64 object
>>> assert dt(2000) == datetime.datetime(2000,1,1)
>>> assert dt(2000,3) == datetime.datetime(2000,3,1)
>>> assert dt(2000,3, 1) == datetime.datetime(2000,3,1)
>>> assert dt(2000,3, 1, 10,20,30) == datetime.datetime(2000,3,1,10,20,30)
>>> assert dt(2000,'march', 1) == datetime.datetime(2000,3,1)
>>> assert dt(2000,'h', 1) == datetime.datetime(2000,3,1) # future codes
Example

date as offset from today

>>> today = dt(0); 
>>> import datetime
>>> day = datetime.timedelta(1)
>>> assert dt(-3) == today - 3 * day
>>> assert dt('-10b') == today - 14 * day
Example

datetime arithmetic:

dt has an interesting logic in implementing datetime arithmetic:

  • day and month parameters can be negative or bigger than the days of month

  • dt() will roll back/forward from the date which is valid

>>> assert dt(2000,4,1) == datetime.datetime(2000, 4, 1, 0, 0)
>>> assert dt(2000,4,0) == datetime.datetime(2000, 3, 31, 0, 0) # a day before dt(2000,4,1)

and rolling back months:

>>> assert dt(2000,0,1) == datetime.datetime(1999, 12, 1, 0, 0) # a month before dt(2000,1,1)
>>> assert dt(2000,13,1) == datetime.datetime(2001, 1, 1, 0, 0) # a month after dt(2000,12,1)

This may feel unnatural at first, but does allow for much nicer code, e.g.: [dt(2000,i,1) for i in range(-10,10)]
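
The rolling rule for out-of-range months and days can be reproduced with the standard library. The sketch below only covers the (year, month, day) case and is not pyg's dt; rolled_date is a hypothetical name.

```python
import datetime


def rolled_date(year, month=1, day=1):
    """Hypothetical sketch: allow month and day outside their usual ranges."""
    # Months are 1-based; shift to 0-based so divmod can carry into the year.
    carry, month0 = divmod(month - 1, 12)
    first_of_month = datetime.datetime(year + carry, month0 + 1, 1)
    # day = 1 is the first of the month; 0 and negatives roll back, large values roll forward.
    return first_of_month + datetime.timedelta(days=day - 1)


months = [rolled_date(2000, i, 1) for i in range(-10, 10)]
```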

Parameters

*args: str, int or dates

argument to be converted into dates

dialect: str, optional

how to parse 01/02/2020: is it 1st Feb or 2nd Jan? The default is ‘uk’, i.e. dd/mm/yyyy

none: callable, optional

What dt() returns when called with no arguments. The default is datetime.datetime.now()

ymd

pyg.base._dates.ymd(*args, dialect='uk', none=<built-in method now of type object>)

just like dt() but always returns the date only (year/month/day), without fractions of a day. See dt() for full documentation

Returns

datetime.datetime

dt_bump

pyg.base._dates.dt_bump(t, *bumps)
Example

>>> from pyg import *
>>> t  = pd.Series([1,2,3], drange(dt(2000,1,1),2))
>>> assert eq(dt_bump(t, 1), pd.Series([1,2,3], drange(dt(2000,1,2),2)))

drange

pyg.base._drange.drange(t0=None, t1=None, bump=None)

A quick and happy wrapper for dateutil.rrule

Examples

>>> drange(2000, 10, 1) # 10 days starting from dt(2000,1,1)
>>> drange(2000, '10b', '1b') # weekdays between dt(2000,1,1) and dt(2000,1,17)
>>> drange('-10b', 0, '1b') # business days since 10 bdays ago
>>> drange('-10b', '10b', '1w') # starting 10b days ago, to 10b from now, counting in weekly jumps
Parameters

t0: date, optional

start date. The default is None.

t1: date, optional

end date. The default is None.

bump: timedelta, int, string, optional

bump period. The default is None.

Returns

list of dates

Example

>>> t0 = 2000; t1 = 1999
>>> bump = '-1b'
Example

>>> t0 = dt(2020); t1 = dt(2021); bump = datetime.timedelta(hours = 4)

date_range

pyg.base._drange.date_range(t0=None, t1=None)

Calendar

class pyg.base._drange.Calendar(key=None, holidays=None, weekend=None, t0=None, t1=None, adj='m', day_start=0, day_end=235959)
Calendar is
  • a dict

  • containing holiday dates

  • implementing business day arithmetic

Calendar is restricted to operate between cal.t0 and cal.t1 which default to TMIN = 1900 and TMAX = 2300

Calendar does this by having two key members:
  • dt2int: a mapping from all business dates to their integer ‘clock’

  • int2dt: a mapping from integer value to the date

Since a Calendar is ‘expensive’ memory-wise, we assign a key to the calendar and the Calendar is stored in the singleton calendars under this key

Example

>>> from pyg import *
>>> holidays = dictable([[1,'2012-01-02','New Year Day',],
                        [2,'2012-01-16','Martin Luther King Jr. Day',],
                        [3,'2012-02-20','Presidents Day (Washingtons Birthday)',],
                        [4,'2012-05-28','Memorial Day',],
                        [5,'2012-07-04','Independence Day',],
                        [6,'2012-09-03','Labor Day',],
                        [7,'2012-10-08','Columbus Day',],
                        [8,'2012-11-12','Veterans Day',],
                        [9,'2012-11-22','Thanksgiving Day',],
                        [10,'2012-12-25','Christmas Day',],
                        [11,'2013-01-01','New Year Day',],
                        [12,'2013-01-21','Martin Luther King Jr. Day',],
                        [13,'2013-02-18','Presidents Day (Washingtons Birthday)',],
                        [14,'2013-05-27','Memorial Day',],
                        [15,'2013-07-04','Independence Day',],
                        [16,'2013-09-02','Labor Day',],
                        [17,'2013-10-14','Columbus Day',],
                        [18,'2013-11-11','Veterans Day',],
                        [19,'2013-11-28','Thanksgiving Day',],
                        [20,'2013-12-25','Christmas Day',],
                        [21,'2014-01-01','New Year Day',],
                        [22,'2014-01-20','Martin Luther King Jr. Day',],
                        [23,'2014-02-17','Presidents Day (Washingtons Birthday)',],
                        [24,'2014-05-26','Memorial Day',],
                        [25,'2014-07-04','Independence Day',],
                        [26,'2014-09-01','Labor Day',],
                        [27,'2014-10-13','Columbus Day',],
                        [28,'2014-11-11','Veterans Day',],
                        [29,'2014-11-27','Thanksgiving Day',],], ['i', 'date', 'name']).do(dt, 'date')
>>> cal = calendar('US', holidays.date, t0 = 2012, t1 = 2015)
>>> assert not cal.is_bday(dt(2013,9,2))        # Labor day
>>> cached_calendar = calendar('US')
>>> assert not cached_calendar.is_bday(dt(2013,9,2))   # Labor day
>>> assert cal.adjust(dt(2013,9,2)) == dt(2013,9,3)
>>> assert cal.drange(dt(2013,9,0), dt(2013,9,7), '1b') == [dt(2013,8,30), dt(2013,9,3), dt(2013,9,4), dt(2013,9,5), dt(2013,9,6),] ## skipped labour day and weekend prior
>>> assert cal.bdays(dt(2013,9,0), dt(2013,9,7)) == 5
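
The dt2int/int2dt pair described above can be pictured with a small stdlib sketch: number the business days consecutively, so business-day arithmetic becomes integer arithmetic. This illustrates the idea only and is not pyg's Calendar; build_clock is a hypothetical name and weekends are assumed to be Saturday and Sunday.

```python
import datetime


def build_clock(t0, t1, holidays=(), weekend=(5, 6)):
    """Hypothetical sketch: map each business day in [t0, t1] to a running integer."""
    holidays = set(holidays)
    dt2int, int2dt = {}, []
    day = t0
    while day <= t1:
        if day.weekday() not in weekend and day not in holidays:
            dt2int[day] = len(int2dt)
            int2dt.append(day)
        day += datetime.timedelta(days=1)
    return dt2int, int2dt


labor_day = datetime.date(2013, 9, 2)
dt2int, int2dt = build_clock(datetime.date(2013, 8, 26), datetime.date(2013, 9, 13),
                             holidays=[labor_day])
# One business day after Friday 30 Aug 2013, skipping the weekend and Labor Day.
next_bday = int2dt[dt2int[datetime.date(2013, 8, 30)] + 1]
```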
adjust(date, adj=None)

adjust a non-business day to the previous/following business date

Parameters

date: datetime

adj: None or p/f/m

adjustment convention: ‘prev/following/modified following’

Returns

datetime

nearby business day
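
The three conventions can be sketched as follows. This uses the usual market definition of modified following (roll forward unless that crosses into the next month, in which case roll back); whether pyg applies exactly this rule is an assumption. adjust_date and is_weekday are hypothetical names.

```python
import datetime


def adjust_date(date, is_bday, adj="m"):
    """Hypothetical sketch of 'p' (previous), 'f' (following), 'm' (modified following)."""
    one_day = datetime.timedelta(days=1)

    def roll(start, step):
        while not is_bday(start):
            start += step
        return start

    if is_bday(date):
        return date
    if adj == "p":
        return roll(date, -one_day)
    following = roll(date, one_day)
    # Modified following: do not cross into the next month, roll back instead.
    if adj == "m" and following.month != date.month:
        return roll(date, -one_day)
    return following


is_weekday = lambda d: d.weekday() < 5  # no holidays in this toy example
```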

dt_bump(t, bump, adj=None)

adds a bump to a date

Parameters

t: datetime

date to bump.

bump: int, str

bump e.g. ‘-1y’ or ‘1b’ or 3

adj: adjustment type

The default is None.

Returns

datetime

bumped date.

is_trading(date=None)

calculates if we are within a trading session

Parameters

date: datetime, optional

the time & date we want to check. The default is None (i.e. now)

Returns

bool:

are we within a trading session

trade_date(date=None, adj=None)

This is very similar to adjust, but it also takes into account the time of day. If day_start = 0 and day_end = 23:59:59 then this is exactly adjust.

Parameters

date: datetime, optional

date (with time). The default is None.

adj: f/p, optional

If date isn’t within trading day, which direction to adjust to? The default is None.

Example

>>> from pyg import *; import datetime
>>> uk = calendar('UK', day_start = 8, day_end = 17)
>>> assert uk.trade_date(dt(2021,2,9,5), 'f') == dt(2021, 2, 9)  # Tuesday morning rolls into Tuesday
>>> assert uk.trade_date(dt(2021,2,9,5), 'p') == dt(2021, 2, 8)  # Tuesday morning back into Monday
>>> assert uk.trade_date(dt(2021,2,7,5), 'f') == dt(2021, 2, 8)  # Sunday rolls into Monday
>>> assert uk.trade_date(dt(2021,2,7,5), 'p') == dt(2021, 2, 5)  # Sunday rolls back to Friday
>>> assert uk.trade_date(date = dt(2021,2,9,23), adj = 'f') == dt(2021, 2, 10)  # Tuesday eve rolls into Wed
>>> assert uk.trade_date(date = dt(2021,2,9,23), adj = 'p') == dt(2021, 2, 9)  # Tuesday eve back into Tuesday
>>> assert uk.trade_date(date = dt(2021,2,7,23), adj = 'f') == dt(2021, 2, 8)  # Sunday rolls into Monday
>>> assert uk.trade_date(date = dt(2021,2,7,23), adj = 'p') == dt(2021, 2, 5)  # Sunday rolls back to Friday
>>> assert uk.trade_date(date = dt(2021,2,9,12), adj = 'f') == dt(2021, 2, 9)  # Tuesday is Tuesday
>>> assert uk.trade_date(date = dt(2021,2,9,12), adj = 'p') == dt(2021, 2, 9)  # Tuesday is Tuesday
>>> au = calendar('AU', day_start = 2230, day_end = 1300)
>>> assert au.trade_date(dt(2021,2,9,5), 'f') == dt(2021, 2, 9)  # Tuesday morning in session
>>> assert au.trade_date(dt(2021,2,9,5), 'p') == dt(2021, 2, 9)  # Tuesday morning in session
>>> assert au.trade_date(dt(2021,2,7,5), 'f') == dt(2021, 2, 8)  # Sunday rolls into Monday
>>> assert au.trade_date(dt(2021,2,7,5), 'p') == dt(2021, 2, 5)  # Sunday rolls back to Friday
>>> assert au.trade_date(date = dt(2021,2,9,23), adj = 'f') == dt(2021, 2, 10)  # Tuesday eve rolls into Wed
>>> assert au.trade_date(date = dt(2021,2,9,23), adj = 'p') == dt(2021, 2, 10)  # Already in Wed
>>> assert au.trade_date(date = dt(2021,2,7,23), adj = 'f') == dt(2021, 2, 8)  # Sunday rolls into Monday
>>> assert au.trade_date(date = dt(2021,2,7,23), adj = 'p') == dt(2021, 2, 8)  # Already on Monday
>>> assert au.trade_date(date = dt(2021,2,5,23), adj = 'f') == dt(2021, 2, 8)  # Friday afternoon rolls into Monday
>>> assert au.trade_date(date = dt(2021,2,9,14), adj = 'f') == dt(2021, 2, 10)  # Tuesday is over, roll to Wed
>>> assert au.trade_date(date = dt(2021,2,9,14), adj = 'p') == dt(2021, 2, 9)  # roll back to Tues

calendar

pyg.base._drange.calendar(key=None, holidays=None, weekend=None, t0=None, t1=None, day_start=0, day_end=235959)

A function that either returns an existing calendar or constructs a new one.

  • calendar(‘US’) will return a US calendar if that is already cached

  • calendar(‘US’, us_holiday_dates) will construct a calendar with holiday dates and then cache it

as_time

pyg.base._drange.as_time(t=None)

parses t into a datetime.time object

Example

>>> assert as_time('10:30:40') == datetime.time(10, 30, 40)
>>> assert as_time('103040') == datetime.time(10, 30, 40)
>>> assert as_time('10:30') == datetime.time(10, 30)
>>> assert as_time('1030') == datetime.time(10, 30)
>>> assert as_time('05') == datetime.time(5)
>>> assert as_time(103040) == datetime.time(10, 30, 40)
>>> assert as_time(13040) == datetime.time(1, 30, 40)
>>> assert as_time(130) == datetime.time(1, 30)
>>> assert as_time(datetime.time(1, 30)) == datetime.time(1, 30)
>>> assert as_time(datetime.datetime(2000, 1, 1, 1, 30)) == datetime.time(1, 30)
Parameters

t: str/int/datetime.time/datetime.datetime

time of day

Returns

datetime.time
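
The examples above suggest a digit-based rule: up to two digits are hours, up to four are HHMM, otherwise HHMMSS. The sketch below is inferred from those examples only and is not pyg's as_time; parse_time is a hypothetical name.

```python
import datetime


def parse_time(t):
    """Hypothetical sketch: parse ints and strings such as 130, '1030' or '10:30:40'."""
    if isinstance(t, datetime.datetime):
        return t.time()
    if isinstance(t, datetime.time):
        return t
    digits = str(t).replace(":", "")
    if len(digits) <= 2:  # hours only
        return datetime.time(int(digits))
    if len(digits) <= 4:  # HHMM, left-padded so 130 reads as 01:30
        digits = digits.zfill(4)
        return datetime.time(int(digits[:2]), int(digits[2:]))
    digits = digits.zfill(6)  # HHMMSS, left-padded so 13040 reads as 01:30:40
    return datetime.time(int(digits[:2]), int(digits[2:4]), int(digits[4:6]))
```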

clock

pyg.base._drange.clock(ts, time=None, t=None)

returns a vector marking the passage of time.

Parameters

ts: timeseries

time: None, a string or a Calendar, or already a timeseries of times

  • None: will increment by 1 every non-nan observation

  • ‘i’: increment by 1 every date in index (nan or not)

  • ‘b’: weekdays distance

  • ‘d’: day-distance (ignore intraday stamp)

  • ‘f’: fraction-of-day-distance (do not ignore intraday stamp)

  • ‘m’: month-distance

  • ‘q’: quarter-distance

  • ‘y’: year-distance

  • calendar: uses the business-days distance between any two dates

t: starting time in the past.

Returns

an array

an increasing array of time such that distance between points match the above.

Example

>>> from pyg import *
>>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9))), np.arange(1,11))
>>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9)), t = 5), np.arange(6,16))
>>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9)), 'i'), np.arange(1,11))
>>> assert eq(clock(pd.Series(np.arange(10), drange(2000, 9)), 'b'), np.array([26090, 26090, 26090, 26091, 26092, 26093, 26094, 26095, 26095, 26095]))
>>> assert eq(clock(pd.Series(np.arange(10), drange(2000, '9b', '1b')), 'b'), np.arange(26090, 26100))
>>> assert eq(clock(np.arange(10)), np.arange(1,11))
>>> assert eq(clock(pd.Series(np.arange(10)), t = 5), np.arange(6,16))
>>> assert eq(clock(np.arange(10), 'i'), np.arange(1,11))

text manipulation

lower

pyg.base._txt.lower(value)
equivalent to txt.lower() but:
  • does not throw on non-string

  • supports lists/dicts

Example

>>> assert lower(['The Brown Fox',1]) == ['the brown fox',1]
>>> assert lower(dict(a = 'The Brown Fox', b = 3.0)) ==  {'a': 'the brown fox', 'b': 3.0}

upper

pyg.base._txt.upper(value)
equivalent to txt.upper() but:
  • does not throw on non-string

  • supports lists/dicts

Example

>>> assert upper(['The Brown Fox',1]) == ['THE BROWN FOX',1]
>>> assert upper(dict(a = 'The Brown Fox', b = 3.0)) ==  {'a': 'THE BROWN FOX', 'b': 3.0}

proper

pyg.base._txt.proper(value)
equivalent to Excel’s PROPER(txt) but:
  • does not throw on non-string

  • supports lists/dicts

Example

>>> assert proper(['THE BROWN FOX',1]) == ['The Brown Fox',1]
>>> assert proper(dict(a = 'THE BROWN FOX', b = 3.0)) ==  {'a': 'The Brown Fox', 'b': 3.0}

capitalize

pyg.base._txt.capitalize(value)
equivalent to text.capitalize() but:
  • does not throw on non-string

  • supports lists/dicts

Example

>>> assert capitalize('alan howard') == 'Alan howard' # use proper to get Alan Howard
>>> assert capitalize(['alan howard', 'donald trump']) == ['Alan howard', 'Donald trump'] # use proper?

strip

pyg.base._txt.strip(value)
equivalent to txt.strip() but:
  • does not throw on non-string

  • supports lists/dicts

Example

>>> assert strip([' whatever you say  ','  whatever you do..   ']) == ['whatever you say', 'whatever you do..']
>>> assert strip(dict(a = ' whatever you say  ', b = 3.0)) ==  {'a': 'whatever you say', 'b': 3.0}

split

pyg.base._txt.split(text, sep=' ', dedup=False)
equivalent to txt.split(sep) but:
  • does not throw on non-string

  • supports removal of multiple seps

  • ensures there is a unique single separator

Parameters

text: str

text to be split.

sep: str, list of str, optional

separator(s) used to split. The default is ‘ ‘.

dedup: bool, optional

If True, will remove duplicated instances of seps. The default is False.

Returns

list of str

the split text

Example

>>> text = '   The quick... brown .. fox... '    
>>> assert split(text) == ['', '', '', 'The', 'quick...', 'brown', '..', 'fox...', '']
>>> assert split(text, [' ', '.'], True) == ['The', 'quick', 'brown', 'fox']
>>> text = dict(a = 'Can split this', b = '..and split this too')
>>> assert split(text, [' ', '.'], True) == {'a': ['Can', 'split', 'this'], 'b': ['and', 'split', 'this', 'too']}
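
For plain strings, the multi-separator and dedup behaviour can be approximated with re.split. The sketch assumes that dedup amounts to dropping the empty pieces produced by adjacent separators, which matches the example above but is not confirmed against pyg's source; split_text is a hypothetical name.

```python
import re


def split_text(text, sep=" ", dedup=False):
    """Hypothetical sketch: split on one or several separators."""
    seps = [sep] if isinstance(sep, str) else list(sep)
    # Try longer separators first so they are not shadowed by shorter ones.
    pattern = "|".join(re.escape(s) for s in sorted(seps, key=len, reverse=True))
    parts = re.split(pattern, text)
    # Assumption: dedup drops the empty strings left by repeated separators.
    return [part for part in parts if part] if dedup else parts


sample = "   The quick... brown .. fox... "
```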

replace

pyg.base._txt.replace(text, old, new=None)

A souped up version of text.replace(old, new)

Example

replace continues to replace until no-more is found

>>> assert replace('this    has lots  of   double    spaces', ' '*2, ' ') == 'this has lots of double spaces'
>>> assert replace('this, sentence? has! too, many, punctuations!', list(',?!.')) == 'this sentence has too many punctuations'
>>> assert replace(dict(a = 1, b = [' text within a list ', 'and within a dict']), ' ') == {'a': 1, 'b': ['textwithinalist', 'andwithinadict']}
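
"Replace until no more is found" is a short loop for plain strings. The sketch below is illustrative only (replace_all is a hypothetical name, and the default replacement is assumed to be the empty string); it guards against the two cases where the loop would never end.

```python
def replace_all(text, old, new=""):
    """Hypothetical sketch: keep replacing until none of the old strings remain."""
    olds = [old] if isinstance(old, str) else list(old)
    for o in olds:
        if not o:
            continue  # an empty pattern is always "found": skip it
        if o in new:
            text = text.replace(o, new)  # a single pass, otherwise this never terminates
            continue
        while o in text:
            text = text.replace(o, new)
    return text
```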

common_prefix

pyg.base._txt.common_prefix(*values)
Parameters

*values: list of iterables

values for which we want to find common prefix

Returns

iterable

the common prefix.

Example

>>> assert common_prefix(['abra', 'abba', 'abacus']) == 'ab'
>>> assert common_prefix('abra', 'abba', 'abacus') == 'ab'
>>> assert common_prefix() is None
>>> assert common_prefix([1,2,3,4], [1,2,3,5,8]) == [1,2,3]
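
A possible implementation walks the sequences in parallel and stops at the first mismatch. This is a sketch rather than pyg's code; shared_prefix is a hypothetical name.

```python
def shared_prefix(*values):
    """Hypothetical sketch: longest common prefix of strings or other sequences."""
    # A single non-string argument is treated as the collection of values.
    if len(values) == 1 and not isinstance(values[0], str):
        values = tuple(values[0])
    if not values:
        return None
    n = 0
    for items in zip(*values):  # zip stops at the shortest sequence
        if any(item != items[0] for item in items):
            break
        n += 1
    # Slicing the first value keeps its type (str stays str, list stays list).
    return values[0][:n]
```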

files & directory

mkdir

pyg.base._file.mkdir(path)

makes a new directory if it does not exist. It works if path is a filename too.

read_csv

pyg.base._file.read_csv(path)

light-weight csv reader, unlike pandas heavy duty :-)

tree manipulation

Trees are dicts of dicts. Just like an item in a dict is (key, value), tree items are just longer tuples: (key1, key2, key3, value). We deliberately avoid creating a tree class so that the functionality is available on ordinary tree-like structures.

tree_keys

pyg.base._dict.tree_keys(tree, types=None)

returns the keys (branches) of a tree as a list of tuples

Example

>>> tree = dict(a = 1, b = dict(c = 2, d = 3, e = dict(f = 4)))
>>> assert tree_keys(tree) == [('a',), ('b', 'c'), ('b', 'd'), ('b', 'e', 'f')]
Parameters

tree: tree (dict of dicts)

types: types of dicts, optional

tree_values

pyg.base._dict.tree_values(tree, types=None)

returns the values (leaf) of a tree (a collection of tuples)

Example

>>> tree = dict(a = 1, b = dict(c = 2, d = 3, e = dict(f = 4)))
>>> assert tree_values(tree) == [1,2,3,4]
Parameters

tree: tree (dict of dicts)

types: types of dicts, optional

tree_items

pyg.base._dict.tree_items(tree, types=None)

An extension of dict.items(), returning a list of tuples but of varying length, each a branch of a tree

Parameters

tree : dict of dicts

a tree of data.

types : dict or a list of dict-types, optional

The types that we consider as ‘branches’ of the tree. Default is (dict, Dict, dictattr).

Returns

a list of tuples

these are an extension of dict.items() and are of varying length

Example

>>> school = dict(pupils = dict(id1 = dict(name = 'james', surname = 'maxwell', gender = 'm'),
                      id2 = dict(name = 'adam', surname = 'smith', gender = 'm'),
                      id3 = dict(name = 'michell', surname = 'obama', gender = 'f'),
                      ),
        teachers = dict(math = dict(name = 'albert', surname = 'einstein', grade = 3),
                        english = dict(name = 'william', surname = 'shakespeare', grade = 3),
                        physics = dict(name = 'richard', surname = 'feyman', grade = 4)
                        ))
>>> items = tree_items(school)
>>> items
[('pupils', 'id1', 'name', 'james'),
 ('pupils', 'id1', 'surname', 'maxwell'),
 ('pupils', 'id1', 'gender', 'm'),
 ('pupils', 'id2', 'name', 'adam'),
 ('pupils', 'id2', 'surname', 'smith'),
 ('pupils', 'id2', 'gender', 'm'),
 ('pupils', 'id3', 'name', 'michell'),
 ('pupils', 'id3', 'surname', 'obama'),
 ('pupils', 'id3', 'gender', 'f'),
 ('teachers', 'math', 'name', 'albert'),
 ('teachers', 'math', 'surname', 'einstein'),
 ('teachers', 'math', 'grade', 3),
 ('teachers', 'english', 'name', 'william'),
 ('teachers', 'english', 'surname', 'shakespeare'),
 ('teachers', 'english', 'grade', 3),
 ('teachers', 'physics', 'name', 'richard'),
 ('teachers', 'physics', 'surname', 'feyman'),
 ('teachers', 'physics', 'grade', 4)]

To reverse this, we call:

>>> assert items_to_tree(items) == school
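
The round trip between a tree and its items can be sketched in plain Python. The two functions below are hypothetical stand-ins written for illustration (they only treat dict as a branch type and do not handle duplicates or ignore), not the pyg implementation.

```python
def tree_items_sketch(tree, types=(dict,)):
    """Flatten a dict of dicts into tuples (key1, key2, ..., value)."""
    items = []
    for key, value in tree.items():
        if isinstance(value, types):
            # a branch: prepend this key to every item found further down
            items.extend((key,) + item for item in tree_items_sketch(value, types))
        else:
            # a leaf: an ordinary (key, value) pair
            items.append((key, value))
    return items


def items_to_tree_sketch(items):
    """Rebuild the nested dicts from the flattened tuples."""
    tree = {}
    for *branch, leaf_key, value in items:
        node = tree
        for key in branch:
            node = node.setdefault(key, {})
        node[leaf_key] = value
    return tree
```

Note that an empty dict inside the tree produces no items in this sketch, so it would not survive the round trip.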

tree_update

pyg.base._dict.tree_update(tree, update, types=(<class 'dict'>, <class 'pyg.base._dict.Dict'>, <class 'pyg.base._dictattr.dictattr'>), ignore=None)

equivalent to dict.update(), except that it is not in-place and it also updates values further down the tree

Example

>>> ranking = dict(cambridge = dict(trinity = 1, stjohns = 2, christ = 3), 
            oxford = dict(trinity = 1, jesus = 2, magdalene = 3))
>>> new_ranking = dict(oxford = dict(wolfson = 3, magdalene = 4))
>>> print(tree_repr(tree_update(ranking, new_ranking)))
cambridge:
    {'trinity': 1, 'stjohns': 2, 'christ': 3}
oxford:
    {'trinity': 1, 'jesus': 2, 'magdalene': 4, 'wolfson': 3}

Note how the value for magdalene in oxford was overwritten even though it sits further down the tree, while the other oxford entries were kept.

Example

using ignore

>>> import numpy as np
>>> update = dict(a = None, b = np.nan, c = 0)
>>> tree = dict(a = 1, b = 2, c = 3)
>>> assert tree_update(tree, update) == update
>>> assert tree_update(tree, update, ignore = [None]) == dict(a = 1, b = np.nan, c = 0)
>>> assert tree_update(tree, update, ignore = [None, np.nan]) == dict(a = 1, b = 2, c = 0)
>>> assert tree_update(tree, update, ignore = [None, np.nan, 0]) == tree
Parameters

tree : tree

existing tree.

update : tree

new information.

types : types, optional

see tree_items. The default is (dict, Dict, dictattr).

Returns

tree

updated tree.
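
A minimal sketch of a recursive, not-in-place update with an ignore list follows. tree_update_sketch is a hypothetical name; the rule that ignored values may still create new keys is an assumption taken from the ignore example in the items_to_tree section, and only plain dicts are treated as branches.

```python
def tree_update_sketch(tree, update, ignore=None):
    """Hypothetical illustration of a deep, not-in-place dict update."""
    ignore = [] if ignore is None else ignore
    result = dict(tree)  # shallow copy, so the caller's tree is left untouched
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # both sides are branches: merge them further down the tree
            result[key] = tree_update_sketch(result[key], value, ignore)
        elif key in result and any(value is i or value == i for i in ignore):
            # an ignored value never overwrites existing data
            continue
        else:
            result[key] = value
    return result
```

The identity check (value is i) is there so that a nan placed in ignore matches itself, since nan == nan is False.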

tree_setitem

pyg.base._dict.tree_setitem(tree, key, value, ignore=None, types=None)

sets an item of a tree

Parameters

tree : tree (dicts of dict)

key : a dot-separated string or a tuple of values

the branch to hang value on

value : object

the leaf at the end of the branch

ignore : None or list, optional

what values of leaf will be ignored and not overwrite existing data. The default is None.

types : types, optional

the types treated as branches as we go down the tree; anything else is considered a leaf already.

Example

>>> tree = dict()
>>> tree_setitem(tree, 'a', 1)
>>> assert tree == dict(a = 1)
>>> tree_setitem(tree, 'b.c', 2)
>>> assert tree == {'a': 1, 'b': {'c': 2}}
>>> tree_setitem(tree, ('b','c','d'), 2)
>>> tree_setitem(tree, ('b','c','e'), 3)
>>> assert tree == {'a': 1, 'b': {'c': {'d': 2, 'e': 3}}}
Example

types

>>> from pyg import *
>>> tree = dict(mycell = cell(lambda a, b: a * b, b = 2, a = cell(lambda x: x**2, x = cell(lambda y: y*3))))
>>> # We are missing y....
>>> tree_setitem(tree, 'mycell.a.x.y', 3, types = (dict,cell)) ## drill into cell
>>> assert tree['mycell'].a.x.y == 3
>>> tree_setitem(tree, 'mycell.a.x.y', 1) ## stop when you hit cell
>>> assert tree['mycell'].a.x == dict(y = 1)

Returns

None.
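
The first example above can be reproduced with a short in-place helper. This is a sketch under assumptions: tree_setitem_sketch is a hypothetical name, it only drills into plain dicts, and it leaves out the ignore and types parameters.

```python
def tree_setitem_sketch(tree, key, value):
    """Hypothetical illustration: hang value on the branch described by key."""
    # 'b.c' and ('b', 'c') describe the same branch
    keys = key.split('.') if isinstance(key, str) else list(key)
    node = tree
    for k in keys[:-1]:
        # create the branch, or replace a leaf that is in the way
        if not isinstance(node.get(k), dict):
            node[k] = {}
        node = node[k]
    node[keys[-1]] = value  # modifies tree in place, returns None
```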

tree_repr

pyg.base._tree_repr.tree_repr(value, offset=0)

a cleaner representation of a tree

Example

>>> school = dict(pupils = dict(id1 = dict(name = 'james', surname = 'maxwell', gender = 'm'),
>>>                   id2 = dict(name = 'adam', surname = 'smith', gender = 'm'),
>>>                   id3 = dict(name = 'michell', surname = 'obama', gender = 'f'),
>>>                   ),
>>>     teachers = dict(math = dict(name = 'albert', surname = 'einstein', grade = 3),
>>>                     english = dict(name = 'william', surname = 'shakespeare', grade = 3),
>>>                     physics = dict(name = 'richard', surname = 'feyman', grade = 4)
>>>                     ))
>>> print(tree_repr(school, 4))
pupils:
    id1:
        {'name': 'james', 'surname': 'maxwell', 'gender': 'm'}
    id2:
        {'name': 'adam', 'surname': 'smith', 'gender': 'm'}
    id3:
        {'name': 'michell', 'surname': 'obama', 'gender': 'f'}
teachers:
    math:
        {'name': 'albert', 'surname': 'einstein', 'grade': 3}
    english:
        {'name': 'william', 'surname': 'shakespeare', 'grade': 3}
    physics:
        {'name': 'richard', 'surname': 'feyman', 'grade': 4}
Parameters

value : a tree

offset : int, optional

offset from the left for printing. The default is 0.

Returns

string

a tree-like string representation of a dict-of-dicts.

items_to_tree

pyg.base._dict.items_to_tree(items, tree=None, raise_if_duplicate=True, ignore=None, types=None)

converts items to branches of a tree. If an original tree is provided, the additional branches are hung on the existing tree. If ignore is provided as a list of values, branches whose last value (the leaf) is in that list will not overwrite existing data.

Example

>>> items = [('cambridge', 'smith', 'economics',),
         ('cambridge', 'keynes', 'economics'), 
         ('cambridge', 'lyons',  'maths'),
         ('cambridge', 'maxwell', 'maths'),
         ('oxford', 'penrose', 'maths'),
         ]
>>> tree = items_to_tree(items)
>>> print(tree_repr(tree))
cambridge:
    smith:
        economics
    keynes:
        economics
    lyons:
        maths
    maxwell:
        maths
oxford:
    {'penrose': 'maths'}

We can add to tree:

Parameters

items : list of tuples

items are just like dict items, only longer

tree : tree, optional

a pre-existing tree of trees. The default is None.

raise_if_duplicate : bool, optional

The default is True.

ignore : list, optional

list of values that should be ignored when overwriting an existing tree. The default is None.

Example

using ignore

>>> tree = dict(a = 1, b = 'keep_old_value')
>>> update = dict(a = 'valid_new_value', b = None, c = None)
>>> tree_update(tree, update, ignore = [None])
{'a': 'valid_new_value', 'b': 'keep_old_value', 'c': None}
  • a is overridden as the new value is valid

  • b is not overridden since the update b = None is considered invalid

  • c is added as it did not exist before, even though c = None is an invalid value

Returns

tree : dict of dicts

tree_to_table

pyg.base._tree.tree_to_table(tree, pattern)

The best way to understand is to give an example:

Examples

>>> school = dict(pupils = dict(id1 = dict(name = 'james', surname = 'maxwell', gender = 'm'),
                          id2 = dict(name = 'adam', surname = 'smith', gender = 'm'),
                          id3 = dict(name = 'michell', surname = 'obama', gender = 'f'),
                          ),
            teachers = dict(math = dict(name = 'albert', surname = 'einstein', grade = 3),
                            english = dict(name = 'william', surname = 'shakespeare', grade = 3),
                            physics = dict(name = 'richard', surname = 'feyman', grade = 4)
                            ))

Suppose we wanted to identify all male students:

>>> res = tree_to_table(school, 'pupils/%id/gender/m')
>>> assert res == [dict(id = 'id1'), dict(id = 'id2')]

or grades:

>>> res = tree_to_table(school, 'teachers/%subject/grade/%grade')
>>> assert res == [{'grade': 3, 'subject': 'math'},
                     {'grade': 3, 'subject': 'english'},
                     {'grade': 4, 'subject': 'physics'}]
Parameters

tree : tree (dict of dicts)

tree is a yaml-like structure

pattern : string

The pattern whose instances we wish to find in tree

Returns

list of dicts
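
One way to read the pattern is as a path walked down the tree, where a %name part captures whatever key (or leaf value) it meets and a plain part must match exactly. The function below is a hypothetical sketch of that reading, not pyg's implementation, and it compares plain parts as strings.

```python
def tree_to_table_sketch(tree, pattern):
    """Hypothetical illustration of matching a '/'-separated pattern on a tree."""
    def match(node, parts, row):
        if not parts:
            return [row]
        part, rest = parts[0], parts[1:]
        if not isinstance(node, dict):
            # at a leaf, only the final pattern part can match, against the value
            if rest:
                return []
            if part.startswith('%'):
                return [{**row, part[1:]: node}]
            return [row] if str(node) == part else []
        rows = []
        for key, child in node.items():
            if part.startswith('%'):
                # wildcard: capture the key under the given name
                rows.extend(match(child, rest, {**row, part[1:]: key}))
            elif str(key) == part:
                rows.extend(match(child, rest, row))
        return rows

    return match(tree, pattern.split('/'), {})
```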

list functions

as_list

pyg.base._as_list.as_list(value, none=False)

returns a list of the original object.

Example

>>> assert as_list(None) == []
>>> assert as_list(4) == [4]
>>> assert as_list((1,2,)) == [1,2]
>>> assert as_list([1,2,]) == [1,2]
>>> assert eq(as_list(np.array([1,2,])) , [np.array([1,2,])])
>>> assert as_list(dict(a = 1)) == [dict(a=1)]

In practice, this function has an incredibly useful application:

Example

using as_list to give flexibility on *args

>>> def my_sum(*values):
>>>     values = as_list(values)
>>>     return sum(values)
>>> assert my_sum(1,2,3) == 6    
>>> assert my_sum([1,2,3]) == 6 ## This is nice... wasn't possible before
Parameters

value : anything

none : bool, optional

Shall I return None as a value? The default is False and we return [], if True, returns [None]

Returns

list

a list of original objects.
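
The examples suggest the following logic, shown here as a sketch. as_list_sketch is a hypothetical name, and the unwrapping of a single nested list is an assumption inferred from the my_sum example rather than something the source states.

```python
def as_list_sketch(value, none=False):
    """Hypothetical illustration of coercing anything into a list."""
    if value is None:
        return [None] if none else []
    if isinstance(value, (list, tuple)):
        # assumption: a lone list/tuple wrapped in another one is unwrapped,
        # so that f(1, 2, 3) and f([1, 2, 3]) look the same inside f(*args)
        if len(value) == 1 and isinstance(value[0], (list, tuple)):
            return list(value[0])
        return list(value)
    # anything else (numbers, dicts, strings, ...) becomes a one-element list
    return [value]


def my_sum(*values):
    return sum(as_list_sketch(values))
```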

as_tuple

pyg.base._as_list.as_tuple(value, none=False)

returns a tuple of the original object.

Example

>>> assert as_tuple(None) == ()
>>> assert as_tuple(4) == (4,)
>>> assert as_tuple((1,2,)) == (1,2)
>>> assert as_tuple([1,2,]) == (1,2)
>>> assert eq(as_tuple(np.array([1,2,])) , (np.array([1,2,]),))
>>> assert as_tuple(dict(a = 1)) == (dict(a=1),)

In practice, this function has an incredibly useful application:

Example

using as_tuple to give flexibility on *args

>>> def my_sum(*values):
>>>     values = as_tuple(values)
>>>     return sum(values)
>>> assert my_sum(1,2,3) == 6    
>>> assert my_sum([1,2,3]) == 6 ## This is nice... wasn't possible before
Parameters

value : anything

none : bool, optional

Shall I return None as a value? The default is False and we return (), if True, returns (None,)

Returns

tuple

a tuple of original objects.

first

pyg.base._as_list.first(value)

returns the first value in a list (None if empty list) or the original if value not a list

Example

>>> assert first(5) == 5
>>> assert first([5,5]) == 5
>>> assert first([]) is None
>>> assert first([1,2]) == 1

last

pyg.base._as_list.last(value)

returns the last value in a list (None if empty list) or the original if value not a list

Example

>>> assert last(5) == 5
>>> assert last([5,5]) == 5
>>> assert last([]) is None
>>> assert last([1,2]) == 2

unique

pyg.base._as_list.unique(value)

returns the single unique value in a list (None if the list is empty) or the original if value is not a list. Raises an exception if the list contains more than one distinct value

Example

>>> assert unique(5) == 5
>>> assert unique([5,5]) == 5
>>> assert unique([]) is None
>>> with pytest.raises(ValueError):
>>>     unique([1,2])  

Comparing and Sorting

cmp

pyg.base._sort.cmp(x, y)

Implements lexcompare while allowing for comparison of different types. First compares by type, then by length, then by keys and finally on values

Parameters

x : obj

1st object to be compared.

y : obj

2nd object to be compared.

Returns

int

returns -1 if x<y else 1 if x>y else 0

Examples

>>> assert cmp('2', 2) == 1
>>> assert cmp(np.int64(2), 2) == 0
>>> assert cmp(None, 2.0) == -1 # None is smallest
>>> assert cmp([1,2,3], [4,5]) == 1 # [1,2,3] is longer
>>> assert cmp([1,2,3], [1,2,0]) == 1 # lexical sorting 
>>> assert cmp(dict(a = 1, b = 2), dict(a = 1, c = 2)) == -1 # lexical sorting on keys
>>> assert cmp(dict(a = 1, b = 2), dict(b = 2, a = 1)) == 0 # order does not matter

Cmp

pyg.base._sort.Cmp(x)

class wrapper of cmp, allowing us to compare objects of different types

Example

>>> with pytest.raises(TypeError):
>>>     sorted([1,2,3,None])
>>> # but this is fine:
>>> assert sorted([1,3,2,None], key = Cmp) == [None, 1, 2, 3]

sort

pyg.base._sort.sort(iterable)

implements sorting that allows comparison of objects of different types

Parameters

iterable : iterable

values to be sorted

Returns

list

sorted list.

Example

>>> with pytest.raises(TypeError):
>>>     sorted([1,2,3,None])
>>> # but this is fine:
>>> assert sort([1,3,2,None]) == [None, 1, 2, 3]
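
A rough idea of how a cross-type ordering can be built on the standard library is sketched below. cmp_sketch, _rank and sort_sketch are hypothetical names, the ranking rule (None, then numbers, then other types by name) is an assumption chosen to agree with the cmp examples, and dicts are left out for brevity.

```python
from functools import cmp_to_key
from numbers import Number


def _rank(x):
    # assumption: None sorts first, then all numbers, then other types by name
    if x is None:
        return (0, '')
    if isinstance(x, Number):
        return (1, '')
    return (2, type(x).__name__)


def cmp_sketch(x, y):
    """Return -1, 0 or 1, comparing by type, then length, then values."""
    rx, ry = _rank(x), _rank(y)
    if rx != ry:
        return -1 if rx < ry else 1
    if x is None:
        return 0
    if isinstance(x, (list, tuple)):
        if len(x) != len(y):
            return -1 if len(x) < len(y) else 1
        for a, b in zip(x, y):
            c = cmp_sketch(a, b)
            if c:
                return c
        return 0
    return (x > y) - (x < y)


def sort_sketch(iterable):
    # cmp_to_key turns the comparison function into a sort key
    return sorted(iterable, key=cmp_to_key(cmp_sketch))
```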

eq

pyg.base._eq.eq(x, y)

A better nan-handling equality comparison. Here is the problem:

>>> import numpy as np
>>> assert not np.nan == np.nan  ## What?

The nan issue extends to np.arrays…

>>> assert list(np.array([np.nan,2]) == np.array([np.nan,2])) == [False, True]

but not to lists…

>>> assert [np.nan] == [np.nan]

But wait, if the lists are derived from np.arrays, then no equality…

>>> assert not list(np.array([np.nan])) == list(np.array([np.nan]))

The other issue is inheritance:

>>> class FunnyDict(dict):
>>>    def __getitem__(self, key):
>>>        return 5
>>> assert dict(a = 1) == FunnyDict(a=1) ## equality seems to ignore any type mismatch
>>> assert not dict(a = 1)['a'] == FunnyDict(a = 1)['a'] 

There are also issues with partial

>>> from functools import partial
>>> f = lambda a: a + 1    
>>> x = partial(f, a = 1)
>>> y = partial(f, a = 1)    
>>> assert not x == y

eq handles these cases:

>>> import pandas as pd
>>> import pytest
>>> from pyg import eq
>>> assert eq(np.nan, np.nan) ## That's better
>>> assert eq(x = np.array([np.nan,2]), y = np.array([np.nan,2]))    
>>> assert eq(np.array([np.array([1,2]),2], dtype = 'object'), np.array([np.array([1,2]),2], dtype = 'object'))
>>> assert eq(np.array([np.nan,2]),np.array([np.nan,2]))    
>>> assert eq(dict(a = np.array([np.array([1,2]),2], dtype = 'object')) ,  dict(a = np.array([np.array([1,2]),2], dtype = 'object')))
>>> assert eq(dict(a = np.array([np.array([1,np.nan]),np.nan], dtype = 'object')) ,  dict(a = np.array([np.array([1,np.nan]),np.nan], dtype = 'object')))
>>> assert eq(np.array([np.array([1,2]),dict(a = np.array([np.array([1,2]),2], dtype = 'object'))], dtype = 'object'), np.array([np.array([1,2]),dict(a = np.array([np.array([1,2]),2], dtype = 'object'))], dtype = 'object'))
>>> assert not eq(dict(a = 1), FunnyDict(a=1))    
>>> assert eq(1, 1.0)
>>> assert eq(x = pd.DataFrame([1,2]), y = pd.DataFrame([1,2]))
>>> assert eq(pd.DataFrame([np.nan,2]), pd.DataFrame([np.nan,2]))
>>> assert eq(pd.DataFrame([1,np.nan], columns = ['a']), pd.DataFrame([1,np.nan], columns = ['a']))
>>> assert not eq(pd.DataFrame([1,np.nan], columns = ['a']), pd.DataFrame([1,np.nan], columns = ['b']))
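
The core idea, a recursive comparison that treats nan as equal to nan and refuses to equate different types, can be sketched without numpy or pandas. eq_sketch is a hypothetical name and covers only floats, dicts, lists and tuples.

```python
import math


def eq_sketch(x, y):
    """Hypothetical illustration of a nan-aware, type-aware equality."""
    if type(x) is not type(y):
        # allow int/float mixes such as 1 == 1.0, reject other type mismatches
        numeric = (int, float)
        if not (type(x) in numeric and type(y) in numeric):
            return False
    if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
        return True
    if isinstance(x, dict):
        return x.keys() == y.keys() and all(eq_sketch(x[k], y[k]) for k in x)
    if isinstance(x, (list, tuple)):
        return len(x) == len(y) and all(eq_sketch(a, b) for a, b in zip(x, y))
    return x == y
```

Checking the type first is what makes a dict and a dict subclass such as FunnyDict compare as different, even though dict.__eq__ alone would call them equal.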

in_

pyg.base._eq.in_(x, seq)

Evaluates if x is in seq, avoiding issues such as these:

>>> s = pd.Series([1,2,3])
>>> with pytest.raises(ValueError):
>>>     s in [None]    
>>> assert not in_(s, [None])
>>> assert in_(s, [None, s])    

bits and pieces

type functions

pyg.base._types.is_arr(value)

is value a numpy array of non-zero-size

pyg.base._types.is_bool(value)

is value a bool, or a np.bool_ type

pyg.base._types.is_date(value)

is value a date type: either datetime.date, datetime.datetime or np.datetime64

pyg.base._types.is_df(value)

is value a pd.DataFrame

pyg.base._types.is_dict(value)

is value a dict

pyg.base._types.is_float(value)

is value a float, or any variant of np.float

pyg.base._types.is_int(value)

is value an int, or any variant of np.intN type

pyg.base._types.is_iterable(value)

is value Iterable excluding a string

pyg.base._types.is_len(value)

is value of zero length (or has no len at all)

pyg.base._types.is_list(value)

is value a list

pyg.base._types.is_nan(value)

is value a nan or an inf. Unlike np.isnan, works for non numeric

pyg.base._types.is_none(value)

is value None

pyg.base._types.is_num(value)

is_int(value) or is_float(value)

pyg.base._types.is_pd(value)

is value a pd.DataFrame/pd.Series

pyg.base._types.is_series(value)

is value a pd.Series

pyg.base._types.is_str(value)

is value a str, or a np.str_ type

pyg.base._types.is_ts(value)

is value a pandas dataframe which is indexed by datetimes

pyg.base._types.is_tuple(value)

is value a tuple

pyg.base._types.nan2none(value)

convert np.nan/np.inf to None

zipper

pyg.base._zip.zipper(*values)

a safer version of zip

Examples

zipper works with single values as well as full list:

>>> assert list(zipper([1,2,3], 4)) == [(1, 4), (2, 4), (3, 4)]
>>> assert list(zipper([1,2,3], [4,5,6])) == [(1, 4), (2, 5), (3, 6)]
>>> assert list(zipper([1,2,3], [4,5,6], [7])) ==  [(1, 4, 7), (2, 5, 7), (3, 6, 7)]
>>> assert list(zipper([1,2,3], [4,5,6], None)) ==  [(1, 4, None), (2, 5, None), (3, 6, None)]
>>> assert list(zipper((1,2,3), np.array([4,5,6]), None)) ==  [(1, 4, None), (2, 5, None), (3, 6, None)]
Examples

zipper rejects lists of mismatched lengths

>>> import pytest
>>> with pytest.raises(ValueError):
>>>     zipper([1,2,3], [4,5])
Parameters

*values : lists

values to be zipped

Returns

zipped values
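
The behaviour shown in the examples (single values and length-1 lists are repeated, mismatched lengths are rejected) can be sketched as follows. zipper_sketch is a hypothetical name, and unlike the examples it only recognises lists and tuples as sequences, not numpy arrays.

```python
from itertools import repeat


def zipper_sketch(*values):
    """Hypothetical illustration of a zip that broadcasts single values."""
    def is_seq(v):
        return isinstance(v, (list, tuple))

    # lengths of the genuine sequences; length-1 sequences are broadcast instead
    lengths = {len(v) for v in values if is_seq(v) and len(v) != 1}
    if len(lengths) > 1:
        raise ValueError('cannot zip sequences of lengths %s' % sorted(lengths))
    n = lengths.pop() if lengths else 1
    columns = []
    for v in values:
        if is_seq(v) and len(v) == n:
            columns.append(v)
        elif is_seq(v):
            # the only sequences left have length 1: repeat their single element
            columns.append(repeat(v[0], n))
        else:
            # a single value (including None) is repeated n times
            columns.append(repeat(v, n))
    return zip(*columns)
```

Because this is an ordinary function rather than a generator, the length check runs when it is called, not when the result is first iterated.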

reducer

pyg.base._reducer.reducer(function, sequence, default=None)

functools.reduce raises a TypeError on an empty sequence unless it is given an initial value. reducer needs no initial value and returns default instead.

Parameters

function : callable

binary function.

sequence : iterable

list of inputs to be applied iteratively to reduce.

default : object, optional

A default value to be returned with an empty sequence

Example

>>> from operator import add, mul
>>> from functools import reduce
>>> import pytest
>>> assert reducer(add, [1,2,3,4]) == 10
>>> assert reducer(mul, [1,2,3,4]) == 24
>>> assert reducer(add, [1]) == 1
>>> assert reducer(add, []) is None
>>> with pytest.raises(TypeError):
>>>     reduce(add, [])
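
A minimal sketch of the same idea on top of functools.reduce is shown below; reducer_sketch is a hypothetical name, not the pyg function.

```python
from functools import reduce
from operator import add, mul


def reducer_sketch(function, sequence, default=None):
    """Hypothetical illustration: reduce without an initial value."""
    sequence = list(sequence)  # also lets us test generators for emptiness
    if not sequence:
        return default
    return reduce(function, sequence)
```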

reducing

class pyg.base._reducer.reducing(function=None, *args, **kwargs)

Makes a binary function act on a sequence of elements using reduction

Example

>>> from operator import mul
>>> assert reducing(mul)([1,2,3,4]) == 24    
>>> assert reducing(mul)(6,4) == 24    

Since a.join(b).join(c).join(d) is also very common, we provide a simple interface for that:

Example

chaining

>>> assert reducing('__add__')([1,2,3,4]) == 10
>>> assert reducing('__add__')(6,4) == 10

>>> d = dictable(a = [1,2,3,5,4])
>>> reducing('inc')(d, dict(a=1))

logger and get_logger

pyg.base._logger.get_logger(name='pyg', level='info', fmt='%(asctime)s - %(name)s - %(levelname)s - %(message)s', file=False, console=True)

quick utility to simplify the creation of loggers, ensuring that we cache them and do not add too many handlers

Parameters

name : str, optional

name of logger. The default is ‘pyg’.

level : str, optional

DEBUG/INFO/WARN etc. The default is ‘info’.

fmt : str, optional

string formatting for messages. The default is ‘%(asctime)s - %(name)s - %(levelname)s - %(message)s’.

file : bool/str, optional

the name of the file to log to. The default is False = do not log to file.

console : bool, optional

log to console? The default is True.

Returns

logging.Logger
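
The caching and handler bookkeeping described above might look roughly like this with the standard logging module. This is a sketch only: get_logger_sketch and the _LOGGERS cache are hypothetical names, and logging to a file is left out.

```python
import logging

_LOGGERS = {}  # hypothetical cache, keyed by the requested configuration


def get_logger_sketch(name='pyg', level='info',
                      fmt='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                      console=True):
    """Hypothetical illustration: build a logger once, then reuse it."""
    key = (name, level, fmt, console)
    if key not in _LOGGERS:
        logger = logging.getLogger(name)
        logger.setLevel(getattr(logging, level.upper()))
        # add a console handler only if the logger does not have one yet
        has_stream = any(isinstance(h, logging.StreamHandler) for h in logger.handlers)
        if console and not has_stream:
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter(fmt))
            logger.addHandler(handler)
        _LOGGERS[key] = logger
    return _LOGGERS[key]
```

Without the handler check, every repeated call would attach another handler and each message would be printed several times.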

access functions

These are useful to convert object-oriented code to declarative functions

pyg.base._getitem.callattr(value, attr, args=None, kwargs=None)

gets the attribute(s) from a value and calls it

Example

>>> from pyg import *
>>> value = Dict(function = lambda a, b: a + b)
>>> assert callattr(value, 'function', kwargs = dict(a = 1, b = 2)) == 3
>>> assert callattr(value, attr = 'function', args = (1, 2), kwargs = None) == 3
>>> ts = pd.Series(np.random.normal(0,1,1000))    
>>> assert ts.std() == callattr(ts, 'std')
>>> assert eq(ts.ewm(com = 10).mean(), callattr(ts, ['ewm','mean'], kwargs = [{'com':10}, {}]))
>>> d = dictable(a = [1,2,3,4,1,2], b = list('abcdef'))
>>> assert callattr(d, ['inc', 'exc'], kwargs = [dict(a = 2), dict(b = 'f')]) == d.inc(a = 2).exc(b = 'f')
Parameters

value : obj

object that contains an item.

attr : string(s)

key within object.

args : tuple, optional

tuple of values to be fed to function. The default is None.

kwargs : dict, optional

kwargs to be fed to the method. The default is None.

pyg.base._getitem.callitem(value, key, args=None, kwargs=None)

gets an item and calls it

Example

>>> c = dict(function = lambda a, b: a + b)
>>> assert callitem(c, 'function', kwargs = dict(a = 1, b = 2)) == 3
>>> assert callitem(c, 'function', args = (1, 2)) == 3
Parameters

value : obj

object that contains an item.

key : string

key within object.

args : tuple, optional

tuple of values to be fed to function. The default is None.

kwargs : dict, optional

kwargs to be fed to the method. The default is None.

pyg.base._getitem.getitem(value, key, *default)

gets an item, like getattr

Example

>>> a = dict(a = 1)
>>> assert getitem(a, 'a') == 1
>>> assert getitem(a, 'b', 2) == 2
>>> import pytest
>>> with pytest.raises(KeyError):
>>>     getitem(a, 'b') 
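
The optional default works the same way as in the built-in getattr. A sketch of that pattern, with getitem_sketch as a hypothetical name:

```python
def getitem_sketch(value, key, *default):
    """Hypothetical illustration of item access with an optional default."""
    if len(default) > 1:
        raise TypeError('getitem_sketch expects at most one default value')
    try:
        return value[key]
    except (KeyError, IndexError):
        if default:
            return default[0]
        raise  # no default supplied: let the original error propagate
```

Taking the default through *default, rather than default=None, is what allows None itself to be passed as a legitimate default.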

inspection

There are a few functions extending the inspect module.

pyg.base._inspect.argspec_add(fullargspec, **update)

adds new args with default values at the end of the existing args

Parameters

fullargspec : FullArgSpec

the existing argspec.

**update : dict

parameter names with their default values.

Returns

FullArgSpec

Example

>>> f = lambda b : b
>>> argspec = getargspec(f)
>>> updated = argspec_add(argspec, axis = 0)
>>> assert updated.args == ['b', 'axis'] and updated.defaults == (0,)
>>> f = lambda b, axis : None ## axis already exists without a default
>>> argspec = getargspec(f)
>>> updated = argspec_add(argspec, axis = 0)
>>> assert updated == argspec
>>> f = lambda b, axis =1 : None ## axis already exists with a different default
>>> argspec = getargspec(f)
>>> updated = argspec_add(argspec, axis = 0)
>>> assert updated == argspec
pyg.base._inspect.argspec_defaults(function)

Returns the function defaults as a dict rather than using the inspect structure

Example

>>> f = lambda a, b = 1: a+b
>>> assert argspec_defaults(f) == dict(b=1)
>>> g = partial(f, b = 2)
>>> assert argspec_defaults(g) == dict(b=2)
Parameters

function : callable

Returns

defaults as a dict.

pyg.base._inspect.argspec_required(function)
Parameters

function : callable

Returns

list

parameters that must be provided in order to run the function

pyg.base._inspect.argspec_update(argspec, **kwargs)

generic function to create new FullArgSpec (python 3) or normal ArgSpec (python 2)

Parameters

argspec : FullArgSpec

The argspec of the function

**kwargs : dict

updates

Returns

FullArgSpec

Example

>>> f = lambda a, b =1 : a + b
>>> argspec = getargspec(f)    
>>> assert argspec_update(argspec, args = ['a', 'b', 'c']) == inspect.FullArgSpec(**{'annotations': {},
                                                                             'args': ['a', 'b', 'c'],
                                                                             'defaults': (1,),
                                                                             'kwonlyargs': [],
                                                                             'kwonlydefaults': None,
                                                                             'varargs': None,
                                                                             'varkw': None})
pyg.base._inspect.call_with_callargs(function, callargs)

calls a function using the callargs dict produced by getcallargs, with support for functions within decorators

Example

>>> function = lambda a, b, *args, **kwargs: 1+b+len(args)+10*len(kwargs)
>>> args = (1,2,3,4,5); kwargs = dict(c = 6, d = 7)
>>> assert function(*args, **kwargs) == 26
>>> callargs = getcallargs(function, *args, **kwargs)
>>> assert call_with_callargs(function, callargs) == 26
pyg.base._inspect.getargs(function, n=0)
Parameters

function : callable

The function for which we want the args

n : int, optional

get the names of the args after allowing for n args to be set by *args. The default is 0.

Returns

None or a list of args

pyg.base._inspect.getargspec(function)

Extends inspect.getfullargspec to allow us to decorate functions with a signature.

Parameters

function : callable

function for which we want to know argspec.

Returns

inspect.FullArgSpec

pyg.base._inspect.getcallargs(function, *args, **kwargs)

replicates inspect.getcallargs with support for functions within decorators

Example

>>> from pyg import *; import inspect
>>> function = lambda a, b, *myargs, **mykwargs: 1 
>>> args = (1,2,3,4,5); kwargs = dict(c = 6, d = 7)
>>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == {'a': 1, 'b': 2, 'myargs': (3, 4, 5), 'mykwargs': {'c': 6, 'd': 7}} 
>>> function = lambda a: a + 1
>>> args = (); kwargs = dict(a=1)
>>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1)
>>> function = lambda a, b = 1: 1
>>> args = (); kwargs = dict(a=1)
>>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 1)
>>> args = (); kwargs = dict(a=1, b = 2)
>>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 2)
>>> args = (1,); kwargs = {}
>>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 1)
>>> args = (1,2); kwargs = {}
>>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 2)
>>> args = (1,); kwargs = {'b' : 2}
>>> assert getcallargs(function, *args, **kwargs) == inspect.getcallargs(function, *args, **kwargs) == dict(a = 1, b = 2)
pyg.base._inspect.kwargs2args(function, args, kwargs)

converts a list of parameters that were provided as kwargs into args

Example

>>> assert kwargs2args(lambda a, b: a+b, (), dict(a = 1, b=2)) == ([1,2], {})
Parameters

function : callable

args : tuple

parameters of function.

kwargsdict

key-word parameters of function.

Returns

tuple

a pair of a function args, kwargs.
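
Using the standard inspect module, the conversion can be sketched as below. kwargs2args_sketch is a hypothetical name; stopping at the first missing name is an assumption, made so that positional order is never broken.

```python
import inspect


def kwargs2args_sketch(function, args, kwargs):
    """Hypothetical illustration: move keyword values into positional args."""
    names = inspect.getfullargspec(function).args
    args, kwargs = list(args), dict(kwargs)  # copies, the inputs are untouched
    # walk the parameter names not yet covered by args, in signature order
    for name in names[len(args):]:
        if name not in kwargs:
            break  # a gap: later keywords must stay keywords
        args.append(kwargs.pop(name))
    return args, kwargs
```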