Hierarchy module

Contains functions to help manage hierarchical data in pandas.

pandas_extras.hierarchy.flatten_adjacency_list(dataframe, parent, right_on=None)

Creates a flattened hierarchy from an adjacency list by adding one column per further ancestor level (manager_1, manager_2 in the example below).

>>> df = pd.DataFrame([
...     {'employee': 0, 'manager': None},
...     {'employee': 1, 'manager': 0},
...     {'employee': 2, 'manager': 0},
...     {'employee': 3, 'manager': 0},
...     {'employee': 4, 'manager': 1},
...     {'employee': 5, 'manager': 1},
...     {'employee': 6, 'manager': 2},
...     {'employee': 7, 'manager': 6},
... ])
>>> df.pipe(flatten_adjacency_list, 'manager', right_on='employee')
    employee    manager     manager_1   manager_2
0   0           NaN         NaN         NaN
1   1           0           NaN         NaN
2   2           0           NaN         NaN
3   3           0           NaN         NaN
4   4           1           0           NaN
5   5           1           0           NaN
6   6           2           0           NaN
7   7           6           2           0

>>> df.set_index('employee').pipe(flatten_adjacency_list, 'manager')
            manager     manager_1   manager_2
employee
0           NaN         NaN         NaN
1           0           NaN         NaN
2           0           NaN         NaN
3           0           NaN         NaN
4           1           0           NaN
5           1           0           NaN
6           2           0           NaN
7           6           2           0

Parameters:
  • dataframe (DataFrame) – The DataFrame object to work on.
  • parent (str) – The name of the column that contains the parent id.
  • right_on (str) – Name of the primary key column. If not given, the DataFrame index is used.
Returns:

The flattened DataFrame

Return type:

DataFrame
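
To make the flattening step concrete, here is a minimal sketch of one way such a result could be produced: look up each row's parent repeatedly and store every step in a new column. It is not the library's implementation. The name flatten_sketch and the loop bound are assumptions made for illustration, and the sketch assumes the primary keys are unique.

```python
import pandas as pd


def flatten_sketch(dataframe, parent, right_on=None):
    """Hypothetical illustration, not pandas_extras code."""
    result = dataframe.copy()
    # Lookup table: node id -> parent id (assumes unique node ids).
    keys = result.index if right_on is None else result[right_on]
    lookup = pd.Series(result[parent].to_numpy(), index=keys.to_numpy())

    current = result[parent]
    # A chain can never be longer than the number of rows, so this bound
    # also stops the loop if the data accidentally contains a cycle.
    for level in range(1, len(result) + 1):
        # Step one level up: replace each ancestor by its own parent.
        current = current.map(lookup)
        if current.isna().all():
            break
        result[f"{parent}_{level}"] = current
    return result


df = pd.DataFrame([
    {'employee': 0, 'manager': None},
    {'employee': 1, 'manager': 0},
    {'employee': 2, 'manager': 0},
    {'employee': 3, 'manager': 0},
    {'employee': 4, 'manager': 1},
    {'employee': 5, 'manager': 1},
    {'employee': 6, 'manager': 2},
    {'employee': 7, 'manager': 6},
])
flat = flatten_sketch(df, 'manager', right_on='employee')
```

With the example data, the loop stops after two rounds because no employee has more than three managers above them, which is why only manager_1 and manager_2 are added.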

pandas_extras.hierarchy.get_adjacency_list_depth(dataframe, parent, right_on=None, new_column='depth')

Calculates the depth of each node in the adjacency list hierarchy. Root nodes (those without a parent) have depth 0.

>>> df = pd.DataFrame([
...     {'employee': 0, 'manager': None},
...     {'employee': 1, 'manager': 0},
...     {'employee': 2, 'manager': 0},
...     {'employee': 3, 'manager': 0},
...     {'employee': 4, 'manager': 1},
...     {'employee': 5, 'manager': 1},
...     {'employee': 6, 'manager': 2},
...     {'employee': 7, 'manager': 6},
... ])
>>> df.pipe(get_adjacency_list_depth, 'manager', right_on='employee')
    employee    manager     depth
0   0           NaN         0
1   1           0           1
2   2           0           1
3   3           0           1
4   4           1           2
5   5           1           2
6   6           2           2
7   7           6           3

>>> df.set_index('employee').pipe(
...     get_adjacency_list_depth, 'manager', new_column='level'
... )
            manager     level
employee
0           NaN         0
1           0           1
2           0           1
3           0           1
4           1           2
5           1           2
6           2           2
7           6           3
Parameters:
  • dataframe (DataFrame) – The DataFrame object to work on.
  • parent (str) – The name of the column that contains the parent id.
  • right_on (str) – Name of the primary key column. If not given, the DataFrame index is used.
  • new_column (str) – Name of the new column to be created. Defaults to 'depth'.
Returns:

The DataFrame with the depth column added

Return type:

DataFrame
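
The depth calculation can be sketched in a similar way: walk up the parent chain one level at a time and count how many steps each row takes before it reaches a root. This is a minimal sketch under stated assumptions, not the library's implementation. The name depth_sketch is hypothetical, and the primary keys are assumed to be unique.

```python
import pandas as pd


def depth_sketch(dataframe, parent, right_on=None, new_column='depth'):
    """Hypothetical illustration, not pandas_extras code."""
    result = dataframe.copy()
    # Lookup table: node id -> parent id (assumes unique node ids).
    keys = result.index if right_on is None else result[right_on]
    lookup = pd.Series(result[parent].to_numpy(), index=keys.to_numpy())

    depth = pd.Series(0, index=result.index)
    current = result[parent]
    # Bounded by the row count so that a cycle cannot loop forever.
    for _ in range(len(result)):
        has_ancestor = current.notna()
        if not has_ancestor.any():
            break
        # Every row that still has an ancestor is one level deeper.
        depth += has_ancestor.astype(int)
        current = current.map(lookup)
    result[new_column] = depth
    return result


df = pd.DataFrame([
    {'employee': 0, 'manager': None},
    {'employee': 1, 'manager': 0},
    {'employee': 2, 'manager': 0},
    {'employee': 3, 'manager': 0},
    {'employee': 4, 'manager': 1},
    {'employee': 5, 'manager': 1},
    {'employee': 6, 'manager': 2},
    {'employee': 7, 'manager': 6},
])
with_depth = depth_sketch(df, 'manager', right_on='employee')
with_level = depth_sketch(df.set_index('employee'), 'manager', new_column='level')
```

Counting steps this way needs one pass per hierarchy level rather than one pass per row, which keeps the work vectorised for shallow, wide hierarchies.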