Hierarchy module
Contains functions to help manage hierarchical data in pandas.
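pandas_extras.hierarchy.flatten_adjacency_list(dataframe, parent, right_on=None)
Creates a flattened hierarchy from an adjacency list. Each further level of ancestors above the parent column gets its own column (manager_1, manager_2, ... in the examples).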
>>> import pandas as pd
>>> from pandas_extras.hierarchy import flatten_adjacency_list
>>> df = pd.DataFrame([
...     {'employee': 0, 'manager': None},
...     {'employee': 1, 'manager': 0},
...     {'employee': 2, 'manager': 0},
...     {'employee': 3, 'manager': 0},
...     {'employee': 4, 'manager': 1},
...     {'employee': 5, 'manager': 1},
...     {'employee': 6, 'manager': 2},
...     {'employee': 7, 'manager': 6},
... ])
>>> df.pipe(flatten_adjacency_list, 'manager', right_on='employee')
   employee  manager  manager_1  manager_2
0         0      NaN        NaN        NaN
1         1        0        NaN        NaN
2         2        0        NaN        NaN
3         3        0        NaN        NaN
4         4        1          0        NaN
5         5        1          0        NaN
6         6        2          0        NaN
7         7        6          2          0
>>> df.set_index('employee').pipe(flatten_adjacency_list, 'manager')
          manager  manager_1  manager_2
employee
0             NaN        NaN        NaN
1               0        NaN        NaN
2               0        NaN        NaN
3               0        NaN        NaN
4               1          0        NaN
5               1          0        NaN
6               2          0        NaN
7               6          2          0
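Parameters:
- dataframe (DataFrame) – The DataFrame object to work on.
- parent (str) – The name of the column that contains the parent id.
- right_on (str) – Name of the primary key column. If not given, the index is used.
Returns: The flattened DataFrame
Return type: DataFrame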
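The flattening shown in the examples can be reproduced with plain pandas by repeatedly looking up the parent of every id in the most recent ancestor column until no row has an ancestor left. The sketch below is an illustration under stated assumptions, not the pandas_extras implementation: flatten_sketch is a hypothetical name, ids are assumed unique, and the hierarchy is assumed to contain no cycles.

```python
from typing import Optional

import pandas as pd


def flatten_sketch(
    dataframe: pd.DataFrame, parent: str, right_on: Optional[str] = None
) -> pd.DataFrame:
    """Hypothetical illustration only; not the pandas_extras implementation.

    Assumes unique ids and an acyclic hierarchy (a cycle would make
    the loop below run forever).
    """
    # child id -> parent id, keyed by the right_on column or by the index
    if right_on is not None:
        lookup = dataframe.set_index(right_on)[parent]
    else:
        lookup = dataframe[parent]

    result = dataframe.copy()
    current = parent
    level = 1
    while True:
        # Map every id in the current ancestor column to its own parent.
        ancestors = result[current].map(lookup)
        if ancestors.isna().all():
            break  # no row has an ancestor at this level, so stop adding columns
        current = f"{parent}_{level}"
        result[current] = ancestors
        level += 1
    return result


df = pd.DataFrame(
    {"employee": list(range(8)), "manager": [None, 0, 0, 0, 1, 1, 2, 6]}
)
print(flatten_sketch(df, "manager", right_on="employee"))
print(flatten_sketch(df.set_index("employee"), "manager"))
```

With this approach the number of added columns follows from the longest chain in the data: employee 7 has three ancestors (6, 2 and 0), which yields manager, manager_1 and manager_2.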
pandas_extras.hierarchy.get_adjacency_list_depth(dataframe, parent, right_on=None, new_column='depth')
Calculates the depth of each node in the adjacency list hierarchy. Root nodes (rows without a parent) have depth 0.
>>> import pandas as pd
>>> from pandas_extras.hierarchy import get_adjacency_list_depth
>>> df = pd.DataFrame([
...     {'employee': 0, 'manager': None},
...     {'employee': 1, 'manager': 0},
...     {'employee': 2, 'manager': 0},
...     {'employee': 3, 'manager': 0},
...     {'employee': 4, 'manager': 1},
...     {'employee': 5, 'manager': 1},
...     {'employee': 6, 'manager': 2},
...     {'employee': 7, 'manager': 6},
... ])
>>> df.pipe(get_adjacency_list_depth, 'manager', right_on='employee')
   employee  manager  depth
0         0      NaN      0
1         1        0      1
2         2        0      1
3         3        0      1
4         4        1      2
5         5        1      2
6         6        2      2
7         7        6      3
>>> df.set_index('employee').pipe(
...     get_adjacency_list_depth, 'manager', new_column='level'
... )
          manager  level
employee
0             NaN      0
1               0      1
2               0      1
3               0      1
4               1      2
5               1      2
6               2      2
7               6      3
Parameters:
- dataframe (DataFrame) – The DataFrame object to work on.
- parent (str) – The name of the column that contains the parent id.
- right_on (str) – Name of the primary key column. If not given, the index is used.
- new_column (str) – Name of the new column to be created. Defaults to 'depth'.
Returns: The DataFrame with the depth column (named by new_column) added
Return type: DataFrame
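The depth values in the examples equal the number of ancestors above each row. A minimal sketch of that count with plain pandas: walk every row up one level at a time and add 1 while an ancestor still exists. depth_sketch is a hypothetical name and this is not the pandas_extras implementation; it assumes unique ids and an acyclic hierarchy.

```python
from typing import Optional

import pandas as pd


def depth_sketch(
    dataframe: pd.DataFrame,
    parent: str,
    right_on: Optional[str] = None,
    new_column: str = "depth",
) -> pd.DataFrame:
    """Hypothetical illustration only; not the pandas_extras implementation.

    Assumes unique ids and an acyclic hierarchy.
    """
    # child id -> parent id, keyed by the right_on column or by the index
    if right_on is not None:
        lookup = dataframe.set_index(right_on)[parent]
    else:
        lookup = dataframe[parent]

    depth = pd.Series(0, index=dataframe.index)
    ancestor = dataframe[parent]
    # Each pass adds 1 for every row that still has an ancestor, then climbs one level.
    while ancestor.notna().any():
        depth += ancestor.notna().astype(int)
        ancestor = ancestor.map(lookup)

    result = dataframe.copy()
    result[new_column] = depth
    return result


df = pd.DataFrame(
    {"employee": list(range(8)), "manager": [None, 0, 0, 0, 1, 1, 2, 6]}
)
print(depth_sketch(df, "manager", right_on="employee"))
print(depth_sketch(df.set_index("employee"), "manager", new_column="level"))
```

Because each pass maps the whole column at once, the loop runs once per level of the hierarchy rather than once per row.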