Transformations module
Contains functions that help transform column data containing complex types, such as lists or dictionaries.

pandas_extras.transformations.concatenate_columns(dataframe, columns, new_column, descriptor=None, mapper=None)
Concatenates columns together along the index and, if a descriptor is specified, adds a descriptor column holding the name of the column each value originates from.
>>> import pandas as pd
>>> from pandas_extras.transformations import concatenate_columns
>>> df = pd.DataFrame([
...     {'key': 'TICKET-1', 'assignee': 'Bob', 'reporter': 'Alice'},
...     {'key': 'TICKET-2', 'assignee': 'Bob', 'reporter': 'Alice'},
...     {'key': 'TICKET-3', 'assignee': 'Bob', 'reporter': 'Alice'},
... ])
>>> df.pipe(concatenate_columns, ['assignee', 'reporter'], 'user', descriptor='descriptor')
          key     user  descriptor
0  'TICKET-1'  'Alice'  'reporter'
0  'TICKET-1'    'Bob'  'assignee'
1  'TICKET-2'  'Alice'  'reporter'
1  'TICKET-2'    'Bob'  'assignee'
2  'TICKET-3'  'Alice'  'reporter'
2  'TICKET-3'    'Bob'  'assignee'
Parameters:
- dataframe (DataFrame) – The DataFrame object to work on.
- columns – The names of the columns which should be concatenated.
- new_column – Name of the new column.
- descriptor – Name of the new descriptor column.
- mapper – A map to apply to the descriptor values.
Returns: The concatenated DataFrame
Return type: DataFrame
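
The reshaping described above can be approximated with plain pandas. The sketch below is not the pandas_extras implementation; it uses `DataFrame.melt` as a stand-in, and the variable names and the label dictionary (playing the role of `mapper`) are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame([
    {"key": "TICKET-1", "assignee": "Bob", "reporter": "Alice"},
    {"key": "TICKET-2", "assignee": "Bob", "reporter": "Alice"},
])

# Stack the two source columns into one "user" column and record the
# originating column name in "descriptor". ignore_index=False keeps the
# original row labels, so each label appears once per source column.
stacked = df.melt(
    id_vars=["key"],
    value_vars=["assignee", "reporter"],
    var_name="descriptor",
    value_name="user",
    ignore_index=False,
).sort_index(kind="mergesort")  # stable sort keeps assignee before reporter

# Hypothetical analogue of the mapper argument: relabel descriptor values.
labels = {"assignee": "Assignee", "reporter": "Reporter"}
stacked["descriptor"] = stacked["descriptor"].map(labels)

print(stacked)
```

Note that the row labels are repeated in the result, just as in the documented output, which matters for any later index-based operation.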

pandas_extras.transformations.expand_list(dataframe, column, new_column=None)
Expands lists to new rows.
>>> from pandas import DataFrame
>>> from pandas_extras.transformations import expand_list
>>> df = DataFrame({
...     'trial_num': [1, 2, 3, 1, 2, 3],
...     'subject': [1, 1, 1, 2, 2, 2],
...     'samples': [
...         [1, 2, 3, 4],
...         [1, 2, 3],
...         [1, 2],
...         [1],
...         [],
...         None,
...     ]
... })
>>> df.pipe(expand_list, 'samples', new_column='sample_id').head(7)
   trial_num  subject  sample_id
0          1        1          1
0          1        1          2
0          1        1          3
0          1        1          4
1          2        1          1
1          2        1          2
1          2        1          3
Warning
Between calls of expand_list and/or expand_lists, duplicated index values in the dataframe must be removed; otherwise many duplicated rows will occur.
Warning
Calling expand_list on multiple columns may duplicate data; such duplicates must be handled by the caller.
Parameters:
- dataframe (DataFrame) – The DataFrame object to work on.
- column – The name of the column which should be expanded.
- new_column – Name of the new column. If not defined, the column will not be renamed.
Returns: The expanded DataFrame
Return type: DataFrame
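
To illustrate the first warning, here is a minimal plain-pandas sketch. It uses `DataFrame.explode` as a stand-in for expand_list (an assumption; the library's handling of empty lists and None may differ) and shows one way to make the index unique again before a further expansion.

```python
import pandas as pd

df = pd.DataFrame({
    "trial_num": [1, 2, 3],
    "samples": [[1, 2, 3], [1], []],
})

# One row per list element; the original row label is repeated for every
# element. Note that explode keeps an empty list as a single missing value.
expanded = df.explode("samples").rename(columns={"samples": "sample_id"})
print(expanded.index.is_unique)       # False: labels 0, 0, 0, 1, 2

# Resetting the index removes the duplicated labels.
deduplicated = expanded.reset_index(drop=True)
print(deduplicated.index.is_unique)   # True
```

Resetting the index is only one option; any approach that leaves the index unique before the next call serves the same purpose.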

pandas_extras.transformations.expand_lists(dataframe, columns, new_columns=None)
Expands multiple lists to new rows. Elements of the lists are paired by their position, and shorter lists are padded with None to the length of the longest list.
>>> from pandas import DataFrame
>>> from pandas_extras.transformations import expand_lists
>>> df = DataFrame({
...     'trial_num': [1, 2, 3, 1, 2, 3],
...     'subject': [1, 1, 1, 2, 2, 2],
...     'samples': [
...         [1, 2, 3, 4],
...         [1, 2, 3],
...         [1, 2],
...         [1],
...         [],
...         None,
...     ],
...     'samples2': [
...         [1, 2],
...         [1, 2, 3],
...         [1, 2],
...         [1],
...         [],
...         None,
...     ]
... })
>>> df.pipe(
...     expand_lists, ['samples', 'samples2'], new_columns=['sample_id', 'sample_id2']
... ).head(7)
   trial_num  subject  sample_id  sample_id2
0          1        1          1           1
0          1        1          2           2
0          1        1          3         NaN
0          1        1          4         NaN
1          2        1          1           1
1          2        1          2           2
1          2        1          3           3
Warning
Between calls of expand_list and/or expand_lists, duplicated index values in the dataframe must be removed; otherwise many duplicated rows will occur.
Warning
Calling expand_lists on multiple columns may duplicate data; such duplicates must be handled by the caller.
Parameters:
- dataframe (DataFrame) – The DataFrame object to work on.
- columns – The names of the columns which should be expanded.
- new_columns – Names of the new columns. If not defined, the columns will not be renamed.
Returns: The expanded DataFrame
Return type: DataFrame
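
The pairing and padding rule can be sketched without the library. The example below is an illustration only, not the pandas_extras implementation: it pairs elements by position with `itertools.zip_longest`, which pads the shorter list with None. Unlike the documented output, it builds a fresh, unique index.

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame({
    "trial_num": [1, 2],
    "samples": [[1, 2, 3, 4], [1, 2]],
    "samples2": [[1, 2], [1, 2, 3]],
})

# For every source row, emit one record per position in the longest list.
# "or []" treats a missing list (None) as empty, an assumption of this sketch.
records = [
    {"trial_num": trial, "sample_id": first, "sample_id2": second}
    for trial, xs, ys in zip(df["trial_num"], df["samples"], df["samples2"])
    for first, second in zip_longest(xs or [], ys or [])
]
expanded = pd.DataFrame(records)
print(expanded)
```

The first source row yields four records and the second yields three, with the padded positions showing up as missing values.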

pandas_extras.transformations.extract_dict_key(dataframe, column, key, new_column=None, separator='.')
Extracts the values of key into new_column. If the key is missing, None is added to the column.
>>> from pandas import DataFrame
>>> from pandas_extras.transformations import extract_dict_key
>>> df = DataFrame({
...     'trial_num': [1, 2, 1, 2],
...     'subject': [1, 1, 2, 2],
...     'samples': [
...         {'A': 1, 'B': 2, 'C': None},
...         {'A': 3, 'B': 4, 'C': 5},
...         {'A': 6, 'B': 7, 'C': None},
...         None,
...     ]
... })
>>> df.pipe(extract_dict_key, 'samples', key='A')
   trial_num  subject  samples.A                      samples
0          1        1          1  {'A': 1, 'B': 2, 'C': None}
1          2        1          3     {'A': 3, 'B': 4, 'C': 5}
2          1        2          6  {'A': 6, 'B': 7, 'C': None}
3          2        2        NaN                          NaN
Parameters:
- dataframe (DataFrame) – The DataFrame object to work on.
- column (str) – The name of the column which should be extracted.
- key (str) – Key that should be extracted.
- new_column (str) – Name of the new column. By default, column will be applied as prefix to key.
- separator (str) – The separator between column and key if new_column is not specified.
Returns: The extracted DataFrame
Return type: DataFrame
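
The default naming rule (column, then separator, then key) and the handling of missing keys can be sketched in plain pandas. This is an approximation under stated assumptions, not the library's code; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "trial_num": [1, 2, 3],
    "samples": [{"A": 1, "B": 2}, {"B": 4}, None],
})

column, key, separator = "samples", "A", "."

# Default name when new_column is not given: "<column><separator><key>".
new_column = f"{column}{separator}{key}"

# dict.get returns None for a missing key; non-dict cells also map to None.
df[new_column] = df[column].apply(
    lambda value: value.get(key) if isinstance(value, dict) else None
)
print(df)
```

As in the documented example, the source column is left in place in this sketch.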

pandas_extras.transformations.extract_dictionary(dataframe, column, key_list=None, prefix=None, separator='.')
Extracts the values of the keys in key_list into separate columns.
>>> from pandas import DataFrame
>>> from pandas_extras.transformations import extract_dictionary
>>> df = DataFrame({
...     'trial_num': [1, 2, 1, 2],
...     'subject': [1, 1, 2, 2],
...     'samples': [
...         {'A': 1, 'B': 2, 'C': None},
...         {'A': 3, 'B': 4, 'C': 5},
...         {'A': 6, 'B': 7, 'C': None},
...         None,
...     ]
... })
>>> df.pipe(extract_dictionary, 'samples', key_list=('A', 'B'))
   trial_num  subject  samples.A  samples.B
0          1        1          1          2
1          2        1          3          4
2          1        2          6          7
3          2        2        NaN        NaN
Warning
column will be dropped from the DataFrame.
Parameters:
- dataframe (DataFrame) – The DataFrame object to work on.
- column (str) – The name of the column which should be extracted.
- key_list (list) – Collection of keys that should be extracted. The new column names will be created from the key names.
- prefix (str) – Prefix for new column names. By default, column will be applied as prefix.
- separator (str) – The separator between the prefix and the key name for new column names.
Returns: The extracted DataFrame
Return type: DataFrame
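
How prefix, separator and key_list combine into column names, and the effect of dropping the source column, can be sketched as follows. The code is a plain-pandas approximation rather than the library's implementation, and the chosen prefix and separator values are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({
    "trial_num": [1, 2],
    "samples": [{"A": 1, "B": 2, "C": None}, None],
})

column, key_list, prefix, separator = "samples", ("A", "B"), "s", "_"

# One new column per key, named "<prefix><separator><key>".
for key in key_list:
    df[f"{prefix}{separator}{key}"] = [
        value.get(key) if isinstance(value, dict) else None
        for value in df[column]
    ]

# Mirror the documented warning: the source column is dropped afterwards.
df = df.drop(columns=[column])
print(df)
```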

pandas_extras.transformations.merge_columns(dataframe, col_header_list, new_column_name, keep=None, aggr=None)
Adds a new column called new_column_name to dataframe, or modifies an existing one. For each row, either selects the first or last non-null element (as chosen by keep) from the values of the col_header_list columns, or calls the aggr function with those values. Only one of keep and aggr can be specified.
Parameters:
- dataframe – The pandas.DataFrame object to modify.
- col_header_list – List of the names of the columns to merge.
- new_column_name (str) – The name of the new column. If it already exists, the operation overwrites it.
- keep (str) – Specifies whether the first or the last non-null value is kept. Accepted values: 'first' and 'last'.
- aggr – Callable which receives the values of the col_header_list columns as its parameter. Its return value becomes the value in new_column_name.
Returns: The merged DataFrame
Return type: DataFrame
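
The two modes described above can be sketched in plain pandas. This is an illustration, not the pandas_extras implementation: the helper `first_notnull`, the column names, and the assumption that the aggregating callable receives the row values as a single list are all hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "home_phone": ["111", None, None],
    "work_phone": ["222", "333", None],
})
cols = ["home_phone", "work_phone"]


def first_notnull(values):
    """Return the first non-null item of values, or None if all are null."""
    for value in values:
        if pd.notnull(value):
            return value
    return None


# Analogue of keep='first': scan the row values left to right.
df["phone_first"] = df[cols].apply(lambda row: first_notnull(list(row)), axis=1)

# Analogue of keep='last': scan the same values right to left.
df["phone_last"] = df[cols].apply(
    lambda row: first_notnull(list(row)[::-1]), axis=1
)


# Analogue of aggr: a callable that receives the row values and returns
# the value to store in the new column.
def join_phones(values):
    return "/".join(value for value in values if pd.notnull(value))


df["phone_all"] = df[cols].apply(lambda row: join_phones(list(row)), axis=1)
print(df)
```

A row whose source columns are all null yields a missing value in the keep-style columns and an empty string from this particular aggregating function.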