Transformations module

Contains functions that help transform columns containing complex types, such as lists or dictionaries.

pandas_extras.transformations.concatenate_columns(dataframe, columns, new_column, descriptor=None, mapper=None)[source]

Concatenates columns together along the index and, if a descriptor is specified, adds a descriptor column holding the name of the column each value originates from.

>>> df = pd.DataFrame([
...     {'key': 'TICKET-1', 'assignee': 'Bob', 'reporter': 'Alice'},
...     {'key': 'TICKET-2', 'assignee': 'Bob', 'reporter': 'Alice'},
...     {'key': 'TICKET-3', 'assignee': 'Bob', 'reporter': 'Alice'},
... ])
>>> df.pipe(concatenate_columns, ['assignee', 'reporter'], 'user', descriptor='descriptor')
    key           user        descriptor
0   'TICKET-1'    'Alice'     'reporter'
0   'TICKET-1'    'Bob'       'assignee'
1   'TICKET-2'    'Alice'     'reporter'
1   'TICKET-2'    'Bob'       'assignee'
2   'TICKET-3'    'Alice'     'reporter'
2   'TICKET-3'    'Bob'       'assignee'
Parameters:
  • dataframe (DataFrame) – The DataFrame object to work on.
  • columns – The names of the columns which should be concatenated.
  • new_column – Name of the new column.
  • descriptor – Name of the new descriptor column.
  • mapper – A map to apply to the descriptor values.
Returns:

The concatenated DataFrame

Return type:

DataFrame
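
The reshaping described above can be approximated with plain pandas. The following is a minimal sketch, not the library's implementation: the helper name concatenate_columns_sketch and the descriptor name 'role' are hypothetical, and treating mapper as something accepted by Series.map (a dict here) is an assumption.

```python
import pandas as pd


def concatenate_columns_sketch(dataframe, columns, new_column,
                               descriptor=None, mapper=None):
    """Hypothetical stand-in built only from standard pandas calls."""
    others = dataframe.drop(columns=columns)
    pieces = []
    for name in columns:
        # One copy of the remaining columns per source column.
        piece = others.copy()
        piece[new_column] = dataframe[name]
        if descriptor is not None:
            # Record which source column each value came from.
            piece[descriptor] = name
        pieces.append(piece)
    # Stack along the index; a stable sort keeps rows of one index together.
    result = pd.concat(pieces).sort_index(kind='mergesort')
    if descriptor is not None and mapper is not None:
        # Assumption: mapper is a dict or callable accepted by Series.map.
        result[descriptor] = result[descriptor].map(mapper)
    return result


df = pd.DataFrame([
    {'key': 'TICKET-1', 'assignee': 'Bob', 'reporter': 'Alice'},
    {'key': 'TICKET-2', 'assignee': 'Bob', 'reporter': 'Alice'},
])
result = concatenate_columns_sketch(
    df, ['assignee', 'reporter'], 'user',
    descriptor='role',
    mapper={'assignee': 'Assignee', 'reporter': 'Reporter'},
)
print(result)
```

Each original row appears once per concatenated column, so the index values repeat in the result.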

pandas_extras.transformations.expand_list(dataframe, column, new_column=None)[source]

Expands lists to new rows.

>>> df = DataFrame({
...     'trial_num': [1, 2, 3, 1, 2, 3],
...     'subject': [1, 1, 1, 2, 2, 2],
...     'samples': [
...         [1, 2, 3, 4],
...         [1, 2, 3],
...         [1, 2],
...         [1],
...         [],
...         None,
...     ]
... })
>>> df.pipe(expand_list, 'samples', new_column='sample_id').head(7)
    trial_num  subject  sample_id
0           1        1          1
0           1        1          2
0           1        1          3
0           1        1          4
1           2        1          1
1           2        1          2
1           2        1          3

Warning

Between calls of expand_list and/or expand_lists, duplicated index values must be removed from the DataFrame (for example by resetting the index); otherwise the next call will produce many duplicated rows.

Warning

Calling expand_list on multiple columns might duplicate data; the caller has to handle these duplicates.

Parameters:
  • dataframe (DataFrame) – The DataFrame object to work on.
  • column – The name of the column which should be expanded.
  • new_column – Name of the new column. If not defined, the column will not be renamed.
Returns:

The expanded DataFrame

Return type:

DataFrame
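
The first warning can be illustrated with plain pandas. The sketch below assumes that the expanded values are attached back to the frame through an index-based join; this page does not confirm how expand_list is implemented, and expand_by_index_join is a hypothetical stand-in.

```python
import pandas as pd


def expand_by_index_join(frame, column):
    """Hypothetical stand-in: explode one list column, join back on the index."""
    exploded = frame[column].explode()
    return frame.drop(columns=[column]).join(exploded)


df = pd.DataFrame({'a': [[1, 2]], 'b': [[10, 20]]})
first = expand_by_index_join(df, 'a')  # both rows now carry index 0

# Index duplicates left in place: every row labelled 0 matches every
# exploded value labelled 0, so the rows multiply.
without_reset = expand_by_index_join(first, 'b')

# Index duplicates removed first: each row matches only its own values.
with_reset = expand_by_index_join(first.reset_index(drop=True), 'b')

print(len(without_reset), len(with_reset))
```

Under this assumption, resetting the index halves the row count. The rows that remain are the combinations of a and b values, which is presumably the kind of duplication the second warning refers to.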

pandas_extras.transformations.expand_lists(dataframe, columns, new_columns=None)[source]

Expands multiple lists to new rows. Pairs the list elements by position and pads the shorter lists with None up to the length of the longest list.

>>> df = DataFrame({
...     'trial_num': [1, 2, 3, 1, 2, 3],
...     'subject': [1, 1, 1, 2, 2, 2],
...     'samples': [
...         [1, 2, 3, 4],
...         [1, 2, 3],
...         [1, 2],
...         [1],
...         [],
...         None,
...     ],
...     'samples2': [
...         [1, 2],
...         [1, 2, 3],
...         [1, 2],
...         [1],
...         [],
...         None,
...     ]
... })
>>> df.pipe(
...     expand_lists, ['samples', 'samples2'], new_columns=['sample_id', 'sample_id2']
... ).head(7)
    trial_num  subject  sample_id  sample_id2
0           1        1          1           1
0           1        1          2           2
0           1        1          3         NaN
0           1        1          4         NaN
1           2        1          1           1
1           2        1          2           2
1           2        1          3           3

Warning

Between calls of expand_list and/or expand_lists, duplicated index values must be removed from the DataFrame (for example by resetting the index); otherwise the next call will produce many duplicated rows.

Warning

Calling expand_lists on multiple columns might duplicate data; the caller has to handle these duplicates.

Parameters:
  • dataframe (DataFrame) – The DataFrame object to work on.
  • columns – The names of the columns which should be expanded.
  • new_columns – Names of the new columns. If not defined, the columns will not be renamed.
Returns:

The expanded DataFrame

Return type:

DataFrame
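
The pairing and padding rule for a single row can be sketched with the standard library. This only illustrates the rule stated above using itertools.zip_longest; whether expand_lists is built this way is not confirmed by this page.

```python
from itertools import zip_longest

import pandas as pd

samples = [1, 2, 3, 4]
samples2 = [1, 2]

# zip_longest pairs elements by position and fills the shorter list with None.
pairs = list(zip_longest(samples, samples2))

# pandas represents the None padding as a missing value in the new column.
paired = pd.DataFrame(pairs, columns=['sample_id', 'sample_id2'])
print(paired)
```

The two padded cells are why the example output above shows NaN in sample_id2 for the third and fourth elements.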

pandas_extras.transformations.extract_dict_key(dataframe, column, key, new_column=None, separator='.')[source]

Extract values of key into new_column. If key is missing, None is added to the column.

>>> df = DataFrame({
...    'trial_num': [1, 2, 1, 2],
...    'subject': [1, 1, 2, 2],
...    'samples': [
...        {'A': 1, 'B': 2, 'C': None},
...        {'A': 3, 'B': 4, 'C': 5},
...        {'A': 6, 'B': 7, 'C': None},
...        None,
...    ]
... })
>>> df.pipe(extract_dict_key, 'samples', key='A')
    trial_num  subject  samples.A                      samples
0           1        1          1  {'A': 1, 'B': 2, 'C': None}
1           2        1          3     {'A': 3, 'B': 4, 'C': 5}
2           1        2          6  {'A': 6, 'B': 7, 'C': None}
3           2        2        NaN                          NaN
Parameters:
  • dataframe (DataFrame) – The DataFrame object to work on.
  • column (str) – The name of the column which should be extracted.
  • key (str) – Key that should be extracted.
  • new_column (str) – Name of the new column. By default, column will be applied as prefix to key.
  • separator (str) – The separator between column and key if new_column is not specified.
Returns:

The extracted DataFrame

Return type:

DataFrame
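
The default naming and the handling of missing keys can be sketched with plain pandas. This is an illustration under assumptions, not the library's code: the variable names are hypothetical, and the sketch treats cells that are not dictionaries the same way as missing keys.

```python
import pandas as pd

df = pd.DataFrame({
    'subject': [1, 1, 2],
    'samples': [{'A': 1, 'B': 2}, {'B': 4}, None],
})

column, key, separator = 'samples', 'A', '.'
# Default naming described above: column, then separator, then key.
new_column = f'{column}{separator}{key}'

# dict.get returns None for a missing key; non-dict cells also yield None.
df[new_column] = df[column].apply(
    lambda cell: cell.get(key) if isinstance(cell, dict) else None
)
print(df)
```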

pandas_extras.transformations.extract_dictionary(dataframe, column, key_list=None, prefix=None, separator='.')[source]

Extract values of keys in key_list into separate columns.

>>> df = DataFrame({
...    'trial_num': [1, 2, 1, 2],
...    'subject': [1, 1, 2, 2],
...    'samples': [
...        {'A': 1, 'B': 2, 'C': None},
...        {'A': 3, 'B': 4, 'C': 5},
...        {'A': 6, 'B': 7, 'C': None},
...        None,
...    ]
... })
>>> df.pipe(extract_dictionary, 'samples', key_list=('A', 'B'))
    trial_num  subject  samples.A  samples.B
0           1        1          1          2
1           2        1          3          4
2           1        2          6          7
3           2        2        NaN        NaN

Warning

column will be dropped from the DataFrame.

Parameters:
  • dataframe (DataFrame) – The DataFrame object to work on.
  • column (str) – The name of the column which should be extracted.
  • key_list (list) – Collection of keys that should be extracted. The new column names will be created from the key names.
  • prefix (str) – Prefix for new column names. By default, column will be applied as prefix.
  • separator (str) – The separator between the prefix and the key name for new column names.
Returns:

The extracted DataFrame

Return type:

DataFrame
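
How prefix, separator and key_list combine into column names can be sketched with plain pandas. The loop below is a hypothetical equivalent written for illustration; it is not taken from the library.

```python
import pandas as pd

df = pd.DataFrame({
    'subject': [1, 1, 2],
    'samples': [
        {'A': 1, 'B': 2, 'C': None},
        {'A': 3, 'B': 4, 'C': 5},
        None,
    ],
})

column, key_list, separator = 'samples', ('A', 'B'), '.'
prefix = column  # default described above: the column name is the prefix

# The source column is dropped, matching the warning above.
extracted = df.drop(columns=[column])
for key in key_list:
    # Bind key as a default argument so each lambda keeps its own key.
    extracted[f'{prefix}{separator}{key}'] = df[column].apply(
        lambda cell, key=key: cell.get(key) if isinstance(cell, dict) else None
    )
print(extracted)
```

Keys that are not listed in key_list, such as C here, do not produce a column.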

pandas_extras.transformations.merge_columns(dataframe, col_header_list, new_column_name, keep=None, aggr=None)[source]

Adds a new column called new_column_name to dataframe, or modifies it if it already exists. For each row, one of two things happens: if keep is filled, the first or last non-null value among the col_header_list columns is selected; if aggr is filled, the aggr function is called with the values of the col_header_list columns. Only one of keep and aggr can be filled.

Parameters:
  • dataframe – The pandas.DataFrame object to modify.
  • col_header_list – List of the names of the columns to merge.
  • new_column_name (str) – The name of the new column. If it already exists, the operation will overwrite it.
  • keep (str) – Specifies whether the first or the last non-null value is needed. Accepted values are the strings first and last.
  • aggr – Callable which will get the values of col_header_list as parameter. The return value of this function will be the value in new_column_name.
Returns:

The merged DataFrame

Return type:

DataFrame
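
The two modes can be sketched with plain pandas. The helper merge_columns_sketch is a hypothetical stand-in, not the library's implementation; it assumes that exactly one of keep and aggr must be given and that aggr receives the row's values as a list, neither of which this page confirms.

```python
import pandas as pd


def merge_columns_sketch(dataframe, col_header_list, new_column_name,
                         keep=None, aggr=None):
    """Hypothetical stand-in illustrating the keep and aggr modes."""
    if (keep is None) == (aggr is None):
        # Assumption: exactly one of keep and aggr has to be given.
        raise ValueError('fill exactly one of keep and aggr')
    if keep is not None and keep not in ('first', 'last'):
        raise ValueError("keep must be 'first' or 'last'")

    def merge_row(row):
        values = list(row)
        if aggr is not None:
            # Assumption: aggr receives the row's values as a list.
            return aggr(values)
        present = [value for value in values if pd.notnull(value)]
        if not present:
            return None
        return present[0] if keep == 'first' else present[-1]

    result = dataframe.copy()
    result[new_column_name] = dataframe[col_header_list].apply(merge_row, axis=1)
    return result


df = pd.DataFrame({
    'home_phone': ['111', None, None],
    'work_phone': ['222', '333', None],
})
phones = ['home_phone', 'work_phone']
first = merge_columns_sketch(df, phones, 'phone', keep='first')
last = merge_columns_sketch(df, phones, 'phone', keep='last')
counted = merge_columns_sketch(
    df, phones, 'phone_count',
    aggr=lambda values: sum(pd.notnull(value) for value in values),
)
print(first, last, counted, sep='\n\n')
```

With keep, rows whose listed columns are all null get a missing value in the new column in this sketch; with aggr, the callable decides what to return for such rows.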