Python Script
The Python script operator is a built-in notebook for writing Python snippets that manipulate data with Pandas (learn more about Pandas here). It gives you full access to a Python runtime environment. To write a snippet, drag the Python script operator onto the canvas, add your code in the available space, and run it by pressing the play button in the bottom left. The resulting dataframe is output in the bottom right.
Using Pandas
Pandas is automatically available for use in the Python operator. Invoke Pandas functions using the identifier pd.
Working With One Dataframe
The input dataframe is accessible using the name df. You can then manipulate the dataframe using normal Pandas syntax, e.g. df['time']. The output dataframe is df after any changes, so there is no need to return anything.
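As a minimal sketch of this pattern, the snippet below parses a time column and derives an hour column from it. Inside the operator, pd and df are already provided; the import and the sample dataframe here are stand-ins so the snippet runs on its own, and the hour column name and sample values are assumptions made for illustration.

```python
import pandas as pd  # stand-in: `pd` is already available inside the operator

# Stand-in for the input dataframe; inside the operator, `df` is provided.
# The sample values are made up for illustration.
df = pd.DataFrame({"time": ["2023-01-01 09:30:00", "2023-01-01 14:05:00"]})

# Parse the strings into timestamps using a Pandas function via `pd`.
df["time"] = pd.to_datetime(df["time"])

# Derive a new column (hypothetical name "hour") from the parsed timestamps.
df["hour"] = df["time"].dt.hour

# No return is needed: the modified `df` is the output.
```

Because the changes are applied to df itself, the operator's output would contain both the parsed time column and the new hour column.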
In the following example, we use the Python operator to add a column named large city to our dataframe to indicate whether a city has a population of more than 7000.
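A sketch of what such a snippet might look like is shown here. The column names city and population and the sample rows are assumptions, since the input data is not described; inside the operator only the last statement would be needed, because pd and df are supplied automatically.

```python
import pandas as pd  # stand-in: `pd` is already available inside the operator

# Stand-in for the input dataframe; inside the operator, `df` is provided.
# Column names "city" and "population" are assumed for illustration.
df = pd.DataFrame({
    "city": ["Springfield", "Shelbyville", "Ogdenville"],
    "population": [30000, 6500, 7000],
})

# Flag cities whose population is strictly greater than 7000.
df["large city"] = df["population"] > 7000
```

The comparison is strict, so a city with exactly 7000 inhabitants is not flagged as large.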
Working With Two Dataframes
To access the input dataframes, use dfs[0] and dfs[1]. As there are now two dataframes involved, a return dataframe must be specified to produce an output.
In the following example, we use the Python operator to combine two dataframes and compute a new column, Score_change, based on their respective Score columns.
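One way this could be written is sketched below. The shared key column Name, the suffixes, and the sample rows are assumptions. The logic is wrapped in a hypothetical function, combine_scores, only so that the sketch runs standalone; the exact form in which the operator expects the returned dataframe is not shown here.

```python
import pandas as pd  # stand-in: `pd` is already available inside the operator


def combine_scores(dfs):
    """Hypothetical helper: join two dataframes and compute the score change."""
    # Join on an assumed shared key column, "Name". The suffixes keep the
    # two "Score" columns apart after the merge.
    merged = dfs[0].merge(dfs[1], on="Name", suffixes=("_before", "_after"))
    # Change in score from the first dataframe to the second.
    merged["Score_change"] = merged["Score_after"] - merged["Score_before"]
    return merged


# Stand-ins for the two input dataframes; inside the operator, `dfs` is provided.
dfs = [
    pd.DataFrame({"Name": ["Ann", "Bo"], "Score": [70, 85]}),
    pd.DataFrame({"Name": ["Ann", "Bo"], "Score": [75, 80]}),
]

result = combine_scores(dfs)
```

A merge on a key column is used here rather than a positional join so that rows are matched by name even if the two inputs are ordered differently.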