The expression operator, as its name suggests, allows you to apply a series of expressions to a dataframe. Each expression produces a new column according to the formula you specify.
Expressions use a syntax very similar to Python 3, explained below in more detail. While any valid Python 3 formula is supported, using the set of specified operations and functions on that page will allow Einblick to translate an expression into a highly optimized, performant form.
Multiple expressions can be used and chained together within a single expression operator. An example expression operator is shown below.
The expression operator supports a syntax generally equivalent to Python 3. Within this syntax, there are a number of supported operations and functions that can be heavily optimized, resulting in greatly improved performance (for further information, consult the
gandiva::Expression specification in Gandiva, an LLVM component of Apache Arrow.) By adhering to this specification, Einblick will be able to translate your expressions into an optimized version. If the expression cannot be translated, it will be evaluated using a standard Python 3 interpreter.
Each column can be used directly with its name, e.g.
age / 10. However, for columns containing whitespace, e.g.
Commercial service exports (current US dollar), enclose the column name in brackets (e.g.
[Commercial service exports (current US dollar )]). If it already contains the characters
], escape the name by using double brackets.
**, which only work for numerical types, with the exception of
+, which can also be used to add an integer number of days to a timestamp or concatenate two strings.
<=, which work for all types. Note that when you want to compare two timestamps, you must use a string literal, e.g.
Year >= '2008-01-01 04:32:00' and Year <= '2014-01-01'.
Python ternary statements are supported, e.g.
"nice" if is_nice else "not nice".
random()which returns a random double between
[0, 1](with uniform distribution), e.g.
random() < 0.2
max(3,4,5), which can take any number of inputs (except 0 or 1).
name.upper() == 'Shang'
likewhich applies wild-card matching on the string, e.g.,
lengthwhich returns the length of a string, e.g.,
concatwhich concatenates two strings together, e.g.,
first_name.concat(last_name)(Can also be done with
substrwhich returns the substring, e.g.,
name.substr(2,1)(offset is 1)
containswhich returns whether a string contains a given substring
name.containsAny('of', 'de'). These can accept any number of arguments except zero.
replacewhich replaces all instances of one or more substrings with another string, e.g.,
sex.replace('Unknown', 'Not Specified'). Note that if multiple substrings are replaced, the operations will be applied one after another;
locatewhich finds a substring within a string, e.g.,
name.locate(' '). Pass in an integer to start the search at that character (e.g.,
castFLOAT4which casts numerical types (e.g., integers) to 32-bit float
castFLOAT8which casts numerical types (e.g., integers) to 64-bit float
castTIMESTAMPwhich casts a string to a time stamp
extractDoyextracts the day of the year
extractDowextracts the day of the week