Skip to main content

Expression

The expression operator, as its name suggests, allows you to apply a series of expressions to a dataframe. Each expression produces a new column according to the formula you specify.

Expressions use a syntax very similar to Python 3, explained below in more detail. While any valid Python 3 formula is supported, using the set of specified operations and functions on that page will allow Einblick to translate an expression into a highly optimized, performant form.

Multiple expressions can be used and chained together within a single expression operator. An example expression operator is shown below.

Example expression

Expression Syntax

The expression operator supports a syntax generally equivalent to Python 3. Within this syntax, there are a number of supported operations and functions that can be heavily optimized, resulting in greatly improved performance (for further information, consult the gandiva::Expression specification in Gandiva, an LLVM component of Apache Arrow.) By adhering to this specification, Einblick will be able to translate your expressions into an optimized version. If the expression cannot be translated, it will be evaluated using a standard Python 3 interpreter.

Column Names

Each column can be used directly with its name, e.g. age / 10. However, for columns containing whitespace, e.g. Commercial service exports (current US dollar), enclose the column name in brackets (e.g. [Commercial service exports (current US dollar )]). If it already contains the characters [ or ], escape the name by using double brackets.

Arithmetic Operations

These include +, -, *, /, and **, which only work for numerical types, with the exception of +, which can also be used to add an integer number of days to a timestamp or concatenate two strings.

Comparison Operations

These include >, <, >=, and <=, which work for all types. Note that when you want to compare two timestamps, you must use a string literal, e.g. Year >= '2008-01-01 04:32:00' and Year <= '2014-01-01'.

Bitwise Operations

These include

  • & (bitwise and)
  • | (bitwise or)
  • ~ (bitwise not)

Ternary Operations

Python ternary statements are supported, e.g. "nice" if is_nice else "not nice".

Mathematical Functions

These include:

  • pow, e.g. pow(5, 3)
  • log, log10, e.g., log(age) or log([age])
  • abs, e.g., abs(-1)
  • ceil, floor, round, e.g., ceil(5.5)
  • random() which returns a random double between [0, 1] (with uniform distribution), e.g. random() < 0.2
  • min and max, e.g., max(3,4,5), which can take any number of inputs (except 0 or 1).

String Functions

These include:

  • upper, lower, e.g., name.upper() == 'Shang'
  • like which applies wild-card matching on the string, e.g., month.like('%uary%')
  • startsWith, endsWith, e.g., name.startsWith('Shang')
  • length which returns the length of a string, e.g., name.length()
  • concat which concatenates two strings together, e.g., first_name.concat(last_name) (Can also be done with +)
  • substr which returns the substring, e.g., name.substr(1) or name.substr(2,1) (offset is 1)
  • contains which returns whether a string contains a given substring
  • containsAny and containsAll, e.g., name.containsAny('of', 'de'). These can accept any number of arguments except zero.
  • replace which replaces all instances of one or more substrings with another string, e.g., sex.replace('Unknown', 'Not Specified'). Note that if multiple substrings are replaced, the operations will be applied one after another; 'example'.replace('pl','amin','in') will become 'exine', not 'examine'.
  • locate which finds a substring within a string, e.g., name.locate(' '). Pass in an integer to start the search at that character (e.g., name.locate(' ',5)).

Cast Functions

These include:

  • castFLOAT4 which casts numerical types (e.g., integers) to 32-bit float
  • castFLOAT8 which casts numerical types (e.g., integers) to 64-bit float
  • castTIMESTAMP which casts a string to a time stamp

Datetime Functions

These include

  • extractYear, extractMonth, extractWeek, extractDay, extractHour, extractMinute, extractSecond
  • extractDoy extracts the day of the year
  • extractDow extracts the day of the week
  • addYears, addQuarters, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds

Other Functions

  • isnull and isnotnull