Skip to main content

Transpose

The transpose operator is a powerful way to turn your data into features for analysis and modeling. For instance, imagine that you have a sales log for customers, or an error log for some type of monitoring service. Transpose is how we would create features for transactions that happened, by summarizing metrics about the event occurrences per unit of measure.

Transpose has three parts, and one optional parameter to set:

  • Column: This operator will take the most frequent distinct values of this selected column, and turn them into many columns.
    • If you select a "product" column, the resulting table will have "ProductA", "ProductB", "ProductC"... each filled with the value as specified in attributes below
  • Group By: These are the unique values that we want to group by, that we want to understand the aggregate information for
    • Perhaps we would want to use "Customer" as the Group By
  • Attributes: These are the values we want to aggregate over
    • We might select "SalesAmount" and "Sum" to get the sales per customer, with a separate column for each distinct product.
  • Rows to analyze / Warn: When the number of unique codes is very high, Einblick will use a subset of rows as a sample to determine what the right output columns are based on highest frequency. If the final most frequently occurring values of the "Column" are different from what we selected, we will either throw an error or warn you based on this parameter

Example

  • We have a transaction log of sales, by item and customer
  • We want to understand for each customer, how much money and how many times was every given SKU purchased
  • We use transpose to create a table with a column remaining for customer ID, and then a set of columns with a summarized total $ spent and count of occurrences that the given customer purchased for top selling SKUs.

Advanced Hint: By transforming a column into 1/0, and using max aggregation, you can create flags for whether a certain value existed or not - e.g. create a column for "bought product X or not" rather than "how much product X did they buy"