Skip to main content

Intro

Meta Select is a powerful tile inside Magic ETL that can perform any function of the Select Columns, Alter Columns, and Add Formula tiles without repetitive configuration. It can also adapt to schema changes over time without needing to be reconfigured. The Meta Select tile’s operations include, but are not limited to:
  • Renaming columns
  • Changing data types
  • Calculating new columns
Most of the tile’s required configuration comes from one of its two input tables, called the Configuration Input. The Meta Select tile interacts with its Configuration Input in precisely the same way as the Add Formula tile operates. The Add Formula tile has the shape of a table with two “columns”—the name and the formula—and each “row” is a column to be added. The rows of data in the Configuration Input are treated as descriptions of columns to calculate from the Meta Select tile’s other input table, the Data Input.
This article provides four examples of how to use the Meta Select tile, including in conjunction with the Get Schema tile. You can follow along with the examples in Magic ETL, referring to the provided graphs frequently and looking at the intermediate result tables of all the tiles. In each example, code blocks and images display the DataFlow and the tile configuration. Unless otherwise mentioned, all Meta Select tiles are configured to use the columns name, expr, and type from their Configuration Input, and all Get Schema tiles are configured with those same column names.

Example 1: Do Nothing With the Meta Select Tile

The first example of Meta Select’s capabilities will demonstrate it doing nothing at all. Rather, we will use the Get Schema tile to select every column from our original input unchanged.
The Get Schema tile exists primarily to support the Meta Select tile, but it does have other applications. The Get Schema tile accepts one input and uses it to create a table that describes that input’s columns or schema.
Note: For MySQL users, the Get Schema tile is comparable to querying INFORMATION_SCHEMA.
The Get Schema tile’s result table has three columns, which contain the following:
  1. Input column names
  2. Input column types
  3. The expressions or formulas used to select each column. These are the names in backquotes.
In this example, the output of the Get Schema tile becomes the Configuration Input of the Meta Select tile, and the same initial input to the Get Schema tile is also passed as input to the Meta Select tile. This pattern appears in almost all of our examples.
Here, the Get Schema tile’s output is unchanged, which causes the Meta Select tile to produce an output table identical to its Data Input. In the rest of the examples, we will add tiles that modify the Get Schema tile’s output, which will change the output for the Meta Select tile.

How the Meta Select Tile Interprets Its Configuration Input

This example focuses on how the Meta Select tile interprets its Configuration Input, which in this case is the output table of the Get Schema tile. The Meta Select tile’s Configuration Input requires two columns: one column containing the column names to be produced from the Data Input, and the other containing the expressions that will be used to calculate them. You can have a third, optional column containing the desired data types of the result columns. This image displays an example of a Get Schema output DataSet with these three columns.
You can see the relationship between the tiles this way: the Get Schema tile produces a “meta” table describing its input, and the Meta Select tile applies the same sort of “meta” table to its Data Input. In the next example, we’ll modify the Get Schema tile’s results before connecting it to the Meta Select tile.

Example 2: Rename, Remove, and Change Type

For our next example, let’s examine the following table. We have three goals:
  1. Currently, every column name has the odbc prefix, is in lowercase, and uses underscores instead of spaces. The goal — Reformat the column names. Example: odbc\_hired\_date becomes Hired Date.
  2. The current date columns include date in the column name, but each column’s data type is String. The goal — Convert these date columns to have a Date data type.
  3. Currently, there are many unwanted columns with names like odbc\_00012. The goal — Remove all of these columns from the DataSet.
It is possible to do all of these things with other tiles in Magic ETL, such as with a single Alter Columns tile. However, only the Meta Select tile allows us to do these tasks without spending as much time as there are columns to change. For example, if the input table had 400 columns instead of 4, configuring an Alter Columns tile to make all these changes would take about 100 times as long (assuming the web browser doesn’t crash in the process). The Meta Select tile uses the same configuration regardless of column count. Additionally, the Meta Select configuration will keep working even after a new column is added to the input DataSet, if the column meets one of our requirements. (The Alter Columns tile would require changing the DataFlow whenever the input schema changes.) We can complete our first two goals—changing the column names and changing the data type—by adding an Add Formula tile after the Get Schema tile. We’ll write a formula that restates the name column and another that restates type.
To reformat column names, use this formula: INITCAP (REPLACE (REPLACE (`name`, 'odbc_', '' ), '_', ' ' )) To change the data type, use this one: CASE WHEN STR_ENDS_WITH(`name`, 'Date' ) THEN 'DATE' ELSE `type` END
Note: Keep in mind that columns restated in an Add Formula tile are restated before the evaluation of subsequent formulas in the same tile. This is why we are looking for ‘Date’ instead of ‘_date’.
To complete our third goal—removing the columns whose names began with ‘odbc_0’ —we’ll use a Filter Rows tile. (After using the Add Formula tile, the name begins with only 0.) Placing the Filter Rows tile after the Add Formula tile and before the Meta Select tile, the Filter Rows tile has just one expression condition: NOT STR_STARTS_WITH(`name`, '0' ) Let’s preview the DataSet, take a look at the result of the Filter Rows tile, and compare it with the result of the Get Schema tile, above. We have successfully completed our three goals:
  1. The name column now contains our desired names.
  2. The type column now contains the correct data types.
  3. The expr column contains valid references to the columns from the original input that correspond to our desired names.
Now that we have our configuration table, we can connect both our original input tile and the Filter Rows tile to the Meta Select tile. As you configure the Meta Selec tile, make sure that Filter Rows is selected as the Configuration Input while the input data tile is the Data Input. To see the results, run a preview of the DataFlow.

Example 3: Work With Expression

As you can see from the last example, configuring the Meta Select tile involves little work with the tile itself. The work happens upstream as we build the Configuration Input table. To review, the Meta Select tile’s Configuration Input table can be any table with at least two columns that can be mapped to these designations:
  • A names column (data type: STRING ) that contains the names of the columns to be created. It must not contain null or empty values.
  • An expressions column (data type: EXPRESSION ) that contains the expressions or formulas to be evaluated against each row of the input data to populate the Meta Select tile’s result rows. It must not contain null or empty values. These expressions must be valid under the Data Input’s schema. This means that they should only contain references to columns that exist in the Data Input.
Optionally, the Configuration Input table may contain a third column:
  • A types column (data type: STRING ) that contains the desired data types of the columns to be created. It must not contain null or empty values. Valid values are STRING, LONG, DOUBLE, DECIMAL, DATE, DATETIME, BOOLEAN, DURATION, and PERIOD.
You can name these columns anything, but in this tutorial, we use the names name, expr, and type.
Note: The Configuration Input can contain other columns, but the Meta Select tile will ignore them.
In the last example, we modified the name and type columns and filtered out some rows from the Configuration Input, which caused the Meta Select tile to rename, cast, and drop columns from its Data Input. In this example, we will be modifying the expr column. The expr column determines the contents of the columns, so modifying it will cause the Meta Select tile to modify the contents of the Data Input’s columns.

The Expression Type

Unlike the name and type columns, the expr column is not a STRING type, but an EXPRESSION. EXPRESSION is a data type introduced alongside the Meta Select and Get Schema tiles. An EXPRESSION is a description of how to compute a value given an input row, or how to compute a column given an input table. EXPRESSIONs appear in the Add Formula, Filter Rows, and Group By tiles. They also appear in a SQL SELECT query in multiple places.
Note: In the SQL query SELECT UPPER(name) AS name FROM t, we have one expression: UPPER(name).
The Get Schema tile provides us with a starting point for our expr column. The expressions in the Get Schema tile’s output are some of the simplest possible expressions: plain references to columns. Despite the simplicity, it is invaluable as a starting point because the primary way that we will construct new Expressions is by templating. Templating is the process of creating new expressions by combining existing expressions with a template. A template is an expression with placeholders in it where other expressions will be inserted. You can create a template with a function in the Add Formula tile, called EXPRESS(). Here is an example: EXPRESS(NULLIF (@ 1, '' ), `expr`) Here, the template is NULLIF(@1, ''). The placeholder is @1, and the expression to replace the placeholder is expr. When this formula is evaluated against a row where expr is, say, odbc\_first\_name, its result will be this expression: NULLIF (odbc\_first\_name, ”)
Tip: “Formula” and “expression” are interchangeable terms, but we use “formula” when referring to something evaluated in the Add Formula tile and “expression” when referring to a column or value of type EXPRESSION to be evaluated later by the Meta Select tile.
With this knowledge, we’re ready to proceed with the example itself.

Zero Every Null With the Meta Select Tile

Here we have an input data table with a variety of numeric columns (integers, floating point numbers, and decimals); all of them contain nulls. For our purposes, these nulls are unwanted and disruptive, and we would rather replace them with zeroes. We could accomplish this with a simple Add Formula tile, but by using the Meta Select tile, we avoid repeating ourselves across what might be dozens of columns and create a solution that is robust against input schema changes.
This graph is identical to the graph from Example #2, except that we don’t need the Filter Rows tile. The only tile we need to change is the Add Formula tile, which will consist of just one formula, which restates the expr column: CASE `type` WHEN 'LONG' THEN EXPRESS(IFNULL(@ 1, 0 ), `expr`) WHEN 'DOUBLE' THEN EXPRESS(IFNULL(@ 1, 0.0 ), `expr`) WHEN 'DECIMAL' THEN EXPRESS(IFNULL(@ 1, DECIMAL (0 )), `expr`) ELSE `expr` END This formula has four different strategies for generating an expression, and it picks a strategy based on the value of the type column. When the type is numeric, we use an expression template to create an IFNULL() expression with the appropriate zero-value for that type, and supply expr to the template to replace the placeholder. When the type is anything other than numeric, we fall back to plain expr, which will be a simple column reference.
Note: Using a different expression for each numerical type in this example is not strictly necessary. IFNULL(@1, 0) would work for all three types, due to union-type behavior: 0 is an integer, but a union of integers and floating point numbers results in a floating point number, and a union of integers and decimals results in a decimal.
We strongly recommend running a preview of the DataFlow to review the results, paying particular attention to the outputs of both the Add Formula and Meta Select tiles.

Example 4: One-Hot Encoding

This final example goes beyond modifying and reducing the output of a Get Schema tile to produce the Meta Select tile’s Configuration Input. This time, we generate additional Configuration Input rows and union them to the Get Schema tile in order to add a variable number of calculated columns to the final result. One-hot encoding is an important step in many statistical analyses. It begins with a “categorical column”, which is a name for a non-numeric column (usually a STRING ) with relatively few unique values. Country of Origin, Company Name, and Department are common examples. The goal is to transform the categorical column into multiple columns, as many as there are unique values. Each of these will be a simple true/false or 1/0 (BOOLEAN ). For each row, only one column will be true.
What is one-hot encoding? One-hot encoding is a method where categorical data is converted into a binary format. For each row, only one column is true or “hot,” as in “one-hot.”
Our approach involves branching the input in three directions.
  • The first branch goes to a Get Schema tile to produce the Meta Select tile configuration rows for selecting the original columns, unchanged.
  • The second branch generates new Meta Select tile configuration rows based on the unique values in the country\_code column.
  • The third branch goes directly to the Meta Select tile as its Data Input.
The first two branches are combined with an Append Rows tile to produce the completed Configuration Input for the Meta Select tile.
Let’s focus on the second branch, which begins with the Remove Duplicates tile. The purpose of the Remove Duplicates tile here is to reduce the input data to one row per unique value of country\_code. We configure the Remove Duplicates to use country\_code alone as the unique key. Next, we need to generate our three Configuration Input columns, using names that will match those from the Get Schema tile: name, expr, and type. We’ll use an Add Formula tile with the following formula for name: CONCAT ('country_code_', `country_code`)
For expr: EXPRESS(`country_code` = @ 1, EXPR_VALUE(`country_code`)) And for type: 'BOOLEAN'
The formulas for name and type don’t contain anything we haven’t seen before, but expr involves a new function: EXPR_VALUE(). Consider the result table at this point in the graph (Add Formula). Note how the reference to country\_code from the template has remained unchanged; the result expression still references that column on the left side of the equals. However, the placeholder has been populated with the value of country\_code in the Add Formula tile’s input. EXPR_VALUE() is another function that produces an expression, like EXPRESS(). It takes only one argument, which is evaluated immediately. This result is then converted into an expression literal. A literal is another fundamental type of expression, like a column reference. A literal is an expression that always evaluates to the same value. All of these expressions are literals: 'US', 0, 1.5, and NULL.
It doesn’t matter how the Append Rows tile is configured, but “Only include shared columns” might make the most sense (since this will reduce the table down to just the three columns that matter: name, expr, and type). As always, preview the whole flow and probe the output at every tile to ensure you understand what is happening at every stage.

Other Functions for Working With Expressions

Here are a few more functions for working with expressions that were not covered in any of these examples:
  1. EXPR_COLUMN() takes a single string argument and converts it to an expression that references the column with that name.
  2. EXPR_PARSE() converts a string to an expression by parsing it in the same manner that expressions are parsed anywhere else in Magic ETL, such as in the Add Formula or SQL Select tiles.
  3. EXPR_FORMAT() converts an expression to a string.
  4. EXPR_PUSH() is similar to EXPRESS(), except that its first argument is evaluated immediately and the result is expected to be an expression itself. This expression is then used as the template.
    Ask the Community: The Domo Community Forums are a great resource for asking questions and sharing ideas with Domo employees and fellow Domo users about the Meta Select and Get Schema tiles.

Known Limitations of the Meta Select Tile

Aggregate functions and window functions are not supported.