Skip to main content

Intro

There are two available tiles under Aggregate in Magic ETL:
  • Group By
  • Rank & Window
Aggregate_Tab.png
Learn how to create a Magic ETL DataFlow.

Group By Tile

Group_By.png
The Group By tile lets you aggregate values from multiple columns of data into one column. For example, you could calculate the average sales per product or find the number of yellow shirts in stock.

Configure Group By Tile

Follow these steps to configure the Group By tile:
  1. Inside Magic ETL, drag a Group By tile to the canvas from the Aggregate section in the left panel and connect it to your DataFlow. The tile editor opens below the canvas.
    group by tile on canvas.jpg
  2. (Optional) Rename the tile by selecting Edit in the tile editor, then enter the name you want.
  3. In the Configuration tab of the tile editor, select the column(s) to include in the grouping or add all columns.
    Screenshot 2024-03-05 at 12.17.41 PM.png
  4. Enter a name for the new aggregated column.
  5. Define how to aggregate the new column. Learn about the aggregation types below.
  6. (Optional) You can choose to add another new column or add a formula.
    1. If you choose to add a new column, repeat steps 4–5.
    2. If you choose to add a formula, you can enter the formula or choose Open the formula editor to use the AI SQL Assistant to create your formula. Validate your formula and save and close to return to the canvas.
      Use the AI SQL Assistant: The AI SQL Assistant is available in the expanded formula editor. Use the prompt field at the bottom of the Calculation field to enter your query in natural language. The SQL appears above in the Calculation field.
Tip: You can preview the data transform by selecting Run Preview. When the transform is complete, the results are available in the Preview tab of the tile editor.

Aggregation Types

Type

Description

Data Types Available For

Count

Returns the number of non-null values in the column.

Decimal, Whole Number, Text, Date

Count including nulls

Returns the number of all values in the column, including nulls.

Decimal, Whole Number, Text, Date

Count distinct

Returns the number of unique values in the column.

Decimal, Whole Number, Text, Date

First non-null value

Returns the first non-null value.

Decimal, Whole Number, Text, Date

Last non-null value

Returns the last non-null value.

Decimal, Whole Number, Text, Date

First value

Returns the first value (including null).

Decimal, Whole Number, Text, Date

Last value

Returns the last value (including null).

Decimal, Whole Number, Text, Date

Sum

Sums all of the values in the column.

Decimal, Whole Number

Average

Returns the average of all of the values in the column.

Decimal, Whole Number

Median

Returns the median of all of the values in the column.

Decimal, Whole Number

Minimum

Returns the minimum value in the column.

Decimal, Whole Number, Date

Maximum

Returns the maximum value in the column.

Decimal, Whole Number, Date

Standard deviation

Returns the standard deviation for the values in the column.

Decimal, Whole Number

Combine strings separated by,

Combines all of the text values in the column, separating them with commas.

Text

Combine distinct strings separated by,

Combines all of the DISTINCT text values in the column, separating them with commas.

Text

Use Group By in Magic ETL (video)

Rank & Window Tile

Rank_and_Window.png
The Rank & Window tile lets you create new columns by applying any of several rank and window functions to columns. For all of these actions, you first define your function, then specify the column the function is to be ordered by, and finally indicate whether the results in the new column should be in ascending or descending order. Rank & Window functions are categorized into three types: Ranking, Framed, and Offset. For more information about all of the functions described in this section, see the AWS docs .

Understanding Partitions

All functions also allow you to optionally specify a column to use as a partition. For example, if you had a series column called State, selecting State as the partition would cause the values in your new column to be divided by state. This is shown in the following example, wherein values in the Amount column are ranked in descending order and partitioned by state:
rank_and_window_partition.png
Note that the two amounts for Connecticut (“CT”) are ranked first, followed by the five amounts for Delaware (“DE”), and so on.

Ranking Functions

Ranking functions derive a ranking number for each value in a selected column and display it in a new column. There are three available ranking functions: Rank, Dense Rank, and Row Number.

Rank

The Rank function assigns a rank number to each value in the selected column. If any cells in the column contain the same value, those cells are given the same ranking number, and a “gap” appears in the numbering depending on how many numbers were skipped. For example, if the first three cells in the column had a value of 100 and the next cell had a value of 200 with the values sorted in ascending order, the first three cells would be ranked “1” and the fourth cell would be ranked “4.” The following screenshot shows an example of this:
rank_and_window_rank.png
Because the first three states have the same $ value, all are ranked as 1. Rank numbers 2 and 3 are then skipped. The next five states also have the same value, and are all ranked as 4, and so on. Follow these steps to configure the Rank function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, rename the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
  4. Add a name for the new column that will contain the ranking results (such as Rank).
  5. In the Select function type menu, select Rank.
  6. Select Apply in the top right corner of the tile editor. Three new steps appear.
  7. In step 2, select the column you want to rank. (A value column is recommended.)
  8. In step 3, select the order in which to rank the values in the column.
  9. (Optional) In step 4, select the column to use as a partition, if any. For an explanation of partitioning, see Understanding Partitions at the top of this section.

Dense Rank

The Dense Rank function is the same as the Rank function, with one important difference. If any cells in the column contain the same value, those cells are given the same ranking number, as in a Rank function; however, the numbering continues as normal to the next cell—this means there is no “gap.” For example, if the first three cells in the column had a value of 100 and the next cell had a value of 200 with the values sorted in ascending order, the first three cells would be ranked “1” and the fourth cell would be ranked “2.” The following screenshot shows an example of this:
rank_and_window_dense_rank.png
Because the first three states have the same $ value, all are ranked as 1. The next five states also have the same $ value and are ranked as 2, and so on. Follow these steps to configure the Dense Rank function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, rename the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
  4. Add a name for the new column that will contain the ranking results (such as Rank).
  5. In the Select function type menu, select Dense Rank.
  6. Select Apply in the top right corner of the tile editor. Three new steps appear.
  7. In step 2, select the column you want to rank. (A value column is recommended.)
  8. In step 3, select the order in which to rank the values in the column.
  9. (Optional) In step 4, select the column to use as a partition, if any. For an explanation of partitioning, see Understanding Partitions at the top of this section.

Row Number

The Row Number function returns the row numbers of all values in the selected column. Note that when partitioning is used, rows take the number of their row within the partition group, not necessarily the row number of the DataSet. This is shown in the following image, in which the row numbering derived from the Row Number function restarts with each new partition. Thus, beginning with the displayed row 4, the derived row numbers are not the same as the DataSet row numbers.
rank_and_window_row_numbers.png
Follow these steps to configure the Row Numbers function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, rename the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
  4. Add a name for the new column that will contain the ranking results (such as Row Number).
  5. In the Select function type menu, select Row Number.
  6. Select Apply in the top right corner of the tile editor. Three new steps appear.
  7. In step 2, select the column you want to derive row numbers for.
  8. In step 3, select the order in which to sort the values in the column.
  9. (Optional) In step 4, select the column to use as a partition, if any. For an explanation of partitioning, see Understanding Partitions at the top of this page.

Framed Functions

Framed functions take a mathematical function and apply it to a cell in a column, along with a specified number of cells before it (Preceding) and after it (Following). The derived values appear in a new column. The following simple example shows how this works. For this example, the user has selected the Sum function and has indicated a Preceding value of 1 and a Following value of 2.
rank_and_window_framing_example.png
In the example, each value in the Original Value column is added to the one cell above it and the two cells below it, and the result appears in the same row in the Derived Value column. So for row 5, 3 is added to 1 (in row 4), 5 (in row 6), and 1 (in row 7). The total—10—appears in the Derived Value column in row 5. If there are not enough values above or below a given value to include in the calculation, these are omitted. For example, for row 7 there is only one Following value, so the equation simply becomes 5 + 1 + 2 = 8. When partitioning is applied, partitions are honored in functions. In the following example, the user has selected Sum and indicated a Preceding value of 1 and a Following value of 2, just as in the previous example. However, they also designate the Class column as a partition.
rank_and_window_framing_example_partition.png
Because of the row grouping, due to the partition, some of the values are summed differently. For example, in row 4, only one Following value is available because the partition separates all “B” values into another group. So the equation for this row becomes 2 + 1 + 3 = 6. Likewise, for row 6, only the two Following values are available because of the partition; therefore, 5 + 1 + 2 = 8. For an explanation of partitioning, see Understanding Partitions at the top of this section.

Average

The Average function takes the average of a given cell and its indicated Preceding and Following values. In the following example, the user has applied the Average function to the Amount column, set Preceding and Following values of 2, and set the Cust column as a partition.
rank_and_window_average.png
Follow these steps to configure the Average function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, rename the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
  4. Add a name for the new column containing the derived values. We suggest picking a name indicating the number of Preceding and Following values, such as Average (2P and 2F).
  5. In the Select function type menu, select Average. Two new menus appear after you select your function type.
  6. In the Select column menu, select the column with values to be averaged.
  7. In the Preceding and Following fields, enter the number of Preceding and Following values you want to include in your averages.
  8. Select Apply in the top right corner of the tile editor. Three new steps appear.
  9. In step 2, select the column you want your order to be based on.
  10. In step 3, select the order in which you want to sort the values in the new column.
  11. (Optional) In step 4, select the column to use as a partition, if any. For an explanation of partitioning, see Understanding Partitions at the top of this section.

Count

The Count function returns a count of a given cell and its indicated Preceding and Following values. In the following example, the user has applied the Count function to the Amount column, set Preceding and Following values of 2, and set the Cust column as a partition.
rank_and_window_count.png
Follow these steps to configure the Count function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, rename the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
  4. Add a name for the new column containing the derived values. We suggest you pick a name indicating the number of Preceding and Following values, such as Count (2P and 2F).
  5. In the Select function type menu, select Count. Two new menus appear after you select your function type.
  6. In the Select column menu, select the column with values to be counted.
  7. In the Preceding and Following fields, enter the number of Preceding and Following values you want to include in your counts.
  8. Select Apply in the top right corner of the tile editor. Three new steps appear.
  9. In step 2, select the column you want your order to be based on.
  10. In step 3, select the order in which you want to sort the values in the new column.
  11. (Optional) In step 4, select the column to use as a partition, if any. For an explanation of partitioning, see Understanding Partitions at the top of this section.

Sum

The Sum function returns the sum of a given cell with its indicated Preceding and Following values. In the following example, the user has applied the Sum function to the Amount column, set Preceding and Following values of 2, and set the Cust column as a partition.
rank_and_window_sum.png
Follow these steps to configure the Sum function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, rename the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
  4. Add a name for the new column containing the derived values. We suggest picking a name indicating the number of Preceding and Following values, such as Sum (2P and 2F).
  5. In the Select function type menu, select Sum. Two new menus appear after you select your function type.
  6. In the Select column menu, select the column with values to be summed.
  7. In the Preceding and Following fields, enter the number of Preceding and Following values you want to include in your totals.
  8. Select Apply in the top right corner of the tile editor. Three new steps appear.
  9. In step 2, select the column you want your order to be based on.
  10. In step 3, select the order in which you want to sort the values in the new column.
  11. (Optional) In step 4, select the column to use as a partition, if any. For an explanation of partitioning, see Understanding Partitions at the top of this page.

Offset Functions

Offset functions add a new column to your DataSet in which values are the same as those in a selected column but offset by a specified number of rows. Offset functions come in two types:
  • Lag function — The values in the new column are offset after the values in the original column.
  • Lead function — The values in the new column are offset before the values in the original column.
Partitions are honored in both Lag and Lead functions. See Understanding Partitions above for more information.

Lag

In a Lag function, the values in your new column follow the values in your original column after a specified offset. In the following example, a Lag of 3 has been applied, and the Cust column has been added as a partition.
rank_and_window_lag.png
Note that an offset of 3 rows has been inserted between the values in the Amount column and the matching values in the Lag column. Also, because of the partition, rows 13 to 15 are the first 3 rows in the “Leo, Inc.” grouping, so their cells in the Lag column appear blank. Follow these steps to configure a Lag function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, rename the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
    add function.jpg
  4. Add a name for the new column containing the derived values. We suggest choosing a name that indicates the offset value of the lag.
  5. In the Select function type menu, select Lag. Two new menus appear after you select your function type.
  6. In the Select column menu, select the column with values that will appear in the new column.
  7. In the What rows should be included field, enter the desired offset value.
    Screenshot 2024-03-06 at 10.20.22 AM.png
  8. Select Apply in the top right corner of the tile editor. Three new steps appear.
  9. In Step 2 of the dialog, select the column you want your order to be based on.
  10. In Step 3 of the dialog, select the order you want the values in the new column to be sorted in.
  11. (Optional) In Step 4 of the dialog, select the column to use as a partition, if any. (For an explanation of partitioning, see Understanding Partitions at the top of this page.)

Lead

In a Lead function, the values in your new column precede the values in your original column after a specified offset. In the following example, a Lead of 3 has been applied, and the Cust column has been added as a partition.
rank_and_window_lead.png
Notice that an offset of 3 rows has been inserted between the values in the Amount column and the matching values in the Lead column. Also, because of the partition, rows 17 to 19 are the last 3 rows in the “Leo, Inc.” grouping, so their cells in the Lead column appear blank. Follow these steps to configure a Lead function:
  1. Select the Rank & Window tile on the canvas to open the tile editor.
  2. (Optional) At the top of the tile editor, reame the tile by selecting Edit and entering the name you want.
  3. Select Add Function.
  4. Add a name for the new column containing the derived values. We suggest choosing a name that indicates the offset value of the lead.
  5. In the Select function type dropdown, select Lead. Two new dropdowns appear after you select your function type.
  6. In the Select column menu, select the column with values that will appear in the new column.
  7. In the What rows should be included field, enter the desired offset value.
  8. Select Apply in the top right corner of the tile editor. Three new steps appear.
  9. In step 2, select the column to base the order of functions on.
  10. In step 3, select the order in which to sort the values: ascending or descending.
  11. (Optional) In step 4, select the column to use as a partition, if any. For an explanation of partitioning, see Understanding Partitions at the top of this page.

Rank & Window Functions in Magic ETL (video)