Intro
The following tiles are available in the DataSets tab of the Magic ETL interface:
- Input DataSet
- Output DataSet
- Writeback
- Query DataSet

This article covers the following topics:
- DataFlows and Personalized Data Permissions (PDP)
- Add an input DataSet to a DataFlow
- Add an output DataSet to a DataFlow
- Query DataSet tile
- Writeback tile
- Fixed input tile
- FAQ
Add an Input DataSet to a DataFlow

1. Open Magic ETL.
2. From the left rail under DataSets, drag an Input DataSet tile to the canvas, opening the tile editor below the canvas.
3. In the tile editor, select Choose DataSet to select the input DataSet for the DataFlow.
   Note: The maximum number of columns allowed in Magic ETL is 1500 columns.
4. In the Select a DataSet modal, find and choose the DataSet. To add more than one input DataSet, you must repeat steps 2–4 for every input DataSet.
5. (Optional) In the tile editor, go to the Data tab to see a preview of the input data.

Configure Input DataSet Tiles
View DataSet details: The Details tab of the tile editor includes information about the DataSet like its name, owner, update dates, and column names and types.
1. Drag an Input DataSet tile onto the canvas to open the tile editor.
2. In the tile editor, select Choose DataSet. In the modal that displays, select either an existing DataSet or Create New to use a new DataSet.
3. In the Configuration tab of the tile editor, use the Data Selection dropdown to choose one of the following:
   - All (default): Loads all rows from the DataSet.
   - New: Loads new rows added to the DataSet since its last successful execution. Note: If you choose this option, you must set the data update method to Append, not Replace.
   - Filter by batch: Allows you to choose which batches to load based on creation date or name (for partitions). Choosing this option displays the Filter by batch area. Choose (a) whether batches are filtered based on when the data was processed or on a custom expression, and (b) the date parameters (for example, Data processed in the last 7 days).
4. (Optional) If your input DataSet is partitioned, you can expose the partition name using the Partition Column Name field.
5. In the Data Handling area, make any changes to column data types. This prevents schema changes to the input data from breaking your DataFlow.
6. Use the Transform settings to set special handling for each column, including Bad values and Null values.


Output DataSet Tile

Configure Output DataSet Tiles
Follow these steps to configure an Output DataSet tile:

1. Make sure all the tiles in your DataFlow are connected and configured.
2. In the left panel under DataSets, drag an Output DataSet tile to the canvas and connect it.
3. In the tile editor, rename the tile. This is the name of the output DataSet that will display in the Data Center.
4. Use the Update Method dropdown to choose how to handle updated data. Learn about update methods in Domo.
   View DataSet details: After the DataFlow has run at least once, the Details tab of the tile editor includes all information about the output DataSet.
5. Select where to write your data by using the Add DataSet to Cloud dropdown.
6. Enter the name of the Database where you want to store the data in your cloud.
7. Enter the name of the Schema where you want to store the data in your database.
8. Enter the name of the Table you want to use inside your schema.

Notes:
- The option to Add DataSet to Cloud is only available if you are using Cloud Amplifier and have an integration that can be configured for writeback.
- The Database, Schema, and Table fields are only available if your Cloud Amplifier integration supports location parameters inside your cloud (currently Snowflake only, with support for other engines coming soon).
Query DataSet Tile
You can use the Query DataSet tile to write a query against a DataSet as part of the input configuration process. This can increase efficiency when loading large tables accessed through Cloud Amplifier, which may otherwise be delayed by resource limitations in the underlying Cloud Data Warehouse (CDW). Proper use of this tile may also resolve resource-related error messages from the CDW. Configuring the tile involves two key settings:
- Query partition key
- # loads
Configure the Query DataSet Tile
Follow these steps to configure the Query DataSet tile:

1. In Magic ETL, expand the DataSets section in the left sidebar and drag a Query DataSet tile to the canvas.
   Note: This tile is only available for Domo on Databricks users.
2. In the tile configuration panel below the canvas, use the Choose DataSet dropdown to select a Databricks DataSet.
3. Write the query that identifies the data you want to bring into the Magic ETL DataFlow (see the sketch after these steps).
   Note: This query must be a SELECT query only and cannot use CTEs (common table expressions).
4. Configure the Query Partition Key. The recommended process for determining this for Databricks tables appears below.
5. Configure the # of Loads section.
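For reference, here is a minimal sketch of the kind of query the tile accepts; the table and column names are hypothetical placeholders. It is a single SELECT statement, with no CTEs, that limits the rows and columns brought into the DataFlow:

-- Hypothetical example: sales.orders and its columns are placeholders.
-- A single SELECT (no WITH/CTE clauses) with an optional filter.
SELECT order_id,
       order_date,
       customer_id,
       order_total
FROM sales.orders
WHERE order_date >= '2024-01-01';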
Choose the Partition Key (Recommended Process for Databricks)
When choosing the partition key, the more evenly distributed the row count per column value, the better. For Databricks, we recommend choosing the partition key and defining the number of loads so that the largest batch is no larger than 4GB. Follow the steps below to do this:

1. Find the overall size of the table in the Databricks UI or by querying as follows:
   DESCRIBE DETAIL <catalog>.<schema>.<table>
2. Convert the sizeInBytes value to GB by dividing by 2^30 (1024^3).
3. Divide by the recommended 4GB to obtain a preliminary value for the number of necessary divisions, as follows:
   divisions = (sizeInBytes / 2^30) / 4
   Note: This is only an intermediate calculation, which should not be interpreted as the desired number of loads.
4. Find the table's total row count.
5. Estimate how many table rows make 4GB, the limit for the largest batch, using the following equation:
   max_rows = row_count / divisions
6. You can now begin evaluating which column to use as the partition key. Remember that the column must not contain any NULL values. Find the number of distinct values present in a candidate column using the following query:
   SELECT COUNT(DISTINCT <column>) AS distinct_count
   FROM <table>;
   You'll use the value for distinct_count in a later calculation.
7. Find out how skewed or lopsided the table would be if partitioned according to this candidate column by examining the largest row counts using this query:
   SELECT <column>, COUNT(*) AS row_count
   FROM <table>
   GROUP BY <column>
   ORDER BY row_count DESC;
   A batch consists of all rows associated with N distinct partition key values. In a worst-case scenario, the largest batch would be comprised of the partition key values with the N largest row counts. The running total is represented by this query:
   WITH grouped_counts AS (
     SELECT <column>, COUNT(*) AS row_count
     FROM <table>
     GROUP BY <column>
   )
   SELECT <column>, row_count, SUM(row_count)
     OVER (
       ORDER BY row_count DESC
       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     ) AS cumulative_row_count
   FROM grouped_counts
   ORDER BY row_count DESC;
   For example, let's say a date <column> is being evaluated as a partition key candidate:

   | <column>   | row_count | cumulative_row_count |
   | ---------- | --------- | -------------------- |
   | 2024-10-13 | 100       | 100                  |
   | 2024-02-21 | 80        | 180                  |
   | 2024-05-17 | 60        | 240                  |
   | 2024-08-19 | 50        | 290                  |
   | 2024-03-06 | 30        | 320                  |
   | 2024-02-10 | 20        | 340                  |
   | 2024-11-30 | 10        | 350                  |
   To determine how many of the largest partition key values fit within max_rows, extend the query and count them:
   WITH grouped_counts AS (
     SELECT <column>, COUNT(*) AS row_count
     FROM <table>
     GROUP BY <column>
   ),
   ranked_counts AS (
     SELECT <column>, row_count, SUM(row_count)
       OVER (
         ORDER BY row_count DESC
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_row_count
     FROM grouped_counts
   )
   SELECT COUNT(*) AS max_column_values
   FROM ranked_counts
   WHERE cumulative_row_count < max_rows;
   Replace max_rows with the value calculated in step 5, above. Then execute the preceding query to obtain an integer result: max_column_values.
8. If you decide to use this candidate column as the partition key, use the results from steps 6 and 7 to calculate the minimum number of loads required to ensure that the largest load remains below 4GB, rounding up:
   # loads >= distinct_count / max_column_values
   Note that the number of loads necessary to remain below 4GB will depend on the choice of column for the partition key. To get a smaller number of loads, choose a column whose values are more uniformly distributed across the table.
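To make the arithmetic concrete, here is a hypothetical worked example; every number below is invented for illustration (row_count would come from a simple SELECT COUNT(*) FROM <table>):

sizeInBytes = 17179869184, so size in GB = 17179869184 / 2^30 = 16
divisions = 16 / 4 = 4
row_count = 8,000,000, so max_rows = 8,000,000 / 4 = 2,000,000
distinct_count = 40 (distinct values in the candidate column)
max_column_values = 10 (the 10 largest partition key values together stay below 2,000,000 rows)
# loads >= 40 / 10 = 4, so configure at least 4 loads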
Writeback Tile

Configure Writeback Tile
Follow these steps to configure a Writeback tile:

1. Drag a Writeback tile to the canvas in place of an Input DataSet tile.
2. Rename the Writeback tile and then specify the DataSet name you want to output by selecting Edit and entering a name and a description.
3. (Optional) Select the Additional Options menu and select Update Method to either replace (default) or append data.
4. Select Configure Writeback, then select the Writeback Connector you want to use.
5. Connect other tiles to the Writeback tile to complete your DataFlow and end with an Output DataSet tile.

View Writeback tile details: After the DataFlow has run at least once, the Details tab of the tile editor includes all information about the Writeback output DataSet.
Fixed Input Tile
You can use this tile to validate transformations, prototype logic, demonstrate outputs with meaningful placeholder data, or build DataFlows when your input DataSet isn’t available. Find the Fixed Input tile under Datasets in the left panel of the Magic ETL authoring canvas.
Note: To use AI features in the Fixed Input tile, your instance must have AI Services enabled. If AI Services are not enabled, you can still enter data manually, paste values, or upload files.
Input Data Manually into Your Fixed Input Tile
To manually input data, follow these steps:

1. Drag the Fixed Input tile from the Datasets section to your canvas.
2. Enter column names in the New Column header fields.
3. Select the small text under the left side of each column name to set the data type.
4. Enter data directly into the table.
5. (Optional) Copy and paste data from another source into the rows and columns.

Upload a File to Your Fixed Input Tile
To upload a file, follow these steps:

1. Drag the Fixed Input tile from Datasets to your canvas.
2. Select the Generate data / Upload button in the lower-right corner.
3. In the modal, go to the Upload File tab.
4. Drag and drop your file or select the Upload File button.
5. Specify whether the file includes header rows, then select Import. Your data appears in the configuration panel. You can add columns, edit values, or use it as-is.

Use AI to Generate Sample Data for Your Fixed Input Tile
To generate sample data using AI, follow these steps:

1. Drag the Fixed Input tile to your canvas.
2. Select the Generate data / Upload button.
3. In the Generate Data tab, enter a short prompt describing the data (for example: "50 customer transactions with name, date, and amount").
4. Choose the number of rows to generate (recommended: 50).
5. Select Generate Data.
6. Wait a few moments for the table to populate.

Alternatively, you can generate AI data for a single column by selecting the sparkle icon in that column's header, then entering a short description of the values and the number of rows. This method is faster than generating an entire DataSet but may still take a few seconds.

Save and Use the Data in Your Fixed Input Tile
When your Fixed Input data is ready, connect it to any downstream transformation tile in your DataFlow. The data is stored in the DataFlow definition, not as a standalone DataSet. To create a separate DataSet, connect the tile to an Output DataSet tile. When the DataFlow runs, Domo creates a new DataSet from that output. Learn how to configure Output DataSet Tiles.

FAQ
Why are my decimal columns being truncated to five decimal places?
Domo uses a precision/scale of (19,5) for fixed decimals, meaning that fixed decimals will only ever have 5 digits to the right of the decimal point. You can request that your DataSet be configured to use a higher precision/scale.
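As a hypothetical illustration of scale 5: a value such as 1.123456789 would be stored as 1.12345, with the remaining digits dropped.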
Are there limits to the amount of data I can create in Fixed Input tiles?
Uploaded files are limited to 10,000 rows and 1,000 columns. There’s no defined limit for manually entered or pasted data, though future limits may apply.
Does the Fixed Input tile create a separate DataSet?
No. Data stays in the DataFlow definition. To create a standalone DataSet, connect it to an Output tile.
