
Intro

There are two available scripting tiles in Magic ETL: Python Script and R Script.
Note: These tiles are available by default to users on the Domo Consumption agreement. For non-consumption users, the tiles are available on demand as a paid option. To enable scripting tiles, contact your Domo account team. You may need to complete training before you can use a tile.


General Information

This section provides information that applies to all scripting tiles in Magic ETL. For tile-specific details, see the API Reference section later in this article. Scripting tiles are a powerful tool within Magic ETL. They allow you to write custom R or Python algorithms and implement them directly in DataFlows. Use them to create complex data science analyses that run every time your data updates.

Take a Tour

All scripting tiles have the same look. The only difference between them is the supported language. The main body of the tile is the code editor. This is where you write your script. It includes syntax highlighting for the specific language the tile supports.
[Image: code editor]
On the right side of the tile is a searchable list of input tiles currently connected to the scripting tile. Click on the name of an input to inject a code snippet into the script at the location of the cursor. This code snippet will read the data into the script and store it in a variable. Rename the variable to whatever you would like.
[Image: inputs list]
There is a second tab on the right side of the tile called Packages. When you click the tab, a searchable list of available packages for the supported language is displayed. Hover over the package name for a description, or click on the arrow that appears to go to the package’s documentation website. As with inputs, if you click on a package, a code snippet is injected into the script at the location of the cursor. This code snippet imports the selected library for use in the script.
[Image: Packages tab]
On the right side of the tile editor panel toolbar is a template icon button. Clicking this button resets the script to the example code template that appears when you first open a scripting tile. Be careful though: if you have already written a script, clicking this button overwrites your script (after a confirmation prompt). However, as long as you do not close the tile, you can retrieve your overwritten script by placing the cursor within the code editor and pressing Command+Z or Ctrl+Z.

In the Schema tab, you tell the tile what the results of your script look like. To use this tab, click Schema in the tile editor panel toolbar. For a deep dive, see the Populate the Resulting Schema section of this article.
[Image: Schema tab]
In the Console tab, you can see the standard out (stdout) and standard error (stderr) that your script produces. This tab is not available until after a preview has been run on the tile; it then appears as an option in the tile editor panel toolbar.
[Image: Console tab]
In the Preview tab, you can see the results of running your script. This tab also does not appear until after a preview is run. It will then appear as an option in the tile editor panel toolbar.
[Image: Preview tab]

Get Data In and Out of the Scripting Tile

Domo provides a basic API for importing (reading) data from the Domo ecosystem into your script and then exporting (writing) your results out of the script and back into the Domo ecosystem. The API is different for each scripting language and follows the semantics of that language.
Note: Be aware of the following data type issues when using scripting tiles in Domo:
  • In Python, LONG becomes DOUBLE because NumPy does not support NULLs in integer columns.
  • In both Python and R, DECIMAL becomes DOUBLE because there is no analogous type in NumPy or R.
The easiest way to see the specific API calls for your language is to examine the initial code template that is provided in a newly created and connected Scripting tile.

Step 1: Import The Domo Package

The first line in any script should be used for importing the language-specific Domo package into the script. This follows standard package import semantics for the chosen language. It is part of the initial code template. Alternatively, you can search for “domo” in the Packages list to the right of the code editor and click on the package to inject the code snippet into the editor.
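For the Python tile, for example, the package is domomagic (documented in the API Reference below), so the injected import looks similar to this:

    # Import the Domo package that provides the read/write helpers
    from domomagic import *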

Step 2: Read Data Into The Script

The next step in writing a script is getting access to the data. Each language will have its own semantic version of a Domo “read” method. This is included in the initial code template. Alternatively, you can click on one of the inputs in the input list to the right of the code editor. This injects a code snippet that reads the input data and stores it in a variable. You can rename this variable if you like.
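In the Python tile, for instance, the injected snippet looks something like the following, where the input tile name 'Input DataSet 1' is a hypothetical placeholder:

    # Read the connected input tile into a Pandas DataFrame
    input1 = read_dataframe('Input DataSet 1')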

Step 3: Export Data Out Of The Script

The last step in writing a script is returning the results to the Domo ecosystem. Each language has its own semantic version of a Domo “write” method. This is also included in the initial code template. There is no other way of injecting a code snippet for this, however. Provide the name of the variable that is storing the results of your script to this “write” method. There can only be one export per script.
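In the Python tile, for instance, if the final result of your script is stored in a DataFrame variable named output (a name of your choosing), the export looks something like this:

    # Write the final DataFrame back to the Magic ETL pipeline;
    # only one write is allowed per script
    write_dataframe(output)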

Populate the Resulting Schema

There are three ways to tell the tile what schema your script produces. These methods can be mixed and matched as necessary to get the proper result. You perform all of these in the Schema tab of the tile.

Manually Adding the Columns

Follow these steps to add the columns manually:
  1. Type the name of a column that your script will produce into the Column Name field.
    [Image: Column Name field]
  2. In the list of available data types, select the data type for this column.
    [Image: data type list]
  3. Add another column by clicking the Add Column button.
    [Image: Add Column button]
    Important: All added columns must be filled out for the tile to be considered configured.

Bulk Add Columns

Follow these steps to add the columns in bulk:
  1. Add all the columns from a particular “Input DataSet” tile in bulk by clicking the Add From DataSet button and selecting the “Input DataSet” tile you want to populate from.
[Image: Add From DataSet button]
  2. Remove any excess columns using the “X” button on the far right side of the column.
[Image: resulting schema columns]

Run Script

Follow these steps to have Domo run the script for you:
  1. Click either the run preview button on the right side of the tile editor panel toolbar or the Run Preview button in the toolbar at the top of the canvas. (If there is an issue with your preview, see the Troubleshooting section of this article.) After the preview has finished running, a sentence appears under the text inputs for creating new columns, telling you how many columns the preview returned that are not currently listed as part of the tile’s schema.
  2. Click the button at the end of the sentence to add those columns to the schema.
After you populate the schema, the tile knows what the results of your script look like and will be marked as configured (as long as you have also written a script). The columns listed in the Schema tab will be passed to the next tile in the DataFlow.

Configuration Steps

If you have not done so already, check out the Take a Tour section of this article to become familiar with the different features of the Scripting Tile.
  1. Start a new Magic ETL DataFlow.
  2. Drag an Input DataSet tile onto the canvas and select the DataSet you want to use. You will find the Scripting tiles under the “Data Science” section of the left sidebar.
  3. Drag the Scripting tile for the language you want to write onto the canvas and then drag a connection from the “Input DataSet” to this Scripting tile.
    [Image: Input DataSet connected to a Scripting tile]
  4. (Conditional) If the editor panel is not open at the bottom of the screen, click the Scripting tile you just added to the canvas so that it is selected; the editor panel will then open. The code editor will be auto-populated with an initial code template. For more information on what is included in the template, see Get Data In and Out of the Scripting Tile, above. A yellow banner also displays. Ignore this for now; it will be addressed later on in the configuration steps. You will see that the template is pulling the data from the “Input DataSet” tile that is connected to the Scripting tile and storing it in a variable. This is the variable that your script should manipulate.
    [Image: initial code template]
  5. (Conditional) If you have multiple data sources, drag out another “Input DataSet” tile, select the next DataSet you need, connect this tile to the Scripting tile, and add another read statement to the script by clicking the new input in the inputs list. (For a sketch of a complete two-input script, see the example at the end of these steps.)
    [Image: two Input DataSets connected to the Scripting tile]
    Underneath the line that is creating the variable to hold your data, you will see a space for you to write your script.
    [Image: script area in the code editor]
  6. Write your script in the line provided, making sure that the export statement at the very bottom of the template is exporting the variable that represents the final product of your script. You will notice that your tile is still not marked as configured and there is still a yellow banner at the top of the tile telling you this. This is because the tile does not yet know what the result of your script looks like.
  7. To provide the tile with this information:
    1. Select the Schema tab in the toolbar of the tile editor panel.
    2. Choose one of the methods listed in the Populate the Resulting Schema section of this article and follow the steps there. Afterwards, continue the rest of the steps listed below.
  8. Drag an “Output DataSet” tile from the sidebar and drop it onto the canvas.
  9. Drag a connection from the Scripting tile to this new “Output DataSet” tile.
    [Image: Scripting tile connected to an Output DataSet tile]
  10. Select the Output DataSet tile to open the tile editor and give the tile a name and description. The tiles within the DataFlow are now fully configured.
  11. Next, enter a name for the DataFlow in the field in the top left corner.
  12. Configure the settings in the panel using the Settings button in the canvas toolbar.
  13. Save the DataFlow.
Congratulations! You have created a DataFlow that utilizes a Scripting Tile. This is a basic configuration that only uses a single Input DataSet tile, a single Scripting tile, and a single Output DataSet tile. Don’t forget that you can still use all of the classic DataFlow tiles to manipulate the data before and after your Scripting tile.
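To tie the configuration steps together, here is a minimal sketch of a complete two-input Python script. The input tile names ('Sales' and 'Regions') and the join column ('region_id') are hypothetical placeholders, not names Domo provides:

    from domomagic import *
    import pandas as pd

    # Read both connected Input DataSet tiles into DataFrames
    sales = read_dataframe('Sales')
    regions = read_dataframe('Regions')

    # Your script goes here: join the two inputs on a shared key
    output = pd.merge(sales, regions, on='region_id')

    # Export the final result; only one write is allowed per script
    write_dataframe(output)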

API Reference

domomagic is the simple API provided to authors of both the Magic ETL Python and R Script tiles, allowing them to load data into and out of the Python and R contexts.

Python Script Tile Module API Reference

[Image: Python Script tile]

read_dataframe(input_name=None)
  Parameters: input_name: Name of the Magic ETL input tile. Not required if only one input is available.
  Returns: A Pandas DataFrame representing the tile's input.

write_dataframe(dataframe)
  Parameters: dataframe: Pandas DataFrame that will be used as the output for the Magic ETL tile.
  Returns: N/A

read_array(column_name, input_name=None)
  Parameters: column_name: Name of the column. input_name: Name of the Magic ETL input tile. Not required if only one input is available.
  Returns: A single column from the input as a NumPy array.

read_array_dict(input_name=None)
  Parameters: input_name: Name of the Magic ETL input tile. Not required if only one input is available.
  Returns: The input as an OrderedDict of column names mapping to NumPy arrays.

write_array_dict(array_dict)
  Parameters: array_dict: The dictionary of column names mapping to NumPy arrays as output to the Magic ETL context. The arrays must be of equal length.
  Returns: N/A
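As a hedged usage sketch of the array-based helpers (the column name 'amount' is a hypothetical placeholder, and a single connected input is assumed so input_name can be omitted):

    from domomagic import *
    import numpy as np

    # Read one column from the input as a NumPy array
    amounts = read_array('amount')

    # Read the whole input as an OrderedDict of column name -> NumPy array
    data = read_array_dict()

    # Derive a new column; all arrays in the dict must be of equal length
    data['amount_doubled'] = np.asarray(amounts, dtype=float) * 2

    # Write the dictionary back out as the tile's output
    write_array_dict(data)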

R Script Tile Package API Reference

[Image: R Script tile]

read.dataframe(input.name=NULL, ...)
  Parameters: input.name: Name of the Magic ETL input tile. Optional if only one input is available. Additional optional parameters are described below.
  Returns: The tile's input data.frame.

write.dataframe(table)
  Parameters: table: The data.frame or matrix to export as the output for the Magic ETL tile.
  Returns: N/A

The R package's read.dataframe function also supports the following optional parameters that allow you to customize its behavior:
  • colClasses: A named character vector of classes to be assigned to the named columns.
  • stringsAsFactors: Logical value determining whether string columns are read as factors. Defaults to FALSE and is overridden by colClasses.
  • integersAsNumerics: Logical value determining whether integers are read as numerics (floating-point numbers). Defaults to TRUE and is overridden by colClasses.
  • allVerbatimExcept: Character vector of column names that should be read according to default behavior. All other columns are read in verbatim as strings and written out with their original types. Useful when translation to and from the R context causes loss of precision or errors and the column is not needed for the calculation being performed.
  • allIgnoredExcept: Character vector of column names that will be read. All other columns are ignored.

Project Directory and /tmp Directory in Scripting Tiles

In Magic ETL’s Python and R scripting environment, you can use two special directories to store and share data between scripting tiles in a DataFlow—the Project Directory and the /tmp Directory. These directories simplify the experience of working with complex data processing tasks.

Project Directory

The Project Directory is a shared space for all scripting tiles within a DataFlow, and its contents persist between executions. This means that you can store things like statistical models in the Project Directory and continue working on them across multiple executions. The Project Directory is located at /home/domo/project, but you can simply refer to it as project in your script. Any changes you make to the Project Directory are only saved after successful executions. Changes from preview or failed executions are discarded. The total size of a Project Directory may not exceed eight gigabytes (8 GB). Contact Domo Support if you need to increase this limit, restore a directory from a backup, or copy a directory from one DataFlow to another.

Use Cases

Possible use cases for the project directory include the following:
  • Use Case 1 — Train a statistical model over time, which allows you to keep multiple versions while you figure out which one works best.
  • Use Case 2 — Update complex calculations or aggregate statistics with new data instead of recomputing everything from scratch, for example when the Input DataSet tile is set to load only the batches added since the last successful execution.

Set Up a Project Directory

  1. Create your DataFlow with the necessary scripting tiles (Python and/or R) and any required input and output DataSet tiles.
  2. In the first scripting tile, whether Python or R, write your script to process the data incrementally and save the intermediate results or models to the Project Directory (see the sketch after these steps).
    [Image: example script saving to the Project Directory]
  3. In another scripting tile, whether Python or R, read the intermediate results or models saved in the Project Directory by the previous tile and continue processing as needed.
    [Image: example script reading from the Project Directory]
  4. Save the final results or updated models to the Project Directory, if needed, and use the Output DataSet tile to store the final results.
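Here is a minimal Python sketch of steps 2 and 3; the file name model.pkl, the use of pickle, and the input tile name are illustrative assumptions, and the two sections belong in two separate scripting tiles:

    from domomagic import *
    import os
    import pickle

    # === First scripting tile: persist an intermediate object ===
    # "project" resolves to the Project Directory (/home/domo/project);
    # its contents persist between successful executions.
    df = read_dataframe('training_data')   # hypothetical input tile name
    model = {'row_count': len(df)}         # stand-in for a real statistical model
    with open(os.path.join('project', 'model.pkl'), 'wb') as f:
        pickle.dump(model, f)
    write_dataframe(df)

    # === A later scripting tile in the same DataFlow ===
    # with open(os.path.join('project', 'model.pkl'), 'rb') as f:
    #     model = pickle.load(f)
    # ...continue processing and write the final result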

/tmp Directory

The /tmp directory is a shared space for all scripting tiles within a DataFlow. If you create a file in the /tmp Directory in one scripting tile, you can access it from another scripting tile within the same DataFlow. Unlike the Project Directory, the contents of the /tmp Directory are temporary and are deleted between executions of the DataFlow.

Use Cases

Possible use cases for the /tmp Directory include the following:
  • Use Case 1 — Create multiple output DataFrames in one script as files in /tmp and use them in separate tables in downstream scripting tiles.
  • Use Case 2 — Share an object between scripting tiles that can’t be easily represented as a DataFrame.
  • Use Case 3 — Improve communication efficiency between scripting tiles.

Set Up a /tmp Directory

  1. Create your DataFlow with the necessary scripting tiles, whether Python or R, and any required input and output DataSet tiles.
  2. In the first scripting tile, write your script to process the data as needed and save the resulting data or objects to the /tmp Directory (see the sketch after these steps).
    [Image: example script writing to the /tmp Directory]
  3. In the second scripting tile, read the data saved in the /tmp Directory by the previous tile.
    [Image: example script reading from the /tmp Directory]
  4. Continue processing the data as needed, and use the Output DataSet tile to save the final results.
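A minimal Python sketch of this flow follows; the file name summary.pkl, the use of pickle, and the input tile name are illustrative assumptions, and the two sections belong in two separate scripting tiles:

    from domomagic import *
    import pickle

    # === First scripting tile: write an object to the shared /tmp directory ===
    df = read_dataframe('raw_data')   # hypothetical input tile name
    summary = df.describe()           # an object to hand to the next tile
    with open('/tmp/summary.pkl', 'wb') as f:
        pickle.dump(summary, f)
    write_dataframe(df)

    # === Second scripting tile: read it back; note that /tmp contents
    # === are deleted between DataFlow executions ===
    # with open('/tmp/summary.pkl', 'rb') as f:
    #     summary = pickle.load(f)
    # write_dataframe(summary.reset_index())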

Troubleshooting

  • If your tile is not marked as configured after writing the script, make sure you have provided a resulting schema. See Populate the Resulting Schema above to learn more.
  • To get feedback on the behavior of your script, run a preview by clicking Run Preview in the toolbar at the top of the canvas. After running a preview, a Console tab appears in the tile editor panel toolbar. This displays the standard out (stdout) and standard error (stderr) that your script produces. If the preview fails, check this console for errors.
  • If a script is failing, check that you have properly finished all three of the steps listed in the Get Data In and Out of the Scripting Tile section of this article. Check the variable name that you are writing back to Domo to make sure it is correct.
  • Previews run on only a sample of your DataSet. If your script requires a certain variability in data to function properly, the sample used by the preview might not meet this variability requirement. You can control the size of the preview sample by changing the Row Limit in the canvas toolbar.
    Note: Larger sample sizes result in longer preview run times.
  • Make sure that you have explicitly imported the packages/libraries that you need for your script to run. A list of available packages displays in the Packages tab.
  • If a column is resulting in all null values, it could mean that the column is not actually produced by your script. If a column is provided in the list in the Schema tab, that column is added to the results regardless of whether the script actually produces it or not. If the script does not produce that column, then all of the values for that column are set to null. You can see which of your columns are and are not created by the script by running a preview and then looking at the Schema tab. The status of each column displays to the right of the column data type.
  • If your script is failing during runtime with an out of memory error, your input DataSet could be too large or your script could be doing some sort of join that is causing the data to explode. Try filtering the data before it gets injected into the scripting tile and/or refactoring your script to prevent data explosion. For more information on troubleshooting memory errors, see Troubleshooting Scripting Tiles Memory Errors.