Intro
This article describes how to use the Data Science tiles in Magic ETL. To learn how to use tiles in DataFlows, see Create a Magic ETL DataFlow.
Note: These tiles are available by default to organizations on the Domo Consumption agreement. For non-consumption users, the tiles are available on demand as a paid feature. To enable Data Science tiles, contact your Domo account team. You may need to complete training before you can use a tile.
- AutoML Inference tile
- Classification tile
- Clustering tile
- Forecasting tile
- Multivariate Outliers tile
- Outlier Detection tile
- Prediction tile
AutoML Inference Tile


Classification Tile

Naïve Bayes Classifier
Naïve Bayes classification is faster and simpler than other classification algorithms but often less precise. It is recommended for use on larger DataSets.
Random Forest Classifier
Random forest classification is an ensemble learning method that builds multiple decision trees and combines the results to obtain an overall classification. No assumptions of linearity are needed. The algorithm is more robust against extreme values in your data.
Example
The following example illustrates how the Classification algorithm can be implemented and used in Magic ETL in Domo. The DataSets Catastrophic_Train.xlsx (800 rows) and Catastrophic_Test.xlsx (200 rows) are artificially generated DataSets. They contain data on insurance claims where the goal is to train a Classification algorithm that can accurately classify a new claim as catastrophic or not. A snapshot of the “Catastrophic Train” DataSet is found below.

-
Add the Classification tile and connect it to your input DataSets.

-
First, select the DataSet that will be used to train the algorithm and the DataSet whose rows will be classified. These can be the same DataSet, or they can differ if you have separate training and test/validation DataSets.

-
Next, select the column that you want to classify. Then select the columns that you believe may help with the classification, starting with the numeric columns (this selection can be left blank).

-
Select the categorical classifier columns (this selection can be left blank if at least one numeric classifier column was selected in the previous step). You must also name the classification column. In this case, the default name “classification” is used.

-
Lastly, select either Naïve Bayes or Random Forest as the algorithm powering the Classification tile.

-
Connect and name the output DataSet. The resulting DataSet will include the original DataSet with the appended ‘classification’ column.
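
To make the train-then-classify flow above more concrete, here is a minimal Python sketch of a Gaussian Naïve Bayes classifier on numeric columns. It is only an illustration of the general technique, not the implementation behind the Classification tile. The column meanings (claim amount, days open), the sample values, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate a prior and per-column mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        stats = []
        for col in zip(*members):
            mean = sum(col) / n
            # Small constant avoids a zero variance for constant columns.
            var = sum((x - mean) ** 2 for x in col) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def classify(model, row):
    """Return the class with the highest log-posterior for one row."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training rows: (claim amount, days open) and a known label.
train_rows = [(1200, 3), (900, 5), (1500, 4), (48000, 40), (52000, 35), (61000, 50)]
train_labels = ["no", "no", "no", "yes", "yes", "yes"]
model = train_gaussian_nb(train_rows, train_labels)

# Hypothetical rows to classify; the result plays the role of the
# appended "classification" column.
test_rows = [(1000, 4), (55000, 45)]
classification = [classify(model, row) for row in test_rows]
print(classification)
```

The training rows stand in for the training DataSet and the test rows for the DataSet to be classified; the two could also be the same list.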

Clustering Tile

K-means Clustering
K-means forms clusters by randomly selecting k rows from the DataSet and treating them as cluster centers. The k clusters are then formed based on each row’s distance to the cluster center. The mean of each cluster is then calculated and treated as the new cluster center. This process is repeated until cluster membership stabilizes.
K-medians Clustering
K-medians forms clusters by randomly selecting k rows from the DataSet and treating them as cluster centers. The k clusters are then formed based on each row’s distance to the cluster center. The median of each cluster is then calculated and treated as the new cluster center. This process is repeated until cluster membership stabilizes.
Example
The following example illustrates how the clustering algorithm can be implemented and used in Magic ETL in Domo. The sample DataSet Wholesale_Distributor_Sales.xlsx (440 rows) is an artificially generated DataSet.
-
Add the Clustering tile to your Magic ETL and connect it to the input DataSet.

-
Select the columns you want to use to determine the clusters. You must have at least one numeric column and only numeric columns can be selected.

- Next, name the new column that will contain the cluster membership, and enter the number of clusters (k) assumed to exist in the DataSet. Typically, 2-5 clusters are a good starting point, although more can be used. It is recommended that different values of k be explored.
Note: Using too many clusters is typically not beneficial because the resulting clusters become difficult to interpret.
-
Then, select either K-means or K-medians as the algorithm powering this Clustering tile.

- Lastly, an output DataSet must be connected and named.

Build a Card Using a DataSet
A Scatter Plot is a great way to visualize the data in this DataSet.
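
The clustering procedure described in this section (pick k random rows as centers, assign each row to its nearest center, recompute the centers, repeat until membership stabilizes) can be sketched in Python as follows. This is a simplified illustration, not the tile's implementation. The function name, the sample rows, and the use of squared Euclidean distance for both variants are assumptions made for the example.

```python
import random
import statistics


def cluster(rows, k, center_fn=statistics.mean, seed=0, max_iter=100):
    """Assign each row to one of k clusters.

    center_fn=statistics.mean gives a K-means style update;
    center_fn=statistics.median gives a K-medians style update.
    """
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # k randomly chosen rows as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign every row to the nearest center (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])))
            for row in rows
        ]
        if new_assignment == assignment:
            break  # membership has stabilized
        assignment = new_assignment
        # Recompute each center from its current members, column by column.
        for c in range(k):
            members = [row for row, a in zip(rows, assignment) if a == c]
            if members:
                centers[c] = tuple(center_fn(col) for col in zip(*members))
    return assignment


# Hypothetical rows with two numeric columns.
rows = [(1, 2), (2, 1), (1, 1), (50, 52), (51, 50), (52, 51)]
kmeans_labels = cluster(rows, k=2)
kmedians_labels = cluster(rows, k=2, center_fn=statistics.median)
print(kmeans_labels, kmedians_labels)
```

The returned list plays the role of the cluster membership column. The label numbers themselves are arbitrary; only which rows share a label is meaningful.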
Forecasting Tile

ARIMA
With ARIMA (Auto-Regressive Integrated Moving-Average), prediction parameters are automatically chosen based on model fit. This allows for the capture of trends and seasonality in the data. The forecasts are then based on the final model parameters.
Example
The following example illustrates how the ARIMA forecasting algorithm can be implemented and used in Magic ETL in Domo. The sample DataSet Daily_Web_Sales.xlsx (171 rows) is an artificially generated DataSet that contains daily revenue totals.
-
Add the Forecasting tile to your Magic ETL and connect it to the input DataSet.

-
Next, select the column containing the date/time, followed by the column that you want to forecast. In this case, ‘Revenue’ will be forecasted.

-
Now, choose the width of the prediction bands (default is 95%). The higher the percentage chosen, the wider the bands, because a wider interval is needed to contain the future value with greater confidence. Next, choose the number of dates to forecast. The forecasting algorithm looks at the past data and takes the average distance (in time) between data points. Future forecasted points are spaced by this quantity. In this case, the data is in days, so the future forecasted time points are also in days.

-
Select how many of the most recent rows the future predictions should be based on. By default, all of the rows are used. You must also name the prediction column. In this case, the default ‘prediction’ is used.

-
Next, the prediction lower and upper bounds must be named. Here, the defaults ‘prediction lower’ and ‘prediction upper’ are used.

-
Indicate the number of observations per time period.
In this example, the number is 7, which for daily data corresponds to one week.

-
Finally, connect and name the output DataSet.


Build a Card Using a DataSet
A Forecasting Card is a great way to visualize the data in this DataSet.
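
The way future time points are derived from the average distance between past data points can be sketched in Python as below. This covers only the date-spacing step described above, not the ARIMA model itself, and it is an illustration rather than the tile's actual code. The function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def future_timestamps(timestamps, periods):
    """Extend a series of timestamps by `periods` points, spaced by the
    average gap between consecutive past timestamps."""
    ordered = sorted(timestamps)
    avg_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return [ordered[-1] + avg_gap * (i + 1) for i in range(periods)]


# Hypothetical daily history: ten consecutive days.
history = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(10)]
future = future_timestamps(history, periods=3)
print(future)
```

Because the history is daily, the average gap is one day and the forecasted time points are also one day apart.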
Multivariate Outliers Tile

Example
The following example illustrates how the outlier detection algorithm can be implemented and used in Magic ETL in Domo. The sample DataSet Wholesale_Distributor_Sales.xlsx (440 rows) is an artificially generated DataSet.
-
Add the Multivariate Outliers tile to your Magic ETL and connect it to the input DataSet.

-
Within the Multivariate Outliers tile, choose one or more of the 6 product categories as the columns in which outliers will be detected. For this example, all of the product categories are chosen.

-
Enter a name for the column that will store the outlier determination for the numeric columns specified previously; the default name of the column is outlier. The values stored in this column are either 0 (the observation/row is not an outlier) or 1 (the observation/row is an outlier).

-
Select the quantile of the Chi-square distribution to use as a cutoff by entering a value between 0 and 1. Typically, a quantile between 0.95 and 0.99 is a good starting point, with higher quantiles leading to stricter cutoffs. We recommend exploring different values.
Note: Using too low of a quantile cutoff value will label most of the observations as outliers.

-
Connect and name an output DataSet.
The output DataSet includes the original DataSet with an appended column containing the outlier determinations (highlighted below).


Build a Card Using a DataSet
A Scatter Plot graph is a great way to visualize the data in this DataSet.
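
One common way to apply a Chi-square quantile cutoff to several numeric columns is to compare each row's squared Mahalanobis distance against that quantile. The source does not state which distance the tile uses, so the Python sketch below is an assumption chosen for illustration. It is limited to two columns so that the Chi-square quantile (2 degrees of freedom) has a closed form. The function name and sample rows are hypothetical.

```python
import math


def multivariate_outliers(rows, quantile=0.95):
    """Flag rows (1 = outlier, 0 = not) using the squared Mahalanobis
    distance over two numeric columns and a Chi-square cutoff."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    # Sample variances and covariance of the two columns.
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Chi-square quantile with 2 degrees of freedom: -2 * ln(1 - q).
    cutoff = -2.0 * math.log(1.0 - quantile)
    flags = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance using the inverse 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        flags.append(1 if d2 > cutoff else 0)
    return flags


# Hypothetical data: a 4x4 grid of ordinary rows plus one extreme row.
rows = [(x, y) for x in range(1, 5) for y in range(1, 5)] + [(20, -10)]
outlier = multivariate_outliers(rows, quantile=0.95)
print(outlier)
```

The resulting list of 0 and 1 values corresponds to the appended outlier column.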
Outlier Detection Tile

Mean Absolute Deviation
Mean Absolute Deviation outlier detection is an anomaly detection algorithm that aims to detect outlying or unusual observations in a numeric column of a DataSet. Unlike Standard Deviation detection, it does not assume that the values in the column are normally distributed (i.e., have a bell-shaped distribution), which makes it most useful for columns that are non-normal or skewed (have a disproportionate number of large or small observations). An observation is labeled as an outlier if it lies more than a pre-specified number of median absolute deviations (MADs) from the median in either direction.
Standard Deviation
Standard Deviation outlier detection is an anomaly detection algorithm that attempts to detect outlying or unusual observations in a numeric column of a DataSet. It assumes that the values in the column are roughly normally distributed (i.e., have a bell-shaped distribution). An observation is labeled as an outlier if it lies more than a pre-specified number of standard deviations from the mean in either direction.
Example
The following example illustrates how the outlier detection algorithm can be implemented and used in Magic ETL in Domo. The sample DataSet Wholesale_Distributor_Sales.xlsx (440 rows) is an artificially generated DataSet.
-
Add the Outlier Detection tile to your Magic ETL and connect it to the input DataSet.

-
Within the Outlier Detection tile, choose one of the 6 product categories as the column in which outliers will be detected. For this example, Fresh is the chosen column.

-
Next, select the cutoff, which is the number of deviations above or below the center (median absolute deviations from the median, or standard deviations from the mean, depending on the algorithm). Also name the output column (default is “outlier”), which contains either TRUE (observation is an outlier) or FALSE (observation is not an outlier). Typically, a cutoff of 2-3 deviations is a good starting point, although a higher (stricter) value can be used. It is recommended that different values be explored. Note that using too small of a cutoff will label most observations as outliers.

-
Next, select either the Standard Deviation or Mean Absolute Deviation algorithm. In this example, we will select Standard Deviation.

- Last, an output DataSet must be connected and named.


Build a Card Using a DataSet
A Scatter Plot graph is a great way to visualize the data in this DataSet.
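
The two rules described in this section can be sketched in Python with the standard library. This is an illustration of the general rules (distance from the mean in standard deviations, and distance from the median in median absolute deviations), not the tile's implementation. The function names and the sample values are hypothetical.

```python
import statistics


def sd_outliers(values, cutoff=2.0):
    """TRUE where a value lies more than `cutoff` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) > cutoff * sd for v in values]


def mad_outliers(values, cutoff=2.0):
    """TRUE where a value lies more than `cutoff` median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [abs(v - med) > cutoff * mad for v in values]


# Hypothetical values for a single numeric column; the last one is extreme.
fresh = [10, 12, 11, 13, 12, 11, 10, 12, 95]
sd_flags = sd_outliers(fresh)
mad_flags = mad_outliers(fresh)
print(sd_flags)
print(mad_flags)
```

Each returned list corresponds to the TRUE/FALSE outlier column. Trying several cutoff values, as recommended above, only requires changing the `cutoff` argument.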
Prediction Tile

Note: The Prediction tile can only support predicting a maximum of 53 unique categories. If you need more categories, we recommend building this using our custom R/Python scripting tiles.
Linear Regression
Linear regression prediction uses a linear regression model to predict a numeric column in your data. It requires a training DataSet that contains the column you want to predict as well as other columns (also known as “predictors”) that you believe can aid in the prediction process. The algorithm uses the training DataSet to “train” a prediction algorithm, which can then be applied to a “test” DataSet (the test and training DataSets can be the same), where it uses the same “predictor” columns to predict a value for each row.
Random Forest
Random Forest regression is very powerful because it uses an ensemble of many weak decision trees to create one strong regression algorithm whose prediction is the mean of the predictions of the individual trees. It performs well on a variety of data types because it is good at identifying complex non-linear substructures in data.
Example
The following example illustrates how the Regression prediction algorithm can be implemented and used in Magic ETL in Domo. The sample DataSets Catastrophic_Train.xlsx (800 rows) and Catastrophic_Test.xlsx (200 rows) are artificially generated DataSets. They contain data on insurance claims where the goal is to train a prediction algorithm that can accurately predict the number of claims for each row in a DataSet based on data in other columns. A snapshot of the “Catastrophic Train” DataSet is found below.

-
Add the Prediction tile to your Magic ETL and connect it to the input DataSet.

-
First, you must select the training and test DataSets. Note that these could be the same DataSet.

-
Next, select the column that you want to predict. Then select the columns that you believe may help with the prediction, starting with the numeric columns (this selection can be left blank).

-
Now, select the categorical predictor columns (this selection can be left blank if at least one numeric predictor column was selected in the previous step). Then set the name of the prediction column. In the example, we leave the name as the default.

-
Lastly, select either the Linear Regression or Random Forest algorithm. The example below uses Linear Regression.

-
Connect and name the output DataSet.
The resulting output DataSet includes the original training DataSet with the appended prediction columns. The prediction columns can then be compared with the “num.claims” column.
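
As a rough illustration of the train-then-predict flow, the Python sketch below fits an ordinary least squares line with a single numeric predictor on training rows and applies it to test rows. It is a simplified sketch, not the tile's implementation, which can take several numeric and categorical predictors. The function name, the predictor values, and the target values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training rows: one numeric predictor and the column to predict.
train_x = [1, 2, 3, 4, 5]
train_y = [3, 5, 7, 9, 11]
slope, intercept = fit_line(train_x, train_y)

# Hypothetical test rows; the result plays the role of the prediction column.
test_x = [6, 7]
prediction = [slope * x + intercept for x in test_x]
print(slope, intercept, prediction)
```

When the test rows also contain the true values, the prediction column can be compared against them in the same way the example compares predictions with “num.claims”.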

