
Intro

This article explains how to build an AI/ML model in a Jupyter Workspace and upload it to the Domo Model Management interface, where it can be deployed for real-time or batch inference.

Training Data

The example is intended to be simple and will not use machine learning to train a model. Instead, given a list of colored shapes, we will define a simple algorithm that classifies the shapes as blue or not blue.
import pandas as pd

data = [['Circle', 'Red'], ['Square', 'Blue'], ['Oval', 'Green'],
        ['Rectangle', 'Orange'], ['Rectangle', 'Pink']]
train_x = pd.DataFrame(data, columns=['Shape', 'Color'])

# For each row in the training data, 1 if the Color is Blue, and 0 otherwise.
# This is the value that we want to predict.
train_y = pd.DataFrame({'Blue': [0, 1, 0, 0, 0]})

# View first few rows of data when joined
train_x.join(train_y).head()
       Shape   Color  Blue
0     Circle     Red     0
1     Square    Blue     1
2       Oval   Green     0
3  Rectangle  Orange     0
4  Rectangle    Pink     0

Hyperparameters

When using machine learning to train a model, hyperparameters configure the training process. Our example does not use them, but we include them here for reference.
hyperparameters = {
    "alpha": "2.35e-05",
    "lambda": "0.25"
}

Model Training

At this point, we would normally use a machine learning library to train a model that fits our training DataSet. For the purposes of this notebook, the following code is the model; it is saved as model.py so it can be imported during validation.
import pandas as pd
from io import StringIO

def invoke(data, content_type_header, accept_header):
    """Invoke the model using the data as the input

    :param data: The input data
    :param str content_type_header: The Content-Type header, or MediaType of the input data
    :param str accept_header: The Accept header, or expected MediaType of the response
    :return: The model prediction
    """
    # Read csv input
    input_data = pd.read_csv(StringIO(data), header=None).to_numpy()

    # "predict" that the input is blue (1) if the color is Blue, otherwise 0
    predictions = [1 if entry[1] == 'Blue' else 0 for entry in input_data]

    # Convert and return predictions as csv
    return pd.DataFrame(predictions).to_csv(header=False, index=False)
The model implements an invoke function, which gives us a common place to put the algorithm. When the model is deployed in Domo, this function acts as the entry point for executing the model. When the model is used in a DataFlow, the invoke function accepts and returns data as a CSV string.

Validation

To ensure your model is ready for deployment, we recommend testing it using the invoke function. To keep things simple in this example, we test against the training DataSet.
# Write training dataset as csv without headers or index column
train_csv = train_x.to_csv(header=False, index=False)

# Execute invoke function from model.py
import model
predicted_y = model.invoke(train_csv, 'text/csv', 'text/csv')
print(predicted_y)

Response

0
1
0
0
0
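
As a quick sanity check, the returned CSV can be compared against the training labels. This is a sketch; the expected and actual names are ours, not part of the Domo API.

import pandas as pd
from io import StringIO

# Parse the CSV response and compare it to the Blue label column
expected = train_y['Blue'].tolist()
actual = pd.read_csv(StringIO(predicted_y), header=None)[0].tolist()
assert actual == expected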

Model Schema

Each model defines an input and output type and, optionally, a schema for CSV or JSON types (via CSVModelIOConfiguration or JSONModelIOConfiguration). For example, a model may accept a CSV as input and return a CSV as output. In addition to manually creating a CSV schema, you can create one from a DataFrame, which may be simpler when a DataFrame is already in use, as the sketch below shows.
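
For example, a CSV schema can be inferred directly from a DataFrame. This minimal sketch uses only the data_frame keyword that also appears in the Model Task section below.

from domojupyter.ai import CSVModelIOConfiguration

# Infer the CSV schema (column names and types) from an existing DataFrame
csv_config = CSVModelIOConfiguration(data_frame=train_x)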

Metrics

During training and validation, we can define metrics to measure model performance. Example metrics are included below as a reference. In addition to metric name and value, standard deviation and timestamp may be included.
from domojupyter.ai import Metric
from datetime import datetime

metrics = {
    "accuracy": 1.0,
    "recall": 1.0,
    "precision": 1.0
}
now = datetime.now()
# Metric takes a name, value, standard deviation (None here), and timestamp
domo_metrics = {k: Metric(k, v, None, now) for (k, v) in metrics.items()}

Model Task

Domo lets you specify which task(s) your model is trained to perform, including:
  • TEXT_GENERATION
  • CLASSIFICATION
  • OTHER
Model input and output may also be configured as part of the task definition. In this example, the input and output are configured as CSV to allow for execution using the Model Inference tile in Magic ETL.
from domojupyter.ai import ModelTask, ModelTaskType
from domojupyter.ai import CSVModelIOConfiguration

# Infer the input column names and types from our training dataset
input_config = CSVModelIOConfiguration(data_frame=train_x)
# Infer the output column names and types from our training label dataset
output_config = CSVModelIOConfiguration(data_frame=train_y)
task = ModelTask(ModelTaskType.CLASSIFICATION, input_config=input_config,
                 output_config=output_config)

Kernel Snapshots

Domo Jupyter Workspaces allow you to customize your environment by installing third-party libraries. To ensure that the model hosting environment matches your customized Jupyter environment, a snapshot is created of the conda environment running the Jupyter kernel. A kernel snapshot is automatically created the first time you create a model in a workspace. If one or more snapshots already exist, the most recent snapshot is used for your model. If your environment has changed and you need a new snapshot, call create_model with create_snapshot=True, as sketched below. Creating a new snapshot can take several minutes.
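
For example, this sketch reuses the create_model call from the next section to force a fresh snapshot:

import domojupyter.ai.model as ml

# Recreate the kernel snapshot so the hosting environment matches the
# current conda environment (this can take several minutes)
ml.create_model(model_name, entrypoint, extra_files, training=training,
                tasks=tasks, create_snapshot=True)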

Create the Model

Finally, upload the model to the Domo Model Management interface, where its performance can be compared with other models and, when it is ready, it can be deployed as an endpoint or as a tile in a Magic ETL DataFlow. The following information is included:
  • Name — The name of the model
  • Entrypoint — The file containing our invoke function that is executed after it is deployed
  • Files — The serialized model or any other files required to execute our model
  • Training — Hyperparameters and metrics discovered during training
  • Tasks — A list of tasks our model supports
from domojupyter.ai import ModelTrainingInformation
import domojupyter.ai.model as ml

model_name = 'Blue Classification'
entrypoint = 'model.py'
extra_files = []
training = ModelTrainingInformation(metrics=domo_metrics, hyperparameters=hyperparameters, algorithm="Custom")
tasks = [task]

ml.create_model(model_name, entrypoint, extra_files, training=training, tasks=tasks)