AutoML

Einblick's AutoML engine, Alpine Meadow, enables you to build predictive models in an interactive way. Alpine Meadow was developed at MIT and Brown University during the DARPA D3M program and is under active development at Einblick. If you want to learn more about Alpine Meadow, please refer to this page.

Einblick's AutoML engine automatically evaluates a wide range of machine learning models and preprocessors to construct pipelines that best solve the problem you specified (for more details on Alpine Meadow's search space of primitives, click here). You can then choose one of the presented models, investigate its characteristics, and use it to predict values in other contexts.

Background

To better understand how to create a machine learning model, let's first cover a few important terms.

  • Target: the attribute we want to predict (e.g. sales)
  • Features: the attributes the model uses to predict the target (e.g. location, month)
  • Training Set: the data which establishes the relationship between the target and features
    • The training set will contain all of the features and the target, so that the model can understand how they are related
  • Test Set (optional): data which can be used to evaluate how well the trained model performs on unseen data.
    • The target is either absent or ignored in the test set so that the model can make predictions without "looking at the answer key"
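The terms above can be illustrated with a small sketch in plain Python. The rows, the column names (taken from the examples in the list) and the helper `split_features_target` are hypothetical and only serve to show how the pieces relate; this is not Einblick's API.

```python
# Hypothetical toy data: each row maps a column name to a value.
rows = [
    {"location": "Boston", "month": 1, "sales": 120},
    {"location": "Boston", "month": 2, "sales": 135},
    {"location": "Denver", "month": 1, "sales": 80},
    {"location": "Denver", "month": 2, "sales": 95},
]

TARGET = "sales"                  # the attribute we want to predict
FEATURES = ["location", "month"]  # the attributes used to predict it


def split_features_target(rows, features, target):
    """Separate each row into its feature values and its target value."""
    X = [{name: row[name] for name in features} for row in rows]
    y = [row[target] for row in rows]
    return X, y


# Training set: features and target are both visible to the model.
train_rows, test_rows = rows[:3], rows[3:]
X_train, y_train = split_features_target(train_rows, FEATURES, TARGET)

# Test set: the model only sees the features; the target is held back
# and used afterwards to score the predictions.
X_test, y_test_held_back = split_features_target(test_rows, FEATURES, TARGET)
```

Here `X_test` plays the role of the data the model predicts on, while `y_test_held_back` is the "answer key" that is only consulted when evaluating.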

Usage

Specify Training and Test Dataframes

To create a model, you first need to specify the training set. Drag a dataframe to the train input slot of the operator.

If desired, you can also specify a test set by dragging a dataframe to the test input of the operator. If you add a test dataframe, it will be used to evaluate the model's performance. If you omit it, the AutoML operator will evaluate the model's performance by creating the appropriate train-test splits directly from the train dataframe.
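As a rough illustration of what splitting the train dataframe can look like, here is a minimal holdout split in plain Python. The function name `holdout_split`, the 25% test fraction and the fixed seed are assumptions made for this sketch; the splitting strategy the AutoML operator actually uses is not described on this page.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for evaluation.

    Returns a (train_rows, test_rows) tuple. The defaults are
    illustrative assumptions, not Einblick's settings.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_rows, test_rows = holdout_split(list(range(20)))
```

Every row ends up in exactly one of the two parts, so the held-out rows are never seen during training.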

Specify Features and a Target

Next, select a target from the target menu and features from the features menu. If the target you selected is included in the group of features, it will automatically be ignored.
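The rule that a target listed among the features is ignored can be sketched as a simple filter. The helper `effective_features` is a hypothetical name used for illustration only.

```python
def effective_features(selected_features, target):
    """Return the selected features with the target column removed."""
    return [name for name in selected_features if name != target]


# "sales" was selected both as a feature and as the target,
# so it is dropped from the features used for training.
features = effective_features(["location", "month", "sales"], "sales")
```

Dropping the target matters because a model that can read the target as an input would simply copy it and learn nothing useful.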

Specify a Metric

The next step is to select a metric from the metric menu. If you're unsure which metric to use, the AutoML wizard will guide you to the appropriate one. If you prefer to select the metric manually, select the task type (i.e. classification or regression) and then the appropriate metric.
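To make the link between task type and metric concrete, the sketch below implements one widely used metric per task type: accuracy for classification and root mean squared error (RMSE) for regression. These two are chosen as examples only; the metrics actually offered in the metric menu are not listed here, and the `METRICS_BY_TASK` mapping is a hypothetical structure.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (higher is better)."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between numbers (lower is better)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical mapping from task type to example metrics.
METRICS_BY_TASK = {
    "classification": {"accuracy": accuracy},
    "regression": {"rmse": rmse},
}

score = METRICS_BY_TASK["classification"]["accuracy"](
    ["a", "b", "a"], ["a", "b", "b"]
)
```

Note that the direction differs between metrics: a higher accuracy is better, while a lower RMSE is better.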

Specify Final Options and Run

Lastly, specify any final inputs (e.g. the run time) in the inputs menu. Once everything has been set, click the play button to start the AutoML operator. Once started, the operator will look like this:

Completed AutoML run

Running AutoML

The AutoML operator searches for pipelines that best perform on the specified task. Pipelines are trained machine learning models and associated processing based on the provided data and specified settings. As the AutoML operator runs, the best-performing pipelines will be displayed in a graph showing performance vs. time, like the one above. Each circular node on the graph corresponds to the best-performing pipeline discovered up until that point in time.
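One way to read "the best-performing pipeline discovered up until that point in time" is as a running best over all evaluated pipelines. The sketch below follows that reading; the function `best_so_far` and the `(elapsed_seconds, score)` input format are assumptions for illustration, not the operator's internals.

```python
def best_so_far(evaluations, higher_is_better=True):
    """Keep only the evaluations that improved on every earlier score.

    `evaluations` is a list of (elapsed_seconds, score) tuples in the
    order the pipelines were evaluated. Each returned tuple would
    correspond to one node on a performance vs. time graph.
    """
    nodes = []
    best = None
    for elapsed, score in evaluations:
        if best is None:
            improved = True
        elif higher_is_better:
            improved = score > best
        else:
            improved = score < best
        if improved:
            best = score
            nodes.append((elapsed, score))
    return nodes


nodes = best_so_far([(1, 0.60), (2, 0.55), (3, 0.70), (4, 0.70), (5, 0.90)])
```

Pipelines that only tie or fall short of the current best, such as the ones evaluated at 2 and 4 seconds here, do not produce a new node under this reading.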

You can click on any pipeline node to view the time it was evaluated and its corresponding performance:

Viewing pipeline performance

You can also change the performance metric used in the graph at any time at the top of the operator:

Changing displayed performance metric

If the displayed metric is the same as the metric used to start the AutoML operator, the displayed graph will also show a naive score threshold. The naive score corresponds to the performance of a "dummy" model on the data (e.g. a classifier that always guesses the most common class in the training set). A good pipeline should strongly outperform the naive score on a reasonable dataset. In the image below, the naive score is far worse than the scores of the pipelines returned by AutoML and is indicated as such.

A low naive score relative to returned pipeline scores
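The "dummy" classifier mentioned above, which always guesses the most common class in the training set, can be sketched in a few lines. The function names are hypothetical, and accuracy is used as the metric purely as an example; how the operator computes its naive score for other metrics is not covered here.

```python
from collections import Counter


def majority_class(y_train):
    """Return the most common label in the training targets."""
    return Counter(y_train).most_common(1)[0][0]


def naive_accuracy(y_train, y_eval):
    """Accuracy of always predicting the training set's majority class."""
    guess = majority_class(y_train)
    hits = sum(1 for label in y_eval if label == guess)
    return hits / len(y_eval)


baseline = naive_accuracy(
    ["yes", "no", "no", "no"],  # training labels: "no" is the majority
    ["no", "yes", "no", "no"],  # evaluation labels
)
```

On imbalanced data such a baseline can already look strong (for example, 95% accuracy when 95% of the rows share one class), which is why a pipeline's score is only meaningful relative to the naive score.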

Using Pipelines

To use a pipeline, drag its corresponding node in the AutoML graph onto the canvas. You will then see a pipeline operator appear, like the one in the image below.

To learn more about pipelines, please refer to the pipeline page.