Einblick's AutoML engine, Alpine Meadow, enables you to build predictive models in an interactive way. Alpine Meadow was developed at MIT and Brown University during the DARPA D3M program and is under active development at Einblick. If you want to learn more about Alpine Meadow, please refer to this page.
Einblick's AutoML engine automatically evaluates a wide range of machine learning models and preprocessors to construct the pipelines that best solve the problem you specify (for more details on Alpine Meadow's search space of primitives, click here). You can then choose one of the presented models, investigate its characteristics, and use it to predict values in other contexts.
To better understand how to create a machine learning model, let's first cover a few important terms.
- Target: the attribute we want to predict (e.g.
- Features: the attributes the model uses to predict the target (e.g.
- Training Set: the data which establishes the relationship between the target and features
  - The training set will contain all of the features and the target, so that the model can understand how they are related
- Test Set (optional): data which can be used to evaluate how well the trained model performs on unseen data.
  - The target is either absent or ignored in the test set so that the model can make predictions without "looking at the answer key"
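The terms above can be made concrete with a tiny table. This is only an illustrative sketch in plain Python: the column names (`age`, `plan`, `monthly_spend`, `churned`) and the values are hypothetical and do not come from Einblick.

```python
# Hypothetical customer table; every column name and value is illustrative.
rows = [
    {"age": 34, "plan": "basic", "monthly_spend": 20.0, "churned": False},
    {"age": 51, "plan": "premium", "monthly_spend": 75.0, "churned": True},
    {"age": 27, "plan": "basic", "monthly_spend": 18.5, "churned": False},
    {"age": 45, "plan": "premium", "monthly_spend": 80.0, "churned": True},
]

target = "churned"                           # the attribute to predict
features = ["age", "plan", "monthly_spend"]  # the attributes used to predict it

# Training set: rows that contain both the features and the target.
training_set = rows[:3]

# Test set: the target is withheld from the model's inputs and kept
# aside so that predictions can be scored afterwards.
test_inputs = [{name: row[name] for name in features} for row in rows[3:]]
test_answers = [row[target] for row in rows[3:]]
```

Here the model would only ever see `test_inputs`; `test_answers` plays the role of the "answer key".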
Specify Training and Test Dataframes
To create a model, you first need to specify the training set. Drag a dataframe to the train input slot of the operator.
If desired, you can also specify a test set by dragging a dataframe to the test input of the operator. If you add a test dataframe, it will be used to evaluate the model's performance. If you omit it, the AutoML operator will evaluate the model's performance by creating the appropriate train-test splits directly from the train dataframe.
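As a rough illustration of what a train-test split is, the sketch below holds out a random fraction of rows for evaluation. It is an assumption-laden example, not Einblick's actual splitting strategy: the function name `holdout_split`, the 25% default, and the fixed seed are all hypothetical.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for evaluation.

    Hypothetical helper: the operator's real splitting logic is not
    documented here and may differ (for example, cross-validation).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    shuffled = list(rows)                  # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_rows, test_rows = holdout_split(list(range(20)))
```

Every row ends up in exactly one of the two parts, so the model is never scored on rows it was trained on.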
Specify Features and a Target
Next, select a target from the target menu and features from the features menu. If the target you selected is included in the group of features, it will automatically be ignored.
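The rule that a target is ignored when it also appears among the features amounts to a simple filter. The helper below is a hypothetical sketch of that behavior, not code from the operator.

```python
def effective_features(selected_features, target):
    """Return the selected features with the target removed, order preserved.

    Hypothetical helper mirroring the behavior described above.
    """
    return [name for name in selected_features if name != target]


# Illustrative column names only.
used = effective_features(["age", "churned", "plan"], target="churned")
```

Dropping the target matters because a model that can read the target as an input would trivially "predict" it and report misleadingly good scores.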
Specify a Metric
The next step is to select a metric from the metric menu. If you're unsure which metric to use, the AutoML wizard will guide you to the appropriate one. If you prefer to select the metric manually, select the task type (i.e., classification or regression) and then the appropriate metric.
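To show why the metric depends on the task type, here is a minimal sketch of one common metric for each task. Accuracy and RMSE are chosen only as familiar examples; whether they appear under those names in the metric menu is an assumption, and the `METRICS_BY_TASK` table is hypothetical.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted numbers."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical lookup from task type to the metrics that make sense for it.
METRICS_BY_TASK = {
    "classification": {"accuracy": accuracy},  # higher is better
    "regression": {"rmse": rmse},              # lower is better
}

class_score = METRICS_BY_TASK["classification"]["accuracy"](
    ["a", "b", "a", "a"], ["a", "b", "b", "a"]
)
reg_score = METRICS_BY_TASK["regression"]["rmse"](
    [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
)
```

Note that the two metrics point in opposite directions: a higher accuracy is better, while a lower RMSE is better.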
Specify Final Options and Run
Lastly, specify any final inputs (e.g. the run time) in the inputs menu. Once everything has been set, click the play button to start the AutoML operator. After starting, the operator will look like this:
The AutoML operator searches for the pipelines that perform best on the specified task. A pipeline is a trained machine learning model together with its associated processing steps, built from the provided data and specified settings. As the AutoML operator runs, the best-performing pipelines will be displayed in a graph showing performance vs. time, like the one above. Each circular node on the graph corresponds to the best-performing pipeline discovered up until that point in time.
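The idea of plotting only the best pipeline found so far can be sketched as a running-best filter over the evaluated pipelines. This is an illustrative reconstruction, not the operator's implementation: the function `best_so_far`, the `(elapsed_seconds, score)` tuple layout, and the sample scores are all hypothetical.

```python
def best_so_far(evaluations, higher_is_better=True):
    """Return the (elapsed_seconds, score) points where a new best appeared.

    `evaluations` is a list of (elapsed_seconds, score) tuples, one per
    evaluated pipeline. Pass higher_is_better=False for error metrics.
    """
    nodes = []
    best = None
    for elapsed, score in sorted(evaluations):  # process in time order
        if best is None:
            improved = True
        elif higher_is_better:
            improved = score > best
        else:
            improved = score < best
        if improved:
            best = score
            nodes.append((elapsed, score))
    return nodes


# Made-up search history: five pipelines evaluated over 20 seconds.
history = [(2.0, 0.71), (5.0, 0.68), (9.0, 0.80), (14.0, 0.79), (20.0, 0.86)]
nodes = best_so_far(history)
```

In this made-up history, the pipelines evaluated at 5 and 14 seconds never become nodes because an earlier pipeline already scored higher.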
You can click on any pipeline node to view the time it was evaluated and its corresponding performance:
You can also change the performance metric used in the graph at any time at the top of the operator:
If the displayed metric is the same as the metric used to start the AutoML operator, the displayed graph will also show a naive score threshold. The naive score corresponds to the performance of a "dummy" model on the data (e.g. a classifier that always guesses the most common class in the training set). A good pipeline should strongly outperform the naive score on a reasonable dataset. In the image below, the naive score is far worse than the scores of the pipelines returned by AutoML and is indicated as such.
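A naive score of the kind described above can be computed by hand. The sketch below scores a majority-class "dummy" classifier using accuracy; how Einblick actually computes its naive score is not specified here, so the function name, the choice of accuracy, and the labels are assumptions.

```python
from collections import Counter


def majority_class_baseline(train_labels, eval_labels):
    """Score a dummy model that always predicts the most common training label.

    Returns the predicted label and its accuracy on the evaluation labels.
    Hypothetical helper; accuracy is used only as an example metric.
    """
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in eval_labels if label == majority_label)
    return majority_label, hits / len(eval_labels)


# Illustrative labels: 70% of training rows are "stay".
train_labels = ["stay"] * 7 + ["churn"] * 3
eval_labels = ["stay", "stay", "churn", "stay", "churn"]
label, naive_accuracy = majority_class_baseline(train_labels, eval_labels)
```

On imbalanced data this baseline can look deceptively strong, which is why a pipeline's score is most meaningful when read against it.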
To use a pipeline, drag its corresponding node in the AutoML graph onto the canvas. You will then see a pipeline operator appear, like the one in the image below.
To learn more about pipelines, please refer to the pipeline page.