In this section, you’ll learn how to create new machine learning (ML) handlers within MindsDB.
Prerequisite
You should have the latest staging version of the MindsDB repository installed locally. Follow this guide to learn how to install MindsDB for development.
ML handlers act as a bridge to any ML framework. You use ML handlers to create ML engines with the CREATE ML_ENGINE command, which lets you expose ML models from any supported ML framework as AI tables.
Database Handlers
To learn more about handlers and how to implement a database handler, visit our doc page here.
You can create your own ML handler within MindsDB by inheriting from the BaseMLEngine class. By providing implementations for some or all of the methods of the BaseMLEngine class, you can connect to the machine learning library or framework of your choice.
Apart from the __init__() method, there are five methods, of which two (create() and predict()) must be implemented. We recommend checking actual examples in the codebase to get an idea of what goes into each of these methods, as they can vary depending on the nature of the system being integrated.
Let’s review the purpose of each method.
| Method | Purpose |
| --- | --- |
| create() | It creates a model inside the engine registry. |
| predict() | It calls a model and returns prediction data. |
| update() | Optional. It updates an existing model without resetting its internal structure. |
| describe() | Optional. It provides global model insights. |
| create_engine() | Optional. It connects with external sources, such as a REST API. |
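To make the two mandatory methods concrete, here is a minimal sketch of a handler that implements create() and predict(). The BaseMLEngine stub, the handler name, and the toy "model" are all illustrative; a real handler imports the actual BaseMLEngine class and should follow the signatures used in the MindsDB codebase.

```python
import pandas as pd

# Stand-in for mindsdb.integrations.libs.base.BaseMLEngine so this sketch
# runs standalone; a real handler imports the actual class instead.
class BaseMLEngine:
    def __init__(self, model_storage=None, engine_storage=None):
        self.model_storage = model_storage
        self.engine_storage = engine_storage

class MeanRegressionHandler(BaseMLEngine):
    """Toy handler: 'training' just stores the target column's mean."""
    name = "mean_regression"  # hypothetical engine name

    def create(self, target, df=None, args=None):
        # In a real handler, persist whatever is needed to reload the
        # model later (typically via self.model_storage).
        self.model = {"target": target, "mean": float(df[target].mean())}

    def predict(self, df, args=None):
        # Return one prediction per input row, as a DataFrame.
        out = pd.DataFrame(index=df.index)
        out[self.model["target"]] = self.model["mean"]
        return out

handler = MeanRegressionHandler()
train = pd.DataFrame({"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]})
handler.create("y", df=train)
preds = handler.predict(pd.DataFrame({"x": [4, 5]}))
```

The optional update(), describe(), and create_engine() methods would be added to the same class as needed.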
Authors can add private methods, new files and folders, or any combination of these to structure the work needed for the core methods to function as intended.
Other Common Methods
Under the mindsdb.integrations.libs.utils library, contributors can find various methods that may be useful while implementing new handlers.
There is also a wrapper class for BaseMLEngine instances called BaseMLEngineExec. It is automatically deployed to convert data responses into a format that can be used alongside data handlers.
The methods that must be implemented when inheriting from the BaseMLEngine class are create() and predict().
The optional methods that you can implement alongside the mandatory ones, if your ML framework allows it, are update(), describe(), and create_engine().
MindsDB has recently decoupled some modules out of its AutoML package in order to leverage them in integrations with other ML engines. The three modules are as follows:
The type_infer module implements automated type inference for any dataset.
- Input: a tabular dataset.
- Output: best guesses of what type of data each column contains.
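To illustrate what type inference does, here is a deliberately crude sketch built on plain pandas. It is not the type_infer API (which is far more thorough, covering dates, text, quantities, and more); it only shows the input-to-output shape described above.

```python
import pandas as pd

def guess_column_types(df: pd.DataFrame) -> dict:
    """Illustrative only: a crude stand-in for automated type inference."""
    guesses = {}
    for col in df.columns:
        series = df[col]
        if pd.api.types.is_bool_dtype(series):
            guesses[col] = "binary"
        elif pd.api.types.is_integer_dtype(series):
            guesses[col] = "integer"
        elif pd.api.types.is_float_dtype(series):
            guesses[col] = "float"
        else:
            # Low-cardinality string columns are likely categorical.
            ratio = series.nunique() / max(len(series), 1)
            guesses[col] = "categorical" if ratio < 0.5 else "text"
    return guesses

df = pd.DataFrame({
    "age": [25, 31, 47, 52, 38],
    "income": [40000.0, 52000.5, 71000.0, 68000.0, 59000.0],
    "segment": ["a", "b", "a", "a", "b"],
})
types = guess_column_types(df)
```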
The dataprep_ml module provides data preparation utilities, such as data cleaning, analysis, and splitting. These include column-wise cleaners, column-wise missing-value imputers, and data splitters (simple or stratified train-val-test splits).
- Input: a tabular dataset.
- Output: a cleaned dataset, plus insights useful for data analysis and model building.
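As an illustration of one of these utilities, here is a minimal stratified train-test splitter in plain pandas. It is not the dataprep_ml API (which also supports train-val-test splits, among other things); it only shows the idea: sample the same fraction from each class so the test set preserves the class balance.

```python
import pandas as pd

def stratified_split(df, target, test_frac=0.25, seed=0):
    """Illustrative only: minimal stratified train-test split."""
    test_parts = []
    for _, group in df.groupby(target):
        # Take the same fraction from each class.
        test_parts.append(group.sample(frac=test_frac, random_state=seed))
    test = pd.concat(test_parts)
    train = df.drop(test.index)
    return train, test

df = pd.DataFrame({"x": range(8), "label": ["a"] * 4 + ["b"] * 4})
train, test = stratified_split(df, "label")
```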
The mindsdb_evaluator module provides utilities for evaluating the accuracy and calibration of ML models.
- Input: model predictions and the input data used to generate them, including the ground truth values of the column to predict.
- Output: accuracy metrics that evaluate prediction accuracy, and calibration metrics that check whether model-emitted probabilities are calibrated.
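The distinction between the two kinds of metric can be shown with a toy example. The functions below are illustrative stand-ins, not the mindsdb_evaluator API: one scores how often predictions match the ground truth, the other checks whether the mean emitted probability matches the observed positive rate.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def calibration_gap(y_true, probs):
    """Illustrative only: gap between the mean predicted probability of
    the positive class and the observed positive rate. A well-calibrated
    model keeps this gap small."""
    mean_prob = sum(probs) / len(probs)
    positive_rate = sum(y_true) / len(y_true)
    return abs(mean_prob - positive_rate)

y_true = [1, 0, 1, 1]   # ground truth for the target column
y_pred = [1, 0, 0, 1]   # model predictions
probs = [0.9, 0.2, 0.6, 0.8]  # model-emitted probabilities of class 1

acc = accuracy(y_true, y_pred)
gap = calibration_gap(y_true, probs)
```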
We recommend that new contributors use the type_infer and dataprep_ml modules when writing ML handlers to avoid reimplementing thin AutoML layers over and over again. Instead, focus on mapping input data and user parameters to the underlying framework’s API.
For now, using the mindsdb_evaluator module is not required, but it will become required in the short to medium term, so it’s important to be aware of it while writing a new integration.
Example
Let’s say you want to write an integration for TPOT. Its high-level API exposes separate classes for classification and regression. As a handler designer, you need to ensure that arbitrary ML tasks are dispatched to the right class (i.e., not using a regressor for a classification problem and vice versa). First, type_infer can help by estimating the data type of the target variable, so you immediately know which class to use. Additionally, to quickly get a stratified train-test split, you can leverage the dataprep_ml splitters and focus on the actual usage of TPOT for the training and inference logic.
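The dispatch step described above can be sketched as follows. TPOTClassifier and TPOTRegressor are TPOT's real class names, but this sketch returns them as strings (and uses a crude pandas-based type guess instead of type_infer) so it runs without TPOT installed; the thresholds are illustrative.

```python
import pandas as pd

def choose_tpot_class(df: pd.DataFrame, target: str) -> str:
    """Illustrative dispatch: pick TPOT's classifier or regressor based
    on a crude guess of the target column's type. A real handler would
    use type_infer here and instantiate tpot.TPOTClassifier or
    tpot.TPOTRegressor; strings are returned so the sketch runs
    without TPOT installed."""
    series = df[target]
    if pd.api.types.is_float_dtype(series):
        return "TPOTRegressor"
    # Integers with few distinct values, and strings, look like classes.
    if series.nunique() <= 10:
        return "TPOTClassifier"
    return "TPOTRegressor"

houses = pd.DataFrame({"sqft": [800, 1200], "price": [150.0, 230.5]})
emails = pd.DataFrame({"length": [10, 300], "spam": ["yes", "no"]})

choice_reg = choose_tpot_class(houses, "price")
choice_clf = choose_tpot_class(emails, "spam")
```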
We would appreciate your feedback regarding the usage and feature roadmap of the above modules, as they are quite new.
Please note that pytest is the recommended testing package. Use pytest to confirm that your ML handler implementation is correct.
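As a starting point, here is the shape of a minimal pytest-style test. The handler stand-in and the expectations are hypothetical; the handler tests in the repository are more involved, but the pattern of asserting on the returned DataFrame is the same.

```python
import pandas as pd

def predict_constant(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for a handler's predict(); always predicts 1.
    return pd.DataFrame({"prediction": [1] * len(df)})

def test_predict_returns_one_row_per_input():
    # pytest discovers functions named test_* and runs their assertions.
    df = pd.DataFrame({"x": [10, 20, 30]})
    out = predict_constant(df)
    assert len(out) == len(df)
    assert "prediction" in out.columns
```

Run it with `pytest <file>.py` from the command line.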
Templates for Unit Tests
If you implement a time-series ML handler, create your unit tests following the structure of the StatsForecast unit tests.
If you implement an NLP ML handler, create your unit tests following the structure of the Hugging Face unit tests.
To see some ML handlers that are currently in use, we encourage you to check out the ML handlers inside the MindsDB repository.
And here are all the handlers available in the MindsDB repository.