Developing new models
We have designed the system to be easily extensible: new models can be added with relatively little effort.
Frameworks
Bundled models are implemented using either PyTorch (for GPU-based models) or Scikit-learn (everything else). New models can be implemented straightforwardly in either of these frameworks. Implementing in other frameworks (such as TensorFlow) should be possible, as the common core of KTT is not hardwired to any particular framework, but this has not been tested. Furthermore, you will need to add and keep track of the extra dependencies yourself.
Should you decide to use another framework, please refer to the section at the end of this page.
General model folder structure
Models should follow a uniform folder structure regardless of framework. This is not mandatory, but it is highly encouraged, as it eases maintenance and improves code consistency. Bundled models use the following layout:
models
└── model_name
    ├── __init__.py
    ├── bentoml
    │   ├── __init__.py
    │   ├── dashboard.json
    │   ├── evidently.yaml
    │   └── svc_lts.py
    └── model_name.py
As one can see, each model lives in its own folder, whose name is also the model's identifier used by the common training and exporting controller. All model folders are placed inside the ./models folder.
An explanation of what each file and folder is:
- __init__.py files ensure Python reads each folder as a module, allowing the model to be imported.
- model_name.py is where the model is implemented as a Model object. This is also where most of the work happens. Note that it has the same name as the model folder.
- bentoml/ contains all BentoML-related code and resources. These files and resources are needed to build a BentoService from this model.
- bentoml/dashboard.json is the Grafana dashboard's JSON schema. This file is automatically provisioned for the Grafana instance when we start the Dockerisation process.
- bentoml/evidently.yaml is a template configuration file for the Evidently monitoring service. By itself, it contains dataset-agnostic configuration parameters tailored to this model. When the service is built, a copy of this file is made and additional dataset-dependent configuration is added. That copy will reside in the built BentoService.
- bentoml/svc_lts.py contains the BentoService definition for this model (see The service implementation below). The built service also includes a separate monitoring web application, which receives new data (feature inputs and resulting scores) from the main inference service, compares it against a reference dataset, and computes data drift/target drift metrics to be sent to the Prometheus database instance.
With that out of the way, we will now deep-dive into how to implement each part of a model:
The model itself
Every model must subclass a framework-specific abstract model class, all of which in turn subclass models.model.Model. This serves as the baseline requirement: the minimum set of features a model must implement so it can be used with KTT's training and exporting facilities.
Unless you are implementing your model in a framework other than PyTorch or Scikit-learn, you need not bother with get_dataloader_func and get_metrics_func. The rest will be detailed below.
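As a rough orientation, a new model might look like the skeleton below. This is only a sketch: the method names come from this guide, but the exact signatures and import paths are illustrative, so check models.model.Model and the bundled models for the authoritative interface.

    from sklearn.linear_model import LogisticRegression

    from models.model import SklearnModel  # hypothetical import path


    class MyModel(SklearnModel):
        def __init__(self, config):
            self.config = config
            self.model = LogisticRegression()

        def fit(self, train_loader, val_loader=None, path=None, best_path=None):
            ...  # train, then write checkpoints to path/best_path (see Checkpointing)

        def save(self, path):
            ...  # serialise the model and its training state to path

        def load(self, path):
            ...  # restore a previously saved checkpoint into this instance

        @classmethod
        def from_checkpoint(cls, path):
            ...  # reconstruct a complete instance from a checkpoint alone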
Checkpointing
This part gives detail on how to implement save, load and Model.from_checkpoint.
If either path or best_path is specified for the fit method, a model's training session must produce at least one checkpoint. Checkpoints must contain enough data to fully replicate the model and its state, including the state of any training process up to that moment in time (for example, Adam optimiser state dictionaries). For instance, a PyTorch model checkpoint must contain not just state_dict fields but also sufficient metadata regarding the topology (number of layers, layer sizes and so on).
Model checkpoints (or pickled trained models) for a particular dataset must be saved to ./weights/<model_name>/<dataset_name>. For example, a model named abc trained against an intermediate dataset named data001 must save its weights in ./weights/abc/data001. This is to be implemented in the fit method, which will pass suitable paths to the save method of the same instance to produce checkpoints at those paths.
Checkpoint file names must abide by the following convention:
- Best-performing checkpoints (by your own metric, for example the smallest validation loss value): best_YYYY-MM-DDTHH:MM:SS.<extension>
- The last checkpoint (produced at the last epoch): last_YYYY-MM-DDTHH:MM:SS.<extension>
In other words, the checkpoint name is best or last, plus an ISO 8601 datetime string (truncated to seconds), then the file extension. Every checkpoint must be packaged within a single file. The exact extension depends on the format you choose to save your model's checkpoint in. All bundled models use Pickle to serialise their models and as such use the .pt extension.
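A small sketch of how a fit implementation might generate these names (assuming path and best_path are directories as described above; adapt this to your fit contract):

    import os
    from datetime import datetime


    def _checkpoint_name(prefix, extension="pt"):
        # ISO 8601 timestamp truncated to seconds, e.g. best_2022-04-01T12:34:56.pt
        stamp = datetime.now().isoformat(timespec="seconds")
        return f"{prefix}_{stamp}.{extension}"

    # Inside fit(), illustratively:
    #   if val_loss < best_val_loss:
    #       self.save(os.path.join(best_path, _checkpoint_name("best")))
    #   self.save(os.path.join(path, _checkpoint_name("last")))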
Note
Models that do not generate in-progress checkpoints (such as Scikit-learn models whose training process is a single blocking fit() call) can label their only checkpoint as either best or last. However, since the export script defaults to looking for best checkpoints, it is more convenient to use best: you can then call the export script for these models without specifying an additional option.
Preprocessing needs
Should your model require the dataset to be preprocessed in any way (for example, tokenisation for DistilBERT and stemming for simple term-based models), implement such logic by subclassing the BasePreprocessor class in utils/encoders/encoder.py:
- class utils.encoders.encoder.BasePreprocessor(config)
  A base class for your custom preprocessors.
  This base preprocessor does not do anything (passthrough). It is used by default when your model does not specify any preprocessor.
  Models may require specific preprocessing in the form of tokenisation, stemming, word removal and so on. Such preprocessing can be implemented by subclassing this class in your own model definition file.
  - __call__(text)
    Transform the given text into your preferred input format.
    Parameters: text (str) – the text to tokenise.
    Returns: inputs – all preprocessors must return a dictionary of model input fields. Do not use the key label, as it is reserved for use by the PyTorchDataset class, which will later write the labels corresponding to this text into the dictionary. By default, the BasePreprocessor returns the input unchanged under the text key.
    Return type: dict
  - __init__(config)
    General constructor.
With your logic implemented, instruct your model to use it by implementing the get_preprocessor method. This method returns an instance of your preprocessor class and will be called on every new dataset loading. That instance will be used to preprocess that dataset before its data is fed to your model.
If your model directly takes in raw text from the datasets, simply skip this method. Its default implementation in the Model class simply returns an instance of the above BasePreprocessor, which does nothing aside from putting your input text in a dictionary under the key text.
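A minimal sketch of a custom preprocessor (the class name and tokenisation logic are made up for illustration, and the get_preprocessor signature shown in the trailing comment may differ from the actual base class):

    from utils.encoders.encoder import BasePreprocessor


    class WhitespacePreprocessor(BasePreprocessor):
        """Hypothetical preprocessor: lowercase and collapse whitespace."""

        def __call__(self, text):
            # Must return a dict of model input fields.
            # Never use the 'label' key - it is reserved for PyTorchDataset.
            return {"text": " ".join(text.lower().split())}

    # In your model class:
    #     def get_preprocessor(self, config):
    #         return WhitespacePreprocessor(config)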
Exporting
This part gives detail on how to implement the two export_... methods and the BentoService.
KTT models should be able to export themselves into two formats:
- ONNX: Open Neural Network eXchange, suitable for deploying to existing ML services with an ONNX runtime. A model may be exported into one or more .onnx files.
- BentoML: not strictly a format per se, but rather a packaging of the model with the necessary resources to create a standalone REST API server based on the BentoML framework.
Your model should support both formats, but at the minimum it should support one of them (because what good is a model that can only be trained but not used?).
ONNX
Exporting to ONNX should be a simple process, as it does not involve the rest of the system much. In export_onnx, simply use the ONNX converter tool appropriate for your framework (for example, skl2onnx for sklearn and torch.onnx for PyTorch). Once you have the ONNX graphs, write them to the paths passed to this function. If your model should be exported as one ONNX graph, export to the classifier_path and leave encoder_path blank.
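For a PyTorch model, the body of export_onnx might look roughly like this (the signature and the input_dim attribute are illustrative assumptions, not the actual KTT interface):

    import torch


    def export_onnx(self, classifier_path, encoder_path=None):
        # Trace the underlying torch.nn.Module with a dummy input of the right shape.
        dummy_input = torch.randn(1, self.input_dim)  # input_dim is hypothetical
        torch.onnx.export(
            self.model,
            dummy_input,
            classifier_path,  # single-graph models use classifier_path only
            input_names=["input"],
            output_names=["scores"],
            dynamic_axes={"input": {0: "batch"}},
        )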
BentoML
Exporting into a BentoML service is more involved, and also gives you more decisions to make. There are two main approaches regarding the format you can export your model core into for your BentoService implementation to later use:
- Reuse the ONNX graphs. This is possible, but you might run into problems with BentoML's internal ONNX handling code not playing well with CUDA and TensorRT (at least in the current version). For non-GPU models, this is generally stable and could save you a bit of time (due to slightly better optimisation from the ONNX Runtime) and storage space (depending on your checkpoint format).
- Directly serialise your model and implement your BentoService's runner function to behave just like your test method. Depending on the framework and code structure, you might still achieve performance equal to the ONNX approach. This has the added bonus of allowing you to adapt the code from your test method into the BentoService, saving implementation time. It is also less fussy and more likely to work well for GPU-enabled models.
Tip
KTT internally uses BentoML 0.13.1 (the LTS branch). You can find specific instructions for your framework in their documentation.
The implementation for BentoML exporting is split into three parts: the svc_lts.py source file, the reference dataset's generation plus configuration files (optional), and the export_bento_resources method. The expected result after running the export script with this model is a new BentoService in the ./build folder with the following structure:
build
└── <model_name>_<dataset_name>
    ├── docker-compose.yaml
    ├── grafana
    │   └── provisioning
    │       ├── dashboards
    │       │   ├── dashboard.json
    │       │   └── dashboard.yml
    │       └── datasources
    │           └── datasource.yml
    ├── inference
    │   ├── bentoml-init.sh
    │   ├── bentoml.yml
    │   ├── (...)
    │   └── Dockerfile
    ├── monitoring
    │   ├── Dockerfile
    │   ├── evidently.yaml
    │   ├── monitoring.py
    │   ├── references.parquet
    │   └── requirements.txt
    └── prometheus
        └── prometheus.yaml
or this, if it was built without support for monitoring:
build
└── <model_name>_<dataset_name>
    └── inference
        ├── bentoml-init.sh
        ├── bentoml.yml
        ├── (...)
        └── Dockerfile
To keep things nice and tidy, we recommend that you create a subfolder within your model folder to store BentoML-specific files, just like the example layout shown in General model folder structure above.
The service implementation
The svc_lts.py file contains the definition of the BentoService for this model. The following is a rough description of what you need to implement in this file (a sketch follows after the note below):
- Define a subclass of bentoml.BentoService, preferably named just like your model class (as it will be the name used by the exported BentoService's internal files and folders).
- Based on what your model requires, define Artifacts for this class via the @bentoml.artifacts decorator. Artifacts are data objects needed for the service, such as the serialised model itself, a metadata JSON file, or some form of configuration.
- Define a predict method that accepts one of BentoML's InputAdapters (see here) and returns a single string, preferably a newline-separated list of class names in hierarchical order (example: Food\nPasta\nSpaghetti). This method contains the code needed to preprocess the data passed as an InputAdapter by the outer BentoServer, feed it into the model, then get the results out of the model and postprocess them into said string format. You can access the Artifacts from within this method (or any method within this class) via self.artifacts.<artifact name>. Remember to wrap this method in a @bentoml.api decorator.
- (optional) If you decide to implement monitoring capabilities for your model's BentoService, make the predict method send new data to the monitoring app on every request processed. The data to be sent is a JSON object containing a single 2D list in the field data. This 2D list has shape (microbatch, reference_cols), where microbatch is the number of microbatched rows in this request (1 if you do not enable microbatching) and reference_cols is the number of columns in your reference dataset (more on this later). As you might have guessed, this 2D array is basically a tabular view of a new section to be appended to a current dataset, which will be compared against said reference dataset. The monitoring service's hostname and port are exposed via the EVIDENTLY_HOST and EVIDENTLY_PORT environment variables. If these variables have not been set, default to localhost:5001.
Note
We chose to pass them via environment variables so as to give the BentoService more flexibility. If you run everything right on the host machine, the Evidently monitoring app can be reached at localhost:5001. However, when Dockerised and run in a docker-compose network, each container is on a different host instead of localhost. By making the BentoService read environment variables, our docker-compose.yaml file can pass the suitable hostname and port to it, allowing it to continue functioning normally in both cases.
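Putting these pieces together, a skeleton svc_lts.py might look like the following under the BentoML 0.13 LTS API. The service name, artifact names, helper method and monitoring endpoint path are all illustrative assumptions:

    import os

    import bentoml
    from bentoml.adapters import JsonInput
    from bentoml.service.artifacts.common import PickleArtifact


    @bentoml.env(infer_pip_packages=True)
    @bentoml.artifacts([PickleArtifact("model"), PickleArtifact("config")])
    class MyModelService(bentoml.BentoService):

        @bentoml.api(input=JsonInput(), batch=False)
        def predict(self, parsed_json):
            scores = self.artifacts.model.predict([parsed_json["text"]])
            # (optional) forward the new row to the monitoring app.
            host = os.getenv("EVIDENTLY_HOST", "localhost")
            port = os.getenv("EVIDENTLY_PORT", "5001")
            try:
                import requests
                # The endpoint path is an assumption - use whatever your
                # monitoring app expects.
                requests.post(f"http://{host}:{port}/iterate",
                              json={"data": [list(scores[0])]}, timeout=2)
            except Exception:
                pass  # monitoring failures should never break inference
            # Hypothetical postprocessing into 'Food\nPasta\nSpaghetti' form.
            return "\n".join(self.to_hierarchy(scores[0]))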
With the service implemented, we can move on to implementing logic for automatically generating the reference dataset. A reference dataset is any dataset that is representative of the data the model instance was trained on. Bundled models simply use the test subset as the reference dataset. However, due to requirements from Evidently (the model metrics framework used by KTT), the reference dataset needs to contain numerical features (for the data_drift report), with each feature given one column. In addition, it may also require the raw classification scores (the numerical results the model outputs) if you choose to specify cat_target_drift reports as part of the metrics to compute and track - again, each class gets its own scores column.
The service configuration files
If you chose not to include monitoring capabilities with this model, you may safely skip this part. If you do want to have them, however, then there are two configuration files to add to your model folder: evidently.yaml and dashboard.json.
The evidently.yaml configuration template follows Evidently's service configuration format. Technically, you can also use JSON or any other format, as you will be the one implementing the parsing code later. However, since the Evidently monitoring app itself uses YAML, it is best to just stick to that. There are a number of parameters that must be provided by the template. A sample templated configuration for DB-BHCN, which contains all of them, can be seen below:
service:
  reference_path: './references.parquet'
  min_reference_size: 30
  use_reference: true
  moving_reference: false
  window_size: 30
  calculation_period_sec: 60
  monitors:
    - cat_target_drift
    - data_drift
A simple explanation of these fields:
- reference_path is a relative path to the reference dataset, whose generation we will implement shortly. The dataset must be copied along with the monitoring app to the built service's folder. This path is relative to the monitoring app itself, which is the monitoring.py file within ./build/<service name>/monitoring. It is best to simply copy the reference dataset to the same folder as the monitoring app.
- min_reference_size specifies how many rows must be collected by the metrics app (from user inputs and scores forwarded by the inference service) for the metrics to start being computed. It should be at least window_size.
- moving_reference is set to false here, as we need a fixed reference set to analyse data drift.
- window_size specifies how many collected rows to use for comparison with the reference set. Here, the latest 30 rows are used. Smaller values reduce RAM usage and metrics computation time, while larger values may help capture wider shifts in trends.
- calculation_period_sec is in seconds and specifies how frequently metrics should be computed based on the latest window_size collected rows. We recommend setting it to a high value if you do not expect your production environment to change quickly, as this has a significant performance impact.
- monitors is a list of Evidently reports to compute. Each report in this context is a set of metrics, which can then be displayed in a Grafana dashboard. The following are available from our Evidently monitoring app:

      monitor_mapping = {
          "data_drift": DataDriftMonitor,
          "cat_target_drift": CatTargetDriftMonitor,
          "regression_performance": RegressionPerformanceMonitor,
          "classification_performance": ClassificationPerformanceMonitor,
          "prob_classification_performance": ProbClassificationPerformanceMonitor,
      }

  Detailed information for each of these reports can be found in the official Evidently documentation.
The second configuration file is the Grafana dashboard layout (dashboard.json). We do not, however, recommend creating this file by hand. Instead, wait until you can boot up your service and log into the Grafana instance, so you can create it interactively. More on that later in Grafana dashboard design (optional).
The reference dataset
To generate the reference dataset, you must implement an additional method, gen_reference_set(self, loader), which will be called by the training script after training and testing if the user specifies --reference or -r. This is quite similar to the test method in that it also runs the model over a dataset (passed as a 'loader' of your choice), but it also records the numerical features (for example, from a Tf-idf vectoriser, or a DistilBERT instance) along with the numerical classification scores. For reference, you may want to take a look at DB-BHCN's version.
The exact reference set schema depends on your choice of Evidently reports and also your model's design. DB-BHCN, for example, generates a reference dataset containing firstly a targets column (ground truths, using textual class names), then 24 average-pooled feature columns (from the 768 features produced by its DistilBERT encoder) named 0 to 23 (in string form), and finally classification score columns, one for each leaf-level class, with the column names being the string names of the classes themselves.
The resulting reference set must be in Parquet format (.parquet) and named similarly to the last checkpoint, with _reference added. For example, if the last checkpoint is named last_2022-04-01T12:34:56.pt, then the reference dataset must be named last_2022-04-01T12:34:56_reference.parquet.
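As a rough sketch (not DB-BHCN's actual code), a gen_reference_set implementation following the default schema might look like this; the loader interface and the helper attributes (encode, predict_scores, classes, last_checkpoint_path) are all assumptions:

    import pandas as pd


    def gen_reference_set(self, loader):
        rows = []
        for texts, labels in loader:                 # loader interface is hypothetical
            feats = self.encode(texts)               # e.g. pooled encoder features
            scores = self.predict_scores(feats)      # raw classification scores
            for label, f, s in zip(labels, feats, scores):
                row = {"targets": label}
                row.update({str(i): v for i, v in enumerate(f)})
                row.update(dict(zip(self.classes, s)))
                rows.append(row)
        # Name it after the last checkpoint, with _reference appended.
        out_path = self.last_checkpoint_path.replace(".pt", "_reference.parquet")
        pd.DataFrame(rows).to_parquet(out_path)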
The export_bento_resources method
In this method, you should do the following (a sketch follows after this list):
- Create a configuration dictionary (let's just call it config). Write a list of score column names to the prediction key. This is necessary to inform the to-be-bundled Evidently service of which columns to track in the reference dataset and the new data coming in from production. If you chose to follow the default reference dataset schema above, that would be a list of all leaf class names.
- Initialise a BentoService instance (that is, import the above svc_lts.py file as a module and construct the BentoService-based class within it). Pack all necessary resources into its artifacts:
  - PyTorch modules should be JIT-traced using torch.jit.trace before packing into a PytorchArtifact.
  - Configuration files or metadata should be packed as strings.
- Return that configuration dictionary and the BentoService instance.
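A condensed sketch, with the module path, service class name and model attributes as placeholders (the actual signature and return contract should be taken from the Model base class):

    import json

    import torch


    def export_bento_resources(self):
        # Score column names to track, per the default reference dataset schema.
        config = {"prediction": list(self.leaf_classes)}   # leaf_classes is hypothetical

        from models.my_model.bentoml import svc_lts        # hypothetical module path
        svc = svc_lts.MyModelService()

        traced = torch.jit.trace(self.model, self.example_input)  # PyTorch models only
        svc.pack("model", traced)
        svc.pack("config", json.dumps(config))             # metadata packed as a string
        return config, svc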
Specifying your hyperparameters (optional)
Some models might have tunable hyperparameters. KTT has facilities to automatically retrieve values from a ./hyperparameters.json file. Each model gets a JSON object with its own identifier as the key. You can add your model's hyperparameters to this file, and then tell KTT to load them in at the beginning of the training session to initialise your model.
You can either directly tune your model by modifying this file, or implement automatic hyperparameter tuning in your training script and use this file to supply starting values.
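For illustration, an entry for a model whose identifier is your_model might look like this (the parameter names are made up):

    {
        "your_model": {
            "learning_rate": 0.001,
            "epochs": 10,
            "hidden_size": 256
        }
    }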
Models without tunable hyperparameters can skip this step.
Registering your model with the rest of the system
Now that you have fully implemented your model, it is time to inform the training and exporting scripts of its existence and of how to run it.
The model lists
Edit ./models/__init__.py and add your model to it. There are three places to do so (see the sketch after this list):
- First, import your model class (the one subclassing Model, PyTorchModel or SklearnModel). This allows the training and exporting code to shorten the import path to just from models import YourModelClass instead of from models.your_model.your_model import YourModelClass. Refer to how the bundled models are imported to import your own.
- Then, add your model identifier (the model folder name) to the appropriate model list. Currently, there are PYTORCH_MODEL_LIST and SKLEARN_MODEL_LIST.
  Note
  TODO: Add instructions for implementing a model outside of these frameworks.
- Lastly, add your model class name (not the folder name) to __all__. If your model needs to expose special functions, add those to __all__ as well.
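The additions to ./models/__init__.py would then look roughly like this (bundled entries elided; YourModelClass and your_model are placeholders):

    # ./models/__init__.py
    from models.your_model.your_model import YourModelClass  # 1. import the class

    SKLEARN_MODEL_LIST = [
        # ...bundled Scikit-learn model identifiers...
        "your_model",                                        # 2. register the identifier
    ]

    __all__ = [
        # ...bundled model class names...
        "YourModelClass",                                    # 3. expose the class
    ]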
Test-run your model
If all goes to plan, you can now call train.py and test.py with your model just like any of the bundled models. Train it on a preprocessed dataset and check whether its checkpoints are in the correct format. Ensure that it can load, save and export smoothly.
If you implemented BentoService exporting, you can test-run the built service in two ways:
- Without monitoring capabilities: either run the inference service directly using bentoml serve, or run it as a Docker container using the supplied Dockerfile.

      cd ./build/<model_name>_<dataset_name>/inference
      bentoml serve ./
      # or use the production gunicorn server
      bentoml serve-gunicorn ./
      # or as a Docker container
      docker image build .
      docker run -p 5000:5000 <built image ID>

- With monitoring capabilities: fire up your entire service using the autogenerated docker-compose script:

      cd ./build/<model_name>_<dataset_name>
      docker-compose up
This will Dockerise the inference app and monitoring app (if not already), download and run Prometheus and Grafana, and configure them all to fit together nicely in a Docker network.
The following ports are exposed by the service:
- 5000: the inference service. POST inference requests to its /predict endpoint (see the example request after this list). The format of the request is whatever you decided on in your BentoService implementation. BentoML's server also returns API- and process-related metrics through the /metrics endpoint.
- 5001: the monitoring service. It returns data-related metrics through its /metrics endpoint. Only available in monitoring-enabled services.
- 9090: the Prometheus database control panel. Only available in monitoring-enabled services.
- 3000: the Grafana control panel. Only available in monitoring-enabled services.
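For instance, if your BentoService takes a JSON body with a text field (as in the earlier sketch - your actual input format may differ), a quick smoke test could be:

    curl -X POST -H "Content-Type: application/json" \
         -d '{"text": "A pack of spaghetti"}' \
         http://localhost:5000/predict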
Grafana dashboard design (optional)
If you designed your model with monitoring capabilities, now is the time to start designing your Grafana dashboard. Log into Grafana (by default exposed at localhost:3000) with the default credentials (admin for both username and password - remember to change them!). A Prometheus data source should already have been included, which connects to the Prometheus instance in your Docker network, which in turn regularly fetches metrics from the inference and monitoring services' /metrics endpoints.
You can now create your dashboard from this data source, using the metric names returned by the /metrics endpoints above. You can also import the JSON schema of an existing dashboard from a bundled model to learn how to display them.
Once everything is done and running, export your dashboard as a JSON file (remember to tick the external exporting option). Place the JSON in your model's folder (preferably following the standard folder structure at the beginning of this guide) and rename it if necessary.
Testing automatic dashboard provisioning
The expected starting state of a completed BentoService from KTT includes a fully provisioned Grafana instance. This means you should ensure that your Grafana dashboard is loaded in and running without any user intervention right from service startup. To facilitate this, ensure that your export method correctly passes the path to the dashboard JSON to the init_folder_structure function. This function then copies the JSON to ./build/<model_name>_<dataset_name>/grafana/provisioning/dashboards.
After you have finished designing your dashboard, exported it, placed the JSON in the correct location and specified the path in your export method implementation, you should repeat the whole exporting process. First, remove all previously-built service files. Then, remove all related Docker volumes using docker volume prune. Finally, export your service as usual and run docker-compose up to start it. Log into your Grafana dashboard again (the username and password should have reverted to the default credentials - if not, you have not fully cleared the previous service's data) and see if the dashboard is already there. If it is, congratulations! You now have a fully working model and BentoService exporting process!
Framework-specific guides
The above instructions only cover the parts that are common to all frameworks. See below for in-depth guides for each framework: