
Custom Models

Using the API or SDK, you can classify your transactions with a set of custom labels. You provide a set of labeled transactions, which are then used to train a custom model. The custom model builds on Ntropy's advanced base models and is customized and fine-tuned on the labeled data you provide.

Usage

A model is identified by a unique name that can be re-used (overwriting the previous model with that name). Each user can have up to 10 unique models simultaneously. After a model is trained it will be automatically kept up-to-date with the Ntropy system to benefit from improvements to our base models.

Caution

If you train a model with the same name as a previously trained model, the system will overwrite the existing model and you will lose access to the previous model.

Train

The training of a custom model requires a set of input transactions with the expected attributes (as described in Enrichment) and one additional label attribute containing the ground-truth label that the model should learn for that transaction. You can consult the API reference for more details.

The models are fine-tuned on a combination of the description, entry_type, amount and iso_currency_code fields. Providing a training dataset with a good diversity of these fields for each class will improve the quality of the final model.
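Before training, it can help to see how varied these fields are within each label. The following is a minimal sketch, not part of the Ntropy SDK: `field_diversity` is a hypothetical helper that assumes each transaction is a plain dict shaped like the JSON objects in the training request.

```python
def field_diversity(transactions):
    """Summarize, per label, how varied the fine-tuning fields are.

    Hypothetical helper: expects dicts with the keys label, description,
    entry_type, amount and iso_currency_code.
    """
    summary = {}
    for tx in transactions:
        stats = summary.setdefault(tx["label"], {
            "count": 0,
            "descriptions": set(),
            "entry_types": set(),
            "currencies": set(),
            "min_amount": float("inf"),
            "max_amount": float("-inf"),
        })
        stats["count"] += 1
        stats["descriptions"].add(tx["description"])
        stats["entry_types"].add(tx["entry_type"])
        stats["currencies"].add(tx["iso_currency_code"])
        stats["min_amount"] = min(stats["min_amount"], tx["amount"])
        stats["max_amount"] = max(stats["max_amount"], tx["amount"])
    return summary
```

A label whose transactions all share one description, or cover only a narrow amount range, is a candidate for collecting more varied examples.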

With the labeled data, training a new model takes a single API or SDK call, as shown below:

$ curl \
-H "X-API-KEY: <YOUR-API-KEY>" \
-H "Content-Type: application/json" \
-X POST \
--data '{
"transactions": [
{
"description": "AMAZON WEB SERVICES AWS.AMAZON.CO WA Ref5543286P25S Crd15",
"entry_type": "outgoing",
"amount": 12042.37,
"iso_currency_code": "USD",
"date": "2021-11-01",
"transaction_id": "4yp49x3tbj9mD8DB4fM8DDY6Yxbx8YP14g565Xketw3tFmn",
"country": "US",
"account_holder_id": "id-1",
"account_holder_type": "business",
"label": "cloud"
},
{
"description": "Purchase Return 10/22 Apple.Com/US CA Card 5233",
"entry_type": "incoming",
"amount": 150.94,
"iso_currency_code": "USD",
"date": "2021-11-02",
"transaction_id": "tw3tFmn4yp49x3tbj9mD8DB4fM8DDY6Yxbx8YP14g565Xke",
"country": "US",
"account_holder_type": "business",
"label": "returns"
},
...
]
}' \
https://api.ntropy.com/v2/models/my-model-name

Model training is asynchronous. You can check the status of the submitted model, and repeat the check until training completes, as follows:

$ curl \
-H "X-API-KEY: <YOUR-API-KEY>" \
-H "Content-Type: application/json" \
-X GET \
https://api.ntropy.com/v2/models/my-model-name

When the returned model status is ready, the model can be used for classification.
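Waiting for a model can be sketched as a simple polling loop around the status request above. This is an illustration under assumptions, not the SDK's implementation: `wait_until_ready` and `fetch_model_status` are hypothetical names, and the `status` key in the response body is assumed from the text above rather than confirmed against the API reference.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_model_status(api_key: str, model_name: str) -> str:
    """GET the model and return its status string."""
    req = urllib.request.Request(
        f"https://api.ntropy.com/v2/models/{model_name}",
        headers={"X-API-KEY": api_key},
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["status"]  # assumption: the status is returned under this key


def wait_until_ready(
    fetch_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it returns "ready" or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status == "ready":
            return status
        if waited >= timeout:
            raise TimeoutError(f"model still '{status}' after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

A call would look like `wait_until_ready(lambda: fetch_model_status("<YOUR-API-KEY>", "my-model-name"))`. If the API reports a failure status, that case would need its own handling; the sketch only distinguishes ready from not ready.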

Evaluation

Evaluating a trained model is simple if you're using the SDK. The model instance returned by sdk.train_custom_model or sdk.get_custom_model can be used to calculate a number of metrics given a list of LabeledTransactions to be used as a test set:


from ntropy_sdk import SDK, LabeledTransaction

sdk = SDK("<YOUR-API-KEY>")

# load_transactions stands in for your own loader; it returns two lists of LabeledTransaction objects
train_txs, test_txs = load_transactions(...)

model = sdk.train_custom_model(train_txs, "my-model-name")
res = model.eval(test_txs)

print(f"accuracy = {res.accuracy()}")

You can find more information on how each metric is calculated in the SDK reference for the eval() method.
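The example above leaves open how the train and test lists are produced. One common approach is a stratified split, which holds out the same fraction of every label. The sketch below is an assumption-labeled illustration: `stratified_split` is a hypothetical helper, not an SDK function, and `label_of` is whatever accessor fits your data (for instance `lambda tx: tx["label"]` for dicts).

```python
import random
from collections import defaultdict


def stratified_split(items, label_of, test_fraction=0.2, seed=0):
    """Split items into (train, test), holding out test_fraction of each label."""
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label, key=str):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one item per label, unless the label has only one.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Keep in mind that the training portion must still satisfy the per-category minimum listed under Constraints after the test items are held out.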

Classify

To obtain the labels of a transaction with a trained custom model you must run enrichment on the transaction while providing the model name in the model-name query parameter of the /v2/transactions/sync or /v2/transactions/async API calls (or model_name argument in SDK calls). See the example below:

$ curl \
-H "X-API-KEY: <YOUR-API-KEY>" \
-H "Content-Type: application/json" \
-X POST \
--data '[
...
]' \
"https://api.ntropy.com/v2/transactions/sync?model-name=my-model-name"

In this case, the returned EnrichedTransaction list will contain labels obtained using the custom trained model.
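For readers working in Python without the SDK, the same request can be assembled with the standard library. This is a minimal sketch that mirrors the curl call: `build_classify_request` is a hypothetical helper, and it assumes the request body is a JSON array of transactions, as the example above suggests.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.ntropy.com"


def build_classify_request(api_key, model_name, transactions):
    """Build (but do not send) a sync enrichment request for a custom model."""
    # urlencode escapes the model name so it is safe inside the query string.
    query = urllib.parse.urlencode({"model-name": model_name})
    return urllib.request.Request(
        f"{BASE_URL}/v2/transactions/sync?{query}",
        data=json.dumps(transactions).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. Building the request separately keeps the URL and header construction easy to inspect and test.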

Constraints

Note that there are currently some constraints to the training process:

  • The maximum number of transactions to train a model is 50,000
  • The minimum number of categories (labels) to train a model is 2
  • The minimum number of transactions per category is 16
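These limits can be checked locally before submitting a training request. The following sketch encodes the three constraints listed above; `check_training_set` is a hypothetical helper, and the API may enforce further rules not covered here.

```python
from collections import Counter

MAX_TRANSACTIONS = 50_000  # maximum transactions per training request
MIN_LABELS = 2             # minimum number of distinct categories
MIN_PER_LABEL = 16         # minimum transactions per category


def check_training_set(labels):
    """Return a list of constraint violations for the given labels (empty if none)."""
    counts = Counter(labels)
    total = sum(counts.values())
    problems = []
    if total > MAX_TRANSACTIONS:
        problems.append(f"{total} transactions exceeds the maximum of {MAX_TRANSACTIONS}")
    if len(counts) < MIN_LABELS:
        problems.append(f"{len(counts)} distinct label(s); at least {MIN_LABELS} required")
    for label, n in sorted(counts.items()):
        if n < MIN_PER_LABEL:
            problems.append(f"label '{label}' has {n} transactions; at least {MIN_PER_LABEL} required")
    return problems
```

For a list of transaction dicts, the input would be `[tx["label"] for tx in transactions]`.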