
File Enrichment (CSV, PDF)

Enriching tabular data (csv, dataframe)

The SDK can enrich tabular data directly through its integration with the pandas library. All enrichment operations accept a pandas.DataFrame and return one as well. The input should contain columns whose names match the expected input transaction attributes described above.

An example .csv file would be:

transaction_id,description,entry_type,amount,date,iso_currency_code,country_code,account_holder_type,account_holder_id
1234,TEST TRANSACTION,outgoing,123.4,2022-01-01,USD,USA,business,id-1234

To process this file, load it with pandas and pass the resulting DataFrame to the SDK. The output is also a pandas.DataFrame, which can be saved to .csv with its to_csv method:

import pandas as pd
from ntropy_sdk import SDK

tx_df = pd.read_csv("transactions.csv")  # one row per transaction
sdk = SDK("YOUR-API-KEY")

enriched_df = sdk.add_transactions(tx_df)  # output is also a DataFrame
enriched_df.to_csv("enriched.csv", index=False)  # skip the pandas row index
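Since the input columns must be named after the expected transaction attributes, a quick pre-flight check can catch a misnamed or missing column before any API call is made. The sketch below uses only the standard-library csv module. It assumes every column in the example header above is required (the SDK may treat some as optional), and missing_columns is a hypothetical helper, not part of ntropy_sdk.

```python
import csv
import io

# Assumption: every column in the example .csv header is required.
EXPECTED_COLUMNS = {
    "transaction_id", "description", "entry_type", "amount", "date",
    "iso_currency_code", "country_code", "account_holder_type",
    "account_holder_id",
}


def missing_columns(csv_text: str) -> list:
    """Hypothetical helper: return the expected column names that are absent
    from the header row of csv_text, in sorted order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])  # fieldnames is None for empty input
    return sorted(EXPECTED_COLUMNS - header)


sample = (
    "transaction_id,description,entry_type,amount,date,"
    "iso_currency_code,country_code,account_holder_type,account_holder_id\n"
    "1234,TEST TRANSACTION,outgoing,123.4,2022-01-01,USD,USA,business,id-1234\n"
)
print(missing_columns(sample))  # []
print(len(missing_columns("transaction_id,amount\n1,2\n")))  # 7
```

With a DataFrame already loaded, the equivalent check is `EXPECTED_COLUMNS - set(tx_df.columns)`.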

Enriching PDF files

The SDK also supports enriching PDF files of up to 200 MB. The file is submitted to our OCR pipeline and processed asynchronously.

Once processing finishes, the result is a table of the enriched transactions that were recognised by OCR.

An example code snippet is as follows:

from ntropy_sdk import SDK
sdk = SDK("YOUR-API-KEY")

with open('bank_statement.pdf', 'rb') as f:
    bsr = sdk.add_bank_statement(file=f)  # upload; OCR runs asynchronously

# do operations in the meantime
# ...

# block and wait for result
df = bsr.wait()
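Because uploads are limited to 200 MB, it can be worth validating a file locally before calling add_bank_statement. This is a minimal sketch, not SDK behaviour: check_pdf is a hypothetical helper, it reads "200 MB" as 200,000,000 bytes (the stricter decimal interpretation, which is an assumption), and it only checks for the %PDF- signature that PDF files start with, not whether the statement is readable by OCR.

```python
import os
import tempfile

# Assumption: "200 MB" is read as 200,000,000 bytes (decimal megabytes),
# the stricter of the two common interpretations.
MAX_PDF_BYTES = 200 * 1000 * 1000


def check_pdf(path: str, max_bytes: int = MAX_PDF_BYTES) -> None:
    """Hypothetical pre-upload check: raise ValueError if the file is too
    large or does not start with the PDF signature."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; the limit is {max_bytes}")
    with open(path, "rb") as f:
        # Well-formed PDF files begin with the header "%PDF-".
        if f.read(5) != b"%PDF-":
            raise ValueError(f"{path} does not start with a PDF header")


# Demo on a throwaway file containing only a PDF header line.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "bank_statement.pdf")
    with open(sample_path, "wb") as f:
        f.write(b"%PDF-1.7\n")
    check_pdf(sample_path)  # passes: small file with a PDF header
```

In practice you would call check_pdf('bank_statement.pdf') before opening the file for add_bank_statement and treat a ValueError as a reason not to upload.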

You may also specify the account holder with which the extracted transactions will be associated, instead of relying on default values. This works the same way as associating account holders with transactions: the account holder is created implicitly¹:

from ntropy_sdk import SDK, AccountHolderType
sdk = SDK("YOUR-API-KEY")

# Creates the account holder with the default account_type `business`
with open('bank_statement.pdf', 'rb') as f:
    bsr = sdk.add_bank_statement(file=f,
                                 account_holder_id="7b6ce4f1-004a-40f0-a480-562a831809ef")

# Alternatively, use `consumer` as the account holder type
with open('bank_statement.pdf', 'rb') as f:
    bsr = sdk.add_bank_statement(file=f,
                                 account_holder_id="7b6ce4f1-004a-40f0-a480-562a831809ef",
                                 account_type=AccountHolderType.consumer)
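The examples above pass a fixed UUID as account_holder_id. If you upload several statements for the same customer, one option (an assumption about your workflow, not something the SDK requires) is to derive the ID deterministically from your own customer reference with a version-5 UUID, so every upload for that customer passes the same account_holder_id. The helper account_holder_id_for and the example.com namespace are hypothetical.

```python
import uuid

# Hypothetical namespace derived from a domain you control; any fixed UUID works,
# but it must never change or previously derived IDs will no longer match.
ACCOUNT_HOLDER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def account_holder_id_for(customer_ref: str) -> str:
    """Map an internal customer reference to a stable UUID string."""
    return str(uuid.uuid5(ACCOUNT_HOLDER_NAMESPACE, customer_ref))


first = account_holder_id_for("customer-42")
print(first == account_holder_id_for("customer-42"))  # True
print(first == account_holder_id_for("customer-43"))  # False
```

The result would then be passed as `account_holder_id=account_holder_id_for("customer-42")`. UUID5 is a hash of the namespace and name, so the same reference always yields the same ID without storing a mapping.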

Footnotes

  1. You can also create account holders explicitly via the API.