
API Tutorial

This is a tutorial-style introduction to the API - jump straight to the reference if you're feeling lucky.

All data is grouped into sources; the individual pieces of data are called verbatims. A source should correspond to the origin of the data, such as a single mailbox or a particular feedback channel. Sources can be combined for the purposes of a single inference model, so if you're in any doubt, err on the side of multiple distinct sources rather than a single monolith.

A dataset is a combination of sources together with the associated label categories. For instance, one dataset may be built on a website feedback source, with labels like Ease of Use or Available Information, while a different dataset could be based on various post-purchase survey response sources and apply completely different labels, such as Packaging or Speed of Delivery.

So before adding any comments, you need to create a source to put them in.

Create a source example

curl -X PUT 'https://<my_api_endpoint>/api/v1/sources/<project>/example' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "source": {
        "description": "An optional long form description.",
        "title": "An Example Source"
      }
    }'

To create a source you need four things:

  1. A project. This is an existing project you are a part of.
  2. A name. Alphanumeric characters, hyphens and underscores are all OK (e.g. 'post-purchase').
  3. A title. A nice, short human-readable title for your source to display in the UI (e.g. 'Post Purchase Survey Responses').
  4. A description. Optionally, a longer form description of the source to show on the sources overview page.

The first two form the 'fully qualified' name of your source, which is used to refer to it programmatically. The latter two are meant for human consumption in the UI.

Go ahead and create an example source.

You should now be the proud owner of a source! Check out your sources page, then come back.

List sources example

Let's programmatically retrieve the same information that is available on the sources page: all metadata for all sources. You should see your source in the response.

curl -X GET 'https://<my_api_endpoint>/api/v1/sources' \
    -H "Authorization: Bearer $REINFER_TOKEN"

If you only want the sources belonging to a specific project, you can add its name to the endpoint.
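
For example, listing only the sources in your project should look like this (using the same <project> placeholder as above):

curl -X GET 'https://<my_api_endpoint>/api/v1/sources/<project>' \
    -H "Authorization: Bearer $REINFER_TOKEN"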

Delete a source example

Deleting a source irretrievably destroys all verbatims and any other information associated with it. Any datasets which use this source will also lose the training data supplied by any labels which have been added to verbatims in this source, so this endpoint should be used with caution. That said, it should be safe to delete the source we created for your project in the previous section.

curl -X DELETE 'https://<my_api_endpoint>/api/v1/sources/id:22f0f76e82fd8867' \
    -H "Authorization: Bearer $REINFER_TOKEN"

The response should be {"status": "ok"}. To be sure it's gone, you can request all sources again.

curl -X GET 'https://<my_api_endpoint>/api/v1/sources' \
    -H "Authorization: Bearer $REINFER_TOKEN"

Add comments example

Sources would be useless without the comments that go in them. A comment in Re:infer is either an individual piece of text, or multiple text items that are combined into a conversation. Examples of the former include survey responses, support tickets, and customer reviews, while examples of the latter include email chains.

We will go ahead and add a couple of comments to the 'example' source created in the previous section.

Adding emails

curl -X POST 'https://<my_api_endpoint>/api/v1/sources/<project>/example/sync' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "comments": [
        {
          "id": "0123456789abcdef",
          "messages": [
            {
              "body": {
                "text": "Hi Bob,\n\nCould you send me today'"'"'s figures?\n\nThanks,\nAlice"
              },
              "from": "alice@company.com",
              "sent_at": "2011-12-11T11:02:03.000000+00:00",
              "to": [
                "bob@organisation.org"
              ]
            },
            {
              "body": {
                "text": "Alice,\n\nHere are the figures for today.\n\nRegards,\nBob"
              },
              "from": "bob@organisation.org",
              "sent_at": "2011-12-11T11:05:10.000000+00:00",
              "to": [
                "alice@company.com"
              ]
            },
            {
              "body": {
                "text": "Hi Bob,\n\nI think these are the wrong numbers - could you check?\n\nThanks again,\nAlice"
              },
              "from": "alice@company.com",
              "sent_at": "2011-12-11T11:18:43.000000+00:00",
              "to": [
                "bob@organisation.org"
              ]
            }
          ],
          "timestamp": "2011-12-11T01:02:03.000000+00:00"
        },
        {
          "id": "abcdef0123456789",
          "messages": [
            {
              "body": {
                "text": "All,\n\nJust to let you know that processing is running late today.\n\nRegards,\nBob"
              },
              "from": "bob@organisation.org",
              "sent_at": "2011-12-12T10:04:30.000000+00:00",
              "to": [
                "alice@company.com",
                "carol@company.com"
              ]
            },
            {
              "body": {
                "text": "Hi Bob,\n\nCould you estimate when you'"'"'ll be finished?\n\nThanks,\nCarol"
              },
              "from": "carol@company.com",
              "sent_at": "2011-12-12T10:06:22.000000+00:00",
              "to": [
                "alice@company.com",
                "bob@organisation.org"
              ]
            },
            {
              "body": {
                "text": "Carol,\n\nWe should be done by 12pm. Sorry about the delay.\n\nBest,\nBob"
              },
              "from": "bob@organisation.org",
              "sent_at": "2011-12-12T10:09:40.000000+00:00",
              "to": [
                "alice@company.com",
                "carol@company.com"
              ]
            }
          ],
          "timestamp": "2011-12-11T02:03:04.000000+00:00",
          "user_properties": {
            "number:severity": 3,
            "string:Recipient Domain": "company.com",
            "string:Sender Domain": "organisation.org"
          }
        }
      ]
    }'

This example shows how to add a comment that consists of multiple messages. This is most commonly used for adding emails.

The fields used in the requests in the accompanying code should be self-explanatory. The only required fields are id, timestamp, and messages.body.text. You can learn more about available fields in the Comment Reference.
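
As a minimal sketch (with illustrative ID and timestamp values), a request body containing only the required fields could look like this:

{
  "comments": [
    {
      "id": "abc123",
      "timestamp": "2011-12-13T09:00:00.000000+00:00",
      "messages": [
        {
          "body": {
            "text": "A single short comment."
          }
        }
      ]
    }
  ]
}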

The ID field should be a hexadecimal number, unique amongst comments, of at most 256 digits. It is otherwise left to the user of the API to choose, allowing easier integration with other systems. If your IDs are not hexadecimal, you can convert them. If you want to additionally retain the original IDs, you can put them into the user_properties field that holds arbitrary user-defined metadata.
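
For instance, a non-hexadecimal ID can be hex-encoded in the shell. This sketch assumes xxd is available, and 'TICKET-42' is a hypothetical original ID:

# Hex-encode an arbitrary external ID to use as the comment ID
printf '%s' 'TICKET-42' | xxd -p
# 5449434b45542d3432
# Optionally keep the original in user_properties, e.g. "string:Original ID": "TICKET-42"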

The timestamp should be in UTC and refer to the time when the comment was recorded (e.g. the survey was responded to), not the current time.
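
For example, a known recording time can be converted into the expected ISO 8601 UTC format (this sketch assumes GNU date):

# Format the time the comment was recorded, in UTC
date -u -d '11 Dec 2011 11:02:03 UTC' +'%Y-%m-%dT%H:%M:%S.000000+00:00'
# 2011-12-11T11:02:03.000000+00:00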

The response should confirm that two new comments have been created.

Adding single-message comments

curl -X POST 'https://<my_api_endpoint>/api/v1/sources/<project>/example/sync' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "comments": [
        {
          "id": "fedcba098765",
          "messages": [
            {
              "body": {
                "text": "I was impressed with the speed of your service, but the price is quite high.",
                "translated_from": "J'"'"'ai \u00e9t\u00e9 impressionn\u00e9 par la rapidit\u00e9 de votre service, mais le prix est assez \u00e9lev\u00e9."
              },
              "language": "fr"
            }
          ],
          "timestamp": "2011-12-12T20:00:00.000000+00:00"
        }
      ]
    }'

This example shows how to add a comment that contains a single message. This format can suit data such as survey responses, customer reviews, etc.

The required and available fields are the same as in the emails example, the only difference being that the messages field should contain a single entry. You can skip email-specific fields that don't fit your data, as they are not required.

The response should confirm that one new comment has been created.

Retrieve comments example

Once added, a comment may be retrieved by its ID. You should see the comment added in the previous section.

curl -X GET 'https://<my_api_endpoint>/api/v1/sources/<project>/example/comments/0123456789abcdef' \
    -H "Authorization: Bearer $REINFER_TOKEN"

Create a dataset example

Having successfully added some raw data to Re:infer, we can now start to add datasets. A dataset corresponds to a taxonomy of labels, along with the training data supplied by applying those labels to the verbatims in a set of selected sources. You can create many datasets which refer to the same source(s): labelling verbatims using the taxonomy of one dataset has no impact on the other datasets (or on the underlying sources), allowing different teams to use Re:infer to gather insights independently.

curl -X PUT 'https://<my_api_endpoint>/api/v1/datasets/<project>/my-dataset' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "dataset": {
        "description": "An optional long form description.",
        "source_ids": [
          "22f0f76e82fd8867"
        ],
        "title": "An Example Dataset"
      }
    }'

Once sources have been created, appropriately-permissioned users can also create datasets in the UI, which may be more convenient.

List datasets example

curl -X GET 'https://<my_api_endpoint>/api/v1/datasets/<project>/my-dataset' \
    -H "Authorization: Bearer $REINFER_TOKEN"

Like sources, datasets have several GET routes corresponding to:

  • all the datasets the user has access to;
  • datasets belonging to the specified project;
  • a single dataset specified by project and name.

The example above shows the last of these in action.
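
The other two routes follow the same pattern as the source-listing endpoints above; as a sketch, assuming the dataset routes mirror them:

# All datasets the user has access to
curl -X GET 'https://<my_api_endpoint>/api/v1/datasets' \
    -H "Authorization: Bearer $REINFER_TOKEN"

# Datasets belonging to a specific project
curl -X GET 'https://<my_api_endpoint>/api/v1/datasets/<project>' \
    -H "Authorization: Bearer $REINFER_TOKEN"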

Update a dataset example

All of the permissible fields used to create a dataset can be updated, with the exception of has_sentiment, which is fixed for a given dataset.

curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<project>/my-dataset' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "dataset": {
        "description": "An updated description."
      }
    }'

Delete a dataset example

Deleting a dataset will completely remove the associated taxonomy as well as all of the labels which have been applied to its sources. You will no longer be able to get predictions based on this taxonomy and would have to start the training process of labelling verbatims from the beginning in order to reverse this operation, so use it with care.

curl -X DELETE 'https://<my_api_endpoint>/api/v1/datasets/<project>/my-dataset' \
    -H "Authorization: Bearer $REINFER_TOKEN"

Get predictions from a pinned model example

curl -X POST 'https://<my_api_endpoint>/api/v1/datasets/<project>/<dataset>/labellers/<model_version>/predict' \
    -H "Authorization: Bearer $REINFER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "documents": [
        {
          "messages": [
            {
              "body": {
                "text": "Hi Bob, has my trade settled yet? Thanks, Alice"
              },
              "from": "alice@company.com",
              "sent_at": "2011-12-11T11:02:03.000000+00:00",
              "subject": {
                "text": "Trade Ref: 8726387 Settlement"
              },
              "to": [
                "bob@organisation.org"
              ]
            }
          ],
          "user_properties": {
            "number:Deal Value": 12000,
            "string:City": "London"
          }
        },
        {
          "messages": [
            {
              "body": {
                "text": "All, just to let you know that processing is running late today. Regards, Bob"
              },
              "from": "bob@organisation.org",
              "sent_at": "2011-12-12T10:04:30.000000+00:00",
              "subject": {
                "text": "Trade Processing Delay"
              },
              "to": [
                "alice@company.com",
                "carol@company.com"
              ]
            }
          ],
          "user_properties": {
            "number:Deal Value": 4.9,
            "string:City": "Luton"
          }
        }
      ],
      "labels": [
        {
          "name": [
            "Trade",
            "Settlement"
          ],
          "threshold": 0.8
        },
        {
          "name": [
            "Delay"
          ],
          "threshold": 0.75
        }
      ],
      "threshold": 0
    }'

Once you have a trained model, you can use it to predict labels for other pieces of data. To do this, you need to provide the following:

  1. Documents: This is an array of message data that the model will predict labels for; each document can contain only one verbatim, along with any optional properties. For optimal model performance, the data provided needs to be consistent with the data and format that was labelled on the platform, as the model takes all available data and metadata into consideration. For example, emails should include the subject and from/bcc/cc fields, etc. (if these were present in the training data). Additionally, user properties present in the training dataset should also be included in the API request body.
  2. Labels: This is an array of the labels the model has been trained on that you want the model to predict in the data provided. Additionally, a confidence threshold to filter by should be provided for each label. The optimal threshold can be decided based on your precision vs. recall trade-off. Further information on how to choose a threshold can be found in the user guide, under the "Using Validation" section.
  3. Default threshold (optional): This is a default threshold value that will be applied across all labels provided. Please note that if default and per-label thresholds are provided together in a request, the per-label thresholds will override the default threshold. As a best practice, default thresholds can be used for testing or exploring data. For optimal results when using predictions for automated decision making, it is highly recommended to use per-label thresholds.

Note: A hierarchical label will be formatted as a list of labels. For instance, the label "Trade > Settlements" will have the format ["Trade", "Settlements"] in the request.

Within the API URL, it is important to pass in the following arguments (a filled-in example follows the list):

  1. Project name: This is an existing project you are a part of.
  2. Dataset name: This is a dataset the model has been trained on.
  3. Model version: The model version is a number that can be found on the "Models" page for your chosen dataset.
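
Put together, a predict request against the example dataset pinned to model version 1 (an illustrative version number) would be sent to a URL along these lines:

https://<my_api_endpoint>/api/v1/datasets/<project>/my-dataset/labellers/1/predict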

Understanding the Response

Because a specific model version is pinned, the same request will always return the same results, even if the model is trained further. Once you have validated the results of a new model version and would like to submit requests against it, you should update the model version in your request and also update the label thresholds to fit the new model. For every new model version you will have to iterate through these steps again.

By default, the response will provide, for each verbatim, a list of predicted labels whose confidence is greater than the threshold levels provided.

However, the response of a request can vary if entity recognition and sentiments are enabled for your model:

  1. Entities Enabled. The response will also provide a list of entities that have been identified for each label (first response example).
  2. Sentiments Enabled. The response will also provide a sentiment score, between -1 (perfectly negative) and 1 (perfectly positive), for every label object classified above the confidence threshold (second response example).

{
  "model": { "time": "2018-12-20T15:05:43.906000Z", "version": "1" },
  "predictions": [
    [
      {
        "name": ["Trade", "Settlement"],
        "probability": 0.86687008142471313,
        "sentiment": 0.8762539502232571
      }
    ],
    [
      {
        "name": ["Delay"],
        "probability": 0.26687008142471313,
        "sentiment": 0.8762539502232571
      }
    ]
  ],
  "status": "ok"
}