Exchange Integration

Emails can be synced into Re:infer via the Re:infer Exchange integration or via the Re:infer API. The Exchange integration provides a convenient, easy-to-set-up way to sync your development and production email data into Re:infer in real time. If you have an existing email extraction pipeline and would like to sync enriched emails into Re:infer instead, we recommend that you use the Re:infer API.

Overview

The Re:infer Exchange integration is deployed and managed by Re:infer. Once deployed, it will continuously poll your Exchange server for new emails and push them into Re:infer, where they are cleaned, enriched (using email metadata), and converted into Re:infer comment objects. Finally, Re:infer applies predictions to the comment objects and extracts entities. The emails and their predictions/entities can then be accessed by users on the Re:infer web platform, and by applications or bots via the Re:infer API.

[Figure: Exchange Integration Architecture]

Compatibility

The Exchange integration is compatible with Exchange Online, and with Microsoft Exchange Server 2010-2019 using Exchange Web Services (EWS).

Security

The Exchange integration polls the Exchange server by making authenticated GET requests over HTTPS. It receives data only through the GET requests it initiates, and does not accept inbound connections initiated elsewhere. The Exchange integration can be configured to use specific ciphers.

Buckets

The Exchange integration syncs raw email data into Re:infer buckets. Like other Re:infer resources, a bucket is created in a project, which allows you to control access to the bucket. In order to read from a bucket, upload to a bucket, or manage buckets, the user needs the respective permissions in the project the bucket is in.

Sources

The raw data in a bucket can be made available to Re:infer users by linking the bucket to a source. When the bucket is linked to a source, Re:infer will preprocess the raw emails by generating metadata from email headers and extracting signatures and quoted emails from the email body. The preprocessed emails are then converted into Re:infer comment objects, and are available to users from the Re:infer UI and API as usual. Note that the user does not require Bucket permissions in order to read or modify data in a linked source, but does require Source permissions.

Managed Deployment

In order for the managed Re:infer Exchange integration to access your mailboxes, you should set up a service account. The Exchange integration requires read (but not write) access to the mailboxes.

Exchange Online

To add permissions to your OAuth app, search for 'Office 365 Exchange' in the permission search bar, then select 'EWS.AccessAsUser.All'. Then set up a service user and grant it read access to the mailboxes you want to sync.

Please provide us with the following details:

  • Your account's tenant ID (directory ID)
  • The application's client_id
  • The application's client_secret or client_certificate
  • The service user's username and password
  • List of mailboxes to be synced

Self-hosted Exchange Server

Please set up an NTLM service account and grant the service user read access to the mailboxes you want to sync.

Please provide us with the following details:

  • Exchange server URL
  • Username and password of the service user
  • Access type (Delegate or Impersonation)
  • List of mailboxes to be synced

Self-hosted Deployment

The EWS appliance is delivered as a Docker image. The sections below explain how to configure and deploy the appliance.

If you have a large number of mailboxes, you may want to run multiple instances of the EWS appliance, with each instance handling a subset of mailboxes. Please refer to the sharding documentation for a detailed explanation.

Appliance Configuration

The appliance expects a JSON config file to be present. This section explains the contents of the file. Refer to the next section for instructions on how to make the config file available to the appliance.

To start, copy the example below.

{
  "host": "https://exchange-server.example.com",
  "port": 443,
  "auth_type": "ntlm",
  "auth_user": "reinfer-ews-service-user",
  "access_type": "delegate",
  "mailboxes": {
    "abc@example.com": {
      "bucket": {
        "owner": "project-name",
        "name": "bucket-name"
      },
      "start_from": "bucket",
      "start_timestamp": "2020-01-01T00:00:00+00:00"
    },
    "xyz@example.com": {
      "bucket": {
        "owner": "project-name",
        "name": "bucket-name"
      },
      "start_from": "bucket",
      "start_timestamp": "2020-01-01T00:00:00+00:00"
    }
  }
}

First, replace the dummy values in host, port, auth_user, and access_type with their real values. See the configuration reference for a description of these parameters and their allowed values.

The only thing the appliance still needs in order to connect to the Exchange server is the password. Rather than storing the password in the config file in plain text, we will provide it to the appliance through the REINFER_EWS_AUTH_PASS environment variable, as described in the next section. You can also set the following environment variables to override values in the config:

Name                     Description
REINFER_EWS_AUTH_USER    Exchange server user
REINFER_EWS_AUTH_PASS    Exchange server password
REINFER_EWS_ACCESS_TYPE  Access type: "delegate" or "impersonation"
REINFER_EWS_HOST         Exchange server host
REINFER_EWS_PORT         Exchange server port
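
The relationship between these environment variables and the config file can be sketched as follows. This is a minimal illustration and not the appliance's actual code: the function name apply_env_overrides is hypothetical, and the sketch assumes that a set environment variable simply replaces the corresponding config key. The variable-to-key mapping follows the configuration reference.

```python
import os

# Environment variable -> config key it overrides (per the configuration reference).
ENV_OVERRIDES = {
    "REINFER_EWS_AUTH_USER": "auth_user",
    "REINFER_EWS_AUTH_PASS": "auth_password",
    "REINFER_EWS_ACCESS_TYPE": "access_type",
    "REINFER_EWS_HOST": "host",
    "REINFER_EWS_PORT": "port",
}


def apply_env_overrides(config, environ=None):
    """Return a copy of config with values replaced by any set environment variables."""
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for env_name, key in ENV_OVERRIDES.items():
        value = environ.get(env_name)
        if value is None:
            continue
        # Assumption: the port is kept as an integer, as in the example config.
        merged[key] = int(value) if key == "port" else value
    return merged


# Example with a fake environment, so nothing is read from the real one.
config = {"host": "https://exchange-server.example.com", "port": 443}
fake_env = {"REINFER_EWS_AUTH_PASS": "example-password", "REINFER_EWS_PORT": "8443"}
merged = apply_env_overrides(config, fake_env)
print(merged["port"], "auth_password" in merged)
```

Passing the environment in as an argument keeps the sketch easy to test and avoids touching real credentials.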

Finally, replace the dummy values in mailboxes with their real values. You can specify one or more mailboxes. For each mailbox, you have to provide the mailbox address and specify the following parameters:

Name             Description
bucket.owner     Project of the bucket in which the mailbox should be synced.
bucket.name      Name of the bucket in which the mailbox should be synced.
start_from       Whether to start from last synced time ("bucket") or ignore last synced time and always start from start_timestamp ("config"). Should be set to "bucket" for normal operation, but "config" can be useful in some cases when debugging.
start_timestamp  Timestamp from which to start syncing email. If not set, all emails will be synced.
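
The interaction between start_from and start_timestamp can be sketched as below. This is only an illustration of the rules in the table, not the appliance's implementation. The function name resolve_sync_start and the last_synced argument are hypothetical, and falling back to start_timestamp when a bucket has no sync history yet is an assumption.

```python
from datetime import datetime, timezone


def resolve_sync_start(start_from, start_timestamp, last_synced):
    """Pick the timestamp a mailbox sync would start from.

    Returns None when all emails should be synced.
    """
    configured = datetime.fromisoformat(start_timestamp) if start_timestamp else None
    if start_from == "config":
        # Ignore the last synced time and always use the configured timestamp.
        return configured
    if start_from == "bucket":
        # Assumption: use start_timestamp when nothing has been synced yet.
        return last_synced if last_synced is not None else configured
    raise ValueError(f"start_from must be 'bucket' or 'config', got {start_from!r}")


last = datetime(2021, 6, 1, tzinfo=timezone.utc)
print(resolve_sync_start("bucket", "2020-01-01T00:00:00+00:00", last))
print(resolve_sync_start("config", "2020-01-01T00:00:00+00:00", last))
```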

The configuration we created uses the default values for a number of settings such as polling frequency or batch size. To customize your configuration further, refer to the configuration reference.

Deploying with Kubernetes

Kubernetes is a popular way to run and manage containerized applications. This section shows you how to deploy the EWS appliance using Kubernetes. It assumes that you have basic familiarity with Kubernetes and have kubectl installed. Please check the Kubernetes documentation if you need help getting started.

In order to deploy to Kubernetes, you need to create a YAML file describing your application. To start, copy the example below.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: reinfer-ews-appliance
  labels:
    app: reinfer-ews-appliance
spec:
  podManagementPolicy: Parallel
  replicas: 1
  selector:
    matchLabels:
      app: reinfer-ews-appliance
  serviceName: reinfer-ews-appliance
  template:
    metadata:
      labels:
        app: reinfer-ews-appliance
      name: reinfer-ews-appliance
    spec:
      containers:
        - args:
            - "./run.py"
            - "--reinfer-api-endpoint"
            - "https://<mydomain>.reinfer.io/api/"
          env:
            - name: REINFER_EWS_CONFIG
              value: "/mnt/config/example_ews_config"
            - name: REINFER_API_TOKEN
              valueFrom:
                secretKeyRef:
                  key: reinfer-api-token
                  name: reinfer-credentials
            - name: REINFER_EWS_AUTH_PASS
              valueFrom:
                secretKeyRef:
                  key: ews-auth-pass
                  name: reinfer-credentials
          image: "your.private.registry.com/reinfer/ews-appliance:VERSION"
          name: reinfer-ews-appliance
          resources:
            requests:
              cpu: 0.05
              memory: 128Mi
          volumeMounts:
            - mountPath: /mnt/config
              name: config-vol
      volumes:
        - configMap:
            name: ews-config
            items:
              - key: example_ews_config
                path: example_ews_config
          name: config-vol

Before you can deploy the appliance using this YAML file, there are a few additional steps you need to perform.

First, replace <mydomain>.reinfer.io with your tenant API endpoint.

Second, to avoid storing credentials as cleartext in the YAML file, the REINFER_API_TOKEN and REINFER_EWS_AUTH_PASS environment variables are populated from Kubernetes secrets. Create the secrets like so:

kubectl create secret generic reinfer-credentials \
  --from-literal=reinfer-api-token=<REINFER_TOKEN> \
  --from-literal=ews-auth-pass=<MSEXCHANGE_PASSWORD>

Finally, since we would like to load the appliance config from a local file, we need to mount that file into the pod. We do this by storing the data in a Kubernetes ConfigMap and mounting the ConfigMap as a volume. Create the ConfigMap like so:

kubectl create configmap ews-config \
  --from-file=example_ews_config=your-ews-config.json

Note: As an alternative to storing the config file locally, you can upload it to Re:infer and let the EWS appliance fetch it via the Re:infer API. This is described in the section on using a remote config file. If both local and remote config files are specified, the appliance will use the local config file.

You can now create your StatefulSet and check that everything is running (the commands below assume you saved the YAML file as reinfer-ews.yaml):

kubectl apply -f reinfer-ews.yaml
kubectl get sts

Advanced Topics

Configuring multiple shards

This section explains how to run multiple instances of the EWS appliance in order to sync a large number of mailboxes.

Below is a YAML file that should be familiar to you from the deployment guide. It has been modified to include two new command line arguments, --shard-name and --num-shards, and a REINFER_SHARD_NAME environment variable that is populated from the pod name. The replicas value has also been raised to 2.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: reinfer-ews-appliance
  labels:
    app: reinfer-ews-appliance
spec:
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      app: reinfer-ews-appliance
  serviceName: reinfer-ews-appliance
  template:
    metadata:
      labels:
        app: reinfer-ews-appliance
      name: reinfer-ews-appliance
    spec:
      containers:
        - args:
            - "./run.py"
            - "--reinfer-api-endpoint"
            - "https://<mydomain>.reinfer.io/api/"
            - "--shard-name"
            - "$(REINFER_SHARD_NAME)"
            - "--num-shards"
            - "2"
          env:
            - name: REINFER_SHARD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: REINFER_EWS_CONFIG
              value: "/mnt/config/example_ews_config"
            - name: REINFER_API_TOKEN
              valueFrom:
                secretKeyRef:
                  key: reinfer-api-token
                  name: reinfer-credentials
            - name: REINFER_EWS_AUTH_PASS
              valueFrom:
                secretKeyRef:
                  key: ews-auth-pass
                  name: reinfer-credentials
          image: "your.private.registry.com/reinfer/ews-appliance:VERSION"
          name: reinfer-ews-appliance
          resources:
            requests:
              cpu: 0.05
              memory: 128Mi
          volumeMounts:
            - mountPath: /mnt/config
              name: config-vol
      volumes:
        - configMap:
            name: ews-config
            items:
              - key: example_ews_config
                path: example_ews_config
          name: config-vol

Replace the values of --num-shards and replicas with the number of appliances you want to run.
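
Because REINFER_SHARD_NAME is filled from metadata.name, each pod of the StatefulSet receives a name ending in its ordinal (reinfer-ews-appliance-0, reinfer-ews-appliance-1, and so on). The sketch below shows one way a shard number could be extracted from such a name. It is an illustration under assumptions, not the appliance's own parsing logic; the function name shard_index_from_name is hypothetical.

```python
import re


def shard_index_from_name(shard_name, num_shards):
    """Extract the trailing ordinal from a name such as 'reinfer-ews-appliance-1'."""
    match = re.search(r"-(\d+)$", shard_name)
    if match is None:
        raise ValueError(f"no trailing ordinal in shard name {shard_name!r}")
    index = int(match.group(1))
    # The shard index is 0-based and must be strictly less than the shard count.
    if index >= num_shards:
        raise ValueError(f"shard index {index} must be less than num_shards {num_shards}")
    return index


print(shard_index_from_name("reinfer-ews-appliance-1", 2))
```

This also shows why --num-shards and replicas need to agree: with replicas set higher than --num-shards, the pod with the highest ordinal would have an out-of-range shard index.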

You also need to modify your JSON config file to include a sharding_key for each bucket. These keys should be set to consecutive integers. This ensures that each bucket is consistently handled by the same appliance even if the order of the mailboxes changes in the config file. Note that if two mailboxes are being synced to the same bucket, they must have the same sharding key set inside the bucket.

{
  ...
  "mailboxes": {
    "abc@example.com": {
      "bucket": {
        "owner": "project-name",
        "name": "bucket-name",
        "sharding_key": 0
      },
      "start_from": "bucket",
      "start_timestamp": "2020-01-01T00:00:00+00:00"
    },
    "xyz@example.com": {
      "bucket": {
        "owner": "project-name",
        "name": "other-bucket-name",
        "sharding_key": 1
      },
      "start_from": "bucket",
      "start_timestamp": "2020-01-01T00:00:00+00:00"
    },
    ...
  }
}
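
Before deploying, it can be useful to check the sharding keys against the two rules above: keys are consecutive integers, and mailboxes that share a bucket share a key. The following is a minimal sketch of such a check. The function validate_sharding_keys is a hypothetical helper and not part of the appliance, and the sketch does not assume which integer the keys start from.

```python
def validate_sharding_keys(mailboxes):
    """Return a list of problems found in the sharding_key settings."""
    problems = []
    key_by_bucket = {}
    for address, settings in mailboxes.items():
        bucket = settings["bucket"]
        bucket_id = (bucket["owner"], bucket["name"])
        key = bucket.get("sharding_key")
        if not isinstance(key, int):
            problems.append(f"{address}: bucket has no integer sharding_key")
            continue
        # Remember the first key seen for each bucket and compare later ones to it.
        previous = key_by_bucket.setdefault(bucket_id, key)
        if previous != key:
            problems.append(f"{address}: bucket {bucket_id} already uses sharding_key {previous}")
    keys = sorted(set(key_by_bucket.values()))
    if keys and keys != list(range(keys[0], keys[0] + len(keys))):
        problems.append(f"sharding keys {keys} are not consecutive integers")
    return problems


mailboxes = {
    "abc@example.com": {"bucket": {"owner": "project-name", "name": "bucket-a", "sharding_key": 0}},
    "xyz@example.com": {"bucket": {"owner": "project-name", "name": "bucket-b", "sharding_key": 1}},
}
print(validate_sharding_keys(mailboxes))
```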

You can then deploy the appliance as described in the deployment guide.

Using a remote config file

Instead of providing a local config file to the appliance, as described in the EWS appliance deployment guide, you can manage the config file in Re:infer. Note that if both local and remote config files are specified, the appliance will use the local config file.

First, upload your JSON config file to Re:infer:

curl -H "Authorization: Bearer $REINFER_TOKEN" \
  -H "Content-Type: multipart/form-data" \
  -F 'file=@your-ews-config.json' \
  -XPUT https://<mydomain>.reinfer.io/api/v1/appliance-configs/<project-name>/<config-name>

To see the current config:

curl -H "Authorization: Bearer $REINFER_TOKEN" \
  -XGET https://<mydomain>.reinfer.io/api/v1/appliance-configs/<project-name>/<config-name>
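
If you prefer to script this check, the GET request above could be built with the Python standard library as sketched below. The URL path and the Authorization header are taken from the curl command; the function name build_get_config_request and the example values are hypothetical. The sketch only constructs the request and leaves sending it to the caller.

```python
import urllib.request
from urllib.parse import quote


def build_get_config_request(api_base, project, config_name, token):
    """Build (but do not send) the GET request for a remote appliance config."""
    url = (
        f"{api_base.rstrip('/')}/api/v1/appliance-configs/"
        f"{quote(project, safe='')}/{quote(config_name, safe='')}"
    )
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values; replace them with your tenant domain, project, config name and token.
request = build_get_config_request("https://mydomain.reinfer.io", "my-project", "my-config", "example-token")
print(request.get_method(), request.full_url)

# To actually fetch the config, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```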

Then, in the Kubernetes YAML file, set the --remote-config-owner parameter to the project name, and the --remote-config-key parameter to the config name.

Reference

Application parameters

See the table below for a list of available application parameters. You can learn more about running the EWS appliance in the Deploying with Kubernetes section.

Parameter                Description
--reinfer-api-endpoint   Endpoint to connect to the Re:infer API.
--remote-config-owner    Project that owns the remote EWS appliance config file.
--remote-config-key      Name of the remote EWS appliance config file.
--debug-level            Debug level. 0 = No debug, 1 = Service debug, 2 = Full debug. Default: 1.
--shard-index            Which shard this appliance is. 0-based and must be strictly less than --num-shards. Only used if you are running multiple shards.
--shard-name             Shard name (for example, ews-N) from which to extract the shard number. At most one of --shard-index or --shard-name can be specified. Only used if you are running multiple shards.
--num-shards             The total number of shards in the appliance cluster. Only used if you are running multiple shards.
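
The constraints on the shard parameters can be made concrete with a small argparse sketch. This is a hypothetical stand-in that mirrors the rules in the table, not the appliance's real command line parser; only the three flag names come from the table.

```python
import argparse


def parse_shard_args(argv):
    """Parse the shard flags and enforce the constraints listed in the table."""
    parser = argparse.ArgumentParser(description="Hypothetical stand-in for the shard flags")
    # At most one of --shard-index or --shard-name can be specified.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--shard-index", type=int)
    group.add_argument("--shard-name")
    parser.add_argument("--num-shards", type=int)
    args = parser.parse_args(argv)
    if args.shard_index is not None:
        # The index is 0-based and strictly less than --num-shards.
        if args.num_shards is None or not 0 <= args.shard_index < args.num_shards:
            parser.error("--shard-index must be 0-based and strictly less than --num-shards")
    return args


args = parse_shard_args(["--shard-index", "1", "--num-shards", "2"])
print(args.shard_index, args.num_shards)
```

Invalid combinations, such as passing both --shard-index and --shard-name, make argparse print an error and exit.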

Configuration parameters

See the table below for a list of available configuration parameters. You can learn more about writing the EWS appliance configuration file in the Appliance Configuration section.

Name                     Description
host                     Exchange server host. Can be overridden by the REINFER_EWS_HOST environment variable.
port                     Exchange server port. Default: 80. Can be overridden by the REINFER_EWS_PORT environment variable.
auth_type                Only "ntlm" allowed.
auth_user                Exchange server user. Can be overridden by the REINFER_EWS_AUTH_USER environment variable.
auth_password            Exchange server password. Can be overridden by the REINFER_EWS_AUTH_PASS environment variable.
access_type              Access type: "delegate" or "impersonation". Default: "delegate". Can be overridden by the REINFER_EWS_ACCESS_TYPE environment variable.
ews_ssl_verify           If set to "false", will not verify certificates. Default: "true".
poll_frequency           How long to wait between batches, in seconds. Default: 15.
poll_message_sleep       How long to wait between individual emails in a batch, in seconds. Default: 0.1.
max_concurrent_uploads   Number of concurrent uploads to Re:infer, between 0 and 32. Default: 8.
emails_per_folder        Max number of emails to fetch from each folder per batch, between 1 and 100,000. Default: 2520. This setting allows the appliance to make progress on all folders evenly in case there is a very large folder.
reinfer_batch_size       How many emails to fetch per batch, between 1 and 1000. Default: 80.
mailboxes                List of mailboxes to fetch. See the Appliance Configuration section for an explanation of how to configure the mailboxes.
audit_email              If you have configured the appliance with a remote config, Re:infer will send an email to this address whenever the config is updated. Default: None.
ews_ssl_ciphers          Make the EWS appliance use specific ciphers. The ciphers should be a string in the OpenSSL cipher list format. Default: None.
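
The numeric ranges and allowed values in the table lend themselves to a quick check before a config is deployed. The following is a minimal sketch of such a check; check_config is a hypothetical helper and the appliance may validate its config differently. The ranges and defaults are copied from the table.

```python
# (minimum, maximum, default) for the range-limited settings in the table.
RANGES = {
    "max_concurrent_uploads": (0, 32, 8),
    "emails_per_folder": (1, 100_000, 2520),
    "reinfer_batch_size": (1, 1000, 80),
}


def check_config(config):
    """Return a list of settings that fall outside the documented values."""
    errors = []
    for name, (low, high, default) in RANGES.items():
        value = config.get(name, default)
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside [{low}, {high}]")
    access_type = config.get("access_type", "delegate")
    if access_type not in ("delegate", "impersonation"):
        errors.append(f"access_type={access_type!r} is not 'delegate' or 'impersonation'")
    # Only check auth_type when it is present; "ntlm" is the only allowed value.
    if "auth_type" in config and config["auth_type"] != "ntlm":
        errors.append(f"auth_type={config['auth_type']!r} is not 'ntlm'")
    return errors


print(check_config({"auth_type": "ntlm", "reinfer_batch_size": 5000}))
```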