Self-Hosted EWS Integration
The EWS appliance is delivered as a Docker image. The sections below explain how to configure and deploy the appliance.
Configuration
The appliance expects a JSON config file to be present. This section explains the contents of the file. Refer to the Deployment section for instructions on how to make the config file available to the appliance.
To start, copy the example below.
{
  "host": "https://exchange-server.example.com",
  "port": 443,
  "auth_type": "ntlm",
  "auth_user": "reinfer-ews-service-user",
  "access_type": "delegate",
  "mailboxes": {
    "abc@example.com": {
      "bucket": {
        "owner": "project-name",
        "name": "bucket-name"
      },
      "start_from": "bucket",
      "start_timestamp": "2020-01-01T00:00:00+00:00"
    },
    "xyz@example.com": {
      "bucket": {
        "owner": "project-name",
        "name": "bucket-name"
      },
      "start_from": "bucket",
      "start_timestamp": "2020-01-01T00:00:00+00:00"
    }
  }
}
First, replace the dummy values in host, port, auth_user, and access_type with their real values. See the configuration reference for a description of these parameters and their allowed values.
The only thing missing now for the appliance to connect to the Exchange server is the password. Instead of storing the password in the config file in plain text, provide it to the appliance through the REINFER_EWS_AUTH_PASS environment variable, as described in the Deployment section. The full list of environment variables that you can set to override values in the config is:
Name | Description |
---|---|
REINFER_EWS_AUTH_USER | Exchange server user |
REINFER_EWS_AUTH_PASS | Exchange server password |
REINFER_EWS_ACCESS_TYPE | Access type: "delegate" or "impersonation" |
REINFER_EWS_HOST | Exchange server host |
REINFER_EWS_PORT | Exchange server port |
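The override behaviour in the table above can be pictured with a small sketch. It is only an illustration: apply_env_overrides is a hypothetical helper, not part of the appliance, and the assumption that REINFER_EWS_PORT is converted to an integer is ours. The config keys are the ones listed in the configuration reference.

```python
import os
from typing import Mapping, Optional

# Mapping taken from the table above: environment variable -> config key.
ENV_TO_CONFIG_KEY = {
    "REINFER_EWS_AUTH_USER": "auth_user",
    "REINFER_EWS_AUTH_PASS": "auth_password",
    "REINFER_EWS_ACCESS_TYPE": "access_type",
    "REINFER_EWS_HOST": "host",
    "REINFER_EWS_PORT": "port",
}


def apply_env_overrides(config: dict, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of config with any set environment variables applied on top."""
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for env_name, key in ENV_TO_CONFIG_KEY.items():
        value = environ.get(env_name)
        if value is None:
            continue
        # Assumption: the port is numeric, as in the example config.
        merged[key] = int(value) if key == "port" else value
    return merged
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.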
Finally, replace the dummy values in mailboxes with their real values. You can specify one or more mailboxes. For each mailbox, you have to provide the mailbox address and specify the following parameters:
Name | Description |
---|---|
bucket.owner | Project of the bucket in which the mailbox should be synced. |
bucket.name | Name of the bucket in which the mailbox should be synced. |
start_from | Whether to start from last synced time ("bucket") or ignore last synced time and always start from start_timestamp ("config"). Should be set to "bucket" for normal operation, but "config" can be useful in some cases when debugging. |
start_timestamp | Timestamp from which to start syncing email. If not set, all emails will be synced. |
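The interaction between start_from and start_timestamp can be sketched as follows. resolve_sync_start is a hypothetical function written for illustration, not the appliance's code. In particular, falling back to start_timestamp when start_from is "bucket" but nothing has been synced yet is an assumption.

```python
from datetime import datetime
from typing import Optional


def resolve_sync_start(
    start_from: str,
    start_timestamp: Optional[str],
    last_synced: Optional[datetime],
) -> Optional[datetime]:
    """Pick the point in time a mailbox sync would start from.

    Returns None when there is no lower bound, i.e. all emails are synced.
    """
    if start_from not in ("bucket", "config"):
        raise ValueError(f"start_from must be 'bucket' or 'config', got {start_from!r}")

    configured = datetime.fromisoformat(start_timestamp) if start_timestamp else None

    if start_from == "bucket" and last_synced is not None:
        # Normal operation: resume from the last synced time.
        return last_synced
    # "config", or "bucket" with nothing synced yet (the fallback is an assumption).
    return configured
```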
The configuration uses the default values for a number of settings such as polling frequency or batch size. To customize your configuration further, refer to the configuration reference.
The Exchange integration syncs raw email data into Re:infer buckets. Like other Re:infer resources, a bucket is created in a project, which allows you to control access to the bucket. To read from a bucket, upload to a bucket, or manage buckets, a user needs the respective permissions in the project the bucket is in.
Deployment
You can deploy the EWS appliance either with Kubernetes or with Docker.
Deploying with Kubernetes allows you to run multiple instances of the EWS appliance, with each instance handling a subset of mailboxes to be synced.
With Kubernetes
Using Kubernetes is a popular way to run and manage containerized applications.
This section shows you how to deploy the EWS appliance using Kubernetes. It
assumes that you have basic familiarity with Kubernetes and have kubectl
installed. Please check
this documentation if you
need help getting started with Kubernetes.
In order to deploy to Kubernetes, you need to create a YAML file describing your application. To start, copy the example below.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: reinfer-ews-appliance
  labels:
    app: reinfer-ews-appliance
spec:
  podManagementPolicy: Parallel
  replicas: 1
  selector:
    matchLabels:
      app: reinfer-ews-appliance
  serviceName: reinfer-ews-appliance
  template:
    metadata:
      labels:
        app: reinfer-ews-appliance
      name: reinfer-ews-appliance
    spec:
      containers:
        - args:
            - "./run.py"
            - "--bind"
            - "0.0.0.0:8000"
            - "--reinfer-api-endpoint"
            - "https://<mydomain>.reinfer.io/api/"
            - "--shard-name"
            - "$(POD_NAME)"
            # This value should match `spec.replicas` above
            - "--total-shards"
            - "1"
          env:
            - name: REINFER_EWS_CONFIG
              value: "/mnt/config/example_ews_config"
            - name: REINFER_API_TOKEN
              valueFrom:
                secretKeyRef:
                  key: reinfer-api-token
                  name: reinfer-credentials
            - name: REINFER_EWS_AUTH_PASS
              valueFrom:
                secretKeyRef:
                  key: ews-auth-pass
                  name: reinfer-credentials
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          image: "your.private.registry.com/reinfer/ews-appliance:TAG"
          name: reinfer-ews-appliance
          resources:
            requests:
              cpu: 0.05
              memory: 128Mi
          volumeMounts:
            - mountPath: /mnt/config
              name: config-vol
      volumes:
        - configMap:
            name: ews-config
            items:
              - key: example_ews_config
                path: example_ews_config
          name: config-vol
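The comment in the manifest notes that --total-shards should match spec.replicas. A minimal sketch of that check is given here, assuming the manifest has already been parsed into a Python dict (the parsing step, for example with a YAML library, is not shown). total_shards_matches_replicas is a hypothetical helper name.

```python
def total_shards_matches_replicas(manifest: dict) -> bool:
    """Check that every container's --total-shards argument equals spec.replicas."""
    replicas = manifest["spec"]["replicas"]
    containers = manifest["spec"]["template"]["spec"]["containers"]
    for container in containers:
        args = container.get("args", [])
        if "--total-shards" not in args:
            return False
        # The value follows the flag as the next list item.
        idx = args.index("--total-shards")
        if idx + 1 >= len(args) or int(args[idx + 1]) != replicas:
            return False
    return True
```

Running a check like this before applying the manifest could catch a replica count that was changed without updating the argument.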
Before you can deploy the appliance using this YAML file, there are a few additional steps you need to perform.
First, replace <mydomain>.reinfer.io with your tenant API endpoint.
Second, to avoid storing credentials as cleartext in the YAML file, the REINFER_API_TOKEN and REINFER_EWS_AUTH_PASS environment variables are populated from Kubernetes secrets. Create the secrets like so:
kubectl create secret generic reinfer-credentials \
--from-literal=reinfer-api-token=<REINFER_TOKEN> \
--from-literal=ews-auth-pass=<MSEXCHANGE_PASSWORD>
Finally, since we would like to load the appliance config from a local file, we need to mount that file into the pod. We do this by storing the data in a Kubernetes ConfigMap and mounting the ConfigMap as a volume. Create the ConfigMap like so:
kubectl create configmap ews-config \
--from-file=example_ews_config=your-ews-config.json
As an alternative to storing the config file locally, you can upload it to Re:infer and let the EWS appliance fetch it via the Re:infer API. This is described in the "Store configuration in Re:infer" section. If both local and remote config files are specified, the appliance will use the local config file.
You can now create your StatefulSet and check that everything is running:
kubectl apply -f reinfer-ews.yaml
kubectl get sts
With Docker
Alternatively, you can run the EWS appliance in Docker. The command below will start the appliance with the same parameters that are used in the Kubernetes section.
EWS_CONFIG_DIR=
REINFER_API_TOKEN=
MSEXCHANGE_PASSWORD=
TAG=
sudo docker run \
-v $EWS_CONFIG_DIR:/mnt/config \
--env REINFER_EWS_CONFIG=/mnt/config/your_ews_config.json \
--env REINFER_API_TOKEN=$REINFER_API_TOKEN \
--env REINFER_EWS_AUTH_PASS=$MSEXCHANGE_PASSWORD \
eu.gcr.io/reinfer-gcr/ews:$TAG \
./run.py --reinfer-api-endpoint https://<mydomain>.reinfer.io/api/ &> ews_$(date -Iseconds).log
- Replace <mydomain>.reinfer.io with your tenant API endpoint.
- Replace your_ews_config.json with the name of your EWS config JSON file.
The appliance will run continuously syncing emails into the Communications Mining platform. If stopped and started again, it will pick up from the last stored bucket sync state.
With Docker (local storage)
The EWS appliance can save extracted emails locally instead of pushing them into the Communications Mining platform.
EWS_LOCAL_DIR=
MSEXCHANGE_PASSWORD=
CONFIG_OWNER=
CONFIG_KEY=
TAG=
sudo docker run \
-v $EWS_LOCAL_DIR:/mnt/ews \
--env REINFER_EWS_AUTH_PASS=$MSEXCHANGE_PASSWORD \
eu.gcr.io/reinfer-gcr/ews:$TAG \
./run.py --local-files-prefix /mnt/ews \
--remote-config-owner $CONFIG_OWNER --remote-config-key $CONFIG_KEY &> ews_$(date -Iseconds).log
- The appliance expects to find the config in $EWS_LOCAL_DIR/config/$CONFIG_OWNER/$CONFIG_KEY.json. Alternatively, you can provide the path to the config by setting the $REINFER_EWS_CONFIG environment variable.
- The appliance will save the sync state to $EWS_LOCAL_DIR/state. If stopped and started again, it will pick up from the last stored sync state.
- The appliance will save data to $EWS_LOCAL_DIR/data.
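The directory layout described in the notes above can be summarised in a short sketch. local_layout is a hypothetical helper that only computes the host paths named in this section. It does not create anything on disk.

```python
from pathlib import PurePosixPath


def local_layout(ews_local_dir: str, config_owner: str, config_key: str) -> dict:
    """Return the paths described above for a given $EWS_LOCAL_DIR."""
    root = PurePosixPath(ews_local_dir)
    return {
        "config": root / "config" / config_owner / f"{config_key}.json",
        "state": root / "state",
        "data": root / "data",
    }
```

For example, local_layout("/srv/ews", "project-name", "ews-config")["config"] points at /srv/ews/config/project-name/ews-config.json, which is where the config file would need to be placed before starting the container.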
Store configuration in Re:infer
Instead of providing a local config file to the appliance, as described in the Deployment section, you can manage the config file in Re:infer. Note that if both local and remote config files are specified, the appliance uses the local config file.
First, upload your JSON config file to Re:infer:
curl -H "Authorization: Bearer $REINFER_TOKEN" \
-H "Content-Type: multipart/form-data" \
-F 'file=@your-ews-config.json' \
-XPUT https://<mydomain>.reinfer.io/api/v1/appliance-configs/<project-name>/<config-name>
To see the current config:
curl -H "Authorization: Bearer $REINFER_TOKEN" \
-XGET https://<mydomain>.reinfer.io/api/v1/appliance-configs/<project-name>/<config-name>
Then, in the Kubernetes YAML file, set the --remote-config-owner parameter to the project name, and the --remote-config-key parameter to the config name.
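The "see the current config" request can also be issued from a script. The sketch below mirrors the curl GET command using only the Python standard library. build_get_config_request is a hypothetical helper, and because the response format is not described in this guide, the body is left as raw bytes.

```python
import urllib.request


def build_get_config_request(
    domain: str, project: str, config_name: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the GET request for a stored appliance config."""
    url = f"https://{domain}/api/v1/appliance-configs/{project}/{config_name}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       body = response.read()  # raw bytes; the format is not specified here
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without contacting the tenant.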
Reference
Application parameters
See the table below for a list of available application parameters. You can learn more about running the EWS appliance in the Deployment section.
Parameter | Description |
---|---|
--reinfer-api-endpoint | Endpoint to connect to the Re:infer API. Mutually exclusive with --local-files-prefix. |
--local-files-prefix | Path to store synced emails and bucket sync state. Mutually exclusive with --reinfer-api-endpoint and REINFER_API_TOKEN. |
--remote-config-owner | Project that owns the remote EWS appliance config file. |
--remote-config-key | Name of the remote EWS appliance config file. |
--debug-level | Debug level. 0 = No debug, 1 = Service debug, 2 = Full debug. Default: 1. |
--shard-name | Shard name, e.g. ews-N, from which the shard number is extracted. When running in Kubernetes, you can set it to the pod name. |
--total-shards | The total number of instances in the appliance cluster. When running in Kubernetes, must be set to the same value as the number of instances in the StatefulSet. |
--restart-on-unrecoverable-errors | If enabled, unrecoverable failures will result in the entire service being restarted without crashing. |
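To make the --shard-name and --total-shards parameters more concrete, the sketch below extracts the shard number from a name such as ews-N (StatefulSet pod names end in an ordinal, e.g. reinfer-ews-appliance-0) and splits a list of mailboxes across shards. Both functions are hypothetical. The round-robin split in particular is an assumption made for illustration and may differ from how the appliance assigns mailboxes.

```python
import re
from typing import List


def shard_index(shard_name: str) -> int:
    """Extract N from a shard name of the form <prefix>-N."""
    match = re.search(r"-(\d+)$", shard_name)
    if match is None:
        raise ValueError(f"no trailing shard number in {shard_name!r}")
    return int(match.group(1))


def mailboxes_for_shard(mailboxes: List[str], shard_name: str, total_shards: int) -> List[str]:
    """Illustrative round-robin split; NOT necessarily the appliance's own scheme."""
    index = shard_index(shard_name)
    if not 0 <= index < total_shards:
        raise ValueError("shard number must be in [0, total_shards)")
    # Sort first so every shard sees the same ordering.
    ordered = sorted(mailboxes)
    return [m for i, m in enumerate(ordered) if i % total_shards == index]
```

Whatever scheme is used, each mailbox should end up on exactly one shard, which is why --total-shards has to agree with the number of running instances.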
Configuration parameters
See the table below for a list of available configuration parameters. You can learn more about writing the EWS appliance configuration file in the Configuration section.
Name | Description |
---|---|
host | Exchange server host. Can be overridden by the REINFER_EWS_HOST environment variable. |
port | Exchange server port. Default: 80. Can be overridden by the REINFER_EWS_PORT environment variable. |
auth_type | Only "ntlm" allowed. |
auth_user | Exchange server user. Can be overridden by the REINFER_EWS_AUTH_USER environment variable. |
auth_password | Exchange server password. Can be overridden by the REINFER_EWS_AUTH_PASS environment variable. |
access_type | Access type: "delegate" or "impersonation". Default: "delegate". Can be overridden by the REINFER_EWS_ACCESS_TYPE environment variable. |
ews_ssl_verify | If set to "false", will not verify certificates. Default: "true". |
poll_frequency | How long to wait between batches, in seconds. Default: 15. |
poll_message_sleep | How long to wait between individual emails in a batch, in seconds. Default: 0.1. |
max_concurrent_uploads | Number of concurrent uploads to Re:infer, between 0 and 32. Default: 8. |
emails_per_folder | Max number of emails to fetch from each folder per batch, between 1 and 100,000. Default: 2520. This setting allows the appliance to make progress on all folders evenly in case there is a very large folder. |
reinfer_batch_size | How many emails to fetch per batch, between 1 and 1000. Default: 80. |
mailboxes | List of mailboxes to fetch. See the Configuration section for an explanation of how to configure the mailboxes. |
audit_email | If you have configured the appliance with a remote config, Re:infer will send an email to this address whenever the config is updated. Default: None. |
ews_ssl_ciphers | Make EWS appliance use specific ciphers. The ciphers should be a string in the OpenSSL cipher list format. Default: None. |
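The numeric ranges in the table above lend themselves to a quick pre-deployment check. The sketch below is a hypothetical helper, not part of the appliance. The bounds and defaults are copied from the table, and it is an assumption that missing keys fall back to those defaults.

```python
# Bounds and defaults copied from the table above: key -> (min, max, default).
NUMERIC_LIMITS = {
    "max_concurrent_uploads": (0, 32, 8),
    "emails_per_folder": (1, 100_000, 2520),
    "reinfer_batch_size": (1, 1000, 80),
}


def check_numeric_settings(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means all values are in range."""
    problems = []
    for key, (low, high, default) in NUMERIC_LIMITS.items():
        value = config.get(key, default)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key}: expected an integer, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{key}: {value} is outside {low}..{high}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every out-of-range setting.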