Exponential backoff

This page explains how to use truncated exponential backoff to ensure that your devices do not generate excessive load.

When devices retry calls without waiting, they can produce a heavy load on the OmniCore servers. OmniCore automatically limits subscriptions that generate excessive load. Even a small fraction of overactive devices can trigger limits that affect all devices in the same OmniCore subscription.

To avoid triggering these limits, you are strongly encouraged to implement truncated exponential backoff with jitter (a random component added to each wait). If you have questions or would like to discuss the specifics of your algorithm, complete this form.

Truncated exponential backoff is a standard error-handling strategy for network applications. In this approach, a client retries a failed request with increasing delays between retries. Clients should use truncated exponential backoff for all requests to OmniCore that return HTTP 5xx or 429 response codes, as well as when reconnecting after a disconnection from the MQTT server.
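The retry condition can be expressed as a small check. This is a minimal Python sketch that assumes the client has the response's integer HTTP status code; `is_retryable` is a hypothetical helper, not part of any OmniCore API, and reconnecting to the MQTT server is not covered here.

```python
def is_retryable(status_code: int) -> bool:
    """Hypothetical helper: True for responses that should be retried with backoff.

    Retries server errors (5xx) and 429 Too Many Requests. Most other 4xx codes
    point to a problem with the request itself, so resending it unchanged is
    unlikely to help.
    """
    return status_code == 429 or 500 <= status_code <= 599
```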

Example algorithm

An exponential backoff algorithm retries requests with exponentially increasing wait times between retries, up to a maximum backoff time. For example:

Make a request to OmniCore.
If the request fails, wait 1 second + random_number_milliseconds and retry the request.
If the request fails, wait 2 seconds + random_number_milliseconds and retry the request.
If the request fails, wait 4 seconds + random_number_milliseconds and retry the request.
And so on, doubling the wait each time, up to a maximum_backoff time.
Once the wait reaches maximum_backoff, continue waiting and retrying up to some maximum number of retries, but do not increase the wait period between retries.

where:

The wait time is min(2^n seconds + random_number_milliseconds, maximum_backoff), with n starting at 0 and incremented by 1 for each iteration (request).
random_number_milliseconds is a random number of milliseconds less than or equal to 1000. This jitter helps avoid cases in which many clients are synchronized by some event and all retry at once, sending requests in synchronized waves. The value of random_number_milliseconds is recalculated for each retry.
maximum_backoff is typically 32 or 64 seconds. The appropriate value depends on the use case.
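The algorithm and the wait-time formula above can be combined into a short retry helper. This is a minimal Python sketch under stated assumptions, not an OmniCore client API: `backoff_delay`, `call_with_backoff`, and `RetryableError` are hypothetical names, the request function is assumed to raise `RetryableError` for HTTP 5xx or 429 responses, and `max_retries=10` is an arbitrary example limit.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error a request raises on a retryable failure (HTTP 5xx or 429)."""


def backoff_delay(n: int, maximum_backoff: float = 64.0,
                  rng: Optional[random.Random] = None) -> float:
    """Wait time in seconds before retry n (n starts at 0):
    min(2**n + random_number_milliseconds / 1000, maximum_backoff)."""
    # Fresh jitter on every call, 0..1000 ms inclusive.
    if rng is not None:
        random_number_milliseconds = rng.randint(0, 1000)
    else:
        random_number_milliseconds = random.randint(0, 1000)
    # Cap 2**n first so a very large n cannot overflow when added to a float.
    exponential = min(2 ** n, maximum_backoff)
    return min(exponential + random_number_milliseconds / 1000, maximum_backoff)


def call_with_backoff(request: Callable[[], T], maximum_backoff: float = 64.0,
                      max_retries: int = 10,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call request(), retrying RetryableError with truncated exponential backoff."""
    n = 0  # the first wait is 2**0 = 1 second plus jitter
    while True:
        try:
            return request()
        except RetryableError:
            if n >= max_retries:
                raise  # out of retries: give up instead of retrying forever
            sleep(backoff_delay(n, maximum_backoff))
            n += 1
```

The `sleep` parameter defaults to `time.sleep`; passing a function that records the delays lets the schedule be checked in tests without actually waiting. The optional `rng` argument serves the same purpose for the jitter.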

The client can continue retrying after it has reached the maximum_backoff time. Retries after this point do not need to continue increasing backoff time. For example, suppose a client uses a maximum_backoff time of 64 seconds. After reaching this value, the client can retry every 64 seconds. However, clients should stop after a maximum number of retries rather than retrying indefinitely.
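To illustrate this plateau, the following minimal Python sketch lists the jitter-free wait times for the 64-second example; in practice each wait would also get a fresh random_number_milliseconds, capped at maximum_backoff. `base_delays` is a hypothetical helper, and the limit of 10 retries is an arbitrary example value, not an OmniCore recommendation.

```python
from typing import Iterator


def base_delays(maximum_backoff: int = 64, max_retries: int = 10) -> Iterator[int]:
    """Hypothetical helper: yield the jitter-free wait (seconds) before each retry.

    The wait doubles until it reaches maximum_backoff and then stays there;
    the generator stops after max_retries values, so retries are not indefinite.
    """
    delay = 1
    for _ in range(max_retries):
        yield min(delay, maximum_backoff)
        if delay < maximum_backoff:
            delay *= 2  # stop doubling once the cap is reached


print(list(base_delays(64, 10)))  # [1, 2, 4, 8, 16, 32, 64, 64, 64, 64]
```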

The wait time between retries and the number of retries depend on your use case and network conditions.