Optimizing traffic routing in UiPath Automation Suite with Envoy and WASM

In this blog post, we'll dive into the routing infrastructure of UiPath Automation Suite, uncovering key insights and innovative approaches to drive performance.

Traffic routing inside UiPath Automation Suite

UiPath Automation Suite enables you to deploy the full UiPath Business Automation Platform, composed of 40+ microservices, in a Kubernetes cluster on top of Linux. An API gateway sits in front of all the microservices and carries customized routing business logic, such as service discovery, request interception, request manipulation, and error handling.

a diagram

We use the popular combination of Istio and Envoy as the routing infrastructure, and the routing business logic is implemented as Lua scripts within Envoy filters. This solution works well except for the following pain points:

  1. No direct Redis connectivity: our routing layer relies heavily on Redis to cache the business data needed to complete request routing, such as service metadata and tenant-to-backend mapping. However, the Envoy filter doesn't support connecting to Redis natively. As a workaround, the routing layer goes through a backend service (Routing Service in the diagram above), which connects to Redis and retrieves the needed data on its behalf. The issue is that this backend (in .NET Core) consumes unnecessary resources under load and adds unnecessary latency.

  2. Hard to extend, test, and maintain: using Lua scripts within YAML files presents challenges regarding extension, maintenance, and testing.

Goal

Our primary objective is to minimize the resource consumption of the entire routing layer while improving its performance and making it easier to extend.

Solution

The team assessed various solutions. However, each of them has notable drawbacks, so we decided not to pursue any of these options.

a table mentioning the proposal, reason and drawbacks

Ultimately, we settled on using the WebAssembly (WASM) extension in Envoy and its shared data feature to implement an in-memory caching strategy. WASM lets us choose the programming language flexibly, which makes development and testing easier while keeping performance characteristics similar to Lua. More importantly, WASM in Envoy (proxy-wasm) provides shared data, a key-value store that is accessible from every HTTP stream. This capability is what makes an in-memory caching strategy possible, and it significantly reduces CPU usage for the backend routing service.
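To make the shared data idea more concrete, here is a minimal sketch in plain Go. It does not use the proxy-wasm SDK: the SharedData interface, the memStore fake, and the httpStream type are hypothetical stand-ins that only mirror the general shape of a key-value store with a compare-and-swap (CAS) token. The CAS rule used here (0 means "write unconditionally") is a simplifying assumption.

```go
package main

import (
	"errors"
	"sync"
)

// SharedData is a hypothetical interface shaped like a shared key-value
// store: Get returns a value plus a CAS token, Set takes the token back.
type SharedData interface {
	Get(key string) (value []byte, cas uint32, err error)
	Set(key string, value []byte, cas uint32) error
}

var (
	ErrNotFound    = errors.New("key not found")
	ErrCasMismatch = errors.New("cas mismatch")
)

type entry struct {
	value []byte
	cas   uint32
}

// memStore is an in-memory fake of the host-side store (assumption:
// the real store lives in the proxy, outside the plugin's own memory).
type memStore struct {
	mu   sync.Mutex
	data map[string]entry
}

func newMemStore() *memStore {
	return &memStore{data: map[string]entry{}}
}

func (s *memStore) Get(key string) ([]byte, uint32, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.data[key]
	if !ok {
		return nil, 0, ErrNotFound
	}
	return e.value, e.cas, nil
}

// Set writes unconditionally when cas is 0 or the key is new; otherwise
// it writes only if cas matches the token of the stored entry.
func (s *memStore) Set(key string, value []byte, cas uint32) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.data[key]
	if ok && cas != 0 && e.cas != cas {
		return ErrCasMismatch
	}
	s.data[key] = entry{value: value, cas: e.cas + 1}
	return nil
}

// httpStream stands in for a per-request plugin context. Every stream
// holds a reference to the same store, so a write is visible to all.
type httpStream struct {
	store SharedData
}

// lookup returns the backend cached for a tenant, if any.
func (h httpStream) lookup(tenant string) (string, bool) {
	v, _, err := h.store.Get(tenant)
	if err != nil {
		return "", false
	}
	return string(v), true
}
```

In this sketch, a value written through one stream's store is immediately readable from any other stream, which is the property the caching strategy relies on.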

Implementation details

No significant architectural modifications have been made within the routing layer. The transition primarily involves substituting the Lua-based Envoy filters with the WASM plugin. Considering the nature of cached routing data, we've recognized that these data experience infrequent changes and that outdated data won't substantially impact the customer's primary workflow. Consequently, we've opted to streamline the cache invalidation strategy by using time-based (TTL) expiration.

The WASM plugin initially queries the routing data from the in-memory cache. If a cache hit occurs, the plugin then modifies the original request's path and headers appropriately to route it toward the backend service. If a cache miss occurs, it queries the routing data from the routing service, loads the routing data to the cache, and eventually modifies the original request to route it.

Result

Working together with the proxy-wasm community, we addressed memory leaks stemming from TinyGo's garbage collection. These improvements have been incorporated into the most recent release.

Resource utilization improvement

Presented below is the resource utilization data within an environment encompassing 100K UiPath robots.

CPU usage drops clearly in the routing services, and there is a minor reduction in CPU usage in the Istio ingress gateway controller. The tradeoff is higher memory consumption in the Istio ingress gateway controller. This increase is expected, since routing data is now cached in memory there, and it remains cost-effective compared with the CPU savings.

Testability improvement

Previously, our test coverage was limited to the helper functions in Lua, as it wasn't feasible to test the Lua script within the Envoy filter YAML. However, thanks to the proxytest framework offered by the proxy-wasm-go-sdk, we can now achieve comprehensive unit test coverage by utilizing the Envoy host simulator.

Latency

In our performance tests, the latency of each request remained unchanged.

What's next

Looking ahead, our objective is to establish a unified, WASM-based routing layer for both UiPath Automation Suite and UiPath Automation Cloud. This aims to improve performance and to relieve engineers of maintaining two separate sets of solutions and code. Additionally, we intend to implement a proactive cache invalidation mechanism to handle stale data in specific edge cases, which should give users a smoother experience.

Mengjia Liang and Gong Zhang

Senior Software Engineer and Principal Engineering Manager, UiPath