Securing REST APIs in Kubernetes with Istio and Open Policy Agent
As our security product, Cognitive Intelligence, evolves, we decided to ease the scaling challenges with a move towards a more decoupled architecture and more independent teams. What was once a monolithic UI is being decomposed into services, each living in its own Bounded Context¹ and managed by an independent team. Each service can expose a REST API that can be used by other teams, or directly by customers.
This situation called for a new solution to API authentication and authorization. Using a session managed by the UI monolith that sits in the path of all outside requests was no longer good enough. On top of that, we wanted a solution that would satisfy these requirements:
- Teams don’t need to solve the cross-cutting concern of authentication/authorization; a unified solution is provided. Access is uniform across all services from the perspective of an API client, and the API can be agnostic of how the client was authenticated.
- Teams own and manage authorization policies for their services independently of other teams.
- APIs are secured regardless of whether the request originated from the inside or the outside of the cluster (part of defense in depth).
- It is auditable, scalable, and easy to test and debug.
Most of our services run in Kubernetes, and the solution we chose is based on the Istio service mesh² and the Open Policy Agent (OPA) policy engine. And because there wasn’t much guidance on how to put all the pieces together at the time,³ we would like to share the details with you in this post. You can also find an accompanying demo on GitHub.
Overview of the technical solution
This diagram shows a high-level overview of the components involved in the authorization process:
Before making requests, the client needs to authenticate to obtain an access token. Authentication is beyond the scope of this post; many identity providers can be used if you don’t want to implement it yourself, but the rest of the solution is independent of the authentication mechanism. When a request is made:
- The client sends its access token in the Authorization header (1); a sample request is sketched just below this list.
- The Istio Envoy proxy intercepts the request when it reaches the service pod. It takes the access token, validates it, and translates it into client identity information that is used later in the authorization decision-making process. In our case, the translation is delegated (2) to the Access Management Service (you know, the AM part of IAM; more on the alternatives later). If validation or translation fails, a 401 response is returned immediately.
- The client identity information and other details about the request (method, URL, etc.) are passed to an OPA sidecar. This is another Docker container that is automatically injected into your pods, similarly to the Envoy proxy. Each OPA sidecar is configured with an authorization policy specific to the service and responds with an allow or deny decision based on the inputs and the policy (3).
- If the authorization decision is deny, the request never reaches the actual service container and a 403 response is returned immediately. Otherwise, the Envoy proxy passes the request to the service container, along with the client identity information (4).
- If the service needs to delegate to other services, it passes the access token along in the Authorization header again. The request is forwarded by the Envoy proxy (5) and the same process repeats.
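For illustration, the request in step (1) might look like this (hypothetical host and path; the Bearer scheme is an assumption, as the token format is discussed later):

curl -H "Authorization: Bearer <access-token>" https://api.example.com/api/status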
Technical details
Access Token
JWT-based tokens are popular these days. If you exchange access tokens in an OAuth 2.0 flow, it’s quite possible your access token is a signed JWT with scopes expressing what the client is allowed to do. Another option is ID tokens issued by your OpenID Connect provider, which contain identity information.
The upside of these is that their validation can be stateless when you don’t need immediate revocation. Istio provides support for ID tokens out of the box. If you opt for them, you can drop the Access Management Service from the picture above and avoid any extra network round-trips for each request.
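In recent Istio versions (1.5+), this out-of-the-box validation is expressed with a RequestAuthentication manifest. A minimal sketch, assuming a hypothetical issuer and workload label:

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-service                                         # hypothetical workload label
  jwtRules:
  - issuer: "https://idp.example.com"                         # hypothetical OIDC issuer
    jwksUri: "https://idp.example.com/.well-known/jwks.json"  # hypothetical JWKS endpoint

Note that this alone only validates tokens that are present; rejecting requests that carry no token at all additionally requires an AuthorizationPolicy.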
The downside of JWTs is that their non-opaque nature presents an extra attack surface and you need to be careful with validation (and I’ve seen people get it wrong). Our security guy likes to say that safety by design is better than safety by mitigation. Delegating to Access Management Service allowed us to use opaque random strings as tokens and gave us control over the token interpretation in a single place.
Token translation
As we mentioned, you can rely on Istio’s built-in support for ID tokens when using OIDC. Another option is to skip token validation in Envoy, pass the token directly as an input to the OPA sidecar, and use the support for JWT in OPA.
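If you take the OPA route, verification might look roughly like this Rego sketch (the data.trusted_jwks document, the audience value, and the Bearer scheme are all assumptions):

package istio.authz

# Sketch: verify and decode a JWT directly in Rego.
# data.trusted_jwks (a JWKS JSON string) and the "my-api" audience are assumptions.
claims = payload {
    auth := input.attributes.request.http.headers.authorization
    startswith(auth, "Bearer ")
    token := substring(auth, count("Bearer "), -1)
    [valid, _, payload] = io.jwt.decode_verify(token, {"cert": data.trusted_jwks, "aud": "my-api"})
    valid
}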
Here, we will show a third way — delegating the token translation to another service. Envoy has a concept of filters that are applied to a request before it is proxied to the destination. We will leverage Envoy’s support for Lua scripting⁴ to configure a filter that delegates validation and translation to Access Management Service.
In Istio, you can configure Envoy proxies with an EnvoyFilter manifest. Keep in mind that EnvoyFilter is a low-level tool and can be somewhat fragile, so test it with Istio upgrades. Here is a simplified version of the manifest (full example at GitHub); the sketch below is reconstructed from the description that follows, so treat the cluster and namespace names as illustrative:
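apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: token-translation
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.http_connection_manager
            subFilter:
              name: envoy.ext_authz
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.lua
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.http.lua.v2.Lua
          inlineCode: |
            function envoy_on_request(request_handle)
              local token = request_handle:headers():get("authorization")
              -- The cluster name below is illustrative; it must reference the
              -- Envoy cluster of the access-management service in your mesh.
              local headers, body = request_handle:httpCall(
                "outbound|8080||access-management.default.svc.cluster.local",
                {
                  [":method"] = "GET",
                  [":path"] = "/auth/api-clients/me",
                  [":authority"] = "access-management",
                  ["authorization"] = token
                }, "", 5000)
              if headers[":status"] ~= "200" then
                request_handle:respond({[":status"] = "401"}, "invalid token")
              end
              request_handle:streamInfo():dynamicMetadata():set(
                "envoy.lua", "identity-json", body)
            end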
This manifest inserts the envoy.lua filter before the envoy.ext_authz authorization filter (which we will configure later) and applies it to all incoming HTTP requests. It also defines the Lua function envoy_on_request that will be executed for each request: it extracts the Authorization header, uses it to call access-management:8080/auth/api-clients/me, and stores the response in metadata associated with the request under the key identity-json. We will reference this metadata later when we invoke OPA.
An important thing to keep in mind is that the filter is only applied to HTTP traffic. How does Istio distinguish HTTP from ordinary TCP traffic? It does so, somewhat obscurely, by looking at the corresponding port name specified in your Service manifest. That is, you need to name your Service port http or http-<suffix> for this to work.
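An illustrative Service manifest with a correctly named port:

apiVersion: v1
kind: Service
metadata:
  name: my-service    # illustrative name
spec:
  selector:
    app: my-service
  ports:
  - name: http-api    # the http- prefix tells Istio to treat this port’s traffic as HTTP
    port: 8080
    targetPort: 8080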
Authorization decision
We intentionally didn’t want to rely on authorization policies implemented in the code of the individual services. That approach can be error-prone (it is difficult to enforce a deny-by-default policy and easy to forget about something), and we wanted to make auditing and testing with a heterogeneous stack easy. One option is delegating authorization decisions to a centralized service, but that would make an independent evolution of services and their policies more complicated and introduce an extra network round-trip. In the end, Open Policy Agent seemed like the best solution: no imperative coding is required, authorization policies can be expressed in a declarative DSL and deployed independently for each service, and OPA runs locally in a pod sidecar, avoiding extra network hops.
OPA is a general-purpose policy engine that helps decouple policy decision-making from policy enforcement. It produces policy decisions (either allow/deny, or more complex answers) by evaluating queries against a given policy (expressed in the declarative Rego language), input data (arbitrary JSON), and possibly other external data. You can try it out in its online playground.
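For a first taste of Rego, here is a trivial, self-contained policy you can paste into the playground (unrelated to our setup; the input shape is made up):

package play

default allow = false

# Allow only GET /api/status; everything else falls back to the default
allow {
    input.method == "GET"
    input.path == ["api", "status"]
}

With the input {"method": "GET", "path": ["api", "status"]}, the query data.play.allow evaluates to true.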
Envoy proxy supports external authorization with gRPC or HTTP services. The opa-istio-plugin extends OPA with such an interface. It also provides configuration for a Kubernetes admission controller that is responsible for injecting OPA sidecars into all pods⁵. We used a slightly modified configuration of the controller so as to configure the OPA policy for each pod individually with the opa-policy-config-map-name pod label. The label determines which ConfigMap with the OPA policy to mount to the sidecar. An alternative to ConfigMaps is to distribute policies as OPA bundles, but ConfigMaps seemed more convenient for us.
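Labeling a pod with the policy to use could then look like this (deployment and ConfigMap names are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
        opa-policy-config-map-name: my-service-opa-policy  # ConfigMap holding the Rego policy
    spec:
      containers:
      - name: my-service
        image: example/my-service:latest                   # illustrative image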
The whole configuration is a bit verbose, so we won’t list it here. Let us at least show an example of an authorization policy (full example):
package istio.authz

# Parse the identity data
import input.attributes.metadata_context as metadata

api_client := json.unmarshal(metadata["filter_metadata"]["envoy.lua"]["fields"]["identity-json"]["Kind"]["StringValue"])

# Example of a custom rule
default can_write = false

can_write {
    not api_client.identity.attributes.isReadOnly
}

# Deny by default
default allow = {"allowed": false, "reason": "deny-by-default"}

# Whitelisting allowed requests
allow = true {
    input.attributes.request.http.path = "/api/status"
}
else = true {
    # This branch will succeed only if all expressions are satisfied,
    # e.g., the request path must pattern-match
    # /api/users/<identityId>/preferences
    input.parsed_path = ["api", "users", api_client.identity.identityId, "preferences"]
    input.attributes.request.http.method == "PUT"
    can_write
}
This policy parses the identity JSON from step (2) (it is a bit complicated due to the way Lua scripting in Envoy works). The ultimate result of the authorization query is the value assigned to the allow output. The default is to deny the request with a given reason (useful for debugging). If the request is PUT /api/users/<identityId>/preferences and the <identityId> part matches the access token, the last branch will succeed and the request will be allowed.
As you can see, we have both the identity information for the (validated) access token and all the HTTP request information (including headers and body) available as inputs. Decoupling policy decision-making with OPA assumes that the request contains enough information to make the decision, e.g., an ID of the resource owner is included in the URI for comparison with token contents. If this is not the case, external data can be supplied to OPA, or HTTP requests can be made (although this is not recommended for performance reasons).
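As a sketch of the external-data option, a rule could consult a dataset pushed into OPA; here data.resource_owners and the /api/resources/<id> path shape are hypothetical:

# Hypothetical: data.resource_owners maps a resource ID to its owner’s identity ID
allow = true {
    input.parsed_path = ["api", "resources", resource_id]
    data.resource_owners[resource_id] == api_client.identity.identityId
}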
We also configured OPA to mount additional policies by default (OPA policies are additive), e.g., this one to suppress sensitive values in logs:
package system.log

# Exclude sensitive values from decision logs
mask["/input/attributes/request/http/headers"]
mask["/input/attributes/request/http/body"]
mask["/input/parsed_body"]
Reaching the destination service
If the authorization query evaluates to a deny decision, the request never reaches the target service and Envoy proxy responds with a 403 directly. The response can be customized with a combination of Envoy configuration and OPA response:
default allow = {
    "allowed": false,
    "headers": {"x-ext-auth-allow": "no"},
    "body": "Unauthorized Request",
    "http_status": 301
}
The same mechanism can also be used to pass information from OPA to the destination service in request headers.
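For instance, an allow decision could attach the client ID for the service to consume (the x-api-client-id header name is our invention):

# Sketch: allow the request and forward the client ID to the service
allow = decision {
    input.attributes.request.http.path = "/api/status"
    decision := {
        "allowed": true,
        "headers": {"x-api-client-id": api_client.identity.identityId}
    }
}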
Because all services in our cluster are secured with the same mechanism, the service needs to add the access token to the Authorization header of all requests it makes to other services. This is done directly in the code of each service, which may be cumbersome at times.⁶
Debugging, Testing, Security
Since we protected our API endpoints even within the cluster, does that mean we need to generate an access token for any debugging requests by developers? No, we don’t: requests to ports forwarded with kubectl port-forward will bypass Istio and go directly to the target service.⁷
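For example (deployment name and port are illustrative):

kubectl port-forward deployment/my-service 8080:8080
curl http://localhost:8080/api/status    # reaches the service directly, no token needed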
We may also want to test that the authorization layer and policies are working as expected. Since solutions like this one depend a lot on the exact configuration of the cluster, it is a good idea to test directly in the live environment. You can’t do that by making requests to forwarded ports from your test because, as we saw above, those bypass the authorization layer; you need to be within the Istio service mesh. We’re using the Squid proxy for this purpose in a configuration similar to docker-squid, with dns_defnames turned on so that DNS resolution works.
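The relevant part of the Squid configuration is tiny (an illustrative excerpt):

# squid.conf excerpt
http_port 3128     # the proxy port the tests connect to
dns_defnames on    # resolve short Kubernetes service names via the search domain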
Another security consideration is the traffic encryption within the cluster. If you are not using other means of encryption already, you can configure Istio to use mTLS for all requests within the service mesh (this is the default since Istio 1.5).
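A minimal manifest for mesh-wide strict mTLS (Istio 1.5+) looks like this:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the root namespace makes the policy mesh-wide
spec:
  mtls:
    mode: STRICT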
Lastly, we should mention that some of the features this solution depends on may still be in beta or alpha versions (e.g., EnvoyFilter) and should be carefully monitored across Istio proxy version upgrades.
Conclusion
Our aim was to show how Istio and OPA can be configured to provide an authorization layer for REST API calls that is scalable and allows independent management of authorization policies.
There are many alternatives to the decisions we needed to make along the way. Even if your use case is different, we hope you will find the examples at GitHub useful as building blocks, and we would be glad to hear back from you about it.
¹ The boundaries corresponding to different subdomains of the security domain that we deal with.
² If you are not familiar with Istio, it will help to learn more about its architecture. The gist of it: to add service mesh features to network requests within the cluster, Istio injects Envoy proxy sidecars to your pods; all incoming and outgoing pod traffic then goes through this proxy. The proxy can do things like routing, rate limiting, mTLS, authorization, etc.
³For instance, we learned the hard way that the OPA Adapter shipped with Istio is not production-ready. It has since been deprecated, but this wasn’t known at the time.
⁴WebAssembly plugins may be a better solution in future versions of Istio.
⁵If you are wondering: this is the same mechanism that Istio uses to inject its sidecars. The basic idea is that a MutatingWebhookConfiguration object instructs Kubernetes to invoke an HTTP webhook before objects, such as pods, are created. The webhook can modify the created object, for example, inject containers into a pod.
⁶This is a problem similar to the propagation of request trace IDs, but extra care is necessary here because of its impact on security.
⁷For the curious: requests to a forwarded port appear in the pod as if they came from localhost. Istio internally uses iptables rules to intercept and route requests to its sidecar, but this does not apply to local requests, including port-forwarded ones.