The original design for authentication for the Rubin Science Platform leaked cookies and user tokens to backend services. This undermines isolation between services, which could become relevant if a service is compromised. This document proposes several possible alternative designs, including one that uses separate hostnames for each Rubin Science Platform service, and discusses the complexity and effort trade-offs.
The Rubin Science Platform is, for the purposes of this document, a set of web services used by both web browsers and non-browser clients.
Browser clients authenticate with an encrypted session cookie.
Other clients authenticate with an
Authorization header containing either a bearer token or an HTTP Basic Authentication string.
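As a rough illustration of the two non-browser credential forms, the following sketch splits an Authorization header value into its scheme and credentials. The function name `parse_authorization` is hypothetical, and the sketch does not attempt to show how Gafaelfawr itself interprets the decoded Basic string.

```python
import base64


def parse_authorization(value: str) -> tuple[str, str]:
    """Split an Authorization header value into (scheme, credentials).

    Hypothetical helper for illustration only. For Basic authentication,
    the credentials are returned as the decoded "user:password" string.
    """
    scheme, _, credentials = value.partition(" ")
    scheme = scheme.lower()
    credentials = credentials.strip()
    if scheme == "bearer":
        # Bearer tokens are passed through unchanged.
        return "bearer", credentials
    if scheme == "basic":
        # Basic credentials are base64-encoded "user:password".
        decoded = base64.b64decode(credentials).decode("utf-8")
        return "basic", decoded
    raise ValueError(f"unsupported authorization scheme: {scheme!r}")
```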
Authentication in a browser is done via either OpenID Connect or OAuth 2 to an external authentication provider.
Successful authentication then sets a session cookie in the browser, which is used to authenticate subsequent requests until that cookie expires.
Rather than asking each application to verify that authentication cookie, the authentication verification is provided by a central service.
That service, Gafaelfawr, can be invoked in one of two ways: using OpenID Connect if the protected application supports it natively, or by using an NGINX
auth_request handler and configuration on the ingress of the application.
The auth_request handler approach is preferred for most Science Platform services.
When the auth_request handler is used, the NGINX ingress for the Rubin Science Platform instance makes a subrequest to Gafaelfawr that includes the headers of the original browser request to the service URL.
Gafaelfawr then locates the cookie, decrypts it, verifies the authentication credentials in that cookie, and (if successful) returns the results of that authentication verification in reply headers.
NGINX then can be configured to include those reply headers as request headers in the proxied request to the protected application, which can then extract authentication information from those trusted headers.
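The subrequest flow described above can be sketched roughly as follows. This is a minimal model, not Gafaelfawr's implementation: the cookie name, the reply header names, and the `SESSIONS` table (a stand-in for decrypting and verifying the cookie) are all assumptions made for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name; the real name is not stated in this document.
SESSION_COOKIE = "session"

# Stand-in for decryption and verification: maps an opaque cookie value to
# the identity it authenticates. A real service would decrypt the cookie
# with a key only it knows and validate the token inside.
SESSIONS = {
    "opaque-cookie-value": {"user": "someuser", "email": "someuser@example.com"},
}


def handle_auth_subrequest(headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status, reply headers) for an auth_request subrequest.

    `headers` are the headers of the original browser request, as
    forwarded by the ingress.
    """
    cookies = SimpleCookie()
    cookies.load(headers.get("Cookie", ""))
    morsel = cookies.get(SESSION_COOKIE)
    if morsel is None:
        return 401, {}  # no session cookie in the request
    session = SESSIONS.get(morsel.value)
    if session is None:
        return 401, {}  # cookie present but not valid
    # Hypothetical reply header names that the ingress would copy into the
    # proxied request to the protected application.
    return 200, {
        "X-Auth-Request-User": session["user"],
        "X-Auth-Request-Email": session["email"],
    }
```

Note that nothing in this flow changes the original request's Cookie header, so the proxied request still carries it.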
Currently, a given deployment of the Rubin Science Platform uses a single hostname for all components.
Different services are mounted on different routes under that hostname.
For example, for a Rubin Science Platform deployment at
https://data.lsst.cloud, the Notebook Aspect is at
https://data.lsst.cloud/nb, the Portal Aspect is at
https://data.lsst.cloud/portal, and so forth.
(DMTN-076, not yet published, will propose a longer-term URL scheme for the Science Platform components.)
In general, all services running on the Rubin Science Platform are trusted. In some cases, such as the Notebook Aspect, the running notebook is always given an authentication token with most of the same permissions as the user’s session cookie. However, ideally, services should be as isolated from each other as is feasible, and should only be able to make the calls to other services that are explicitly permitted by authorization policies, following a principle of least privilege.
3 Problem statement
The auth_request handler approach supplements the request headers but does not remove headers.
Specifically, it does not remove any cookies the browser sends (nor can it drop all cookies, since protected applications may use their own cookies).
Therefore, the authentication cookie used by Gafaelfawr to verify the user’s authentication is also sent to the protected service in the HTTP headers.
The authentication cookie itself is encrypted with a key known only to Gafaelfawr, so no other component can extract the underlying authentication token and use it in a different context.
However, the entire encrypted cookie acts as a bearer token and can itself be used to authenticate requests.
That cookie is scoped to the hostname of the Rubin Science Platform deployment.
Therefore, any service with a registered HTTP ingress in the Rubin Science Platform, whether or not it is protected by an
auth_request handler and including services that instead use OpenID Connect, receives a copy of the authentication cookie used by Gafaelfawr.
If that service is compromised, the attacker can obtain that cookie from the incoming request and use it to make browser requests to other services in the same Science Platform deployment with the credentials of the user.
This includes requests to the Gafaelfawr authentication service itself to, for instance, create new, persistent authentication tokens for that user that would be under the control of the attacker.
JupyterHub recommends enabling subdomains so that each user’s notebook is hosted in a unique subdomain for exactly this reason.
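The scoping argument can be illustrated with a deliberately simplified model of how a browser decides whether to attach a cookie. The sketch assumes a host-only cookie with a path of `/`, and ignores attributes such as Secure and SameSite. The hostname `nb.data.lsst.cloud` is a hypothetical per-service hostname used only for contrast.

```python
from urllib.parse import urlsplit


def cookie_is_sent(cookie_host: str, url: str) -> bool:
    """Simplified model: a host-only cookie with path "/" is attached to
    every request whose hostname exactly matches the cookie's host,
    regardless of the route."""
    return urlsplit(url).hostname == cookie_host


# With one hostname per deployment, every route receives the cookie.
for url in (
    "https://data.lsst.cloud/nb",
    "https://data.lsst.cloud/portal",
    "https://nb.data.lsst.cloud/",  # hypothetical separate hostname
):
    print(url, cookie_is_sent("data.lsst.cloud", url))
```

Under this model, only the hypothetical separate hostname falls outside the cookie's scope, which is the isolation that per-service or per-user hostnames provide.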
Note that even if an attacker gains that access, they can only misuse credentials that are sent to the service while they have compromised that service. They cannot make requests as arbitrary users who have not accessed the compromised service. However, a patient attacker could wait until a user with administrative permissions accesses the compromised service and then leverage their credentials to impersonate arbitrary users.
The same problem exists for non-browser authentication using the Authorization header.
That header is also sent to the protected service after it is interpreted by the auth_request handler.
4 Alternative designs
The following alternative designs would avoid exposing authentication credentials to protected services that could be used to access other protected services.
The best solution from a security standpoint would be to use per-host cookies plus
auth_request header stripping to remove the Gafaelfawr cookie and suppress the Authorization header.
This achieves defense in depth by not leaking authentication credentials to services that do not need them while also limiting the scope of those credentials.
Using both mechanisms would relieve some pressure on creating separate origins for every service and would make it safer to group some services together on the same origin for the sake of simplicity, as long as at least the authentication system and the notebooks were moved to different origins.
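A minimal sketch of the header-stripping half of this design follows, assuming the authentication cookie is named `session` (a hypothetical name) and that the Authorization header is dropped rather than replaced. It keeps any other cookies, since protected applications may set their own.

```python
def strip_credentials(
    headers: dict[str, str], cookie_name: str = "session"
) -> dict[str, str]:
    """Return a copy of `headers` without the authentication cookie and
    without the Authorization header.

    `cookie_name` is a hypothetical name for the authentication cookie.
    """
    cleaned: dict[str, str] = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower == "authorization":
            continue  # never forward the client's credentials
        if lower == "cookie":
            # Keep every cookie except the authentication cookie.
            kept = []
            for part in value.split(";"):
                part = part.strip()
                if part and part.split("=", 1)[0] != cookie_name:
                    kept.append(part)
            if kept:
                cleaned[name] = "; ".join(kept)
            continue
        cleaned[name] = value
    return cleaned
```

For example, a request carrying `Cookie: session=abc; theme=dark` and an Authorization header would be forwarded with only `Cookie: theme=dark`. If every cookie is removed, the Cookie header is omitted entirely.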
This approach would require reasonably substantial development effort in the authentication system to add the more complex login flow for each origin. This work should be coupled with enabling per-user notebook URLs for JupyterHub.
It’s not clear how important fixing this issue is relative to other security work that we could be doing. The boundaries between services inside the Rubin Science Platform are not that strong, by design. For example, a spawned server in the Notebook Aspect, by design, should be able to make any API call to any other service on behalf of the user except for the authentication service itself. The benefits of isolating the services from each other are only significant if effort is also invested into defining scopes for tokens, setting authorization rules on services, and restricting the scopes of internal tokens issued to services. Very little of that work has yet been done. Protecting the external attack surface and basic authentication flow of the Rubin Science Platform is currently a higher priority.
Implementing per-host cookies would let us choose the granularity of security domain that we want. For example, we could group all the core Rubin-written services other than the Notebook Aspect and the Portal Aspect on one hostname and put ancillary services on a different hostname, thus gaining protection against an attacker moving between those two security domains (but not within them).
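The granularity trade-off can be made concrete with a small model in which each hostname is one security domain. All service names and hostnames below are hypothetical. The function lists which other services an attacker could reach with a cookie leaked from a compromised service, assuming cookies are scoped per host.

```python
# Hypothetical assignment of services to hostnames. Services that share a
# hostname share a cookie scope and therefore a security domain.
SERVICE_HOSTS = {
    "notebook": "nb.example.org",
    "portal": "portal.example.org",
    "core-a": "core.example.org",
    "core-b": "core.example.org",
    "ancillary": "ancillary.example.org",
}


def reachable_with_leaked_cookie(compromised: str) -> list[str]:
    """List the other services that accept the cookie a compromised
    service would see, given per-host cookie scoping."""
    host = SERVICE_HOSTS[compromised]
    return sorted(
        name
        for name, other_host in SERVICE_HOSTS.items()
        if other_host == host and name != compromised
    )
```

In this model, compromising `core-a` exposes `core-b` but not the notebook, portal, or ancillary services, matching the "between domains but not within them" protection described above.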
Do nothing for the launch of the Intermediate Data Facility. Live with this problem for now.
Add support for stripping cookies from the Cookie header and stripping or replacing the Authorization header to Gafaelfawr. This is relatively simple and already adds a lot of security benefit, although it doesn’t protect against leakage on unauthenticated routes. (This has now been done.)
Prioritize the user registration and external authentication flow and basic Kubernetes security until the risks in those areas are well-understood and reasonably mitigated.
Implement support for the more complex login flow required for per-host service deployment once the user registration and external authentication flow work is complete.
Plan on using more granular hostnames when deploying the Rubin Science Platform on the US Data Access Center. At the least, separate core Rubin Science Platform services from ancillary services that may be less secure or easier to attack.