Native Subscriptions in Federated GraphQL with Cosmo Router

How an event-based approach makes GraphQL Subscriptions better in Federation.

Federated GraphQL is invaluable for enterprises because it creates a single, logical API layer — a federated graph — that connects disparate data sources, serving as a unified view of the organization’s data landscape.

Thanks to the shared, standardized GraphQL schema, services stay interoperable while remaining independent and using the tech their teams are familiar with, and new functionality or services can be integrated into the unified graph without breaking existing systems. TL;DR: a robust, adaptable enterprise architecture that can evolve to meet needs.

What if you could go one step further and bring real-time data to the table, alongside static queries? That’s exactly what GraphQL subscriptions let you do, but they’re non-trivial to implement in such a microservices-orientated, federated architecture, especially in an enterprise environment.

With a Federation V1/V2 compatible router that natively supports subscriptions, like the WunderGraph Cosmo Router, this becomes much easier. More importantly, with Cosmo you get to do it using OSI-compatible open-source software that lets you self-host and retain full autonomy over your data.

We’ll take a look at what the Cosmo Router brings to the table re: subscriptions in federated GraphQL; but first, a primer on GraphQL subscriptions.

GraphQL Subscriptions 101

Queries and Mutations in GraphQL are simple enough to understand — they operate on your garden variety request-response cycle via HTTP (TCP) connections.

💡 Note that the choice of a transfer protocol isn’t mandated by the GraphQL spec; HTTP is chosen because it’s just the Thing That Works Everywhere™.

Subscriptions, however, are an entirely different beast — they create a persistent connection (leveraging WebSockets, usually) between the client and server, enabling real-time data updates. Here’s how it works:

  1. The client sends a GraphQL subscription operation to the server, specifying the events it wants to be notified about.
  2. The server receives it and registers the client’s interest in the specified events (creating an asyncIterator linked to the specific event/data source). Think of this as a generator function that yields new values whenever they become available.
  3. When a relevant event occurs on the server (e.g., a new post is added to the database), the associated logic publishes this event to the router (or to a shared event bus, if one is being used).
  4. Remember that asyncIterator? Upon detecting an event (or a pub/sub notification), it triggers an update, pushing the updated data to all subscribed clients through their open WebSocket connections.
  5. The client receives the pushed JSON formatted data and updates its UI or state accordingly.
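To make the five steps above concrete, here’s a minimal sketch in Python, assuming a tiny in-memory pub/sub. `PubSub`, `subscribe`, `publish`, and the `postAdded` topic are hypothetical names made up for illustration, not part of any GraphQL library; the async generator simply stands in for the asyncIterator.

```python
import asyncio
from collections import defaultdict


class PubSub:
    """Hypothetical in-memory event bus: one asyncio.Queue per subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> set of queues

    def subscribe(self, topic):
        # Step 2: register interest right away, then hand back an async
        # iterator that yields new values whenever they become available.
        queue = asyncio.Queue()
        self._subscribers[topic].add(queue)
        return self._iterate(topic, queue)

    async def _iterate(self, topic, queue):
        try:
            while True:
                yield await queue.get()
        finally:
            # Runs when the client unsubscribes (the iterator is closed).
            self._subscribers[topic].discard(queue)

    def publish(self, topic, payload):
        # Step 3: an event occurred; hand it to every registered subscriber.
        for queue in self._subscribers[topic]:
            queue.put_nowait(payload)


async def demo():
    bus = PubSub()
    stream = bus.subscribe("postAdded")  # steps 1-2
    bus.publish("postAdded", {"id": 1, "title": "Hello"})  # step 3
    bus.publish("postAdded", {"id": 2, "title": "World"})
    # Steps 4-5: the iterator pushes each event to the waiting client.
    received = [await stream.__anext__(), await stream.__anext__()]
    await stream.aclose()
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real server, each pushed value would be serialized to JSON and written to the client’s open connection instead of being collected in a list.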

This works out great for everyone: clients stay informed about data changes as they happen, eliminating the need for manual polling and enhancing responsiveness, and servers only send updates that are relevant to subscribed clients, reducing network overhead and improving performance.

However, subscriptions take on a whole new level of complexity in a federated graph.

Subscriptions in Federated GraphQL

Unlike in a monolith, a single subscription here can involve data from multiple subgraphs. What many organizations do is offload subscriptions to a completely separate, standalone, monolithic GraphQL server that handles all subscription logic, bypassing the gateway for subscription operations (and only those).

At first glance, this makes sense. You decouple subscriptions from the gateway, implementation and management are much simpler, and there’s potential for better scalability and performance.

But this is a headache for a bunch of reasons:

  1. You’re maintaining two different GraphQL schemas. Ensuring consistency between the monolith’s schema and the federated schema is going to be critical.
  2. You’d have to configure your federation router/gateway to forward subscription requests to the monolith server via custom resolvers.
  3. Your subgraph teams now have zero ownership over the subscriptions that are implemented by the subscription-specific GraphQL monolith you’ve introduced into your system.
  4. What about auth, or error handling? You’d have to handle these aspects separately for both the federation router and the monolith server.

It’s much better to use a Federation router/gateway that supports subscriptions natively.

Let’s talk about the WunderGraph Cosmo Router. This is a fully open-source (Apache 2.0 license) Federation V1/V2 compatible router that is incredibly fast, and can be fully self-hosted on-prem (for compliance reasons, for example), with a managed cloud option. The Cosmo platform as a whole is a drop-in, fully OSS replacement for Apollo GraphOS that offers subscription support without paywalls. You can read more about it here.

Out-of-the-box support for event-driven subscriptions

The biggest win for subscriptions with the Cosmo Router is that traditional WebSocket connections are only one option; Cosmo also natively supports event-driven subscriptions, which are far more efficient and less problematic.

WebSockets are used because they’re widely supported, but they come with some very significant caveats.

1. WebSockets are always bidirectional.

The obvious one, of course, is that WebSockets are bidirectional while GraphQL subscriptions only ever push data one way, from server to client. You pay the overhead of a full-duplex connection for no reason other than that WebSockets are inherently full duplex, and the extra channel might also open you up to malicious actors.

2. WebSockets don’t play nice with corporate firewalls.

WebSocket connections have a specific handshake protocol. They first send an HTTP request with an Upgrade header to transform (“upgrade”) the regular HTTP/HTTPS connection to a WebSocket one.

This works perfectly well for most use cases, but in enterprise environments, corporate firewalls and proxies are configured for traditional HTTP/HTTPS traffic — and often misinterpret the Upgrade header in the handshake as invalid and/or malicious and drop the connection entirely.

Also, many firewalls employ deep packet inspection to sniff network traffic for security vulnerabilities, and WebSockets trip the alarm far too often to be reliable in such an environment.
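For reference, here’s a rough sketch of what that handshake looks like on the wire, based on RFC 6455. The function names, host, and path are made up for illustration; the GUID constant and header names come from the RFC. A proxy that strips or rejects the `Upgrade` and `Connection` headers below is enough to break the connection.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def upgrade_request(host: str, path: str, client_key: str) -> str:
    """Build the HTTP/1.1 request a client sends to ask for the upgrade."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {client_key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )


def accept_key(client_key: str) -> str:
    """Value the server must echo back in Sec-WebSocket-Accept."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    key = "dGhlIHNhbXBsZSBub25jZQ=="  # sample key used in RFC 6455
    print(upgrade_request("router.example.com", "/graphql", key))
    print(accept_key(key))
```

Only after the server answers with `101 Switching Protocols` and the matching accept value does the connection stop being plain HTTP, which is exactly the moment intermediaries tuned for request-response traffic tend to object.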

3. WebSockets make your subscriptions stateful.

The federation router initiates a persistent WebSocket connection with each relevant subgraph when a client subscribes to real-time updates, and each subgraph must:

  1. Maintain the state of its WebSocket connections with the router (connection status, active subscriptions, associated client information, etc.)
  2. Track which clients are subscribed to which events, and handle subscription lifecycle events (e.g., new subscriptions, cancellations).
  3. Maintain a message buffer for graceful handling of error/edge cases (the router is temporarily unavailable, or there are network issues and messages need to be reordered, etc.)

If a subgraph disconnects or crashes, you’ll need some mechanism to restore all of this data upon restart. Not great for ephemeral/serverless environments!

4. WebSockets necessitate 1+N Connections per client.

When a client subscribes to a server event, 1+N connections are established for that client: one persistent WebSocket connection from the client to the federation router to receive updates, plus N separate connections that the router maintains, one with each relevant subgraph, to fetch real-time data as it becomes available. This holds even if all clients are subscribing to the same event.

Managing multiple open connections per client can strain server resources, especially as the number of clients and subgraphs grows. Also, that extra hop through the router (between client and subgraphs) can introduce latency, especially if those two servers are not colocated geographically.
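A quick back-of-the-envelope calculation shows how fast this adds up. The sketch below assumes the worst case described above (no connection sharing at all) and compares it with a setup where the router holds a single connection to a shared event bus; the function names and numbers are illustrative only.

```python
def websocket_connections(clients: int, subgraphs: int) -> int:
    # Per client: 1 client-to-router connection plus N router-to-subgraph ones.
    return clients * (1 + subgraphs)


def event_bus_connections(clients: int) -> int:
    # Per client: 1 client-to-router connection, plus a single
    # router-to-event-bus connection shared by everyone.
    return clients + 1


if __name__ == "__main__":
    clients, subgraphs = 10_000, 3
    print(websocket_connections(clients, subgraphs))  # 40000
    print(event_bus_connections(clients))             # 10001
```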

Event based subscriptions with Cosmo Router

Cosmo provides a better solution. Firstly, in addition to the traditional WebSocket approach, it supports event-driven subscriptions via SSE (Server-Sent Events). Unlike WebSockets, SSE uses standard, stateless HTTP connections with no protocol upgrade, and it is purely a unidirectional server-to-client push, which is all GraphQL subscriptions would ever need. This alone neatly sidesteps the first three drawbacks.
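To show how little there is to SSE, here’s a small sketch of the `text/event-stream` framing: each event is a few `field: value` lines terminated by a blank line, sent over an ordinary HTTP response. The `sse_event` helper is hypothetical, and the `next` event name follows the convention used by the graphql-sse protocol; treat both as assumptions rather than a description of what Cosmo emits byte for byte.

```python
import json


def sse_event(event: str, data) -> str:
    """Serialize one server-sent event in text/event-stream framing."""
    # Compact JSON never contains raw newlines, so one data line is enough.
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


if __name__ == "__main__":
    # The response itself is plain HTTP with Content-Type: text/event-stream.
    frame = sse_event("next", {"data": {"postAdded": {"id": 1}}})
    print(frame, end="")
```

There is no handshake beyond the initial HTTP request and no client-to-server channel to secure, which is why this traffic tends to pass through proxies that choke on WebSocket upgrades.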

Secondly, instead of N connections between the router and the subgraphs, there’s only one — subscribing to a central event bus. Your subgraphs publish events to a pub/sub system instead of directly to the router, and the router taps into it for pub/sub notifications, receiving events and forwarding them to clients via SSE.

Not only do you minimize the number of connections, but now the server can push data to clients without maintaining any state information about the connection or clients, and subgraphs don’t have to communicate directly with the router for event updates. This is far cheaper and scales much better, because:

  1. You only need one stateless connection with the router per client, regardless of the number of underlying subgraphs subscribed to. So, a simpler architecture, a lighter server footprint, with fewer connections to manage.
  2. No extra hop through the router to underlying subgraphs, so less latency.
  3. In addition to in-memory pub/sub, you’re free to plug in an external pub/sub like Kafka or NATS.
  4. Serverless deployments are now possible.
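The shape of this architecture can be sketched in a few lines. The example below is a simplified, synchronous, in-memory model: `EventBus` stands in for something like NATS or Kafka, `Router` for the federation router, and plain lists for the clients’ SSE streams. All of these names are hypothetical and say nothing about how Cosmo is implemented internally.

```python
from collections import defaultdict


class EventBus:
    """Stand-in for an external pub/sub system."""

    def __init__(self):
        self.handlers = defaultdict(list)  # topic -> subscribed callbacks

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.handlers[topic]:
            handler(event)


class Router:
    """Subscribes to the bus once per topic and fans events out to clients."""

    def __init__(self, bus):
        self.bus = bus
        self.clients = defaultdict(list)  # topic -> client outboxes

    def add_client(self, topic, outbox):
        if topic not in self.clients:
            # First client for this topic: open the single bus subscription.
            self.bus.subscribe(topic, lambda event, t=topic: self._fan_out(t, event))
        self.clients[topic].append(outbox)

    def _fan_out(self, topic, event):
        for outbox in self.clients[topic]:
            outbox.append(event)


bus = EventBus()
router = Router(bus)
alice, bob = [], []  # each list plays the role of one client's SSE stream
router.add_client("postAdded", alice)
router.add_client("postAdded", bob)

# A subgraph publishes to the bus; it never talks to the router directly.
bus.publish("postAdded", {"id": 1})
```

Both clients receive the event, yet the bus only ever sees one subscription for the topic, no matter how many clients are listening.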

As an added advantage, SSE works over HTTP/2, meaning the browser and server can natively multiplex multiple concurrent streams within a single TCP connection, bringing another uplift to performance and latency.

But even if your use-case can’t use HTTP/2, the Cosmo router can still multiplex long-lived connections (that have the same authentication information) to the subgraphs over a single connection, when possible. Cosmo can do this for WebSocket connections, too, but for those, only subscriptions with the same forwarded header names and values will be grouped into a single connection.
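One way to picture the grouping rule for WebSockets is as a lookup key: subscriptions can share an upstream connection only if they target the same subgraph and carry identical forwarded header names and values. The sketch below is an illustration of that idea under my own assumptions, with hypothetical names throughout; it is not Cosmo’s actual implementation.

```python
from collections import defaultdict


def connection_key(subgraph_url, forwarded_headers):
    # Header names are case-insensitive in HTTP, so normalize them;
    # sorting makes the key independent of header order.
    normalized = tuple(sorted((k.lower(), v) for k, v in forwarded_headers.items()))
    return (subgraph_url, normalized)


def group_subscriptions(subscriptions):
    """Map each shareable upstream connection to the subscription ids on it."""
    groups = defaultdict(list)
    for sub in subscriptions:
        key = connection_key(sub["subgraph"], sub["headers"])
        groups[key].append(sub["id"])
    return dict(groups)


subscriptions = [
    {"id": "s1", "subgraph": "ws://posts", "headers": {"Authorization": "Bearer A"}},
    {"id": "s2", "subgraph": "ws://posts", "headers": {"authorization": "Bearer A"}},
    {"id": "s3", "subgraph": "ws://posts", "headers": {"Authorization": "Bearer B"}},
]
groups = group_subscriptions(subscriptions)
```

Here `s1` and `s2` end up on one connection while `s3`, with a different token, gets its own.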

💡 Remember that the Cosmo Router is built to be stateless: it doesn’t store any session-specific data between requests, so if one hosted instance of the Router fails, another can seamlessly take over because there’s no session state to preserve. The flip side is that whenever your Cosmo Router instance updates its config at runtime, it terminates all active subscriptions. Make sure your clients account for this (if using HTTP/2, this is easier as reconnects should be built in).
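If your client doesn’t get reconnects for free, a simple retry loop with exponential backoff is usually enough. The sketch below assumes a hypothetical `connect()` callable that returns an iterable of events and raises `ConnectionError` when the stream is cut; the parameter names and defaults are illustrative.

```python
import time


def consume_with_reconnect(connect, handle, max_retries=5, base=0.5, cap=30.0,
                           sleep=time.sleep):
    """Consume a subscription stream, resubscribing if the connection drops."""
    retries = 0
    while True:
        try:
            for event in connect():
                handle(event)
                retries = 0  # a delivered event means the connection is healthy
            return  # the stream completed normally
        except ConnectionError:
            if retries >= max_retries:
                raise  # give up and let the caller decide what to do
            # Exponential backoff: 0.5s, 1s, 2s, ... capped at `cap` seconds.
            sleep(min(cap, base * 2 ** retries))
            retries += 1
```

Note that events published while the client is disconnected are simply missed in this sketch; if that matters for your use case, refetch the current state with a regular query after resubscribing.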

Finally, the Cosmo Router makes it easy to forward specific client headers in subscriptions to the subgraphs, using the router’s proxy capabilities. You might need this to pass contextual information like caching strategies, auth tokens, user preferences, or just device-specific information, so your subgraphs can make decisions based on client context. You can read up on that here.

Does this mean SSE is strictly better than WebSockets for Subscriptions?

Not necessarily. Not all enterprise use-cases, even if they use federated GraphQL, serve big websites that get a lot of requests per second, have to support tens of thousands of concurrent users for real-time updates, or are big enough to be targets of malicious agents. For these use cases, a standard, run-of-the-mill, stateful WebSocket based approach is perfectly fine — preferable, even, because SSE can be complex to implement.

But that’s why the Cosmo Router supports three protocols, so you can stay flexible for your real-time data needs.

Using the Cosmo Router together with the larger Cosmo platform, you can create, publish, and manage your subgraphs — and individually configure the choice of protocol between both the client and the router, as well as those between the router and the individual subgraphs.

In summary…

Federation makes organizations better by providing a comprehensive view of the organization’s data, so getting accurate insights, identifying trends, or making informed decisions is much easier.

With GraphQL subscriptions in the mix, instead of static queries, data is pushed to clients as changes occur, keeping everyone in sync with data across the enterprise’s data landscape. Any relevant data change in one service can trigger updates in other services, regardless of their location or technology stack. If they need it, teams can easily subscribe to events from other services, enabling cross-functional collaboration, and react to events as they happen to make data-driven decisions faster.

The Cosmo Router and the WunderGraph Cosmo platform not only make real-time use cases easier to implement, with more efficiency and compatibility for enterprise environments, but do so with a stack that is truly open-source, and completely self-hostable, ensuring full data autonomy for organizations.
