Federated GraphQL has been an invaluable tool for enterprise systems because it offers a scalable, decentralized architecture that accommodates evolving data requirements across distributed teams and their diverse microservices. You take your independent services, merge them into a unified schema with a single endpoint, and now all of your clients can access exactly the data they want, as if it were all coming from a single GraphQL API.
Instead of having a single, monolithic GraphQL server that eventually becomes difficult to scale, you've divided your schema and functionality across multiple services.
But the agility, collaboration, and adaptability afforded by such an architecture would hold little value if you didn't also have crucial metrics that let you optimize your data-fetching strategies, minimize downtime by patching issues fast, allocate your resources efficiently, and, in general, make informed decisions in the context of your system.
Field-level metrics in federated GraphQL provide precisely this kind of detailed insight.
To demonstrate field usage metrics in Federation, I'll be using WunderGraph Cosmo, a fully open-source, fully self-hostable platform for Federation V1/V2 that is a drop-in replacement for Apollo GraphOS.
Field Usage Metrics 101
What's a "field" in GraphQL, anyway?
A field is just an atomic unit of information that can be queried using GraphQL. Suppose we have these two very simple subgraphs, Users and Posts:
Posts subgraph
```graphql
type Post @key(fields: "id") {
  id: ID!
  content: String
  authorId: ID!
}
```
Users subgraph
```graphql
type User @key(fields: "id") {
  id: ID!
  name: String!
}

type Post @key(fields: "id") {
  id: ID!
  authorId: ID! @external
  author: User! @requires(fields: "authorId")
}
```
From these two graphs, we can tell that Users have IDs and names, while Posts have IDs, content, and author IDs, and the shape of each piece of data is represented by its respective field (`name` is a simple, built-in GraphQL type, a `String`, while the author of a Post is a compound value represented by the `User` object type).
The relationship in this example is simple enough: each Post has a User who authored it, resolved through the `authorId` field, which uniquely identifies a User for each Post.
Let's not go too deep into Federation-specific directives here (TL;DR: `@key` designates the field(s) that uniquely identify each entity, `@external` signals that a field is owned by another subgraph and resolved externally, and `@requires` lists the field(s) this subgraph needs in order to resolve the annotated field; here, that's `authorId`).
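In practice, these directives tell the router how to jump between subgraphs: to resolve `author`, it sends the Users subgraph a batch of `Post` representations, each carrying the `@key` field (`id`) plus the `@requires` field (`authorId`), through Federation's `_entities` mechanism. Here's a rough sketch of that internal query; the exact shape the router generates may vary:

```graphql
# Internal query the router sends to the Users subgraph. Each item in
# $representations looks like: { "__typename": "Post", "id": "1", "authorId": "42" }
query ResolvePostAuthors($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Post {
      author {
        name
      }
    }
  }
}
```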
So if you wanted to query for all Posts along with their User authors, you would request these fields in your GraphQL query:
```graphql
query allPosts {
  posts {
    id
    content
    author {
      name
    }
  }
}
```
- `posts` is a root query field on the Posts subgraph, and returns an array of `Post` objects.
- `id` and `content` are fields on the `Post` type.
- `author` is a field on the `Post` type that resolves to a `User`. Within the posts query, we're using the relation via `authorId` to reference a `User` from the Users subgraph.
Field-level usage metrics in GraphQL would track how often these specific fields across different subgraphs are requested in queries on the federated graph. And then, for an object type like `Post`, we could get even more fine-grained and look at the usage of its individual fields, in turn.
What does all this information get us?
- We'd be able to debug issues faster, because thanks to our metrics we'd know exactly which fields were having trouble resolving data.
- Even if there were no immediate fatal errors, specific performance data for each field would still allow us to pinpoint bottlenecks or optimization opportunities, and then we could ship fixes/improvements at different levels: our resolver functions, database queries, or network calls associated with those specific fields.
- Just knowing how many times a specific field has been requested or resolved (taking into account potential caching) within a given timeframe would provide valuable insights into user behavior and needs, help us streamline the schema and reduce infrastructure costs, or just help us make informed decisions about pricing tiers and resource allocation.
- We'd have insight into performance trends (error rates, latency, etc.) of specific fields. We could use this to proactively improve scalability (e.g., one field might require ramping up compute power, another increased database throughput) based on anticipated demand, before problems ever get bad enough to impact user experience.
- Tracking field-level metrics is also crucial for enterprises to ensure compliance with SLAs: making sure the performance of individual fields meets predefined service-level expectations.
TL;DR: less reactive firefighting, more proactive optimization. Let's show off these metrics for a second.
Field Usage Metrics with WunderGraph Cosmo
I'll use WunderGraph Cosmo to federate those two subgraphs. Cosmo is an all-in-one platform for GraphQL Federation that comes with composition checks, routing, analytics, and distributed tracing, all under the Apache 2.0 license and able to run entirely on-prem. It's essentially a drop-in, open-source replacement for Apollo GraphOS, and helpfully offers a one-click migration from it.
[Cosmo on GitHub](https://github.com/wundergraph/cosmo)
The Cosmo platform comprises:
- the Studio, a GUI web interface for managing schemas, users, projects, and metrics/traces,
- the Router, a Go server that implements Federation V1/V2, routing requests and aggregating responses,
- and the Control Plane, a layer that houses the core Cosmo APIs.
The key to managing your federation with the Cosmo stack is its CLI tool: wgc, which you install from the npm registry.
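A minimal sketch, assuming a recent Node.js install (`wgc` is the package name on npm):

```sh
# Install the Cosmo CLI globally from npm
npm install -g wgc@latest

# Confirm the install and list available commands
wgc --help
```

With the CLI installed, your subsequent workflow would look something like this: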
- Create subgraphs from your independently deployed and managed GraphQL services using `wgc subgraph create` (a concrete sketch of all these commands follows this list).
- Publish the created subgraphs to the Cosmo platform (or more accurately, to its Control Plane) with `wgc subgraph publish`. This makes the subgraphs available for composition. Note that the Cosmo "platform" here can be entirely on-prem.
- Once you have all your subgraphs created and published, federate them into a unified graph using `wgc federated-graph create`.
- Configure and deploy the Cosmo Router to make your federated graph available to be queried at the routing URL you specified. The Router, in addition to being a stateless gateway that intelligently routes client requests to subgraphs that can resolve them, also generates the field usage metrics for our federated graph as it's being queried.
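Concretely, the whole flow might look something like the sketch below. The subgraph names, labels, URLs, and schema paths are placeholders, and while the flags follow the Cosmo docs, verify the current syntax with `wgc --help`:

```sh
# 1. Register each subgraph with the Control Plane, pointing at the URL
#    where its GraphQL server is deployed
wgc subgraph create posts --label team=content --routing-url https://posts.example.com/graphql
wgc subgraph create users --label team=identity --routing-url https://users.example.com/graphql

# 2. Publish each subgraph's schema so it can be composed
wgc subgraph publish posts --schema ./posts.graphql
wgc subgraph publish users --schema ./users.graphql

# 3. Compose every subgraph matching the labels into one federated graph,
#    served at the Router's public URL
wgc federated-graph create mygraph \
  --label-matcher team=content,team=identity \
  --routing-url https://router.example.com/graphql

# 4. Run the Router (image name, port, and env vars per the Cosmo docs)
docker run \
  -e GRAPH_API_TOKEN=<your-router-token> \
  -p 3002:3002 \
  ghcr.io/wundergraph/cosmo/router:latest
```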
Then we run a few queries against our federated graph and fire up Studio, the web interface.
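For instance, one such query against the Router, using the placeholder routing URL from the sketch above:

```sh
# POST our allPosts operation to the federated graph's single endpoint
curl -X POST https://router.example.com/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "query allPosts { posts { id content author { name } } }"}'
```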
Studio contains the Schema Explorer, which is the control room for your federated GraphQL ecosystem. Here, you can view and download the schemas of all of your subgraphs and federated graphs, and, more importantly in our case, view usage of every single type in your federated ecosystem, from object types (`User`, `Post`) to the scalars they're made of (`Boolean`, `ID`, and `String`), and even the root operation types (each query, mutation, and subscription).
This is an incredibly fine-grained look at your system. Want to know exactly how many times the `author` relation (via `authorId`) was actually accessed when querying for one or more Posts? Go right ahead.
The field usage metrics for the `author` relation here tell you exactly how many clients and operations requested it, along with a histogram of usage over time. You get to see exactly which operations accessed it, how many times they did so, which subgraphs were involved in resolving requests for this field, and finally, the first and last time the relation was accessed.
What could these metrics tell us, anyway?
The first thing that jumps out from these numbers is that in a real-world scenario, certain posts will always be more popular than others, yet looking up the same author across multiple posts over and over is redundant, and can and will strain the Users subgraph and its backend. A simple solution could be to implement caching in the Users subgraph, caching author (User) data for the most popular posts rather than retrieving it fresh every single time.
Since Cosmo lets you filter field usage by client and operation, you might find that your mobile client predominantly accesses the content and author fields, while your analytics dashboard frequently retrieves likes and shares. Now you can create specialized queries for each client, optimizing for speed and minimizing unnecessary data transfer. Field usage numbers here let you recognize the unique requirements of each client type, and their unique field access patterns.
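For example, each client could be given an operation that selects only the fields it actually uses (`likes` and `shares` are hypothetical fields here, not part of our minimal schema above):

```graphql
# Mobile client: just what the feed renders
query MobileFeed {
  posts {
    content
    author {
      name
    }
  }
}

# Analytics dashboard: engagement counters only (hypothetical fields)
query EngagementStats {
  posts {
    id
    likes
    shares
  }
}
```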
These metrics also show you exactly when a field was accessed over a 7-day retention period (the free-tier default; extensible), and this is useful in more ways than one: historical usage data, of course, can be used to align caching strategies with predicted future demand, meaning proactive infra scaling (up or down) to avoid bottlenecks during peaks.
But the timestamps also provide a historical perspective on the adoption and usage patterns of the features each field represents. If you're not seeing the expected usage rate for a certain field/feature, perhaps you need to reassess its relevance to user needs, its value proposition, or even its pricing/monetization strategy.
Simply put, engineers and stakeholders make better decisions on how to evolve the organization's graphs when they have relevant data to back it up.
In Summary…
Field-level metrics, along with Cosmo's suite of analytics, ultimately help organizations evolve their federated architecture and deliver a better, more valuable product.
In fact, with deeper insights into how fields are accessed over time, orgs can go beyond performance optimization to answer questions like: Which fields are consistently accessed together, suggesting potential customization opportunities? Do usage patterns evolve over time? Can we identify underutilized fields for potential streamlining? These insights inform content strategy, personalization efforts, and even the data model itself.
Of course, the Cosmo platform goes beyond metrics. It includes a blazingly fast V1/V2-compatible Router, visualizes data flow, automates deployments, and integrates seamlessly with your existing infrastructure; you could even migrate over from Apollo GraphOS with one click if you wanted to. And the whole stack is open source and completely self-hostable.