The Question GraphQL Hands You
REST answers a question for you by structure. One URL, one
resource, one verb. The interface forces the architecture: if
you want a filtered list of products by collection, you build
GET /collections/:id/products. The URL itself is a
placement decision. The router routes. The handler handles.
Nothing crosses.
GraphQL does not force that. GraphQL gives you a graph and asks you to pick the shape. Every field is a function. Every type is a record. Clients compose their own queries. The protocol does not care where the rule lives. The protocol cares that the rule produces the right answer.
This is the trap. The first time you build a GraphQL API, you write a few resolvers, hook them to a database, and ship. It works. The second time, a buyer asks for a filter, and you add a tiny SQL call inside the resolver. It still works. The third time, a nested field starts hydrating related entities. The fourth time, a repository starts returning unpublished rows and the resolver "remembers" to filter them out. None of these decisions look dangerous. Each one is the smallest possible change that delivers the feature.
Two years later, you have a 4000-line resolver file. A new engineer cannot tell where a published-only rule is enforced. A buyer query that should issue three SQL statements issues fifty. The integration tests pass and the production logs do not. The schema, the original contract, has deformed into a record of accidents.
GraphQL did not do this. The absence of an opinion did.
This is an essay about supplying that opinion. There is nothing novel in it. The pattern is older than GraphQL. It is sometimes called "deep modules," sometimes "thin resolvers," sometimes "hexagonal architecture" with a tilt. What matters is not the name. What matters is that the pattern answers one question consistently at every layer of the API: who owns this rule?
The examples below use a generic store. Products, collections, reviews, customers, orders, line items. If you have ever queried a Shopify-shaped data set, you already know the domain. The same patterns apply to any read-heavy API over a relational store: a CMS, a catalog, a directory, a ticketing system, a B2B feed. The data shape changes. The architecture does not.
Deep Modules, Thin Interfaces
A deep module has a small surface and a large interior. The outside looks simple. The inside does real work. A shallow module exposes its complexity to every caller. It looks like a wrapper, because it is one.
In a GraphQL API, the public surface is the SDL. A client types:
query {
products(page: 1, filter: { collectionId: "summer-25" }) {
items {
id
title
price
reviews { rating, body, author { name } }
}
total
hasNext
}
}
That is the whole interface. Forty tokens, a stable shape, a precise return. Behind it, the module is doing argument validation, pagination math, SQL filter construction, count parity, foreign-key normalization, batched related-entity lookups, null coalescing, error mapping. None of that complexity is visible to the buyer. None of it should be.
The temptation, especially in GraphQL, is to give every layer a slightly broader job. A "convenience helper" emerges in a service. A resolver "just adds" a small condition. A repository returns the full row in case someone needs the rest of it. Each of these is a tiny act of generosity. Each one moves a small piece of complexity outward, into the surface, where the next engineer will trip over it.
The discipline is unromantic: every layer answers one kind of question, and only that kind. When you change the rule, you change exactly one layer. The other layers are unchanged because the question they answer has not changed.
Where Each Rule Lives
This is the matrix that decides whether the codebase will be readable in two years.
- src/schemas/*.graphql: the contract. Which fields exist, what shapes they return.
- src/validations/*.ts: runtime argument rules. Which inputs are legal.
- src/services/*.ts: use-case orchestration. Validate, clamp, call one repository.
- src/repositories/*.ts: SQL and visibility. Which rows may leave the database.
- resolver → ctx.loaders: relation traversal. How a field reaches related entities.
- src/loaders/registry.ts: batching and per-request caching. How child lookups coalesce.
The matrix is not arbitrary. It follows a single principle:
rules belong where they are easiest to find when you
forget. If a buyer can ask for product.reviews, where do
you look when reviews stop appearing? The SDL tells you the
field exists. The resolver tells you it uses a loader. The
loader tells you which repository call backs it. The repository
tells you the SQL that runs. You can debug this without grep.
Notice what is missing. There is no "controller." There is no "manager." There is no "facade." Each of those names is an invitation to put behavior in a place where the next engineer will not think to look. A name that does not answer a question does not belong in the architecture.
Acquire the Root, Reuse the Parent
Most GraphQL writing focuses on a single request path: HTTP enters, a resolver runs, a response leaves. That is one third of the picture. A query that traverses any depth has three flows, not one, and treating them the same is the source of the worst performance problems in GraphQL.
Take the query from "Deep Modules, Thin Interfaces": a page of products, each with its reviews, each review with its author. The naive mental model is "one resolver per field, just call the database from each." That mental model works for a flat query of one product. It collapses for a list.
The flows are these:
- Acquire the root. The query resolver calls a service. The service validates args, clamps the page, calls one repository. The repository returns parent rows and a total count. One SQL statement, regardless of how many children the client asks for.
- Expand the children. For each parent row, the field resolvers fire. Each one needs related data. This is where naive GraphQL dies: reviews: (parent) => fetchReviewsFor(parent.id) issues N queries for a page of N products. Instead, the field resolver calls a per-request DataLoader. DataLoader collects all the IDs in the current tick of the event loop, fires one set query, and returns rows grouped by parent ID.
- Reuse the parent IDs. Each child becomes a parent for the next level. Review authors are loaded by author ID. The author loader batches across all reviews from all products. One query for all authors, regardless of how many reviews referenced them.
Acquire the root. Reuse the parent.
Root acquisition asks "Which products are on this page?" One question. One call.
Child expansion asks "For these 25 IDs, fetch reviews." Many fields. One batch each.
Two phases. Two ownership boundaries. The resolver is the wire between them.
The architectural mantra is two sentences. Acquire the root, reuse the parent. Services own root acquisition. Loaders own child expansion. Resolvers are the wiring between them, and resolvers do almost nothing.
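To make the wiring concrete, here is a minimal sketch of the resolver map, assuming the names used elsewhere in this essay (listProducts, ctx.loaders, reviewsByProductId, authorById) and illustrative Product and Review types. Every resolver is one line, because every resolver delegates to the layer that owns the question.

const resolvers = {
  Query: {
    // acquire the root: one service call, one SQL statement
    products: (_root: unknown, args: unknown, ctx: Ctx) => listProducts(ctx, args),
  },
  Product: {
    // expand the children: one batched loader call per relationship edge
    reviews: (product: Product, _args: unknown, ctx: Ctx) =>
      ctx.loaders.reviewsByProductId.load(product.id),
  },
  Review: {
    // reuse the parent ID: authors batch across all reviews on the page
    author: (review: Review, _args: unknown, ctx: Ctx) =>
      ctx.loaders.authorById.load(review.authorId),
  },
};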
Why does this split matter? Because the two phases have different cost models. Acquisition is one SQL statement no matter what. Expansion is one SQL statement per relationship type, no matter how many parents. A page of 100 products with reviews, collections, and authors costs four queries with this design and four hundred without it. The difference is not a micro-optimization. It is the difference between a system that survives a popular query and one that hangs.
There is a corollary the architecture quietly enforces: services never know about loaders, and loaders never call services. The service is allowed to call one repository directly because it owns one use case. A field resolver is allowed to call a loader because it owns one relationship edge. These two privileges do not compose. If a service starts using a loader, you have hidden a batched call behind a use case, and the next engineer will not know it batches. If a loader starts calling a service, you have invited recursive expansion, and the next engineer will not know the call stack depth.
Each layer knows about exactly one neighbor. No more.
The Schema Is Not Your Database
This is the most common failure mode in API design, and it gets worse every year as tools that auto-generate types from database schemas get better. The story usually goes: introspect the database, generate types, expose them as GraphQL. It is fast. It is also wrong.
The contract you ship is shaped like your storage. Buyers see
your join tables. They see soft-delete columns. They see audit
fields. They see the difference between price and
price_cents because the codegen surfaced both. They
see your migration history through column names that hint at
past refactors. When you change your schema, you break their
queries. When you cannot change your schema because their
queries depend on its shape, your storage is now their problem.
A well-designed GraphQL contract is a translation layer. Inside,
you might have a CMS-managed Postgres with implementation noise:
relationship tables suffixed with _rels, status
columns named _status, foreign keys that allow null
because Postgres requires it, numeric strings because the ORM
returns them. Outside, buyers see Product, Collection, Review,
Author. The translation happens in two places: the repository
(row mapping) and the SDL (field shape).
What codegen exposes (the storage shape):

type ProductsCollectionsRels {
  collection_id: ID!
  product_id: ID!
  _status: String!
}

What buyers should see (the contract shape):

type Product {
  collections: [Collection!]!
}

Storage:

price_cents: String
currency_code: String
discount_applied_flag: Int

Contract:

price: Money! # { amount: Float!, currency: String! }

Storage:

products: [Product]
products_count: String
products_page: Int
products_per_page: Int

Contract:

products: ProductPage! # consistent envelope, used everywhere
Two rules make this stick. The first is that
nullability is honest. If a foreign key can be
null in the database and that nullability has product meaning (a
product with no manufacturer is genuinely a product without a
manufacturer), the field is nullable in the SDL. If it cannot be
null (a product always has a price), the field is non-null.
Buyers learn the meaning of null in your API by
reading the schema, not by trial and error.
The second is that pagination is consistent.
Every list returns the same envelope. items,
page, pageSize, total,
totalPages, hasNext,
hasPrev. Buyers learn the shape once and use it
everywhere. Do not invent a one-off envelope for a single module
because "this list is special." This list is not special. Buyers
do not want six pagination dialects.
This sounds like style. It is not style. It is a contract about cognitive load. The buyer's mental budget is small, and you are competing with their actual job for it. Every inconsistency you ship spends some of that budget.
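Concretely, the envelope can be one generic type with one builder. Here is a sketch, assuming nothing beyond the field names above; buildPage is the helper the service calls later in this essay, and the Page type name is illustrative.

export interface Page<T> {
  items: T[];
  page: number;
  pageSize: number;
  total: number;
  totalPages: number;
  hasNext: boolean;
  hasPrev: boolean;
}

export const buildPage = <T>(
  args: { items: T[]; total: number; page: number; pageSize: number },
): Page<T> => {
  const totalPages = Math.max(1, Math.ceil(args.total / args.pageSize));
  return {
    items: args.items,
    page: args.page,
    pageSize: args.pageSize,
    total: args.total,
    totalPages,
    hasNext: args.page < totalPages,
    hasPrev: args.page > 1,
  };
};

One builder means one place where the page math can be wrong, and one test that pins it for every list in the API.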
SQL Owns the Truth
The repository is where SQL lives. It is also where visibility lives. These two facts are inseparable, and treating them as separable is the source of most data leaks.
If a product is draft, it should never reach a
resolver. If a review is unmoderated, it should never reach a
resolver. The temptation is to fetch broadly and filter in
JavaScript: pull the rows, then apply
.filter(r => r.status === 'active'). This is
wrong for two reasons. The shallow reason is that it scales
badly. You fetch rows you will not return. The deep reason is
that it leaks invariants. Any caller can forget the filter. The
next caller usually does.
The discipline: visibility is enforced in the WHERE clause. The repository is the only place where data leaves the database, so it is the only place that can guarantee what leaves. There is no opt-out. There is no "internal" mode that skips the filter. The filter is part of the function.
const PUBLISHED = eq(products.status, 'active');
const buildFilterSql = (filter: ProductsFilter) => {
const clauses = [PUBLISHED]; // always. no exceptions.
if (filter.titleContains) {
clauses.push(ilike(products.title, `%${filter.titleContains}%`));
}
if (filter.collectionId) {
clauses.push(sql`EXISTS (
SELECT 1 FROM product_collections pc
WHERE pc.product_id = products.id
AND pc.collection_id = ${filter.collectionId}
AND pc.status = 'active'
)`);
}
return and(...clauses);
};
// list and count share one predicate — no drift, no lying totals.
const where = buildFilterSql(filter);
const [items, [{ count }]] = await Promise.all([
db.select(...).from(products).where(where).orderBy(...).limit(limit).offset(offset),
db.select({ count: sql`COUNT(*)::int` }).from(products).where(where),
]);
A few things in that snippet are worth naming explicitly, because they are easy to skip past:
- The predicate is built once. The list query and the count query share it. If they diverge, the page math lies: a buyer asks for page 3, gets an empty result, and the total says there are 47 items. Build the predicate once. Pass it to both queries.
- Relationship state matters. The collection join also checks pc.status = 'active'. Filtering published products is not enough if a buyer can still see them through an inactive link. Visibility is transitive.
- Ordering is stable. Pagination needs a deterministic order, which means a business field plus a tiebreaker: ORDER BY featured_at DESC NULLS LAST, id ASC. Without the tiebreaker, two rows with the same featured_at can appear on different pages, or on the same page on different requests.
- Counts cast explicitly. Postgres returns COUNT(*) as a bigint, which most drivers serialize as a string. Cast it. Convert it once at the boundary. The rest of the codebase sees a number.
The repository is also where you handle database quirks once. If the ORM returns numeric strings, convert them. If foreign keys are nullable, normalize them. If a timestamp is stored as text, parse it. The rest of the codebase sees clean, typed rows. There is one place to fix a quirk, and one place to look for one.
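Putting those rules together, the list entry point might look like this sketch. findProducts is the function the service calls later in this essay; the Drizzle-style helpers, the featuredAt column, and the toProductRow quirk-normalizing mapper are assumptions for illustration.

import { asc, sql } from 'drizzle-orm';

export const findProducts = async (
  db: Database,
  args: { filter: ProductsFilter; page: number; pageSize: number },
) => {
  const where = buildFilterSql(args.filter); // one predicate for both queries
  const offset = (args.page - 1) * args.pageSize;
  const [rows, [{ count }]] = await Promise.all([
    db.select().from(products).where(where)
      // stable order: business field plus tiebreaker
      .orderBy(sql`${products.featuredAt} DESC NULLS LAST`, asc(products.id))
      .limit(args.pageSize).offset(offset),
    // cast once at the boundary: bigint becomes a number here, nowhere else
    db.select({ count: sql<number>`COUNT(*)::int` }).from(products).where(where),
  ]);
  return { items: rows.map(toProductRow), total: count };
};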
Every list query should answer six questions before it ships: which rows are visible, in what order, with what tiebreaker, with what count parity, with what casts at the boundary, and, for loader-backed queries, which sourceParentId to use for grouping. If you cannot answer all six, the query is not done.
The Quiet Engine
DataLoader is a small library and a large idea. It is two
hundred lines of code and an entire design discipline. The idea:
within a single request, batch all calls to the same loader. If
three resolvers each call
loaders.reviewsByProductId.load(...) with different
IDs in the same tick of the event loop, DataLoader collects all
the IDs, fires one query, and returns the results to each
caller.
Mechanically it works by deferring. When you call
.load(id), you get a promise. DataLoader adds the
ID to an internal queue and schedules the actual batch fetch for
the end of the current tick. Any other
.load(id) calls in the same tick join the same
batch. When the tick ends, the batch fires once. The returned
rows are distributed to the waiting promises.
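A toy example, assuming only the dataloader package, makes the mechanics visible:

import DataLoader from 'dataloader';

const loader = new DataLoader<number, number>(async (keys) => {
  // fires once per tick with every key requested in that tick
  console.log('batch:', keys); // batch: [ 1, 2, 3 ]
  return keys.map((k) => k * 10);
});

// three callers in the same tick, one batch function invocation
const [a, b, c] = await Promise.all([loader.load(1), loader.load(2), loader.load(3)]);
// a === 10, b === 20, c === 30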
DataLoader solves N+1, but it also creates a discipline: every cross-entity field resolver must use a loader. Not "should." Must. The moment you have a field resolver that calls a repository or a service directly, and a client requests that field inside a list, you have N+1. The fix is the same every time. The rule is the same every time. Make it the only path.
Three details make DataLoader work in production:
- The registry is one file. Every loader lives in one place, typically src/loaders/registry.ts. This is the one intentional cross-module integration point. A product loader can know about reviews. A review loader can know about authors. The registry composes them. Nothing else does. Without this discipline, loaders sprout in random files and you lose the ability to scan the surface area of relation traversal in your codebase.
- One registry per request. Loaders cache by ID within a request. A user fetched in one resolver gets reused in another. But the cache must not survive the request: it would leak data between users and serve stale rows. The registry is constructed in context() on every incoming request and discarded when the response finishes. A sketch follows this list.
- Cap the batch size. Set a maximum (500 is a reasonable default). Without it, a single page with thousands of nested IDs fires a query with a 50,000-character IN clause. Postgres handles this poorly, the planner gives up, and your fast path becomes your slow path. With the cap, DataLoader chunks automatically.
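Here is a sketch of that per-request construction, assuming Apollo Server 4's standalone server; typeDefs, resolvers, and db are whatever your application already has.

import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';

const server = new ApolloServer({ typeDefs, resolvers });

await startStandaloneServer(server, {
  context: async ({ req }) => ({
    db,
    // a fresh registry per request: the caches die with the response
    loaders: createLoaders(db),
    requestId: req.headers['x-request-id'],
  }),
});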
There is a fourth discipline that is less obvious: loaders return shapes, not entities. A "reviews by product ID" loader returns Review[][] in the same order as the input keys. An "author by ID" loader returns Author | null for each key. The shape encodes the cardinality of the relationship. The resolver does not have to think about it. This sounds pedantic until you have a resolver that expects a single value and gets an array, and the error surfaces three layers away from the cause.
A worked example, in code, is worth more than another paragraph:
import DataLoader from 'dataloader';

// groupBy here is a small local helper that returns a Map,
// not lodash's object-returning groupBy.
const groupBy = <T, K>(rows: T[], key: (row: T) => K): Map<K, T[]> => {
  const map = new Map<K, T[]>();
  for (const row of rows) {
    const k = key(row);
    const bucket = map.get(k);
    if (bucket) bucket.push(row); else map.set(k, [row]);
  }
  return map;
};

export const createLoaders = (db: Database) => ({
  // one-to-many edge → returns Review[] per productId, in input order
  reviewsByProductId: new DataLoader<string, Review[]>(
    async (productIds) => {
      const rows = await findReviewsForProductIds(db, productIds);
      const byProduct = groupBy(rows, r => r.productId);
      return productIds.map(id => byProduct.get(id) ?? []);
    },
    { maxBatchSize: 500 }
  ),
  // many-to-one edge → Author | null per id
  authorById: new DataLoader<string, Author | null>(
    async (ids) => {
      const rows = await findAuthorsByIds(db, ids);
      const byId = new Map(rows.map(r => [r.id, r] as const));
      return ids.map(id => byId.get(id) ?? null);
    },
    { maxBatchSize: 500 }
  ),
});
Notice that the loader does almost no business logic. It calls a repository function, groups the result, and returns it in the input order. The repository function does the SQL and the visibility. The loader does the batching. Each layer answers one question.
Operational Guardrails
A schema is not an API. An API is a schema plus the operational behavior that makes it survivable. Clients are not always benign. Networks fail. Databases hiccup. The graph permits queries that the database cannot answer in finite time. None of this is in the SDL. All of it is part of the contract.
The minimum set of guardrails, in rough order of how often they save you:
- Page size is clamped. Every list accepts a pageSize argument, and the service clamps it to a hard maximum before any SQL runs.
- Query depth is limited. The server rejects documents beyond a fixed depth before executing a single resolver.
- Batch size is capped. Every DataLoader carries a maxBatchSize, so one request cannot fire an unbounded IN clause.
- Statements time out. Postgres runs with a statement_timeout. A hung query cannot hang the API.
These four numbers cap the worst-case work a single request can cause. They are not optional. A GraphQL server without them is a server waiting for one curious client to wedge it.
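A sketch of how those four numbers might be wired, assuming graphql-depth-limit for the depth rule and node-postgres for the timeout; the specific values are illustrative defaults, not recommendations.

import depthLimit from 'graphql-depth-limit';
import { ApolloServer } from '@apollo/server';
import { Pool } from 'pg';

const MAX_PAGE_SIZE = 100;   // enforced by clampPage in the service
const MAX_BATCH_SIZE = 500;  // passed to every DataLoader in the registry
const MAX_QUERY_DEPTH = 8;   // rejected before execution begins

const server = new ApolloServer({
  typeDefs,
  resolvers,
  validationRules: [depthLimit(MAX_QUERY_DEPTH)],
});

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  statement_timeout: 5_000, // ms: a hung query cannot hang the API
});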
The other operational rules are less numeric but equally load-bearing:
- Authentication is the first middleware. Every request carries a key. The key is hashed and matched against an api_clients table in constant time. There is no "optional auth" mode. There is no "internal endpoint." Every path is authenticated.
- Authorization is in the repository. For a single-tenant public read API, authorization is trivial: all authenticated clients see all published data. For a multi-tenant API, it is the most important rule in the system, and it lives next to the visibility filter, in SQL.
- Request identity is a UUID. Every request gets an x-request-id at the edge. Every log line carries it. Every error response returns it. When a buyer reports a problem, you ask for the request ID and find every related log line with one grep. The cost of adding this is twenty lines. The value of having it during an incident is everything.
- The health check is honest. /healthz probes Postgres with a short timeout. It returns failure when the database is unreachable. It does not always return 200. Load balancers should believe it.
- Error shapes are stable. Every GraphQL error has a code field, and the codes are part of the contract: BAD_USER_INPUT, UNAUTHENTICATED, NOT_FOUND, INTERNAL_SERVER_ERROR. Stack traces are never exposed in production. Buyers handle errors by code, not by message text. A sketch follows this list.
- Body size is capped. A typical GraphQL query is a few kilobytes. A pathological client can submit megabytes of garbage. Set the body limit at the HTTP layer, well below anything a legitimate query implies.
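The badInput helper the service uses later in this essay can be a one-liner over graphql-js. A sketch, assuming the code values above; notFound is an illustrative sibling.

import { GraphQLError } from 'graphql';

export const badInput = (detail: unknown) =>
  new GraphQLError('Invalid arguments', {
    extensions: { code: 'BAD_USER_INPUT', detail },
  });

export const notFound = (what: string) =>
  new GraphQLError(`${what} not found`, {
    extensions: { code: 'NOT_FOUND' },
  });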
None of these are GraphQL features. All of them are part of running a GraphQL API. The schema and the guardrails are inseparable, and the schema is not finished until the guardrails are.
There is one more category of guardrail worth naming honestly because most teams skip it: query cost accounting and rate limiting. Depth and page size cap individual requests. They do not cap a client's total work across a window. If you serve public traffic at any scale, you need both, and you need them at the edge, not in your application code. This article does not solve that. It just refuses to pretend the schema solves it.
A Filter, End to End
The whole architecture is theory until you run a change through
it. Here is one. A buyer asks for
products(filter: { collectionId }). They want every
active product in a given collection, paginated, with the usual
nested fields.
The wrong instinct is to add a JavaScript filter. Load all
products, then drop the ones not in the collection. This loses
on every dimension: it fetches rows you will not return, it
leaks the rule into the resolver, and it breaks count parity.
Pagination metadata becomes a lie because
total reflects the unfiltered query.
The disciplined path is small and local. Four files. Each layer answers the question it owns.
01 · Start at the contract
The SDL is the public surface. The change starts there because if it does not start there, the rest of the work is invisible to buyers.
input ProductsFilter {
titleContains: String
collectionId: ID # new — public surface grows by one optional field
}
02 · Mirror it in Zod
SDL types constrain shape, not content: GraphQL will reject a non-string, but it will not check lengths or UUID formats. Zod is the runtime gate for those rules. It also generates the TypeScript types the service uses internally, which keeps the contract honest end to end.
export const ProductsFilter = z.object({
titleContains: z.string().min(1).max(120).optional(),
collectionId: z.string().uuid().optional(), // new — runtime gate matches the SDL
}).strict();
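The filter sits inside a larger args object. A sketch of what ProductsArgs might look like, with illustrative caps and defaults:

export const ProductsArgs = z.object({
  page: z.number().int().min(1).default(1),
  pageSize: z.number().int().min(1).max(100).default(25),
  filter: ProductsFilter.default({}),
}).strict();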
03 · The service does not change
This is the test of the architecture. A new filter changes the contract, the validation, and the SQL. The orchestrator should not change. If it does, the architecture has leaked, and the next filter will leak again.
export const listProducts = async (ctx: Ctx, args: unknown) => {
const parsed = ProductsArgs.safeParse(args);
if (!parsed.success) throw badInput(parsed.error);
const { page, pageSize, filter } = clampPage(parsed.data);
const { items, total } = await findProducts(ctx.db, { filter, page, pageSize });
return buildPage({ items, total, page, pageSize });
};
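clampPage is the only math the service owns. A sketch, assuming the page-size cap from the guardrails section; the service clamps rather than rejects oversized pages.

const MAX_PAGE_SIZE = 100;

const clampPage = <T extends { page: number; pageSize: number }>(args: T): T => ({
  ...args,
  page: Math.max(1, args.page),
  pageSize: Math.min(Math.max(1, args.pageSize), MAX_PAGE_SIZE),
});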
04 · Put the rule in SQL
An EXISTS clause against the join table, with the
relationship state checked. The visibility filter on products
stays where it is. The two predicates compose.
const buildFilterSql = (filter: ProductsFilter) => {
const clauses = [PUBLISHED];
if (filter.titleContains) {
clauses.push(ilike(products.title, `%${filter.titleContains}%`));
}
if (filter.collectionId) { // new
clauses.push(sql`EXISTS (
SELECT 1 FROM product_collections pc
WHERE pc.product_id = products.id
AND pc.collection_id = ${filter.collectionId}
AND pc.status = 'active'
)`); // new
} // new
return and(...clauses);
};
That is the entire change. Four files. The contract grew by one optional input. The validation grew by one optional field. The repository gained one clause. The service is unchanged. The resolver is unchanged. The loader registry is unchanged. The tests that need updating are the contract tests: one new test that asserts the filter works, one that asserts pagination is still consistent when the filter is applied.
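The shape of that contract test, as a sketch; runQuery (a helper that executes an operation against the built schema) and seededSummerProductIds (a fixture set) are hypothetical names for illustration.

import { describe, it, expect } from 'vitest';

describe('products(filter: { collectionId })', () => {
  it('returns only products in the collection, with honest totals', async () => {
    const data = await runQuery(`
      query {
        products(page: 1, filter: { collectionId: "summer-25" }) {
          items { id }
          total
          hasNext
        }
      }
    `);
    const { items, total } = data.products;
    // count parity: total reflects the filtered set, not the whole table
    expect(total).toBe(seededSummerProductIds.length);
    for (const p of items) {
      expect(seededSummerProductIds).toContain(p.id);
    }
  });
});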
Notice also what was not built: a generic where builder that handles every domain. Three lines of conditional SQL. The architecture's job is not to make the change clever. Its job is to make the change small.
If a future filter does justify abstraction (suppose you have eight filters with similar EXISTS patterns), build it then, on top of working code, with the real shape in hand. Premature abstraction is the cousin of the shallow module. It expands surface area in anticipation of a future that may not arrive.
Three similar implementations are better than a premature abstraction.
The Working Checklist
The architecture compresses into a few questions you can run before any pull request. These are not theoretical. They are what stops a small mistake from becoming a structural one.
- Can a buyer understand the new field without knowing your storage?
- Does the service call only its own repository?
- Does every cross-entity field route through ctx.loaders?
- Are unpublished rows impossible to return from the repository, on every code path?
- Do page size, query depth, and batch size still cap worst-case work?
- Is the new test pinning a buyer-visible contract, or is it pinning local plumbing?
When you start a new module, the SOP is similar and short:
- Add SDL in src/schemas/<domain>.graphql.
- Add Zod args and filters in src/validations.
- Add a service that validates, paginates, and calls one repository.
- Add a repository: row type, select columns, filter builder, list/count helpers, by-id helpers.
- Enforce visibility in SQL on every primary entity and every relationship table.
- Add resolver Query entries and field resolvers that go through ctx.loaders.
- Register new loaders only when a relation traversal needs them.
- Add focused contract coverage for the new entry point or graph edge.
None of this is exciting. It is not supposed to be. The interesting work happens in product surfaces, performance frontiers, and data modeling. The architecture's job is to get out of the way of that work. When the architecture is invisible to the engineer adding a feature, it is doing its job.
Why This Holds Up
The pattern in this essay is not the only way to build a GraphQL API. It is the way that has held up across iterations of a production read API serving real traffic. The tradeoffs are concrete and worth naming honestly.
You pay for this architecture in file count. Adding a single field touches the SDL, sometimes a validation, sometimes a service, often a repository, occasionally a loader. A junior engineer's first reaction is reasonable: this is a lot of files for a small change. The answer is that the files are the point. Each one answers a question. When the question changes, exactly one file changes. The diff is local. The review is fast. The blast radius is small.
You also pay in generated noise. If your storage is managed by a CMS or any tool that produces opinionated table names, the repository has to translate. There is no way to make this disappear. The alternative is to expose those names to buyers, and that alternative is much worse than translation.
You give up some framework magic. There is no resolver decorator that auto-generates from a database table. There is no "fastify-style" plugin that wires everything for you. Every connection is a function call from one file to another. This is the boring tradeoff, and it is the one that pays off the longest. Boring code is reviewable code. Reviewable code is fixable code.
What you get in return is a system where the next engineer can find a rule by reading the architecture. Where a regression has a known place to look. Where a performance problem has a known place to start. Where a new feature has a known shape. The architecture is not novel. It is not the point that it is novel. The point is that it is consistent. Every layer answers one question.
A GraphQL API is the surface of a contract between you and every client you have not met yet. That contract should be small, stable, and honest. The architecture that produces it should be the same.
The interface is small. The interior is large. The schema is the contract. The SQL is the truth. Acquire the root and reuse the parent.
A buyer should see a clean graph. An engineer should see clear ownership. The code should answer one question at every layer: who owns this rule? When you can read your own architecture without grep, you have arrived.