The Trenches
Architecture Trenches · 2026

Game Changers in the Data Engineering Space

Fifteen tools that are quietly reshaping how we ingest, transform, stream, orchestrate, and serve data at scale. No hype. Just what actually works in production.

Boyan Balev
22 min read
Prologue

The Landscape Shifts

There is a moment in every engineer's career where the tools you trusted quietly become the bottleneck. Not all at once. Slowly, like sand swallowing a foundation you thought was solid. One morning you realize your batch pipeline runs for eleven hours, your Kafka cluster needs a team of three just to stay alive, and your orchestration layer has become a maze of YAML that nobody dares to touch.

I have been there. Most of us have. And the honest truth is that the data engineering landscape in 2026 looks nothing like it did even three years ago. The tools have matured. The abstractions have sharpened. The community has collectively decided that certain problems should not require heroic engineering anymore.

What follows is not a ranking. It is not a product comparison or a vendor evaluation. It is a field guide, written from the trenches, covering fifteen tools that I believe have fundamentally changed what is possible in data engineering today. Some are new. Some are battle tested veterans that quietly evolved into something far more capable than most people realize. All of them have earned their place here because they solve real problems in ways that actually survive contact with production.

Paul Atreides did not conquer Arrakis by choosing the most popular weapon. He studied the terrain. He understood the forces at play. He adapted. The same principle applies to building data platforms. You do not pick tools because a blog post told you to. You pick them because you understand the forces acting on your data and you have found the right instrument for each one.

By the end, we will combine these tools into three production-proven stacks: a modern lakehouse for analytics-first teams, a real-time engine for sub-second use cases, and a pragmatic hybrid for everyone in between. But first, let us walk through each layer together.

The Five Layers of a Modern Data Platform
Ingestion: Airbyte · NiFi · Debezium
Streaming: Kafka · Redpanda · Flink · ksqlDB
Storage & Transform: dbt · Iceberg · Trino · Spark · ClickHouse
Orchestration: Airflow · Dagster · Prefect
Serving: Redis · ClickHouse
Part I

The Ingestion Layer

Everything begins with getting data from point A to point B. It sounds trivially simple, but ingestion is where most architectures quietly begin to rot. A missed schema change here. A silent failure there. Before you know it, you are debugging a pipeline at midnight because someone added a column to a Postgres table and nobody told the data team.

Three tools have fundamentally changed this story.

Three Ingestion Paradigms
Paradigm I · Batch Pull (Airbyte): Scheduled extraction from APIs and databases. Handles the 80% of data that tolerates hourly refreshes. Flow: SaaS APIs / databases → connector (CDK) → warehouse / lake.
Paradigm II · Flow Routing (Apache NiFi): Visual directed-graph processing. Excels when data arrives in unpredictable formats from many sources. Flow: heterogeneous sources → processor graph → N destinations.
Paradigm III · Change Data Capture (Debezium): Real-time event streams from database transaction logs. No polling. No missed deletes. Flow: DB transaction log → Debezium → event stream.
01 Batch Ingestion
Airbyte
The open source EL that actually works.

For years, "just connect to the API" was the starting point of every data project, and the beginning of every data team's pain. Airbyte changed this by building a connector catalog so massive (350+ and counting) that it effectively commoditized the ingestion layer. Write a connector once in any language using the CDK, publish it, and it becomes everyone's connector.

But here is what makes Airbyte genuinely different: it is built on the principle that data extraction should not be a competitive advantage. It should be infrastructure. Like electricity. The value is in what you do with the data, not in how you extracted it.

The architecture is elegant in its simplicity. Each connector runs in its own container, producing records in a standardized protocol. Source and destination are decoupled. You can swap either without touching the other. It is the Unix philosophy applied to data movement.
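
To make that concrete, here is a minimal sketch of driving Airbyte from Python: trigger a sync over its HTTP API and poll the job until it finishes. The base URL and connection ID are placeholders for a local deployment, and the exact endpoint paths and auth can differ between Airbyte versions.

```python
# Minimal sketch: trigger an Airbyte sync over its HTTP API and poll the resulting job.
# Assumes a local Airbyte deployment; CONNECTION_ID is a placeholder for one of your connections,
# and endpoint paths/auth vary by Airbyte version.
import time

import requests

AIRBYTE_URL = "http://localhost:8000/api/v1"                 # assumption: default local install
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"       # placeholder

# Kick off a sync for one source-to-destination connection.
resp = requests.post(f"{AIRBYTE_URL}/connections/sync", json={"connectionId": CONNECTION_ID})
resp.raise_for_status()
job_id = resp.json()["job"]["id"]

# Poll until the job leaves the pending/running states.
while True:
    status = requests.post(f"{AIRBYTE_URL}/jobs/get", json={"id": job_id}).json()["job"]["status"]
    if status not in ("pending", "running"):
        break
    time.sleep(30)

print(f"Airbyte job {job_id} finished with status: {status}")
```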

When to reach for it

Batch ingestion from SaaS APIs, databases, or file systems into your warehouse or lake. Particularly powerful when you have 10+ diverse sources and do not want to maintain custom scripts for each one. Not ideal for sub-second latency requirements; that is a different paradigm entirely.

02 Flow Based Routing
Apache NiFi
The Swiss Army knife of data routing.

NiFi is one of those tools that does not get the hype it deserves, probably because it came from the NSA and its UI looks like it was designed by government contractors (because it was). But beneath the dated aesthetics lies one of the most powerful data routing engines ever built.

The concept is beautifully simple. Data flows through a directed graph of processors. Each processor does one thing: fetch from an API, filter records, transform formats, route based on content, write to a destination. You compose them visually, like LEGO blocks for data.

Where NiFi shines is in the messy middle, the scenarios where data arrives in unpredictable formats from unreliable sources and needs to be routed to multiple destinations with different requirements. It handles backpressure natively, retries gracefully, and provides data provenance tracking that can trace a single byte from source to destination.

When to reach for it

Complex routing logic, government and enterprise environments with strict data provenance requirements, IoT data collection with heterogeneous sources, or any scenario where you need to fan out data to many destinations with different transformation rules. Overkill for simple A to B data movement.

03 Change Data Capture
Debezium
Your database's event stream, hiding in plain sight.

Here is a truth that took me embarrassingly long to internalize: every database is already an event stream. Every INSERT, UPDATE, and DELETE is an event. Debezium simply makes those events accessible.

It works by reading the database's transaction log (WAL in Postgres, binlog in MySQL, oplog in MongoDB) and converting each change into a structured event that gets published to Kafka or Redpanda. No polling. No "last modified" timestamp hacks. No missed deletes. Just a faithful reproduction of every change, in order, as it happens.

This sounds simple, but the implications are profound. With Debezium, your operational database becomes a source of truth that can feed real time analytics, cache invalidation, search index updates, and cross service data synchronization, all without modifying a single line of application code.
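
To make that concrete, here is a minimal sketch of registering a Debezium Postgres connector with the Kafka Connect REST API. The hostnames, credentials, and table names are placeholders, and the property names follow Debezium 2.x (older releases used database.server.name instead of topic.prefix).

```python
# Minimal sketch: register a Debezium Postgres connector via the Kafka Connect REST API.
# All connection details below are placeholders.
import requests

connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",               # logical decoding via Postgres' built-in plugin
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "shop",
        "topic.prefix": "shop",                  # topics become shop.public.orders, etc.
        "table.include.list": "public.orders",   # only stream changes from this table
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json()["name"], "registered; changes now flow from the WAL into Kafka/Redpanda")
```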

When to reach for it

Real time replication from operational databases. Cache invalidation when paired with Redis. Event driven microservice architectures. Building a change log for auditing. Converting a monolith's database into an event driven system without rewriting the monolith.

Debezium CDC: From Transaction Log to Consumers
Operational DB (Postgres / MySQL) → transaction log (WAL / binlog) → Debezium (reads & emits events) → Kafka / Redpanda (event backbone) → downstream services: analytics (ClickHouse) · search (Elasticsearch) · cache (Redis)

"The spice must flow" is not just a line from a novel. In data engineering, the flow is everything. The moment your pipeline stops, your business is flying blind. Every tool you choose is a bet on continuity.

Part II

The Streaming Layer

Batch processing got us far. But there is a class of problems where waiting for the next scheduled run is simply not acceptable. Fraud detection. Real time personalization. Operational monitoring. These problems demand that data moves continuously, like blood through a circulatory system.

On Arrakis, the spice harvesters had minutes to extract melange before a sandworm arrived. There was no luxury of a scheduled run. You moved fast or you were consumed. The same urgency applies to streaming data. When the business depends on sub second decisions, batch is not slow — it is a liability.

Three tools define this space today.

Streaming Data Architecture
Producers (apps, CDC, IoT devices) → Kafka / Redpanda (durable event log) → Flink / ksqlDB (process & enrich) → sinks (ClickHouse, Redis, S3)
Flink: complex event processing, event-time windows, exactly-once semantics. ksqlDB: SQL-native streaming with filters, aggregations, and materialized views.
04 Event Backbone
Kafka / Redpanda
The central nervous system of modern data.

Kafka invented the category. It proved that you could build a distributed, durable, high throughput event log that decouples producers from consumers and lets you replay history. That insight changed everything.

But Kafka also comes with baggage. ZooKeeper management (finally being replaced with KRaft, but slowly). JVM tuning nightmares. A complexity tax that makes small teams hesitate. This is where Redpanda enters the conversation.

Redpanda is a Kafka compatible streaming platform written in C++ that runs without the JVM, without ZooKeeper, and with tail latencies that are consistently lower. It speaks the Kafka protocol, so your existing producers, consumers, Kafka Connect integrations, and client libraries all work without changes. You just point them at Redpanda instead.
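
A minimal sketch of what that looks like: a standard Kafka-protocol producer that happens to be pointed at a Redpanda broker. The bootstrap address is a placeholder; swap in your Kafka brokers and nothing else changes.

```python
# Minimal sketch: a Kafka-protocol producer pointed at a Redpanda broker instead of Kafka.
# Nothing Redpanda-specific is needed; only the (placeholder) bootstrap address differs.
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "redpanda:9092"})  # point at Kafka and it still works


def on_delivery(err, msg):
    # Called once the broker acknowledges (or rejects) the record.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}] @ offset {msg.offset()}")


event = {"order_id": 42, "status": "paid"}
producer.produce("orders", key=str(event["order_id"]), value=json.dumps(event), on_delivery=on_delivery)
producer.flush()  # block until all queued messages are delivered
```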

For large organizations already running Kafka successfully, migration might not be worth the effort. But for new projects or teams without dedicated Kafka expertise, Redpanda removes an entire category of operational headaches while delivering the same (often better) performance.

When to reach for it

Any architecture that needs durable, ordered event streaming. Microservice communication, CDC event transport, real time analytics ingestion, or any scenario where you need a reliable backbone between data producers and consumers. Kafka for large, established deployments. Redpanda for everything else.

05 Stream Processing
Apache Flink
Where time becomes a first class citizen.

Flink does something that most data tools only pretend to do: it treats time correctly. Not wall clock time. Not processing time. Event time, the actual moment something happened in the real world.

This matters more than most engineers realize. When you are computing a 5 minute window of transactions, do you mean the 5 minutes when your system processed them, or the 5 minutes when the transactions actually occurred? Late arriving data, out of order events, network delays: these are not edge cases, they are the normal state of distributed systems. Flink handles all of them through its watermark mechanism.
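
Here is a minimal PyFlink sketch of an event-time tumbling window with a bounded-out-of-orderness watermark. The inline collection and field layout are purely illustrative (in production the stream would come from Kafka or Redpanda), and exact imports vary slightly between Flink releases.

```python
# Minimal PyFlink sketch: event-time tumbling windows with a bounded-out-of-orderness watermark.
from pyflink.common import Duration, Types, WatermarkStrategy
from pyflink.common.time import Time
from pyflink.common.watermark_strategy import TimestampAssigner
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import TumblingEventTimeWindows


class TxTimestamps(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp):
        return value[1]  # event time in epoch millis, carried inside the event itself


env = StreamExecutionEnvironment.get_execution_environment()

# (account_id, event_time_ms, amount) tuples standing in for a Kafka source.
transactions = env.from_collection(
    [("acct-1", 1_700_000_000_000, 40.0),
     ("acct-1", 1_700_000_120_000, 65.0),   # arrives late / out of order in real streams
     ("acct-2", 1_700_000_060_000, 12.5)],
    type_info=Types.TUPLE([Types.STRING(), Types.LONG(), Types.FLOAT()]),
)

watermarks = (WatermarkStrategy
              .for_bounded_out_of_orderness(Duration.of_seconds(10))  # tolerate 10 s of lateness
              .with_timestamp_assigner(TxTimestamps()))

(transactions
 .assign_timestamps_and_watermarks(watermarks)
 .key_by(lambda tx: tx[0])                                   # one window per account
 .window(TumblingEventTimeWindows.of(Time.minutes(5)))       # 5-minute windows in *event* time
 .reduce(lambda a, b: (a[0], max(a[1], b[1]), a[2] + b[2]))  # sum amounts per window
 .print())

env.execute("event-time-windows")
```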

The programming model gives you exactly once processing guarantees with checkpoint based fault tolerance. If a node fails, Flink rolls back to the last consistent checkpoint and replays from there. No duplicate processing. No lost events. For anyone who has tried to build this manually, you know how valuable that guarantee is.

When to reach for it

Complex event processing, real time aggregations, streaming ETL, fraud detection, and any scenario where event time semantics and exactly once processing matter. The learning curve is steep, but the payoff is enormous for the right use case.

06 Streaming SQL
ksqlDB
SQL that never stops running.

Not every streaming use case needs the full power of Flink. Sometimes you just want to filter, aggregate, or join streams using SQL. That is exactly what ksqlDB provides.

Think of it as a SQL engine where your queries do not return results and terminate. They run continuously, processing every new event as it arrives on a Kafka topic and producing results to another topic. A "SELECT count(*) FROM orders WHERE status = 'failed' GROUP BY region" does not give you a snapshot. It gives you a living, breathing materialized view that updates in real time.
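
A minimal sketch of how that looks in practice: submit a persistent query to ksqlDB's REST endpoint from Python. The server address is the default, and it assumes an orders stream has already been declared over a Kafka topic.

```python
# Minimal sketch: create a continuously updating materialized view by POSTing a statement
# to ksqlDB's REST endpoint. Assumes an existing 'orders' stream over a Kafka topic.
import requests

statement = """
CREATE TABLE failed_orders_by_region AS
  SELECT region, COUNT(*) AS failed_count
  FROM orders
  WHERE status = 'failed'
  GROUP BY region
  EMIT CHANGES;
"""

resp = requests.post(
    "http://localhost:8088/ksql",                      # default ksqlDB server port
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())  # ksqlDB reports the persistent query it started; it runs until dropped
```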

The sweet spot for ksqlDB is in teams that have strong SQL skills but limited experience with distributed stream processing frameworks. It lets you build streaming applications without writing Java or Scala, using a language your analysts already know.

When to reach for it

Streaming aggregations, event filtering, stream to stream joins, and materialized views over Kafka topics. Ideal when your team speaks SQL fluently and the processing logic is not so complex that it demands Flink's full programming model.

Part III

Storage and Transformation

Data at rest is only valuable if you can shape it and query it efficiently. This layer is where raw signals become actionable knowledge. And it is where, frankly, the most interesting architectural battles of 2026 are being fought.

The Fremen did not simply collect spice. They refined it, processed it, understood its molecular structure before they could wield its true power. The engineers who build lasting data platforms are the ones who deeply understand how their storage and transformation layers actually work under the hood — not just the interfaces, but the mechanics.

07 Transformation
dbt
Software engineering discipline, applied to SQL.

Before dbt, SQL transformations lived in one of two places: stored procedures maintained by a DBA who left two years ago, or Python scripts buried in an orchestrator that nobody fully understood. dbt changed this by bringing version control, testing, documentation, and dependency management to SQL transformations.

The core insight is powerful. SQL is the right language for transforming tabular data. What it lacked was not expressiveness, but engineering discipline. dbt provides that discipline without asking you to abandon SQL for Python or Spark. You write SELECT statements. dbt handles the DDL, materializes the results, runs tests, generates documentation, and manages the dependency graph.

The result is transformations that are readable, testable, and maintainable by anyone who can write SQL. That is not a small thing. It means your analytics engineer can own the transformation layer end to end, without needing a data engineer to babysit every deployment.
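
It also means dbt slots cleanly into whatever runs your pipelines. Here is a minimal sketch of invoking dbt programmatically, the pattern you reach for when embedding it in an orchestrator or CI job; it assumes dbt-core 1.5+ (which ships the dbtRunner entry point), an existing project and profile, and a placeholder selector name.

```python
# Minimal sketch of programmatic dbt invocation; each call mirrors the equivalent CLI command.
from dbt.cli.main import dbtRunner

dbt = dbtRunner()

# Build the staging models, then run their tests.
run_result = dbt.invoke(["run", "--select", "staging"])
test_result = dbt.invoke(["test", "--select", "staging"])

if not (run_result.success and test_result.success):
    raise RuntimeError("dbt run or test failed; check target/run_results.json for details")
```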

When to reach for it

Any warehouse or lakehouse transformation pipeline. Particularly valuable when multiple people need to collaborate on transformation logic, when you need lineage tracking, or when your transformations have grown beyond what a single script can maintain.

08 Table Format
Apache Iceberg
The table format that freed data from the warehouse.

Iceberg solves a problem so fundamental that most engineers do not even realize it exists until they hit it: how do you give warehouse level reliability to data sitting on object storage?

Traditional data lakes on S3 or GCS are just files in folders. No transactions. No schema enforcement. No time travel. No concurrent write safety. Iceberg adds all of these capabilities through a metadata layer that sits between your query engine and your Parquet files.

The practical impact is enormous. With Iceberg, you can run Trino, Spark, Flink, and Presto against the same data simultaneously. You get ACID transactions on object storage. You get schema evolution without rewriting data. You get partition evolution that lets you change your partitioning strategy without migration. And because the data is just Parquet files on S3, you are never locked into a vendor.
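
A minimal PyIceberg sketch of what that looks like from Python: read an Iceberg table straight off object storage, with predicate and column pruning resolved against metadata, plus a quick time-travel read. The catalog URI, filter, and table identifier are placeholders; it assumes a REST catalog, but Glue, Hive, and SQL catalogs work the same way.

```python
# Minimal PyIceberg sketch: query an Iceberg table on object storage, no warehouse in the path.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("lake", uri="http://iceberg-rest:8181")   # placeholder REST catalog
table = catalog.load_table("analytics.events")                   # placeholder identifier

# Predicate and column pruning are resolved against Iceberg metadata before any Parquet is read.
recent = (
    table.scan(
        row_filter="event_date >= '2026-01-01'",
        selected_fields=("user_id", "event_type", "event_date"),
    )
    .to_arrow()
)
print(recent.num_rows, "rows")

# Time travel: re-read the table exactly as it looked at an earlier snapshot.
history = table.history()
if len(history) > 1:
    old = table.scan(snapshot_id=history[0].snapshot_id).to_arrow()
    print("rows at first snapshot:", old.num_rows)
```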

When to reach for it

Building a lakehouse on object storage. Multi engine environments (Spark for ETL, Trino for interactive queries, Flink for streaming). Any scenario where you want warehouse grade reliability without warehouse grade vendor lock in.

Apache Iceberg: Open Lakehouse Architecture
Query engines (Trino · Spark · Flink) all read the same tables → Iceberg metadata layer (ACID transactions · schema evolution · time travel · partition evolution) → Parquet files on S3 / GCS (open format · no vendor lock-in · ~10:1 compression)
09 Query Federation
Trino
One query engine to rule them all.

Trino (formerly PrestoSQL) is a distributed SQL query engine that can query data wherever it lives. Iceberg tables on S3, MySQL, PostgreSQL, Elasticsearch, Cassandra, Redis, even Google Sheets. You write one SQL query, and Trino federates across sources, pushes down predicates, and returns results in seconds.

The beauty of Trino is that it separates compute from storage entirely. Your data stays where it is. Trino just queries it. This means you can upgrade your query infrastructure without migrating a single byte of data. And because it scales horizontally, you can add workers to handle peak loads and remove them when demand drops.
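
Here is a minimal sketch of a federated query from Python using the trino client: one statement that joins an Iceberg table on S3 with a live Postgres table. The catalog and table names are placeholders that assume an iceberg and a postgres catalog are configured on the cluster.

```python
# Minimal sketch: a federated Trino query joining lakehouse data with an operational database.
import trino

conn = trino.dbapi.connect(
    host="trino.internal", port=8080, user="analytics",   # placeholder cluster details
    catalog="iceberg", schema="analytics",
)
cur = conn.cursor()

cur.execute("""
    SELECT c.plan, count(*) AS events
    FROM iceberg.analytics.events e          -- Parquet/Iceberg on S3
    JOIN postgres.public.customers c         -- operational Postgres, queried in place
      ON e.customer_id = c.id
    WHERE e.event_date >= DATE '2026-01-01'
    GROUP BY c.plan
    ORDER BY events DESC
""")

for plan, events in cur.fetchall():
    print(plan, events)
```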

When to reach for it

Interactive analytics over lakehouse data (Iceberg + Trino is a killer combination). Cross source queries when data lives in multiple systems. Ad hoc exploration where you need sub minute response times on terabyte scale datasets.

10 Distributed Compute
Apache Spark
The gravitational center of large scale data processing.

Spark needs no introduction, but it deserves a more nuanced one. Spark in 2026 is far beyond the MapReduce successor many still picture. Structured Streaming is a viable continuous processing engine. The Catalyst optimizer improves with every release. Spark Connect has decoupled the client from the cluster entirely.

Where Spark truly shines is in complex, multi stage transformations that would be painful in SQL alone: ML feature pipelines, graph processing, large scale ETL with custom business logic. For simple SQL transforms, dbt on Trino will do. For everything else, Spark remains the gravitational center.
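
A minimal PySpark sketch of that kind of multi-stage job: sessionize raw events, then derive per-user features. The paths and column names are placeholders.

```python
# Minimal PySpark sketch: sessionize raw events, then compute per-user features.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("feature-pipeline").getOrCreate()

events = spark.read.parquet("s3a://lake/raw/events/")   # placeholder path; event_ts is a timestamp

# Stage 1: flag a new session whenever a user is idle for more than 30 minutes.
w = Window.partitionBy("user_id").orderBy("event_ts")
sessions = (
    events
    .withColumn("prev_ts", F.lag("event_ts").over(w))
    .withColumn(
        "new_session",
        F.when(F.col("event_ts").cast("long") - F.col("prev_ts").cast("long") > 1800, 1).otherwise(0),
    )
    .withColumn("session_id", F.sum("new_session").over(w))   # running count = session number
)

# Stage 2: per-user features that would be painful to express as one SQL statement.
features = (
    sessions.groupBy("user_id")
    .agg(
        F.countDistinct("session_id").alias("session_count"),
        F.avg("purchase_amount").alias("avg_purchase"),
        F.max("event_ts").alias("last_seen"),
    )
)

features.write.mode("overwrite").parquet("s3a://lake/features/user_features/")
```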

When to reach for it

Large scale batch ETL (hundreds of GB to PB). ML feature engineering. Complex transformations beyond SQL. Overkill for simple analytics queries.

11 Analytical Database
ClickHouse
The speed demon that rewrites your assumptions about query performance.

The first time you run a query on ClickHouse and it returns results in 50 milliseconds on a dataset that takes your warehouse 30 seconds, something shifts in your brain. You start to question every architectural assumption you have made about acceptable query latency. (The ClickBench results bear this out across dozens of analytical queries.)

ClickHouse achieves this through a combination of columnar storage, aggressive vectorized execution, and a storage engine (MergeTree) that is purpose built for analytical workloads. It compresses data so efficiently that many deployments see 10:1 compression ratios, which means your storage costs drop dramatically while your query performance improves.
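
Here is a minimal sketch of that MergeTree pattern using the clickhouse-connect driver: an append-heavy events table, a batched insert, and the kind of aggregation where the sub-second latencies show up. Host and table names are placeholders.

```python
# Minimal sketch: a MergeTree table plus batched inserts and an aggregation, via clickhouse-connect.
from datetime import datetime

import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse.internal", username="default", password="")

# An append-heavy events table: partitioned by month, ordered for fast range scans.
client.command("""
    CREATE TABLE IF NOT EXISTS events (
        event_time  DateTime,
        user_id     UInt64,
        event_type  LowCardinality(String),
        amount      Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_time)
    ORDER BY (event_type, event_time)
""")

# Batched inserts are the idiomatic write path.
client.insert(
    "events",
    [[datetime(2026, 1, 15, 12, 0), 42, "purchase", 19.99]],
    column_names=["event_time", "user_id", "event_type", "amount"],
)

# Aggregations like this over billions of rows are where the latency numbers come from.
result = client.query(
    "SELECT event_type, count() AS n, sum(amount) AS total FROM events GROUP BY event_type"
)
print(result.result_rows)
```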

The catch: ClickHouse is opinionated. It is built for append heavy, analytical workloads. Frequent updates, heavy joins across large tables, and OLTP style access patterns are not its strength. But within its sweet spot, nothing else comes close.

When to reach for it

Real time analytics dashboards, log analytics, time series data at scale, and any scenario where you need sub second query latency on billions of rows. Pairs beautifully with Kafka/Redpanda as a real time analytics sink.

The best engineers approach new tools the way the Fremen approach the desert: with humility, patience, and a willingness to learn its rhythms before trying to bend it to theirs. Master the tool's language before you impose your own.

Part IV

The Orchestration Layer

Having great tools means nothing if you cannot coordinate them. Orchestration is the invisible hand that ensures your ingestion runs before your transformation, your models refresh before your dashboards update, and when something fails at 3 AM, you know about it before your stakeholders do.

This is the layer where discipline separates amateurs from professionals.

Two Mental Models for Orchestration
Task-centric (Airflow · Prefect): extract_api_data → load_to_staging → transform_data → run_tests → publish_to_prod
Asset-centric (Dagster): raw_api_data → clean_events → user_metrics → dashboard_feed
12 Workflow Engine
Apache Airflow
The orchestrator that defined the category.

Airflow created the modern concept of data orchestration. DAGs (Directed Acyclic Graphs) as code. Pluggable operators. A web UI for monitoring and debugging. An ecosystem so vast that there is an operator for nearly everything. For all its warts, Airflow remains the most battle tested, widely deployed, and well understood orchestrator in the ecosystem.
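
A minimal sketch of a modern Airflow DAG using the TaskFlow API, where dependencies fall out of ordinary function calls. The task bodies are placeholders, and it assumes Airflow 2.4+ for the schedule parameter.

```python
# Minimal Airflow sketch: a TaskFlow DAG whose dependency graph is implied by function calls.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2026, 1, 1), catchup=False, tags=["example"])
def hourly_ingest():
    @task
    def extract() -> list[dict]:
        return [{"id": 1, "amount": 42.0}]            # stand-in for an API or DB pull

    @task
    def transform(rows: list[dict]) -> list[dict]:
        return [r | {"amount_cents": int(r["amount"] * 100)} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        print(f"loading {len(rows)} rows")            # stand-in for a warehouse write

    load(transform(extract()))


hourly_ingest()
```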

The honest assessment: Airflow is showing its age in places. The scheduler can be slow with thousands of DAGs. Testing DAGs locally is more painful than it should be. Dynamic task generation feels bolted on rather than native. But Airflow 2.x addressed many early complaints, and the sheer size of the community means answers, patterns, and integrations exist for nearly every scenario.

When to reach for it

Production data orchestration at any scale, especially if your team already knows it; the switching cost to alternatives is real. New projects might prefer Dagster or Prefect, but Airflow remains the safe, proven choice for organizations that value stability over novelty.

13 Asset Orchestration
Dagster
Orchestration reimagined around data assets.

Dagster takes a fundamentally different approach from Airflow. Instead of defining workflows as sequences of tasks, you define your data assets and the dependencies between them. The orchestrator figures out what needs to run and in what order.

This asset centric model changes how you think about pipelines. Instead of asking "did task X succeed?" you ask "is asset Y fresh?" Instead of debugging a chain of task failures, you look at which assets are stale and why. It is a subtle but powerful shift that aligns orchestration with how data consumers actually think about data.
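
Here is a minimal sketch of that asset-centric model, using the asset names from the sketch above. The bodies are placeholders; in practice each asset would read from and write to real storage via IO managers.

```python
# Minimal Dagster sketch: the dependency graph comes from function parameter names.
from dagster import Definitions, asset


@asset
def raw_api_data() -> list[dict]:
    return [{"user_id": 1, "event": "signup"}]        # stand-in for an Airbyte/API pull


@asset
def clean_events(raw_api_data: list[dict]) -> list[dict]:
    return [e for e in raw_api_data if e.get("user_id") is not None]


@asset
def user_metrics(clean_events: list[dict]) -> dict:
    return {"active_users": len({e["user_id"] for e in clean_events})}


defs = Definitions(assets=[raw_api_data, clean_events, user_metrics])
```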

The developer experience is genuinely excellent. Type checked resources, built in testing, local development that mirrors production, and a UI that makes Airflow's look dated. The tradeoff is a smaller ecosystem and a steeper initial learning curve for teams coming from Airflow.

When to reach for it

New data platforms being built from scratch. Teams that value developer experience and testing. Organizations where data freshness and asset health are primary concerns. Pairs exceptionally well with dbt.

14 Modern Workflow
Prefect
Orchestration that gets out of your way.

Prefect's philosophy is simple: you should not have to learn a new paradigm to orchestrate your code. Just decorate your Python functions, and Prefect handles the rest. Scheduling, retries, logging, alerting, and a clean UI for monitoring.
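
A minimal sketch of exactly that: ordinary Python functions become an observable, retryable workflow just by adding decorators. The endpoint URL is a placeholder, and it assumes Prefect 2.x.

```python
# Minimal Prefect sketch: decorated functions get scheduling, retries, and logging for free.
import httpx
from prefect import flow, task


@task(retries=3, retry_delay_seconds=60)
def fetch(url: str) -> dict:
    resp = httpx.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()


@task
def summarize(payload: dict) -> int:
    return len(payload.get("results", []))


@flow(log_prints=True)
def nightly_sync():
    payload = fetch("https://api.example.com/orders")   # placeholder endpoint
    print(f"fetched {summarize(payload)} records")


if __name__ == "__main__":
    nightly_sync()
```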

Where Prefect 2.x really shines is in hybrid deployments. Your orchestration control plane can be fully managed (Prefect Cloud), while your code runs on your own infrastructure. This separation is elegant: you get the operational simplicity of a managed service without sending your data or code to a third party.

The learning curve is the shallowest of any orchestrator on this list. If your team writes Python and wants orchestration without the cognitive overhead of Airflow's operator model or Dagster's asset framework, Prefect is the path of least resistance.

When to reach for it

Python heavy teams that want fast time to value. Workflows that mix data pipelines with ML training, API calls, and general automation. Teams that want managed orchestration without vendor lock in on the compute side.

Part V

The Serving Layer

Data that cannot be served quickly is data that cannot drive decisions. The serving layer is the final bridge between your carefully engineered pipeline and the applications and humans that depend on it.

The Hot / Cold Serving Pattern
Pipeline output (batch or streaming) → pre-compute (aggregations, features, metrics) → Redis (< 1 ms · the hot 1% of data) → application (user-facing reads)
Cold data remains in ClickHouse or the lakehouse for ad-hoc queries. Hot data is promoted to Redis for sub-millisecond serving.
15 In Memory Serving
Redis
The sub millisecond layer between your data and your users.

In the data engineering context, Redis plays one critical role: it makes pre computed results available to applications with sub millisecond latency. Your pipeline computes aggregations, features, or derived metrics and writes them to Redis. Your application reads from Redis instead of hitting the warehouse. Response times in microseconds, not seconds.
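
Here is a minimal sketch of that serving pattern: the pipeline writes pre-computed metrics into Redis with a TTL, and the application reads them on the request path instead of touching the warehouse. The key names and values are placeholders.

```python
# Minimal sketch: pipeline writes pre-computed metrics to Redis; the application reads them.
import json

import redis

r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)  # placeholder host

# Pipeline side: publish tonight's pre-computed metrics for the hot users only.
metrics = {"user:42:metrics": {"ltv": 1834.50, "orders_30d": 7, "churn_risk": 0.12}}
with r.pipeline() as pipe:
    for key, value in metrics.items():
        pipe.set(key, json.dumps(value), ex=86_400)   # expire after 24 h so stale data ages out
    pipe.execute()

# Application side: a sub-millisecond read on the request path.
cached = r.get("user:42:metrics")
features = json.loads(cached) if cached else {}       # fall back to defaults on a cache miss
print(features)
```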

What was once just a cache has evolved into a versatile serving platform. Redis Streams add lightweight event streaming for use cases where Kafka is too heavy. Redis Search provides full text search over cached data. Redis TimeSeries handles time series with built in aggregation. The discipline is in what you store — Redis is expensive per gigabyte, so use it for the hot 1% of your dataset that serves 99% of traffic.

When to reach for it

Feature serving for ML models. Session data. Real time leaderboards. Rate limiting. Any scenario where your application needs data in under a millisecond.

Part VI

The Map

Every tool has a sweet spot and a danger zone. The trick is matching the tool to the problem, not the other way around.

Batch vs. Streaming: Architecture Decision Guide
Batch stack (minutes-to-hours latency): Airbyte / NiFi → S3 / GCS → Iceberg → dbt + Trino / Spark, orchestrated by Airflow / Dagster
Streaming stack (milliseconds-to-seconds latency): Debezium / producers → Kafka / Redpanda → Flink / ksqlDB → ClickHouse / Redis, running always-on (K8s)

Layer | Batch Stack | Streaming Stack | Key Tradeoff
Ingestion | Airbyte, NiFi, custom scripts | Debezium CDC, direct producers | Connector breadth vs. latency
Transport | Object storage (S3, GCS) | Kafka / Redpanda | Cost vs. real-time capability
Storage | Iceberg, ClickHouse, Trino | Kafka (event log), ClickHouse | Query flexibility vs. write speed
Transformation | dbt, Spark, SQL | Flink, ksqlDB, Spark Streaming | Simplicity vs. latency
Orchestration | Airflow, Dagster, Prefect | Always-on (K8s, managed Flink) | Control vs. operational overhead
Serving | Warehouse queries, caching | Pre-computed views in Redis / ClickHouse | Freshness vs. cost
Part VII

The Art of Combination

Individual tools are interesting. Combinations are where the magic happens. Here are the stacks I have seen work in production, and when each one makes sense.

The Modern Lakehouse

Best for: Analytics first organizations, 10 to 100 TB range, SQL heavy teams

Airbyte → Iceberg on S3 → dbt → Trino → Dagster

This is the stack I recommend most often. Airbyte handles extraction from dozens of sources. Data lands in Iceberg tables on S3, open format, no lock in. dbt transforms raw data into clean models with testing and documentation. Trino provides the interactive query layer. Dagster orchestrates everything and keeps the asset graph healthy. Total infrastructure cost for a mid size company: roughly $500 to $2,000/month on bare metal or Hetzner — a fraction of equivalent managed warehouse pricing, assuming your team can handle the ops overhead.

Sources (APIs, DBs, files) → Airbyte (extract) → Iceberg / S3 (store) → dbt → Trino (transform & query), with Dagster orchestrating the whole graph
The Real Time Analytics Engine

Best for: Product analytics, user facing dashboards, sub second query requirements

Debezium → Redpanda → Flink → ClickHouse → Redis

When latency matters, this is the stack. Debezium captures changes from your operational database in real time. Redpanda provides the durable event backbone with predictable latency. Flink processes, enriches, and aggregates the stream. ClickHouse stores the results for ad hoc analytics. Redis serves the hottest metrics to your application with sub millisecond reads. End to end latency from database change to dashboard: under 5 seconds.

DB changes (INSERT, UPDATE) → Debezium (CDC) → Redpanda (event log) → Flink (process) → ClickHouse + Redis (serve, < 5 s end to end)
The Pragmatist's Hybrid

Best for: Most organizations, growing data needs, teams that value simplicity

Airbyte + Debezium / Redpanda → Iceberg → dbt + Trino, orchestrated by Prefect

The reality is that most organizations need both batch and streaming, but at different levels of urgency. This hybrid approach uses Airbyte for the 80% of data that can tolerate hourly or daily refreshes, and Debezium plus Redpanda for the 20% that needs real time delivery. All data lands in Iceberg for unified querying through Trino. dbt handles the transformation layer. Prefect keeps it all running without the operational overhead of Airflow.

80% batch: Airbyte → Iceberg. 20% real-time: Debezium → Redpanda → Iceberg. Unified query layer: dbt + Trino.
Epilogue

Choosing Wisely

The data engineering landscape in 2026 is richer, more mature, and more accessible than it has ever been. The tools on this list are not theoretical. They are running in production, at scale, right now. Many of them are open source. Most of them can be deployed on modest hardware. All of them reward the engineer who takes the time to understand them deeply.

If there is one lesson that Paul Atreides teaches us, it is this: power without understanding is destruction. He studied the ecology of Arrakis before he attempted to lead. He learned the language of the Fremen before he asked them to follow. The same applies to data platforms. The engineer who grabs the flashiest tool without understanding the forces it must withstand will build a system that collapses under its own weight.

Start with the problem. Understand the forces. Then choose the tool that fits. Add complexity only when the pain demands it. And always, always, read the documentation before the blog post.

See you in the trenches.

Notable Omissions

Delta Lake and Hudi are serious open table formats that compete with Iceberg. Iceberg won this list because of its broader engine support and momentum in 2026, but both are production-worthy.

Snowflake, BigQuery, and Databricks are deliberately excluded. This guide focuses on tools you can self-host and compose freely. Managed platforms are a valid choice, but they represent a different philosophy.

Materialize and RisingWave are fascinating streaming databases that blur the line between stream processing and materialized views. They did not make the cut because their production footprint is still growing, but they are worth watching closely.

dbt Cloud vs. dbt Core: this guide covers dbt as a concept. Whether you run it open source or managed is an operational decision, not an architectural one.

About the Author

Boyan Balev is a Senior Full Stack Engineer with 8+ years building data systems across outsourcing, startups, and enterprise environments. He currently works on MandateWire, a platform that tracks institutional investment mandates, where he recently reduced query times from seconds to 50ms through architectural refactoring and closure table patterns.

He writes about system design, infrastructure economics, and the pragmatic side of engineering at The Trenches.
