CMDS 002: The Frankenstack

Bloat in the data stack is ubiquitous - and a bigger problem than you think

Last week, we talked about how ubiquitously bloated the modern data stack is, and we went over some of this bloat’s ensuing headaches. This week, let’s twist the knife a bit more…because the data stack sprawl is even worse than you think.

Even if you’re not a CTO or an engineering manager, you already know this intuitively. You get tons of inbound marketing emails from reps offering to demo new software. You already have more tools at your disposal than you need. And you probably also have a vague sense that there are even more tools out there - ones that might make your job easier. But who has the time to learn them?

In the best-case scenario, you're looking at a neat menu of options like this:

Or maybe you’re thinking about incorporating AI functionality. Bet you had no idea the landscape was this crowded:

And don't even try showing your Martech team this image:

You get the picture. A data stack shouldn’t look like a Jackson Pollock painting.

How did all this bloat come about? It’s a layered origin story, but it starts with a near-unlimited pool of VC money funding an overwhelming number of options, with no end in sight. As more and more of these VC-funded companies have fragmented the stack into ever-smaller slices, the solutions they offer have grown far more complex. Each company builds on top of another’s innovation, hoping to claim that its tool is the one that can do everything…though of course, none of them can. Repeated across the whole market, this pattern is part of the reason for the bloat.

No single person or company is really to blame for the overabundance of data tools. It’s not your CTO or engineering manager’s fault that the data stack is sprawling, or that you’re struggling to figure out how to manage your relational, NoSQL, MapReduce, cloud, columnar, wide-column, object-oriented, key-value, hierarchical, document/JSON, text search engine, message queue, graph, vector, and time-series databases all at once. Technical leaders are always looking for opportunities to innovate, and vendors are very good at selling (or upselling) their stuff, so Frankenstacks are somewhat inevitable. And it’s not all bad news - competition in the database solutions space has driven major sophistication. But the consequence that we all now face is major complexity.

Convinced that the bloat problem is very real, and bigger than you thought? Good. Now we can discuss what to do about it. We’re building something at Tembo that won’t just add yet another layer to an already packed stack, because we know that more tools won’t solve the sprawl. In future newsletters, I’ll talk about why it will take an entirely new approach to fully untangle the mess we’ve made of our data stacks.

Ry’s Weekly Resources

On Episode 4 of Hacking Postgres, I sat down with Pavlo Golub, co-founder of PostgreSQL Ukraine and a PostgreSQL expert at Cybertec. We talked about pg_timetable, an advanced standalone job scheduler for Postgres that Pavlo created.