CMDS 001: The Sprawling Modern Data Stack

How to avoid laying off 80% of your employees

There are scores of VC-funded data tools on the market for every layer of the data stack. That's not inherently a problem, but it creates some. A sales team's job is to make its product indispensable. Salespeople are very good at what they do, and data teams can be sold to - and sometimes, they buy.

These tools often start with one core feature, then grow adjacent features to support the "land and expand" sales approach: buy a data warehouse tool, say, and eventually add on data transformation or dashboarding functionality. Before they know it, customers have multiple places to do the same work because their data stack tools overlap - not an ideal situation, because developers start using the secondary features of some tools, and similar workloads end up running in multiple places.

A bloated data stack becomes a contagious infection in larger organizations. Different data teams (tribes) adopt different tools to set their tribe apart, and data becomes increasingly siloed. To combat this fragmentation and reduce the errors that stem from siloed workflows, companies buy even more software for data monitoring, provenance, lineage, cataloging, and pipelining. Monitoring and data quality assurance are necessary regardless of the complexity of your stack, but the more tools you use, the higher the risk. The Notorious B.I.G.'s "Mo Money Mo Problems" applies here as well. A bloated data stack goes hand-in-hand with increased costs and complexity, and it creates an urge for new teams to silo themselves away from the mess (which exacerbates the mess).

How do you solve this issue? In theory, don't let your data stack bloat in the first place…but if you're already past the point of no return, another option is layoffs. When Elon Musk bought X/Twitter, he found scores of microservices running with unknown value. So he turned them off, one by one, to see whether anything bad happened to Twitter's performance. When nothing bad happened, he had his answer: the services were extraneous. And the corresponding personnel were, therefore, also extraneous. It's reported that he cut roughly 80% of the Twitter team - it was hard for those services to defend themselves once their creators and operators were gone. This is an ugly road to travel, but if your modern data stack bloats sufficiently, you might end up here.

Modern data stack bloat is a ubiquitous problem in the industry. Powerful venture capital engines fuel it. But it forces companies to deal with increased costs and overcomplicated processes, and it lays the groundwork for future heartache. The work we're doing at Tembo is an attempt to attack this problem. Using reliable ol' Postgres as a data platform, rather than just as a database, is at the core of our thesis.

Is your modern data stack bloated? If so, which tools would you jettison first? Let me know on Twitter.

Ry’s Weekly Resources

If you're looking for an Elasticsearch alternative, look no further than ParadeDB - built with Postgres and Tantivy, a Rust-based search engine library inspired by Apache Lucene.

Hacking Postgres - stay up to date with the latest in Postgres via a new podcast I'm hosting

Did you know George R. R. Martin wrote a vampire novel? (highly recommended, I listened to it on Audible)

If you're looking to publish a website, product documentation, and a blog together, use http://docusaurus.io (the tembo.io site uses it!)

Evidence.dev - cool "Business Intelligence as Code" product that I'm going to try out soon

Thanks for reading. If you have questions you’d like me to address, reply to this email and I’ll do my best to answer them in future emails.

Sincerely,

Ry Walker
CEO @ Tembo.io

Want to take things further?

Tembo is the Postgres developer platform for building every data service. Our mission is to collapse the database sprawl of the modern data stack into a unified developer platform. With Tembo, developers can quickly create and deploy specialized data services using Stacks (pre-built Postgres configurations) without complex builds or additional data teams.