Delta lakehouse

11/22/2023

That’s right, we’ve entered a new decade, so it’s time for a new buzzword to define what we’re doing… right? Enter the Data Lakehouse.

Now, this term isn’t entirely new. We’ve been talking about data lakes and data warehouses together for the better part of the last decade, so it was inevitable that people would portmanteau the two, especially as we commonly build hybrid architectures that we refer to as a “Modern Data Warehouse”. Given we already have this phrase, I want to dig a little into why a new term might come about and whether it’s worth paying attention to.

A lot of the current hype is down to a recent post by Databricks themselves. The impressive lineup of co-authors (O’Reilly’s Chief Data Scientist Ben Lorica and Databricks’ terrifying braintrust of Armbrust, Ghodsi, Xin, and Zaharia) speaks volumes about how much weight Databricks are putting behind this term. They’re not the first to use it: it’s been thrown around by Snowflake and Amazon in the past two years, the first notable mention being way back in August 2017.

So let’s see: if we’ve been building hybrid solutions of lakes and warehouses for many years now, is this just a new term for the same thing? I’m going to give it the benefit of the doubt and say actually… no. What we’re talking about is a shift in thinking that has been driven by technological advances but, crucially, isn’t ready yet.

First, the Modern Data Warehouse, which is an architectural pattern more than anything. Everyone who has managed a large data warehouse knows that there comes a tipping point when the whole thing becomes slow and inflexible. In software engineering terms, most data warehouses are gigantic monolith solutions. However, data is moving faster than a traditional warehouse can handle, hence the adoption of data lakes, as I’ve argued previously.

Data lakes themselves lack some of the more mature features we need for doing financial reporting and driving critical business decisions. Full data access audits, row-level security, dynamic data masking and many hundreds of other enterprise security features we find in mature relational stores simply aren’t easily available in data lake solutions currently. That’s what led us to build solutions following our reference architecture; a simplified version might look like this:

[Figure: simplified reference architecture]

Simply put, if you want the best of both worlds, then you need to include both worlds within your solution. One of the key elements of this is the movement of data between the lake layer and the warehouse layer. We’re picking data up from its distributed, analytics-specialized format and inserting it into a different distributed, analytics-specialized format.
