Azure Mythbusters: I don’t need a Data Warehouse now that I have a Data Lake


There are a lot of benefits to a Data Lake: you can store all of your data, all of the time, without worrying about scale. Data can also be ingested very quickly thanks to approaches like schema-on-read, where structure is applied when the data is queried rather than when it is written, so ingestion is no longer a bottleneck. A Data Lake also opens up kinds of analytics that weren’t always possible in old-style data warehousing, and it’s a cheap way for an enterprise to store all of its data.
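To make schema-on-read concrete, here is a minimal sketch in Python. It is illustrative only: the sample records, the `schema` mapping and the `read_with_schema` helper are all hypothetical and are not part of any Azure API. The point is that raw records are stored exactly as they arrive, and a schema is only applied at the moment someone reads them.

```python
import json

# Raw events land in the "lake" exactly as they arrive; nothing is validated on write.
# (Hypothetical sample data for illustration.)
raw_lines = [
    '{"user": "alice", "amount": "12.50", "country": "UK"}',
    '{"user": "bob", "amount": "7"}',
    '{"user": "carol", "amount": "3.25", "channel": "web"}',
]

# Hypothetical schema chosen by the reader at query time: field -> (type, default).
schema = {"user": (str, ""), "amount": (float, 0.0), "country": (str, "unknown")}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project each record onto the given schema."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            # Missing fields get a default; extra fields are simply ignored.
            row[field] = cast(record[field]) if field in record else default
        rows.append(row)
    return rows


rows = read_with_schema(raw_lines, schema)
total = sum(row["amount"] for row in rows)
print(rows)
print(total)
```

Because nothing is enforced when the data is written, records with missing or extra fields are still accepted, and a different reader could apply a different schema to the same raw lines later on.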

Traditionally, Data Warehouses were thought to be slow and inflexible. This is down to process as much as it’s down to technology. Typically, Data Warehouses are managed by IT teams who work to SLAs. Over the past four or five years, Hadoop and Spark systems have been built as Data Warehouse replacements because of how flexible they are. The issue is that you end up back in the same process: once you start to productionise the new system, it goes back to IT and you end up with the same problems.

This isn’t just a process problem, however, as there are some technology issues behind a Data Warehouse too. Older Data Warehouse appliances are often fixed in terms of storage and compute, which makes them difficult and costly to scale, especially on an old on-premises system. It can take a long time to get new capacity.

This video, by Pratim Das and Greg Loxton, is the first in a new series that aims to bust some of the myths surrounding Azure technologies. With their expertise, you’ll be able to gain a deep understanding of Azure technologies, why the myths are there in the first place, and what their recommendations are for you.

Stay tuned for the next instalment of Azure Mythbusters! Until then, here are some useful resources for learning more on the topic of Data Warehouses and Data Lakes.