The data lake is by now a familiar term and concept across the enterprise, as is the data warehouse.
But what about the “data lakehouse”?
As its name suggests, the emerging architecture fuses the data lake’s role as a repository for raw data with the business intelligence (BI) reporting and analysis of the data warehouse.
Also referred to by the less catchy name “unified analytics warehouse,” this architecture can manage an organization’s full complement of structured, semi-structured and unstructured data. It can support many different data workloads and can be deployed on top of low-cost cloud storage systems.
“It comes down to insight,” said Jeff Denworth, cofounder and chief marketing officer of flash memory data storage company VAST Data. “It provides one comprehensive view across an entire data estate.”
To provide this vantage point for companies and enable real-time queries at scale, VAST Data has partnered with database company Vertica. The partnership, announced today, unites VAST Data’s all-flash Universal Storage data platform with Vertica’s Eon Mode Architecture to create an all-flash data lakehouse. This helps enterprises consolidate their structured and unstructured data silos and democratize data for real-time exploration, analytics and insights, Denworth said.
“Customers can start to run a lot more queries,” he said. “They can get much faster query responses.”
Modern data market: Faster, stronger and better all around
The market for managing big data only continues to grow as organizations amass data on larger and larger scales. Global Industry Analysts, Inc., has forecasted it to reach $234.6 billion by 2026.
Growing right along with it is the list of companies supporting data lakehouse architectures. These include big data giants Snowflake and Databricks, as well as Oracle Cloud Infrastructure (OCI) and Google, which launched a preview of its BigLake engine at its Cloud Data Summit earlier this month. Onehouse emerged from stealth in February with its open-source data lakehouse; Dremio recently raised $160 million in a Series E round and in March released a free edition of its SQL lakehouse.
Databricks, which was founded in 2013 and reports an estimated $38 billion post-money valuation, has said that 5,000 global organizations use its Databricks Lakehouse Platform. Another vendor, Upsolver, is helping companies such as Peer39 with page-level intelligence under GDPR/CCPA compliance and ironSource with collecting, storing and preparing data to support multiple use cases.
Gerrit Kazmaier, vice president and general manager of Databases, Data Analytics and Business Intelligence at Google Cloud, said of its decision to enter the market: “Managing data across disparate lakes and warehouses creates silos and increases risk and cost, especially when data needs to be moved.”
Stacking up against the competition
With their partnership, VAST and Vertica aim to provide a unique offering in a growing field of competitors.
As Denworth pointed out, one immense data problem for enterprises is compartmentalized storage. Historically, companies have built multiple data warehouses and multiple data lakes, resulting in silos. Then, when they have asked a question of their data, he said, they have not necessarily received relevant responses, or they have faced very slow response times because of how their data is organized.
“If you want to ask a question and get the greatest possible answer, you really need to look across all of the data as it comes in an organization,” Denworth said. “Historically, that’s been really challenging because no one system is designed to essentially see everything.”
The data lakehouse is thus designed to provide better insight by taking a broader view of data. This is an incredibly valuable tool, he said, for big data teams and data science organizations that are trying to be broader and more flexible in their data analysis.
“Now you don’t have to copy data from department to department to department,” Denworth said. “You just make these stateless servers and they all have access to the same data underneath.”
For instance, the new VAST-Vertica lakehouse is being used by Singapore-based online travel agency Agoda to support and enhance its recommendation engine. A mobile casino game company is also using the architecture for its recommendation capabilities.
Typically, Denworth said, organizations think they have to go to the cloud to get the best lake-warehouse solution. Or, if they look on-premises, their options are systems that are large and slow, or small and “very expensive and very fast.”
“Flash is something that marries both: big, cheap and fast,” he said.
Customers transitioning to the VAST-Vertica lakehouse model can save 80% to 90% while scaling capacity by a factor of 100, Denworth said. He emphasized that the average data warehouse holds tens of terabytes, whereas companies that move beyond the data warehouse model to the lakehouse model routinely speak in terms of petabytes.
“Our customers are directing huge, huge datasets into the system,” Denworth said.