Data lakes: definition and ROI
Do data lakes have an ROI? Before trying to answer, let’s recall what a data lake is. If you are asking yourself either question, this article is for you.
What is a data lake?
Data lakes are data storage platforms intended to bring the company into the modern data age. They are often the technological answer to numerous corporate expectations.
The promises of Big Data and the development of more efficient use cases, seen as a vector of business transformation, have pushed many companies to embark on ambitious data lake projects.
Communication around AI, process automation, augmented advisors, and AI-augmented marketing has helped accelerate the deployment of data lakes.
Do data lakes have an ROI? (or, Who is chasing data lakes?)
Substantial investments in data lakes
These new data platforms represent significant investments. The market has matured since the open-source vision of the early years, and platform vendors now have structured, tailored offerings that provide the necessary security features.
After an investment of more than 100 million euros, several months of work for the IT teams, and many twists and turns (GDPR compliance, security, etc.), the long-awaited data ingestion phase finally arrives.
In a data lake, data can be stored in structured, semi-structured, or unstructured form for later use, unlike a data warehouse, which mainly holds processed, structured data.
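The contrast between raw storage and processed storage can be illustrated with a minimal sketch. Everything in it is hypothetical and for illustration only: the `raw_zone` list stands in for the lake, the payloads and field names (`customer_id`, `amount`) are invented, and `read_orders` is just one possible way to apply a structure at read time.

```python
import csv
import io
import json

# Hypothetical raw payloads, kept exactly as they arrived (the "lake" side).
raw_zone = [
    {"format": "json", "payload": '{"customer_id": 1, "amount": "19.90"}'},
    {"format": "csv", "payload": "customer_id,amount\n2,5.00\n"},
    {"format": "text", "payload": "Customer 3 called about a late delivery."},
]


def read_orders(zone):
    """Apply an order schema at read time; skip payloads without that structure."""
    orders = []
    for item in zone:
        if item["format"] == "json":
            rows = [json.loads(item["payload"])]
        elif item["format"] == "csv":
            rows = list(csv.DictReader(io.StringIO(item["payload"])))
        else:
            continue  # unstructured text stays in the lake for other uses
        for row in rows:
            orders.append(
                {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
            )
    return orders


# The "warehouse" side: only processed, typed rows.
warehouse_orders = read_orders(raw_zone)
print(warehouse_orders)
```

In this sketch the lake keeps all three payloads untouched, while the warehouse-style result contains only the two records that could be given a typed structure.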
But how profitable is it?
After this long waiting phase, the urgency of making the data lake operational means that data management practices are often forgotten, or relegated to “we’ll see later” status.
At this point, the company has a space where it can duplicate all of its data, but where creation and management costs continue to rise.
This rapid duplication of company data, often carried out without deploying data governance practices, brings immediate additional cost and management difficulty (keeping the information synchronized, identifying the right data source, etc.). Above all, though, it raises the question of the profitability of these platforms.
These technological solutions were created to provide a service, and thus to ensure the continuity of existing uses. It is now imperative to deliver the new, long-awaited use cases envisioned during the previous weeks, and this is when new difficulties can arise.
The resulting use cases tend to fall into one of two extremes:
- simple outputs that validate the platform and skills, but that add little value; and
- use cases that are too complex for the company’s maturity and its level of control over its data assets.
Data lakes and business transformation
Various studies show that 80% of the hoped-for use cases never make it to production. The long-awaited AI use cases still seem far away, and the positioning of the data lake raises questions.
So, once again, we need to go back to basics and take a step back from the technology. The data lake is only a means to support the company’s transformation. That transformation must be underpinned by a reflection on the business model to deploy, on the role the data lake plays in it, and on the associated operating model.
The success of these types of transformations requires complete alignment of:
- the organization;
- its processes;
- its employees;
- its data.
This alignment is the only way to get the most out of these investments and to ensure that all of the company’s functions are working towards a common goal. It also gives a precise view of the ROI, which makes it possible to absorb the costs of this new infrastructure.