Measuring Your Data Lake's Worth: Petabytes or People?

The idea of a data lake has been part of IT conversations for at least five years now, with numerous Australian heavyweights from NAB to Woolies and Jetstar adopting the model as part of their Big Data architecture. These investments typically run into the tens of millions of dollars and take substantial time and effort to spin up.

Yet for all this investment of time and resources, most people still cannot clearly or confidently describe the value their data lake provides. Instead, they usually end up talking about how big it is – how many petabytes of storage it holds – a metric that says nothing about its value to the business. Not only does this give a false impression of value, it encourages the organisation to collect data indiscriminately, with little thought to how the business might put that information to good use. At Domo, we increasingly hear anxiety from IT executives about the cost of scaling their data lakes by the petabyte.

So how do we more effectively ascribe value to a data lake?

One simple method is to look at how many people regularly access the data lake. If only a few users frequently query the platform, it is probably not providing much value. Delving deeper into why people are or are not accessing the data lake will then reveal how best to improve its utility.
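
To make this concrete, here is a minimal Python sketch of one way to measure it, assuming your platform can export a query audit log as a CSV with user_id and query_timestamp columns (both names are assumptions, not any specific product's schema):

```python
from collections import defaultdict
import csv
from datetime import datetime

def monthly_active_users(audit_log_path):
    """Count distinct users who queried the lake each month.

    Assumes a CSV audit log with 'user_id' and ISO 8601
    'query_timestamp' columns; adjust to whatever your
    platform actually exports.
    """
    users_by_month = defaultdict(set)
    with open(audit_log_path, newline="") as f:
        for row in csv.DictReader(f):
            ts = datetime.fromisoformat(row["query_timestamp"])
            users_by_month[(ts.year, ts.month)].add(row["user_id"])
    return {month: len(users) for month, users in sorted(users_by_month.items())}
```

A flat or shrinking count over several months is an early warning that the lake is under-used, and a prompt to ask why.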

If the problem is that only a small percentage of staff are authorised to access the lake, you need to make fundamental changes to deliver data to the other 99%. Typically this involves one or more of the following: creating a simple business catalogue of the data; adding a robust governance layer to enforce access and compliance; and implementing self-service analytics at scale on top of the data lake.

As we evaluate the data lake's value further, we might augment this simple usage metric with some qualifiers, such as the number of processes or services the data lake powers. However, the fundamental principle remains the same: how “useful” is your data lake on a functional, everyday level? Who is using it? And more importantly, what are they using it for?

A successful data lake should support a wide variety of use cases that expand according to what the business and its teams need at different times and seasons. Perhaps one user can better predict the risk of churn and improve retention – something we have seen equate to hundreds of millions of dollars in market value at a telco. Perhaps another uses the data to develop new services that complement your business's core offerings, based on underlying customer behaviours.

Understanding which use cases drive value can then guide IT on what data to include, how to better design the platform, and where to introduce new functionality or retire what goes unused. Over time, that makes for a data lake that keeps growing in relevance to more users and in overall value to the business.

We often take our customers through this four-stage Business Value Assessment process to help them work out their data lake's financial value (a worked sketch of the arithmetic follows the list):

  1. Discover use cases of the lake that are already generating business value;
  2. Calculate the measurable annual realised benefit of each use case;
  3. Total these benefits to derive the overall financial gain of the lake; and
  4. Present that gain against the total cost of ownership (TCO) of the data lake, ideally extrapolated over the next five years or a similar horizon.
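
As a rough illustration of steps 2 to 4, here is a minimal Python sketch of the arithmetic; the use cases, benefit figures, and TCO are invented for illustration and are not benchmarks:

```python
# Hypothetical figures for illustration only.
HORIZON_YEARS = 5

# Step 2: measurable annual realised benefit per use case (AUD).
use_case_benefits = {
    "churn-risk prediction": 1_200_000,
    "marketing spend optimisation": 450_000,
    "supply-chain forecasting": 300_000,
}

# Step 3: total the benefits, here extrapolated over the horizon.
annual_benefit = sum(use_case_benefits.values())
total_benefit = annual_benefit * HORIZON_YEARS

# Step 4: present the gain against the lake's TCO over the same horizon.
tco = 6_000_000  # assumed five-year total cost of ownership
net_gain = total_benefit - tco

print(f"Annual benefit:       ${annual_benefit:,}")
print(f"{HORIZON_YEARS}-year benefit:       ${total_benefit:,}")
print(f"Net gain against TCO: ${net_gain:,}")
```

Even this back-of-the-envelope version shifts the conversation from petabytes to payback.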

Join me as I take to the stage at Domo: Reimagine to discuss how business leaders are effectively tapping into their data lakes, the four stages of Business Value Assessment, and strategies to ensure your data lake is constantly improving in relevance for both the business and your customers.

Try Domo now.

Watch a demo.