Every CIO and data leader knows that within an enterprise there is “light” data and there is “dark” data. Light data is visible, well understood and managed.
Any data that is managed by IT and has been integrated into the enterprise data warehouse is considered light data. It is generally well formed, well understood and has an established business case that justifies analyzing it within a data warehouse. It tends to be highly structured, based on a data taxonomy, customer data model or schema that sits in a single environment. This, by design, makes it more rigid and harder to use when speed to insight matters.
Dark data, on the other hand, is data that isn’t managed within an IT-approved data architecture with rigorous, rules-based technical workflows and business processes. Dark data is often collected but not used, such as operational logging data that is archived in a data lake but never accessed.
According to one study by IBM in 2018, more than 80% of all data is dark and unstructured, with the figure projected to rise to 93% by the end of this year.
However, the standard definition of dark data often ignores a much more valuable type of dark data: data that is already in use (such as offline data in spreadsheets or third-party systems) but that isn’t integrated into the enterprise’s data architecture. This segment of dark data carries greater risk and more missed opportunity because, unlike data that hasn’t yet been exploited, it is essential to many business functions but is being managed and processed in ways that the IT department wouldn’t approve. As well as the security issues associated with unmanaged data, it can also lead to conflict between teams that understandably need this data to operate.
In this article we look at the types of dark data that are essential to the enterprise, the reasons why they are left in the dark, and possible ways to augment and complete the data warehouse so that it finally delivers on the vision of a “single source of truth”.
In the modern enterprise, why is there still dark data?
With the pace and variety of work that goes on today, every business unit will have examples of important data that they rely on to manage their workflows — often in much larger volumes than earlier generations of data: social or programmatic advertising data for marketing, CRM data for sales, workflow and personnel data for HR, and many others.
Formerly it would have been acceptable for such data to exist in silos, but data leaders now recognize the power of joined-up data and of sharing data sources with a wider group of people.
There are also types of data that don’t suit the enterprise data warehouse because there is too much of it, it is too unstructured, or it is not a high enough priority to get time from IT departments and engineers. Meanwhile, business users hit roadblocks in their day-to-day work, and IT is swamped with data requests and support tickets that compound, racking up more technical debt.
Problems caused by dark data
If dark data is not handled well, it can result in many organizational problems. Firstly, because this data is essential to business users, they will continue to use it, and may go rogue by adopting separate BI tools of their own choice, since these often don’t need IT’s approval and can connect to data much faster.
Secondly, analysts spend up to 80% of their time seeking out and managing data that business users need but that isn’t in primary systems, which makes this a time-consuming and expensive operation.
Thirdly, dark data can lead to shadow analytics, where more tools are used separately from the main systems, and to a data governance and certification nightmare if the right tools or level of visibility are missing.
Finally, data that is taken out of the data warehouse loses the security controls and accuracy checks it had there, which could have a detrimental effect on the business. Some businesses fail because they lose access to the information that they need to reach potential clients, or because existing clients start to question the company’s trustworthiness.
Ways to integrate dark data within an existing data infrastructure
It is possible to integrate typical dark data without the overhead, and have it work alongside the enterprise data warehouse (EDW), by using Domo’s solutions.
Pre-built connectors take away the bottlenecks of ingesting data and give business users more flexible, easier-to-use ETL and data pipeline tools of a kind that traditionally remained locked in an IT sandbox.
User-friendly systems allow data pipelines to be managed by regular business users if desired, while still providing full security and management, just like the main EDW.
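To make the idea of such a pipeline concrete, here is a minimal, generic sketch in Python. It is not Domo’s API: it uses only the standard library, with an in-memory SQLite database standing in for the warehouse, and every table, column and value in it is hypothetical. It loads a spreadsheet export into a staging table, joins it to warehouse data, and lists the spreadsheet rows that have no match in the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical spreadsheet export kept by a marketing team (the "dark" data).
SPREADSHEET_CSV = """customer_id,campaign,ad_spend
101,spring_promo,250.00
102,spring_promo,125.50
104,retargeting,80.00
"""


def build_warehouse(conn):
    """Create a stand-in warehouse table holding governed customer data."""
    conn.execute(
        "CREATE TABLE customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(101, "Acme", 1200.0), (102, "Globex", 300.0), (103, "Initech", 950.0)],
    )


def load_dark_data(conn, csv_text):
    """Parse the spreadsheet export, type its columns, and stage it in the database."""
    conn.execute(
        "CREATE TABLE campaign_spend "
        "(customer_id INTEGER, campaign TEXT, ad_spend REAL)"
    )
    rows = [
        (int(r["customer_id"]), r["campaign"], float(r["ad_spend"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO campaign_spend VALUES (?, ?, ?)", rows)
    return len(rows)


def join_spend_to_revenue(conn):
    """Join staged spreadsheet rows to warehouse customers on customer_id."""
    return conn.execute(
        "SELECT c.name, s.campaign, s.ad_spend, c.revenue "
        "FROM customers AS c "
        "JOIN campaign_spend AS s ON s.customer_id = c.customer_id "
        "ORDER BY c.customer_id"
    ).fetchall()


def unmatched_spend(conn):
    """Return spreadsheet rows whose customer_id is unknown to the warehouse."""
    return conn.execute(
        "SELECT s.customer_id, s.campaign, s.ad_spend "
        "FROM campaign_spend AS s "
        "LEFT JOIN customers AS c ON c.customer_id = s.customer_id "
        "WHERE c.customer_id IS NULL"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    load_dark_data(conn, SPREADSHEET_CSV)
    for row in join_spend_to_revenue(conn):
        print(row)
    print("unmatched:", unmatched_spend(conn))
```

The unmatched-rows check is the part that tends to matter for governance: it surfaces records that exist only in the spreadsheet, which is exactly the data a warehouse-only view would miss.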
Domo’s solutions provide the ability to join this dark data together with structured business data in your BI tool of choice. Domo’s capabilities augment how data is connected and transformed, visualized and analyzed, as well as extended across and beyond organizations through data, apps and workflows.