What is Dark Data?

Every moment, tens of thousands of sensors are collecting data about us and the world around us. We are constantly generating data, from the phones in our pockets to the devices we use at work.
Machines are also producing huge amounts of data, from the sensors that track our cars’ performance to the industrial machines that keep factories running.
Organizations are sitting on a vast pool of data, and it will only continue to grow. One widely cited estimate puts global data generation at 175 zettabytes per year by 2025.
That enormous volume of data provides a wealth of insightful information that businesses can utilize to enhance their operations and provide better customer service. But most organizations are only using a small fraction of it.
By some estimates, less than 1% of all data is ever analyzed and used. The rest sits unanalyzed and unprocessed, collecting digital dust in databases and data warehouses. This is what’s known as dark data.
What is Dark Data?
Dark data refers to untapped information collected during routine business activities that remains unanalyzed and unused. Examples include web server logs, customer call recordings, IoT telemetry, chat transcripts, and legacy archives stored in data lakes, file shares, or warehouses without being queried.
Why Dark Data Matters
Dark data isn’t just unused—it can lead to considerable challenges:
- Missed Opportunities: Valuable insights and revenue potential lie dormant.
- Increased Costs: Even unused data requires storage, backups, and management.
- Security & Compliance Risks: Unmanaged sensitive data is vulnerable to breaches and regulatory violations.
Why Data Goes Dark
Dark data doesn’t accumulate overnight—it’s often a byproduct of common business practices:
- Lack of Awareness: Teams may not know certain data exists or assume it’s not useful.
- Data Silos: When departments manage data separately, valuable context is lost.
- Legacy Systems: Older platforms store years of records in outdated formats incompatible with modern tools.
- Incomplete Integration: Disconnected data pipelines can leave some data out or inconsistent.
- Shifting Priorities: As strategies evolve, older data gets ignored, even if it holds useful insights.
- ROT (Redundant, Obsolete, Trivial) Data: Duplicate files, outdated reports, and unnecessary copies bloat storage without adding value.
Understanding why data goes dark is the first step in creating a strategy to keep it active and valuable.
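One of the causes above, redundant copies, can be checked mechanically. The following is a minimal sketch, assuming the files live on a local or mounted file system; the function names are illustrative and not part of any particular product. It groups files whose contents are byte-for-byte identical by comparing SHA-256 digests.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of candidates for consolidation. Whether a copy is truly unnecessary is still a judgment call for the data owner.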
Examples of Dark Data
Organizational Examples
- Web server and application log files
- Customer service call recordings and chat transcripts
- IoT sensor data and product usage telemetry
- Raw survey responses and open-text feedback
- Email attachments, old presentations, and legacy documents
- Past employee records, geolocation data, and badge swipes
Personal Examples
- Unused photos and videos
- Step tracker or activity history
- Old downloads and outdated document versions
Structured vs. unstructured vs. semi-structured dark data
Structured Dark Data
Organized in rows and columns (like CRM data, ERP records, or transaction tables) but often locked behind permissions or outdated schemas.
Unstructured Dark Data
Free-form content like emails, PDFs, chats, images, audio, and videos. Requires enrichment or transcription to extract value.
Semi-Structured Dark Data
Data with tags or fields but no fixed schema (e.g., JSON, XML, HTML, or sensor payloads). It’s searchable but needs further modeling for analysis.
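As an illustration of the "further modeling" that semi-structured data needs, here is a minimal sketch that flattens a nested JSON payload into a single-level record that could be loaded as a table row. The payload and its field names are hypothetical.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical sensor payload; the field names are made up for illustration.
payload = (
    '{"device": {"id": "pump-7", "site": "plant-a"},'
    ' "reading": {"temp_c": 71.5}, "ts": "2024-01-01T00:00:00Z"}'
)
row = flatten(json.loads(payload))
# row now has the keys: device.id, device.site, reading.temp_c, ts
```

Lists inside the payload are left as-is in this sketch. Real pipelines usually need a rule for them, such as exploding each list into separate rows.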
Turning Dark Data Into Value
Understanding the format is key to selecting the right tools—whether it’s data catalogs, ETL/ELT processes, natural language processing (NLP), or computer vision. Unlocking the potential of dark data not only boosts efficiency but also provides key insights for smarter decision-making.
Challenges in leveraging dark data
Working with dark data comes with its own challenges. Knowing what they are leaves you better prepared to harness its potential.
- Storage demands: Dark data can take up a lot of space, which makes it expensive for organizations to store.
- Lack of expertise: Many organizations don’t have the personnel or the skills to work effectively with dark data.
- Connecting to source systems: When analysts need data, they often face hurdles with IT in getting access to it. This keeps them from making informed business decisions and from having an impact. They spend their time chasing down data sets, and the dark data stays unusable.
- Difficulty accessing dark data: One of the biggest challenges of working with dark data is simply accessing it. Dark data is often unstructured and spread out across different systems. This makes it difficult to collect and analyze.
- Pulling data off of central systems: When analysts finally have access, they pull those datasets offline. IT loses visibility into that data and can no longer properly practice data governance. This causes issues around privacy and security.
- Privacy concerns: Dark data may contain sensitive information, so organizations must take care to protect it.
- Making data anonymous: Online security and privacy concerns could surface when you’re dealing with data that involves personal information. One way to address this issue is to anonymize customer data by removing names, account numbers, or any other identifying information that points to a specific person.
- Ensuring data security: Storing data for long periods could compromise sensitive data such as proprietary information, clients’ financial records, employee personal data, and more. Processing dark data helps you identify what you can keep and remove, and it compels you to establish data protection and online security measures for important information.
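The anonymization step described above can be sketched as follows. This is a minimal example under stated assumptions: the field names are hypothetical, and replacing account numbers with a keyed hash (so that records can still be joined) is an addition for illustration. Strictly speaking, that is pseudonymization rather than full anonymization. To match the text exactly, move account_number into DROP_FIELDS.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to the real schema.
DROP_FIELDS = {"name", "email", "phone"}
TOKEN_FIELDS = {"account_number"}


def anonymize(record: dict, secret: bytes) -> dict:
    """Drop direct identifiers and replace join keys with keyed hashes."""
    cleaned = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # remove the identifying field entirely
        if key in TOKEN_FIELDS:
            token = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[key] = token.hexdigest()[:16]  # stable, non-reversible token
        else:
            cleaned[key] = value
    return cleaned
```

The same secret always produces the same token for a given account number, so anonymized records from different sources can still be matched. The secret should be stored separately from the data.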
The hidden costs of dark data
- Storage costs: Cloud and on-prem capacity, backups, and replication add up.
- Liability costs: Regulations (e.g., GDPR, CCPA) apply whether data is used or not. Retaining beyond policy invites penalties.
- Opportunity costs: Valuable signals (churn risk, product issues, demand trends) sit unused.
- Efficiency costs: People waste time searching across fragmented sources; decision cycles slow.
- Risk costs: Unmanaged sensitive data increases breach impact and reputational harm.
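Liability from retaining data beyond policy is one cost that a simple automated check can help surface. Below is a minimal sketch, assuming each record carries a category and a creation date. The categories and retention windows are hypothetical placeholders, not legal guidance.

```python
from datetime import date, timedelta

# Hypothetical retention windows, in days, per data category.
RETENTION_DAYS = {"call_recording": 365, "web_log": 90}


def past_retention(category: str, created: date, today: date) -> bool:
    """Return True when a record is older than its category's retention window."""
    limit = RETENTION_DAYS.get(category)
    if limit is None:
        # No policy defined for this category: do not guess.
        return False
    return today - created > timedelta(days=limit)
```

Records with no defined policy return False here on purpose. In practice they are worth listing separately, since data with no retention rule is itself a sign of dark data.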
Leveraging dark data
Despite the challenges, transforming dark data can be extremely beneficial. Organizations that use it effectively gain a competitive advantage. It starts with connecting to the right source systems, giving analysts appropriate access to datasets, making it simpler to clean and prep data, and providing workflows that automate business processes right where work gets done.
As with regularly analyzed data, organizations can analyze dark data to track key metrics and performance indicators. By monitoring these metrics, organizations can make better decisions about where to allocate their resources.
Data such as website clickstream data, social media posts, and customer surveys can all be used to improve decision-making. Organizations that are able to effectively utilize an end-to-end data platform with an ETL (extract, transform, load) layer for data pipeline management are at an incredible advantage in harnessing their dark data. Those who don’t will struggle to keep up with the competition.

Data applications for dark data
Data apps with ETL data pipeline capabilities can be used to effectively mine dark data and extract valuable insights.
Organizations can use BI (business intelligence) tools to collect dark data from different sources, clean it up, and then analyze it. This process can be automated, which makes it easier for organizations to work with dark data.
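The collect, clean, and analyze steps can be sketched as a tiny extract-transform-load pipeline. This is an illustrative sketch in plain Python, not the API of any BI product. It assumes web server logs in Common Log Format, and the sample lines are made up.

```python
import re
from collections import Counter

# Matches the request and status portion of a Common Log Format line.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def extract(lines):
    """Extract: yield raw, non-empty lines from any iterable source."""
    for line in lines:
        if line.strip():
            yield line


def transform(lines):
    """Transform: parse each line, skipping ones that do not match."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            yield {"path": match["path"], "status": int(match["status"])}


def load(rows):
    """Load: aggregate into a summary that a dashboard could read."""
    return Counter(row["status"] for row in rows)


# Made-up sample lines, including an empty one and an unparseable one.
sample = [
    '127.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 512',
    '127.0.0.1 - - [01/Jan/2024:00:00:02 +0000] "GET /missing HTTP/1.1" 404 0',
    "",
    "garbled line",
    '127.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "GET /pricing HTTP/1.1" 200 512',
]
summary = load(transform(extract(sample)))
```

Because each stage is a generator, the same three functions work on a file handle or a stream as well as on a list, which is what makes the process easy to automate.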
BI tools can also visualize dark data, making it easier for organizations to understand and interpret. This is a valuable way to gain insights that would otherwise stay hidden. Data apps go one step further, providing last-mile assistance in deploying data and making it actionable.
By utilizing data apps, organizations can overcome the challenges of working with dark data and extract valuable insights that can be used to improve their business.
Don't let dark data stay in the shadows
Dark data is a valuable asset that contains a wealth of insights. However, many organizations are not utilizing it effectively. This is often due to the challenges associated with working with dark data.
Organizations can overcome these challenges with BI tools powered by data applications, which collect, clean, and analyze dark data. Automating this process makes dark data easier to work with and lets organizations automate business processes as well. The risk of leaving dark data in the shadows is missing out on insights that could improve the business.