
What Is Dark Data? + 5 Important Types

We’re experiencing a proliferation of smart devices, from smartphones in the hands of consumers to the Internet of Things (IoT) in homes and industries. Around the clock, these devices collect data through sensors and software, which has caused an explosion of data. This exponential growth in information has created the need to “warehouse” data.

Companies can draw on large “lakes” of data to run analyses, draw conclusions, and make business decisions.

The largest public companies in the world, such as Amazon, Google, Facebook, Alibaba, and Tencent, rely on data for business success, and for some of them data is the core of the business. Today, companies of all sizes have to work with data to remain competitive in the marketplace.

However, data is often collected faster than it can be used. The huge amounts of data generated and stored pose a problem: dark data.

What is dark data?

Dark data is data that is stored and available but not put to any practical use. 

There’s nothing ominous about dark data; it’s just a name for data that isn’t in anyone’s purview, much like the far side of the moon, which is simply out of view from Earth.

Data that is stored but not used incurs storage costs. It also carries opportunity costs that are not readily apparent to management or executives. According to IDC, 90% of all data is dark data. IBM estimates the same ratio for industry: 90% of all sensor data collected from industrial applications (like IoT) also goes dark. It’s simply never used.

Why does dark data exist?

Big data coupled with artificial intelligence (AI) can transform an organization.

In the past, organizations relied on limited data and gut feeling to make business decisions, which often proved flawed. Big data and AI address this problem by applying massive computational power to large data sets. Data-driven decisions tend to be more reliable and, in turn, more profitable for organizations.

AI algorithms need large amounts of data to produce reliable results. As a result, companies acquire large swathes of data, and many collect as much as possible and decide what to do with it later.

This leads to a pile-up of unused data in their data warehouses, and the data that is never used becomes dark data.

Another source of dark data is the large amounts of data that cannot be used by conventional analytics methods. Even AI algorithms struggle to analyze data such as text, audio, and video. 

Such data, often called “unstructured data,” is difficult to tag. It is expensive to store and process, and the tools needed to analyze it are still being developed. Thus, many organizations don’t use this data, even though it’s available to them.

Next, let’s dive into the different types of dark data to shine more light on the matter.

5 Types of Dark Data

There’s a large variety of data that becomes dark data. Due to the nature and vastness of dark data, it’s difficult to classify it into neat buckets. 

However, it can be categorized by the origin, type, and other data characteristics. Let’s unpack five main types of dark data. 

1. Regulatory requirement

Different countries have different data protection laws and security mandates. GDPR, HIPAA, CCPA, and PIPL are some of the regulations that service providers must follow. Depending on the legislation, service providers may be required to collect and retain specific data for a set period. Organizations may not need this data themselves, but they have to keep it for legal purposes. Data kept only to satisfy such requirements, and otherwise unused, is dark data.

2. Forgotten data

Data is often collected or sourced for later use. When the data is first acquired, an organization might need only part of it; the rest is warehoused for later. Over time, organizations forget that the data exists and never use it. This is a common type of dark data.

3. Metadata

Data from smartphones and social media has metadata linked to it, such as timestamps, device types, and locations. Some organizations make use of it, but others may not know how. Organizations that don’t use their metadata are warehousing dark data that bleeds money.

4. Time-sensitive data

Time-sensitive data is data that must be used within a certain timeframe to be valuable. For example, a shopper’s location is highly relevant to a coupon company, because it can push coupons based on that location. Once the shopper moves away, the location data is no longer relevant unless it is used for broader analytics.
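To make the idea of a relevance window concrete, here is a minimal Python sketch. It assumes a hypothetical LocationPing record and a 15-minute window chosen purely for illustration. It splits pings into those still fresh enough to trigger a coupon and those that have expired, which could be routed to batch analytics rather than left to go dark.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative assumption: a ping older than 15 minutes is too stale for a coupon.
RELEVANCE_WINDOW = timedelta(minutes=15)


@dataclass
class LocationPing:
    # Hypothetical record shape; field names are for illustration only.
    user_id: str
    store_id: str
    seen_at: datetime


def actionable_pings(pings, now, window=RELEVANCE_WINDOW):
    """Split pings into (fresh, expired) relative to `now`.

    Fresh pings can still drive a real-time action such as a coupon push;
    expired ones are candidates for batch analytics instead of sitting unused.
    """
    fresh, expired = [], []
    for ping in pings:
        if now - ping.seen_at <= window:
            fresh.append(ping)
        else:
            expired.append(ping)
    return fresh, expired
```

The window is a business decision rather than a technical one; a coupon for a coffee shop might warrant a few minutes, while a furniture store might tolerate hours.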

5. Unstructured data

Unstructured data constitutes the largest chunk of dark data. This type of data cannot be easily tagged, categorized, or analyzed, and it comes in a wide range of formats, which makes it tough to structure. Text files, audio data, and video clips are some examples of unstructured data. IDC estimates that by 2025, 80% of all data will be unstructured.
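As a rough illustration of how a team might gauge the unstructured share of a file store, the Python sketch below buckets files by extension. The extension lists are hypothetical and deliberately coarse; a real classification would inspect file contents, not just names.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical, coarse extension lists chosen for illustration only.
STRUCTURED = {".csv", ".parquet", ".avro", ".sql"}
UNSTRUCTURED = {".txt", ".pdf", ".docx", ".mp3", ".wav", ".mp4", ".mov", ".jpg", ".png"}


def classify(path):
    """Label a file 'structured', 'unstructured', or 'unknown' by extension."""
    ext = PurePath(path).suffix.lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in UNSTRUCTURED:
        return "unstructured"
    return "unknown"


def unstructured_share(paths):
    """Fraction of recognized files that are unstructured (unknowns are ignored)."""
    counts = Counter(classify(p) for p in paths)
    known = counts["structured"] + counts["unstructured"]
    return counts["unstructured"] / known if known else 0.0
```

For example, a listing of sales.csv, events.parquet, call_0001.WAV, notes.txt, and lobby.mp4 would yield a share of 0.6, since three of the five recognized files are unstructured.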

How to Use Dark Data

Dark data is not data without any use; it is data an organization has not yet made use of. That can change, because much dark data is a treasure trove of information and insights.

The first step toward uncovering dark data is to make your employees aware of the problem. Once employees are data literate, they can recognize the value of all kinds of data.

The second step is to conduct a thorough audit to uncover dark data. Your organization may not know what dark data sits in its data warehouses, so every corner has to be combed to find it.
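One possible starting point for such an audit, assuming the data lives in files on a filesystem you can scan, is to list files nobody has touched in a long time. The Python sketch below walks a directory and flags files whose last access or modification is older than a threshold; the find_stale_files name and the 365-day default are assumptions for illustration. Many filesystems are mounted with noatime or relatime, so access times can be unreliable, and data held in databases or cloud object stores would need that system’s own access logs instead.

```python
import os
import time

# Illustrative assumption: files idle for a year are flagged for review.
STALE_AFTER_DAYS = 365
SECONDS_PER_DAY = 86400


def find_stale_files(root, stale_after_days=STALE_AFTER_DAYS, now=None):
    """Return (path, days_idle) pairs for files under `root` whose most recent
    access or modification is older than `stale_after_days`, longest-idle first."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort the scan.
                continue
            # Take the later of atime and mtime as "last touched", because atime
            # may not be updated on noatime/relatime mounts.
            last_touched = max(st.st_atime, st.st_mtime)
            if last_touched < cutoff:
                days_idle = int((now - last_touched) // SECONDS_PER_DAY)
                stale.append((path, days_idle))
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale
```

Calling find_stale_files on a hypothetical warehouse mount such as /data/warehouse would return a review list, not a verdict; the output is a prompt for conversations with data owners.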

Next, delete unnecessary data. There’s no need to store data merely in anticipation of future utility. Data that has sat in storage for years without ever being used is a strong candidate for deletion, as long as no regulatory retention requirement applies to it.
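Before anything is deleted, it helps to make the retention check explicit. The Python sketch below, built around a hypothetical DatasetRecord catalog entry, lists datasets that have been idle for at least two years (an illustrative threshold) and skips any still under a retention requirement. It only produces a list for review; it does not delete anything.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DatasetRecord:
    # Hypothetical catalog entry; field names are illustrative only.
    name: str
    last_used: date
    retain_until: Optional[date] = None  # set when a regulation requires retention


def deletion_candidates(records, today, idle_years=2):
    """Dry run: return names of datasets idle for at least `idle_years`
    (approximated as 365-day years) that are not under a retention hold."""
    cutoff = today - timedelta(days=365 * idle_years)
    candidates = []
    for rec in records:
        under_hold = rec.retain_until is not None and rec.retain_until >= today
        if rec.last_used <= cutoff and not under_hold:
            candidates.append(rec.name)
    return candidates
```

Keeping the retention date on the catalog entry means regulatory data from the first category above is never swept up with genuinely forgotten data.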

Finally, align your dark data with business objectives. Ask questions and empower your data team to collaborate on and explore the newly discovered data, so they can unearth useful insights from it.

Over to You

Dark data forms the bulk of all data held by organizations. It contains valuable information, even though that information cannot be easily extracted.

Holding on to data without making use of it is akin to leaving money on the table. Data-driven decision-making is more reliable than, and far superior to, making decisions based on gut feelings.

We, at Kloudio, are dedicated to helping organizations gain the most from their data. We have talked with thousands of organizations about the tools and ways in which they analyze data and have distilled this information into an original research report. If you’re curious how other companies are accessing and using their data, snag a copy of the 2021 Data-Driven Decision-Making Report for yourself.
