Data is being generated at an unprecedented pace, offering new opportunities for analysis and insight – but only if it can be managed effectively. While we may be generating more data than ever, we also understand less of it than ever.

By Johan Scheepers, country head at Commvault South Africa

A significant amount of the information businesses collect is processed and stored without ever being used for any other purpose, particularly when it comes to unstructured data.

This ‘dark data’ costs money to store, but generates no value, and could actually introduce risk. An effective data management strategy is essential to dealing with dark data before it can become a business problem.

The state of the data

In 2018, IDC predicted that the Global Datasphere would grow from 33 zettabytes that year to 175 zettabytes by 2025, but in reality this figure may have been an underestimate.

From Artificial Intelligence (AI) to the Internet of Things (IoT), we now have more sources of data than ever, and the vast majority of it does not contribute in any way to the business.

In fact, the State of Dark Data survey by TRUE Global Intelligence claims that 55% of an organisation’s data is dark, unquantified and untapped.

Gartner states that most organisations only retain this data for compliance purposes and that it often costs more to store than the value it generates.

However, the reality is that while businesses may keep this data out of fear of being non-compliant, holding on to it can actually result in breaches of regulations, especially those relating to personal privacy.

Organisations typically have no visibility into their dark data: what it is, or what personally identifiable information it contains. Ignorance is not bliss by any means, and businesses can be held liable for this lack of insight.

The multi-cloud creates multiple challenges

When all data was stored on premises, it was far simpler to manage. However, as we have broken out into the cloud, the issue of dark data has expanded.

The multi-cloud only compounds the issue. Data now sits in dozens of different systems and storage platforms, from legacy on-premises solutions and historical storage to numerous different cloud offerings.

Some of them are sanctioned by the organisation, such as Office 365 and Salesforce, but cloud-based solutions are so easy to obtain and use that employees will often use others that may not be part of the business strategy. These include offerings like Dropbox, iCloud, Google Sheets and Google Docs, and more.

If businesses do not have a clear picture of what data is being stored where and how it is being used, this data can be neither effectively protected nor analysed. This leaves it vulnerable to theft, ransomware and leakware, potentially creating compliance problems and risk of reputational damage.

The sheer amount of data generated, added to the vast number of data repositories, increases the risk of data being unknown and dark. In addition, with a significant percentage of data not being utilised, any insights gained from analytics could be skewed. With such a large proportion of the workforce now having to work remotely, these issues are only growing.

Shedding light on the problem

The first step to tackling the dark data problem is understanding all of your data. This means creating a map, per department, of what applications you use, where data is stored and where your organisation touches the multi-cloud. It is also essential to understand the compliance requirements around data and how long it needs to be retained.

Once you know what data you have and where it is stored, file optimisation can help to reduce both costs and risk. This step uses basic analytics to look at what file types are stored, who owns them, how old they are, when they were last accessed or modified and which security policies need to be applied. Once the data has been identified and understood, it can be further analysed for personally identifiable information, so that the correct policies around storage, security and defensible deletion can be applied.
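To make the file optimisation step more concrete, here is a minimal sketch in Python of the kind of basic analytics described above. It is an illustration under stated assumptions, not a description of any particular product. It walks a single local folder and records each file's type, owner, size and age, then flags files that look stale or that may contain personal information. The names `inventory`, `summarise` and `FileRecord`, the one-year staleness threshold and the email-only pattern are all hypothetical choices made for this example. Real personal-information discovery would need far broader checks, and cloud repositories would have to be reached through their own interfaces.

```python
import re
import time
from collections import Counter
from dataclasses import dataclass
from pathlib import Path

# Hypothetical threshold: files untouched for longer than this count as stale.
STALE_AFTER_DAYS = 365

# Deliberately simple, illustrative pattern. Email addresses are only one
# kind of personal information; real discovery needs much more than this.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class FileRecord:
    path: Path
    extension: str
    owner_uid: int              # numeric owner id as reported by the OS
    size_bytes: int
    days_since_modified: float
    days_since_accessed: float  # access times can be unreliable on some mounts
    stale: bool
    may_contain_pii: bool


def contains_email(path, max_bytes=1_000_000):
    """Return True if the first max_bytes of the file hold an email-like string."""
    try:
        with path.open("rb") as handle:
            text = handle.read(max_bytes).decode("utf-8", errors="ignore")
    except OSError:
        return False  # unreadable files are reported as "not detected"
    return EMAIL_PATTERN.search(text) is not None


def inventory(root, now=None):
    """Collect one FileRecord per regular file found under root."""
    now = time.time() if now is None else now
    records = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()  # read timestamps before opening the file
        days_modified = (now - info.st_mtime) / 86400
        days_accessed = (now - info.st_atime) / 86400
        records.append(FileRecord(
            path=path,
            extension=path.suffix.lower() or "(none)",
            owner_uid=info.st_uid,
            size_bytes=info.st_size,
            days_since_modified=days_modified,
            days_since_accessed=days_accessed,
            # Stale means neither modified nor accessed within the threshold.
            stale=min(days_modified, days_accessed) > STALE_AFTER_DAYS,
            may_contain_pii=contains_email(path),
        ))
    return records


def summarise(records):
    """Roll the records up into the figures a data owner would review."""
    return {
        "files": len(records),
        "bytes": sum(r.size_bytes for r in records),
        "by_extension": Counter(r.extension for r in records),
        "stale": [r.path for r in records if r.stale],
        "possible_pii": [r.path for r in records if r.may_contain_pii],
    }
```

Calling `summarise(inventory("/path/to/share"))` on a folder of your choosing returns counts per file type alongside two review lists: stale files that are candidates for archiving or defensible deletion, and files that may need stricter storage and security policies. Both lists are starting points for human review rather than automatic decisions.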

Dark data can pose a risk to businesses for a number of reasons, and as data growth continues to accelerate, the problem will only grow. Organisations need to implement effective data management to ensure that their data does not cause compliance or security problems, and that its full value can be realised.