Unlike most of my reviews, I should start with a small disclaimer. Robert Smallwood, the primary author, reached out to me over a year ago and asked if I’d consider helping him with some SharePoint and Office 365 content in his Information Governance: Concepts, Strategies, and Best Practices book. I’ve received a brief mention in the book and a small section of the content is something I wrote. With the disclaimer out of the way, let’s dive in.
Everything and the Kitchen Sink
One of the challenges of information governance is that it covers so many topics, many of which are full-time disciplines themselves. It’s a monumental challenge to pull together a resource with this kind of breadth, and it’s one that Smallwood has met faithfully. I won’t say that all the content is perfect, because it’s not. However, I will say that it is a great overview of several topics that are important to implementing an effective information governance plan.
Information governance has a singular goal: to maximize the value of information, including how it’s used, while mitigating risk. While the goal is simple, the implementation is far from it. The considerations from different disciplines are numerous and potentially overwhelming. If you need a quick summary of information governance, see my post, Explaining What Information Governance Is.
Dark Data
Astronomers estimate that dark matter represents about 85% of the matter in the universe and about a quarter of the overall density. Its presence is implied by observations, including gravitational effects, but because it doesn’t appear to interact with the electromagnetic spectrum, it’s difficult to detect directly. This gives rise to the term “dark data,” which refers to information that isn’t categorized properly and therefore is difficult to find and use.
The degree to which dark data is a problem is debated, but estimates of the amount of data that is “dark” are somewhere around 50% of all the data collected and stored by an organization. When you add in other forms of low-value data, such as redundant copies, about 69% of information has no business, legal, or regulatory value, according to the Compliance, Governance, and Oversight Council (CGOC). In short, much of the data that we’re storing is data that we shouldn’t be storing, and it’s only getting worse.
Exponential Growth
The largest problem facing information governance is the velocity with which the volume of data is growing. It’s estimated that 90% of the existing worldwide data was created in the past two years. How can you keep up if every time you get a handle on things, the entire scope of the problem changes? The answer cannot be found in control alone, though it may be found in a combination of guidance and control.
It may be that the only way to manage the onslaught of data is to control some aspects – such as the retention of data from IoT (the Internet of Things) sources – and suggest standards for how people manage the information they work with directly.
Information Value
One of Smallwood’s key tenets of information governance is also the subject of the Infonomics book: information should be treated like an asset. The only way to extract value out of something is for that something to be an asset. If information isn’t valued like an asset, then it will be impossible to extract value from it.
There’s an awareness that we continue to create more data, information, and knowledge than at any point in human history, and we spend immense sums to store and manage this information. There is relatively less awareness of the value that could be derived. We’re holding most of the data “just in case.”
Going Phishing
Holding on to information is particularly challenging because of the risk that its value will be unwittingly discovered by a nefarious third party. The most voluminous approach to infiltrating the corporate infrastructure comes in the form of phishing attempts. Every day, attackers are trying to trick users into using their credentials to authenticate to a fake website. They’re trying to convince users to open documents sent to them, which have been intentionally crafted to exploit vulnerabilities, in the hope that your organization hasn’t yet patched the vulnerability.
The propensity for employees to trust email and to do the attackers’ bidding causes it to be the most common attack vector and the one which is the hardest to address. In Transformational Security Awareness, Perry Carpenter explains why this is the case and what to do about it.
Containing the Leaks (DLP)
Sometimes, corporate information escapes the walls of the organization not because nefarious individuals are trying to hack into the corporate treasure trove. Instead, employees are subverting the security controls by sending copies of the files they work with to their personal email accounts and uploading them to their personal file sharing repositories. It’s also employees sharing information with third parties in a careless manner.
Some types of information, particularly personally identifiable information (PII), should not leave the bounds of the organization’s network without clear rules and agreements, yet it happens every day. Social security numbers are transmitted in clear text via email and subjected to unauthorized observation.
Solutions for addressing these problems are called data loss prevention (DLP) – though I believe that they would more accurately be described as data leakage prevention solutions, since the information isn’t lost, it’s leaked.
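To make the idea concrete, here is a minimal sketch of the kind of content check a DLP tool might run on outgoing email text. It isn't from the book or from any particular product; the function names and the pattern are my own assumptions, and real solutions use much richer detection than a single regular expression.

```python
import re

# Assumed pattern: US Social Security numbers written as three groups
# (AAA-GG-SSSS) separated by hyphens or spaces. Illustrative only.
SSN_PATTERN = re.compile(r"\b(\d{3})[- ](\d{2})[- ](\d{4})\b")


def find_possible_ssns(text):
    """Return substrings of `text` that look like Social Security numbers."""
    matches = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        # Skip combinations that are never issued: area 000, 666, or 900-999;
        # group 00; serial 0000.
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        matches.append(match.group(0))
    return matches


def redact_ssns(text):
    """Replace every likely SSN in `text` with a fixed placeholder."""
    for candidate in find_possible_ssns(text):
        text = text.replace(candidate, "[REDACTED]")
    return text
```

A hypothetical mail gateway could call find_possible_ssns on a message body and block, quarantine, or redact the message when the list isn't empty. A pattern this simple will miss unformatted numbers and flag some harmless ones, which is part of why the rules and agreements matter as much as the tooling.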
Long Term Digital Preservation
Sometimes, the loss isn’t the direct result of a failure of hardware. Sometimes, an inability to recover important information happens because of the frailty of media. In these situations, creating multiple copies and periodically refreshing the media can help. However, another more challenging problem exists as data is locked away inside of file formats that can no longer be decoded.
Consider video recordings that were encoded in Adobe’s Flash SWF format. Most of the modern video players will not play video that was encoded in this format. If you have videos that you must maintain for a long period of time, the file format you choose matters. The MP4 H.264 AVC format is a stable format that’s likely to be supported for some time – but that means converting the file into this format for long-term preservation.
Luckily, long-term preservation of images and documents can be accomplished using the PDF/A standard format, which is likely to be supported by most file viewers and operating systems for the foreseeable future. Other file formats must be managed so that they can still be read in the future.
Too Many Records to Manage
The truth is that the volume of information we’re producing now greatly exceeds the capacity of users to properly manage and classify the information. We know that users will not invest the time to properly tag and provide metadata for files in a general case. Whether they see this as not important or not their job doesn’t matter – what matters is that a substantial amount of the information being captured today is difficult to find, because it’s not been properly tagged.
The idea that employees are willfully not complying with requests for metadata assumes at some level that they’ve been informed of what is expected, that they’ve been made aware of the consequences of failure, both personally and to the organization, and that they’ve been given the tools to accomplish the appropriate tagging.
“The tools” means more than just the software. It means a guide for identifying what metadata needs to be supplied to which files and how to properly identify records that must be protected and preserved. Most people in the organization are focused on getting their job done, and rarely is the preservation of records considered to be a part of that process.
Developing Business Objectives
Ultimately, the most important aspect of an information governance program is the identification of the business goals for the program. What specific value does the organization believe they can get through better management of the information, including how they can extract the value of the information and how to mitigate risk? An information governance program is doomed if the only selling point to the program is reduced risk.
Organizations deal in risk, and it’s impossible to cover every risk. As a result, the organization often must decide which risks to accept and move on. You don’t want your information governance program to be killed because the risks it mitigates for the organization aren’t important enough when stacked up against the other competing priorities and risks.
In the end, Information Governance can put you on the right path towards extracting better value out of the information that your organization already has.