
My dozen or so years as an industry analyst focused on information management have led me to one overarching conclusion: the future success of companies and other organizations that rely on information for their day-to-day operations (that is, all of them) will depend on their ability to unlock hitherto-hidden intelligence and value from that information. And they will need to do so regardless of how the information is structured, whether it sits in a relational database, as unstructured text in a file system, or anywhere in between.
None of this would matter much if the growth in the amount of information an organization creates were easily manageable. But any information worker, from the CIO down, knows that is no longer the case.
Call it Big Data, Total Data or any other variant: the realization has sunk in that, whatever your area of responsibility within the organization, the information explosion and the complexity it creates are a concern for you.
If security is your concern, your information risk grows as the volume of information increases and spreads across more connected devices, in more formats and in more countries.
If storage is your area, you don't want your storage costs rising in step with the growth of information in your organization, so identifying information that can be defensibly deleted, or at least archived to cheaper storage, is a major priority.
And if you're a user of business applications, you want to be sure that the decisions you make with those applications are based on the totality of information available to you, not merely a subset of transactional data from a relational database. If you rely on that structured information alone, you're missing out on what customers are saying to your salespeople, support staff and call center operatives, or to the whole world via social media such as Facebook and Twitter.
So how do you achieve this? How to extract value from relational databases has long been understood. An entire sector of the software industry, business intelligence, was built by adding a layer on top of the relational database. What hasn't been achieved yet is the same level of application sophistication for unstructured information.
The key piece of technology needed to make that happen is categorization: software that can analyze unstructured information, identify the words and phrases in documents, emails, tweets and so on, and determine what they're about.
Categorization is at the heart of application areas such as e-Discovery, information governance, data loss prevention, compliance and a whole swathe of search-based applications; that is, applications that use a search index rather than a relational database as their primary source of information (though they may also pull information from such a database).
Categorization software enables users to automate processes that some people still think computers can't handle: understanding and interpreting text. I've been watching the market for unstructured information management for more than a decade, and I can say with confidence that this technology is ready for prime time.
One of the earliest and most successful areas of adoption for categorization has been e-Discovery. There, a whole industry has been transformed from one based on manual categorization of documents to one where it is now generally accepted that software can do the job more accurately than humans, and without getting tired in the process. Predictive Coding is a great example of how software built on powerful categorization technology can automate a repetitive human process, namely linear review. And if you look back at the history of the software business, the most successful applications have been the ones that did just that: applied software to a repetitive manual process, proved it could be done more accurately and efficiently than by hand, and so freed knowledge workers of various stripes to focus on higher-value tasks.
I’m looking forward to seeing what other areas can be transformed by categorization-driven information management.












