The Enterprise Unstructured Edge

Richard Mallah, Director of Advanced Analytics, Cambridge Semantics, Inc.

Text Analytics

The world continues to get more complex, and the business world continues to operate at an ever faster pace. How will your company keep up with what it needs to know to remain competitive? Is the explosion of new and ever-growing data sources leading to information overload, or can you actually turn it into a competitive edge?

“Scaling an integrative text analytics platform across the enterprise enables nearly all roles to be more effective at what they do.”

There's a lot of valuable information in documents. This makes sense, since people in professional contexts communicate mostly through documents or other text-based methods. But how can one possibly read everything? On average, well over a thousand new journal articles in medicine appear per day, and each of these can conceivably affect a doctor's day-to-day practice. But that's actually tiny compared to the number of documents with potential nuggets of information that would be useful to a random colleague of yours, regardless of their profession.

This information, actually the most prevalent type of data, has historically seldom even been called “data” because few have been able to harness it effectively. In reality, however, this is rich data that demands all the analytics that can be brought to bear. To retain a competitive edge, one needs a system that pre-digests as much information as possible and presents to each role only the information that's most salient and important, based on an understanding of what that person is responsible for.

Departmental Use Cases

At this point, many roles in many verticals have already boarded the text analytics train. There have been great successes with departmental solutions. Applications like competitive intelligence, financial research analysis, compliance surveillance, scientific research informatics, customer service optimization, and marketing analytics have all benefited tremendously from unstructured analytics over the past few years. Leading organizations that perform such functions have already incorporated text analytics into their infrastructures, data flows, and workflows.

Those are all for particular departments, particular roles, or even just particular projects, however, which leads many to think that text analytics paradigms are only relevant for point solutions.

Enterprise Perspective

They can be so much more.

Nearly every role in the enterprise can benefit from a service that helps them learn relevant new things and also connect old dots. Colleagues in a wide variety of roles can start asking questions that they couldn't have asked before, whether interactively or by establishing standing subscriptions to things they'd want to know, in efficient data-driven ways, e.g.:

• A marketer gets warned about a new competitor so they can do what's needed to differentiate from them (even when no competitive landscape analysis is on the calendar).

• A legal or compliance officer gets notified of the fact that a particular employee, unauthorized to do so, seemed to be entering a legal contract on behalf of the company (without violating privacy rules).

• A product development manager gets pushed info about a new library, or a new information source, that can speed their projects along (adapting to the project queues and reactions to suggestions).

• An account manager gets pushed information about a newly announced acquisition by one of their enterprise clients because it can catalyze deals (yet doesn't get spammed about less significant news items about clients).

This is not a proliferation of point solutions. This is democratizing a common foundational technology that can lead to efficiencies throughout the enterprise.

Moreover, an entirely new class of qualitative synergies emerges when semantic models spread across the enterprise. Semantic models support multiple views on, and multiple ways to structure, the same information, even if they conflict. They power flexible metadata management and flexible permissioning, and are also ideal for joint representation of multiple modalities. The unique mix of flexibility and governance that the paradigm entails clears a major hurdle that has traditionally plagued IT departments that have tried to foster cross-departmental collaboration. This can breathe new life into a stagnant organization and let employees feel empowered. It gives a superpower to a broad array of roles. Of course, those worried about giving too much power, or too much access, to too many users can define granular access control groups and policies at the graph and node levels in a very flexible yet systematic manner.
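To make the idea of graph-level and node-level policies concrete, here is a minimal sketch in Python. It is an illustration only, not any particular product's permission model: the `Policy` class, the group names, and the graph and node identifiers are all hypothetical, and the rule that a node without its own entry inherits the graph's rule is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Policy:
    """Hypothetical policy store: readers per graph, plus optional per-node overrides."""
    graph_readers: Dict[str, Set[str]] = field(default_factory=dict)
    node_readers: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def can_read(self, groups: Set[str], graph: str, node: str) -> bool:
        # Graph-level check: the user needs at least one group allowed on the graph.
        if not groups & self.graph_readers.get(graph, set()):
            return False
        # Node-level check: a node with no explicit rule inherits the graph rule
        # (an assumption made for this sketch).
        restricted = self.node_readers.get((graph, node))
        return restricted is None or bool(groups & restricted)


# Illustrative data only: two graphs and one sensitive node.
policy = Policy(
    graph_readers={
        "contracts": {"legal", "compliance"},
        "market_news": {"marketing", "sales", "legal"},
    },
    node_readers={("contracts", "employee:1234"): {"compliance"}},
)

if __name__ == "__main__":
    print(policy.can_read({"legal"}, "contracts", "contract:77"))
    print(policy.can_read({"legal"}, "contracts", "employee:1234"))
```

In this sketch a legal user can read ordinary nodes in the contracts graph but not the node restricted to compliance, which is the kind of granularity the paragraph above describes.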

Actual Doability

Perhaps this seems really exciting but really daunting to you at this point. Is this really all doable? Yes indeed. Disk, memory, and compute power are all so cheap these days, relative to a worker's time, that it makes sense to make the most of all your workers.

What's needed here is an enterprise backbone for text analytics. As I wrote in “The Other Five Vs of Big Data: An Updated Paradigm” in the October 30, 2015 issue of CIO Review, a smart data lake architecture, which excels at integrating disparate multi-structured sources, provides the most solid base for such a backbone. The better such systems are fully data-driven and event-driven, from models to ACLs to applications to workflows, and it's that subset you want to consider. No single NLP engine is likely to do every type of extraction your users will need, so select a platform that can seamlessly support and harmonize multiple engines. Although such systems were traditionally configured only by NLP experts, with the better modern systems text mining goals can be configured through self-service, by IT operations, or both, depending on your organization's preferences and the roles involved. Keep ACLs in mind, but everything in writing is fair game: shared drives, content management systems, intranets, relevant public web content, corporate emails, news articles, press releases, journal papers, patents, social media, commercial content subscriptions, and free-text database fields.
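As a rough illustration of what harmonizing multiple engines can mean, the Python sketch below runs three toy extractors over the same sentence and maps their differing labels onto one shared vocabulary. Everything here is hypothetical: the engines are simple stand-ins built on regular expressions and string search rather than real NLP products, and the names `LABEL_MAP`, `Mention`, and `harmonize` are invented for this example.

```python
import re
from typing import Callable, List, NamedTuple, Tuple

RawHit = Tuple[str, str, int, int]  # (surface text, engine-specific label, start, end)


class Mention(NamedTuple):
    text: str
    label: str  # label in the shared vocabulary
    start: int
    end: int


def org_engine(text: str) -> List[RawHit]:
    """Toy engine A: capitalized names ending in a company suffix, labeled 'ORG'."""
    pattern = r"(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)\b"
    return [(m.group(), "ORG", m.start(), m.end()) for m in re.finditer(pattern, text)]


def gazetteer_engine(text: str) -> List[RawHit]:
    """Toy engine B: looks up a fixed list of names, labeled 'COMPANY'."""
    hits = []
    for name in ("Globex Corp",):
        idx = text.find(name)
        if idx != -1:
            hits.append((name, "COMPANY", idx, idx + len(name)))
    return hits


def money_engine(text: str) -> List[RawHit]:
    """Toy engine C: dollar amounts, labeled 'CURRENCY_AMT'."""
    pattern = r"\$\d+(?:\.\d+)? (?:million|billion)"
    return [(m.group(), "CURRENCY_AMT", m.start(), m.end()) for m in re.finditer(pattern, text)]


# Assumed mapping from each engine's labels to one shared vocabulary.
LABEL_MAP = {"ORG": "Organization", "COMPANY": "Organization", "CURRENCY_AMT": "MonetaryAmount"}


def harmonize(text: str, engines: List[Callable[[str], List[RawHit]]]) -> List[Mention]:
    """Run every engine, map labels to the shared vocabulary, and drop exact duplicates."""
    seen = set()
    for engine in engines:
        for surface, raw_label, start, end in engine(text):
            label = LABEL_MAP.get(raw_label)
            if label is not None:  # hits with unmapped labels are skipped
                seen.add(Mention(surface, label, start, end))
    return sorted(seen, key=lambda m: (m.start, m.end))


TEXT = "Acme Widgets Inc agreed to acquire Globex Corp for $120 million."
mentions = harmonize(TEXT, [org_engine, gazetteer_engine, money_engine])

if __name__ == "__main__":
    for mention in mentions:
        print(mention)
```

Two engines both find the second company under different labels; after mapping, the duplicate collapses into a single mention, so downstream consumers see one consistent set of types regardless of which engine produced a hit.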

Graph-based semantic overlay, that is, the extraction, resolution, and connection or merger of corresponding entities or concepts (no matter the source), is what enables the system or business users to connect the dots. This architecture also forms the basis for future white box cognitive computing, where the organization is in charge of the models driving its business (as opposed to black box cognitive computing, which relies on closed models, as IBM's Watson does).
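The resolve-and-merge step can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions: the `Overlay` class, the `canonical_key` normalization rule (lowercase, strip punctuation and company suffixes), and the sample facts and source names are all hypothetical, and real entity resolution is considerably more involved.

```python
import re
from collections import defaultdict


def canonical_key(name: str) -> str:
    """Naive resolution rule (an assumption): lowercase, drop punctuation and company suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    cleaned = re.sub(r"\b(inc|corp|corporation|ltd|llc)\b", "", cleaned)
    return " ".join(cleaned.split())


class Overlay:
    """Hypothetical in-memory graph that merges mentions of the same entity across sources."""

    def __init__(self):
        self.aliases = defaultdict(set)  # canonical key -> surface forms seen
        self.sources = defaultdict(set)  # canonical key -> sources that mention it
        self.edges = set()               # (subject key, relation, object key)

    def add_fact(self, subject: str, relation: str, obj: str, source: str) -> None:
        s, o = canonical_key(subject), canonical_key(obj)
        for key, surface in ((s, subject), (o, obj)):
            self.aliases[key].add(surface)
            self.sources[key].add(source)
        self.edges.add((s, relation, o))

    def related(self, name: str) -> set:
        """Return (relation, other entity) pairs touching the named entity, in either direction."""
        key = canonical_key(name)
        outgoing = {(rel, o) for s, rel, o in self.edges if s == key}
        incoming = {(rel, s) for s, rel, o in self.edges if o == key}
        return outgoing | incoming


overlay = Overlay()
# A structured record and an extracted news fact that spell the company differently.
overlay.add_fact("Acme Widgets, Inc.", "account_manager", "J. Rivera", source="crm")
overlay.add_fact("ACME Widgets", "acquires", "Globex Corp.", source="news_feed")

if __name__ == "__main__":
    print(overlay.related("Acme Widgets"))
    print(overlay.aliases["acme widgets"])
```

Because both spellings resolve to the same node, a single lookup surfaces the account manager from the structured source alongside the acquisition from the text source, which is the dot-connecting the paragraph above describes.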

In isolation, most text analytics platforms can only help you scale pointwise analyses of documents, social media, the web, or emails. More advanced techniques that blend structured and unstructured data, applying semantic overlay to both, create new structured data that connects new and old dots in your organization's conception of the world in near real time. When these connections, concepts, entities, and relationships surface in a stack that's both data-driven and event-driven from the ground up, powerful new insights and workflow options emerge, giving your team that edge.
