“Data is a precious thing and will last longer than the systems themselves.”
- Tim Berners-Lee (Inventor of the World Wide Web)
We live in a world of unstructured data. According to IDC, more than 90% of data generated by organizations is unstructured - but most companies have organized only 10% of it.
Up until recently, the value of unstructured data has been difficult to realize at scale. Two structures can help us manage this complexity: knowledge graphs and entity graphs.
In this article, we’ll look at how these structures are similar, how they differ, when to use which, and how to put entity graphs into action in your organization.
Structured vs. unstructured data
Before we dive into the details, let’s quickly cover exactly what we mean by “structured” vs. “unstructured” data.
Structured data is typically machine-generated and follows a set, machine-readable (and, often these days, human-readable) format. Common formats include relational or columnar database tables, Excel files, or similar.
Most of the time when we discuss data, we’re thinking of this structured data from enterprise applications - like Enterprise Resource Planning (ERP), Human Capital Management (HCM), Customer Relationship Management (CRM), and Supply Chain Management (SCM). We typically store this data in data warehouses or as-is in data lakes. There are various tools for data analysis, business intelligence, and machine learning that enable organizations to analyze and gain insights from structured data.
Unstructured data, on the other hand, is explicitly and directly human-generated data. This can be textual data from chats, texts, social media posts, speech-to-text transcripts, customer notes, emails, documents, presentations, reports, etc. These days, almost every communication leaves a trail of unstructured text. IDC research indicates that this data is growing at nearly 30% annually.
What’s more, most of the structured data sources also include unstructured data. This often takes the form of a “catch-all” database field or spreadsheet column that stores crucial information. The information itself, however, doesn’t conform to any of the other structured data fields.
This unstructured data is critical to organizations:
- Intelligence organizations scan unstructured text in the form of social media posts, online forum activity, internal intelligence reports, and more to identify the next security threat.
- Leading companies analyze public product reviews, online product forums, as well as their own internal call center transcripts and warranty claim descriptions as an early warning system for product defects.
- Others use it to understand the root causes of customer issues, understand trends and patterns in news and social media, and the list goes on.
What is a knowledge graph?
How should we think about gaining analytical value from unstructured, freeform, often messy and rapidly evolving data?
To make unstructured data useful at scale, it needs to be converted into a structured form. Past approaches to solving this problem have typically converged around trying to simplify the problem, given the historical technological limitations of natural language processing. These simplified approaches included solutions such as trending keyword identification, topic modeling, keyword alerting, and so on.
Over time more sophisticated approaches focused on named entity recognition as a starting point. This identifies the key entities (people, places, organizations, products, etc.) in the unstructured text. It then tries to associate those tagged entities to known authoritative database records of relevant entities in a process called entity linking or entity disambiguation. After that, you could perform additional text analytics processing to extract and associate the properties, attributes, and relationships of and between entities.
Initial results proved promising. In enterprise settings, however, these methods did not get very far. The "noise" (that is, the inaccuracy) introduced at each step of these NLP pipelines compounded into unusable, unreliable outputs for most production use cases.
Thus, most enterprises had to settle for an overly simplified approach. If the information was crucial enough, manual, human-powered information extraction remained the only viable path to unlocking this data with trustworthy fidelity.
Eventually, as NLP technology improved, a hybrid NLP/human-curation approach developed to help condense and organize unstructured text: the enterprise knowledge graph.
A knowledge graph is a type of distilled, typically largely human-curated knowledge base organized into a graph-like data model. The graph represents this information in a highly structured, interconnected format.
The key information in a knowledge graph centers around entities, which are represented as nodes. Nodes are connected where relevant via what are called edges, essentially relationships between two nodes. Both nodes and edges typically have labels, describing either the node or relationship. And both nodes and edges can have attributes, which are referred to as properties.
Knowledge graphs require a formal representation of entities. In other words, they define an ontology that explains the entities – their types, categories, and hierarchies – in a given knowledge domain. Knowledge graphs combine databases, graphs, and knowledge bases into a single, new representation of entities and their relationships to each other.
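To make the node/edge/label/property model concrete, here is a minimal sketch in plain Python. It uses simple dataclasses rather than a real graph database, and the entities and relationship types are illustrative, not drawn from any actual ontology:

```python
# Minimal property-graph sketch: nodes and edges both carry a label
# and a dictionary of properties. Illustrative data only.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # e.g. "Person", "Organization"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str                     # id of the source node
    target: str                     # id of the target node
    label: str                      # the relationship type
    properties: dict = field(default_factory=dict)

nodes = {
    "tim": Node("Person", {"name": "Tim Berners-Lee"}),
    "w3c": Node("Organization", {"name": "W3C"}),
}
edges = [Edge("tim", "w3c", "FOUNDED", {"year": 1994})]

# Traverse the graph: who founded which organization?
for e in edges:
    if e.label == "FOUNDED":
        print(nodes[e.source].properties["name"],
              "founded", nodes[e.target].properties["name"])
```

A production knowledge graph would layer an ontology and a query language on top of this basic shape, but the node/edge/property structure is the common core.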
Knowledge graphs are a powerful way to improve search engines and information retrieval. The Google Knowledge Graph is a prominent example: it serves up relevant information in an infobox beside the search results. While it has been popular, it has been limited by a lack of attribution as well as bias.
Benefits of knowledge graphs
Knowledge graphs are a dynamic way to organize data from various data sources and make it easily available for deeper analysis and traversal. They can unify structured and unstructured data to enable deep collaboration among teams and provide data for decision-making.
Organizing knowledge in graph form is very powerful. Knowledge graphs power several different types of applications:
- Search (WikiData, Google Knowledge Graph)
- Finance (Customer 360, Money Laundering Prevention)
- AI & ML (Context, Reasoning)
- Content personalization and recommendation systems
However, while incredibly useful, knowledge graphs to date require a great deal of human curation and effort to distill knowledge into their convenient, condensed representation.
What is an entity graph?
Entity graphs have emerged as a new solution that optimizes graph technology for analytics.
An entity graph is a data structure that models the most contextually relevant relationships between an organization's entities. The nodes are your entities, whereas the edges represent the connections between them. We can think of entity graphs as providing a more comprehensive view of the relationships around the entities critical to your organization. They open the door to discovering connections in an automated manner, saving time and resources.
There are a few subtle differences between entity graphs and knowledge graphs:
- Knowledge graphs are typically a highly curated distillation of just the most salient authoritative knowledge about entities. By contrast, entity graphs are more comprehensive representations of the complete picture of entities within a given document corpus.
- Knowledge graphs do not link to all of the references to those entities. In other words, they don't connect mentions of those entities back to the living documents that contain them.
- Knowledge graphs are typically more static and human-curated, whereas entity graphs can be more autonomously created and thus more dynamic and even near real-time.
- Finally, there’s an interplay between the two. On the one hand, an entity graph can be used to build a distilled authoritative knowledge graph efficiently. On the other hand, a knowledge graph can (and should, if possible) be used as the seed for an entity graph.
Emerging entity graph systems can look at the data and identify a relationship or pattern. This novel approach can leverage machine learning capabilities to identify loosely defined semantic relationships between entities without understanding a priori what that relationship might be. In an additional analytical process, these systems can tap into neural networks and/or large language models to classify and define that relationship.
The key benefit of this approach is that you don’t need to define the relationships in advance when designing the system. Instead, you can leverage advanced technologies to discover critical relationships you didn't even know to look for. Stay tuned for a future post expanding on Agolo’s approach in this area.
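One simple way to discover relationships without defining them in advance is to promote entities that repeatedly co-occur into candidate edges, which a downstream model then classifies. The sketch below shows that first step; the sentences and entity list are invented, and the classification stage is only indicated in a comment:

```python
# Rough sketch of automated relationship discovery: entities that
# co-occur in more than one sentence become candidate edges.
# Sentences and entity list are hypothetical examples.
from itertools import combinations
from collections import Counter

sentences = [
    "The iPhone 15 overheats when the charging cable is faulty.",
    "Users report the iPhone 15 overheats during fast charging.",
]
known_entities = ["iPhone 15", "overheats", "charging"]

cooccurrence = Counter()
for sent in sentences:
    present = [e for e in known_entities if e.lower() in sent.lower()]
    for a, b in combinations(sorted(present), 2):
        cooccurrence[(a, b)] += 1

# Pairs seen more than once are promoted to candidate relationships;
# a neural network or LLM would then classify and label the edge type.
candidates = [pair for pair, n in cooccurrence.items() if n > 1]
print(candidates)
```

Real systems replace the substring matching with linked entity mentions and the co-occurrence count with learned semantic signals, but the two-stage shape (discover candidates, then classify the relationship) is the same.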
What entities are relevant to your use case? This depends on the specific knowledge domain that is relevant to you. Common entities include people, organizations, and locations.
What makes entity graphs particularly powerful is that they are customized to your domain. By having a flexible model for entities, custom entities can include real-world things like products, technologies, and error codes. However, they can also be conceptual entities such as customer issues, problems, root causes, cross-sells, and machine learning concepts.
Conceptual entities need not be defined by literal nouns or verbs that appear in unstructured or structured text. You can define them in the entity graph and then infer information from them based on text strings within a relevant context. You can even infer conceptual entities from low-context text strings (for example, a text exchange between a technical support representative and a customer).
Entity graph use cases
While knowledge graphs are powerful, there are numerous use cases in which they may not be the best solution. When it's critical to understand relationships between entities in a given context, a more effective approach may be to use an entity graph. In these situations, an entity graph can provide a more complete and accurate view of those entities. In addition, because an entity graph can be built by a machine, it can be much more dynamic and near-real-time.
What are some situations in which an entity graph may outperform a common knowledge graph?
In federal intelligence agencies, for example, it is critical to understand specific entities and the relationships between them. Intelligence analysts need to understand the nuanced and precise relationships between people, organizations, places, events, as well as specialized entities like weapon systems and surveillance drones – to highlight a few key entity types. This is critical to noticing emerging entity relationships. That, in turn, is critical to noticing emerging national security threats.
Likewise, leading technology companies are increasingly interested in understanding specific entities and their detailed relationships. Product support leaders, for example, are interested in picking up on specific relationships between products, problem symptoms, support issues, root causes, and issue resolution and solutions commonly found across support knowledge bases, user forums, social media, and customer service transcripts.
In another related example, manufacturers leverage entity graph structures to better understand product quality and warranty claims. They are often challenged with a complex network of unstructured data, analytical reports and dashboards, as well as a patchwork of applications. As a result, understanding the relationship between quality and warranty problems in the field and upstream manufacturing processes is manual and error-prone.
Modern entity graph systems enable manufacturers to pick up on the early warning signals of quality problems, enabling higher customer satisfaction and lower warranty costs. They accomplish this by joining sources of data together and making connections within that data that they previously couldn’t. They can answer questions they previously thought weren’t even possible to ask.
What are entity extraction, disambiguation, and linking?
The first step to building an entity graph is creating a clear understanding of the types of entities that are important to your organization. This means ensuring you have an accurate, scalable entity extraction, disambiguation, and linking process in place.
Associating the entities in a graph with a specific identity is hard. It's particularly hard when the entities come from unstructured text.
How do we know, for instance, if a particular product issue referenced in a customer support call or ticket with regard to the customer’s “iPhone” refers to an “iPhone 15 Pro Max” vs “iPhone 14 Pro Max”? Linking the entity to a specific product is key to actionable downstream analytics.
These processes are a critical step in the data pipeline to creating a useful entity graph. Entity disambiguation strategies can infer the identity of an entity within unstructured text by using context clues to link the entity to known structured data.
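A toy version of context-based disambiguation: pick the candidate record whose known attributes best overlap the surrounding text. The candidate products, their attribute terms, and the support ticket below are all made up for illustration:

```python
# Simplified entity disambiguation: score each candidate by how many
# of its known attribute terms appear in the mention's context.
# Candidate records and ticket text are hypothetical.
def disambiguate(mention_context: str, candidates: dict) -> str:
    """Return the candidate id whose attribute terms overlap most."""
    words = set(mention_context.lower().split())
    def overlap(terms):
        return len(words & {t.lower() for t in terms})
    return max(candidates, key=lambda cid: overlap(candidates[cid]))

candidates = {
    "iphone-15-pro-max": ["titanium", "usb-c", "a17"],
    "iphone-14-pro-max": ["lightning", "a16"],
}
ticket = "Customer says the usb-c port on their iPhone stopped charging"
print(disambiguate(ticket, candidates))  # links to the USB-C model
```

Production systems use learned context embeddings rather than word overlap, but the principle is the same: context clues tie an ambiguous mention to a specific structured record.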
Modern entity disambiguation and linking solutions rely on NLP, AI, and ML to automate processing. But they can also include “human in the loop” processes to ensure higher-quality disambiguation and connections. Based on configurable thresholds, they can automate connections and/or route them to human experts for review. This ensures the right balance between automation and quality.
Advanced entity disambiguation systems are cross-lingual, meaning they can recognize the same identity across multiple languages and assign labels in different languages to the same entity. By analyzing an entity's context in its native language, rather than translating the context into a single language first, modern entity linking systems avoid the additional errors and degradation that translation introduces.
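Cross-lingual resolution can be pictured as one canonical entity carrying labels in several languages, so mentions in different languages resolve to the same node. The alias table below is a toy example (the id follows a Wikidata-style convention, purely for illustration):

```python
# One canonical entity, many language-specific labels. Mentions in
# any listed language resolve to the same entity id, no translation
# required. Alias table is illustrative.
entity_aliases = {
    "Q183": {            # Wikidata-style id, for illustration
        "en": "Germany",
        "de": "Deutschland",
        "fr": "Allemagne",
    },
}

def resolve(mention: str):
    """Return the entity id whose labels include this mention."""
    for entity_id, labels in entity_aliases.items():
        if mention in labels.values():
            return entity_id
    return None

print(resolve("Deutschland"), resolve("Germany"))  # same entity id
```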
While an AI-powered automated system can perform many entity disambiguation processes, advanced systems enable “human in the loop” capabilities to augment the automated process.
Human experts, for instance, can merge entities into or split them from other entities. Likewise, subject matter experts may also establish “authoritative entities.” This last step with humans in the loop is critical to ensuring accuracy.
Scoring entity links is an important step in the process. It's easy here to introduce errors that undermine the entire effort. Advanced systems use AI to automate the scoring based on configurable, weighted factors, including the entity's name, semantic context, entity coherence, and lexical similarity. These factors are typically weighted for the relevant domain.
For example, in the federal intelligence and fraud detection domains, you might weigh the name/alias more highly. By contrast, in a customer or technical support domain, you’d put more weight behind context and coherence.
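The weighted scoring and threshold-based routing described above can be sketched as follows. The factor names mirror the article; the specific weights, scores, and thresholds are invented:

```python
# Hedged sketch of a configurable, weighted entity-linking score with
# threshold-based routing to human review. Numbers are illustrative.
def link_score(factors: dict, weights: dict) -> float:
    """Weighted average of per-factor scores, each in [0, 1]."""
    total = sum(weights.values())
    return sum(factors[k] * w for k, w in weights.items()) / total

# A support domain might weight context and coherence over name/alias;
# intelligence or fraud domains would invert that emphasis.
support_weights = {"name": 0.2, "context": 0.4,
                   "coherence": 0.3, "lexical": 0.1}

factors = {"name": 0.9, "context": 0.8, "coherence": 0.7, "lexical": 0.95}
score = link_score(factors, support_weights)

AUTO_LINK, REVIEW = 0.85, 0.5    # configurable thresholds
if score >= AUTO_LINK:
    decision = "link automatically"
elif score >= REVIEW:
    decision = "route to human expert"
else:
    decision = "reject"
print(round(score, 3), decision)
```

This is the mechanism behind the automation/quality balance: high-confidence links flow straight into the graph, while borderline ones queue for subject matter experts.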
Entity graph adoption
Early adopters of entity graphs have been varied. In federal intelligence, for example, entity graphs are useful in detecting the earliest signals of emerging national security threats from adversarial nations and organizations. Likewise, financial services organizations have used entity graphs to detect early activity from fraudsters.
Customer and technical support teams have similarly used entity graphs to detect emerging and trending technical support problems. In these use cases, entity graphs enable early warning systems and root-cause analysis.
Building an entity graph autonomously
Advanced entity intelligence platforms enable organizations to build entity graphs autonomously. These platforms ingest source information that is both unstructured and structured, at scale and across multiple languages.
An entity intelligence platform can automatically populate an entity graph through the following steps:
- Extract clean text from internal and/or external unstructured data sources
- Identify the relevant language(s)
- Extract, identify, and link entities in the texts to known entities
- Enable human-in-the-loop entity curation to augment the automated process
- Identify relationships between those identities and add them to the entity graph
- Classify and label the relationship type
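The steps above can be sketched as a minimal pipeline. Every stage here is a stand-in for a real NLP component; the function names, sample ticket, and entity list are hypothetical:

```python
# Toy end-to-end entity-graph pipeline mirroring the steps above.
# Each function is a placeholder for a real component.
def extract_text(source: str) -> str:
    return source.strip()                  # stand-in for parsing/cleaning

def detect_language(text: str) -> str:
    return "en"                            # stand-in for language ID

def extract_entities(text: str) -> list:
    known = ["iPhone 15", "battery drain"] # stand-in for NER + linking
    return [e for e in known if e.lower() in text.lower()]

def add_relationships(entities: list, graph: list) -> None:
    # Pair every linked entity found in the same document; a classifier
    # would normally replace the placeholder RELATED_TO edge label.
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            graph.append((a, "RELATED_TO", b))

graph = []
doc = "  Ticket: iPhone 15 shows battery drain after the update.  "
text = extract_text(doc)
if detect_language(text) == "en":
    entities = extract_entities(text)
    add_relationships(entities, graph)
print(graph)
```

Human-in-the-loop curation would sit between entity linking and relationship creation, merging, splitting, or confirming entities before edges are committed to the graph.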
Mining value from unstructured data
The entity graph is an important step in the data pipeline, making previously difficult-to-access or even inaccessible unstructured data useful for further analytical processes. The entity graph establishes rich metadata you can use for downstream analytical or other processes. It also complements and extends the capabilities of enterprise knowledge graphs.
For example, you can use an entity graph to:
- Power early warning systems for customer problems
- Scale intelligence analysis to identify critical risks
- Drive issue tracking and resolution across channels and markets
- Enable root cause analysis at scale
- Power business intelligence dashboards and reports
- Enrich content in Knowledge Bases
- Seed data for Generative AI applications
- Complement search results
- … and many more
Fueling future innovations such as Generative AI requires large volumes of high-quality data. Agolo’s entity intelligence platform uses entity graphs to turn unstructured data into data you can analyze, query, and take action on. To see it in action, book a demo with our technical experts.