12 October, 2024

Introduction to Hypercube’s Ontology and Reasoning Engine

 Mendel's Generative Ontology -- The First Reasoning-First Clinical Ontology

Large Language Models have the potential to revolutionize healthcare by enabling natural language understanding, generating clinical insights, and supporting decision-making. However, LLMs come with many shortcomings that make it harder for clinical applications to truly go beyond the hype. These include hallucinations, lack of explainability, and, more critically, poor reasoning capability. They cannot connect disparate pieces of information to arrive at novel insights or to address complex multi-hop questions.

The good news is that coupling large language models with a knowledge representation of the clinical domain can solve these challenges, making AI a reality in a trillion-dollar industry. This structured representation enables a clear understanding of the domain and defines the relationships and constraints that govern the entities and their interactions — it facilitates reasoning and inference.

The bad news: a clinical ontology is the backbone of these representations, and today’s standard biomedical ontologies aren’t ready to meet the challenge.

What is an ontology?

Healthcare professionals interact with ontologies every day. In AI research, an “ontology” simply refers to a representation of a “domain”: ontologies exist to represent information about everything from language to finance to business to medicine. These ontologies are meant to store and reason about information collected in everyday situations.

Standard biomedical ontologies include:

  1. ICD-10, used to represent diagnoses and procedures.
  2. RxNorm, used to represent prescription information.
  3. SNOMED CT, used to represent general healthcare information.

Concepts within ontologies are typically represented and created hierarchically. This allows efficient search and retrieval for certain concepts within the ontology, including for the professionals who interact with them. For example, everything in ICD-10-CM beginning with the letter ‘C’ is a neoplastic condition; and everything beginning with ‘C34’ is a code for a lung cancer diagnosis. 
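The prefix behavior described above can be sketched in a few lines. This is only an illustration of hierarchical lookup by code prefix; the function names are made up for this example and do not come from any coding library.

```python
def is_c_block_neoplasm(icd10cm_code: str) -> bool:
    """True when the code starts with 'C', the neoplastic block described above."""
    return icd10cm_code.upper().startswith("C")


def is_lung_cancer(icd10cm_code: str) -> bool:
    """True when the code falls under the 'C34' lung cancer category."""
    return icd10cm_code.upper().startswith("C34")


# A small mixed list of diagnosis codes for illustration.
codes = ["C34.31", "C50.911", "I10"]

neoplasms = [code for code in codes if is_c_block_neoplasm(code)]
lung_cancers = [code for code in codes if is_lung_cancer(code)]

print(neoplasms)
print(lung_cancers)
```

A single prefix test retrieves a whole branch of the hierarchy, which is what makes hierarchical codes convenient for search and retrieval.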

There are many additional terminologies used in healthcare settings that are not generally considered ontologies. For example, CPT procedure codes are a controlled, pre-defined list of procedure concepts commonly used administratively. Terminologies store information but do not enable sustained reasoning about the structure of information received or stored. 

Both terminologies and ontologies play a key role in healthcare interoperability. Terminologies are needed to control the thousands of different ways that healthcare professionals can enter diagnoses, procedures, and prescriptions into computers for both administrative and research purposes. But for computer systems to leverage information for more than just documentation, they require the kind of structure and rules provided by ontologies. In SNOMED CT it is simple to identify whether one of hundreds of SNOMED CT codes is meant to refer to a body part because all SNOMED CT codes for body parts share a common parent, ‘38866009’, meaning body part. All downstream use cases for healthcare information across population health, clinical research, quality management, commercial monitoring, and real-world evidence generation depend on the quality of the reasoning the ontology inherently enables.
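As a rough sketch of the body part check described above, the snippet below walks a child-to-parent map until it reaches the shared parent. Only ‘38866009’ comes from the text; every other identifier is a placeholder invented for this example, not a real SNOMED CT code, and the map itself is a toy.

```python
BODY_PART = "38866009"  # shared SNOMED CT parent for body parts, as cited above

# Toy child -> parents map. The "toy-..." keys are placeholders, not real codes.
PARENTS = {
    "toy-lung": ["toy-thoracic-organ"],
    "toy-thoracic-organ": [BODY_PART],
    "toy-lung-cancer": ["toy-disorder"],
    "toy-disorder": [],
    BODY_PART: [],
}


def has_ancestor(code: str, ancestor: str, parents: dict = PARENTS) -> bool:
    """Walk the is-a links upward; a code counts as its own ancestor here."""
    stack = [code]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


print(has_ancestor("toy-lung", BODY_PART))
print(has_ancestor("toy-lung-cancer", BODY_PART))
```

The same traversal answers the question for any code in the hierarchy, which a flat terminology cannot do without a hand-maintained list.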

The Shortcomings of Current Ontologies:

The power of an ontology is directly related to the styles of reasoning that it can enable. Unfortunately, current ontologies generally face major scalability challenges. Integrating new information without disrupting existing structures is a significant hurdle, limiting the utility and adaptability of ontologies in dynamic fields such as medicine. Sometimes new use cases or scientific advances can provoke a complete remodeling of the ontology. The reliance on manual processes to update and maintain these systems further complicates scalability: manual interventions slow adaptation, increase the potential for error, and require extensive human oversight, which hinders the efficient scaling of ontologies to meet evolving data demands.

In healthcare, there are more significant challenges.

A medical ontology has two main roles:

  1. Providing a standard vocabulary for administrative purposes.
  2. Supporting medical reasoning through structured knowledge representation.

However, standard medical ontologies primarily cater to administrative needs, prioritizing documentation over reasoning and, as a result, over other use cases. Many have degraded over time into extensive vocabularies. This focus on including as many concepts as possible rather than facilitating efficient reasoning has led to bloated ontologies with several issues:

  1. Inconsistent and fragmented representations of disease concepts across different ontologies and databases.
  2. Redundant disease entries that do not correlate with clinical symptoms or disease subtypes.
  3. Inconsistent disease naming standards and variable naming conventions.
  4. Difficulties in aligning structured disease entities with the unstructured disease names found in medical texts and records.
  5. Challenges in defining diseases as distinct entities due to clinical variability and inconsistent ontological depiction.
  6. Incomplete coverage of abstract or general categories of disease.
  7. Poor representation of the complex descriptions of disease state found in clinical text.

Let’s illustrate these issues by taking lung cancer as an example. In real-world settings, lung cancer is thought of as a collection of diseases partly defined by morphology, partly defined by stage, and partly defined by core oncologic biomarkers. A basic description of a lung cancer type that makes sense in research, clinical, and commercial settings is something like: advanced NSCLC with an EGFR mutation.

Immediately there are difficulties in using standard ontologies to represent this. ICD-10-CM is most commonly used to encode the diagnosis of malignancy, but the billing code contains no information about stage, morphology, or biomarkers. SNOMED CT does have a term for “NSCLC with an EGFR mutation” as a child of its term for NSCLC, but it only has ALK and ROS1 as specific biomarker-driven additional terms in its hierarchy. It does not include PD-L1 or KRAS or BRAF or MET or RET or NTRK, which are also core biomarkers salient for clinical decision-making in NSCLC. These fragmented and inconsistent representations make it very difficult to align to these ontologies and reason over them effectively.

And then in routine analytics practice, there are concepts and terms that are not reasonable to include in standard ontologies for transacting biomedical data. Say I want to identify all patients with a solid tumor with an EGFR mutation for a study I want to run. This term isn’t in SNOMED CT or ICD-10-CM at all! Analysts are forced to define these concepts ad hoc based on value sets from the originating ontologies, which costs time and research effort. These limitations complicate disease data integration across diverse datasets, undermining the potential of advanced language models in precision medicine.

Today’s Inadequate Workarounds

Because of the limitations of biomedical ontologies, their users are forced to curate additional capabilities to gain useful insights from biomedical data. The simplest version of this is creating value sets intended to reflect a certain clinical concept in medical data. Suppose I’m reading a clinical trial inclusion criterion for patients diagnosed with a solid tumor malignancy, and I want to define that criterion in medical data. But “solid tumor” doesn’t exist as a concept in either ICD-10-CM or in SNOMED CT, and many biomedical data sets contain diagnosis information represented in both ontologies. In order to then define this condition, users are required to curate and maintain lists of terms based on the original ontologies representing this new concept. In effect, this makes everyone working with medical data today an ontology engineer.

But even in the best case, this creates significant operational overhead for teams tasked with generating value from biomedical data. Many teams end up maintaining multiple value sets for the same clinical concepts and lack tooling for tracking when concepts have dependencies or relationships with one another. We reviewed the NLM’s Value Set Authority Center, used for electronic clinical quality measurement, and found multiple value sets for the same clinical concept: even a simple one, like cancer-directed chemotherapy, can have multiple definitions from the same maintainer. These are flat lists that have no relationships with related clinical concepts (like targeted therapies), which multiplies the administrative overhead of managing them.

Figure: Chemotherapy value sets identified in the NLM VSAC.
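To make the value set workaround concrete, here is a minimal sketch of a hand-curated “solid tumor” list applied to diagnosis data. The set is deliberately tiny and illustrative (three ICD-10-CM categories picked by hand), the patient identifiers are invented, and nothing here reflects a validated clinical definition.

```python
# Illustrative and incomplete: ICD-10-CM category prefixes chosen by hand.
SOLID_TUMOR_PREFIXES = {"C18", "C34", "C50"}  # colon, lung, breast


def in_value_set(code: str, prefixes: set = SOLID_TUMOR_PREFIXES) -> bool:
    """Check a code against the flat list by its three-character category."""
    return code[:3].upper() in prefixes


# Hypothetical diagnosis data keyed by made-up patient identifiers.
diagnoses = {
    "patient-1": ["C34.31"],
    "patient-2": ["C91.00"],          # a leukemia code, not a solid tumor
    "patient-3": ["I10", "C50.911"],
}

cohort = sorted(
    patient
    for patient, codes in diagnoses.items()
    if any(in_value_set(code) for code in codes)
)
print(cohort)
```

The list carries no structure of its own: every new tumor site, and every parallel list in a second ontology, has to be added and kept in sync by hand.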

Mendel’s ontology is designed to overcome these limitations.

Mendel's Generative Ontology: A Paradigm Shift:

From day one, Mendel's vision has been to create machines that can reason and mimic physicians' cognitive abilities. We approach this by combining large language models (LLMs) with advanced knowledge representations like hypergraphs. In pursuing this vision, we identified two critical challenges: improving the scalability of ontologies and redefining clinical ontologies specifically. 

Figure: Mendel Hypergraph

We need a generative ontology that scales, especially for a complex domain like medicine. We need a Clinical Generative Ontology. This has been our focus for the past few years, and the results outperform any LLM-only approach or an LLM coupled with standard ontologies.

The Principles

When we initially set out to develop a scalable, reasoning-ready ontology for the clinical domain, we established three core guiding principles to ensure we didn't end up with an inflexible administrative ontology like in previous attempts.

1. REDUCTIONISM: LESS IS MORE

Drawing inspiration from the periodic table and the Standard Model in physics, Mendel breaks down concepts into the smallest conceptual units called "Concemes." Just as there is no "Water" on the periodic table, only hydrogen and oxygen, Mendel's ontology focuses on the fundamental building blocks of medical knowledge.

2. RECONSTRUCTING COMPLEX THOUGHTS

With Concemes as the foundation, Mendel's Generative Ontology introduces laws that govern the combination of these elemental units into complex thoughts expressible in human language. New properties emerge at the concept level as Concemes interact, enabling the representation of any medical concept or thought.

3. EVENT-ROOTED REPRESENTATION

Mendel's Generative Ontology is rooted in an event model, ensuring that every thought expressed is anchored to time and space (the patient's body). This allows for representing a patient's journey as a collection of clinical events that interact through a causality network. Mendel calls this ontology an Event-Rooted Generative Ontology (ERGO).

The Architecture:

1. CLINICAL DATA MODEL (CDM)

Our CDM represents all of medicine on a high level and dives deeper into cancer. 

It defines clinical events (e.g., Medication) and their properties/characteristics (e.g., Dose, Drug Name) like any data model would do; however, to facilitate effective reasoning, our CDM, unlike other CDMs, defines how these properties behave and change over time as well as what categories of values they may hold and how they should be restricted.  
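One way to picture a data model that also carries value restrictions is sketched below. The class names, fields, and allowed values are assumptions made for this example; they are not Mendel’s CDM, which is not published in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertySpec:
    """Hypothetical description of one event property."""
    name: str
    value_category: str       # which vocabulary the value is drawn from
    allowed: frozenset        # restriction on the values it may hold
    changes_over_time: bool   # whether later observations may revise it


@dataclass
class EventType:
    """Hypothetical clinical event type, e.g. Medication."""
    name: str
    properties: dict = field(default_factory=dict)

    def add(self, spec: PropertySpec) -> None:
        self.properties[spec.name] = spec

    def validate(self, values: dict) -> list:
        """Return human-readable violations (an empty list when valid)."""
        errors = []
        for key, value in values.items():
            spec = self.properties.get(key)
            if spec is None:
                errors.append(f"unknown property: {key}")
            elif value not in spec.allowed:
                errors.append(f"{key}={value!r} is outside the allowed values")
        return errors


medication = EventType("Medication")
medication.add(PropertySpec("drug_name", "drug", frozenset({"Tagrisso"}), False))
medication.add(PropertySpec("dose_unit", "unit", frozenset({"mg"}), True))

print(medication.validate({"drug_name": "Tagrisso", "dose_unit": "mg"}))
print(medication.validate({"drug_name": "Tagrisso", "dose_unit": "litres"}))
```

Keeping the restrictions next to the property definitions lets a reasoner reject implausible values at the point where an event is recorded.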

2. CONCEPTUAL FRAMES (CFS)

This is a proprietary knowledge representation tool. A CF defines how events and properties interact, influence, and restrict each other within and across events. For example, a CF can define how targeted therapy (e.g., Tagrisso) can treat a cancer (e.g., lung cancer with NSCC morphology) by targeting a specific biomarker (e.g., EGFR) or how a response to a treatment (e.g., Partial Response) is recorded when a change in a disease's properties (e.g., decrease in size of a tumor belonging to a cancer) is caused by a treatment. 

The formalism also defines operators on CFs that allow for extending them or merging them to create more complex CFs with higher representation power. 

CFs are narrow perspectives on clinical processes that provide the model pieces of the puzzle and allow it to piece them together from the bottom up to a coherent, consistent, and comprehensive patient journey.
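Since the CF formalism is proprietary, the following is only a loose sketch of the idea: a frame as a set of role restrictions, plus one merge operator that combines two frames. The `Frame` class, its role names, and the merge semantics (intersecting shared roles) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical frame: each role maps to the set of values it permits."""
    name: str
    roles: dict

    def matches(self, bindings: dict) -> bool:
        """True when every role is bound to one of its permitted values."""
        return all(
            bindings.get(role) in allowed for role, allowed in self.roles.items()
        )

    def merge(self, other: "Frame", name: str) -> "Frame":
        """Combine two frames; roles present in both keep only shared values."""
        merged = dict(self.roles)
        for role, allowed in other.roles.items():
            merged[role] = merged[role] & allowed if role in merged else allowed
        return Frame(name, merged)


# Values taken from the example in the text above.
targets = Frame("drug targets biomarker", {"drug": {"Tagrisso"}, "biomarker": {"EGFR"}})
treats = Frame("drug treats cancer", {"drug": {"Tagrisso"}, "cancer": {"lung cancer"}})
targeted_therapy = targets.merge(treats, "targeted therapy")

print(targeted_therapy.matches(
    {"drug": "Tagrisso", "biomarker": "EGFR", "cancer": "lung cancer"}
))
print(targeted_therapy.matches(
    {"drug": "Tagrisso", "biomarker": "ALK", "cancer": "lung cancer"}
))
```

Two narrow frames combine into a richer one without either being rewritten, which is the bottom-up assembly the text describes.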

3. MICRO-ONTOLOGIES (MOS)

MOs are specialized hierarchies (isA and hasA, similar to standard ontologies) that contain the vocabulary used in the above components (e.g., the values assigned to properties in CDM or the restrictions in conceptual frames). 

Each MO specializes in a specific domain, such as drugs, morphologies, types of beam arrangement in radiation therapy, surgeries, outcomes, and interpretations of test results. 

Unlike standard ontologies, these MOs do not contain concepts; instead, they contain what we call "Concemes," which we define as the smallest conceptual units to represent knowledge. Concepts, then, are broken into their elemental concemes; e.g., the concept "Breast Carcinoma" is mapped into a "Neoplasm" clinical event (from our CDM) and broken into "Breast" as a primary site, and "Carcinoma" as its morphology (restricted only to breast cancer morphologies) and implies a "Malignant" behavior. “Breast”, “Carcinoma”, and “Malignant” are concemes that belong to the Body MO, Morphology MO, and Behavior MO, respectively.

This reductionist approach not only gives our ontology more expressive power to represent new concepts but also allows our clinical experts to maintain and expand much smaller ontologies through a divide-and-conquer strategy.
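The “Breast Carcinoma” decomposition can be sketched as follows. The class names and the morphology-to-behavior table are hypothetical stand-ins for whatever Mendel uses internally; only the concemes and their micro-ontologies come from the text, and the “Lung” conceme is added purely to show reuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conceme:
    label: str
    micro_ontology: str


BREAST = Conceme("Breast", "Body")
LUNG = Conceme("Lung", "Body")  # extra conceme, added for illustration only
CARCINOMA = Conceme("Carcinoma", "Morphology")
MALIGNANT = Conceme("Malignant", "Behavior")

# Hypothetical rule table: a morphology implies a behavior.
IMPLIED_BEHAVIOR = {CARCINOMA: MALIGNANT}


@dataclass(frozen=True)
class NeoplasmEvent:
    primary_site: Conceme
    morphology: Conceme
    behavior: Conceme


def compose(site: Conceme, morphology: Conceme) -> NeoplasmEvent:
    """Build a Neoplasm event from concemes, filling in the implied behavior."""
    return NeoplasmEvent(site, morphology, IMPLIED_BEHAVIOR[morphology])


breast_carcinoma = compose(BREAST, CARCINOMA)
lung_carcinoma = compose(LUNG, CARCINOMA)  # no new concept entry required

print(breast_carcinoma)
print(lung_carcinoma.behavior.label)
```

Adding one site conceme makes a new concept expressible without adding a new concept entry, which is where the extra expressive power comes from.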

To create these MOs, we studied the internal representations of dozens of standard medical ontologies, compared them to each other, and chose the subset that makes the most sense for each domain. Then we created what we call "seed" micro-ontologies by borrowing from the standard ontologies while fixing their errors and expanding them while maintaining the capacity to map back to them. 

These seed MOs are then expanded using the process defined below.

Continuous Symbolic Learning

All of our symbolic AI resources (CDM, CFs, and micro-ontologies) go through the following iterative improvement process:

  • Handcrafting the symbolic resources (top-down approach): We have a full-time offshore team of clinical experts (physicians and pharmacists) who review and expand our symbolic AI resources, such as the micro-ontologies and conceptual frames.
  • Automatic "second guessing" of the ontology (bottom-up approach): We have developed proprietary ML algorithms that sift through medical records, literature, and web resources, compare them to our ontology, and recommend changes to our clinical experts, who then correct or expand the ontology.

Emergentism in action:

Consider a standard ICD-10-CM code like C34.31, Malignant neoplasm of lower lobe, right bronchus or lung. In most biomedical data models this code is stored as a “diagnosis” or “problem” on its own. 

Now imagine I want to identify the following cohorts:

  1. Show me patients with cancer
  2. Show me patients with lung cancer
  3. Show me patients with cancer of the respiratory system
  4. Show me patients with cancer of the right lung
  5. Show me patients with cancer in the lower lobe of any lung

Each of these queries specifies a novel concept that would require creating a specific list of ICD-10-CM codes, some simple, some less so, in order to answer the overall question. At Mendel we support efficient translation across these concepts by creating a specific model for each dimension of the complex thought embedded in the ICD code. In this case, there is information about a neoplastic condition in a certain location of the body.

Our representation allows us to separately store:

  • Condition
    • Neoplastic condition
      • Behavior of neoplastic condition
      • Site of neoplastic condition
        • Region of the body 
        • Organ system
        • Organ
        • Body part/part of organ
        • Laterality

Because of this complete decomposition, we can emergently cover a wide range of clinical concepts automatically within the system.
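A minimal sketch of how the five cohort questions fall out of a decomposed record is shown below. The field names and the hand-written decompositions are assumptions for this example; the “bronchus or lung” wording of the codes is simplified to the organ “lung”, and the second code (C34.12, upper lobe, left bronchus or lung) is included only for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    organ_system: str
    organ: str
    part_of_organ: str
    laterality: str


@dataclass(frozen=True)
class Neoplasm:
    source_code: str
    behavior: str
    site: Site


# Hand-written decompositions, for illustration only.
C34_31 = Neoplasm("C34.31", "malignant", Site("respiratory", "lung", "lower lobe", "right"))
C34_12 = Neoplasm("C34.12", "malignant", Site("respiratory", "lung", "upper lobe", "left"))

patients = {"patient-1": [C34_31], "patient-2": [C34_12]}


def is_cancer(n: Neoplasm) -> bool:
    return n.behavior == "malignant"


QUERIES = {
    "cancer": is_cancer,
    "lung cancer": lambda n: is_cancer(n) and n.site.organ == "lung",
    "cancer of the respiratory system": lambda n: is_cancer(n) and n.site.organ_system == "respiratory",
    "cancer of the right lung": lambda n: is_cancer(n) and n.site.organ == "lung" and n.site.laterality == "right",
    "cancer in the lower lobe of any lung": lambda n: is_cancer(n) and n.site.organ == "lung" and n.site.part_of_organ == "lower lobe",
}


def cohort(predicate) -> list:
    """Patients with at least one neoplasm satisfying the predicate."""
    return sorted(p for p, items in patients.items() if any(predicate(n) for n in items))


for question, predicate in QUERIES.items():
    print(question, cohort(predicate))
```

Each question becomes a filter over stored dimensions rather than a separately curated code list.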

Efficient Reasoning at Scale:

ERGO is not only designed for effective knowledge representation but also coupled with efficient reasoning algorithms. These algorithms can scale to millions of patient journeys in a second, enabling large-scale reasoning that powers Mendel's Hypercube platform.

Ontology-Powered Reasoning in Action:

Mendel enriches patient journeys by applying expert clinical reasoning at scale. Mendel’s system knows that a patient’s cancer is distinctly metastatic based only on a pathology report documenting pleural invasion of lung cancer, simply because it knows that in the AJCC staging guidelines this means the M stage of the cancer is M1a, which is distinctly metastatic in the same guidelines. It knows that a phrase like “triple negative breast cancer” implies several facts about the cancer and its HER2/ER/PR status. And it understands that seeing facts like “lung cancer,” “Tagrisso,” and “EGFR mutation” together makes sense clinically, so it can uprank AI-driven extraction of these facts from text to improve accuracy.

Our system can do this both at the patient level and also at the population level.
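Two of the reasoning patterns above can be sketched with simple lookup tables. This is an illustration only: the tables, function names, and the fixed confidence boost are assumptions, and a production system would presumably derive such rules from the ontology rather than hard-code them.

```python
# Phrase -> facts it implies. Triple negative means ER, PR, and HER2 negative.
IMPLICATIONS = {
    "triple negative breast cancer": {
        "ER": "negative",
        "PR": "negative",
        "HER2": "negative",
    },
}

# Groups of facts that make clinical sense together (from the example above).
COHERENT_GROUPS = [frozenset({"lung cancer", "Tagrisso", "EGFR mutation"})]


def expand(phrase: str) -> dict:
    """Return the biomarker facts implied by a phrase (empty if none known)."""
    return dict(IMPLICATIONS.get(phrase.lower(), {}))


def uprank(candidates: dict, boost: float = 0.1) -> dict:
    """Raise the score of extracted facts that form a complete coherent group."""
    boosted = dict(candidates)
    for group in COHERENT_GROUPS:
        if group <= set(candidates):
            for fact in group:
                boosted[fact] = min(1.0, candidates[fact] + boost)
    return boosted


print(expand("Triple Negative Breast Cancer"))

extracted = {"lung cancer": 0.6, "Tagrisso": 0.5, "EGFR mutation": 0.7, "aspirin": 0.4}
print(uprank(extracted))
```

Facts that corroborate one another gain confidence, while an unrelated extraction such as the hypothetical “aspirin” entry is left unchanged.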

Hypercube Efficiency

Statisticians were given tasks to define cohorts reflecting real-world use cases (trial matching, treatment eligibility, diagnostic eligibility) using various AI platforms.

Figure: Query Execution Time Benchmark

Figure: Query Complexity Benchmark

Conclusion:

Mendel's Generative Ontology represents a significant advancement in knowledge representation for healthcare. By adopting the principles of reductionism and emergentism and anchoring knowledge in an event-rooted model, Mendel has created the first reasoning-first clinical ontology. ERGO not only addresses the limitations of current ontologies but also enables efficient and effective reasoning at scale.

Ready to reason with your clinical data?

Get in touch with Mendel today to see how our AI-driven solutions can accelerate your recruitment and improve trial outcomes. Contact us and discover how Mendel’s Hypercube can streamline your clinical trial processes.

Email us at marketing@mendel.ai
