Post

From Query to Match: Designing a Similarity Engine for Dental Identification

From Query to Match: Designing a Similarity Engine for Dental Identification

Introduction

Most data systems are built around queries.

  • Input a condition
  • Retrieve matching records

This paradigm works well when the data is complete and the queries are precise.

However, forensic identification does not operate under such conditions.

In real-world scenarios:

  • Data is incomplete
  • Observations are noisy
  • Records are often inconsistent across time

This leads to a fundamental limitation:

Exact matching fails when exact data does not exist.

To address this, the problem must be reframed.

Identification is not a lookup problem.
It is a similarity problem under uncertainty.

This post explores how a forensic dental identification system transitions
from rule-based querying to a similarity-driven matching architecture.


1. From Query Systems to Matching Systems

The current system is built on structured dental records:

  • FDI-based tooth indexing
  • Surface-level encoding (O, M, D, B, L)
  • Hierarchical clinical codes (caries, prosthetics, etc.)

This enables:

  • Deterministic queries
  • Filter-based retrieval
  • Exact condition matching

However, these capabilities are insufficient for forensic use.

Consider the following case:

  • Ante-mortem (AM): partial dental record
  • Post-mortem (PM): degraded panoramic X-ray

Even if both belong to the same individual:

  • Missing teeth may differ
  • Prosthetic work may have changed
  • Imaging conditions introduce distortion

A rule-based query would fail to match these records.


2. Failure of Exact Matching

Rule-based systems rely on:

  • Explicit conditions
  • Predefined equivalence
  • Deterministic logic

These assumptions break down under:

2.1 Incomplete Data

Not all teeth or surfaces are observable.

2.2 Temporal Variation

Dental conditions evolve over time.

2.3 Representation Gap

Structured records and imaging data are fundamentally different modalities.

This leads to a critical insight:

Two records can represent the same individual
without sharing identical attributes.


3. Reframing the Problem

Instead of asking:

“Does this record exactly match another?”

The system must ask:

“How similar are these two records, given uncertainty?”

This transforms the system from:

  • A filtering engine
    → into
  • A ranking engine

The goal is no longer to retrieve exact matches,
but to rank candidate identities by similarity.


4. Designing a Similarity Engine

A similarity engine requires three core components:

4.1 Representation

Dental data must be transformed into a comparable form.

Instead of storing only structured fields,
each record is mapped into a feature representation.

Examples:

  • Tooth presence/absence vector
  • Surface condition encoding
  • Prosthetic patterns
  • Structural relationships between teeth

This creates a high-dimensional representation of each individual.


4.2 Similarity Function

A function must quantify how close two representations are.

Basic approaches:

  • Hamming distance (binary features)
  • Cosine similarity (vector space)
  • Weighted distance metrics (domain-informed)

However, manually defined similarity has limitations:

  • Cannot capture complex patterns
  • Sensitive to missing data
  • Difficult to generalize

4.3 Ranking Mechanism

Given a query (e.g., PM record),
the system must:

  1. Compare against all candidates
  2. Compute similarity scores
  3. Return a ranked list

This allows investigators to focus on
top-k likely matches, rather than exact hits.


5. Limitations of Feature-Based Similarity

Feature engineering introduces structure,
but also imposes constraints.

Problems include:

  • Loss of subtle spatial patterns
  • Inability to capture imaging nuances
  • Over-reliance on predefined features

This leads to an important conclusion:

Feature-based similarity is a transition step,
not the final solution.


6. Toward Representation Learning

To overcome these limitations,
the system must move toward learned representations.

Instead of defining similarity manually:

  • Learn embeddings from data
  • Map similar individuals closer in latent space
  • Separate dissimilar cases automatically

This is the domain of:

  • Metric learning
  • Contrastive learning
  • Deep representation learning

In this framework:

  • Each dental record becomes a vector
  • Identity similarity becomes geometric proximity

7. Integrating Imaging Data

A complete system must eventually unify:

  • Structured odontogram data
  • Panoramic X-ray images

This introduces multimodal challenges:

  • Different data distributions
  • Different noise characteristics
  • Different semantic meanings

However, it also creates opportunity:

Imaging data contains patterns
that structured data cannot capture.

The future system must learn to:

  • Align these modalities
  • Extract shared identity features
  • Improve robustness under missing data

8. System Implications

Transitioning to a similarity engine changes system design:

  • Database → vector store
  • Query → nearest-neighbor search
  • Filters → similarity thresholds

It also changes evaluation:

  • Accuracy → ranking quality
  • Precision/Recall → top-k retrieval performance

This is not an incremental improvement.

It is a paradigm shift.


Conclusion

Forensic dental identification cannot rely on exact matches.

The problem is inherently:

  • Incomplete
  • Noisy
  • Evolving

A system that demands perfect data will fail in imperfect reality.

The solution is not better rules,
but a better formulation:

From querying records
to matching identities through similarity.

This post marks the transition from structured data systems
to intelligent matching systems.

The next step is to move beyond handcrafted similarity
and toward learned representations—
where the system begins not just to store data,
but to understand it.

This post is licensed under CC BY 4.0 by the author.