From Query to Match: Designing a Similarity Engine for Dental Identification

Posted Apr 20, 2026

By Dong Jun Lee

4 min read

Introduction

Most data systems are built around queries.

Input a condition
Retrieve matching records

This paradigm works well when the data is complete and the queries are precise.

However, forensic identification does not operate under such conditions.

In real-world scenarios:

Data is incomplete
Observations are noisy
Records are often inconsistent across time

This leads to a fundamental limitation:

Exact matching fails when exact data does not exist.

To address this, the problem must be reframed.

Identification is not a lookup problem.
It is a similarity problem under uncertainty.

This post explores how a forensic dental identification system transitions
from rule-based querying to a similarity-driven matching architecture.

1. From Query Systems to Matching Systems

The current system is built on structured dental records:

FDI-based tooth indexing
Surface-level encoding (O, M, D, B, L)
Hierarchical clinical codes (caries, prosthetics, etc.)

This enables:

Deterministic queries
Filter-based retrieval
Exact condition matching

However, these capabilities are insufficient for forensic use.

Consider the following case:

Ante-mortem (AM): partial dental record
Post-mortem (PM): degraded panoramic X-ray

Even if both belong to the same individual:

Missing teeth may differ
Prosthetic work may have changed
Imaging conditions introduce distortion

A rule-based query would fail to match these records.

2. Failure of Exact Matching

Rule-based systems rely on:

Explicit conditions
Predefined equivalence
Deterministic logic

These assumptions break down under:

2.1 Incomplete Data

Not all teeth or surfaces are observable.

2.2 Temporal Variation

Dental conditions evolve over time.

2.3 Representation Gap

Structured records and imaging data are fundamentally different modalities.

This leads to a critical insight:

Two records can represent the same individual
without sharing identical attributes.

3. Reframing the Problem

Instead of asking:

“Does this record exactly match another?”

The system must ask:

“How similar are these two records, given uncertainty?”

This transforms the system from:

A filtering engine
→ into
A ranking engine

The goal is no longer to retrieve exact matches,
but to rank candidate identities by similarity.

4. Designing a Similarity Engine

A similarity engine requires three core components:

4.1 Representation

Dental data must be transformed into a comparable form.

Instead of storing only structured fields,
each record is mapped into a feature representation.

Examples:

Tooth presence/absence vector
Surface condition encoding
Prosthetic patterns
Structural relationships between teeth

This creates a high-dimensional representation of each individual.

4.2 Similarity Function

A function must quantify how close two representations are.

Basic approaches:

Hamming distance (binary features)
Cosine similarity (vector space)
Weighted distance metrics (domain-informed)

However, manually defined similarity has limitations:

Cannot capture complex patterns
Sensitive to missing data
Difficult to generalize

4.3 Ranking Mechanism

Given a query (e.g., PM record),
the system must:

Compare against all candidates
Compute similarity scores
Return a ranked list

This allows investigators to focus on
top-k likely matches, rather than exact hits.

5. Limitations of Feature-Based Similarity

Feature engineering introduces structure,
but also imposes constraints.

Problems include:

Loss of subtle spatial patterns
Inability to capture imaging nuances
Over-reliance on predefined features

This leads to an important conclusion:

Feature-based similarity is a transition step,
not the final solution.

6. Toward Representation Learning

To overcome these limitations,
the system must move toward learned representations.

Instead of defining similarity manually:

Learn embeddings from data
Map similar individuals closer in latent space
Separate dissimilar cases automatically

This is the domain of:

Metric learning
Contrastive learning
Deep representation learning

In this framework:

Each dental record becomes a vector
Identity similarity becomes geometric proximity

7. Integrating Imaging Data

A complete system must eventually unify:

Structured odontogram data
Panoramic X-ray images

This introduces multimodal challenges:

Different data distributions
Different noise characteristics
Different semantic meanings

However, it also creates opportunity:

Imaging data contains patterns
that structured data cannot capture.

The future system must learn to:

Align these modalities
Extract shared identity features
Improve robustness under missing data

8. System Implications

Transitioning to a similarity engine changes system design:

Database → vector store
Query → nearest-neighbor search
Filters → similarity thresholds

It also changes evaluation:

Accuracy → ranking quality
Precision/Recall → top-k retrieval performance

This is not an incremental improvement.

It is a paradigm shift.

Conclusion

Forensic dental identification cannot rely on exact matches.

The problem is inherently:

Incomplete
Noisy
Evolving

A system that demands perfect data will fail in imperfect reality.

The solution is not better rules,
but a better formulation:

From querying records
to matching identities through similarity.

This post marks the transition from structured data systems
to intelligent matching systems.

The next step is to move beyond handcrafted similarity
and toward learned representations—
where the system begins not just to store data,
but to understand it.

AI, Dentistry, Software Engineering

This post is licensed under CC BY 4.0 by the author.