From Query to Identification: Designing a Similarity Engine for Dental Matching

Posted Apr 3, 2026

By Dong Jun Lee

3 min read

Introduction

In the previous post, I explored how dental records can be transformed into structured, queryable data.
However, querying alone does not solve the core problem of forensic identification.

The real question begins after retrieval:

Given two dental records, how do we determine whether they belong to the same individual?

This is fundamentally a matching problem under uncertainty.

In this post, I outline the design of a similarity engine for dental identification—
a system that quantifies how likely two records correspond to the same person.

The Challenge of Matching Dental Records

At first glance, matching may seem straightforward: compare two records and check if they are identical.

In reality, forensic data is rarely complete or clean.

Incomplete Data

Missing teeth
Partial records
Degraded structures

Inconsistent Observations

Variations in documentation
Differences in interpretation

Asymmetric Information

Postmortem (PM) data may be limited
Antemortem (AM) records may be outdated

This leads to a critical constraint:

Matching must be performed even when data is partial, noisy, and asymmetric.

Why Naive Matching Fails

A naive approach might assign equal importance to all features:

Matching crown → +1
Matching filling → +1
Missing vs present → 0

However, this fails for two reasons:

1. Not All Features Are Equal

An implant is highly distinctive
A simple filling is not

Treating them equally reduces accuracy.

2. Partial Matches Are Common

Exact matches are rare in real scenarios.

Some teeth may be missing
Some data may be unavailable

A system must still produce a meaningful result.

Designing a Similarity Score

The goal is to compute a score:

How similar are two dental records?

Weighted Feature Matching

Each feature is assigned a weight based on its importance:

Implant → high weight
Crown → medium weight
Filling → lower weight

The similarity score becomes a weighted sum:

Match on important features contributes more
Minor discrepancies contribute less

Tooth-Level Comparison

Each tooth is evaluated independently:

Presence / absence
Surface-level findings
Global attributes

This allows fine-grained analysis across the entire dentition.

Surface-Level Matching

Surface data (O, M, D, B, L) enables detailed comparison:

Occlusal restorations
Mesial/distal differences

This improves discrimination between otherwise similar cases.

Handling Partial and Missing Data

Real-world identification requires flexibility.

Partial Matching

If only a subset of teeth is available:

Compare only overlapping data
Normalize the score accordingly

Missing Data Tolerance

Instead of penalizing missing data:

Ignore unknown fields
Focus on available evidence

Asymmetric Matching

PM and AM records often differ in completeness.

The system must support:

Unequal data sizes
Directional comparison

From Score to Decision

The similarity score itself is not the final answer.

It is a tool for decision-making.

High score → strong candidate
Medium score → requires expert review
Low score → unlikely match

This preserves the role of forensic experts while enhancing efficiency.

Why This Matters

This approach fundamentally changes the identification process:

Reduces manual comparison effort
Enables large-scale matching
Introduces reproducible logic

Most importantly:

Identification becomes a quantifiable problem, not purely a subjective judgment.

Toward AI-Assisted Matching

Once similarity scoring is defined:

Feature vectors can be constructed
Machine learning models can be trained
Matching can be partially automated

The similarity engine becomes the foundation for:

Pattern recognition
Predictive identification
Decision support systems

Conclusion

Querying allows us to find relevant data.
Similarity scoring allows us to interpret it.

Identification is not just about retrieving records,
but about reasoning over them.

By transforming dental records into a structured and comparable format,
we move closer to a system where forensic identification is not only possible,
but scalable, consistent, and computationally grounded.

AI, Dentistry, Software Engineering

This post is licensed under CC BY 4.0 by the author.