From Query to Identification: Designing a Similarity Engine for Dental Matching

Introduction

In the previous post, I explored how dental records can be transformed into structured, queryable data.
However, querying alone does not solve the core problem of forensic identification.

The real question begins after retrieval:

Given two dental records, how do we determine whether they belong to the same individual?

This is fundamentally a matching problem under uncertainty.

In this post, I outline the design of a similarity engine for dental identification:
a system that quantifies how likely it is that two records correspond to the same person.


The Challenge of Matching Dental Records

At first glance, matching may seem straightforward: compare two records and check if they are identical.

In reality, forensic data is rarely complete or clean.

Incomplete Data

  • Missing teeth
  • Partial records
  • Degraded structures

Inconsistent Observations

  • Variations in documentation
  • Differences in interpretation

Asymmetric Information

  • Postmortem (PM) data may be limited
  • Antemortem (AM) records may be outdated

This leads to a critical constraint:

Matching must be performed even when data is partial, noisy, and asymmetric.


Why Naive Matching Fails

A naive approach might assign equal importance to all features:

  • Matching crown → +1
  • Matching filling → +1
  • Missing vs present → 0

However, this fails for two reasons:

1. Not All Features Are Equal

  • An implant is highly distinctive
  • A simple filling is not

Treating them equally reduces accuracy.
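
To make the problem concrete, here is a minimal sketch of the naive scheme. The record layout (a dict from FDI tooth number to a single finding) and the function name naive_score are assumptions for illustration, not the format from the previous post.

```python
# Hypothetical records: FDI tooth number -> one recorded finding.
def naive_score(record_a, record_b):
    """Add +1 for every tooth whose finding is identical in both records."""
    shared = record_a.keys() & record_b.keys()
    return sum(1 for tooth in shared if record_a[tooth] == record_b[tooth])


pm = {16: "filling", 26: "implant"}
candidate_x = {16: "filling", 26: "crown"}   # agrees only on the common filling
candidate_y = {16: "crown", 26: "implant"}   # agrees only on the distinctive implant

print(naive_score(pm, candidate_x), naive_score(pm, candidate_y))  # prints "1 1"
```

Both candidates tie, even though only one of them shares the implant.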


2. Partial Matches Are Common

Exact matches are rare in real scenarios.

  • Some teeth may be missing
  • Some data may be unavailable

A system must still produce a meaningful result.


Designing a Similarity Score

The goal is to compute a score:

How similar are two dental records?


Weighted Feature Matching

Each feature is assigned a weight based on its importance:

  • Implant → high weight
  • Crown → medium weight
  • Filling → lower weight

The similarity score becomes a weighted sum:

  • Matches on distinctive features contribute more
  • Matches on common features contribute less
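
A weighted sum along these lines could be sketched as follows. The weight values are placeholders I picked for illustration; real weights would have to be calibrated against casework data, and the record layout (FDI tooth number to finding) is likewise an assumption.

```python
# Illustrative weights only, not calibrated values.
WEIGHTS = {"implant": 5.0, "crown": 3.0, "filling": 1.0}


def weighted_score(record_a, record_b, weights=WEIGHTS):
    """Sum the weight of every finding that both records agree on."""
    score = 0.0
    for tooth in record_a.keys() & record_b.keys():
        finding = record_a[tooth]
        if finding == record_b[tooth]:
            score += weights.get(finding, 0.0)  # unknown finding types add nothing
    return score


pm = {16: "filling", 26: "implant"}
candidate_x = {16: "filling", 26: "crown"}
candidate_y = {16: "crown", 26: "implant"}

print(weighted_score(pm, candidate_x), weighted_score(pm, candidate_y))  # prints "1.0 5.0"
```

With weights in place, the candidate that shares the implant now ranks clearly ahead of the one that shares only a filling.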

Tooth-Level Comparison

Each tooth is evaluated independently:

  • Presence / absence
  • Surface-level findings
  • Global attributes

This allows fine-grained analysis across the entire dentition.
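
One possible shape for a per-tooth comparison is sketched below. The Tooth class, its field names, and the reading of "global attributes" as a single whole-tooth finding are my assumptions; None stands for "not recorded".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tooth:
    present: Optional[bool] = None        # None = presence not recorded
    surfaces: Optional[frozenset] = None  # restored surfaces, e.g. frozenset("OM")
    attribute: Optional[str] = None       # whole-tooth finding, e.g. "crown"


def compare_tooth(a: Tooth, b: Tooth) -> dict:
    """Per-aspect agreement: True, False, or None when either side is unrecorded."""
    def agree(x, y):
        return None if x is None or y is None else x == y

    return {
        "presence": agree(a.present, b.present),
        "surfaces": agree(a.surfaces, b.surfaces),
        "attribute": agree(a.attribute, b.attribute),
    }


am_tooth = Tooth(present=True, surfaces=frozenset("OM"), attribute="crown")
pm_tooth = Tooth(present=True, surfaces=frozenset("O"))
print(compare_tooth(am_tooth, pm_tooth))
```

Keeping the three aspects separate lets later stages weight or ignore each one independently instead of collapsing a tooth into a single yes/no.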


Surface-Level Matching

Surface data (O, M, D, B, L: occlusal, mesial, distal, buccal, lingual) enables detailed comparison:

  • Occlusal restorations
  • Mesial/distal differences

This improves discrimination between otherwise similar cases.
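
As a sketch of how surface data could yield a graded score rather than a yes/no answer, the snippet below uses Jaccard overlap between the sets of restored surfaces. Jaccard is my choice for illustration, not necessarily the measure the engine uses, and treating two empty sets as full agreement is an assumption.

```python
SURFACES = frozenset("OMDBL")  # occlusal, mesial, distal, buccal, lingual


def surface_similarity(a, b):
    """Jaccard overlap of two surface codes such as "MO" and "MOD"."""
    set_a = set(a) & SURFACES  # drop any characters that are not surface letters
    set_b = set(b) & SURFACES
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: neither record lists a restored surface
    return len(set_a & set_b) / len(union)


print(surface_similarity("MO", "MOD"))  # two of three surfaces shared
print(surface_similarity("MO", "DO"))   # one of three surfaces shared
```

An MO restoration compared with an MOD restoration scores higher than MO against DO, which a plain "both have a filling" check cannot distinguish.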


Handling Partial and Missing Data

Real-world identification requires flexibility.

Partial Matching

If only a subset of teeth is available:

  • Compare only overlapping data
  • Normalize the score accordingly
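
A minimal sketch of this normalization, assuming the same illustrative weights and dict layout as before: the earned weight is divided by the maximum weight achievable on the overlapping teeth, and the number of teeth compared is returned alongside it.

```python
# Illustrative weights only, not calibrated values.
WEIGHTS = {"implant": 5.0, "crown": 3.0, "filling": 1.0}


def normalized_score(record_a, record_b, weights=WEIGHTS):
    """Return (score in [0, 1] or None, number of teeth compared)."""
    shared = record_a.keys() & record_b.keys()
    possible = 0.0
    earned = 0.0
    for tooth in shared:
        w_a = weights.get(record_a[tooth], 0.0)
        w_b = weights.get(record_b[tooth], 0.0)
        possible += max(w_a, w_b)  # the most this tooth could have contributed
        if record_a[tooth] == record_b[tooth]:
            earned += w_a
    if possible == 0:
        return None, len(shared)  # nothing comparable
    return earned / possible, len(shared)


am = {16: "filling", 26: "implant", 36: "crown"}
print(normalized_score({26: "implant"}, am))               # prints "(1.0, 1)"
print(normalized_score({16: "crown", 26: "implant"}, am))  # prints "(0.625, 2)"
```

Reporting the count matters: a perfect score on one tooth is much weaker evidence than a perfect score on twenty.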

Missing Data Tolerance

Instead of penalizing missing data, treat an unrecorded field as unknown rather than as a mismatch:

  • Ignore unknown fields
  • Focus on available evidence
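
The sketch below shows one way to keep "unknown" separate from "different". It assumes None marks an unrecorded tooth, while the string "missing" is a recorded finding (the tooth is known to be absent); both conventions are mine.

```python
def compare_fields(record_a, record_b):
    """Count (agree, disagree, skipped) over all teeth mentioned in either record."""
    agree = disagree = skipped = 0
    for tooth in record_a.keys() | record_b.keys():
        value_a = record_a.get(tooth)  # None if the tooth is unrecorded
        value_b = record_b.get(tooth)
        if value_a is None or value_b is None:
            skipped += 1               # unknown on one side: no evidence either way
        elif value_a == value_b:
            agree += 1
        else:
            disagree += 1
    return agree, disagree, skipped


pm = {18: None, 17: "filling", 16: "crown"}
am = {17: "filling", 16: "filling", 15: "missing"}
print(compare_fields(pm, am))  # prints "(1, 1, 2)"
```

Only teeth 17 and 16 count toward the score; teeth 18 and 15 are skipped because one side has nothing to say about them.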

Asymmetric Matching

PM and AM records often differ in completeness.

The system must support:

  • Unequal data sizes
  • Directional comparison
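
Directional comparison can be illustrated with a score that asks "how much of the query record does the reference confirm?". The function name and the dict layout are hypothetical.

```python
def directional_score(query, reference):
    """Fraction of findings recorded in `query` that `reference` confirms."""
    known = {tooth: f for tooth, f in query.items() if f is not None}
    if not known:
        return None  # the query record carries no usable findings
    confirmed = sum(1 for tooth, f in known.items() if reference.get(tooth) == f)
    return confirmed / len(known)


pm = {16: "filling", 26: "implant"}                               # sparse PM record
am = {16: "filling", 26: "implant", 36: "crown", 46: "filling"}   # fuller AM record

print(directional_score(pm, am))  # prints "1.0"
print(directional_score(am, pm))  # prints "0.5"
```

Everything observed postmortem is confirmed by the AM record, so the PM to AM direction scores 1.0, while the reverse direction is dragged down by AM findings that the sparse PM record simply never captured.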

From Score to Decision

The similarity score itself is not the final answer.

It is a tool for decision-making.

  • High score → strong candidate
  • Medium score → requires expert review
  • Low score → unlikely match

This preserves the role of forensic experts while enhancing efficiency.
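
A triage step of this kind might look like the sketch below. The thresholds 0.8 and 0.5 and the record IDs are invented for illustration; in practice the cut-offs would be set and validated by forensic experts.

```python
def triage(score, high=0.8, low=0.5):
    """Map a normalized similarity score to a review category."""
    if score >= high:
        return "strong candidate"
    if score >= low:
        return "requires expert review"
    return "unlikely match"


# Hypothetical scores for three AM records against one PM record.
scores = {"AM-001": 0.92, "AM-002": 0.61, "AM-003": 0.18}

ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for record_id, score in ranked:
    print(record_id, score, triage(score))
```

The output is a ranked worklist for the expert, not a verdict.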


Why This Matters

This approach changes the identification process in three practical ways:

  • Reduces manual comparison effort
  • Enables large-scale matching
  • Introduces reproducible logic

Most importantly:

Identification becomes a quantifiable problem, not purely a subjective judgment.


Toward AI-Assisted Matching

Once similarity scoring is defined:

  • Feature vectors can be constructed
  • Machine learning models can be trained
  • Matching can be partially automated
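
As a rough idea of what a feature vector for a record pair could contain, the sketch below counts comparable teeth, agreements per finding type, and disagreements. The feature layout and the FINDINGS tuple are assumptions; any downstream model is out of scope here.

```python
FINDINGS = ("implant", "crown", "filling")  # illustrative finding types


def pair_features(record_a, record_b):
    """Build [n_compared, agreements per finding type..., n_disagreements]."""
    shared = [
        tooth
        for tooth in record_a.keys() & record_b.keys()
        if record_a[tooth] is not None and record_b[tooth] is not None
    ]
    vector = [len(shared)]
    for finding in FINDINGS:
        vector.append(sum(1 for t in shared if record_a[t] == record_b[t] == finding))
    vector.append(sum(1 for t in shared if record_a[t] != record_b[t]))
    return vector


pm = {16: "filling", 26: "implant", 36: "crown"}
am = {16: "filling", 26: "implant", 36: "filling", 46: "crown"}
print(pair_features(pm, am))  # prints "[3, 1, 0, 1, 1]"
```

A fixed-length vector like this can be computed for every PM/AM pair, which is the form most learning algorithms expect as input.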

The similarity engine becomes the foundation for:

  • Pattern recognition
  • Predictive identification
  • Decision support systems

Conclusion

Querying allows us to find relevant data.
Similarity scoring allows us to interpret it.

Identification is not just about retrieving records,
but about reasoning over them.

By transforming dental records into a structured and comparable format,
we move closer to a system where forensic identification is not only possible,
but scalable, consistent, and computationally grounded.

This post is licensed under CC BY 4.0 by the author.