From Query to Identification: Designing a Similarity Engine for Dental Matching
Introduction
In the previous post, I explored how dental records can be transformed into structured, queryable data.
However, querying alone does not solve the core problem of forensic identification.
The real question begins after retrieval:
Given two dental records, how do we determine whether they belong to the same individual?
This is fundamentally a matching problem under uncertainty.
In this post, I outline the design of a similarity engine for dental identification—
a system that quantifies how likely two records correspond to the same person.
The Challenge of Matching Dental Records
At first glance, matching may seem straightforward: compare two records and check if they are identical.
In reality, forensic data is rarely complete or clean.
Incomplete Data
- Missing teeth
- Partial records
- Degraded structures
Inconsistent Observations
- Variations in documentation
- Differences in interpretation
Asymmetric Information
- Postmortem (PM) data may be limited
- Antemortem (AM) records may be outdated
This leads to a critical constraint:
Matching must be performed even when data is partial, noisy, and asymmetric.
Why Naive Matching Fails
A naive approach might assign equal importance to all features:
- Matching crown → +1
- Matching filling → +1
- Missing vs present → 0
However, this fails for two reasons:
1. Not All Features Are Equal
- An implant is highly distinctive
- A simple filling is not
Treating them equally reduces accuracy.
2. Partial Matches Are Common
Exact matches are rare in real scenarios.
- Some teeth may be missing
- Some data may be unavailable
A system must still produce a meaningful result.
Designing a Similarity Score
The goal is to compute a score:
How similar are two dental records?
Weighted Feature Matching
Each feature is assigned a weight based on its importance:
- Implant → high weight
- Crown → medium weight
- Filling → lower weight
The similarity score becomes a weighted sum:
- Match on important features contributes more
- Minor discrepancies contribute less
Tooth-Level Comparison
Each tooth is evaluated independently:
- Presence / absence
- Surface-level findings
- Global attributes
This allows fine-grained analysis across the entire dentition.
Surface-Level Matching
Surface data (O, M, D, B, L) enables detailed comparison:
- Occlusal restorations
- Mesial/distal differences
This improves discrimination between otherwise similar cases.
Handling Partial and Missing Data
Real-world identification requires flexibility.
Partial Matching
If only a subset of teeth is available:
- Compare only overlapping data
- Normalize the score accordingly
Missing Data Tolerance
Instead of penalizing missing data:
- Ignore unknown fields
- Focus on available evidence
Asymmetric Matching
PM and AM records often differ in completeness.
The system must support:
- Unequal data sizes
- Directional comparison
From Score to Decision
The similarity score itself is not the final answer.
It is a tool for decision-making.
- High score → strong candidate
- Medium score → requires expert review
- Low score → unlikely match
This preserves the role of forensic experts while enhancing efficiency.
Why This Matters
This approach fundamentally changes the identification process:
- Reduces manual comparison effort
- Enables large-scale matching
- Introduces reproducible logic
Most importantly:
Identification becomes a quantifiable problem, not purely a subjective judgment.
Toward AI-Assisted Matching
Once similarity scoring is defined:
- Feature vectors can be constructed
- Machine learning models can be trained
- Matching can be partially automated
The similarity engine becomes the foundation for:
- Pattern recognition
- Predictive identification
- Decision support systems
Conclusion
Querying allows us to find relevant data.
Similarity scoring allows us to interpret it.
Identification is not just about retrieving records,
but about reasoning over them.
By transforming dental records into a structured and comparable format,
we move closer to a system where forensic identification is not only possible,
but scalable, consistent, and computationally grounded.