MIF Series

Thousands of exact duplicates in high-profile structural databases.

31st October 2025, 15:00
Olga Anosova
Computer Science

Abstract

The talk discusses several databases of solid crystalline materials and protein structures, such as Google's GNoME and the Protein Data Bank (PDB), where every entry is given by dozens or hundreds of atomic positions in Euclidean or fractional coordinates with respect to a unit cell. Although independent experiments and non-trivial simulations are highly unlikely to produce identical numerical outputs, papers in IUCrJ (2024), MATCH (2025), Acta Cryst D (2025), and Pattern Recognition (2026) revealed thousands of exact duplicates in which all x, y, z coordinates are identical.