MIF Series
Thousands of exact duplicates in high-profile structural databases.
31st October 2025, 15:00
Olga Anosova
Computer Science
Abstract
The talk discusses several databases of solid crystalline materials and protein structures, such as Google's GNoME and the Protein Data Bank (PDB), where every entry is specified by dozens or hundreds of atomic positions, given either in Euclidean coordinates or in fractional coordinates relative to a unit cell. Although independent experiments and non-trivial simulations are highly unlikely to produce identical numerical outputs, papers in the journals IUCrJ 2024, MATCH 2025, Acta Cryst D 2025, and Pattern Recognition 2026 revealed thousands of exact duplicates in which all x, y, z coordinates are identical.
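To make "exact duplicate" concrete, here is a minimal Python sketch of how entries with identical coordinates could be grouped. It is not the method used in the papers cited above. The data layout (a dict mapping a hypothetical entry ID to a list of (x, y, z) tuples) and the function name `find_exact_duplicates` are illustrative assumptions. Coordinates are sorted so that the same atoms listed in a different order still match. Unit-cell parameters and atom types are ignored for simplicity, though a real check would likely also compare them.

```python
from collections import defaultdict

# Assumed layout: one (x, y, z) float triple per atom.
Coord = tuple[float, float, float]


def find_exact_duplicates(entries: dict[str, list[Coord]]) -> list[list[str]]:
    """Group entry IDs whose atomic coordinates are exactly identical.

    Assumption: atom order is irrelevant, so coordinates are sorted before
    comparison. Floats are compared exactly, with no tolerance.
    """
    groups: dict[tuple[Coord, ...], list[str]] = defaultdict(list)
    for entry_id, coords in entries.items():
        # A sorted tuple of exact float triples serves as a hashable key.
        key = tuple(sorted(tuple(c) for c in coords))
        groups[key].append(entry_id)
    # Report only keys shared by two or more entries.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical toy database: A and B list the same atoms in a different order.
db = {
    "A": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
    "B": [(0.5, 0.5, 0.5), (0.0, 0.0, 0.0)],
    "C": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.25)],
}
print(find_exact_duplicates(db))  # [['A', 'B']]
```

Because the comparison is exact, this only catches bit-for-bit duplicates of the kind described in the abstract. It misses near-duplicates, and for periodic crystals it misses the same structure written with a different unit cell or origin, since those choices change the fractional coordinates.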
School of Computer Science & Informatics, University of Liverpool
Ashton Street, Liverpool, L69 3BX
United Kingdom
+44 (0)151 795 4275