W0471
A New Search Algorithm For Identifying Motifs in the CSD.
J.A. Chisholm (1), W.D.S. Motherwell (1), N. Feeder (2).
(1) Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK.
(2) Pharmaceutical R&D, Pfizer Global R&D, Ramsgate Rd., Sandwich, Kent, CT13 9NJ, UK.
A search algorithm (3DSearch) is presented that can identify
challenging extended chemical queries or motifs from 3D crystal structure
coordinates. The Cambridge Structural Database (CSD) system already has a search
program, Conquest [1,2], which will find
intermolecular contacts between specified atoms. However, the task of searching
3D coordinates can be problematic when the query pattern or motif becomes large
and extends beyond first-neighbour contacts. For example, finding
hydrogen bond motifs involving arrangements of 6-membered rings across the entire CSD
is slow, with a search taking several days to complete.
The new algorithm combines graph matching and contact search
methods within a depth first backtracking algorithm. This approach provides the
ability to search, in an efficient and accurate manner, for general chemical
queries that may contain several intermolecular contacts. Performance metrics
are presented for two example searches: a hydrogen bond ‘tape’ motif
(a pattern of four adjacent 6-membered rings), and a query representing the
geometric arrangement of key functional groups on the surface of orthorhombic
paracetamol. It is shown that such searches, which are beyond the capability of
the existing search engine, can now be performed on the entire CSD in a matter
of minutes.
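
As an illustration of the general idea of combining graph matching with a contact search inside a depth-first backtracking loop, the following minimal Python sketch may be helpful. It is not the 3DSearch implementation: the function names (`build_edges`, `search`), the data layout, the single distance cutoff used to define a contact, and the toy coordinates are all assumptions made for this example, and crystal symmetry and periodicity are ignored.

```python
from itertools import combinations
from math import dist


def build_edges(atoms, bonds, contact_cutoff):
    """Label atom pairs as 'bond' or 'contact' (hypothetical helper).

    atoms: list of (element, (x, y, z)); bonds: list of (i, j) index pairs.
    A non-bonded pair closer than contact_cutoff is treated as a contact.
    """
    edges = {}
    for i, j in bonds:
        edges[(i, j)] = edges[(j, i)] = "bond"
    for i, j in combinations(range(len(atoms)), 2):
        if (i, j) not in edges and dist(atoms[i][1], atoms[j][1]) <= contact_cutoff:
            edges[(i, j)] = edges[(j, i)] = "contact"
    return edges


def search(query_elems, query_edges, atoms, edges):
    """Yield every mapping {query node: target atom} satisfying all query edges."""
    mapping, used = {}, set()

    def consistent(q, t):
        # Node label (element) must match.
        if atoms[t][0] != query_elems[q]:
            return False
        # Every query edge to an already-mapped node must exist with the same kind.
        for (a, b), kind in query_edges.items():
            if a == q and b in mapping and edges.get((t, mapping[b])) != kind:
                return False
            if b == q and a in mapping and edges.get((mapping[a], t)) != kind:
                return False
        return True

    def extend(q):
        if q == len(query_elems):
            yield dict(mapping)
            return
        for t in range(len(atoms)):
            if t not in used and consistent(q, t):
                mapping[q] = t
                used.add(t)
                yield from extend(q + 1)  # go deeper
                del mapping[q]            # backtrack
                used.remove(t)

    yield from extend(0)


# Toy data (made up): two O-H fragments arranged so that H(1) contacts O(2).
atoms = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0)),
         ("O", (2.8, 0.0, 0.0)), ("H", (3.76, 0.0, 0.0))]
edges = build_edges(atoms, bonds=[(0, 1), (2, 3)], contact_cutoff=2.0)

# Query: O-H...O, i.e. one covalent bond followed by one intermolecular contact.
query_elems = ["O", "H", "O"]
query_edges = {(0, 1): "bond", (1, 2): "contact"}
matches = list(search(query_elems, query_edges, atoms, edges))
print(matches)
```

Treating bonds and contacts as two kinds of edge in one graph lets a single backtracking pass reject a partial match as soon as either a bond or a contact is missing, which is one plausible way a query with several intermolecular contacts could be handled without enumerating all contacts first.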
1. F.H. Allen, “The Cambridge Structural Database...”, Acta Cryst. (2002), B58, 380-388.
2. I.J. Bruno et al., “New software for searching the Cambridge Structural Database...”, Acta Cryst. (2002), B58, 389-397.