W0471

A New Search Algorithm for Identifying Motifs in the CSD. J.A. Chisholm (1), W.D.S. Motherwell (1), N. Feeder (2). (1) Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK. (2) Pharmaceutical R&D, Pfizer Global R&D, Ramsgate Rd., Sandwich, Kent, CT13 9NJ, UK.

A search algorithm (3DSearch) is presented that can identify challenging extended chemical queries, or motifs, in 3D crystal structure coordinates. The Cambridge Structural Database (CSD) system already has a search program, ConQuest [1,2], which finds intermolecular contacts between specified atoms. However, searching 3D coordinates becomes problematic when the query pattern or motif is large and extends beyond first-neighbour contacts. For example, finding a hydrogen bond motif involving arrangements of 6-membered rings in the entire CSD is slow, taking several days to complete.

The new algorithm combines graph matching and contact search methods within a depth-first backtracking algorithm. This approach makes it possible to search, efficiently and accurately, for general chemical queries that may contain several intermolecular contacts. Performance metrics are presented for two example searches: a hydrogen bond ‘tape’ motif (a pattern of four adjacent 6-membered rings), and a query representing the geometric arrangement of key functional groups on the surface of orthorhombic paracetamol. It is shown that such searches, which are beyond the capability of the existing search engine, can now be performed on the entire CSD in a matter of minutes.
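
The idea of combining graph matching with contact searching inside a depth-first backtracking loop can be illustrated with a minimal sketch. This is not the 3DSearch implementation, whose details are not given here. The names `Atom`, `find_motif` and `max_contact`, the 2.5 Å cutoff, and the toy coordinates are all assumptions made for illustration, and crystal symmetry and periodicity are ignored.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    element: str
    xyz: tuple      # Cartesian coordinates in angstroms
    molecule: int   # hypothetical molecule identifier


def find_motif(atoms, bonds, query_elements, query_edges, max_contact=2.5):
    """Return all mappings of query atoms onto target atom indices.

    bonds: set of frozenset({i, j}) covalent bonds in the target.
    query_edges: {(qi, qj): "bond" or "contact"} with qi < qj.
    max_contact: assumed distance cutoff for an intermolecular contact.
    """
    def edge_ok(kind, i, j):
        if kind == "bond":
            # Graph-matching step: the covalent bond must exist.
            return frozenset((i, j)) in bonds
        # Contact-search step: different molecules, within the cutoff.
        return (atoms[i].molecule != atoms[j].molecule
                and dist(atoms[i].xyz, atoms[j].xyz) <= max_contact)

    def consistent(q, t, mapping):
        if atoms[t].element != query_elements[q] or t in mapping:
            return False
        # Check every query edge between q and already-mapped query atoms.
        for p, tp in enumerate(mapping):
            kind = query_edges.get((p, q))
            if kind and not edge_ok(kind, tp, t):
                return False
        return True

    hits = []

    def extend(mapping):
        q = len(mapping)
        if q == len(query_elements):
            hits.append(tuple(mapping))
            return
        for t in range(len(atoms)):
            if consistent(q, t, mapping):
                mapping.append(t)
                extend(mapping)   # go one level deeper
                mapping.pop()     # backtrack and try the next candidate

    extend([])
    return hits


# Toy target: two O-H groups in different molecules along the x axis.
atoms = [
    Atom("O", (0.00, 0.0, 0.0), 0),
    Atom("H", (0.96, 0.0, 0.0), 0),
    Atom("O", (2.80, 0.0, 0.0), 1),
    Atom("H", (3.76, 0.0, 0.0), 1),
]
bonds = {frozenset((0, 1)), frozenset((2, 3))}

# Query: O-H covalently bonded, with H in contact with a second O.
query_elements = ["O", "H", "O"]
query_edges = {(0, 1): "bond", (1, 2): "contact"}

hits = find_motif(atoms, bonds, query_elements, query_edges)
```

In this toy case `hits` contains the single mapping `(0, 1, 2)`, the O-H...O arrangement between the two molecules. A partial mapping is abandoned as soon as one bond or contact constraint fails, which is what keeps a backtracking search from enumerating every combination of atoms when the query contains several contacts.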

1. F.H. Allen, “The Cambridge Structural Database...”, Acta Cryst. (2002), B58, 380-388.
2. I.J. Bruno et al., “New software for searching the Cambridge Structural Database...”, Acta Cryst. (2002), B58, 389-397.