W0095

Automatic Classification of Protein Crystallization Screens on 1536-well Plates. I. Jurisica1, C. Cumbaa1, A. Lauricella2, N. Fehrman2, C. Veatch2, R. Collins2, J. Luft2, G. DeTitta2, 1Ontario Cancer Inst./PMH, 610 University Ave., Toronto, ON M5S2M9, 2Hauptman-Woodward Institute, 73 High St., Buffalo, NY 14203-1196.

High-throughput protein crystallization screening will help to eliminate crystallization as a bottleneck in modern structural biology. The challenge is the systematic, automated computational analysis of the resulting data deluge.

Our technique for automatic classification of microbatch protein crystallization experiments on 1536-well plates addresses the analysis problems introduced at the sub-microlitre scale, including non-uniform lighting and irregular droplet boundaries.

Image segmentation, using a loopy Bayes net with a two-layered grid topology, separates the droplet from the well. A 23-element feature vector is then extracted from the contents of each droplet, using the Radon transform for straight-edge features and a set of correlation filters for microcrystalline features. Images are classified by linear discriminant analysis on these feature vectors. Currently, the system automatically classifies images into three categories: crystal, clear, and precipitate.
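The role of the Radon transform in straight-edge detection can be illustrated with a minimal sketch. It is not the feature extractor used in this work: the function name `radon_peak`, the 15-degree angle step, and the 9 x 9 toy image are all assumptions made for illustration. Each nonzero pixel is projected onto a line at a given angle; a straight edge piles all of its pixels into a single projection bin at the matching angle, producing a sharp peak.

```python
import math

def radon_peak(image, angles_deg):
    """Return (peak_value, angle_deg) of a coarse, discrete Radon-style projection.

    Hypothetical helper for illustration only. For each angle, every nonzero
    pixel (x, y) is accumulated into the bin rho = round(x*cos + y*sin).
    """
    best = (0.0, None)
    for angle in angles_deg:
        theta = math.radians(angle)
        c, s = math.cos(theta), math.sin(theta)
        bins = {}
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                if value:
                    rho = round(x * c + y * s)
                    bins[rho] = bins.get(rho, 0.0) + value
        if bins:
            peak = max(bins.values())
            if peak > best[0]:
                best = (peak, angle)
    return best

# Toy image (assumption): a single horizontal edge across the middle row.
size = 9
edge_image = [[1.0 if y == 4 else 0.0 for x in range(size)] for y in range(size)]

peak, angle = radon_peak(edge_image, range(0, 180, 15))
print(peak, angle)
```

In a full pipeline, peak heights of this kind would be only some of the 23 features passed to the classifier; a production implementation would more likely rely on an established image-processing library than on this pixel loop.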

We compared the results of our automatic protein crystallization image classification with those of a human expert on 18 plates (27648 images). Using the human-labeled images as ground truth, our method classifies images with 89% accuracy and a ROC score of 0.875. This result compares well with the experimental repeatability rate assessed at 87%.
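The evaluation figures above can be made concrete with a small sketch. The image count follows from 18 plates of 1536 wells each. How the ROC score was computed is not stated here, so the code below assumes a binary crystal versus non-crystal decision and uses the rank-based definition of the area under the ROC curve; the labels and scores are invented toy values, not data from this study.

```python
def accuracy(truth, predicted):
    """Fraction of images whose predicted label matches the human label."""
    matches = sum(1 for t, p in zip(truth, predicted) if t == p)
    return matches / len(truth)

def roc_auc(truth, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the rank-based (Mann-Whitney) form of
    the area under the ROC curve.
    """
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# 18 plates with 1536 wells each.
images_total = 18 * 1536

# Toy data (assumption): 1 = crystal, 0 = no crystal, with classifier scores.
truth = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1, 0.8, 0.3, 0.6]
predicted = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(truth, predicted)
auc = roc_auc(truth, scores)
print(images_total, acc, auc)
```

Accuracy depends on the chosen score threshold, whereas the ROC score summarizes ranking quality across all thresholds, which is why the two numbers are reported separately.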

Two findings stand out from this validated analysis. First, the accuracy depends on the number of crystals on a given plate. Second, false positives and false negatives follow a recognizable pattern. False positives are drops with particles that look like microcrystals, or wrinkles in the skin that resemble crystal edges. False negatives are crystals too fine for detection or crystals without straight edges.

A characterization of these misclassifications suggests directions for improving the method. An important new extension will integrate data mining of historical information to increase specificity while keeping sensitivity high.
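For reference, sensitivity and specificity relate to the false negatives and false positives discussed above as follows. This is a minimal sketch assuming a binary crystal versus non-crystal labeling; the function name and the toy labels are illustrative and not taken from this study.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels (1 = crystal)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true crystals that were detected
    specificity = tn / (tn + fp)  # share of non-crystals correctly rejected
    return sensitivity, specificity

# Toy labels (assumption): one missed crystal and one false alarm.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)
```

Reducing false positives raises specificity, and reducing false negatives raises sensitivity, so the two error types described above map directly onto these two rates.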