Journal of Chemical Information and Modeling Volume 38 Issue 3 1998 (Doi 10.1021/ci970429i) Lewell, X.Q. Judd, D.B. Watson, S.P. Hann, M.M. - RECAP-Retrosynthetic Combinatorial Analysis Procedur
Xiao Qing Lewell,* Duncan B. Judd, Stephen P. Watson, and Michael M. Hann
Glaxo Wellcome Research and Development, Medicines Research Centre, Gunnels Wood Road,
Stevenage, Hertfordshire SG1 2NY UK
The use of combinatorial chemistry for the generation of new lead molecules is now a well established
strategy in the drug discovery process. Central to the use of combinatorial chemistry is the design and
availability of high quality building blocks which are likely to afford hits from the libraries that they generate.
Herein we describe RECAP (Retrosynthetic Combinatorial Analysis Procedure), a new computational
technique designed to address this building block issue. RECAP electronically fragments molecules based
on chemical knowledge. When applied to databases of biologically active molecules this allows the
identification of building block fragments rich in biologically recognized elements and privileged motifs
and structures. This allows the design of building blocks and the synthesis of libraries rich in biological
motifs. Application of RECAP to the Derwent World Drug Index (WDI) and the molecular fragments/
building blocks that this generates are discussed. We also describe a WDI fragment knowledge base which
we have built which stores the drug motifs and mention its potential application in structure based drug
design programs.
Figure 8. Examples of 1- through 6-connection fragments. Isotopic labels denote atom environment (see text). Numbers in a header box
denote the number of occurrences within the WDI collection.
ing methods should be viewed as complementary methods for identifying drug motifs and may be used to
such effect.
The above examples illustrate how RECAP can be used to identify specific or generic drug motifs or
substructures. For each of the fragments generated from WDI, we can identify those that are frequently
occurring in particular therapeutic classes and therefore may be termed as drug motifs. These motifs are
clearly useful in designing target biased libraries.
Library Design using RECAP Fragments. An important application of RECAP is to identify potential
building blocks (monomers and core templates) for library synthesis. Combinatorial chemical libraries
consist principally of two types: those that have preformed core templates, in which monomers are reacted
with the templates to form products, and those in which the cores are formed in situ by reacting monomers
together to give final products. In either case, the qualities of the monomers and cores are important for
the successful outcome of screening the libraries. We have used RECAP as an approach to identify target
oriented monomers and core templates.
Again we can illustrate this with the WDI results. Figure 1 shows the process from fragmentation of
structures to building block design based on the motifs identified.
In the process of fragmentation, fragments with molecular weights that are appropriate for monomers are
ideally desired. Figure 7 shows the peak distribution position of molecular
RECAP-RETROSYNTHETIC COMBINATORIAL ANALYSIS PROCEDURE, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998, p 517
Figure 9. Examples of some WDI fragments with their top therapeutic classes. The first number in a header box denotes the total occurrence
of the fragment within WDI collection, whereas the second denotes the occurrence within the labeled therapeutic class.
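
The two header-box numbers described in this caption amount to simple counting. The following is a minimal sketch in Python, assuming the fragmentation output is available as (fragment SMILES, therapeutic class) pairs with one pair per occurrence. The function name `top_class_counts`, the input format, and the toy records are hypothetical illustrations; they are not WDI data and not the authors' implementation.

```python
from collections import Counter, defaultdict


def top_class_counts(occurrences):
    """Return {fragment: (total, top_class, count_in_top_class)}.

    `occurrences` is an iterable of (fragment, therapeutic_class) pairs,
    one pair per occurrence of the fragment (an assumed input format).
    """
    per_fragment = defaultdict(Counter)
    for fragment, therapeutic_class in occurrences:
        per_fragment[fragment][therapeutic_class] += 1

    summary = {}
    for fragment, class_counts in per_fragment.items():
        total = sum(class_counts.values())
        # Highest count wins; ties are broken alphabetically by class name.
        top_class, top_count = min(
            class_counts.items(), key=lambda kv: (-kv[1], kv[0])
        )
        summary[fragment] = (total, top_class, top_count)
    return summary


# Hypothetical toy data (placeholder class names, attachment labels omitted).
toy_occurrences = [
    ("c1ccc2[nH]ccc2c1", "class_A"),
    ("c1ccc2[nH]ccc2c1", "class_A"),
    ("c1ccc2[nH]ccc2c1", "class_B"),
    ("c1ccncc1", "class_B"),
]

summary = top_class_counts(toy_occurrences)
for fragment, (total, top_class, top_count) in summary.items():
    print(fragment, total, top_class, top_count)
```

In this sketch the first number of each tuple plays the role of the total occurrence and the last number the occurrence within the top therapeutic class.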
Figure 12. A cluster of indole-containing fragments. Isotopic label 1 denotes that the nitrogen came from an amide environment. Header
box numbers denote occurrence in the WDI collection.
Figure 13. Transforming a target fragment to a target monomer and searching for the closest monomers.
Table 2. WDI Fragment Knowledge Base Data Types
data type      meaning
SMILES         represents the chemical structure of a fragment
FREQ           represents total number of occurrences of the fragment within the subset of WDI having
               biological activity keywords
CLASS          represents the biological activity class the fragment is associated with
CLASS FREQ     represents the number of occurrences of the fragment within a biological class
CLASS ORDER    represents the orders in which the fragment most frequently occurs in different
               biological classes
CONNECTION     represents the number of connection points within a fragment in the original
               uncleaved structure
fragments from molecules made in the past and may be biased by the number of active analogues in a
database. However, through different recombination of these structural fragments using combinatorial
library technology, it is hoped that new and potentially novel molecules will be produced which will
yield a higher success rate in lead generation and optimization programs compared to the random approach.
Other applications of the RECAP technique include data mining of commercial databases and commercial
supplies to identify constituent building blocks.
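
The data types listed in Table 2 can be pictured as one record per fragment. The following is a minimal sketch in Python under stated assumptions: the class `FragmentRecord`, the helper `find_fragments`, the choice to store CLASS and CLASS FREQ together as a dictionary, and the toy records are all hypothetical and do not describe how the knowledge base is actually stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FragmentRecord:
    smiles: str      # SMILES: chemical structure of the fragment
    freq: int        # FREQ: total occurrences in the annotated WDI subset
    connection: int  # CONNECTION: connection points in the uncleaved structure
    # CLASS -> CLASS FREQ: occurrences of the fragment per biological class
    class_freq: Dict[str, int] = field(default_factory=dict)

    def class_order(self) -> List[str]:
        """CLASS ORDER: classes by decreasing occurrence (ties alphabetical)."""
        return sorted(self.class_freq, key=lambda c: (-self.class_freq[c], c))


def find_fragments(records, activity_class, connection):
    """Fragments seen in `activity_class` with the given number of
    connection points, most frequent in that class first."""
    hits = [
        r for r in records
        if activity_class in r.class_freq and r.connection == connection
    ]
    return sorted(hits, key=lambda r: -r.class_freq[activity_class])


# Hypothetical toy records (placeholder classes, attachment labels omitted).
records = [
    FragmentRecord("c1ccc2[nH]ccc2c1", freq=5, connection=1,
                   class_freq={"class_A": 3, "class_B": 2}),
    FragmentRecord("c1ccncc1", freq=4, connection=1,
                   class_freq={"class_A": 4}),
    FragmentRecord("C1CCNCC1", freq=6, connection=2,
                   class_freq={"class_B": 6}),
]

hits = find_fragments(records, "class_A", connection=1)
print([r.smiles for r in hits])
print(records[0].class_order())
```

A query of this kind (by biological class and number of connection points) is one way such a knowledge base might be consulted when selecting motifs for a target biased library.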
his group for our earlier collaboration on CAESA modifications where some of the ideas have been
implemented, and Prof. Peter Willet, the referees, and Derwent Information Ltd for comments on the
manuscript.
Supporting Information Available: Leukotriene antagonists numbering 344 using Ward based clustering
model. The label M143 1 means arbitrary model name (M143 = 143rd molecule in the original input list)
and for cluster number 1 = cluster 1 (23 pages). See any current masthead page for ordering and Web
access instructions.
REFERENCES AND NOTES
(1) Young, S. S.; Farmen, M.; Rusinko III, A. Random Versus Rational: Which is better for General
Compound Screening, Network Science, Feature 9, Aug 1996.
http://www.awod.com/netsci/Science/Screening/feature09.html.
(2) Warr, W. A. Commercial Software Systems for Diversity Analysis, Perspect. Drug Discovery Des.
1997, 7/8, 115-130.
(3) Leach, A. R. Molecular Modelling, Principles and Applications; ISBN 0-582-23933-8,
Addison-Wesley Longman Limited: 1996.
(4) Baggio, R.; Shi, Y.-Q.; Wu, Y.-q.; Abeles, R. H. From Poor Substrates to Good Inhibitors: Design of
Inhibitors for Serine and Thiol Proteases. Biochemistry 1996, 35(11), 3351-3.
(5) Klopman, G. Artificial Intelligence Approach to Structure-Activity Studies. Computer Automated
Structure Evaluation of Biological Activity of Organic Molecules. J. Am. Chem. Soc. 1984, 106,
7315-7321.
(6) Muskal, S. M. Enriching Combinatorial Libraries with Features of Known Drugs, C. Divisions, 20th
ACS National Meeting (American Chemical Society, Anaheim, CA, 1995; Vol. 1, p 029.
(7) Fujita, T. Concept and features of EMIL, a system for lead evolution of bioactive compounds. Trends
QSAR Mol. Modell. 92, Proc. Eur. Symp. Struct.-Act. Relat.: QSAR Mol. Modell., 9th 1993, Meeting
Date 1992, pp 143-59. Wermuth, C.-G., Eds.; ESCOM: Leiden, The Netherlands, CODEN: 59XTAS.
CAN 121: 169406.
(8) Corey, E. J. Computer-assisted analysis of complex synthetic problems. Quart. Rev. Chem. Soc. 1971,
25(4), 455-82.
(9) Myatt, G.; Baber, J. C.; Johnson, A. P. New developments in the caesa system for estimation of
synthetic accessibility. Book of Abstracts, 21st ACS National Meeting, Chicago, IL, August 20-24 1995;
American Chemical Society, Washington, D.C., Issue Pt. 1, COMP-007 CODEN: 61XGAC.
AN 1995:920276.
(10) Gasteiger, J.; Marsili, M.; Hutchings, M. G.; Saller, H.; Loew, P.; Roese, P.; Rafeiner, K. Models for
the representation of knowledge about chemical reactions. J. Chem. Inf. Comput. Sci. 1990, 30(4),
467-76.
(11) Ridings, J. E.; Barratt, M. D.; Cary, R.; Earnshaw, C. G.; Eggington, C. E.; Ellis, M. K.; Judson, P. N.;
Langowski, J. J.; Marchant, C. A. Computer prediction of possible toxic action from chemical structure:
an update on the DEREK system. Toxicology 1996, 106(1-3), 267-79.
(12) Lewell, X. Q.; Smith, R. Drug Motif Based Diverse Monomer Selection - Method and Application in
Combinatorial Chemistry; J. Mol. Graphics Model. 1997, 15, 43-48.
(13) Derwent Information Ltd., Derwent House, 14 Great Queen Street, London, WC2B 5DF, UK. Web
address: http://www.derwent.co.uk/.
(14) Daylight Chemical Information Systems Inc., 27401 Los Altos, Suite #370, Mission Viejo, CA 92691.
Web address: http://www.daylight.com/.
(15) DAYLIGHT SMILES and SMARTS notations to represent chemical structures or fragments. Taking
advantage of the isotopic notation of the SMARTS notation, 3 was chosen to represent an amine bond
type. [3N] therefore represents an amine nitrogen. For all other isotopic representations, see Figure 2.
(16) SPRESI93. A chemical substances database extracted by the Academy of Science, USSR and marketed
by InfoChem GmbH. A DAYLIGHT version is available from DAYLIGHT.
(17) Spearman, C. Am. J. Psych. 1904, 15, 88.
(18) Ward, J. H. Hierarchical Grouping to Optimize an Objective Function. J. Am. Statistical Assoc. 1963,
58, 236-244.
(19) Matzke, M.; Beckermann, B.; Fruchtmann, R.; Burkhard, F.; Gardiner, P. J.; Goossens, J.;
Hatzelmann, A.; Junge, B.; Keldehnich, J. et al. Leukotriene synthesis inhibitors of the quinoline type:
parameters for the optimization of efficacy. Eur. J. Med. Chem. 1995, 30(Suppl., Proceedings of the 13th
International Symposium on Medicinal Chemistry, 1994), 441s-51s.
(20) Shemetulskis, N. E.; Weininger, D.; Blankley, C. J.; Yang, J. J.; Humblet, C. Stigmata: An Algorithm
To Determine Structural Commonalities in Diverse Datasets. J. Chem. Inf. Comput. Sci. 1996, 36(4),
862-871.
(21) Jarvis, R. A.; Patrick, E. A. Clustering using a similarity measure based on shared near neighbors.
IEEE Trans Computers, 1973, C22, 1025-1034.
(22) Bemis, G. W.; Murcko, M. A. The properties of Known Drugs. 1. Molecular Frameworks. J. Med.
Chem. 1996, 39, 2887-2893.
(23) Gillet, V. J.; Johnson, A. P.; Mata, P.; Sike, S.; William, P. SPROUT: A Program for Structure
Generation. J. Comput-Aided Mol. Des. 1993, 7, 127-153.
CI970429I