GAMESS As a Free Quantum-Mechanical Platform for Drug Research

Current Topics in Medicinal Chemistry, 2012, Vol. 12, No. 18, 000-000

Yuri Alexeev1, Michael P. Mazanetz2, Osamu Ichihara2 and Dmitri G. Fedorov3,*

1Argonne Leadership Computing Facility, Argonne National Laboratory, 9700 South Cass Avenue, Building 240, Argonne, IL 60439, USA; 2Evotec (UK) limited, 114 Milton Park, Abingdon, Oxfordshire, OX14 4SA, UK; 3NRI, National Institute of Advanced Industrial Science and Technology, Central 2, Umezono 1-1-1, Tsukuba 305-8568, Japan

Abstract: Driven by a steady improvement of computational hardware and significant progress in ab initio method development, quantum-mechanical approaches can now be applied to large biochemical systems and drug design. We review the methods implemented in GAMESS that are suitable for calculations of large biochemical systems. An emphasis is put on the fragment molecular orbital method (FMO) and quantum mechanics interfaced with molecular mechanics (QM/MM). The use of FMO in protein-ligand binding, structure-activity relationship (SAR) studies, and fragment- and structure-based drug design (FBDD/SBDD) is discussed in detail.

Keywords: Quantum chemistry, fragment molecular orbital, drug design, ab initio, GAMESS, FMO, QM/MM, FBDD, SBDD.

1. INTRODUCTION

Molecular mechanics (MM) has been applied very successfully to many systems in biochemistry [1]. Many elaborate force fields such as CHARMM [2] or AMBER [3, 4] have been developed and highly tuned for studies of biochemical systems. The traditional force fields offer high computational efficiency; however, their parameterization and simplistic models impose considerable limitations. Two main drawbacks of the traditional force fields, the neglect of polarization and of charge transfer (CT), have motivated the development of a new generation of force fields [5-10]. Frequently, parameters for force fields are generated from quantum-mechanical (QM) calculations of model systems. The question arises as to the transferability of those parameters, which in the language of physics can be formulated as the importance of many-body effects in describing physical interactions. Importantly, studies of chemical reactions with bond breaking are typically conducted with QM approaches.

The reason why relatively few biochemical scientists have taken advantage of QM approaches lies in several factors. One is their large computational cost. However, driven by both the remarkable progress in QM method development and the amazing pace of computational hardware improvement, with the revolutionary advent of multicore CPUs (a single node with 48 cores is readily available) and GPUs (with hundreds of computing units), one can now routinely perform calculations of systems containing hundreds or thousands of atoms. Another factor is the difficulty in setting up and running the calculations. However, many excellent graphical packages now enable users to employ QM methods with relative ease.
The aura of being too slow and difficult for the non-initiated, which has long loomed over quantum chemistry, is quickly disappearing. We also note that overall QM calculations are more standard and transparent than, for example, force fields. The remaining factor of time scale, that is, the need to take into account the entropy and consider dynamic aspects of biochemical processes on a realistic time scale, is still a major problem. It should be said here that while for force fields one can get away with the dubious idea of simulating processes at an unphysically high temperature to accelerate them, the same will not work with QM, because bonds are not fixed by springs and will easily break as temperature increases.

*Address correspondence to this author at NRI, National Institute of Advanced Industrial Science and Technology, Central 2, Umezono 1-1-1, Tsukuba 305-8568, Japan; Tel: +81-29-861-7218; Fax: +81-29-851-5426; E-mail: [email protected]
1568-0266/12 $58.00+.00 © 2012 Bentham Science Publishers

The use of QM in drug research [11] is one of the most exciting developments of recent years, with the potential to revolutionize this field. QM methods have important advantages over MM approaches. Firstly, QM has no implicit parameterization and relies only on the use of well-known physical constants such as the velocity of light and the masses and charges of atoms. The molecular properties and geometries are obtained by solving the differential Schrödinger equation [12-14]. Unfortunately, the equation can be solved directly only for very small systems [15, 16]. A number of QM methods based on various models and assumptions about the structure of the wave function have been developed. Density functional theory (DFT) [17] owes its popularity to a relatively low cost and fairly good accuracy, and systems of biochemical size are becoming accessible [18]. Recently developed DFT methods deliver bond lengths for organic molecules within 0.02 Å and an absolute energy error within 3 kcal/mol [14] in comparison to experiment. However, such high accuracy comes at a hefty price in computer time. Until recently, only fairly small systems (~100 atoms) could be computed because the cost of calculations scales at least cubically with the system size. A number of ab initio [19, 20] and fragment-based [21, 22] linearly scaling methods have been proposed, which

compete with semi-empirical approaches [23, 24]. QM methods are indispensable for describing chemical reactions and excited states, as well as free radicals [25]. The fragment molecular orbital (FMO) method [26] is one of the fragment-based methods. Impressive results can be achieved when linear scaling QM approaches and supercomputing technologies are combined [27]. For example, an FMO energy calculation of a receptor-ligand system containing 17,767 atoms takes only ~54 minutes on 40,960 CPU cores of Blue Gene/P at the RHF level of theory with the 6-31G* basis set [28]. With the current pace of advances in both computer technology and algorithms, a wide use of QM simulations for realistic biochemical systems is expected in the near future.

Ligand binding is a very complex phenomenon, and both enthalpic and entropic contributions need to be dealt with appropriately in order to estimate the free energy of binding accurately. It is important to mention that the QM methods described in this paper are used exclusively for computing enthalpy and solvation energy. Computation of the configurational entropy is obviously important for obtaining a complete energetic picture of ligand binding, but it is notoriously difficult, and it is often not practical to use quantum-chemical methods for this purpose. Classical molecular dynamics simulations (for example, quasiharmonic analysis with the covariance) [29] are commonly used to estimate the configurational entropy; they are not covered in this paper. However, in a medicinal chemistry program, particularly at the lead optimization stage, ligands of interest share a common molecular framework, and it is often safe to assume that the entropic contributions cancel out. We would also like to emphasize that binding free energy is not the only property useful for drug design.
QM methods, particularly together with energy decomposition analyses, can also provide medicinal chemists with much richer information on the nature of protein-ligand interactions for ligand design. For example, nonclassical interactions such as CH-π and halogen-π, which are otherwise hard to detect and quantify, can be readily identified.

In this review we focus on in silico drug design methods and applications of QM methods implemented in the quantum chemistry package called General Atomic and Molecular Electronic Structure System (GAMESS) [30, 31]. There are several excellent general reviews of GAMESS by its principal developers, Gordon and Schmidt [32, 33], whereas we only describe applications of GAMESS relevant to drug research. We cover in detail the following studies: pKa estimation, geometry optimization, computation of ligand binding energy, definition of quantitative structure-activity relationship (QSAR) descriptors, and analysis of the ligand-receptor interaction energies with the goal of designing new drugs with better affinity and specificity. We conclude with an outlook on the future use of QM in drug research, hoping to provide convincing arguments for the great potential held by QM methods.

2. QUANTUM-MECHANICAL METHODS AND GAMESS OVERVIEW

The history of GAMESS is described in detail by Gordon and Schmidt [33]. At some point, GAMESS-UK [34] and PC GAMESS (now renamed Firefly) [35] branched off from the main project GAMESS-US [36]. The three branches share much of the source code and the input and output file structure, but a lot of independent new development has been invested in each project, so that they differ substantially in functionality and performance. GAMESSPlus is an add-on with additional functionality for GAMESS-US, including solvation models, DFT functionals and QM/MM capability [37]. While Firefly is designed specifically for Intel-compatible CPUs, GAMESS-US and GAMESS-UK also run on other types of architecture, in particular when a UNIX operating system is available. Source code can be obtained for GAMESS-US and GAMESS-UK, and precompiled binaries are provided for all three. GAMESS-UK is available for free for academic users in the UK, while GAMESS-US and Firefly are free for both industry and academics. WinGAMESS is a GAMESS-US binary precompiled for Microsoft Windows. The GAMESS forum [38] can be used for discussions.

GAMESS can take advantage of massively parallel computers. For GAMESS-US, the distributed data interface (DDI) [39] was developed, a one-sided communication message-passing library that works on top of either Unix sockets or standard MPI [40]. DDI was used to parallelize various ab initio QM methods [41-43]. Generalized DDI (GDDI) [44], based on an efficient node grouping, can be utilized with some methods such as FMO on large supercomputers. For parallelization, Firefly uses MPI in combination with the thread-based Point-to-Point (P2P) message-oriented interface; the latter works via Ethernet (using TCP/IP sockets), shared memory or InfiniBand. GAMESS-UK can be parallelized with the Global Array toolkit or MPI.

Here we list the QM methods which are of particular relevance to drug research. The most basic QM method is Hartree-Fock (HF), which exists in the restricted (RHF) and unrestricted (UHF) varieties.
The former is used for closed-shell systems such as organic molecules and proteins, while the latter finds its use mainly for systems with transition metals or radicals. Two main branches of methods improving on HF have evolved in QM. One is DFT, which has a computational cost similar to RHF and is in general very successful for systems without nearly degenerate orbital spaces. The other is the explicit treatment of the electron correlation, for which there is a well-arranged ladder of methods offering a clear way to improve the accuracy. Among these, second-order Møller–Plesset perturbation theory (MP2) is the most readily useful for large systems, although more advanced methods, such as coupled cluster (CC) theory, can also be applied in some cases.

For DFT, there are many functionals, whose choice is driven by experience. B3LYP [45] and PBE [46] are popular, and among recent developments we mention the long-range corrected CAM-B3LYP [47] and M06 [48]. The latter performs especially well for organometallic and inorganometallic chemistry and for non-covalent interactions. An ad hoc way of improving DFT is to include an empirically parameterized dispersion correction [49], which can also be used for RHF. For MP2, there is some choice of methods: the spin-component scaled version of MP2 (SCS-MP2) [50], and

accelerated regular MP2, based on the resolution of the identity (RI-MP2) [51]. We focus on the methods which are particularly suitable for calculations of large systems: the fragment molecular orbital (FMO) method and QM/MM, performed by interfacing GAMESS with the MM programs CHARMM (Chemistry at HARvard Macromolecular Mechanics) [52] or TINKER [53].

3. METHODS FOR LARGE SYSTEMS

The choice of an optimal method depends on the type of calculation and should balance accuracy against speed. The latter is an especially relevant factor for drug research because of the need to provide chemists with useful information for ligand design during rapidly moving medicinal chemistry projects. A large ligand-receptor system can be split into three major regions: an active site with a ligand, the rest of the protein, and solvent. Ideally, one would prefer to treat the whole system by QM, but to achieve high speed and accuracy each region can be treated with a different model. QM/MM methods important for drug research are reviewed in detail below. QM/MM is suitable for fast high-throughput calculations, but for an analysis of an active site revealing ligand-receptor interaction details, full QM methods like FMO are powerful alternatives. Reasons why it can be desirable to perform QM calculations of the whole system include a large ligand having multiple weak van der Waals interactions with the receptor, multiple successive chemical reactions in the active site, or an excited state involving delocalized CT.

3.1. Full ab initio

In addition to the high computational cost of traditional ab initio methods, the memory requirements are also high, especially for MP2 and CC. Even in the most economical RHF and DFT, the memory requirement scales quadratically with the system size, which eventually becomes a problem. Usually, there are several matrices one has to store.
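The quadratic growth of memory with system size can be made concrete with a short estimate. The sketch below assumes four dense N×N matrices (for example the Fock, density, overlap and MO coefficient matrices) stored as 8-byte floating-point numbers; the function name and its default values are illustrative assumptions and are not part of GAMESS.

```python
def dense_matrix_memory_bytes(n_basis, n_matrices=4, bytes_per_element=8):
    """Bytes needed to hold n_matrices dense square matrices of size n_basis.

    Assumption: every matrix is stored in full (no symmetry packing, no
    distribution over nodes) with bytes_per_element bytes per entry.
    """
    return n_matrices * bytes_per_element * n_basis ** 2


if __name__ == "__main__":
    # Doubling the number of basis functions quadruples the memory.
    for n_basis in (1_000, 10_000, 20_000):
        gigabytes = dense_matrix_memory_bytes(n_basis) / 1e9
        print(f"N = {n_basis:>6}: {gigabytes:6.2f} GB per process")
```

With these assumptions, N = 10000 corresponds to 3.2 GB per process, and doubling N quadruples the requirement, which is why distributing the matrices over nodes becomes attractive for large systems.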
Assuming for simplicity that one stores the Fock, density, overlap and MO matrices (four N×N matrices of 8-byte numbers), this gives 32N² bytes, where N is the number of basis functions. For N=10000 (approximately 1000 atoms with 6-31G*), this is 3.2 GB. Unless the matrices are distributed among computer nodes, this amount of memory is required per CPU core, plus some memory is taken by the OS and the program itself. Even for a modern workstation, 4 GB per CPU core is a significant amount, and large supercomputers such as Blue Gene typically have a very moderate amount of memory per core. Thus, Alexeev et al. [42] developed a self-consistent field (SCF) algorithm for RHF in GAMESS-US, in which large matrices are distributed between nodes. On the other hand, to overcome the steep computational cost of QM, one can use the fast multipole method (FMM) [54] to accelerate the computation of two-electron integrals in large systems, which was implemented [55] in GAMESS-US. Ishimura et al. developed an efficient OpenMP/MPI parallelization [56] capable of using many nodes in ab initio calculations. Despite this method development, systems of biological size are still hard to treat with fully ab initio methods. Among very few published applications of GAMESS, one can find the study of HIV-1 by Weltman et al. [57].

3.2. Fragment-Based Approaches

The following low-scaling ab initio based methods are available in GAMESS-US: effective fragment potential (EFP) [8], elongation method (ELG) [58], divide-and-conquer (DC) approach [59] and FMO [26]. The first of these, EFP, is a QM-parameterized force field, and we describe it below together with other force fields. The ELG method is primarily used for polymers, although some preliminary probing has been performed on DNA [60, 61] and the applicability to biochemical systems has been discussed [62]. DC methods in GAMESS-US have so far not been applied to biochemical systems, so we do not describe them any further.

The FMO method [26] has been widely applied to large molecular systems, including a recent geometry optimization of a 0.1 μm BN nanoring [63]. There are several reviews of FMO [22, 64, 65], book chapters [66-68] and a full book on FMO [69], which can be used for a more detailed study of FMO. Here we provide an overall picture without giving too many technical details. The relation of FMO to other methods has been recently discussed [22, 65] and we do not address it here.

The idea of functional groups is paramount to chemistry. This is because functional groups, if properly defined, retain to a large extent their physical properties in various molecular systems, under the perturbing influence of the environment (e.g., other functional groups). This is exactly the basic principle behind fragment-based methods such as FMO. By taking advantage of chemical knowledge, one can define groups of atoms, called fragments, which are described by QM, together with the interactions between them. Clearly, the success of such an approach relies on the efficiency of treating the interactions. In FMO, the electrostatic interaction is accounted for by including the polarizing electrostatic field in all fragment calculations.
The field corresponds to the whole system, and it is computed using the electron densities and nuclei of all fragments (called monomers). The other key step is to calculate pairs of fragments (dimers) using QM, in order to describe non-electrostatic interactions, such as exchange-repulsion and CT. Alternatively, there is also effective FMO (EFMO) [70], which differs from regular FMO in the description of the electrostatics. EFMO can be used in many ways similar to FMO, for example, to do QM calculations of proteins [71].

Treatment of covalent bond detachment during fragmentation is a technical issue well described elsewhere [65, 69]. Here we only mention that the electron density distribution of a detached bond is assigned to a single fragment, i.e., bonds are detached heterolytically and without the addition of hydrogen caps. The detached bonds are saturated by the embedding potential. The methodology ensures that for closed-shell systems the fragments are also closed shell, and neutral unless some charged functional group is present.

For practical applications, an important question is how to define fragments. There is a rather simple answer for FMO, and the main principle is to minimize the CT between

fragments. This is because CT is accounted for in FMO by dimer calculations, i.e., it is limited to two-body contributions only. In reality, charge transfers between pairs of fragments are coupled, i.e., there are many-body CT effects. The same applies to exchange-repulsion (a quantum effect originating from the Pauli exclusion principle) and dispersion. All of these effects are typically localized, so that one can meaningfully define fragments. When CT is not localized, one cannot fragment the system, e.g., conjugated systems such as aromatic ring systems or donor-acceptor metal complexes. For the latter type, assigning a metal cation as a separate fragment will result in very substantial CT between the metal and ligands, thus leading to poor accuracy. Salt bridges between positively and negatively charged amino acid residues are also prone to a large CT, and one can combine them into a single fragment for better accuracy.

Biochemical systems are composed of a few dozen standard units, enabling an easy automatic fragmentation. The caveat here is that for polypeptides, there is a considerable CT across a peptide bond, so that one cannot fragment there. Instead, polypeptides are fragmented at Cα, and residues in FMO become residue fragments shifted by one carboxyl group relative to the conventional residues. To distinguish between the conventional residues and residue fragments, we often insert a dash in the latter, e.g., Trp-6. Polysaccharides are fragmented into sugar units, and nucleic acids such as DNA or RNA into bases. Such automatic fragmentation can be readily performed using Facio [72].

Ligands leave some freedom to the user. There are two reasons to divide a large ligand into several fragments. First, a large ligand takes relatively long to compute, especially with correlated methods such as MP2.
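The two-body treatment described above can be illustrated with a toy calculation. The sketch below uses the commonly quoted two-body expansion, in which the total energy is the sum of monomer energies plus a correction E_IJ - E_I - E_J for every dimer; this expression is not spelled out in the text and is taken here as an assumption, in a simplified form that ignores the details of the embedding field. All function names, fragment labels other than Trp-6, and energy values are hypothetical.

```python
# Toy illustration of a two-body (dimer-corrected) fragment energy expansion.
# Names and numbers are hypothetical; real monomer and dimer energies come
# from QM calculations of fragments embedded in the field of the whole system.

def pair_interaction_energies(monomer_energies, dimer_energies):
    """Return dE_IJ = E_IJ - E_I - E_J for every computed fragment pair."""
    return {
        (i, j): e_ij - monomer_energies[i] - monomer_energies[j]
        for (i, j), e_ij in dimer_energies.items()
    }


def two_body_total_energy(monomer_energies, dimer_energies):
    """Sum of monomer energies plus all pair corrections."""
    pairs = pair_interaction_energies(monomer_energies, dimer_energies)
    return sum(monomer_energies.values()) + sum(pairs.values())


# Hypothetical energies (arbitrary units): one ligand piece, two residue fragments.
monomers = {"ligand_piece": -1.0, "Trp-6": -2.0, "Gly-7": -3.0}
dimers = {
    ("ligand_piece", "Trp-6"): -3.1,
    ("ligand_piece", "Gly-7"): -4.0,
    ("Trp-6", "Gly-7"): -5.2,
}

pairs = pair_interaction_energies(monomers, dimers)
total = two_body_total_energy(monomers, dimers)
```

In this picture each entry of `pairs` plays the role of an interaction energy between two fragments, for example between a piece of a ligand and a residue fragment, which is the kind of quantity a per-fragment analysis would inspect.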
Second, ligands often have subunits, and it is desirable to analyze the properties of these subunits, such as the interaction energy between a piece of the ligand and a residue fragment in a protein. This naturally shows the connection between FMO and fragment-based drug design (FBDD). The small molecules (