Assessing protein resilience via a complex network approach

June 5, 2017 | Author: Alessandro Giuliani | Category: Graph Theory, Biology, Complex Networks, Proteins

Manuscript for Review

Resilience of Protein Contact Networks

Journal: Electronics Letters
Manuscript ID: ELL-2013-1162
Manuscript Type: Letter
Date Submitted by the Author: 05-Apr-2013
Complete List of Authors:
  Oliva, Gabriele; University Campus Biomedico, Complex Systems and Security Laboratory
  Di Paola, Luisa; University Campus Bio-Medico, Complex Systems and Security Laboratory
  Giuliani, Alessandro; Istituto Superiore di Sanità
  Pascucci, Federica; University Roma Tre
  Setola, Roberto
Keywords: BIOLOGICAL TECHNIQUES, NETWORK TOPOLOGY


Gabriele Oliva, Luisa Di Paola, Alessandro Giuliani, Federica Pascucci, and Roberto Setola

Proteins are folded chains of amino acids whose residues establish non-covalent links as a result of folding. Given this peculiar structure, complex network theory is rapidly gaining momentum in the scientific community as a way to describe them. Among the available approaches, Protein Contact Networks are an effective tool to capture the complexity of such structures. In this paper we provide a preliminary study of the resilience of these networks by simulating the selective disruption of nodes according to both traditional and protein-specific topological indicators. Specifically, we present a case study on Human Serum Albumin, and we define an indicator to compare the results of the different strategies.

Introduction: Protein Contact Networks (PCNs) [2, 3, 4] are an effective tool for representing the non-covalent links generated by protein folding (Fig. 1). In this paper, building on the preliminary results in [1], we estimate the resilience of such networks following the approach in [5]. We do this by considering their response to attacks on vertices selected according to topological criteria.

Protein Contact Network: A PCN is obtained as follows. First, the network nodes are set to the α-carbons, i.e., the central atoms of the amino-acid residues. Then the distance matrix $D = \{D_{ij}\}$ is computed, where $D_{ij}$ is the Euclidean distance between the $i$-th and the $j$-th residues, ordered according to their position in the protein sequence. An edge between a pair of amino acids exists if their distance lies within the range $I = [4, 8]$ Å. This choice restricts the network to non-covalent interactions, which are sensitive to environmental cues. The edges of a PCN therefore correspond to non-covalent interactions between amino-acid pairs, and the entries of the adjacency matrix $A$ that encodes the network are equal to one if $D_{ij} \in I$ and zero otherwise.

Besides topological indices common to any kind of network, such as the node degree $k_i$ and the clustering coefficient, the specific nature of proteins allowed us to define a dedicated index, the contact order $\mathrm{ordk}_i$, which connects topological and sequence metrics. Two other mesoscopic, clustering-dependent indices, the within-module z-score $z_i$ and the participation coefficient $P_i$, proved to be descriptive of both the local and the global structure of proteins [6, 7, 8]:

$$\mathrm{ordk}_i = \frac{\sum_{j \neq i} |i - j| \, A_{ij}}{k_i}; \tag{1}$$

$$z_i = \frac{k_{i,S_i} - \bar{k}_{S_i}}{\sigma_{S_i}}; \tag{2}$$

$$P_i = 1 - \left( \frac{k_{i,S_i}}{k_i} \right)^2 \tag{3}$$

Here the intra-cluster degree $k_{i,S_i}$ is the degree of the $i$-th node restricted to its cluster $S_i$, $\bar{k}_{S_i}$ is the average intra-cluster degree of the nodes in $S_i$, and $\sigma_{S_i}$ is the corresponding standard deviation. Contact order captures the sequence separation of residues that are spatially close because of the bending of the chain. A high-ordk residue is therefore involved in long-range contacts, which are crucial to protein stability. High-z residues are responsible for cluster stability, because they establish many links with residues in the same cluster. High-P residues, by contrast, establish most of their links with nodes belonging to other clusters.

Assessing Resilience, a Case Study on Human Albumin: Following [5], we study the decrease in the size $S$ of the giant (largest connected) component to evaluate the resilience of the Human Serum Albumin (HSA) PCN, a graph of 582 nodes (some extended results of our study are available in [?]). The idea is to remove nodes (or edges) from the network one at a time, in the order dictated by a given strategy, and to observe how $S$ degrades. Two quantities measure resilience with respect to a given attack strategy: whether a threshold appears, and the fraction of nodes that must be removed to reach it. Figure 2(a) shows the effect of removing nodes in random order. Figure 2(b) plots the size of the giant component against the fraction of removed nodes for the different targeted strategies.

Fig. 1 Cartoon representation of the protein structure: it can be thought of as a bundled string of pearls. The string is kept in the bundled (folded) state by many links between pearls (residues) that are spatially close but far apart along the sequence (dashed red lines).

Fig. 2 Degradation of the size of the giant component of the HSA PCN: (a) average (blue) and standard deviation (red) for random removal of nodes (150 trials); (b) node removal strategies by decreasing degree (blue), decreasing clustering coefficient (red), decreasing ordk (black), decreasing P index (magenta), and increasing z index (cyan).

As Figure 2(a) shows, random removal is quite ineffective, and the size of the giant component degrades gracefully along a sigmoid-like curve. The standard deviation is almost zero for very small or very large fractions of removed nodes. It stays small even in the range [0.4, 0.7], where it peaks at about ±46 nodes (about ±7.9% of the HSA PCN size) at about 55% of removed nodes.

Figure 2(b) shows the effect of the different strategies. Unlike random removal, all the guided strategies produce sharp discontinuities in the degradation curve. These discontinuities point to an ordering of nodes by their contribution to global stability, which the topological descriptors driving the removal strategies capture to different degrees. To compare the descriptors by how well they measure each node's contribution to stability, it is natural to compare the relative magnitudes of the discontinuities they induce. To this end, we introduce the Degradation Index DI as a normalized measure of the effectiveness of the different strategies:

$$\mathrm{DI} = \frac{\text{Difference in giant component size}}{\text{Fraction of removed nodes}} \tag{4}$$

This index is evaluated at the first discontinuity in the size of the giant component. Table 1 summarizes the effectiveness of the different removal strategies in terms of DI. As the table shows, the ordk-based strategy is the most effective.

Conclusions: In the present case study, ordk emerged as the most critical topological descriptor for network resilience. Fig. 1 makes the reason evident. Removing residues involved in contacts between amino acids far apart in sequence (high ordk) disrupts the fold while preserving the continuity of the thread along the peptide backbone. Our analysis may help identify amino-acid residues that play a relevant structural role, although the proposed strategy is practically impossible to pursue on a real protein in the lab. More generally, we have shown the particular importance of a topological index that lies halfway between two different metrics applied to the same network. This finding may be a useful pointer for studying the resilience of other natural and artificial networks supported by different metric spaces, such as the topology and fluxes of a metabolic network or of an urban transport system. Future work will assess the resilience of a large database of PCNs, highlighting common patterns and differences. It will also evaluate the resilience of PCNs under edge removal strategies.

G. Oliva, L. Di Paola and R. Setola (University Campus Biomedico of Rome, Italy)

Rank  Strategy                             Degradation Index
1     Contact order (decreasing)           21.83
2     Degree (decreasing)                  6.06
3     z-index (increasing)                 4.1
4     P-index (decreasing)                 3.39
5     Clustering coefficient (decreasing)  1.47

Table 1: Ranking of the strategies by degradation index DI.
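The strategies ranked in Table 1 order residues by the indices of Eqs. (1)-(3). The minimal Python sketch below computes those indices. It assumes that Cα coordinates (in Å, in sequence order) are already available and that each residue carries a cluster label from some partitioning step. The letter does not say which clustering method was used, so `cluster` is a hypothetical input, and all function names are illustrative. The sketch makes two further assumptions. First, the attack order is computed once on the intact network (a static ranking). Second, the participation coefficient follows the general definition of Guimerà and Amaral [7], which sums $(k_{i,S}/k_i)^2$ over all clusters $S$ rather than keeping only the own-cluster term.

```python
import math
import statistics

# Minimal sketch; all names are illustrative, not taken from the letter.


def build_pcn(coords, d_min=4.0, d_max=8.0):
    """Adjacency sets of a PCN: i ~ j when d_min <= |r_i - r_j| <= d_max (in Å).

    coords: list of (x, y, z) alpha-carbon positions, in sequence order.
    """
    n = len(coords)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if d_min <= math.dist(coords[i], coords[j]) <= d_max:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def contact_order(adj):
    """Eq. (1): mean sequence separation |i - j| over the contacts of node i."""
    return [sum(abs(i - j) for j in nbrs) / len(nbrs) if nbrs else 0.0
            for i, nbrs in enumerate(adj)]


def within_module_z(adj, cluster):
    """Eq. (2): z-score of each node's intra-cluster degree within its cluster.

    Assumptions: population standard deviation; z = 0 in zero-variance clusters.
    """
    k_in = [sum(1 for j in nbrs if cluster[j] == cluster[i])
            for i, nbrs in enumerate(adj)]
    groups = {}
    for i, c in enumerate(cluster):
        groups.setdefault(c, []).append(k_in[i])
    stats = {c: (statistics.mean(v), statistics.pstdev(v)) for c, v in groups.items()}
    return [(k_in[i] - stats[c][0]) / stats[c][1] if stats[c][1] > 0 else 0.0
            for i, c in enumerate(cluster)]


def participation(adj, cluster):
    """Guimerà-Amaral form: P_i = 1 - sum over clusters S of (k_{i,S} / k_i)^2."""
    p = []
    for nbrs in adj:
        if not nbrs:
            p.append(0.0)  # assumption: isolated nodes get P = 0
            continue
        links_per_cluster = {}
        for j in nbrs:
            links_per_cluster[cluster[j]] = links_per_cluster.get(cluster[j], 0) + 1
        p.append(1.0 - sum((c / len(nbrs)) ** 2 for c in links_per_cluster.values()))
    return p


def removal_order(score, decreasing=True):
    """Static attack order: nodes sorted by score, ties kept in sequence order."""
    sign = -1 if decreasing else 1
    return sorted(range(len(score)), key=lambda i: sign * score[i])
```

Each attack in Table 1 is then a call to `removal_order`. For example, `removal_order(contact_order(adj))` gives the decreasing-ordk attack and `removal_order([len(nbrs) for nbrs in adj])` the decreasing-degree attack. The increasing-z attack is `removal_order(within_module_z(adj, cluster), decreasing=False)`. The quadratic distance loop is adequate for a 582-residue protein; a spatial grid or k-d tree would be the usual choice for much larger structures.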

Alessandro Giuliani (Istituto Superiore di Sanità, Rome, Italy)
Federica Pascucci (University "Roma TRE", Rome, Italy)

References

1 G. Oliva, L. Di Paola, A. Giuliani, F. Pascucci, and R. Setola, "Assessing protein resilience via a complex network approach," IEEE 2nd International Workshop on Network Science, West Point, NY, USA, 29 April – 1 May 2013 (to appear).
2 P. Paci, L. Di Paola, D. Santoni, M. De Ruvo, and A. Giuliani, "Structural and functional analysis of hemoglobin and serum albumin through protein long-range interaction networks," Curr Proteomics, vol. 9, no. 3, pp. 160–166, 2012.
3 G. Bagler and S. Sinha, "Assortative mixing in protein contact networks and protein folding kinetics," Bioinformatics, vol. 23, no. 14, p. 1760, 2007.
4 M. Gromiha, A. Thangakani, and S. Selvaraj, "Fold-rate: prediction of protein folding rates from amino acid sequence," Nucleic Acids Res, vol. 34, pp. W70–W74, 2006.
5 P. Holme and B. Kim, "Attack vulnerability of complex networks," Phys Rev E, vol. 65, p. 056109, 2002.
6 A. Giuliani, L. Di Paola, and R. Setola, "Proteins as networks: a mesoscopic approach using haemoglobin molecule as case study," Curr Proteomics, vol. 6, no. 4, pp. 235–245, 2009.
7 R. Guimera and L. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, pp. 895–900, 2005.
8 M. Gromiha and S. Selvaraj, "Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: application of long-range order to folding rate prediction," J Mol Biol, vol. 310, pp. 27–32, 2001.


[Figure 2(a): size of the giant component vs. fraction of removed nodes; legend: Random (average), Random (standard deviation).]
[Figure 2(b): size of the giant component vs. fraction of removed nodes; legend: Decreasing degree, Decreasing clustering, Decreasing ordk index, Decreasing P index, Increasing z index.]
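The curves of Fig. 2 and the index of Eq. (4) can be sketched in Python as follows. The sketch assumes adjacency sets such as those of a PCN built with the 4-8 Å rule. `giant_curve` records $S$ after each removal in a given order. Rather than recomputing connected components after every removal, it reads the order backwards and re-inserts nodes into a union-find structure, which makes each curve cost nearly linear time in N + E. `random_removal_stats` averages over shuffled orders (150 in Fig. 2(a)). The letter does not specify how the first discontinuity is detected, or exactly which difference enters Eq. (4). `degradation_index` therefore assumes that the discontinuity is the first single removal that shrinks $S$ by more than a hypothetical `jump` fraction of N, and that the difference is the size of that drop. All names here are illustrative.

```python
import random
import statistics

# Minimal sketch; names and the discontinuity rule are illustrative assumptions.


def giant_curve(adj, order):
    """S (largest component size) after removing the first k nodes of `order`.

    Returns a list of length N + 1. `order` must be a permutation of all nodes.
    Nodes are re-inserted in reverse order with union-find, so S never has to
    be recomputed from scratch.
    """
    n = len(adj)
    parent = list(range(n))
    size = [1] * n
    present = [False] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    curve = [0] * (n + 1)  # curve[n] = 0: every node removed
    best = 0
    for k in range(n - 1, -1, -1):
        u = order[k]
        present[u] = True
        for v in adj[u]:
            if present[v]:
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru  # union by size
                    parent[rv] = ru
                    size[ru] += size[rv]
        best = max(best, size[find(u)])
        curve[k] = best  # nodes present now are exactly order[k:]
    return curve


def random_removal_stats(adj, tries=150, seed=0):
    """Pointwise mean and (population) standard deviation over random orders."""
    rng = random.Random(seed)
    curves = []
    for _ in range(tries):
        order = list(range(len(adj)))
        rng.shuffle(order)
        curves.append(giant_curve(adj, order))
    columns = list(zip(*curves))
    return ([statistics.mean(c) for c in columns],
            [statistics.pstdev(c) for c in columns])


def degradation_index(curve, jump=0.05):
    """Eq. (4) evaluated at the first discontinuity of a removal curve.

    Assumptions: the first discontinuity is the first single removal that
    shrinks S by more than jump * N nodes (`jump` is a hypothetical
    threshold), and the "difference" is the size of that drop. Both the drop
    and the removed fraction are normalized by N.
    """
    n = len(curve) - 1
    for k in range(1, n + 1):
        drop = curve[k - 1] - curve[k]
        if drop > jump * n:
            return (drop / n) / (k / n)
    return 0.0  # no discontinuity found
```

For a 582-node PCN, `random_removal_stats(adj)` yields the mean and standard-deviation curves of Fig. 2(a). Calling `degradation_index(giant_curve(adj, order))` with a targeted order yields one DI value. With both terms normalized by N, DI reduces to the number of nodes lost in the jump divided by the number of nodes removed up to that point. The `jump` threshold therefore only decides where the first discontinuity lies.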
