Mass spectrometric genomic data mining: Novel insights into bioenergetic pathways in Chlamydomonas reinhardtii




Proteomics 2006, 6, 6207–6220


DOI 10.1002/pmic.200600208

RESEARCH ARTICLE

Jens Allmer*, Bianca Naumann*, Christine Markert, Monica Zhang and Michael Hippler*

Plant Science Institute, Department of Biology, University of Pennsylvania, Philadelphia, PA, USA

A new high-throughput computational strategy was established that improves genomic data mining from MS experiments. The MS/MS data were analyzed by the SEQUEST search algorithm and by a combination of de novo amino acid sequencing with an error-tolerant database search tool, running on a 256-processor computer cluster. The error-tolerant search tool, previously established as GenomicPeptideFinder (GPF), enables detection of intron-split and/or alternatively spliced peptides from MS/MS data when deduced from genomic DNA. Thylakoid membranes isolated from the eukaryotic green alga Chlamydomonas reinhardtii were separated by 1-D SDS gel electrophoresis; protein bands were excised from the gel, digested in-gel with trypsin, and analyzed by nano-flow LC coupled with MS/MS. The concerted action of SEQUEST and GPF allowed identification of 2622 distinct peptides. In total, 448 peptides were identified by GPF analysis alone, including 98 intron-split peptides, resulting in the identification of novel proteins, improved annotation of gene models, and evidence of alternative splicing.

Received: March 29, 2006 Revised: August 10, 2006 Accepted: August 14, 2006

Keywords: Chlamydomonas reinhardtii / Error-tolerant search / Genomic data mining / Mass spectrometry

1 Introduction

Proteomic research is driven by the development of new MS technology and of bioinformatics tools to handle and evaluate MS data. In recent years, new mass spectrometers have been developed that permit ever more sensitive, fast, and precise peptide and protein analysis. Accordingly, the amount of data generated by proteomic experiments is becoming enormous, and today's bottleneck appears to be the evaluation of these data.

Correspondence: Dr. Michael Hippler, Department of Biology, Institute of Plant Biochemistry and Biotechnology, University of Münster, Hindenburgplatz 55, 48143 Münster, Germany
E-mail: [email protected]
Fax: +49-251-832-8371

Abbreviations: ANSI, American National Standards Institute; GPF, GenomicPeptideFinder; JGI, Joint Genome Institute

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

A well-established way to identify peptides and proteins from MS data is to search a protein database and match the mass spectra to the database entries. This is implemented in several search engines such as SEQUEST [1], MASCOT [2, 3], Sonar [4], GutenTag [5] and other novel approaches [6–8]. One obvious limitation of all these algorithms is that the sequence searched for must be present in the database. Another problem is the high number of false positive identifications. Several tools have been developed to make the findings more reliable [4, 9–13]. An alternative is to deduce the amino acid sequence directly from the information contained in the mass spectra. A number of such de novo sequencing programs have been described and are in use today [14–19]. These programs face other limitations. They are

* Present address: Department of Biology, Institute of Plant Biochemistry and Biotechnology, University of Münster, Hindenburgplatz 55, 48143 Münster, Germany

www.proteomics-journal.com


J. Allmer et al.

usually computationally intensive and dependent on high-quality spectra [20, 21]. For these reasons, they are of rather limited use in practice. New tools might increase both the speed and the reliability of de novo predictions [21–23]. Today, de novo amino acid sequencing is far from perfect, but using its predictions for an error-tolerant search of the genomic database might reveal new information. To advance genome annotation, it is desirable to map proteomic information back to the corresponding genome [24], which this combination of tools makes possible. We present results that take advantage of such an approach, using PEAKS [14] to perform de novo amino acid sequencing from the mass spectra and the GenomicPeptideFinder (GPF) [25] to search the translation of the genomic database of the unicellular eukaryotic green alga Chlamydomonas reinhardtii in an error-tolerant fashion. The aim was to employ MS data for genomic data mining. The error-tolerant search tool GPF was shown to enable detection of intron-split and/or alternatively spliced peptides from MS/MS data when deduced from genomic DNA [25]. Another approach that detects peptides split by an intron at the "genomic level" does so by locally defining donor and acceptor sites in a region around an initial match in the genomic database [17, 26]. This approach essentially relies on local gene prediction and may be as error-prone as global gene prediction, which has been found to be rather imprecise [27–29]. EST and genomic data obtained for model organisms such as human, mouse or Arabidopsis show that roughly half of the gene products encoded by a genome cannot be annotated from EST data [27, 30–32]. Additionally, alternative splicing is thought to add substantially to proteome diversity in eukaryotes with complex genomes. It is assumed that 25% of the peptides found via MS from eukaryotes contain introns when mapped back to the genome [33].
It has further been proposed that about 40% of human genes are alternatively spliced [34]. Thus, genomic data mining employing MS information will be important, indeed indispensable, for identifying protein diversity in these organisms. For the analysis of the MS data, we established a platform in which MS/MS spectra were analyzed by the SEQUEST search algorithm and by a combination of de novo amino acid sequencing with our error-tolerant database search tool, running on a 256-processor computer cluster. This high-throughput platform was applied to identify proteins, facilitate annotation of gene models, and recognize proteins that originate from alternative splicing. The analysis was done with C. reinhardtii, which is an important eukaryotic model system for the investigation of fundamental molecular processes, for instance in bioenergetics [35] and motility [36], and is an emerging model system for proteomic research [37]. Chlamydomonas has complex gene structures with several exons per gene. It is therefore an appropriate system in which to test the feasibility of genomic data mining using MS data. We processed MS data originating from thylakoid membranes isolated from arginine-auxotrophic, cell wall-less Chlamydomonas cells. Thylakoid membranes represent, besides the inner and outer envelope membranes, the third membrane system of the chloroplast and harbor the photosynthetic machinery. It is commonly accepted that chloroplasts in green algae and plants evolved from a cyanobacterial endosymbiont. The majority of the chloroplast proteome is encoded by the nucleus, whereas the chloroplast genome itself encodes slightly more than 100 proteins and RNA molecules. The nuclear-encoded chloroplast proteins are synthesized as precursors in the cytosol with an N-terminal transit peptide that targets the protein to the chloroplast. The transit peptide is proteolytically removed after import of the protein into the chloroplast. An algorithm predicting chloroplast transit peptides in all proteins encoded by the Arabidopsis nuclear genome suggested that 4225 proteins are localized in the chloroplast [38]. Among these proteins, about 520 are predicted to have at least one transmembrane domain and are therefore localized either in the outer/inner envelope or in the thylakoid membrane system. Proteomic studies performed in vascular plants indicated the presence of more than 700 non-redundant proteins in the thylakoid and outer/inner envelope membranes [39–43]. So far, no large-scale thylakoid membrane proteomic study has been performed with green algae. Therefore, we aimed to use our newly developed computational genomic data mining strategy to elucidate the thylakoid proteome of the green alga C. reinhardtii. Our study identified numerous new plant proteins as well as Chlamydomonas-specific proteins. The data also demonstrated that the high-throughput platform can be used to detect intron-split and/or alternatively spliced peptides from MS/MS data.
The findings also indicated that the concerted efforts of different algorithms improve protein identification.

2 Materials and methods

2.1 Mass spectrometry

The instrument set-up for LC (Ultimate system, LC-Packings) and MS (LCQ-Deca XP plus, ThermoFinnigan), in-gel tryptic protein digestion, sample handling and SDS-PAGE analysis were as described [44]. Protein samples originated from isolated thylakoid membranes of iron-sufficient cells (isotopically labeled) or iron-deficient cells (unlabeled). Isotopic labeling of proteins was achieved by growing arginine-auxotrophic, cell wall-less Chlamydomonas cells (CW15) in the presence of either 13C-labeled or unlabeled 12C arginine as described [44]. Protein identification data were further used for quantitative peptide profiling (Naumann, Allmer and Hippler, manuscript in preparation).
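Because only arginine is labeled, every tryptic peptide containing arginine appears as a light/heavy pair separated by a fixed mass. The short Python sketch below shows that arithmetic. It assumes complete 13C6 labeling of arginine and uses monoisotopic masses; the helper name `heavy_light_mz` and the example peptide mass are hypothetical and not part of any software used in this study.

```python
# Hypothetical helper: expected light/heavy m/z of a peptide when every
# arginine is replaced by 13C6-arginine (assumption: complete labeling).
C13_MINUS_C12 = 13.0033548378 - 12.0  # Da, extra mass of one 13C atom
PROTON = 1.007276467                  # Da, proton mass
ARG_13C6_SHIFT = 6 * C13_MINUS_C12    # about 6.0201 Da per labeled arginine


def heavy_light_mz(neutral_mass: float, n_arg: int, charge: int) -> tuple[float, float]:
    """Return (light m/z, heavy m/z) for a peptide of the given neutral mass."""
    light = (neutral_mass + charge * PROTON) / charge
    heavy = (neutral_mass + n_arg * ARG_13C6_SHIFT + charge * PROTON) / charge
    return light, heavy


if __name__ == "__main__":
    # A tryptic peptide ending in R carries one label; at charge 2+ the
    # pair is separated by roughly 3.01 m/z units.
    light, heavy = heavy_light_mz(1042.5407, n_arg=1, charge=2)
    print(f"light {light:.4f}  heavy {heavy:.4f}  delta {heavy - light:.4f}")
```

The nominal "6 Da heavier" arginine allowed in the de novo search corresponds to this exact shift of about 6.02 Da.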

Plant Proteomics


2.2 SEQUEST

Widely used settings were applied when searching databases with SEQUEST. The mass spectra were filtered for common contaminants such as keratin during dta-file creation. Our significance thresholds for Xcorr values were chosen very conservatively, which makes us lose a number of potentially good peptides but reduces the number of false positive identifications. The thresholds were 1.75 for singly charged, 2.5 for doubly charged, and 3.5 for triply charged parent ions. To obtain the maximum result, we used SEQUEST to match all acquired spectra against several databases. These fasta-files contained the following sequences, all from Chlamydomonas reinhardtii: (i) the chloroplast and mitochondrion proteomes; (ii) the Joint Genome Institute (JGI) gene models; (iii) all available EST sequences; and (iv) the six-frame translation of the genome. In addition, (v) all of the above databases were searched both with and without possible PTMs. When matching GPF results against the corresponding dta-files with SEQUEST, an in silico enzyme was used that cuts after each letter J. As J does not code for any amino acid and is therefore absent from the sequences, this procedure ensures that none of the predicted sequences are actually cut; they are accepted as full-length sequences without further processing. When the enzymes supplied with SEQUEST were used, predicted peptides were still processed, resulting in false positive identifications; these could be removed by the above approach. In addition to the SEQUEST Xcorr value, we also calculated DCn values for all spectra that matched multiple potential peptide sequences. The DCn value was calculated as follows:

$$
\mathrm{DCn} = 1 - \frac{X_{\mathrm{Corr}}}{X_{\mathrm{Best\,Corr}}} \qquad (1)
$$

All Xcorr values were normalized by dividing by the best Xcorr value for a spectrum, and the result was subtracted from one to yield the final DCn score. Note that a DCn score of 0 means either that two identifications are identical or that there is no second identification for the particular spectrum. In both cases, the identifications were retained in the dataset. All other identifications were deleted from the dataset if their DCn was below 0.08.

2.3 PEAKS

As we found that only little information could be gained from triply charged peptides, we excluded all charge states higher than 2 from PEAKS processing. The parent ion tolerance was set to 1 Da and the fragment ion tolerance to 0.5 Da. Five predictions were reported for each dta-file. Cysteines were allowed to be carboxyamidomethylated. As we used heavy arginine in the experiments involving 1-D SDS-PAGE, arginines were allowed to be 6 Da heavier.

2.4 Query creation

GPF does not directly accept PEAKS results as input. A level of indirection was introduced so that new de novo sequencing algorithms can easily be made to work with GPF; in addition, some filtering can be performed at this step. We accepted as input all PEAKS predictions that contained at least eight amino acids and whose score was higher than 10%. These predictions were extracted from the fas-files created by PEAKS. Modifications are not reported in that file format, but the peptide masses are adjusted to the assumed modification. If PEAKS assumed modifications, the mass of the GPF query was adjusted accordingly, i.e. the mass calculated from the peptide sequence was used for searching.

2.5 GPF

All searches, whether performed on our Windows PC or on the LINIAC cluster, used the same GPF core compiled for the respective environment, so the algorithm employed does not differ between environments. When matching the queries against the genome, we required in the first search that at least 5 amino acids of the prediction exactly match the sequences found in the database. For the second search, within a window of ±700 amino acids around hits from the first search, at least 3 consecutive amino acids had to match. This large window of 700 amino acids was chosen to ensure full coverage of peptides potentially split by an intron; it is larger than any intron we have encountered so far. The difference between calculated and measured precursor mass was restricted to 1000 ppm. Tryptic cleavage sites (R or K) had to be present on both sides of the peptide sequence. These settings only allow us to find peptides with a minimal length of 8 amino acids; shorter peptides are always missed. In addition, peptides that do not include a stretch of 5 or more correctly predicted amino acids are missed. Note that GPF cannot identify intron-split peptides in which a codon spans the splice donor and acceptor sites, i.e. one base comes from one reading frame and the other two from another, except when such a split does not change the encoded amino acid, which occurs frequently.

2.6 AutoMS

AutoMS is software designed and programmed in Microsoft Visual Studio .NET using the C++ programming language; where possible, ANSI C++ was used. It runs under most Microsoft Windows operating systems. AutoMS automates the programs used in this study. It enables batch processing of hundreds of tasks, which can easily be set up and, if required, individually customized. Some data filtering can be performed, and the significant results are reported in a plain text file that is easy to import into our database. All settings and options for the individual programs can be set in AutoMS as well; the settings used are described in the sections on the individual tools. File locations are adjusted to achieve a structured file repository. The code is robust enough to allow the software to run without interruption. GPF was run outside of the AutoMS environment. As all information was contained in a specific directory structure, however, it was no problem to integrate the results after processing on the cluster, and for the same reason, picking up further processing downstream of GPF posed no problem. The software is still in its beta phase, but the executable is available upon request.

2.7 Confidence calculation for PEAKS-GPF-SEQUEST findings

In a first screen, the PEAKS-GPF predictions are validated by SEQUEST, using the significance thresholds described above. As PEAKS does not calculate a probability or an expectation value, we base the assessment of the gain in confidence on calculations by Mann and Wilm [45]. The GPF algorithm requires an initial match of a user-settable number of consecutive amino acids (here, 5 sequential amino acids were required to represent a correct match). Assuming that all 20 amino acids are equally likely, the probability of matching an amino acid at any given position in the sequence database is

$$
P_{maa} = \frac{1}{20} \qquad (2)
$$

where $P_{maa}$ is the probability of finding a given amino acid at a given position in a sequence database. In addition to the correct sequence of amino acids, both the N- and C-terminal cleavage sites must be present in the sequence database. The probability of matching a cleavage site equals the number of amino acids recognized by the protease divided by the total number of amino acids:

$$
P_{cs} = \frac{N_{raa}}{20} \qquad (3)
$$

where $P_{cs}$ is the probability of finding a cleavage site at a given position in a sequence database and $N_{raa}$ is the number of amino acids recognized by the protease (two, K and R, for trypsin). A further constraint can be added to this probability. Mann and Wilm [45] postulated "The probability for matching a given mass m1 with unit mass accuracy is equal to the mean molecular weight of the amino acids." Thus, taking the mean amino acid residue mass as about 110 Da,

$$
P_m = \frac{1}{110} \qquad (4)
$$

where $P_m$ is the probability of randomly matching a given mass. We use Eq. (4) to assess the probability of randomly matching a peptide with the measured precursor mass. Given these three constraints (matching amino acids, cleavage sites at both termini, and precursor mass), the probability of a random match within a sequence database is the product of the probabilities of the three independent constraints:

$$
P_{rm} = P_{cs}^{2} \cdot P_{maa}^{n} \cdot P_{m} \qquad (5)
$$

where $P_{rm}$ is the probability of randomly matching a peptide in a sequence database and $n$ is the number of sequentially matching amino acids. For a peptide found by the combination of PEAKS, GPF, and SEQUEST that is not split by an intron when deduced from genomic DNA, the probability of a random match can thus be calculated as

$$
P_{rm} = \left(\frac{2}{20}\right)^{2} \cdot \left(\frac{1}{20}\right)^{5} \cdot \frac{1}{110} \approx 3 \times 10^{-11} \qquad (6)
$$

The six-frame translation of the second release of the C. reinhardtii genome, which contains about 10^8 nucleotides, consists of about 2 × 10^8 amino acids. A match in this database will thus have a confidence of about 99.4% of being non-random. Peptides that are suggested to be split by an intron when deduced from genomic DNA have at least three more matching amino acids. Therefore, the probability of a random match can be computed as

$$
P_{rm} = \left(\frac{2}{20}\right)^{2} \cdot \left(\frac{1}{20}\right)^{5} \cdot \left(\frac{1}{20}\right)^{3} \cdot \frac{1}{110} \approx 3.5 \times 10^{-15} \qquad (7)
$$

The corresponding confidence of a non-random match in the six-frame translation of the C. reinhardtii genome is about 99.99%. The conditions under which this probability estimate applies have been discussed [45] and are met in this study. To assess the empirical false discovery rate for peptides identified by the combination of PEAKS/GPF/SEQUEST that fulfilled the Xcorr and DCn criteria, we shuffled the amino acids of the six-frame translation of the C. reinhardtii genome. We added all sequences identified by the combination of these tools to the shuffled database and then searched the spectra that gave rise to these identifications with SEQUEST. In total, 706 spectra were searched. Of these, 22 did not return the expected result as the best identification. Another 24 spectra did not yield a DCn value above 0.085 and are therefore considered false positive identifications. These 46 possible false positive identifications represent about 7% of the overall identifications.
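The random-match probabilities and the empirical false discovery rate above can be reproduced with a few lines of Python. This is a minimal sketch under one stated assumption: "confidence" is read as one minus the expected number of random matches in a database of about 2 × 10^8 residues, which reproduces the quoted 99.4% but is our interpretation, not a formula given in the text. The function names are hypothetical.

```python
def p_random_match(n_exact: int, n_cleavage_residues: int = 2,
                   alphabet: int = 20, mean_residue_mass: float = 110.0) -> float:
    """Eq. (5): P_rm = P_cs^2 * P_maa^n * P_m (cleavage site at both termini)."""
    p_maa = 1 / alphabet                    # Eq. (2)
    p_cs = n_cleavage_residues / alphabet   # Eq. (3); K and R for trypsin
    p_m = 1 / mean_residue_mass             # Eq. (4)
    return p_cs ** 2 * p_maa ** n_exact * p_m


def confidence(p_rm: float, db_residues: float = 2e8) -> float:
    """Assumed reading: 1 minus the expected number of random matches."""
    return 1 - p_rm * db_residues


if __name__ == "__main__":
    contiguous = p_random_match(5)      # Eq. (6): peptide not split by an intron
    split = p_random_match(5 + 3)       # Eq. (7): three more matching residues
    print(f"{contiguous:.2e} -> {confidence(contiguous):.2%}")
    print(f"{split:.2e} -> {confidence(split):.5%}")
    # Empirical estimate from the shuffled-database search (22 + 24 of 706)
    print(f"false positives: {(22 + 24) / 706:.1%}")
```

With these inputs the sketch gives about 2.8 × 10^-11 (99.4%) for Eq. (6) and about 3.55 × 10^-15 for Eq. (7), in line with the rounded values in the text, and 46/706 ≈ 6.5% for the shuffled-database estimate.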

2.8 Database

The database was designed in Microsoft Access, and Microsoft Visual Basic was used to program all necessary features. The aim of the database is to map sequences found by distinct methods back to a sequence database. The table space and the relations are shown in Supplementary Fig. 9. We used the JGI gene models and the mitochondrial and plastid proteomes of C. reinhardtii as the sequence information in our database. All SEQUEST, GPF, and PEAKS findings were mapped to this sequence repository, which is held in one table of the database. Automatic procedures were devised to extract significantly identified proteins from the database. Significantly identified proteins were either those that had more than one supporting peptide or those that had one supporting peptide that was in turn supported by multiple methods. Additionally, the significance thresholds (see Section 2.2) had to be met by each of these supporting peptides. The confidently identified proteins were kept in their experimental context; that is, only peptides from a given band or a number of related bands were combined to identify proteins. Peptide pools from multiple SDS-PAGE bands were combined when the molecular weights of the bands were within ±5% of each other. Peptides that would be the single supporting evidence for a given protein were checked for sequence complexity: if there was less than 40% variability in their sequence, the result was discarded. For example, a peptide of ten amino acids must consist of at least four different amino acids to pass this filter. Only a very small fraction of the identified peptides failed this filter. A number of functions were designed to automatically retrieve further information on each of these proteins via the internet. All data leading to confident protein identification can be found in the online supplement, which includes a zipped instance of the Access database containing all results that led to protein identifications, available for download at http://www.pepprotdb.de.ms.
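The protein acceptance rules of Sections 2.2 and 2.8 can be summarized as a small decision function. The Python sketch below is a hypothetical illustration, not the Access/Visual Basic procedures actually used: the record type `PeptideHit`, the use of "≥" at each Xcorr threshold, and the example sequence are our assumptions. The DCn filter and the ±5% band pooling are left out for brevity.

```python
from dataclasses import dataclass

# Charge-dependent Xcorr thresholds from Section 2.2 (">=" is an assumption).
XCORR_MIN = {1: 1.75, 2: 2.5, 3: 3.5}


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one peptide-spectrum match."""
    sequence: str
    charge: int
    xcorr: float
    methods: frozenset[str] = frozenset({"SEQUEST"})


def passes_xcorr(hit: PeptideHit) -> bool:
    threshold = XCORR_MIN.get(hit.charge)
    return threshold is not None and hit.xcorr >= threshold


def complex_enough(sequence: str, min_variability: float = 0.4) -> bool:
    """At least 40% distinct residues, e.g. 4 different amino acids in a 10-mer."""
    return len(set(sequence)) / len(sequence) >= min_variability


def protein_identified(hits: list[PeptideHit]) -> bool:
    """>= 2 significant peptides, or 1 complex peptide supported by >1 method."""
    support: dict[str, set[str]] = {}
    for hit in hits:
        if passes_xcorr(hit):
            support.setdefault(hit.sequence, set()).update(hit.methods)
    if len(support) >= 2:
        return True
    if len(support) == 1:
        sequence, methods = next(iter(support.items()))
        return len(methods) > 1 and complex_enough(sequence)
    return False


if __name__ == "__main__":
    # Hypothetical peptide; a lone SEQUEST hit is not enough, but the same
    # sequence confirmed by PEAKS/GPF/SEQUEST is.
    single = PeptideHit("LVDIEQGNR", charge=2, xcorr=2.8)
    confirmed = PeptideHit("LVDIEQGNR", charge=2, xcorr=2.8,
                           methods=frozenset({"SEQUEST", "PEAKS", "GPF"}))
    print(protein_identified([single]), protein_identified([confirmed]))
```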

3 Results

Our aim was to integrate de novo amino acid sequence predictions and GPF into the standard computational workflow for MS data analysis. To achieve this, de novo amino acid predictions were used as search strings in an error-tolerant genomic database search, performed by GPF, to detect intron-split and/or alternatively spliced peptides when deduced from genomic DNA. Using this platform, we explored the thylakoid proteome of the green alga C. reinhardtii to identify proteins, facilitate annotation of gene models, and recognize proteins that may originate from alternative splicing. The thylakoid membranes were isolated from iron-sufficient and iron-deficient arginine-auxotrophic, cell wall-less cells grown in the presence of either isotopically labeled 13C6 or unlabeled 12C6 arginine. The aim was to perform comparative quantitative proteomics to elucidate adaptation to iron deficiency (Naumann, Allmer, Hippler, manuscript in preparation). Labeled and unlabeled thylakoid proteins were mixed and separated by 1-DE; protein bands were excised from the gel, digested in-gel with trypsin, and analyzed by LC-MS/MS. In total, 126 bands from four independent SDS-PAGE fractionations were analyzed. We introduced a new approach to computational processing, designed to extract maximum information from the MS data gathered, which is synergistic with other approaches described earlier [46]. The complete workflow downstream of spectra acquisition is summarized in Fig. 1. As depicted therein, six different tasks were performed on the spectra before the results were imported into a database. First, the spectra were extracted from the raw file produced by the mass spectrometer software. Then, SEQUEST searches were performed on several databases, and the significant results were extracted and stored in the database. Up to this point, the process followed well-established practice. In addition, we performed de novo amino acid sequence predictions using the PEAKS algorithm [14]. All recorded singly and doubly charged mass spectra were submitted to de novo amino acid analysis by PEAKS. The predictions were converted to queries for GPF and searched against the six-frame translation of the genomic database of C. reinhardtii. To close the loop, the GPF predictions were then evaluated by SEQUEST against the original spectra. The process is illustrated with an example in Fig. 2, which shows a recorded spectrum, its b- and y-ions, and the sequences given by PEAKS and GPF, respectively. Although the PEAKS prediction differed at the N and C termini, it permitted identification of a peptide sequence with a significant SEQUEST Xcorr value when searched against the translated genomic database with GPF. This result was also present in both the EST data and the gene models, which underlines the significance of this finding. SEQUEST and PEAKS employ distinct methods to analyze the data contained in the recorded mass spectra. The SEQUEST algorithm cross-correlates measured mass spectra with spectra computationally derived from peptide sequences in a database, whereas PEAKS derives sequences from the mass spectra themselves using a de novo amino acid sequencing algorithm. Because the two methods are distinct, identical results from both are complementary and are considered to raise the confidence in a particular finding. Subsequently, the significant results were filtered and imported into a database.
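The core of the PEAKS-to-GPF step described above (filter the de novo predictions, translate the genome in six frames, find an exact seed, and check tryptic ends and precursor mass) can be sketched in Python. This is a simplified, hypothetical illustration, not the GPF implementation: the second, intron-spanning pass (3 consecutive residues within ±700 amino acids) is omitted, and all names, the 50-residue length cap and the carbamidomethyl-cysteine residue mass are our assumptions. The seed length, K/R requirement, 1000 ppm window and ≥8-residue/score >10% filter follow Sections 2.4 and 2.5.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}

# Monoisotopic residue masses (Da); C assumed carbamidomethylated.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
           "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
           "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
           "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056


def translate(dna: str) -> str:
    """Translate one frame; codons containing ambiguous bases become 'X'."""
    return "".join(CODONS.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Three forward and three reverse-complement translations."""
    dna = dna.upper()
    rev = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return [translate(strand[f:]) for strand in (dna, rev) for f in range(3)]


def usable_query(prediction: str, score_pct: float) -> bool:
    """Query filter: at least 8 residues and a score above 10%."""
    return len(prediction) >= 8 and score_pct > 10


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic peptide mass."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def seed_search(query: str, frame: str, mass: float,
                seed: int = 5, ppm: float = 1000.0, max_len: int = 50) -> list[str]:
    """Tryptic peptides in `frame` sharing >= `seed` consecutive residues with
    `query` and lying within `ppm` of `mass` (no intron handling)."""
    hits = set()
    for q in range(len(query) - seed + 1):
        kmer = query[q:q + seed]
        pos = frame.find(kmer)
        while pos != -1:
            for start in range(pos, 0, -1):            # walk left from the seed
                if pos - start > max_len or frame[start] in "*X":
                    break
                if frame[start - 1] not in "KR":       # K/R must precede the peptide
                    continue
                for end in range(pos + seed - 1, min(len(frame), start + max_len)):
                    if frame[end] in "*X":
                        break
                    if frame[end] in "KR":             # peptide must end in K/R
                        pep = frame[start:end + 1]
                        if abs(peptide_mass(pep) - mass) / mass * 1e6 <= ppm:
                            hits.add(pep)
            pos = frame.find(kmer, pos + 1)
    return sorted(hits)
```

For example, a hypothetical de novo query AVDIEQGNK, wrong at both ends, still recovers the genomic peptide LVDIEQGNR from a frame reading MSKLVDIEQGNRAAK, because it shares the exact 5-residue stretch DIEQG and the precursor mass matches.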
Many of these steps involved manual input and adjustment of options for the various processes, which proved very tedious and error-prone. Therefore, we devised a program named AutoMS that automated these tasks. Using AutoMS, we were able to make full use of de novo amino acid sequencing by PEAKS and error-tolerant database search with GPF in a high-throughput fashion. The use of a 256-processor cluster at the University of Pennsylvania (the LINIAC cluster) enabled faster processing of the de novo amino acid sequence predictions with GPF. In total, 435 475 de novo amino acid sequence predictions originated from the MS analysis of the 1-D PAGE bands. These predictions were run against the six-frame translation of the genomic database and produced 76 996 817 GPF predictions. After validation of these predictions with SEQUEST, 448 peptides could be imported into the database. Among these peptides, 98 were potentially split by an intron. The concerted action of SEQUEST and GPF allowed identification of 2622 non-redundant peptide sequences. About 3% (2.88%) of the sequences identified by SEQUEST were directly predicted with the same sequence by PEAKS. When GPF was used to map the PEAKS predictions back to the genome, the fraction of sequences shared between SEQUEST and this combination increased significantly, to about 18% (17.95%).

Figure 1. Computational processing of the MS/MS spectra acquired by MS. Data associated with each process, such as processing time, are presented in the boxes above; each box represents a distinct process. Processing times were calculated for one PC unless indicated otherwise. Most triply charged dta-files were not submitted to de novo prediction analysis. The GPF core is the same in both the PC and UNIX distributions. Most GPF processing was done on the LINIAC cluster.

Peptides not split by an intron that were identified by SEQUEST and by PEAKS/GPF/SEQUEST analysis were used for protein identification. Proteins that matched at least two peptides identified by the SEQUEST algorithm were considered confidently identified. We further considered every protein recognized by at least one PEAKS/GPF/SEQUEST peptide meeting the SEQUEST significance criteria as a proper identification. Given a peptide mass within ±1 Da and an exact match of five amino acids, we calculated that any peptide found with the combination of PEAKS, GPF, and SEQUEST can serve as a single supporting peptide for protein identification, given the size of the translated Chlamydomonas genomic database (see Section 2.7). Of the peptide sequences not split by an intron and identified by both SEQUEST and GPF, 292 could be directly mapped to existing gene models, underlining the correctness of the gene model prediction in those regions. Interestingly, several less abundant proteins were identified by a single supporting peptide (see Table 1), such as a novel light-harvesting protein not previously described at the protein level (LhcbP1, C_20371), an ATP sulfurylase (C_20033), a DegP-type serine protease (C_1010043), the Stt7 serine/threonine protein kinase (C_1150050) [47], a hypothetical glutathione S-transferase-related protein (C_1470041), a hypothetical rhodanese-like protein (C_20358), an FKBP-type peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (C_1630014), a hypothetical protein up-regulated under low carbon dioxide (C_530007), two other hypothetical proteins (C_250026, C_1450004), and a putative lumen protein related to OEE3, PsbQ (C_180041), as well as others (see Supplementary Tables 1, 2, 3 and 4). Likewise, 21 peptides that were split by an intron when deduced from genomic DNA mapped directly to existing gene models. The corresponding intron-exon structure of these gene models could thus be validated for the associated region, facilitating gene model annotation. Two intron-split peptides were the sole supporting peptides for their proteins: one matched the NADH:ubiquinone oxidoreductase B17.2-like subunit (C_140183) (Supplementary Table 3) and the other a putative chloroplast inner envelope protein (C_320089) (Table 1). Other intron-split peptides could be used to correct the corresponding gene models (Fig. 3). Figure 3 shows the correction of a Chlamydomonas gene model (JGI release 2.0) by the use of two intron-split peptides stemming


Figure 2. (A) Recorded MS/MS spectrum (bnT_3_063004.1322.1322.2.dta) with labeled b-ions and y-ions. Both the most significant PEAKS prediction and the GPF prediction are displayed in the graph as amino acid sequences, centered between their corresponding b-ion peaks. Of the PEAKS predictions, the first two led to the same GPF result. The third and fourth predictions did not lead to a significant GPF result. The fifth prediction was not tested, as it was filtered due to low confidence (