GlycoSpectrumScan: Fishing Glycopeptides from MS Spectra of Protease Digests of Human Colostrum sIgA




Nandan Deshpande, Pia H. Jensen,† Nicolle H. Packer, and Daniel Kolarich*
Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, 2109, Australia
Received October 22, 2009

With the emergence of glycoproteomics, there is a need to develop bioinformatic tools to identify glycopeptides in protease digests of glycoproteins. GlycoSpectrumScan is a web-based tool that identifies the glycoheterogeneity on a peptide from mass spectrometric data. Two experimental data sets are required as inputs: (1) oligosaccharide compositions of the N- and/or O-linked glycans present in the sample and (2) in silico derived peptide masses of proteolytically digested proteins with a potential number of N- and/or O-glycosylation sites. GlycoSpectrumScan uses MS data, rather than MS/MS data, to identify glycopeptides and determine the relative distribution of N- and O-glycoforms at each site. It is functional for assigning monosaccharide compositions on glycopeptides with single and multiple sites of glycosylation. The algorithm allows the input of raw mass data, including multiply charged ions, making it applicable for both ESI and MALDI data from all mass spectrometer platforms. Manual analysis time for identifying glycosylation heterogeneity at each site on glycoprotein(s) is substantially decreased. The application of this tool to characterize the N- and O-linked glycopeptides from human secretory IgA (sIgA), consisting of secretory component (7 N-linked sites), IgA1 (2 N-linked, ≤5 O-linked sites), IgA2 (4 N-linked sites) and J-chain (1 N-linked site) is described. GlycoSpectrumScan is freely available at www.glycospectrumscan.org.

Keywords: bioinformatics • glycoinformatics • N-glycan • O-glycan • glycosylation • sIgA • glycoproteomics • glycopeptide • graphitized carbon • mass spectrometry • colostrum

* To whom correspondence should be addressed. Dr. Daniel Kolarich, Biomolecular Frontiers Research Centre, Department of Chemistry & Biomolecular Sciences, Macquarie University, Sydney, 2109 Australia. E-mail: [email protected], [email protected]. Telephone: +61-2-9850 6914. Fax: +61-2-9850 8313.
† Present address: Institute of Medical Biology, University of Southern Denmark, 5230 Odense, Denmark.
Research article. Journal of Proteome Research 2010, 9, 1063–1075. Published on Web 12/23/2009. DOI: 10.1021/pr900956x. © 2010 American Chemical Society.

Introduction

Glycosylation is one of the most universal and structurally diverse eukaryotic post-translational modifications (PTM) with more than half of all mammalian proteins known to be glycosylated.1 The significance of correct glycosylation in various biological events such as protein secretion, stability and immunogenicity2 and the direct correlation of aberrant glycosylation with diseases like cancer and congenital disorders of glycosylation are well documented.3,4 Identification and characterization of glycans together with their particular protein carriers is crucial for deciphering glycan functionality in normal and diseased conditions. The nontemplate nature of glycan synthesis makes analysis and prediction of sugar structures a challenge and glycan heterogeneity on a large proportion of the proteins in all cellular systems further adds to this complexity.

N-linked glycans are (with some exceptions) attached to the protein backbone only at “Asn-X-Thr/Ser” motifs (sequons). Nevertheless, most glycoproteins possess more than one sequon, and these are very often differentially glycosylated5,6 with from zero to one hundred percent site occupancy.7 An even greater micro- and macroheterogeneity is also observed for O-linked glycans attached to serine/threonine residues which lack a known consensus sequence and are often found on multiple sites in close proximity.8 Diversity of site specific glycosylation on a protein is generally protein dependent and strongly influenced by the cell type and state. This heterogeneity is often reflected by detection of the same protein with different types and degrees of glycosylation, for example, in IEF and 2D SDS-PAGE experiments where a single protein backbone can result in a “train” of spots over a wide pH-range in the gel.9

Mass spectrometry is the preferred technique adopted for glycan and glycoprotein analysis due to its excellent sensitivity, capability and versatility in handling highly complex mixtures by combining various kinds of separation techniques such as LC or CE, with online and offline MS.10 A challenge in glycoproteomics has been the analysis of heterogeneously glycosylated peptides in the mass spectra of protease digests. Partially due to their heterogeneity, glycopeptide signals are generally of lower intensity and found in a higher mass range compared to nonglycosylated peptides. Standard CID MS/MS experiments generally result in fragmentation of the glycosidic bonds and provide some information on the monosaccharide composition but rarely answer this problem since (1) minor glycopeptide signals are often not selected for an MS/MS experiment and (2) during the time of MS/MS fragmentation the mass spectrometer does not record relative quantitative data of different glycoforms of the same peptide. Thus the relative distribution analysis of the glycoforms of a particular glycopeptide is difficult to determine by this approach.

Once located, the identification and characterization of the glycopeptides remains a highly time-consuming and complex task. Though in theory many proteomic identification algorithms allow the inclusion of modification masses on peptides to increase the success of identification, the vast number of possible glycan structures that may be present makes inclusion of all potential glycomodifications impossible. Present day software tools are fine-tuned for accurate analysis of peptide mass and sequence for protein identification, and glycopeptide data is usually not considered.

Some prediction tools do exist which can assist in computing the range of possible glycopeptide compositions from MS experimental data.11–17 The web based tool GlycoMod, freely available on the ExPASy server, suggests the potential oligosaccharide compositions corresponding to a particular glycopeptide mass, if the peptide or protein sequence is known. This tool links the information to those compositions reported previously in the literature and assembled in the GlycoSuiteDB database (now publicly available on the ExPASy server). The many possible oligosaccharides that can comprise the glycopeptide input mass however make it difficult to apply this tool to whole spectra. Ozohanics et al. (2008) developed GlycoMiner, a software tool that uses MS/MS spectra to identify and characterize “both parts” of glycopeptides (sugar and peptide) and reported very good discovery rates of glycopeptides with almost no false positives.14 A beta version of their software is available but is currently optimized for pkl data files (Waters ProteinLynx data).
A different algorithm for identifying glycopeptides from MS/MS spectra was published by Ren et al. (2007) in combination with a linearized Glycan Structure database (GlyDB).16 Another automated approach for glycopeptide identification using MS/MS spectra was published by Joenväärä et al. (2008) and applied to human plasma, where they identified 80 glycopeptides with various attached glycan compositions.15 GlycoX by An et al. (2006) has been developed to identify glycosylation sites and oligosaccharide heterogeneity of glycopeptides from MALDI-FT-ICR MS spectra and successful application to a variety of model and unknown glycoproteins was reported.12 No data was presented on multiply charged ESI mass spectra or multiply glycosylated glycopeptides. Peptoonist developed by Goldberg and coworkers (2007) uses both MS/MS and MS data to identify and also qualify glycopeptides that are not selected for MS/MS.17 Their algorithm apparently requires high resolution and high accuracy MS data which limits its applicability across MS platforms and, similar to many of the above software packages, we were not able to access a publicly available version. There remains a need for bioinformatic tools that are robust, freely available and instrument independent and that allow scanning of MS data for the actual (rather than potential) glycopeptides in a mixture of peptides and glycopeptides. To minimize false positive assignments, an accurate analysis of the actual oligosaccharide compositions that are present on the protein, and thus are possible on the glycopeptide, rather than a calculation of those which may be possible from an observed mass difference, is of benefit. GlycoSpectrumScan, as described in this paper, was therefore developed to allow rapid identification of all glycopeptide masses present in a


Figure 1. Human colostrum sIgA blotted onto PVDF-membrane after SDS-PAGE separation. The protein MW marker is given in kDa.

combined MS scan of a protease digested glycoprotein using an accurate predetermination of all oligosaccharide structures actually present on the protein(s). Any new bioinformatic tool needs validation. We have chosen a set of differentially glycosylated proteins to show the versatility as well as the boundaries of bioinformatic tools developed to address the complex field of glycoproteomics. Secretory Immunoglobulin A (sIgA) from human colostrum is a protein complex with three glycosylated subunits18 and thus is a well-suited example for testing GlycoSpectrumScan. Specifically, sIgA comprises several very different glycoproteins: secretory component (7 N-glycosylation sites, UniProt Entry P01833), IgA1 (2 N- and 5 O-glycosylation sites, P01876), IgA2 (4 N-glycosylation sites, P01877) as well as the so-called Joining Chain (J-chain, 1 N-glycosylation site, P01591) and the λ and κ chains (the latter two are not known to be glycoproteins, Figure 1). These proteins are commercially available, and to our knowledge, little data has been reported on the site specific glycosylation of these glycoproteins.19 An in silico tryptic digest (Table 1) showed that these glycoproteins provide a wide range of potential N-glycosylated peptides as well as a challenging O-glycosylated peptide in IgA1 which contains up to 5 O-glycosylation sites on one tryptic peptide.20 Here we present the development of a bioinformatic tool, GlycoSpectrumScan, which can, independent of MS-platform, accurately identify and assign the oligosaccharide heterogeneity on glycopeptides from MS data of a mixture of peptides and glycopeptides. It was successfully applied to characterize the site specific glycosylation of the sIgA glycoprotein complex from human colostrum. GlycoSpectrumScan drastically reduces the researchers’ manual data interpretation efforts and is freely available on www.glycospectrumscan.org.

Material and Methods

Materials. Secretory IgA from human colostrum (sIgA), sodium borohydride, ammonium bicarbonate, LC-MS grade acetonitrile and potassium hydroxide were obtained from Sigma. Glycerol free Peptide-N-glycosidase F (PNGase F) was


Fishing for Glycopeptides

Table 1. Glycosylation Sites Reported in the sIgA Proteins and in Silico Predicted Tryptic Peptides Used for the Glycopeptide Screening with GlycoSpectrumScan^a

glycosylation sites                      peptide mass [M]   position   MC#   peptide sequence

secretory component, P01833, Polymeric immunoglobulin receptor
Asn83 + Asn90                            2808.0             82-107     0     ANLTNFPENGTFVVNIAQLSQDDSGR
                                         3099.4             82-109     1     ANLTNFPENGTFVVNIAQLSQDDSGRYK
                                         3255.5             78-107     1     YAGRANLTNFPENGTFVVNIAQLSQDDSGR
Asn135                                   2176.4             118-138    0     GLSFDVSLEVSQGPGLLNDTK
Asn186                                   2525.8             168-190    0     QIGLYPVLVIDSSGYVNPNYTGR
                                         2525.8                              Pyroglutamic acid on Q168
                                         2795.2             168-192    1     QIGLYPVLVIDSSGYVNPNYTGRIR
                                         2795.2                              Pyroglutamic acid on Q168
Asn421                                   2402.7             413-434    0     LSLLEEPGNGTFTVILNQLTSR
Asn469                                   2391.8             457-479    1     IIEGEPNLKVPGNVTAVLGETLK
                                         1397.6             466-479    0     VPGNVTAVLGETLK
Asn499                                   2626.9             494-515    1     YWCKWNNTGCQALPSQDEGPSK
                                         1989.1             498-515    0     WNNTGCQALPSQDEGPSK
                                         1990.1             498-515    0     WNDTGCQALPSQDEGPSK

IgA1, P01876, Ig alpha-1 chain C region
Asn144                                   2964.4             127-153    0     LSLHRPALEDLLLGSEANLTCTLTGLR
Asn340                                   2833.2             328-353    1     TIDRLAGKPTHVNVSVVMAEVDGTCY
                                         2347.7             332-353    0     LAGKPTHVNVSVVMAEVDGTCY
Ser105, Ser111, Ser113, Ser119, Ser121   4138.6             89-126     0     HYTNPSQDVTVPCPVPSTPPTPSPSTPPTPSPSCCHPR

IgA2, P01877, Ig alpha-2 chain C region
Asn47                                    4778.3             8-51       0     VFPLSLDSTPQDGNVVVACLVQGFFPQEPLSVTWSESGQNVTAR
Asn92                                    2910.3             89-113     0     HYTNPSQDVTVPCPVPPPPPCCHPR
Asn131                                   2964.4             114-140    0     LSLHRPALEDLLLGSEANLTCTLTGLR
Asn205                                   958.1              200-208    0     TPLTANITK
Asn327                                   2851.3             315-340    1     TIDRMAGKPTHVNVSVVMAEVDGTCY
                                         2365.7             319-340    0     MAGKPTHVNVSVVMAEVDGTCY

J-Chain, P01591, Immunoglobulin J chain
Asn49                                    1228.3             48-58      0     ENISDPTSPLR
                                         1485.6             48-60      1     ENISDPTSPLRTR
                                         2148.4             40-58      1     IIVPLNNRENISDPTSPLR
                                         2405.7             40-60      2     IIVPLNNRENISDPTSPLRTR

^a Italic: potential missed cleavage and/or modified peptide sequences entered into the software, but not found in the analysis of the glycopeptide mass spectra. (MC# = number of missed tryptic cleavages.)

from Roche and PVDF membranes were obtained from Millipore. Cation exchange resin (Dowex AG 50W X8) was from BioRad. Sequencing grade trypsin was from Promega.

Software Development. GlycoSpectrumScan is developed using open source technologies: Zope (V2.8.1) (http://www.zope.org) and the Python programming language (V2.4.3) (http://www.python.org). Zope is a web application server primarily written in Python and provides a stable and reliable environment for development of dynamic HTML templates for web based tools like GlycoSpectrumScan. The tool makes use of Python's imaging module PIL (Python Imaging Library) (http://www.pythonware.com/products/pil/) and its threading module (for multithreaded processing of the user queries). Browser and platform independent JavaScript has been used for validating user data entry and for enhancing the flexibility of the query and output pages. GlycoSpectrumScan is available as a web service at www.glycospectrumscan.org.

Glycan Analysis. For determination of the N- and O-glycan structures attached to the protein(s) the glycoproteins were electroblotted after SDS-PAGE separation onto PVDF membranes as described previously.21 In short, N-glycans were enzymatically released with PNGase F and collected from the membrane before O-glycans were released chemically from the same spot by reductive β-elimination. Reduced N- and O-glycans were separated by graphitized carbon chromatography (300 µm × 10 cm, in-house columns). Liquid chromatography ESI-MS was performed on an Ultimate 3000 (Dionex, Sunnyvale, CA) coupled to a HCT Ultra ion trap (Bruker Daltonics, Germany) or a 1100 Cap-LC coupled to a XCT ion trap MS (Agilent, Santa Clara, CA). N- and O-glycans were identified and characterized based on retention time, MS and MS/MS as described previously.22,23

Glycopeptide Preparation. Five µg of sIgA was loaded onto 4-20% gradient gels (DKSH, Australia) and the component proteins in the sIgA complex were separated according to the manufacturer’s recommendations. To avoid unwanted modifications of cysteines with acrylamide in the gel, iodoacetamide (50 mM final concentration) was used to alkylate the reduced sample for 30 min in the dark before electrophoresis. Gels were stained using the zinc-imidazole stain.24 Visible bands were cut, washed and peptide and glycopeptide mixtures were prepared by trypsin digestion5,6 and separated and analyzed by RP-LC-ESI-MS and MS/MS as described in detail below. No explicit glycopeptide enrichment was performed and the tryptic digests were loaded as is for analysis.

Glycopeptide Analysis. LC-MS analysis of glycopeptides was performed on an HCT ion trap coupled to the Ultimate 3000 and on a Q-TOF Ultima (Waters, UK) as described previously,6 with an Atlantis NanoEase trap and Atlantis dC18 NanoEase analytical column (3 µm, 75 µm × 100 mm, Waters, UK). When glycopeptides were analyzed using the Dionex-HCT LC-MS system, glycopeptides and peptides were loaded directly onto a ProteCol C18 column (300 µm ID × 10 cm, SGE, Australia) equipped with a 0.5 µm PEEK filter (Upchurch, Oak Harbor, WA) in front of the column. The column was equilibrated in 100% solvent A (0.1% formic acid in water), the sample loaded and a gradient up to 50% (0.5%/min slope) solvent B (0.1% formic acid in acetonitrile) was applied before washing the column in 80% B for 10 min and reequilibration in starting conditions. An m/z range of 500-2500 was scanned. Two separate LC-MS runs were performed for every sample: one performing MS/MS analyses for subsequent protein identification and one where only MS scans were performed for recording the peptide/glycopeptide mass profiles. Biotools 3.1 with Mascot search (Bruker Daltonics) was used for protein identification (Homo sapiens, UniProt protein Database). Elution times of glycopeptides were manually identified when required based on MS/MS fragment spectra of individual LC peaks.

MS-Spectra Preparation for Submission to GlycoSpectrumScan. A crucial point for the success of any bioinformatic tool is the quality of the submitted data. Generally mass spectra are submitted to GlycoSpectrumScan using a simple m/z vs intensity table ensuring MS-instrument independent functionality. However, the great variety of mass spectrometers results in spectra of different quality with regard to resolution and mass accuracy.
Low resolution ion trap data will in most cases give limited monoisotopic mass information on multiply charged glycopeptides. In this case the best results are obtained when the respective spectra are strongly smoothed to produce single peaks of average mass and a slightly higher mass tolerance is used for submission to GlycoSpectrumScan. In contrast, high resolution instruments, for example, QTOFs, Orbitrap or FT-MS instruments allow narrower ranges with regard to mass tolerance and accuracy. Though the GlycoSpectrumScan algorithm takes different charge states into account, deconvolution and/or deisotoping of the summed MS spectra can help significantly in simplifying the mass spectrum and can help avoid “false positives”. This process of deconvolution/deisotoping is often routinely included in many protein identification tools to simplify the spectra and data inputs before submission to the respective database algorithms. If, however, the required software is not available the raw data should be centroided before submission to GlycoSpectrumScan. For the data presented on sIgA, MS spectra of glycosylated peptides covering the entire LC elution time of a particular glycopeptide and its glycoforms were summed and prepared for submission to GlycoSpectrumScan as follows: MS spectra from ion trap MS data were smoothed rather harshly (Savitzky Golay, 0.25 [Da] smoothing width, 3 cycles) in order to obtain a single mass peak per signal reflecting the average mass of the glycopeptide. These spectra containing multiply charged signals were then submitted to GlycoSpectrumScan for analysis. Glycosylated peptide mass spectra from Q-TOF data were


prepared with harsh smoothing giving a single average signal (Savitzky Golay, Smooth Window 4, Number of smooths 16) as well as smoothed softly (Savitzky Golay, Smooth Window 3, Number of smooths 2) due to the higher resolution on the Q-TOF. The latter were then deconvoluted and deisotoped with MaxEnt3 (Waters) before submission to the GlycoSpectrumScan software.
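Where no vendor software is at hand, the centroiding step recommended above can be approximated in a few lines. The following is a minimal sketch only, not the procedure used in this work: it assumes the spectrum is already available as parallel lists of m/z and intensity values, and the function names `centroid_peaks` and `_centroid` as well as the `threshold` parameter are hypothetical. Each contiguous run of points above the threshold is collapsed into one intensity-weighted m/z value.

```python
def _centroid(run):
    """Intensity-weighted mean m/z and summed intensity of one run of points."""
    total = sum(y for _, y in run)
    return (sum(x * y for x, y in run) / total, total)


def centroid_peaks(mz, intensity, threshold=0.0):
    """Collapse each contiguous run of points above `threshold` into one peak.

    Simplified illustration; real peak pickers also handle overlapping
    peaks, baseline drift and isotope patterns.
    """
    peaks, run = [], []
    for x, y in zip(mz, intensity):
        if y > threshold:
            run.append((x, y))
        elif run:
            peaks.append(_centroid(run))
            run = []
    if run:  # spectrum ended while still inside a peak
        peaks.append(_centroid(run))
    return peaks


# Synthetic profile data: one symmetric signal centred on m/z 100.2.
example_peaks = centroid_peaks(
    [100.0, 100.1, 100.2, 100.3, 100.4],
    [0.0, 1.0, 2.0, 1.0, 0.0],
)
print(example_peaks)
```

The resulting list of (m/z, intensity) pairs has the same shape as the m/z vs intensity table that the tool accepts as input.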

Results

Glycoprotein analysis primarily involves some kind of isolation and/or separation of the glycoprotein(s) of interest from the complex biological mixture. Reduction and alkylation followed by proteolytic digestion is generally applied to obtain glycopeptides, whereas PNGase F digestion and reductive β-elimination set the basis for N- and O-glycan analysis, respectively. Mass spectrometry has developed as the major detection method for subsequent analysis of both these biomolecules. GlycoSpectrumScan allows the scientist to combine the results from the above two MS experiments and assists in identifying glycopeptides within a mixture of peptide and glycopeptide masses. The GlycoSpectrumScan algorithm requires two experimental data sets as inputs (Figures 2 and 3). First, the glycan monosaccharide compositions derived from the masses obtained from the analysis of the N- and O-linked oligosaccharides released from the glycoprotein(s) are entered into the software. These composition files can be stored on the user’s computer and uploaded whenever needed or extended and edited. It is recommended to determine the glycan compositions from the experimental masses using tools such as GlycoMod.11 Second, the peptides that may be potentially glycosylated in the protein are submitted via any one of the following options: (i) peptide sequence, (ii) peptide mass (can include mass of any other modification present on the peptide) and (iii) protein sequence. In the latter case the protein sequence is subjected to theoretical protease digestion using the PeptideMass tool25 which has been integrated into GlycoSpectrumScan. PeptideMass cleaves the protein using the selected protease and computes proteolytic peptide masses. All potentially glycosylated peptides in the protein of interest, based on the presence of the N-X-S/T sequon for N-glycans or the presence of S/T for O-glycans, are used for computing all glycoform combinations of each peptide.
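The peptide selection step can be illustrated with a short sketch. This is not the PeptideMass implementation; it assumes the common trypsin rule (cleave after K or R unless followed by P) and the N-X-S/T sequon with X not proline, and the function names `n_glycosylation_sites` and `tryptic_peptides` are hypothetical.

```python
import re

# N-X-S/T with X != P; the lookahead allows overlapping sequons to be found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(sequence):
    """Return 0-based indices of Asn residues inside an N-X-S/T (X != P) sequon."""
    return [m.start() for m in SEQUON.finditer(sequence)]


def tryptic_peptides(sequence, missed_cleavages=0):
    """Cleave after K/R (assumed: not before P), allowing missed cleavages."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


# J-chain stretch 40-60 from Table 1.
segment = "IIVPLNNRENISDPTSPLRTR"
candidates = [p for p in tryptic_peptides(segment, missed_cleavages=1)
              if n_glycosylation_sites(p)]
print(candidates)
```

For this segment the candidates include ENISDPTSPLR and its missed-cleavage variants listed in Table 1. For O-glycosylation the same filter would simply test for the presence of S or T.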
The experimental MS data is entered, and all calculated peptide glycoform combinations are screened for in the mass spectra. Acquisition dependent parameters of ion mode, adducts and allowed mass tolerance are defined before uploading mass spectra of interest derived from a proteolytic digest. If the number of glycans released from a peptide is defined by “n” and the number of glycosylation sites on the peptide is defined as “k”, the following equation is used to determine the possible combinations (nCk) by which the available glycans can be present on the available glycosylation sites of the glycopeptide, each combination representing a theoretical glycoform entity.

$$ {}^{n}C_{k} = \frac{(n + k - 1)!}{k!\,(n - 1)!} $$
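As a worked example of this count, three released glycan compositions (n = 3) on a peptide with two sites (k = 2) give 4!/(2!·2!) = 6 theoretical glycoforms. The sketch below checks the formula against an explicit enumeration; it assumes that glycoforms are treated as unordered selections with repetition (which is what the formula counts, since swapping glycans between sites does not change the mass), and the glycan labels and function names are illustrative only.

```python
from itertools import combinations_with_replacement
from math import factorial


def glycoform_count(n, k):
    """Number of unordered selections of k glycans from n, repetition allowed."""
    return factorial(n + k - 1) // (factorial(k) * factorial(n - 1))


def glycoform_combinations(glycans, k):
    """Enumerate every theoretical glycoform for a peptide with k sites."""
    return list(combinations_with_replacement(glycans, k))


# Hypothetical glycan labels, used only to illustrate the enumeration.
glycans = ["Hex5HexNAc4", "Hex5HexNAc4Fuc1", "Hex5HexNAc4NeuAc1"]
combos = glycoform_combinations(glycans, 2)
print(glycoform_count(len(glycans), 2), len(combos))
```

The count grows quickly with the number of sites, which is why restricting n to the glycans actually released from the protein keeps the search space manageable.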

The glycoform masses corresponding to the above combinations are then computed over a variety of selected charge states (z) using the released oligosaccharide masses data sets, theoretically derived peptide masses and the user selected parameters of ion mode/adducts. The glycoform masses thus generated (m/z) constitute a theoretical subset which should contain all glycopeptide m/z values included in the experimentally detected masses in the mass spectrum. The mass spectrum is scanned with these putative mass values after applying user defined filters such as mass tolerance, charge states and intensity cutoff. The theoretical glycoform masses which match the actual mass value peaks in the mass spectrum are highlighted as real glycopeptide peaks with a color coded display for the same glycoforms detected at different charge states (Figure 4). From these highlighted glycoforms the relative abundance of each glycoform of a peptide is calculated.

Figure 2. (Top) Glycoproteomics work-flow incorporating GlycoSpectrumScan tool. (Bottom) Process of sample preparation and identification of glycans and peptides from the glycoprotein is shown in “Experimental data”. The tool GlycoSpectrumScan then uses the inputs: identified glycans and peptide details for computing all possible theoretical glycoform combinations which is followed by mapping the REAL glycoforms on the mass spectrum.

Salient Features of GlycoSpectrumScan.

1. Multiple Charge States. The GlycoSpectrumScan tool can accept a choice of single or multiply charged ion charge states as input to be used to compute peptide glycoform masses. Filters are also available in the output to restrict the highlighted glycoform peaks to user desired charge states.

2. Multiple Peptide Scanning. GlycoSpectrumScan allows the user to scan the mass spectrum with a range of peptide compositions, in parallel, using the common set of glycan compositions. The input can be individual peptides uploaded by the user or peptides generated from theoretical digest of the protein under study, using the PeptideMass tool.

3. Overcoming False Negatives. The GlycoSpectrumScan algorithm generates all possible theoretical peptide glycoform combinations for a given set of released glycans and possible peptide glycosylation sites, which are then overlaid on the actual MS data set. This approach nullifies any possibility of false negatives (missed peptide glycoform masses) allowing the scientist to detect all glycopeptides, even those represented by low intensity m/z peaks in the MS, which are often missed during manual identification.

4. Fine-Tuning for Removal of Possible False Positives. There is the possibility of glycopeptide masses being computed theoretically which match m/z peak(s) in the MS data set which are not actually glycopeptide compositions. A range of selection parameters are provided at the output level for the scientist to assess/edit the final list as being biologically possible glycopeptide compositions.

5. Glycoform Identification Platform. GlycoSpectrumScan provides a complete platform for methodical and biologically relevant assignment of heterogeneous peptide glycoform masses detected by a mass spectrometer. This includes an easily downloadable final peptide glycoform mass output with corresponding spectral intensity. GlycoSpectrumScan thus facilitates the discovery of differences in the glycans attached to specific peptides.

6. Improves Time Efficiency in Discovery Rate. Taking into consideration the highly heterogeneous nature of the glycans, the GlycoSpectrumScan tool is extremely efficient in identifying ALL possible glycopeptides in a MS spectrum. With its flexible and user-friendly interface, the tool substantially reduces data analysis time. The overall glycopeptide identification and analysis time is considered to be ≤5% of manual interpretation time.

7. MS-Platform Independent. Submission of m/z vs intensity tables assures that data can be submitted independent of the MS-instrument used. In the course of this work, LC-ESI-MS data was used, however the software is not limited to this input and MALDI-TOF MS data can equally be used.

8. Peptides with Both N- and O-Glycosylation and Other Modifications. The possibility of input of a user-defined peptide mass with a number of possible glycosylation sites allows the assignment of glycopeptides with both N- and O-linked glycosylation, as well as other mass modifications.

Application of GlycoSpectrumScan. Glycan characterization and protein identification of human colostrum sIgA components. sIgA was separated into its protein components by SDS-PAGE, stained and blotted to PVDF (Figure 1). Peptides and glycopeptides were obtained by in-gel tryptic digestion and were separated by LC-ESI-MS/MS,5 and released N- and O-glycans were isolated from the PVDF blots and analyzed by carbon LC-ESI-MS/MS.22,23 The component proteins were identified as polymeric immunoglobulin receptor aka secretory component (∼80 kDa), IgA1 and IgA2 heavy chain (∼60 kDa), J-Chain (∼26 kDa) and λ and κ chains (∼26 kDa, Supplementary Table 1, Supporting Information, Figure 1). The N- and (if present) O-glycans from each band were characterized in terms of monosaccharide composition (Table 2 and Supplementary Table 2, Supporting


Figure 3. Web user interface of GlycoSpectrumScan. Information on glycan composition (orange) and protein ID (yellow) are required to screen potential glycopeptide MS spectra (green). Deciding the type of glycosylation (red) determines if potential N-glycosylation (N-X-S/T [X≠P]) sequon or potential O-glycosylation sites (S/T) are considered if peptide/protein sequences are used in the protein input.

Figure 4. Example of a (A) positive and (B) negative hit in GlycoSpectrumScan. The summed mass spectrum of potential glycopeptide signals eluting at retention time ∼33 min is subjected to GlycoSpectrumScan analysis. (A) Using peptide 498-515 (C = CysCAM) the signals are assigned to correspond to this peptide as several different glycoforms. (B) When peptide 494-515 is entered as the peptide mass essentially just random signals below a set threshold are labeled and no glycopeptides are identified.
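The screening step illustrated in Figure 4 can be sketched as follows. This is an illustration under stated assumptions, not the tool's source code: positive ion mode with protonation only, approximate average residue masses for the monosaccharides, and a synthetic peak list. All function names, the residue mass table and the peak values are assumptions introduced here; only the peptide mass (1228.3, peptide 48-58 of the J-chain) is taken from Table 1.

```python
PROTON = 1.00728  # mass of a proton in Da

# Approximate average residue masses in Da (assumed values for illustration).
RESIDUE_MASS = {"Hex": 162.14, "HexNAc": 203.19, "Fuc": 146.14, "NeuAc": 291.26}


def glycan_mass(composition):
    """Mass added to the peptide by a glycan given as {monosaccharide: count}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def theoretical_mz(peptide_mass, composition, charge):
    """m/z of the protonated glycopeptide at the given positive charge state."""
    neutral = peptide_mass + glycan_mass(composition)
    return (neutral + charge * PROTON) / charge


def scan_spectrum(peaks, peptide_mass, compositions,
                  charges=(2, 3), tol=0.5, min_intensity=0.0):
    """Return (glycoform, charge, observed m/z, intensity) for every match."""
    hits = []
    for label, comp in compositions.items():
        for z in charges:
            target = theoretical_mz(peptide_mass, comp, z)
            for mz, inten in peaks:
                if inten >= min_intensity and abs(mz - target) <= tol:
                    hits.append((label, z, mz, inten))
    return hits


def relative_abundance(hits):
    """Percent share of each glycoform, summing intensity over charge states."""
    totals = {}
    for label, _z, _mz, inten in hits:
        totals[label] = totals.get(label, 0.0) + inten
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {label: 100.0 * value / grand for label, value in totals.items()}


compositions = {
    "Hex5HexNAc4": {"Hex": 5, "HexNAc": 4},
    "Hex5HexNAc4NeuAc1": {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
}
# Synthetic (m/z, intensity) peaks, invented for this example.
peaks = [(951.6, 300.0), (1048.7, 200.0), (1200.0, 50.0), (1426.9, 100.0)]
hits = scan_spectrum(peaks, 1228.3, compositions)
abundance = relative_abundance(hits)
print(hits)
print(abundance)
```

Raising `min_intensity` plays the role of the threshold mentioned in the caption: matches that fall into baseline noise are discarded before relative abundances are computed.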


Table 2. N-Glycan Compositions and Their Relative Abundance Identified from the Three Protein Bands Obtained after SDS-PAGE Separation and Blotting to PVDF Membrane^a

[M - H]-   Hex   HexNAc
1235.5     5     2
1397.4     6     2
1463.6     3     4
1479.6     4     4
1520.6     3     5
1559.6     7     2
1567.4     4     3
1584.6     5     3
1625.6     4     3
1641.6     5     4
1666.8     3     5
1682.6     4     5
1713.8     4     3
1721.6     8     2
1729.6     5     3
1746.6     6     3
1770.6     4     4
1787.6     5     4
1828.8     4     5
1844.8     5     5
1875.8     5     3
1883.8     9     2
1891.8     6     3
1892.8     6     3
1916.8     4     4
1932.8     5     4
1933.6     5     4
1973.8     4     5
1990.8     5     5
2078.8     5     4
2079.8     5     4
2119.8     4     5
2136.0     5     5
2223.8     5     4
2224.8     5     4
2282.0     5     5
2369.8     5     4
2573.0     5     5

Remaining columns (Fuc, NeuAc, and rel. abundance [%] for secretory component, J-Chain and sIgA), values in source order, row assignment not recoverable:
Fuc / NeuAc / secretory component: 1 1.3 0.3 1.8; 1 1; 3.8; 3.0 1; 1 1; 19.4; 1 1; 2; 2 1 1 1; 1 1 1 2 1 1 2 2; 9.8 4.6 13.2; 0.8 0.8 3.1 34.6; 1 1 1 3 1; 1.9 0.4 13.3 0.9; 1 1; 2 1; 6.7; 1.1 0.6; 1; 1; 1.4 23.6 9.3; 1; 1
J-Chain: 1.1 1.0 0.1 0.8 25.6 1.1 0.2; 1; 1 1
sIgA: 12.1 17.5; 5.3; 0.8 0.6 0.4 4.2 2.6 1.1 1.0 0.8 2.8 0.8 0.4 0.3 1.3 0.3 0.3; 27.9; 21.0; 7.2; 5.9

a The released glycans were analysed using porous graphitised carbon LC-ESI-MS/MS. IgA1 and IgA2 were not separated in the electrophoresis and therefore the glycans correspond to the glycan pool present on the combined entity. The major glycans are shown in bold and the corresponding structures are depicted in Supplementary Figure 1, Supporting Information.

Information) and structure (Supplementary Figure 1, Supporting Information) and the results were in agreement with previous analyses of the N- and O-glycans attached to these proteins.18 In short, the secretory component contained highly fucosylated N-linked glycan structures (up to three fucose residues on diantennary structures) whereas the IgA chains had high amounts of bisected, biantennary N-glycans with mainly single, core fucosylation. The J-chain glycosylation profile revealed biantennary, partially core fucosylated N-glycans with up to 2 sialic acid residues. The compositional data sets of the identified proteins and their overall global glycoprofile (Tables 1 and 2 and Supplementary Table 2, Supporting Information) are the necessary and sufficient basis of the subsequent glycopeptide analyses facilitated by the GlycoSpectrumScan software.

Glycopeptides: Finding the Needle in the Haystack. A prerequisite for analyzing glycopeptides in a protease digest is

identifying them in a mass spectrum of the usually more intense, unmodified peptides. To identify the elution position of the glycopeptides in a LC-ESI-MS base peak chromatogram, MS/MS spectra can be used to locate those glycopeptide masses which fragment to give the typical saccharide masses. Positive ion CID MS/MS fragmentation of glycopeptides generally results in specific fragments that differ by m/z values corresponding to the component monosaccharide masses (e.g., Hexose = 162 Da as a singly charged signal, 81 Da as doubly charged, 54 Da as triply charged and so on) as well as singly charged sugar fragments (“B”-type oxonium ions that correspond to fragmented mono-, di-, tri- and tetrasaccharides [e.g., 204.1, 366.1, 657.2, 803.3 Da]). This feature makes glycopeptides easy to spot in CID MS/MS spectra and facilitates identification of their elution position in the corresponding chromatographic separation, provided the masses have been selected for MS/MS. The above-mentioned software tools such as GlycoMiner use this approach or the researcher can identify glycopeptides manually using these criteria. However, often glycopeptide masses are not selected for MS/MS due to their (usually) lower intensity. Different glycan structures on the same peptide backbone divide the possible glycopeptide signal into several molecular species of different mass. The presence of glycopeptides can also be indicated in a total mass spectrum as a ladder of masses with steps based on specific monosaccharide differences (e.g., m/z 146 [Fucose], 162 [Hexose], 203 [N-acetylhexosamine], 291 [N-acetyl neuraminic acid]) between the different glycoforms. CID-MS/MS fragments, if MS/MS is triggered, can thus locate the glycopeptide parent masses but do not easily give information on the heterogeneity of glycosylation, the range of peptide glycoforms or the relative glycan occupation of the same peptide.
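The mass ladder described above lends itself to a simple check: at charge state z, two glycoforms that differ by one monosaccharide are separated by the monosaccharide mass divided by z. The following minimal sketch flags such steps in a peak list. It uses the nominal mass differences quoted in the text; the function name `ladder_steps`, the tolerance and the example m/z values are hypothetical.

```python
# Nominal monosaccharide mass differences (Da) as quoted in the text.
MONOSACCHARIDE_DELTA = {"Fuc": 146.0, "Hex": 162.0, "HexNAc": 203.0, "NeuAc": 291.0}


def ladder_steps(mz_values, charge, tol=0.3):
    """Find peak pairs whose spacing equals one monosaccharide at `charge`."""
    steps = []
    ordered = sorted(mz_values)
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            for name, delta in MONOSACCHARIDE_DELTA.items():
                if abs((high - low) - delta / charge) <= tol:
                    steps.append((low, high, name))
    return steps


# Synthetic doubly charged signals: +Hex (81.0) then +HexNAc (101.5).
steps = ladder_steps([1000.0, 1081.0, 1182.5], charge=2)
print(steps)
```

A run of such steps within one elution window is a hint that the signals belong to glycoforms of the same peptide, which is the situation in which summing MS scans over the peak becomes worthwhile.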
GlycoSpectrumScan functionality is based on the MS data from either LC-ESI or MALDI of LC-separated or enriched glycopeptides. The mass spectra are acquired over the elution time of an identified or potential glycopeptide and are summed, smoothed and centroided before the m/z vs intensity spectrum is submitted to the software for analysis. To increase the signal-to-noise ratio for low intensity glycoforms, it is recommended to sum the MS spectra only over the respective elution peak rather than over a wider chromatographic time frame.

Figure 5. N-Glycan distribution on Asn49 of sIgA Joining Chain. Green bars represent the glycan distribution as found on peptide 48-58, where the abundant glycoforms do not contain any core α1,6 fucose (as determined separately by carbon LC-ESI-MS/MS, see Supporting Information). Red bars: glycan distribution on peptide 40-58 with one missed cleavage on Arg47. The majority of N-glycans attached to this peptide contain core α1,6 fucose, indicating an influence of core fucosylation on tryptic digestion of this protein.

N-Glycosylated Peptides in Secretory Component of Human Colostrum sIgA. The secretory component of sIgA separates as an 80 kDa band on SDS-PAGE (Figure 1) and has 7 reported N-glycosylation sites (Asn83, Asn90, Asn135, Asn186, Asn421, Asn469 and Asn499). The in silico trypsin digest of the secretory component protein, including missed cleavages and modifications, produced the potential glycopeptide sequences containing these sites (Table 1). MS/MS scanning of the LC-MS data for sugar fragment ions located several LC peaks containing potential glycopeptides. MS scans over these respective elution times were summed, and m/z and peak intensity values were submitted to GlycoSpectrumScan. The algorithm takes each glycan compositional mass that was characterized to be present on the protein (Table 2) and combines it with every individual submitted peptide mass (Table 1). The resultant glycopeptide masses found in the spectrum are labeled in different colors (corresponding to the charge states, Figures 4, 5 and 6 and Supporting Information). A threshold, depending on the signal-to-noise intensity of a particular spectrum, can be manually entered to eliminate false positive hits in the base level noise. Families of glycopeptides, differing by single monosaccharide masses, are often found and increase the confidence of assigning glycopeptide signals. Using GlycoSpectrumScan, the potential glycopeptide masses were quickly assigned, including the relative glycan distribution on each site (Figure 4, Table 3 and Supporting Information). Out of seven potential N-glycosylation sites on the secretory component, four singly glycosylated peptides (Asn 186, 421, 469, 499) and one doubly glycosylated peptide (Asn 83 + 90) were found. Previous studies have shown the remaining site, Asn135, as being glycosylated;26,27 however, in the course of this study no glycopeptides containing this site, nor any mass corresponding to the unglycosylated form of peptide 118-138, could be detected. In general, the major glycan structures determined by the global glycoprofiling of the intact secretory component protein (Table 2) were found on each of the identified glycopeptide sites. However, some sites showed different relative glycan abundances for particular structures (Table 3). Asn421 contained a significantly higher amount (38%) of sialylated structures than the other sites. Asn499 showed a considerable percentage (27%) of hybrid type structures and mainly singly fucosylated structures (Figure 4), whereas multiply fucosylated glycans were identified on most of the other sites (Table 3). As GlycoSpectrumScan has been developed to handle multiply glycosylated peptides, it was possible to obtain an overall monosaccharide composition profile for the doubly N-glycosylated peptide containing both Asn83 and Asn90 (Table 3). The tryptic in silico digest results in two glycosylated peptides of 82-107 and 82-109, respectively (the latter with 1 missed cleavage at Arg107; 82ANLTNFPENGTFVVNIAQLSQDDSGRYK109).
Usually missed cleavages result in different peptide masses; however, in this particular case, the mass of the additional two amino acids YK (ΔM = 291.1583 Da) very closely resembles the mass difference introduced by the addition of one NeuAc (ΔM = 291.0954 Da). Discrimination of this difference on a complex molecule is beyond the accuracy of most mass spectrometers. Therefore we were only able to assign a combined site monosaccharide composition to this doubly glycosylated peptide. Use of different proteases may allow separate analysis of these two sites.19

N-Glycosylated Peptides of Human Colostrum IgA1 and IgA2. The heavy chain band at ∼60 kDa comprised both IgA1 (P01876) and IgA2 (P01877). The extracted ion chromatogram areas of two very similar tryptic peptides from IgA1 and IgA2 (IgA1: DASGVTFTWTPSSGK, [MH]+ = 1540.73 Da; IgA2: DASGATFTWTPSSGK, [MH]+ = 1512.70 Da) indicated that the IgA1:IgA2 ratio in the sample was approximately 4:1. IgA1 and IgA2 differ significantly in the number and types of glycosylation present.18,20 IgA1 contains two N-linked sites as well as a heavily O-glycosylated and well studied hinge region, whereas in IgA2 the hinge region is drastically reduced and not O-glycosylated, but the molecule instead has 4 N-linked sites (Table 1). Both share one N-linked glycopeptide (LSLHRPALEDLLLGSEANLT(CysCAM)TLTGLR, [MH]+ = 2963.60 Da) and therefore an exact assignment to one or the other IgA molecule without prior separation is not possible for this site. It should be noted that Picariello et al. (2008) identified a possible fifth N-glycosylation site in IgA2 on Asn92 in the 92NPSQ95 sequon.27 This motif is unlikely to be glycosylated because of the proline in the N-X-S sequon, and in our study we only detected the unglycosylated form of this peptide (Table 4). The Asn92 glycosylation site was suggested27 because of the 1 Da mass increase of the deglycosylated peptide containing this site after PNGase F digestion, and needs to be confirmed. This mass change is produced by conversion of Asn to Asp during PNGase F deglycosylation but also by deamidation.
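The YK versus NeuAc ambiguity noted above for peptides 82-107 and 82-109 can be put into numbers with a short worked calculation. The two ΔM values are taken from the text; the 7000 Da glycopeptide mass is a hypothetical round figure chosen only for illustration, and the function name is an assumption.

```python
YK_DELTA = 291.1583     # Da, additional Tyr-Lys residues (value from the text)
NEUAC_DELTA = 291.0954  # Da, one additional NeuAc residue (value from the text)


def ppm_needed(glycopeptide_mass):
    """Mass accuracy (ppm) at which the two candidate assignments differ."""
    return abs(YK_DELTA - NEUAC_DELTA) / glycopeptide_mass * 1e6


# Hypothetical doubly glycosylated peptide of about 7000 Da.
print(f"difference: {YK_DELTA - NEUAC_DELTA:.4f} Da")
print(f"required accuracy at 7000 Da: {ppm_needed(7000.0):.1f} ppm")
```

The difference of about 0.063 Da corresponds to roughly 9 ppm at this assumed mass, before any overlap of isotope envelopes from coeluting glycoforms is considered.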
Figure 6. Hinge region glycopeptide of IgA1 from human colostrum. Top spectrum: summed and smoothed mass spectra over the elution time of peptide 89-126 containing up to 5 O-glycosylation sites with 10 possible glycan compositions obtained from LC-ESI-MS analysis. Bottom spectrum: Glycopeptides identified by GlycoSpectrumScan after submission of this mass spectrum. All but 4 signals (yellow labeled in top spectrum) out of 47 assigned by manual annotation were correctly identified by the program.

By entering into GlycoSpectrumScan the N-glycan compositional analysis of IgA1 + IgA2 (Table 2) and the peptides containing the NXS/T (X≠P) motif (Table 1), we were able to quickly identify and characterize the glycan heterogeneity on unique N-glycosylated sites of IgA1 and of IgA2, as well as the glycopeptide common to both (Table 4). The unique N-glycopeptides of IgA2 were not detected, possibly because of the lower overall abundance (∼20%) of IgA2 in the sample. There was a clear difference in terms of structures found on Asn144 (or Asn131 in IgA2) compared to the structures found on Asn205 of IgA2 and Asn340 of IgA1 (Table 4). About three-quarters of the Asn144 peptide contained a bisecting biantennary agalacto structure with no fucosylation, whereas the structures identified on the other two detected sites of IgA1 and IgA2, respectively, contained mainly bisected, fucosylated N-glycans and considerable amounts of oligomannosidic N-glycans in the case of IgA1 Asn340 (Table 4 and Supporting Information). Table 4 clearly shows that the overall glycoprofile of a protein is not equally distributed on the various glycosylation sites.
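The sequon criterion used to select candidate N-glycopeptides (N-X-S/T with X≠P) can be sketched as a simple pattern search. This is an illustrative helper, not part of GlycoSpectrumScan; the function name and the `start` parameter (residue number of the first amino acid of the peptide) are assumptions.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr. The lookahead
# keeps the match to the Asn itself so that overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(peptide, start=1):
    """Return residue numbers of Asn in N-X-S/T (X != P) sequons.

    start is the residue number of the first amino acid of the peptide.
    """
    return [match.start() + start for match in SEQUON.finditer(peptide)]


# Secretory component peptide 82-109 quoted in the text.
print(sequon_positions("ANLTNFPENGTFVVNIAQLSQDDSGRYK", start=82))
# IgA2 92NPSQ95: proline in the X position, so no sequon is reported.
print(sequon_positions("NPSQ", start=92))
```

Applied to peptide 82-109 the search returns Asn83 and Asn90, the two sites reported for this peptide, while the NPSQ motif of IgA2 is rejected.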

N-Glycosylated Peptides in J-Chain. The J-chain of sIgA contains a single N-glycosylation site (Asn49). Nevertheless, two glycopeptides were detected in the sample (Table 1). With GlycoSpectrumScan we identified that the peptide 48-58 and the missed cleavage peptide 40-58 were both glycosylated, with the latter having a higher signal intensity. Glutamic acid in position 48 may contribute to this missed cleavage since trypsin is known to be less efficient with an acidic amino acid in position P1′.28 Surprisingly, the two glycopeptides exhibited different glycoprofiles on Asn49. The tryptic peptide 48-58 was glycosylated with compositions corresponding to mainly biantennary, nonfucosylated structures with one and two NeuAc residues. The peptide with one missed cleavage, 40-58, however, was found with core fucosylated, biantennary structures attached as the major components (Figure 5). The glycosylation


Table 3. N-Linked Site Occupancy of Glycopeptides of sIgA Secretory Component As Determined by GlycoSpectrumScan Using the Experimentally Determined Oligosaccharide Compositions Shown^a

Asn83+Asn90

Hex

HexNAc

Fuc

4 5 5 6 5 6 5 5 5 9 10 10 10 10 10 10 10 10 10

3 3 4 3 4 3 4 4 4 7 8 8 7 8 8 8 8 8 8

1 1

NeuAc

82-109

Asn135

Asn186

Asn421

168-190

413-434

Not detected

1 1 2 2 3 2 3 1 2 4 3 4 3 4 5 4

457-479

498-515

Global N-glycan analysis

11.0

6.1 7.7 3.6 18.9 23.2 7.8 25.8 5.1 1.8

0.3 1.3 1.8 3.0 19.4 0.8 34.6 17.5 5.3

Asn469 466-479

Asn499

3.2 13.0

3.3 55.1 35.4 6.1

1

24.2 37.7 38.2

43.7 36.6 3.4

46.9 42.1

2.6 5.2 15.6 4.2 21.8 21.3 15.9 8.1 3.3 2.0

1 1 1 2

a Numbers correspond to relative amounts [%] of each peptide glycoform detected by GSS. Identified signals of all glycopeptide charge states were used in the calculation and relative ratios of each peptide glycoform were obtained as described in the text.

Table 4. N-Linked Site Occupancy of Glycopeptides of sIgA, IgA1 and IgA2 As Determined by GlycoSpectrumScan Using the Experimentally Determined Oligosaccharide Compositions Shown^a

Hex

HexNAc

Fuc

NeuAc

IgA1

IgA2

IgA1

IgA2

Asn144

Asn131

Asn340

Asn205

127-153

5

2

6 3 4 3 7 3 4 4 5 5 4 5

2 4 4 5 2 5 5 5 5 5 5 5

114-140

332-353

4.1

9.4

3.3

22.7

200-208

1

7.4 2.5 77.2 4.1

1 8.8 1 1 1 1

IgA2

14.5 13.4 31.3 4.4 4.3

1 1

54.7 22.2 4.7 3.4 7.7

IgA2

IgA2

b

Asn47

Asn92

Asn327

Not detected

This site has just been detected unglycosylatedb

Not detected

Global N-glycan analysis

1.0

1.0 0.1 0.8 25.6 1.1 23.6 9.3 13.3 0.9 1.1 2.8 1.3

a Numbers correspond to relative amounts [%] of each peptide glycoform detected by GlycoSpectrumScan. Identified signals of all glycopeptide charge states were used in the calculation and relative ratios of each peptide glycoform were obtained as described in the text.
b IgA2 Asn92 (92NPSQ95) does not contain the N-X-S/T (X≠P) N-glycosylation sequon motif.

site Asn49 is located in position P2′ from the tryptic cleavage site and our data indicate that core fucose on the N-linked oligosaccharides at this site may have inhibited the trypsin cleavage, resulting in two peptides with distinct glycoprofiles at the same glycosylation site (Figure 5). The combined data on the glycans attached to both glycopeptides are in good agreement with the overall glycan analysis obtained for the J-chain.

IgA1 Hinge Region O-Glycopeptide Heterogeneity. The IgA1 hinge region O-glycopeptide of sIgA presents an extreme example to test GlycoSpectrumScan for its ability to assign the heterogeneity of a multiply glycosylated O-linked glycopeptide. Five O-glycosylation sites are present on the one tryptic peptide (Ser105, Ser111, Ser113, Ser119 and Ser121)20 from His89 to Arg126 (Table 1). Each site may have no glycosylation or any one of the 10 O-linked glycans characterized (Supplementary Table 2, Supporting Information). Both core 1 and 2 type structures were identified with different degrees of extension, fucosylation and sialylation. The O-glycopeptide eluted between 17 and 19 min on the LC-ESI Q-TOF base peak chromatograms with ≥3+ charged signals in the range m/z ≥1100. The software MaxENT3 had trouble deisotoping and deconvoluting this complex spectrum successfully, as evidenced by several “ghost peaks” found in the output spectrum. Instead, the raw data were smoothed to obtain single peaks mirroring the average masses and centroided, and the spectra containing the multiply charged ions were then submitted to GlycoSpectrumScan. However, it needs to be mentioned that

Table 5. Total Oligosaccharide Compositions Identified on the Heavily O-Glycosylated Hinge Region Glycopeptide from IgA1 Using GlycoSpectrumScan

Mcalc    Hex  HexNAc  Fuc  NeuAc  rel. abundance [%]
5071.4   2    3       -    -      1.1
5274.6   2    4       -    -      1.2
5436.8   3    4       -    -      4.5
5524.8   3    3       -    1      2.4
5598.9   4    4       -    -      6.7
5640.0   3    5       -    -      3.7
5728.0   3    4       -    1      3.8
5745.1   4    4       1    -      0.6
5802.1   4    5       -    -      6.7
5816.1   3    3       -    2      3.1
5843.2   3    6       -    -      1.1
5890.2   4    4       -    1      6.6
5964.3   5    5       -    -      3.7
6005.3   4    6       -    -      2.1
6019.3   3    4       -    2      2.6
6093.4   4    5       -    1      6.9
6108.4   3    3       2    2      2.3
6167.5   5    6       -    -      1.4
6181.4   4    4       -    2      5.9
6198.5   5    4       1    1      0.8
6222.5   3    5       -    2      1.6
6255.5   5    5       -    1      4.0
6296.6   4    6       -    1      1.6
6384.6   4    5       -    2      5.7
6401.7   5    5       1    1      0.9
6458.7   5    6       -    1      1.1
6472.7   4    4       -    3      5.3
6546.8   5    5       -    2      2.6
6587.8   4    6       -    2      1.0
6675.9   4    5       -    3      3.1
6750.0   5    6       -    2      0.6
6764.0   4    4       -    4      2.2
6838.0   5    5       -    3      1.5
6967.1   4    5       -    4      1.2
7129.3   5    5       -    4      0.7

mass differences of 1 Da (e.g., difference of one NeuAc vs two Fuc) are difficult to assign at this level of complexity if both NeuAc and Fuc are present and distributed over several sites on a single peptide. Submission of the 10 possible O-glycan compositions and the 5 possible glycosylation sites on this single peptide to GlycoSpectrumScan resulted in an output reflecting the number of Hex, HexNAc, NeuAc and Fuc residues present on the one glycopeptide (His89-Arg126, Table 5, Figure 6). With this complexity, site-specific assignments are not possible but a broad overview of the compositions present on the peptide is obtained (Figure 6 and Table 5). GlycoSpectrumScan identified 43 out of the 47 signals assigned as glycopeptides by detailed manual annotation. The four unassigned signals might be due to the possible presence of a single N-acetylgalactosamine residue on some of the sites. This single monosaccharide is not detected by porous graphitized carbon LC-ESI-MS/MS profiling of the O-glycans and was therefore not included in the glycan composition list submitted to GlycoSpectrumScan. Inclusion of this single HexNAc results in assignment of these peaks. A HUPO Glycomics Initiative multiple laboratory comparison study on the hinge region O-glycopeptide of plasma IgA1 showed that the assignment of the oligosaccharide compositions attached to this glycopeptide is able to monitor the changes on the glycopeptide in different disease states (Wada et al., in press). GlycoSpectrumScan is a tool facilitating the data interpretation of these complex glycopeptides.
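How a list of single-site glycans expands into total compositions on a multiply glycosylated peptide can be sketched as follows. This is a minimal illustration under stated assumptions, not the GlycoSpectrumScan implementation: the function names are hypothetical, the two example glycans are placeholders rather than the 10 compositions of Supplementary Table 2, and average residue masses are used.

```python
from itertools import combinations_with_replacement

# Average residue masses (Da) in the order (Hex, HexNAc, Fuc, NeuAc).
RESIDUE_MASS = (162.14, 203.19, 146.14, 291.25)


def summed_compositions(glycans, n_sites):
    """Unique total compositions when each site is empty or carries one glycan."""
    empty = (0, 0, 0, 0)
    totals = set()
    for combo in combinations_with_replacement([empty] + list(glycans), n_sites):
        totals.add(tuple(sum(column) for column in zip(*combo)))
    totals.discard(empty)  # the unglycosylated peptide is not a glycoform
    return sorted(totals)


def glycan_mass(composition):
    """Summed residue mass (Da) of a (Hex, HexNAc, Fuc, NeuAc) composition."""
    return sum(n * mass for n, mass in zip(composition, RESIDUE_MASS))


# Two placeholder O-glycans (Hex1HexNAc1 and Hex1HexNAc1NeuAc1) on two sites.
for composition in summed_compositions([(1, 1, 0, 0), (1, 1, 0, 1)], n_sites=2):
    print(composition, round(glycan_mass(composition), 2))

# One NeuAc versus two Fuc: the roughly 1 Da difference mentioned in the text.
print(round(glycan_mass((0, 0, 2, 0)) - glycan_mass((0, 0, 0, 1)), 2))
```

Because only summed compositions are compared with the spectrum, different site arrangements that give the same total are indistinguishable at the MS level, which is consistent with the statement that site-specific assignments are not possible for this peptide.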

Discussion

Analysis of glycopeptides is still far from being easily incorporated into automated proteomic workflows. Glycopeptides are usually detected in higher m/z ranges and occur as a heterogeneous mixture of signals, which results in lower signal intensities and often precludes their selection for automated tandem MS experiments. As a consequence, these crucial components of glycoproteins are frequently overlooked in proteomic studies. Two methods are typically used for their detection in a tryptic digest: (1) MS/MS spectra give diagnostic fragment ions represented by typical monosaccharide building block masses and (2) in MS spectra glycopeptides often appear as a ladder of peaks differing by the specific monosaccharide masses. These techniques are often dependent on manual identification and expert analysis, and are rarely able to characterize the range of glycan structures present at one or more sites on a peptide. Glycosylation occupancy on large multiply glycosylated peptides is even more difficult to analyze because of the size, charge state and heterogeneity of the parent ions.

With GlycoSpectrumScan we present an approach for identifying and characterizing glycopeptides from MS spectra of protease digested proteins. This freely accessible web-based tool aims to support scientists in the analysis of the overall site heterogeneity of glycopeptides, and its workflow can be easily incorporated into a proteomics laboratory. GlycoSpectrumScan requires three major input parameters: (1) information on the exact glycan compositions found on the protein of interest (preferentially acquired on the same sample by the glycoanalytical method of choice, or alternatively obtained from previous publications), (2) the potential glycopeptide sequences or masses (predicted in silico after initial protein identification) and (3) a mass spectrum of a protease digest obtained from any mass spectrometer.
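How the three inputs might be combined, and how relative glycoform abundances could then be derived by summing all charge states, is sketched below. This is an illustrative sketch only: the matching logic, the function names, the m/z tolerance, the intensity threshold and all numerical values are assumptions and are not taken from the GlycoSpectrumScan source.

```python
PROTON = 1.00728  # Da


def candidate_mz(peptide_mass, glycan_mass, charge):
    """m/z of [M + zH]z+ for a neutral peptide plus glycan residue mass."""
    return (peptide_mass + glycan_mass + charge * PROTON) / charge


def match_glycopeptides(peaks, peptides, glycans, charges=(1, 2, 3, 4),
                        tol=0.5, min_intensity=0.0):
    """Match (m/z, intensity) peaks against every peptide/glycan/charge combination.

    peptides and glycans map a name to a mass; tol is in m/z units and
    min_intensity plays the role of a user-entered noise threshold.
    """
    hits = []
    for pep, pep_mass in peptides.items():
        for gly, gly_mass in glycans.items():
            for z in charges:
                target = candidate_mz(pep_mass, gly_mass, z)
                for mz, intensity in peaks:
                    if intensity >= min_intensity and abs(mz - target) <= tol:
                        hits.append((pep, gly, z, mz, intensity))
    return hits


def relative_abundance(hits):
    """Percent of each glycoform per peptide, summed over all charge states."""
    totals = {}
    for pep, gly, _z, _mz, intensity in hits:
        totals[(pep, gly)] = totals.get((pep, gly), 0.0) + intensity
    per_peptide = {}
    for (pep, _gly), value in totals.items():
        per_peptide[pep] = per_peptide.get(pep, 0.0) + value
    return {key: 100.0 * value / per_peptide[key[0]] for key, value in totals.items()}


# Hypothetical inputs: one peptide, two glycans, four centroided peaks.
peptides = {"P": 2000.0}
glycans = {"G1": 1000.0, "G2": 1200.0}
peaks = [(1501.01, 300.0), (1001.01, 100.0), (1601.01, 400.0), (900.0, 50.0)]

hits = match_glycopeptides(peaks, peptides, glycans)
print(relative_abundance(hits))
```

In this toy example glycoform G1 is found as a 2+ and a 3+ ion and G2 only as a 2+ ion; summing the charge states gives each glycoform half of the total signal, while the unmatched peak at m/z 900 is ignored.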
The most extensive coverage is achieved, in our experience, from separation of the glycopeptides by LC-MS. After data submission to GlycoSpectrumScan, the glycopeptides are identified in the spectrum and color highlighted with the glycosylation mass determined at each N- or O-linked site. Relative quantitation of glycans at each site on the resultant glycopeptides is also given, and user input is limited to the deselection of false positives. GlycoSpectrumScan allows the list of detected glycopeptide compositions, with their calculated relative abundance based on the signal intensities of the mass spectrum, to be easily downloaded. All m/z signals from all charge states identified for the same glycopeptide composition are incorporated into this calculation.

A general concern for relative quantitation by mass spectrometry is the variation of ionization efficiencies. Glycopeptides often carry a mixture of neutral and negatively charged oligosaccharides, which may influence their ionization and detection, although there is good correlation in relative signal intensities of released neutral and sialylated N-glycans in positive mode LC-ESI-MS when compared to traditional LC methods.23 Glycopeptides analyzed by MALDI TOF MS usually require specific enrichment, and ionization of sialylated glycopeptides can be suppressed if matrices and sample preparation conditions are not optimized. Furthermore, different instruments and instrument parameters might also lead to variable in-source fragmentation, where more labile components such as glycopeptides might be partially fragmented. Care should be taken when using the signal intensities depicted in a mass spectrum to calculate relative glycopeptide abundances, but at this stage it is, at the least, a reflection of the glycopeptide isoform distribution in a sample.

GlycoSpectrumScan reduces analysis time of complex glycopeptide/peptide samples significantly. As a validation of the tool, glycopeptide data are presented on four differently glycosylated proteins comprising sIgA. Peptides with one and two N-glycosylation sites as well as a peptide with 5 O-glycosylation sites were assigned with the relative abundance of each of the known oligosaccharides attached to the relevant protein at each site. As a first step, a comprehensive analysis of the released glycans gives a global overview of the glycan structures present on each intact protein, and is the most efficient method to determine the structural details of sequence and branching. The second step of entering only the peptides which have the relevant sequon or amino acid that is able to be glycosylated ensures that only the masses of possible glycopeptides are in the output for each protein.

Our results on the J-chain protein of sIgA clearly indicate that the type of N-glycosylation at a site can affect the cleavage by proteolytic enzymes, with core fucosylation appearing to inhibit tryptic digestion at a proximal site. This suggests that core fucose may cause steric hindrance to protease activity when cleavage sites are in close proximity to a glycosylation site. In vitro this emphasizes the need for the selection of the right protease(s) prior to analysis to avoid bias; in vivo this function of a specific glycan structure may be crucial for the stability of the molecule.
Carbohydrate deficient transferrin has been shown to be more susceptible to chymotrypsin digestion if one or both glycosylation sites are not occupied, whereas normally glycosylated transferrin was hardly digested by chymotrypsin.29

In the secretory component of sIgA, all but one of the glycosylation sites were identified, with different degrees of sialylation on each site. The secretory component showed a high degree of fucosylation with Lewis type fucose and core fucose on most of the detected sites, whereas the IgAs essentially lack Lewis type fucose. The secretory component is synthesized by epithelial cells, whereas IgA and J-chain are essentially produced in plasma cells.18 Therefore these differences might reflect the different cellular locations of the synthesis of these molecules. The two N-glycosylation sites of IgA1 showed very different glycoprofiles. Whereas Asn144 carried compositions essentially of nonfucosylated, bisected N-glycans, Asn340 contained a mixture of oligomannosidic and mainly fucosylated, bisected structures. Any potential biological impact of these observed structural differences at the two N-glycosylation sites on IgA1 remains to be clarified.

The glycosylation of the hinge region O-glycopeptide of plasma IgA1 has been associated with disease. Renfrow et al. (2007) profiled the tryptic peptide using FT-MS in the context of studying IgA nephropathy-related glycosylation changes,30 and disease related differences in plasma IgA1 were the subject of a multilaboratory comparison study organized by the HUPO Glycomics Initiative (Wada et al., in press). With GlycoSpectrumScan it was possible to identify and profile the most likely glycan compositions present on this multiply O-glycosylated peptide. The determination of the heterogeneity of glycosylation at each individual site in the sequence of these mucin-type domains still presents a challenge to the mass spectrometrist (Christiansen et al., submitted for publication).
This paper describes and validates a bioinformatic tool for the characterization of the site glycosylation on peptides using the data from the mass spectra of protease digests of glycoproteins. GlycoSpectrumScan computes the masses of all possible glycopeptides based on knowledge of the global glycoprofile of the glycans on the identified intact protein. If present, these glycopeptides are highlighted visually in the experimental mass spectrum and the output lists the relative heterogeneity at each glycosylation site on the peptide. The tool can be used on data from any mass spectrometer platform and was validated by the LC-ESI-MS analysis of the four differently glycosylated proteins comprising the sIgA complex of human colostrum. It is freely available on the web at www.glycospectrumscan.org.

Abbreviations: PTM, post-translational modifications; sIgA, secretory immunoglobulin A; PNGase F, peptide-N-glycosidase F; MS, mass spectrometry; MS/MS, tandem mass spectrometry; CID, collision-induced dissociation; LC, liquid chromatography; ESI, electrospray ionization; MALDI, matrix assisted laser desorption/ionization; PVDF, polyvinylidene fluoride; SDS-PAGE, sodium dodecyl sulfate polyacrylamide gel electrophoresis; HexNAc, N-acetylhexosamine; Hex, hexose; Fuc, fucose; NeuAc, N-acetyl neuraminic acid; HILIC, hydrophilic interaction chromatography; CE, capillary electrophoresis; IEF, isoelectric focusing.

Acknowledgment. We thank Macquarie University for the award of iMURS research scholarships to ND. PHJ was supported by the Danish Agency for Science, Technology and Innovation (grant 272-07-0066). DK was supported by an Erwin Schrödinger Fellowship from the Austrian Science Fund (grant J2661) and Macquarie University.

Supporting Information Available: Supplementary Tables and Figure. GlycoSpectrumScan output data. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Apweiler, R.; Hermjakob, H.; Sharon, N. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta 1999, 1473 (1), 4–8.
(2) Lee, J.; Park, J. S.; Moon, J. Y.; Kim, K. Y.; Moon, H. M. The influence of glycosylation on secretion, stability, and immunogenicity of recombinant HBV pre-S antigen synthesized in Saccharomyces cerevisiae. Biochem. Biophys. Res. Commun. 2003, 303 (2), 427–32.
(3) Dwek, M. V.; Ross, H. A.; Leathem, A. J. Proteome and glycosylation mapping identifies post-translational modifications associated with aggressive breast cancer. Proteomics 2001, 1 (6), 756–62.
(4) Freeze, H. H. Update and perspectives on congenital disorders of glycosylation. Glycobiology 2001, 11 (12), 129R–43R.
(5) Kolarich, D.; Weber, A.; Pabst, M.; Stadlmann, J.; Teschner, W.; Ehrlich, H.; Schwarz, H. P.; Altmann, F. Glycoproteomic characterization of butyrylcholinesterase from human plasma. Proteomics 2008, 8 (2), 254–63.
(6) Kolarich, D.; Weber, A.; Turecek, P. L.; Schwarz, H. P.; Altmann, F. Comprehensive glyco-proteomic analysis of human alpha1-antitrypsin and its charge isoforms. Proteomics 2006, 6 (11), 3369–80.
(7) Kolarich, D.; Altmann, F.; Sunderasan, E. Structural analysis of the glycoprotein allergen Hev b 4 from natural rubber latex by mass spectrometry. Biochim. Biophys. Acta 2006, 1760, 715–20.
(8) Gerken, T. A.; Gilmore, M.; Zhang, J. Determination of the site-specific oligosaccharide distribution of the O-glycans attached to the porcine submaxillary mucin tandem repeat. Further evidence for the modulation of O-glycans side chain structures by peptide sequence. J. Biol. Chem. 2002, 277 (10), 7736–51.
(9) Schulz, B. L.; Packer, N. H.; Karlsson, N. G. Small-scale analysis of O-linked oligosaccharides from glycoproteins and mucins separated by gel electrophoresis. Anal. Chem. 2002, 74 (23), 6088–97.
(10) Hernandez-Borges, J.; Neususs, C.; Cifuentes, A.; Pelzing, M. Online capillary electrophoresis-mass spectrometry for the analysis of biomolecules. Electrophoresis 2004, 25 (14), 2257–81.
(11) Cooper, C. A.; Gasteiger, E.; Packer, N. H. GlycoMod: a software tool for determining glycosylation compositions from mass spectrometric data. Proteomics 2001, 1 (2), 340–9.
(12) An, H. J.; Tillinghast, J. S.; Woodruff, D. L.; Rocke, D. M.; Lebrilla, C. B. A new computer program (GlycoX) to determine simultaneously the glycosylation sites and oligosaccharide heterogeneity of glycoproteins. J. Proteome Res. 2006, 5 (10), 2800–8.
(13) Peltoniemi, H.; Joenvaara, S.; Renkonen, R. De novo glycan structure search with the CID MS/MS spectra of native N-glycopeptides. Glycobiology 2009, 19 (7), 707–14.
(14) Ozohanics, O.; Krenyacz, J.; Ludanyi, K.; Pollreisz, F.; Vekey, K.; Drahos, L. GlycoMiner: a new software tool to elucidate glycopeptide composition. Rapid Commun. Mass Spectrom. 2008, 22 (20), 3245–54.
(15) Joenvaara, S.; Ritamo, I.; Peltoniemi, H.; Renkonen, R. N-glycoproteomics - an automated workflow approach. Glycobiology 2008, 18 (4), 339–49.
(16) Ren, J. M.; Rejtar, T.; Li, L.; Karger, B. L. N-Glycan structure annotation of glycopeptides using a linearized glycan structure database (GlyDB). J. Proteome Res. 2007, 6 (8), 3162–73.
(17) Goldberg, D.; Bern, M.; Parry, S.; Sutton-Smith, M.; Panico, M.; Morris, H. R.; Dell, A. Automated N-glycopeptide identification using a combination of single- and tandem-MS. J. Proteome Res. 2007, 6 (10), 3995–4005.
(18) Royle, L.; Roos, A.; Harvey, D. J.; Wormald, M. R.; van Gijlswijk-Janssen, D.; Redwan el, R. M.; Wilson, I. A.; Daha, M. R.; Dwek, R. A.; Rudd, P. M. Secretory IgA N- and O-glycans provide a link between the innate and adaptive immune systems. J. Biol. Chem. 2003, 278 (22), 20140–53.
(19) Hughes, G. J.; Reason, A. J.; Savoy, L.; Jaton, J.; Frutiger-Hughes, S. Carbohydrate moieties in human secretory component. Biochim. Biophys. Acta 1999, 1434 (1), 86–93.
(20) Putnam, F. W.; Liu, Y. S.; Low, T. L. Primary structure of a human IgA1 immunoglobulin. IV. Streptococcal IgA1 protease, digestion, Fab and Fc fragments, and the complete amino acid sequence of the alpha 1 heavy chain. J. Biol. Chem. 1979, 254 (8), 2865–74.
(21) Jensen, P. H.; Karlsson, N. G.; Kolarich, D.; Packer, N. H. Same but different: separation of released N- and O-glycans. Methods Mol. Biol., in press.
(22) Wilson, N. L.; Schulz, B. L.; Karlsson, N. G.; Packer, N. H. Sequential analysis of N- and O-linked glycosylation of 2D-PAGE separated glycoproteins. J. Proteome Res. 2002, 1 (6), 521–9.
(23) Pabst, M.; Bondili, J. S.; Stadlmann, J.; Mach, L.; Altmann, F. Mass + retention time = structure: a strategy for the analysis of N-glycans by carbon LC-ESI-MS and its application to fibrin N-glycans. Anal. Chem. 2007, 79 (13), 5051–7.
(24) Castellanos-Serra, L.; Proenza, W.; Huerta, V.; Moritz, R. L.; Simpson, R. J. Proteome analysis of polyacrylamide gel-separated proteins visualized by reversible negative staining using imidazole-zinc salts. Electrophoresis 1999, 20 (4-5), 732–7.
(25) Wilkins, M. R.; Lindskog, I.; Gasteiger, E.; Bairoch, A.; Sanchez, J. C.; Hochstrasser, D. F.; Appel, R. D. Detailed peptide characterization using PEPTIDEMASS: a World-Wide-Web-accessible tool. Electrophoresis 1997, 18 (3-4), 403–8.
(26) Eiffert, H.; Quentin, E.; Decker, J.; Hillemeir, S.; Hufschmidt, M.; Klingmuller, D.; Weber, M. H.; Hilschmann, N. [The primary structure of human free secretory component and the arrangement of disulfide bonds]. Hoppe Seylers Z. Physiol. Chem. 1984, 365 (12), 1489–95.
(27) Picariello, G.; Ferranti, P.; Mamone, G.; Roepstorff, P.; Addeo, F. Identification of N-linked glycoproteins in human milk by hydrophilic interaction liquid chromatography and mass spectrometry. Proteomics 2008, 8 (18), 3833–47.
(28) Keil, B. Specificity of proteolysis; Springer-Verlag: New York, 1992.
(29) Valmu, L.; Kalkkinen, N.; Husa, A.; Rye, P. D. Differential Susceptibility of Transferrin Glycoforms to Chymotrypsin: A Proteomics Approach to the Detection of Carbohydrate-Deficient Transferrin. Biochemistry 2005, 44 (49), 16007–13.
(30) Renfrow, M. B.; Mackay, C. L.; Chalmers, M. J.; Julian, B. A.; Mestecky, J.; Kilian, M.; Poulsen, K.; Emmett, M. R.; Marshall, A. G.; Novak, J. Analysis of O-glycan heterogeneity in IgA1 myeloma proteins by Fourier transform ion cyclotron resonance mass spectrometry: implications for IgA nephropathy. Anal. Bioanal. Chem. 2007, 389 (5), 1397–407.

PR900956X

Journal of Proteome Research • Vol. 9, No. 2, 2010 1075
