Comprehensive molecular profiling of lung adenocarcinoma


ARTICLE

OPEN doi:10.1038/nature13385

The Cancer Genome Atlas Research Network*

Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis.

Lung cancer is the most common cause of global cancer-related mortality, leading to over a million deaths each year, and adenocarcinoma is its most common histological type. Smoking is the major cause of lung adenocarcinoma but, as smoking rates decrease, proportionally more cases occur in never-smokers (defined as less than 100 cigarettes in a lifetime). Recently, molecularly targeted therapies have dramatically improved treatment for patients whose tumours harbour somatically activated oncogenes such as mutant EGFR (ref. 1) or translocated ALK, RET, or ROS1 (refs 2–4). Mutant BRAF and ERBB2 (ref. 5) are also investigational targets. However, most lung adenocarcinomas either lack an identifiable driver oncogene, or harbour mutations in KRAS and are therefore still treated with conventional chemotherapy. Tumour suppressor gene abnormalities, such as those in TP53 (ref. 6), STK11 (ref. 7), CDKN2A (ref. 8), KEAP1 (ref. 9), and SMARCA4 (ref. 10) are also common but are not currently clinically actionable. Finally, lung adenocarcinoma shows high rates of somatic mutation and genomic rearrangement, challenging identification of all but the most frequent driver gene alterations because of a large burden of passenger events per tumour genome (refs 11–13). Our efforts focused on comprehensive, multiplatform analysis of lung adenocarcinoma, with attention towards pathobiology and clinically actionable events.

Clinical samples and histopathologic data

We analysed tumour and matched normal material from 230 previously untreated lung adenocarcinoma patients who provided informed consent (Supplementary Table 1). All major histologic types of lung adenocarcinoma were represented: 5% lepidic, 33% acinar, 9% papillary, 14% micropapillary, 25% solid, 4% invasive mucinous, 0.4% colloid and 8% unclassifiable adenocarcinoma (Supplementary Fig. 1) (ref. 14). Median follow-up was 19 months, and 163 patients were alive at the time of last follow-up. Eighty-one percent of patients reported past or present smoking. Supplementary Table 2 summarizes demographics. DNA, RNA and protein were extracted from specimens and quality-control assessments were performed as described previously (ref. 15). Supplementary Table 3 summarizes molecular estimates of tumour cellularity (ref. 16).

[Figure 1 graphic not reproduced. Panel a: co-mutation plot with gender (male, female, NA) and smoking status (ever-smoker, never-smoker, NA) tracks, number of mutations per sample, and per-gene mutation frequency (%): TP53 46, KRAS 33, KEAP1 17, STK11 17, EGFR 14, NF1 11, BRAF 10, SETD2 9, RBM10 8, MGA 8, MET 7, ARID1A 7, PIK3CA 7, SMARCA4 6, RB1 4, CDKN2A 4, U2AF1 3, RIT1 2. Panel b: number of mutations in transversion-high versus transversion-low samples for TP53, KRAS, EGFR, STK11, KEAP1, NF1, SMARCA4, RBM10, PIK3CA, RB1, U2AF1 and ERBB2. Panel c: number of mutations in males versus females for EGFR, STK11, SMARCA4 and RBM10. Mutation classes: missense, nonsense, splice site, frameshift, in-frame indel, other non-synonymous. Substitution classes: transitions, transversions, indels and other. Significance markers: Q < 0.05, P < 0.05.]

Figure 1 | Somatic mutations in lung adenocarcinoma. a, Co-mutation plot from whole exome sequencing of 230 lung adenocarcinomas. Data from TCGA samples were combined with previously published data (ref. 12) for statistical analysis. Co-mutation plot for all samples used in the statistical analysis (n = 412) can be found in Supplementary Fig. 2. Significant genes with a corrected P value less than 0.025 were identified using the MutSig2CV algorithm and are ranked in order of decreasing prevalence. b, c, The differential patterns of mutation between samples classified as transversion high and transversion low (b) or between male and female patients (c) are shown for all samples used in the statistical analysis (n = 412). Stars indicate statistical significance using Fisher’s exact test (black stars: q < 0.05, grey stars: P < 0.05) and are adjacent to the sample set with the higher percentage of mutated samples.
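The group comparisons marked with stars in Figure 1b, c rest on Fisher's exact test applied to a 2 × 2 table (group by mutated or not mutated). The following is a minimal sketch of that test using only the Python standard library. The function name, the two-sided convention (summing every table that is no more probable than the observed one) and the counts in the usage line are assumptions made for illustration; they are not taken from the study, whose per-group counts are not given in this text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: sample group 1 / sample group 2 (for example male / female).
    Columns: mutated / not mutated.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples falling in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum all tables at most as probable as the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts, for illustration only (not study data)
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))  # 0.4857
```

In practice an established implementation such as `scipy.stats.fisher_exact` would normally be preferred; the hand-written version is shown only to make the calculation explicit.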

*A list of authors and affiliations appears at the end of the paper. 31 July 2014 | Vol 511 | Nature | 543

©2014 Macmillan Publishers Limited. All rights reserved

Somatically acquired DNA alterations

We performed whole-exome sequencing (WES) on tumour and germline DNA, with a mean coverage of 97.6× and 95.8×, respectively, as performed previously (ref. 17). The mean somatic mutation rate across the TCGA cohort was 8.87 mutations per megabase (Mb) of DNA (range: 0.5–48, median: 5.78). The non-synonymous mutation rate was 6.86 per Mb. MutSig2CV (ref. 18) identified significantly mutated genes among our 230 cases along with 182 similarly-sequenced, previously reported lung adenocarcinomas (ref. 12). Analysis of these 412 tumour/normal pairs highlighted 18 statistically significant mutated genes (Fig. 1a shows co-mutation plot of TCGA samples (n = 230), Supplementary Fig. 2 shows co-mutation plot of all samples used in the statistical analysis (n = 412) and Supplementary Table 4 contains complete MutSig2CV results, which also appear on the TCGA Data Portal along with many associated data files (https://tcga-data.nci.nih.gov/docs/publications/luad_2014/)). TP53 was commonly mutated (46%). Mutations in KRAS (33%) were mutually exclusive with those in EGFR (14%). BRAF was also commonly mutated (10%), as were PIK3CA (7%), MET (7%) and the small GTPase gene, RIT1 (2%). Mutations in tumour suppressor genes including STK11 (17%), KEAP1 (17%), NF1 (11%), RB1 (4%) and CDKN2A (4%) were observed. Mutations in chromatin modifying genes SETD2 (9%), ARID1A (7%) and SMARCA4 (6%) and the RNA splicing genes RBM10 (8%) and U2AF1 (3%) were also common. Recurrent mutations in the MGA gene (which encodes a Max-interacting protein on the MYC pathway (ref. 19)) occurred in 8% of samples. Loss-of-function (frameshift and nonsense) mutations in MGA were mutually exclusive with focal MYC amplification (Fisher’s exact test P = 0.04), suggesting a hitherto unappreciated potential mechanism of MYC pathway activation. Coding single nucleotide variants and indel variants were verified by resequencing at a rate of 99% and 100%, respectively (Supplementary Fig. 3a, Supplementary Table 5). Tumour purity was not associated with the presence of false negatives identified in the validation data (P = 0.31; Supplementary Fig. 3b).

Past or present smoking was associated with cytosine to adenine (C > A) nucleotide transversions, as previously described both in individual genes and genome-wide (refs 12, 13). C > A nucleotide transversion fraction showed two peaks; this fraction correlated with total mutation count (R² = 0.30) and inversely correlated with cytosine to thymine (C > T) transition frequency (R² = 0.75) (Supplementary Fig. 4). We classified each sample (Supplementary Methods) into one of two groups named transversion-high (TH, n = 269) and transversion-low (TL, n = 144). The transversion-high group was strongly associated with past or present smoking (P < 2.2 × 10⁻¹⁶), consistent with previous reports (ref. 13). The transversion-high and transversion-low patient cohorts harboured different gene mutations. Whereas KRAS mutations were significantly enriched in the transversion-high cohort (P = 2.1 × 10⁻¹³), EGFR mutations were significantly enriched in the transversion-low group (P = 3.3 × 10⁻⁶). PIK3CA and RB1 mutations were likewise enriched in transversion-low tumours (P < 0.05). Additionally, the transversion-low tumours were specifically enriched for in-frame insertions in EGFR and ERBB2 (ref. 5) and for frameshift indels in RB1 (Fig. 1b). RB1 is commonly mutated in small-cell lung carcinoma (SCLC). We found RB1 mutations in transversion-low adenocarcinomas were enriched for frameshift indels versus single nucleotide substitutions compared to SCLC (P < 0.05) (refs 20, 21), suggesting a mutational mechanism in transversion-low adenocarcinoma that is probably distinct from smoking in SCLC.

Gender is correlated with mutation patterns in lung adenocarcinoma (ref. 22). Only a fraction of significantly mutated genes from the complete set reported in this study (Fig. 1a) were enriched in men or women (Fig. 1c). EGFR mutations were enriched in tumours from the female cohort (P = 0.03), whereas loss-of-function mutations within RBM10, an RNA-binding protein gene located on the X chromosome (ref. 23), were enriched in tumours from men (P = 0.002). When examining the transversion-high group, 16 out of 21 RBM10 mutations were observed in males (P = 0.003, Fisher’s exact test).

Somatic copy number alterations were very similar to those previously reported for lung adenocarcinoma (ref. 24) (Supplementary Fig. 5, Supplementary Table 6). Significant amplifications included NKX2-1, TERT, MDM2, KRAS, EGFR, MET, CCNE1, CCND1, TERC and MECOM (Supplementary Table 6), as previously described (ref. 24), 8q24 near MYC, and a novel peak containing CCND3 (Supplementary Table 6). The CDKN2A locus was the most significant deletion (Supplementary Table 6). Supplementary Table 7 summarizes molecular and clinical characteristics by sample. Low-pass whole-genome sequencing on a subset (n = 93) of the samples revealed an average of 36 gene–gene and gene–inter-gene rearrangements per tumour. Chromothripsis (ref. 25) occurred in six of the 93 samples (6%) (Supplementary Fig. 6, Supplementary Table 8). Rearrangements detected by low-pass whole-genome sequencing appear in Supplementary Table 9.

[Figure 2 graphic not reproduced. Panel a: normalized exonic mRNA expression (low to high) across fusion partners for EML4–ALK (three cases), TRIM33–RET, CCDC6–RET, EZR–ROS1, CD74–ROS1, CLTC–ROS1 and SLC34A2–ROS1, with the portion of the original transcripts not in the fusion transcript marked. Panel b: normalized RNA-seq read coverage over MET exons 13 to 15 and number of samples by MET mutation status (WT, ss mut, ss del, Y1003*) and skipping category: none (0% skipping), intermediate (60–80% skipping), full (90–100% skipping). Sample labels visible in the figure: TCGA-99-7458, TCGA-75-6205, TCGA-44-6775. Panel c: proportion of splicing event types (cassette exon, coordinate cassette exons, mutually exclusive exon, alternative 5′ splice site, alternative 3′ splice site, alternative first exon, alternative last exon) among events observed across all tumours (total events = 29,867) and among events associated with U2AF1 S34F mutation (total events = 129; q value < 0.05); *P < 0.001.]

Figure 2 | Aberrant RNA transcripts in lung adenocarcinoma associated with somatic DNA translocation or mutation. a, Normalized exon level RNA expression across fusion gene partners. Grey boxes around genes mark the regions that are removed as a consequence of the fusion. Junction points of the fusion events are also listed in Supplementary Table 9. Exon numbers refer to reference transcripts listed in Supplementary Table 9. b, MET exon 14 skipping observed in the presence of exon 14 splice site mutation (ss mut), splice site deletion (ss del) or a Y1003* mutation. A total of 22 samples had insufficient coverage around exon 14 for quantification. The percentage skipping is (total expression minus exon 14 expression)/total expression. c, Significant differences in the frequency of 129 alternative splicing events in mRNA from tumours with U2AF1 S34F compared to U2AF1 WT tumours (q value < 0.05). Consistent with the function of U2AF1 in 3′ splice site recognition, most splicing differences involved cassette exon and alternative 3′ splice site events (chi-squared test, P < 0.001).
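The Figure 2b legend defines percentage skipping of MET exon 14 as (total expression minus exon 14 expression)/total expression. A minimal sketch of that calculation follows, together with a mapping onto the skipping categories printed on the figure (0%, 60–80%, 90–100%). The function names, the assumption that both inputs are on the same normalized expression scale, the "unbinned" fallback for values outside the printed ranges, and the input numbers are all illustrative and not part of the source.

```python
def percent_skipping(total_expression, exon14_expression):
    """Percentage skipping = (total - exon 14) / total, expressed in percent.

    Both arguments are assumed to be normalized expression values
    on the same scale (an assumption of this sketch).
    """
    if total_expression <= 0:
        raise ValueError("total expression must be positive")
    return 100.0 * (total_expression - exon14_expression) / total_expression


def skipping_category(pct):
    """Map a percentage onto the category labels printed in Figure 2b."""
    if pct == 0:
        return "none"
    if 60 <= pct <= 80:
        return "intermediate"
    if 90 <= pct <= 100:
        return "full"
    # The figure names no category for other values; this label is hypothetical.
    return "unbinned"


# Hypothetical expression values, for illustration only
pct = percent_skipping(200.0, 20.0)
print(pct, skipping_category(pct))  # 90.0 full
```

Samples with insufficient coverage around exon 14 (22 in the study) would have to be excluded before such a calculation; the sketch does not model that filtering step.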

Description of aberrant RNA transcripts

Gene fusions, splice site mutations or mutations in genes encoding splicing factors promote or sustain the malignant phenotype by generating aberrant RNA transcripts. Combining DNA with mRNA sequencing enabled us to catalogue aberrant RNA transcripts and, in many cases, to identify the DNA-encoded mechanism for the aberration. Seventy-five per cent of somatic mutations identified by WES were present in the RNA transcriptome when the locus in question was expressed (minimum 5×) (Supplementary Fig. 7a), similar to prior analyses (ref. 15). Previously identified fusions involving ALK (3/230 cases), ROS1 (4/230) and RET (2/230) (Fig. 2a, Supplementary Table 10) all occurred in transversion-low tumours (P = 1.85 × 10⁻⁴, Fisher’s exact test).

MET activation can occur by exon 14 skipping, which results in a stabilized protein (ref. 26). Ten tumours had somatic MET DNA alterations with MET exon 14 skipping in RNA. In nine of these samples, a 5′ or 3′ splice site mutation or deletion was identified (ref. 27). MET exon 14 skipping was also found in the setting of a MET Y1003* stop codon mutation (Fig. 2b, Supplementary Fig. 8a). The codon affected by the Y1003* mutation is predicted to disrupt multiple splicing enhancer sequences, but the mechanism of skipping remains unknown in this case.

S34F mutations in U2AF1 have recently been reported in lung adenocarcinoma (ref. 12) but their contribution to oncogenesis remains unknown. Eight samples harboured U2AF1 S34F. We identified 129 splicing events strongly associated with U2AF1 S34F mutation, consistent with the role of U2AF1 in 3′ splice site selection (ref. 28). Cassette exons and alternative 3′ splice sites were most commonly affected (Fig. 2c, Supplementary Table 11) (ref. 29). Among these events, alternative splicing of the CTNNB1 proto-oncogene was strongly associated with U2AF1 mutations (Supplementary Fig. 8b). Thus, concurrent analysis of DNA and RNA enabled delineation of both cis and trans mechanisms governing RNA processing in lung adenocarcinoma.

Candidate driver genes

The receptor tyrosine kinase (RTK)/RAS/RAF pathway is frequently mutated in lung adenocarcinoma. Striking therapeutic responses are often achieved when mutant pathway components are successfully inhibited. Sixty-two per cent (143/230) of tumours harboured known activating mutations in known driver oncogenes, as defined by others (ref. 30). Cancer-associated mutations in KRAS (32%, n = 74), EGFR (11%, n = 26) and BRAF (7%, n = 16) were common. Additional, previously uncharacterized KRAS, EGFR and BRAF mutations were observed, but were not classified as driver oncogenes for the purposes of our analyses (see Supplementary Fig. 9a for depiction of all mutations of known and unknown significance), which explains the differing mutation frequencies in each gene between this analysis and the overall mutational analysis described above. We also identified known activating ERBB2 in-frame insertion and point mutations (n = 5) (ref. 6), as well as mutations in MAP2K1 (n = 2), NRAS and HRAS (n = 1 each). RNA sequencing revealed the aforementioned MET exon 14 skipping (n = 10) and fusions involving ROS1 (n = 4), ALK (n = 3) and RET (n = 2). We considered these tumours collectively as oncogene-positive, as they harboured a known activating RTK/RAS/RAF pathway somatic event. DNA amplification events were not considered to be driver events before the comparisons described below.

We sought to nominate previously unrecognized genomic events that might activate this critical pathway in the 38% of samples without a RTK/RAS/RAF oncogene mutation. Tumour cellularity did not differ between oncogene-negative and oncogene-positive samples (Supplementary Fig. 9b). Analysis of copy number alterations using GISTIC (ref. 31) identified unique focal ERBB2 and MET amplifications in the oncogene-negative subset (Fig. 3a, Supplementary Table 6); amplifications in other wild-type proto-oncogenes, including KRAS and EGFR, were not significantly different between the two groups.

We next analysed WES data independently in the oncogene-negative and oncogene-positive subsets. We found that TP53, KEAP1, NF1 and RIT1 mutations were significantly enriched in oncogene-negative tumours (P < 0.01; Fig. 3b, Supplementary Table 12). NF1 mutations have previously been reported in lung adenocarcinoma (ref. 11), but this is the first study, to our knowledge, capable of identifying all classes of loss-of-function NF1 defects and to statistically demonstrate that NF1 mutations, as well as KEAP1 and TP53 mutations, are enriched in the oncogene-negative subset of lung adenocarcinomas (Fig. 3c). All RIT1 mutations occurred in the oncogene-negative subset and clustered around residue Q79 (homologous to Q61 in the switch II region of RAS genes). These mutations transform NIH3T3 cells and activate MAPK and PI(3)K signalling (ref. 32), supporting a driver role for mutant RIT1 in 2% of lung adenocarcinomas. This analysis increases the rate at which putative somatic lung adenocarcinoma driver events can be identified within the RTK/RAS/RAF pathway to 76% (Fig. 3d).

[Figure 3 graphic not reproduced. Panel a: GISTIC FDR q value by chromosome (1–22, X) for oncogene-positive and oncogene-negative samples, with labelled peaks at MET and ERBB2. Panel b: per cent mutated in oncogene-positive versus oncogene-negative samples for TP53, KEAP1, NF1 and RIT1. Panel c: co-mutation plot for KRAS, EGFR, BRAF, ROS1/ALK/RET, MAP2K1/HRAS/NRAS, MET, ERBB2, NF1 and RIT1 with frequency (%), split into oncogene-positive (62%, n = 143) and previously oncogene-negative (13%, n = 31) samples; alteration classes: amplification, fusion, missense mutation, exon skipping, in-frame indel, nonsense mutation/frameshift indel/splice-site mutation. Panel d: KRAS (32.2%), EGFR (11.3%), BRAF (7.0%), MET ex14 (4.3%), ERBB2 (1.7%), ROS1 fusion (1.7%), ALK fusion (1.3%), MAP2K1 (0.9%), RET fusion (0.9%), NRAS (0.4%), HRAS (0.4%); NF1 (8.3%), MET amp (2.2%), RIT1 (2.2%), ERBB2 amp (0.9%); None (24.4%).]

Figure 3 | Identification of novel candidate driver genes. a, GISTIC analysis of focal amplifications in oncogene-negative (n = 87) and oncogene-positive (n = 143) TCGA samples identifies focal gains of MET and ERBB2 that are specific to the oncogene-negative set (purple). b, TP53, KEAP1, NF1 and RIT1 mutations are significantly enriched in samples otherwise lacking oncogene mutations (adjusted P < 0.05 by Fisher’s exact test). c, Co-mutation plot of variants of known significance within the RTK/RAS/RAF pathway in lung adenocarcinoma. Not shown are the 63 tumours lacking an identifiable driver lesion. Only canonical driver events, as defined in Supplementary Fig. 9, and proposed driver events, are shown; hence not every alteration found is displayed. d, New candidate driver oncogenes (blue: 13% of cases) and known somatically activated driver events (red: 63%) that activate the RTK/RAS/RAF pathway can be found in the majority of the 230 lung adenocarcinomas.
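The rounded percentages quoted in this section can be checked against the counts given in the text and in the Figure 3 labels (230 tumours; 143 oncogene-positive; 31 previously oncogene-negative tumours assigned a candidate driver; KRAS n = 74, EGFR n = 26, BRAF n = 16). The short sketch below only redoes that arithmetic; the variable and function names are illustrative.

```python
COHORT = 230  # tumours profiled

# Counts stated in the text and in the Figure 3 panel labels
known_driver_counts = {"KRAS": 74, "EGFR": 26, "BRAF": 16}
oncogene_positive = 143
newly_assigned = 31  # "previously oncogene-negative" samples with a candidate driver


def pct(count, total=COHORT):
    """Percentage of the cohort represented by `count` tumours."""
    return 100.0 * count / total


for gene, n in known_driver_counts.items():
    print(f"{gene}: {pct(n):.1f}%")  # KRAS: 32.2%, EGFR: 11.3%, BRAF: 7.0%

print(round(pct(oncogene_positive)))                   # 62
print(round(pct(COHORT - oncogene_positive)))          # 38
print(round(pct(newly_assigned)))                      # 13
print(round(pct(oncogene_positive + newly_assigned)))  # 76
```

The last line shows how the 76% figure arises: 143 tumours with a known driver plus 31 with a newly proposed one give 174 of 230.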

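[Figure 4 graphic not reproduced; the text is truncated at this point in this copy. Recoverable panel a labels: EGFR 11%, PTEN 3%, ERBB2 3%, MET 7%, ALK 1%, KRAS 32%, NRAS.]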