Visualizing Chromosomes as Transcriptome Correlation Maps: Evidence of Chromosomal Domains Containing Co-expressed Genes--A Study of 130 Invasive Ductal Breast Carcinomas


Research Article

Fabien Reyal, Nicolas Stransky, Isabelle Bernard-Pierrot, Anne Vincent-Salomon, Yann de Rycke, Paul Elvin, Andrew Cassidy, Alexander Graham, Carolyn Spraggon, Yoann Désille, Alain Fourquet, Claude Nos, Pierre Pouillart, Henri Magdelénat, Dominique Stoppa-Lyonnet, Jérôme Couturier, Brigitte Sigal-Zafrani, Bernard Asselain, Xavier Sastre-Garau, Olivier Delattre, Jean Paul Thiery, and François Radvanyi

Unité Mixte de Recherche 144, Centre National de la Recherche Scientifique; Departments of Surgery, Tumor Biology, Biostatistics, Radiotherapy, Medical Oncology, and Translational Research; and U509, Institut National de la Santé et de la Recherche Médicale, Institut Curie, Paris, France; and AstraZeneca, Alderley Park, United Kingdom

Abstract

Completion of the working draft of the human genome has made it possible to analyze the expression of genes according to their position on the chromosomes. Here, we used a transcriptome data analysis approach involving for each gene the calculation of the correlation between its expression profile and those of its neighbors. We used the U133 Affymetrix transcriptome data set for a series of 130 invasive ductal breast carcinomas to construct chromosomal maps of gene expression correlation (transcriptome correlation map). This highlighted nonrandom clusters of genes along the genome with correlated expression in tumors. Some of the gene clusters identified by this method probably arose because of genetic alterations, as most of the chromosomes with the highest percentage of correlated genes (1q, 8p, 8q, 16p, 16q, 17q, and 20q) were also the most frequent sites of genomic alterations in breast cancer. Our analysis showed that several known breast tumor amplicons (at 8p11-p12, 11q13, and 17q12) are located within clusters of genes with correlated expression. Using hierarchical clustering on samples and a Treeview representation of whole chromosome arms, we observed a higher-order organization of correlated genes, sometimes involving very large chromosomal domains that could extend to a whole chromosome arm. Transcription correlation maps are a new way of visualizing transcriptome data. They will help to identify new genes involved in tumor progression and new mechanisms of gene regulation in tumors. (Cancer Res 2005; 65(4): 1376-83)

Introduction

Large-scale transcriptome analyses based on DNA microarrays have facilitated the classification of cancers into biologically distinct categories, some of which may explain the clinical behaviors of tumors (1–3). Such analyses may also help to find new prognostic and predictive markers (4, 5). Completion of the initial working draft of the human genome has made it possible to interpret transcriptome data in a new way, by directly assigning the genome-wide, high-throughput gene expression profiles to the human genome sequence (6). A few recent studies have explored the relationship between the transcriptome and the positions of genes on chromosomes. Using data from serial analysis of gene expression (SAGE) in a range of human normal and tumor tissues, Caron et al. (7) showed that highly expressed genes are often found in clusters in specific chromosomal regions, called regions of increased gene expression. By expressed sequence tag collection data analysis, Zhou et al. (8) have compared normal and tumor tissues in 10 different tissue types and showed clusters of genes exhibiting increased expression along chromosomes in tumors. Many of these genomic regions corresponded to known amplicons. By comparing serial analysis of gene expression data for normal bronchial epithelium, adenocarcinomas, and squamous cell carcinomas of the lung, Fujii et al. (9) identified clusters of overexpressed and underexpressed genes. They also showed that in squamous cell carcinomas of the lung, about half of these clusters were located in imbalanced chromosomal regions previously identified by cytogenetic, comparative genomic hybridization, or loss of heterozygosity studies. Two studies have directly explored the relationship between DNA copy number alterations and gene expression in breast tumors using cDNA microarrays and found that 40% to 60% of the highly amplified genes were overexpressed (10, 11). 
Therefore, clustering of overexpressed genes could be attributable, at least in part, to underlying gene copy number alterations. The different approaches mentioned above established a relationship between the transcriptome and the chromosomal location of the genes by analyzing the transcriptome considering the expression of each gene individually. The genes are subsequently grouped into chromosomal domains according to the expression data. A second approach to explore the relationship between the transcriptome and the organization

Note: F. Reyal, N. Stransky, and I. Bernard-Pierrot contributed equally to this work. B. Sigal-Zafrani contributed on behalf of the Institut Curie Breast Cancer Group (see Acknowledgments). Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/). The original figures of the article as well as the supplementary data can be found at http://microarrays.curie.fr/publications/oncologie_moleculaire/breast_TCM/. Requests for reprints: François Radvanyi, Unité Mixte de Recherche 144, Institut Curie-CNRS, 26 rue d'Ulm, 75248 Paris Cedex 05, France. Phone: 33-1-42-34-63-39; Fax: 33-1-42-34-63-49; E-mail: [email protected]. ©2005 American Association for Cancer Research.

Cancer Res 2005; 65: (4). February 15, 2005


www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on February 16, 2016. © 2005 American Association for Cancer Research.

Transcriptome Correlation Map of Breast Carcinomas

of the genome was developed first on budding yeast transcriptome data (12) and subsequently on Drosophila transcriptome data (13). This transcriptome data analysis approach searches for groups of neighboring genes that show correlated expression profiles. Both groups identified chromosomal domains of similarly expressed neighboring genes. In normal cells, these chromosomal domains of co-regulated genes may represent chromatin-regulated regions or groups of genes regulated by the same transcription factor(s). We developed a similar computational method for the analysis of transcriptome data to identify chromosomal domains containing co-expressed genes in cancer and applied it to a series of 130 invasive ductal breast carcinomas.

Materials and Methods

Patients and Breast Tumor Samples. We analyzed the gene expression profiles of 130 infiltrating ductal primary breast carcinomas. These carcinomas were obtained from 130 patients who were included, between 1989 and 1999, in the prospective database initiated in 1981 by the Institut Curie Breast Cancer Group. The flash-frozen tumor samples were stored at −80°C immediately after lumpectomy or mastectomy. All tumor samples contained >50% cancer cells, as assessed by H&E staining of histologic sections adjacent to the samples used for the transcriptome analysis. The clinical data for the patients and the histologic characteristics of the tumors are summarized in Supplementary Table S1. This study was approved by the institutional review boards of Institut Curie.

RNA Extraction and Microarray Data Collection. RNA was extracted from all tumor samples by the cesium chloride protocol (14, 15). The concentration and the integrity/purity of each RNA sample were measured using the RNA 6000 LabChip kit (Agilent Technologies, Palo Alto, CA) and the Agilent 2100 bioanalyzer. The DNA microarrays used in this study were the Human Genome U133 set (HG-U133, Affymetrix, Santa Clara, CA), consisting of two GeneChip arrays (A and B) and containing almost 45,000 probe sets. Each probe set consisted of 22 different oligonucleotides (11 of which are a perfect match with the target transcript and 11 of which harbor a one-nucleotide mismatch in the middle). These 22 oligonucleotides were used to measure the level of a given transcript. Details of the RNA amplification, labeling, and hybridization steps are available from http://www.affymetrix.com. Chips were scanned and the intensities for each probe set were calculated using Affymetrix MAS5.0 default settings. The mean intensity of the probe sets for each array was set to a constant target value (500) by linearly scaling the array signal intensities.

Selection of Probe Sets: Attribution of a Unique Probe Set to Each Gene.
The fact that some genes are represented by several probe sets could introduce artifacts when looking for neighboring genes with similar expression patterns because the intensities of these probe sets are highly correlated. To avoid this artifact, probe sets corresponding to uncharacterized expressed sequence tags were removed from the analysis, and when several probe sets corresponded to the same gene (i.e., if different probe sets had the same title, the same GenBank ID, or belonged to the same UniGene Cluster), a single probe set was kept for the analysis. Probe sets with an "_at" extension were preferentially kept because they tend to be more specific according to Affymetrix probe set design algorithms. Probe sets with an "_s_at" extension were the second best choice, followed by all other extensions. When several probe sets with the same extension were available for one gene, the one with the highest median value was kept. From an initial list of ~45,000 probe sets, we kept 16,215 "unique" probe sets, each corresponding to a unique gene. Because of the one-to-one correspondence between these 16,215 probe sets and genes, we use the terms "probe set" and "gene" interchangeably.

Chromosomal Location of the Probe Sets. Each of the 16,215 probe sets represents a unique gene. Their genomic locations were obtained from the U133 annotation files from Affymetrix. When the position of the


probe set was not available, it was obtained using the BLAST-like alignment tool (BLAT) program (16) for sequence-matching searches of probe set–specific target sequences (an ~600-nucleotide sequence from which probe set oligonucleotides are derived) against the University of California Santa Cruz Human Genome Working Draft sequence, or by using the position of their corresponding UniGene Cluster (Homo sapiens UniGene Build 164). All positions refer to the July 2003 Human Genome Working Draft.

Calculation of Transcription Correlation Scores and Identification of Neighboring Genes with Correlated Expression. A method similar to those described in yeast and Drosophila (12, 13), based on transcriptome array data, has been developed in our laboratory (by N. Stransky) to evaluate the correlation between the expression profile of each gene and those of its neighbors. For each probe set (gene), we calculated a score, which we called the transcriptome correlation score. This score is the sum of the Spearman rank order correlation values in the tumor samples between the RNA levels of this gene and the RNA levels of each of the physically nearest 2n genes (n centromeric genes and n telomeric genes). To determine a significance threshold (i.e., a score above which a gene is considered to have a similar expression pattern to its neighbors), we created 100 random data sets of the same size by randomly ordering the 16,215 probe sets on the genome. For each random set, transcriptome correlation scores were calculated for each probe set. The significance threshold was the 500th quantile of the pooled distribution, that is, the value exceeded by only 1 of 500 probe-set scores in the random data sets (the 99.8th percentile).
Probe sets with a score exceeding the threshold are consequently significantly correlated with their neighbors within a number of 2n probe sets at P < 0.002 and are called "correlated probe sets." To determine the appropriate number (2n) of neighboring genes needed to calculate the transcriptome correlation score for each gene of our data set, the total number of genes with a score above the threshold was calculated as a function of 2n (13) for several values ranging from 2 to 34 (Supplementary Fig. S1). Above 2n = 20, this number reaches a plateau. Therefore, 20 neighboring genes were used to calculate the transcriptome correlation score. For each of the 16,215 probe sets used, we calculated the transcriptome correlation score. For each chromosome, we obtained a diagram (the transcriptome correlation map) representing this score for each probe set, organized according to its chromosomal position.

Correlation between Adjacent Groups of Correlated Genes. To determine if there was a correlation between adjacent groups of correlated genes, we analyzed the correlated probe sets by one-way unsupervised hierarchical clustering on samples (17) and by leaving the probe sets organized according to their chromosomal position.

Software Used for Data Analyses. Statistical analysis and all calculations were done using R 1.9.0 (http://www.r-project.org). We used the HKIS software (http://isoft.free.fr/hkis/) to look up gene positions in public databases and for data formatting. We used Java TreeView 1.0.5 (http://jtreeview.sourceforge.net/) to make representations of Eisen clusters (17) obtained with Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/).
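The score and threshold calculation described in this section can be illustrated with a minimal sketch. It is not the authors' R code: the function names (`average_ranks`, `spearman`, `tc_scores`, `permutation_threshold`) are hypothetical, the input is assumed to be a single list of per-gene expression profiles already ordered by chromosomal position, and genes near the ends of the list simply use a truncated window, a boundary rule the text does not specify.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tc_scores(profiles, n=10):
    """Sum, for each gene, its Spearman correlations with the n genes on each side.

    Assumption: windows are truncated at the ends of the list.
    """
    ranked = [average_ranks(p) for p in profiles]
    scores = []
    for i, own in enumerate(ranked):
        lo, hi = max(0, i - n), min(len(ranked), i + n + 1)
        scores.append(sum(pearson(own, ranked[j]) for j in range(lo, hi) if j != i))
    return scores


def permutation_threshold(profiles, n=10, n_random=100, tail=1.0 / 500, seed=0):
    """Score value exceeded by roughly `tail` of the scores in shuffled gene orders."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_random):
        shuffled = list(profiles)
        rng.shuffle(shuffled)  # random gene order destroys positional structure
        pooled.extend(tc_scores(shuffled, n))
    pooled.sort()
    cut = min(int(len(pooled) * (1.0 - tail)), len(pooled) - 1)
    return pooled[cut]
```

With the settings reported in the text (n = 10, i.e., 2n = 20 neighbors, and 100 random data sets), a gene would be called correlated when its score exceeds the value returned by `permutation_threshold`. Any threshold obtained this way depends on the data set supplied.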

Results

Generation of a Transcriptome Map Based on the Correlated Expression of Neighboring Genes (Transcriptome Correlation Map) in Infiltrating Ductal Breast Carcinoma. The RNA expression data of 130 ductal invasive breast carcinomas were obtained using the Affymetrix U133 set. Of the ~45,000 probe sets in the initial set, 16,215 probe sets, each corresponding to a unique gene, were kept to establish chromosomal transcriptome maps (see Materials and Methods). These maps highlight genes that show a correlated expression with their neighbors. The transcriptome correlation maps for genes located on chromosomes 1 and 2 are shown in Fig. 1. The significance threshold was defined as the 500th quantile of the


Cancer Research

Figure 1. Transcriptome correlation map of chromosomes 1 and 2 in infiltrating ductal breast carcinoma. The transcriptome correlation score (TC score) for a given gene indicates the strength of the correlation between the expression of this gene and the expression of the 20 neighboring genes (see Materials and Methods). The transcriptome correlation map shows the scores for the different genes as a function of their position along the chromosome. Left, transcriptome correlation maps of chromosome 1 and chromosome 2. Dashed line, significance threshold: genes with scores above this threshold are considered to have a significantly correlated expression with that of their neighbors. This significance threshold was calculated using random data sets (see Materials and Methods). Right, examples of random data sets for chromosome 1 and chromosome 2.

distribution of the resulting 1.6 million transcriptome correlation scores in the random data sets and was equal to 3.38. Examples of a random data set for chromosomes 1 and 2 are given in Fig. 1 (right panels). The transcriptome correlation maps of the different chromosomes, including chromosomes 1 and 2, showed that genes with a transcriptome correlation score above the threshold were not distributed uniformly along the chromosomes: groups of adjacent genes with correlated expression could be seen (Fig. 1 and Supplementary Fig. S2 and Table S2). Interestingly, the genes corresponding to the three well-known amplification regions in breast cancer [8p11-p12 (FGFR1 locus), 11q13 (CCND1 locus), and 17q12 (ERBB2 locus)] were present in chromosomal regions containing genes with transcriptome correlation scores higher than the threshold (Fig. 2).

Large Chromosomal Domains of Co-expressed Genes. Overall, 20% of the genes had a significant transcriptome correlation score (i.e., their expression was correlated with that of their neighbors). The percentage of genes with a significant transcriptome


correlation score differed significantly between the different chromosome arms (Fig. 3). The chromosome arms with the highest percentages (higher than 30%) were 1q (243 of 750 genes), 8p (82 of 207 genes), 8q (185 of 364 genes), 14q (174 of 527 genes), 16p (179 of 379 genes), 16q (110 of 318 genes), 17q (303 of 704 genes), and 20q (101 of 304 genes). Except for chromosome arm 14q, all these chromosome arms are also the most frequent locations of genomic alterations in breast cancer. We analyzed chromosome arms 1q, 8p, 8q, 16p, 16q, and 17q in more detail (Figs. 4–6 and Supplementary Fig. S4). For this analysis, we considered only genes with a transcriptome correlation score above the threshold (Figs. 4A–6A and Supplementary Fig. S4A). The expression patterns of these genes, ordered according to their chromosomal location, were examined in the 130 breast cancer samples by using an unsupervised hierarchical clustering representation (one-way clustering on samples). Very similar expression patterns were obtained for the 243 genes on chromosome 1q with a significant transcriptome correlation score (Fig. 4B). These genes extended from PEX11B


(1q21.1) to ELYS (1q44), spanning a 100,677-kb region. A group of genes with a similar expression pattern was also apparent on the long arm of chromosome 8 (Fig. 5B) and extended to the centromeric region of chromosome 8p [213 genes, spanning 108,140 kb from FLJ14299 (8p12) to KIAA0014 (8q24.3)]. The 8p11-p12 amplified region containing the FGFR1 locus was located at the edge of this domain. A very different expression pattern was observed for the genes telomeric to the FGFR1 locus [54 genes, spanning 25,203 kb from LOC84549 (8p12) to DKFZp761P0423 (8p23.1)]. The 289 genes on chromosome 16 (Supplementary Fig. S4B) with a transcriptome correlation score higher than the threshold corresponded to two different sets of genes. One was a group of 110 genes all localized on 16q and spanning 43,127 kb from VPS35 (16q11.2) to FANCA (16q24.3), and the second was a group of 179 genes all localized on 16p, spanning 30,901 kb from BCKDK (16p11.2) to RGS11 (16p13.3). In contrast, numerous different expression patterns were observed on chromosome 17q (Fig. 6B). At least six different sets of genes were found: a group of 36 genes (a) spanning 4,034 kb from TNFAIP1 (17q11.2) to ZNF207 (17q11.2), a group of 37 genes (b) spanning 4,110 kb from FLJ22865 (17q12) to SMARCE1 (17q21.2), a group of 64 genes (c) spanning 7,045 kb from KRTAP4-15 (17q21.2) to SCAP1 (17q21.32), a group of four genes (d) spanning 78 kb from HOXB2 (17q21.32) to HOXB7 (17q21.32), a group of 74 genes (e) spanning 19,006 kb from

KIAA0924 (17q21.32) to LOC51321 (17q24.2), and a group of 88 genes (f) spanning 14,219 kb from SLC16A6 (17q24.2) to MGC4368 (17q25.3) (Fig. 6).
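The per-arm percentages quoted in this section follow directly from the reported counts. The short check below recomputes them; the counts are copied from the text, while the variable and function names are illustrative only.

```python
# (correlated genes, genes considered) per chromosome arm, as reported in the text
arm_counts = {
    "1q": (243, 750), "8p": (82, 207), "8q": (185, 364), "14q": (174, 527),
    "16p": (179, 379), "16q": (110, 318), "17q": (303, 704), "20q": (101, 304),
}


def percent_correlated(counts):
    """Percentage of genes above the threshold for each arm."""
    return {arm: 100.0 * k / total for arm, (k, total) in counts.items()}


percentages = percent_correlated(arm_counts)

# Arms sorted from the highest to the lowest percentage of correlated genes
ranked_arms = sorted(percentages, key=percentages.get, reverse=True)

# Sizes of the six 17q groups (a to f) reported in the text
groups_17q = {"a": 36, "b": 37, "c": 64, "d": 4, "e": 74, "f": 88}

for arm in ranked_arms:
    print(f"{arm}: {percentages[arm]:.1f}%")
print("17q groups total:", sum(groups_17q.values()))
```

All eight arms come out above 30%, and the six 17q groups add up to the 303 correlated genes reported for that arm.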

Discussion

Microarray technology makes it possible to monitor the expression of thousands of genes simultaneously. Completion of the initial working draft of the human genome has made it possible to analyze the expression of the genes according to their position on the genome. Comparison of the expression patterns of adjacent genes in different expression array data sets in yeast (12, 18), Drosophila (13), and worms (19) has led to the identification of groups of physically adjacent genes that share similar expression profiles. In this work, we used this new approach to analyze large-scale transcriptome data concerning cancer. To identify systematically co-expressed adjacent genes along the chromosomes, we calculated an expression correlation score for each given gene. This score was the sum of the correlation values between the RNA expression levels of this gene and the RNA expression levels of each of the 20 neighboring genes (10 centromeric genes and 10 telomeric genes). Using the U133 Affymetrix transcriptome data for a series of 130 invasive ductal

Figure 2. Transcriptome correlation maps of regions localized on chromosomes 8, 11, and 17 containing known amplicons in infiltrating ductal breast carcinoma. The transcriptome correlation maps of chromosome 8 (region p21.2 to q11.21; left), chromosome 11 (region q11 to q14.1; middle), and chromosome 17 (region q11.2 to q21.31; right) are shown. Vertical bars, position of the centromeres. Bold squares, transcriptome correlation scores (TC score) for FGFR1, CCND1, and ERBB2.


Figure 3. Percentage of genes with a transcriptome correlation score (TC score) higher than the threshold for each chromosomal arm in infiltrating ductal breast carcinoma. For each chromosomal arm, the number of genes with a transcriptome correlation score higher than the significance threshold was divided by the total number of genes considered on that arm and expressed as a percentage.

breast carcinomas, we constructed a genome-wide map that we called the transcriptome correlation map. This map highlights the "correlated genes" (i.e., genes that share a similar expression profile with their neighbors). We have found in this series of breast tumors that ~20% of the genes showed a significant correlation with their neighbors. The transcription correlation maps of the different chromosomes revealed regions with a high percentage of correlated genes separated by regions containing genes not significantly correlated with their neighbors. The precise physical definition of the correlated regions was not always straightforward as a continuum between the correlated regions was observed in some cases. We addressed the effect of the gene densities on the percentage of correlated genes in the transcription correlation map. Unlike for the regions of increased gene expression described by Caron et al. (7), we observed no systematic correlation between gene density and the percentage of genes that are over the threshold on the transcription correlation map (Supplementary Fig. S3). Genetic or nongenetic mechanisms could account for the correlation in expression between neighboring genes. Aneusomy will affect in the same way the expression of all adjacent genes in the altered region that are not subjected to gene dosage compensation. This has been described both for DNA losses and gains in yeast (20) and in humans (10, 11, 21, 22). Nongenetic mechanisms could also affect the expression of neighboring genes. Several models or combinations of models have been proposed: long-range effect of transcription factors, chromatin structure modification, and increased concentration of components of the transcriptional machinery due to a particular subnuclear location of chromosomal segments (23).
Genomic alterations most likely explain part of the correlation between neighboring genes in breast tumors because, except for chromosome arm 14q, the chromosome arms presenting the highest percentage of correlated genes (8q, 51%; 16p, 47%; 17q, 43%; 8p, 40%; 16q, 35%; 20q, 33%; 1q, 32%) were also known to harbor frequent chromosome imbalance, as shown by karyotypic, comparative genomic hybridization, or loss of heterozygosity studies (24–30). Two well-characterized amplicons, the FGFR1 amplicon at 8p11-p12 and the ERBB2 amplicon at 17q12, corresponded to regions presenting a high percentage of correlated genes. Very recent data on breast cancer cell lines


suggest that FGFR1 is not the driving oncogene in the 8p11-p12 amplicon. Several new candidate oncogenes located in this region have been suggested to play a causal role in breast cancer progression in tumors with an amplification in the 8p11-p12

Figure 4. Chromosomal domains of co-expressed genes on chromosome 1q. A, transcriptome correlation map of chromosome 1. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes on the long arm of chromosome 1 with a significant transcriptome correlation score (TC score). Each row corresponds to one tumor sample and each column corresponds to one gene. The genes are arranged in cytogenetic order from the centromere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a correspond to the genes contained in rectangle a of A.


region. Of the 15 genes with a significant transcriptome correlation score from this region, nine (HTPAP, KIAA0725, FGFR1, BAG4, LSM1, RCP, BRF2, PROSC, and FLJ14299) have already been proposed to be potential oncogenes (31). The ERBB2 region at 17q12 is of particular interest in breast cancer. Amplification and overexpression of ERBB2 are associated with poor clinical outcome and are found in 15% of infiltrating breast carcinomas. Moreover, ERBB2 gene amplification determines the response to specific antibody-based therapy (trastuzumab; ref. 32). All seven of the genes located in the minimal ERBB2 amplification region (280 kb) that presented an overexpression associated with amplification (STARD3, PNMT, TCAP, CAB2, ERBB2, MGC14832, and GRB7; ref. 33) corresponded to correlated genes in the 17q12 region. The 11q13 region is often amplified in breast cancer, and CCND1 was initially thought to be the main oncogene in this region. Although CCND1 was one of the correlated genes, many other genes in this region had similar or higher transcriptome correlation scores, possibly because multiple mechanisms for increasing the expression of CCND1 occur frequently in addition to amplification (33) and/or due to the complexity of the 11q13 amplification region, which probably contains several different amplicons (34). It should be noted that within the different regions rich in correlated genes, in particular 11q13 and 17q12, ~50% of the genes were found to be correlated, meaning that the other 50% of genes are not correlated in these regions. This percentage is in good agreement with the results of Hyman et al. (10) and Pollack et al. (11), who showed that ~50% of the genes within an amplified region are overexpressed. We found chromosomal regions harboring genes with correlated expression in regions known to contain amplicons like 8p11-p12, 11q13, and 17q12; in regions exhibiting frequent single chromosome arm gain like 1q, 8q, and 16p; and in regions exhibiting chromosomal losses like 16q. To determine the exact genetic mechanisms responsible for the correlation in expression between adjacent genes, it will be necessary to obtain, in parallel with transcriptome data, genome data like those obtained by comparative genomic hybridization arrays (35). The combination of comparative genomic hybridization array data and transcription correlation map analysis should help us to identify genes involved in tumor progression. Groups of genes with correlated expression were rarer but also present in regions not affected by genetic alterations in breast cancer like chromosome arms 2p or 9q. Our new approach for analyzing the transcriptome in tumors may pinpoint genes involved in tumor progression, the expression of which is altered by nongenetic mechanisms. Using hierarchical clustering and a Treeview representation, we were able to show that regions with genes with similar expression patterns can extend over very large chromosomal

Figure 5. Chromosomal domains of co-expressed genes on chromosome 8. A, transcriptome correlation map of chromosome 8. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes of chromosome 8 with a significant transcriptome correlation score (TC score). Each row, one tumor sample; each column, one gene. The genes are arranged in cytogenetic order from the p telomere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a (or b) correspond to the genes contained in rectangle a (or b) of A.


Figure 6. Chromosomal domains of co-expressed genes on chromosome 17q. A, transcriptome correlation map of chromosome 17. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes on the long arm of chromosome 17 with a significant transcriptome correlation score (TC score). Each row corresponds to one tumor sample and each column corresponds to one gene. The genes are arranged in cytogenetic order from the centromere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a (or b, c, d, e, f) correspond to the genes under line a (or b, c, d, e, f) of A.


domains, sometimes involving a whole chromosome arm. This phenomenon could be due to genetic or nongenetic mechanisms. Changes in whole chromosome gene expression due to aneuploidy have already been described in yeast and in humans (20–22). Modification of gene expression involving large chromosomal domains or even entire chromosomes could also be due to epigenetic mechanisms (36). The "transcriptome correlation map" approach has two strong points: (a) although potentially very useful, knowledge about gene expression in normal tissue is not compulsory. This is particularly useful for tumors for which the normal counterparts are difficult to obtain, such as breast or ovarian carcinomas, or when the cellular origin of the tumor is unknown, as for Ewing tumors, synovial sarcomas, or medulloblastomas; (b) it will be possible to compare the transcriptome correlation maps between different groups of samples even if the data for the different groups are obtained on different platforms, because the transcriptome correlation map does not compare gene expression between groups but the correlation of expression between neighboring genes within a group of samples. Additionally, subsets of genes with a significant correlation score could be used to classify tumors into meaningful anatomoclinical groups. It should be noted that relationships between regions of correlated genes can occur not only between adjacent regions but also between regions on different chromosome arms or even different chromosomes (e.g., higher levels of expression of the genes included in the ERBB2 cluster are associated with lower expression

levels of the genes included in the CCND1 cluster). The systematic investigation of such relationships could help to identify combinations of events that occur during tumor progression. Transcriptome correlation maps are a new way of interpreting transcriptome data. Combined with other molecular data (i.e., chromosomal alteration data obtained by comparative genomic hybridization arrays or large-scale methylation analysis; refs. 35, 37, 38), the transcriptome correlation map approach can be applied to any tumor type and will help to identify new genes involved in tumor progression and new mechanisms of gene regulation in tumors.
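One possible way to examine relationships between distant clusters of correlated genes, such as the ERBB2 and CCND1 clusters mentioned above, is to summarize each cluster by a per-sample profile and correlate the two profiles across samples. The sketch below is an assumption, not a method described in the article: the summary statistic (mean of per-gene z-scores), the function names, and the toy numbers are all hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation; returns 0.0 if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def zscores(profile):
    """Center and scale one gene's expression values across samples."""
    n = len(profile)
    mean = sum(profile) / n
    sd = (sum((v - mean) ** 2 for v in profile) / n) ** 0.5
    return [0.0] * n if sd == 0 else [(v - mean) / sd for v in profile]


def cluster_profile(genes):
    """Per-sample mean of the z-scored expression of the genes in one cluster."""
    scaled = [zscores(g) for g in genes]
    return [sum(col) / len(col) for col in zip(*scaled)]


# Hypothetical toy data: two genes per cluster, five tumor samples
cluster_a = [[1, 2, 3, 4, 5], [2, 3, 5, 8, 9]]
cluster_b = [[9, 7, 5, 3, 1], [10, 8, 7, 4, 2]]
rho = spearman(cluster_profile(cluster_a), cluster_profile(cluster_b))
```

A strongly negative `rho` between two cluster profiles would correspond to the kind of inverse relationship described for the ERBB2 and CCND1 clusters. The toy values here are constructed to be perfectly anti-correlated and carry no biological meaning.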

Acknowledgments

Received 7/29/2004; revised 10/13/2004; accepted 12/9/2004.

Grant support: Centre National de la Recherche Scientifique, Institut Curie Breast Cancer program, European FP5 IST HKIS project, and Comité de Paris Ligue Nationale Contre le Cancer (Laboratoire associé); Ligue Nationale Contre le Cancer fellowship (F. Reyal and I. Bernard-Pierrot) and French Ministry of Education and Research fellowship (N. Stransky).

The Institut Curie Breast Cancer Group: Bernard Asselain, Alain Aurias, Emmanuel Barillot, François Campana, Patricia De Cremoux, Olivier Delattre, Veronique Diéras, Jean-Marc Extra, Alain Fourquet, Henri Magdelénat, Martine Meunier, Claude Nos, Thao Palangié, Pierre Pouillart, Marie-France Poupon, François Radvanyi, Xavier Sastre-Garau, Brigitte Sigal-Zafrani, Dominique Stoppa-Lyonnet, Anne Tardivon, Fabienne Thibault, Jean Paul Thiery, and Anne Vincent-Salomon.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

References

1. Perou CM, Sorlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52.
2. Sorlie T, Perou CM, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;98:10869–74.
3. Sorlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003;100:8418–23.
4. van’t Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530–6.
5. van de Vijver MJ, He YD, van’t Veer LJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002;347:1999–2009.
6. Collins FS, Morgan M, Patrinos A. The Human Genome Project: lessons from large-scale biology. Science 2003;300:286–90.
7. Caron H, van Schaik B, van der Mee M, et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 2001;291:1289–92.
8. Zhou Y, Luoh SM, Zhang Y, et al. Genome-wide identification of chromosomal regions of increased tumor expression by transcriptome analysis. Cancer Res 2003;63:5781–4.
9. Fujii T, Dracheva T, Player A, et al. A preliminary transcriptome map of non-small cell lung cancer. Cancer Res 2002;62:3340–6.
10. Hyman E, Kauraniemi P, Hautaniemi S, et al. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 2002;62:6240–5.
11. Pollack JR, Sorlie T, Perou CM, et al. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A 2002;99:12963–8.
12. Cohen BA, Mitra RD, Hughes JD, Church GM. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet 2000;26:183–6.
13. Spellman PT, Rubin GM. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol 2002;1:5.
14. Coombs LM, Pigott D, Proctor A, Eydmann M, Denner J, Knowles MA. Simultaneous isolation of DNA, RNA, and antigenic protein exhibiting kinase activity from small tumor samples using guanidine isothiocyanate. Anal Biochem 1990;188:338–43.
15. Chirgwin JM, Przybyla AE, MacDonald RJ, Rutter WJ. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 1979;18:5294–9.
16. Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res 2002;12:656–64.
17. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998;95:14863–8.
18. Kruglyak S, Tang H. Regulation of adjacent yeast genes. Trends Genet 2000;16:109–11.
19. Lercher MJ, Blumenthal T, Hurst LD. Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res 2003;13:238–43.
20. Hughes TR, Roberts CJ, Dai H, et al. Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet 2000;25:333–7.
21. Virtaneva K, Wright FA, Tanner SM, et al. Expression profiling reveals fundamental biological differences in acute myeloid leukemia with isolated trisomy 8 and normal cytogenetics. Proc Natl Acad Sci U S A 2001;98:1124–9.
22. Phillips JL, Hayward SW, Wang Y, et al. The consequences of chromosomal aneuploidy on gene expression profiles in a cell line model for prostate carcinogenesis. Cancer Res 2001;61:8143–9.
23. Oliver B, Parisi M, Clark D. Gene expression neighborhoods. J Biol 2002;1:4.
24. Tirkkonen M, Tanner M, Karhu R, Kallioniemi A, Isola J, Kallioniemi OP. Molecular cytogenetics of primary breast cancer by CGH. Genes Chromosomes Cancer 1998;21:177–84.
25. Forozan F, Mahlamaki EH, Monni O, et al. Comparative genomic hybridization analysis of 38 breast cancer cell lines: a basis for interpreting complementary DNA microarray data. Cancer Res 2000;60:4519–25.
26. Cingoz S, Altungoz O, Canda T, Saydam S, Aksakoglu G, Sakizli M. DNA copy number changes detected by comparative genomic hybridization and their association with clinicopathologic parameters in breast tumors. Cancer Genet Cytogenet 2003;145:108–14.
27. Janssen EA, Baak JP, Guervos MA, van Diest PJ, Jiwa M, Hermsen MA. In lymph node-negative invasive breast carcinomas, specific chromosomal aberrations are strongly associated with high mitotic activity and predict outcome more accurately than grade, tumour diameter, and oestrogen receptor. J Pathol 2003;201:555–61.
28. Jong YJ, Li LH, Tsou MH, et al. Chromosomal comparative genomic hybridization abnormalities in early- and late-onset human breast cancers: correlation with disease progression and TP53 mutations. Cancer Genet Cytogenet 2004;148:55–65.
29. Kirchweger R, Zeillinger R, Schneeberger C, Speiser P, Louason G, Theillet C. Patterns of allele losses suggest the existence of five distinct regions of LOH on chromosome 17 in breast cancer. Int J Cancer 1994;56:193–9.
30. Dutrillaux B, Gerbault-Seureau M, Zafrani B. Characterization of chromosomal anomalies in human breast cancer. A comparison of 30 paradiploid cases with few chromosome changes. Cancer Genet Cytogenet 1990;49:203–17.
31. Ray ME, Yang ZQ, Albertson D, et al. Genomic and expression analysis of the 8p11-12 amplicon in human breast cancer cell lines. Cancer Res 2004;64:40–7.
32. Nahta R, Hung MC, Esteva FJ. The HER-2-targeting antibodies trastuzumab and pertuzumab synergistically inhibit the survival of breast cancer cells. Cancer Res 2004;64:2343–6.
33. Kauraniemi P, Kuukasjarvi T, Sauter G, Kallioniemi A. Amplification of a 280-kilobase core region at the ERBB2 locus leads to activation of two hypothetical proteins in breast cancer. Am J Pathol 2003;163:1979–84.
34. Ormandy CJ, Musgrove EA, Hui R, Daly RJ, Sutherland RL. Cyclin D1, EMS1 and 11q13 amplification in breast cancer. Breast Cancer Res Treat 2003;78:323–35.
35. Pinkel D, Segraves R, Sudar D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998;20:207–11.
36. Grewal SI, Moazed D. Heterochromatin and epigenetic control of gene expression. Science 2003;301:798–802.
37. Zardo G, Tiirikainen MI, Hong C, et al. Integrated genomic and epigenomic analyses pinpoint biallelic gene inactivation in tumors. Nat Genet 2002;32:453–8.
38. Huang TH, Perry MR, Laux DE. Methylation profiling of CpG islands in human breast cancer cells. Hum Mol Genet 1999;8:459–70.


Visualizing Chromosomes as Transcriptome Correlation Maps: Evidence of Chromosomal Domains Containing Co-expressed Genes−−A Study of 130 Invasive Ductal Breast Carcinomas Fabien Reyal, Nicolas Stransky, Isabelle Bernard-Pierrot, et al. Cancer Res 2005;65:1376-1383.
