
A methodology for analyzing SAGE libraries for cancer profiling

Published: 01 January 2005

Abstract

Serial Analysis of Gene Expression (SAGE) has proven to be an important alternative to microarray techniques for global profiling of mRNA populations. We have developed preprocessing methodologies to address noise caused by sequencing error in SAGE data, normalization methodologies to account for libraries sampled at different depths, and missing-tag imputation methodologies to aid the analysis of poorly sampled SAGE libraries. We have also used subspace selection based on the Wilcoxon rank sum test to exclude tags that have similar expression levels regardless of source. Using these methodologies, we clustered 88 SAGE libraries derived from cancerous and normal tissues, as well as cell line material, with the OPTICS algorithm. The clustering produced eight dense clusters representing ovarian cancer cell line, brain cancer cell line, brain cancer bulk tissue, prostate tissue, pancreatic cancer, breast cancer cell line, normal brain, and normal breast bulk tissue. The ovarian cancer and brain cancer cell lines clustered closely together, prompting further investigation of possible associations between these two cancer types. We also investigated the utility of gene expression data for classifying normal versus cancerous tissues. Our results indicate that brain and breast cancer libraries have strong identities, allowing robust discrimination from their normal counterparts. However, the SAGE expression data provide poor predictive accuracy in discriminating prostate and ovarian cancers from their respective normal tissues.
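To make two of the steps named in the abstract concrete, the following is a minimal Python sketch of depth normalization and rank-sum-based tag selection. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization formula, so scaling every library to a common total (the `depth` parameter) is assumed here, and the function names, the `z_cutoff` threshold, and the use of a normal approximation without tie correction are all hypothetical choices.

```python
import math
from typing import Dict, List, Sequence


def normalize_library(counts: Dict[str, int], depth: int = 100_000) -> Dict[str, float]:
    """Scale raw tag counts so the library sums to `depth` (assumed scheme)."""
    total = sum(counts.values())
    if total == 0:
        return {tag: 0.0 for tag in counts}
    return {tag: c * depth / total for tag, c in counts.items()}


def midranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(a: Sequence[float], b: Sequence[float]) -> float:
    """Wilcoxon rank-sum z-score (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def select_tags(
    group_a: Sequence[Dict[str, float]],
    group_b: Sequence[Dict[str, float]],
    z_cutoff: float = 1.96,  # hypothetical threshold, not taken from the article
) -> List[str]:
    """Keep tags whose normalized levels differ between the two groups."""
    tags = set().union(*group_a, *group_b)
    kept = []
    for tag in sorted(tags):
        a = [lib.get(tag, 0.0) for lib in group_a]  # absent tag counts as 0
        b = [lib.get(tag, 0.0) for lib in group_b]
        if abs(rank_sum_z(a, b)) >= z_cutoff:
            kept.append(tag)
    return kept
```

In this sketch, a tag with the same level in every library receives a z-score of 0 and is excluded, which mirrors the stated goal of dropping tags that have similar expression levels regardless of source. For real analyses, an established implementation of the rank sum test (with exact p-values or tie correction) would likely be preferable to this hand-rolled approximation.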




Published In

ACM Transactions on Information Systems, Volume 23, Issue 1
January 2005
145 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/1055709

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Gene expression
  2. cancer profiling
  3. classification
  4. clustering
