Text Mining with Hybrid Biclustering Algorithms

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9693)

Abstract

Text data mining is the process of extracting valuable information from a dataset consisting of text documents. Popular clustering algorithms do not detect the same words appearing in multiple documents; instead, they capture only the general similarity of such documents. This article applies a hybrid biclustering algorithm to text mining of documents collected from Twitter and to symbolic analysis of knowledge spreadsheets. The proposed method automatically reveals words that appear together in multiple texts. The approach is compared with some of the most recognized clustering algorithms and shows the advantage of biclustering over clustering in text mining. Finally, the method is compared with other biclustering methods on a classification task.
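
To make the idea of "words appearing together in multiple texts" concrete, the following is a minimal sketch, not the hybrid biclustering algorithm of the paper. It assumes a binary word-presence model and uses brute-force enumeration, which is only feasible for a handful of documents. The toy corpus and the function names `tokenize` and `cooccurring_word_sets` are hypothetical and introduced here for illustration only.

```python
from itertools import combinations

# Toy corpus (hypothetical; not taken from the paper's Twitter data).
docs = [
    "flight delayed again today",
    "my flight delayed two hours",
    "great coffee this morning",
    "flight delayed great coffee helps",
]


def tokenize(text):
    """Lowercase a document and return its set of whitespace-separated words."""
    return set(text.lower().split())


def cooccurring_word_sets(docs, min_docs=2, min_words=2):
    """Map each shared word set to the indices of all documents containing it.

    Brute force: intersect the vocabularies of every group of at least
    `min_docs` documents, keep intersections with at least `min_words`
    words, and record every document that contains the whole word set.
    Each (word set, documents) pair is an all-ones bicluster of the
    binary document-term matrix.
    """
    bags = [tokenize(d) for d in docs]
    found = {}
    for size in range(min_docs, len(bags) + 1):
        for group in combinations(range(len(bags)), size):
            shared = set.intersection(*(bags[i] for i in group))
            if len(shared) < min_words:
                continue
            rows = tuple(i for i, bag in enumerate(bags) if shared <= bag)
            found[frozenset(shared)] = rows
    return found


if __name__ == "__main__":
    for words, rows in cooccurring_word_sets(docs).items():
        print(sorted(words), "appear together in documents", rows)
```

In this toy corpus the last document shares one word pair with the first two documents and another with the third, so it belongs to two biclusters at once. A hard clustering of the documents would assign it to a single group and would not report which words caused the grouping.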

Notes

  1. twitter.com.

  2. catalog.data.gov/dataset/consumer-complaint-database.

  3. .

  4. membres-lig.imag.fr/grimal/code/XSim.tar.gz.

Acknowledgments

This research was funded by the Polish National Science Center (NCN), grant No. 2013/11/N/ST6/03204. This research was supported in part by PL-Grid Infrastructure.

Author information

Corresponding author

Correspondence to Patryk Orzechowski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Orzechowski, P., Boryczko, K. (2016). Text Mining with Hybrid Biclustering Algorithms. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_9

  • DOI: https://doi.org/10.1007/978-3-319-39384-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39383-4

  • Online ISBN: 978-3-319-39384-1

  • eBook Packages: Computer Science (R0)
