Skip to main content

On the Abstraction of a Categorical Clustering Algorithm

  • Conference paper
  • First Online:
Machine Learning and Data Mining in Pattern Recognition (MLDM 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9729))

Abstract

Despite being one of the most common approach in unsupervised data analysis, a very small literature exists on the formalization of clustering algorithms. This paper proposes a semiring-based methodology, named Feature-Cluster Algebra, which is applied to abstract the representation of a labeled tree structure representing a hierarchical categorical clustering algorithm, named CCTree. The elements of the feature-cluster algebra are called terms. We prove that a specific kind of a term, under some conditions, fully abstracts a labeled tree structure. The abstraction methodology maps the original problem to a new representation by removing unwanted details, which makes it simpler to handle. Moreover, we present a set of relations and functions on the algebraic structure to shape the requirements of a term to represent a CCTree structure. The proposed formal approach can be generalized to other categorical clustering (classification) algorithms in which features play key roles in specifying the clusters (classes).

This research has been supported by Natural Sciences and Engineering Research Council of Canada (NSERC).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Benavides, D., Segura, S., Ruiz-Cortés, A.: Automated analysis of feature models 20 years later: A literature review. Inf. Syst. 35(6), 615–636 (2010)

    Article  Google Scholar 

  2. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data, pp. 25–71. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  3. Caruana, R., Niculescu-Mizil, A.: An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pp. 161–168. ACM, NY (2006)

    Google Scholar 

  4. Giunchiglia, F., Walsh, T.: A theory of abstraction. Artif. Intell. 57(2–3), 323–389 (1992)

    Article  MathSciNet  MATH  Google Scholar 

  5. Gross, J.L., Yellen, J.: Graph Theory and Its Applications. Discrete Mathematics and Its Applications, 2nd edn. Chapman & Hall/CRC (2005)

    Google Scholar 

  6. Hell, P., Nesetil, J.: Graphs and homomorphisms. Oxford lecture series in mathematics and its applications. Oxford University Press, Oxford, New York (2004)

    Book  Google Scholar 

  7. Höfner, P., Khedri, R., Möller, B.: Feature algebra. In: Misra, J., Nipkow, T., Sekerinski, E. (eds.) FM 2006. LNCS, vol. 4085, pp. 300–315. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  8. Höfner, P., Khédri, R., Möller, B.: An algebra of product families. Software and System Modeling 10(2), 161–182 (2011)

    Article  Google Scholar 

  9. Kang, K.C., Kim, S., Lee, J., Kim, K., Shin, E., Huh, M.: Form: A feature-oriented reuse method with domain-specific reference architectures. Ann. Softw. Eng. 5, 143–168 (1998)

    Article  Google Scholar 

  10. Panda, B., Herbach, J.S., Basu, S., Bayardo, R.J.: Planet: Massively parallel learning of tree ensembles with mapreduce. Proc. VLDB Endow. 2(2), 1426–1437 (2009)

    Article  Google Scholar 

  11. Sheikhalishahi, M., Mejri, M., Tawbi, N.: Clustering spam emails into campaigns. In: Library, S.D. (ed.) 1st International Conference on Information Systems Security and Privacy (2015)

    Google Scholar 

  12. Sheikhalishahi, M., Saracino, A., Mejri, M., Tawbi, N., Martinelli, F.: Fast and effective clustering of spam emails based on structural similarity. In: Garcia-Alfaro, J., et al. (eds.) FPS 2015. LNCS, vol. 9482, pp. 195–211. Springer, Heidelberg (2016). doi:10.1007/978-3-319-30303-1_12

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mina Sheikhalishahi .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Sheikhalishahi, M., Mejri, M., Tawbi, N. (2016). On the Abstraction of a Categorical Clustering Algorithm. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science(), vol 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_51

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-41920-6_51

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41919-0

  • Online ISBN: 978-3-319-41920-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics