
A new semi-supervised hierarchical active clustering based on ranking constraints for analysts groupization

Applied Intelligence

Abstract

Groupization aims to enrich an individual's preferences with data from similar individuals. It can efficiently adapt query results to user expectations. In this paper, we aim to optimally identify groups of analysts in a data warehouse. To this end, we study the similarity between the queries selected in the analytical history. To enhance the quality of the derived groups of analysts, we introduce a new method of semi-supervised hierarchical clustering based on ranked constraints. It handles cases where some constraints are more important than others and must be enforced first during the groupization process. Four axes for group identification are distinguished: (i) the function exerted, (ii) the responsibilities granted to accomplish goals, (iii) the source of group identification, and (iv) the dynamicity of the discovered groups. Experiments carried out on real log files used for decision-maker groupization in a data warehouse confirm the soundness of our approach. Our findings show that groupization improves upon personalization for several group types, mainly for function-based groupization and explicitly identified groups.
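The abstract only outlines the clustering step. The following minimal Python sketch illustrates the general idea of hierarchical clustering under ranked must-link and cannot-link constraints. It is not the method from the paper. It assumes the following: each analyst is represented by a hypothetical set of items used in their queries (measures, dimensions, levels); similarity is Jaccard and merging uses average linkage; rank 1 marks the most important constraint; and a constraint that conflicts with a higher-ranked one is simply dropped. The active (analyst-feedback) part of the approach is not modeled. All function and variable names are illustrative.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of queried items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ranked_constrained_clustering(profiles, constraints, k):
    """Average-linkage agglomerative clustering under ranked constraints.

    profiles: analyst -> set of items used in their queries (hypothetical).
    constraints: (rank, kind, a, b) tuples, kind in {"must", "cannot"};
                 rank 1 is the most important constraint.
    k: number of groups to stop at.
    """
    clusters = {name: {name} for name in profiles}  # cluster id -> members
    owner = {name: name for name in profiles}       # analyst -> cluster id
    cannot = []                                     # cannot-links kept so far

    def violates(c1, c2):
        m1, m2 = clusters[c1], clusters[c2]
        return any((a in m1 and b in m2) or (a in m2 and b in m1)
                   for a, b in cannot)

    def merge(c1, c2):
        clusters[c1] |= clusters.pop(c2)
        for member in clusters[c1]:
            owner[member] = c1

    # Step 1: enforce constraints from most to least important; a constraint
    # that conflicts with a higher-ranked one already enforced is dropped.
    for _rank, kind, a, b in sorted(constraints, key=lambda c: c[0]):
        ca, cb = owner[a], owner[b]
        if kind == "must":
            if ca != cb and not violates(ca, cb):
                merge(ca, cb)
        elif kind == "cannot":
            if ca != cb:  # otherwise a higher-ranked must-link already joined them
                cannot.append((a, b))
        else:
            raise ValueError(f"unknown constraint kind: {kind}")

    def linkage(c1, c2):
        # Mean pairwise Jaccard similarity between the two clusters.
        sims = [jaccard(profiles[x], profiles[y])
                for x in clusters[c1] for y in clusters[c2]]
        return sum(sims) / len(sims)

    # Step 2: greedily merge the most similar pair, never across a cannot-link.
    while len(clusters) > k:
        candidates = [(linkage(c1, c2), c1, c2)
                      for c1, c2 in combinations(sorted(clusters), 2)
                      if not violates(c1, c2)]
        if not candidates:
            break  # every remaining merge would break a kept cannot-link
        _, c1, c2 = max(candidates)
        merge(c1, c2)

    return sorted(sorted(members) for members in clusters.values())


profiles = {
    "a1": {"sales", "region", "month"},
    "a2": {"sales", "region", "year"},
    "a3": {"stock", "price", "day"},
    "a4": {"stock", "price", "month"},
    "a5": {"sales", "price", "day"},
}
constraints = [
    (1, "cannot", "a1", "a5"),
    (2, "must", "a1", "a5"),  # conflicts with rank 1, so it is dropped
    (3, "must", "a3", "a4"),
]
print(ranked_constrained_clustering(profiles, constraints, k=2))
# [['a1', 'a2'], ['a3', 'a4', 'a5']]
```

In this toy run, the rank-1 cannot-link keeps a1 and a5 apart even though a lower-ranked must-link asks to join them. The rank-3 must-link groups a3 and a4 before any similarity-driven merge takes place. Enforcing constraints strictly in rank order before agglomeration is one simple way to make the most important constraints win. Other conflict-resolution policies are equally plausible.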


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Algorithm 1
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13


Notes

  1. The data warehouse is built using the available information at http://www.bvmt.com.tn/publications/?view=cours.

  2. Weka: http://www.cs.waikato.ac.nz/~ml/weka/.


Author information

Correspondence to Eya Ben Ahmed.


About this article

Cite this article

Ben Ahmed, E., Nabli, A. & Gargouri, F. A new semi-supervised hierarchical active clustering based on ranking constraints for analysts groupization. Appl Intell 39, 236–250 (2013). https://doi.org/10.1007/s10489-012-0407-3


  • DOI: https://doi.org/10.1007/s10489-012-0407-3
