Abstract
Groupization aims to enrich an individual's preferences using data from similar individuals, so that query results can be adapted more effectively to user expectations. In this paper, we aim to optimally identify groups of analysts in a data warehouse. To that end, we study the similarity between the queries recorded in the analytical history. To enhance the quality of the derived groups of analysts, we introduce a new semi-supervised hierarchical clustering method based on ranked constraints, which handles cases where some constraints are more important than others and must be enforced first during the groupization process. We distinguish four axes for group identification: (i) the function exerted, (ii) the responsibilities granted to accomplish goals, (iii) the source of group identification, and (iv) the dynamicity of the discovered groups. Experiments carried out on real log files used for decision-maker groupization in a data warehouse confirm the soundness of our approach. Our findings demonstrate that groupization improves upon personalization for several group types, mainly for function-based groupization and explicitly identified groups.
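To make the idea of clustering under ranked constraints concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that each analyst is summarized by the set of elements appearing in their logged queries, that similarity is measured with the Jaccard coefficient, and that constraints arrive as hypothetical `(rank, kind, a, b)` tuples in which a lower rank means a more important constraint. The names `jaccard`, `cluster_analysts`, `profiles`, `constraints`, and the `threshold` parameter are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of query elements."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_analysts(profiles, constraints, threshold=0.3):
    """Greedy agglomerative grouping under ranked pairwise constraints.

    profiles:    dict analyst -> set of query elements (assumed encoding)
    constraints: list of (rank, kind, a, b), kind in {"must", "cannot"};
                 a lower rank is enforced first and wins on conflict
    threshold:   minimum average similarity required to merge (assumed)
    """
    clusters = [{name} for name in sorted(profiles)]
    forbidden = set()  # accepted cannot-link pairs

    def find(name):
        return next(c for c in clusters if name in c)

    def violates(c1, c2):
        return any(frozenset((x, y)) in forbidden for x in c1 for y in c2)

    def merge(c1, c2):
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)

    # Step 1: apply constraints by rank; a constraint that contradicts
    # an already enforced, more important one is dropped.
    for _, kind, a, b in sorted(constraints):
        c1, c2 = find(a), find(b)
        if c1 is c2:
            continue  # already grouped by a higher-ranked must-link
        if kind == "must" and not violates(c1, c2):
            merge(c1, c2)
        elif kind == "cannot":
            forbidden.add(frozenset((a, b)))

    def avg_sim(c1, c2):
        total = sum(jaccard(profiles[x], profiles[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    # Step 2: average-link merging among pairs that break no constraint.
    while len(clusters) > 1:
        candidates = [
            (avg_sim(c1, c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
            if not violates(c1, c2)
        ]
        if not candidates:
            break
        best, i, j = max(candidates)
        if best < threshold:
            break
        merge(clusters[i], clusters[j])

    return sorted(sorted(c) for c in clusters)


# Toy data: invented analysts and query elements, for illustration only.
profiles = {
    "ana": {"sales", "region", "year"},
    "bob": {"sales", "region", "month"},
    "cho": {"stock", "product"},
    "dan": {"stock", "product", "year"},
}
constraints = [
    (1, "cannot", "ana", "bob"),  # most important: keep apart
    (2, "must", "ana", "bob"),    # contradicts rank 1, so it is dropped
    (3, "must", "bob", "cho"),
]
groups = cluster_analysts(profiles, constraints)
```

In this toy run the rank-1 cannot-link takes precedence over the conflicting rank-2 must-link, which illustrates the case where the most important constraints must be enforced first. Processing constraints before similarity-based merging is a design choice of the sketch, not a claim about the paper's algorithm.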
Notes
The data warehouse is built using the available information at http://www.bvmt.com.tn/publications/?view=cours.
Cite this article
Ben Ahmed, E., Nabli, A. & Gargouri, F. A new semi-supervised hierarchical active clustering based on ranking constraints for analysts groupization. Appl Intell 39, 236–250 (2013). https://doi.org/10.1007/s10489-012-0407-3
DOI: https://doi.org/10.1007/s10489-012-0407-3