Abstract
The rise of Web 2.0 is signaled by sites such as Flickr, del.icio.us, and YouTube, and social tagging is essential to their success. A typical tagging action involves three components, user, item (e.g., photos in Flickr), and tags (i.e., words or phrases). Analyzing how tags are assigned by certain users to certain items has important implications in helping users search for desired information. In this paper, we develop a dual mining framework to explore tagging behavior. This framework is centered around two opposing measures, similarity and diversity, applied to one or more tagging components, and therefore enables a wide range of analysis scenarios such as characterizing similar users tagging diverse items with similar tags or diverse users tagging similar items with diverse tags. By adopting different concrete measures for similarity and diversity in the framework, we show that a wide range of concrete analysis problems can be defined and they are NP-Complete in general. We design four sets of efficient algorithms for solving many of those problems and demonstrate, through comprehensive experiments over real data, that our algorithms significantly out-perform the exact brute-force approach without compromising analysis result quality.
Similar content being viewed by others
Notes
Since the user and item dimensions share the same characteristics in the dual mining framework, we present here only the user dimension for simplicity.
Two tagging action groups are neighbors if they are directly connected in the lattice.
References
Amer-Yahia, S., Huang, J., Yu, C.: Building community-centric information exploration applications on social content sites. In: SIGMOD, pp. 947–952 (2009)
Amer-Yahia, S., Huang, J., Yu, C.: Jelly: a language for building community-centric information exploration applications. In: ICDE, pp. 1588–1594 (2009)
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Böhm, C., Kailing, K., Kröger, P., Zimek, A.: Computing clusters of correlation connected objects. In: SIGMOD, pp. 455–466 (2004)
Bundschus, M., Yu, S., Tresp, V., Rettinger, A., Dejori, M., Kriegel, H.-P.: Hierarchical bayesian models for collaborative tagging systems. In: ICDM, pp. 728–733 (2009)
Charikar, M.: Similarity estimation techniques from rounding algorithms. In: STOC, pp. 380–388 (2002)
Chen, Y., Harper, F.M., Konstan, J.A., Li, S.X.: Social comparisons and contributions to online communities: a field experiment on movielens. In: Computational Social Systems and the Internet (2007)
Das, M., Amer-Yahia, S., Das, G., Yu, C.: Mri: meaningful interpretations of collaborative ratings. In: PVLDB, pp. 1063–1074 (2011)
Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling lsh for performance tuning. In: CIKM, pp. 669–678 (2008)
Erkut, E., Baptie, T., Hohenbalken, B.V.: The discrete p-maxian location problem. Comput. OR 17(1), 51–61 (1990)
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB, pp. 518–529 (1999)
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115–1145 (1995)
Golder, S.A., Huberman, B.A.: The structure of collaborative tagging systems. CoRR, abs/cs/0508082 (2005)
Golder, S.A., Huberman, B.A.: Usage patterns of collaborative tagging systems. J. Inf. Sci. 32(2), 198–208 (2006)
Goldfeld, S.M., Quandt, R.E., Trotter, H.F.: Maximization by quadratic hill-climbing. Econ. Soc. 34 (1966)
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)
Guo, Y., Joshi, J.B.D.: Topic-based personalized recommendation for collaborative tagging system. In: HT, pp. 61–66 (2010)
Gwizdka, J.: Of kings, traffic signs and flowers: exploring navigation of tagged documents. In: HT, pp. 167–172 (2010)
Handler, G., Mirchandani, P.: Location on networks: theory and algorithms. MIT Press series in signal processing, optimization, and, control (1979)
Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: CSCW, pp. 241–250 (2000)
Heymann, P., Paepcke, A., Garcia-Molina, H.: Tagging human knowledge. In: WSDM, pp. 51–60 (2010)
Heymann, P., Ramage, D., Garcia-Molina, H.: Social tag prediction. In: SIGIR, pp. 531–538 (2008)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp. 604–613 (1998)
Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011)
Johnson, D.S.: The np-completeness column: an ongoing guide. J. Algorithms 8(3), 438–448 (1987)
Körner, C., Kern, R., Grahsl, H.-P., Strohmaier, M.: Of categorizers and describers: an evaluation of quantitative measures for tagging motivation. In: HT, pp. 157–166 (2010)
Liang, H., Xu, Y., Li, Y., Nayak, R., Tao, X.: Connecting users and items with weighted tags for personalized item recommendations. In: HT, pp. 51–60 (2010)
Liu, K., Fang, B., Zhang, W.: Speak the same language with your friends: augmenting tag recommenders with social relations. In: HT, pp. 45–50 (2010)
Lu, C., Hu, X., Chen, X., Park, J.-R., He, T., Li, Z.: The topic-perspective model for social tagging systems. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pp. 683–692. ACM, New York (2010)
Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe lsh: efficient indexing for high-dimensional similarity search. In: VLDB, pp. 950–961 (2007)
Manning, C.D., Raghavan, P., Schtze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP (1), 331–340 (2009)
Murtagh, F.: A survey of recent advances in hierarchical clustering algorithms. Comput. J. 26(4), 354–359 (1983)
Ramage, D., Dumais, S., Liebling, D.: Characterizing microblogs with topic models. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, AAAI (2010)
Ramage, D., Heymann, P., Manning, C.D., Garcia-Molina, H.: Clustering the tagged web. In: WSDM, pp. 54–63 (2009)
Ramakrishnan, R., Chen, B.-C.: Exploratory mining in cube space. Data Min. Knowl. Discov. 15(1), 29–54 (2007)
Ravi, S.S., Rosenkrantz, D.J., Tayi, G.K.: Facility dispersion problems: Heuristics and special cases (extended abstract). In: WADS, pp. 355–366 (1991)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)
Sathe, G., Sarawagi, S.: Intelligent rollups in multidimensional olap data. In: VLDB, pp. 531–540 (2001)
Sen, S., Lam, S.K., Rashid, A.M., Cosley, D., Frankowski, D., Osterhouse, J., Harper, F.M., Riedl, J.: Tagging, communities, vocabulary, evolution. In: CSCW, pp. 181–190 (2006)
Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proc. IEEE 100(9), 2604–2623 (2012)
Vaughan, D.E., Jacobson, S.H., Kaul, H.: Analyzing the performance of simultaneous generalized hill climbing algorithms. Comput. Opt. Appl. 37, 103–119 (2007)
Venetis, P., Koutrika, G., Garcia-Molina, H.: On the selection of tags for tag clouds. In: WSDM, pp. 835–844 (2011)
Wu, P., Sismanis, Y., Reinwald, B.: Towards keyword-driven analytical processing. In: SIGMOD Conference, pp. 617–628 (2007)
Yu, C., Lakshmanan, L., Amer-Yahia, S.: It takes variety to make a world: diversification in recommender systems. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09, pp. 368–378. ACM, New York (2009)
Zhou, D., Bian, J., Zheng, S., Zha, H., Giles, C.L.: Exploring social annotations for information retrieval. In: WWW, pp. 715–724 (2008)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Das, M., Thirumuruganathan, S., Amer-Yahia, S. et al. An expressive framework and efficient algorithms for the analysis of collaborative tagging. The VLDB Journal 23, 201–226 (2014). https://doi.org/10.1007/s00778-013-0341-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-013-0341-y