Fast algorithms for online construction of web tag clouds

https://doi.org/10.1016/j.engappai.2017.06.023

Highlights

  • Construction of tag clouds for web pages is considered.

  • Tag clouds must be built on a volatile and computationally restricted browser platform.

  • Solution involves modeling aesthetics, discrete optimization, software engineering.

  • Tabu search and collections of greedy algorithms are recommended.

Abstract

This paper studies tag cloud construction for web exposition. Constructing a tag cloud requires solving at least three interdisciplinary engineering problems simultaneously: modeling and controlling graphical aesthetics, solving a discrete two-dimensional layout optimization problem, and doing all of this on a computationally constrained browser platform. We analyze the design choices made in earlier tag cloud studies and provide a taxonomy of algorithmic approaches to tag cloud building. Then, the design requirements for tag clouds on websites are defined. We propose to quantify tag cloud aesthetics with a novel objective function based on the rules of typography. Tag cloud construction is formalized as a combinatorial optimization problem with an irregular objective function. A set of algorithms is proposed and evaluated on a collection of tag sets from popular web pages. The methods that meet the constraints of the browser platform are chosen.

Introduction

Running web systems and developing web applications are new branches of industry offering a host of engineering and research challenges. These include, e.g., classical performance tuning problems (Marszałkowski et al., 2011), novel e-business applications (Lopez-Loces et al., 2016), website layout optimization for good structure and advertisement fit (Marszałkowski and Drozdowski, 2013), content analysis and fast delivery (Kudelka et al., 2014; Marszałkowski et al., 2016), and techniques for content interpretation and exploitation (Spyrou and Mylonas, 2016).

In this paper we analyze the problem of constructing visually acceptable tag clouds for web pages. Basically, tags are phrases that textually represent some set of objects. Tags can be, e.g., words and phrases summarizing the content of a web page or a photograph, labels for best-sellers, or keywords in news, social media or scientific publications. Each tag has a certain importance, which is expressed in relation to other tags. Typically, tag importance is given as a number. A tag cloud is a graphical depiction of the tags projected onto a plane. A key requirement is that tags with high importance should be prominently visible in the tag cloud. Commonly, important tags are simply bigger. An example tag cloud from the Flickr website is shown in Fig. 1. There are various forms of tags and tag clouds. For instance, there are hashtags, data clouds and text clouds. A hashtag does not have to be a proper word or a phrase in some language. It can be any sequence of characters. Hashtags originated from tags and tagging popularized by Twitter. Hashtag was even chosen as the “Word of the year 2012” by the American Dialect Society (2013). Tag clouds can be built from hashtags as well. Data clouds and text clouds are specialized forms of tag clouds visualizing numerical data or word frequencies. For the rest of the paper we will use the generic terms tag and tag cloud.
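The convention that more important tags are drawn bigger can be illustrated with a minimal sketch. It assumes a plain linear interpolation from weight to font size; the function name `font_sizes` and the 10 to 36 point range are hypothetical choices made for this illustration and are not taken from the paper, which may use a different mapping.

```python
def font_sizes(weights, min_pt=10.0, max_pt=36.0):
    """Linearly map tag weights onto a font-size range (in points).

    weights: dict mapping tag text -> numeric importance.
    min_pt, max_pt: assumed size range, chosen only for illustration.
    """
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All tags are equally important: use the middle of the range.
        mid = (min_pt + max_pt) / 2.0
        return {tag: mid for tag in weights}
    span = max_pt - min_pt
    # The least important tag gets min_pt, the most important gets max_pt.
    return {tag: min_pt + (w - lo) / (hi - lo) * span
            for tag, w in weights.items()}


sizes = font_sizes({"python": 120, "web": 60, "cloud": 30})
for tag, size in sizes.items():
    print(f"{tag}: {size:.1f} pt")
```

A linear map is only one option; with heavily skewed weights a logarithmic or rank-based map may spread the sizes more evenly.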

The first step in tag cloud creation is the preparation of the tags themselves: phrase selection, weighting, clustering, etc. (Fujimura et al., 2008; Lohmann et al., 2015; Nesi et al., 2016; Spyrou and Mylonas, 2016). Methods of digesting the text and extracting the tags belong to the text mining area and are beyond the scope of this paper. Here it is assumed that the set of tags is given, and their rendering in two dimensions is studied. Tag clouds have been considered in scientific circles for more than 10 years. In the early stage, tag clouds could be tuned by hand, with direct researcher attention, for better applicability and visual results. However, this is not possible in mass web engineering applications with user-dedicated content. Therefore, tag cloud construction must be delegated to automatic tools tailored to the capabilities of browser clients. Tag clouds are used by web designers all over the world, often with poor results. The methods of tag cloud construction are often devised ad hoc, resulting in bad aesthetics or low usability. In this paper we analyze this problem and propose new methods that solve it with good aesthetics. A solution of the tag cloud construction problem must address at least three interdisciplinary challenges:

  • modeling and algorithmically controlling tag cloud aesthetics,

  • constructing the layout of tags by solving a 2D packing problem,

  • advanced software engineering which meets soft real-time performance requirements and resource constraints of the browser platform.

The main contributions of the paper are as follows:

  • algorithms for building tag clouds are surveyed and their taxonomy is proposed;

  • usability studies are analyzed to identify areas of practical tag cloud application;

  • the requirements and restrictions of tag clouds for browser exposition are analyzed resulting in formulation of the tag cloud construction problem as a combinatorial optimization problem with a dedicated irregular objective function;

  • the invention of the objective function as a pivotal element linking the optimization process with tag cloud aesthetics;

  • using rules of typography to control aesthetics;

  • algorithms solving the problem are implemented, evaluated and selected for the target browser platform;

  • proof of concept that the role of a web designer may be automated by advanced optimization algorithms generating web pages online, on the client side.

Overall, we solve the problem of tag cloud construction for web exposition, from the analysis of the context to the provision of practical algorithms.

The rest of this paper is organized as follows. In Section 2 a taxonomy of tag clouds in general is proposed. Approaches, algorithms, design options and the choices made in the past are surveyed. Requirements for tag clouds in web usage are determined in Section 3. Section 4 provides a formulation of the tag cloud construction problem as a combinatorial problem with constraints and irregular objective functions. Algorithms solving the problem are introduced in Section 5. These algorithms are tested for quality of solutions and conformance with the performance constraints of the browser platform. Results of the computational experiments are outlined in Section 6. The notation used in the paper is summarized in Table 2.

Section snippets

Related work survey

Although tag clouds seem to be a modern invention, their origins can be traced back at least to 1976 (Milgram and Jodelet, 1976). Early tag clouds history is outlined in Viégas and Wattenberg (2008). Around 2003 they gained a wide usage over the Internet. In 2006–2009 they became bloated, overused by many web-designers without considering whether they fit the purpose. Consequently, they were criticized and their application declined. Currently, a new generation of tag cloud approaches is

Problem analysis

In this section we discuss the requirements for tag clouds used on the Internet, identify the corresponding 2D packing problem, and finally study the status quo of web browsers as a platform for rendering tag clouds.
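To make the 2D packing view concrete, the following is a minimal sketch of one classic greedy heuristic for strip packing, next-fit decreasing-height shelf packing. It is shown only as an illustration of the problem class and is not claimed to be one of the algorithms evaluated in the paper. The function name `shelf_pack`, the representation of a tag as a `(tag, width, height)` bounding box, and the pixel units are assumptions made for this example.

```python
def shelf_pack(boxes, strip_width):
    """Next-fit decreasing-height shelf packing of tag bounding boxes.

    boxes: list of (tag, width, height) tuples (hypothetical representation).
    strip_width: fixed width of the strip; its height is what grows.
    Returns (placements, total_height), where placements maps each tag to
    the (x, y) position of the top-left corner of its box.
    """
    placements = {}
    x = 0.0        # next free horizontal position on the current shelf
    shelf_y = 0.0  # top edge of the current shelf
    shelf_h = 0.0  # height of the current shelf (its tallest box so far)
    # Tallest boxes first, so each shelf height is set by its first box.
    for tag, w, h in sorted(boxes, key=lambda b: b[2], reverse=True):
        if w > strip_width:
            raise ValueError(f"tag {tag!r} is wider than the strip")
        if x + w > strip_width:
            # The box does not fit on this shelf: open a new shelf below.
            shelf_y += shelf_h
            x, shelf_h = 0.0, 0.0
        placements[tag] = (x, shelf_y)
        x += w
        shelf_h = max(shelf_h, h)
    return placements, shelf_y + shelf_h


layout, height = shelf_pack(
    [("a", 60, 30), ("b", 50, 20), ("c", 40, 20), ("d", 30, 10)],
    strip_width=100,
)
print(layout, height)
```

Shelf heuristics run in O(n log n) time because of the initial sort, which is one reason they are attractive when the layout must be computed quickly on the client. They optimize only the strip height, so they say nothing about the aesthetic criteria discussed in this paper.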

Problem formulation

In this section we formulate the tag cloud construction problem (TCCP). The novelty of our approach lies in resorting to the rules of typography used to typeset readable and aesthetic text. We address here the issue of expressing tag cloud visual quality in mathematical terms. Unfortunately, mathematical models for canons of beauty are rare. Still, we will model tag cloud construction as a discrete optimization problem with a particular objective function.

We assume that a set of tags $T=\{t_1,\ldots,t_n\}$ is given.

Algorithms for tag cloud optimization

In this section algorithms constructing tag clouds are introduced. All these algorithms must meet the requirement of very light computational demands imposed by the browser platform. Before proceeding to the details of the algorithms let us explain their position in the tag preparation workflow (see Fig. 5). The tags and their weights are obtained by periodically analyzing the documents, or other data sources for the considered field of application. A web designer composes a web page, and in

Evaluation of the algorithms

In this section we assess the performance of the algorithms in solving TCCP. Test instances are introduced first. Then, the desirable objective function is selected. Tuning of the tabu method is outlined next. Finally, we compare the performance of the heuristics in quality and runtime. The goal is to verify the practical usability of the algorithms on the very demanding browser platform. Unless stated otherwise, all tests were performed on a PC with Intel [email protected] GHz CPU, 32GB of RAM, Windows

Conclusions and future work

In this paper we analyzed the tag cloud construction problem for websites. While other tag cloud building problems have attracted some interest in the past, the website application has seen only a few ad hoc approaches. We formulated the tag cloud construction as a 2D strip packing problem with an irregular objective function, using rules of typography to control the aesthetics of the generated clouds. The requirements and restrictions of the field of application force building tag clouds on the client side which

Acknowledgment

The research was partially supported by the FNR (Luxembourg) and NCBiR (Poland), through IShOP project, INTER/POLLUX/13/6466384.

References (42)

  • Bateman, S. et al. Seeing things in the clouds: the effect of visual features on tag cloud selections

  • Błazewicz, J. et al. Two-dimensional cutting problem: Basic complexity results and algorithms for irregular shapes. Found. Control Eng. (1989)

  • Bringhurst, R. The Elements of Typographic Style (1996)

  • Burch, M. et al. Prefix tag clouds

  • Burke, E.K. et al. Evolving bin packing heuristics with genetic programming

  • Cheng, C., Angustia, T., Ching, M.H., Cristobal, C.A., Gabuyo, G.M., 2014. Synonym based tag cloud generation. In: DLSU...

  • Chi, M.T. et al. Morphable word clouds for time-varying text data visualization. IEEE Trans. Vis. Comput. Graphics (2015)

  • Coffman, E.G. et al. Performance bounds for level-oriented two-dimensional packing algorithms. SIAM J. Comput. (1980)

  • Cui, W. et al. Context preserving dynamic word cloud visualization

  • Eckersley, R. et al. Glossary of Typesetting Terms (2008)

  • Fenn, J., Raskino, M., 2008. Mastering the hype cycle. How to choose the right innovation at the right time....