Fast algorithms for online construction of web tag clouds
Introduction
Running web systems and developing web applications are new branches of industry that offer a host of engineering and research challenges. These include classical performance tuning problems (Marszałkowski et al., 2011), novel e-business applications (Lopez-Loces et al., 2016), website layout optimization for good structure and advertisement fit (Marszałkowski and Drozdowski, 2013), content analysis and fast delivery (Kudelka et al., 2014; Marszałkowski et al., 2016), and techniques for content interpretation and exploitation (Spyrou and Mylonas, 2016).
In this paper we analyze the problem of constructing visually acceptable tag clouds for web pages. Tags are phrases that textually represent some set of objects. Tags can be, e.g., words and phrases summarizing the content of a web page or a photograph, labels for best-sellers, or keywords in news, social media or scientific publications. Each tag has a certain importance, which is expressed in relation to other tags. Typically, tag importance is given as a number. A tag cloud is a graphical depiction of the tags projected onto a plane. A key requirement is that tags with high importance should be prominently visible in the tag cloud. Commonly, important tags are simply bigger. An example tag cloud from the Flickr website is shown in Fig. 1. There are various forms of tags and tag clouds, for instance hashtags, data clouds and text clouds. A hashtag does not have to be a proper word or a phrase in some language; it can be any sequence of characters. Hashtags originated from tags and tagging popularized by Twitter. Hashtag was even chosen as the “Word of the Year 2012” by the American Dialect Society (2013). Tag clouds can be built from hashtags as well. Data clouds and text clouds are specialized forms of tag clouds visualizing numerical data or word frequencies. For the rest of the paper we will use the generic terms tag and tag cloud.
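The common convention that important tags are simply bigger can be illustrated with a minimal Python sketch that maps numeric tag weights linearly onto a font-size range. The function name `font_sizes`, the linear scaling, and the 10 to 36 pt bounds are assumptions made only for illustration; they are not the method developed in this paper.

```python
def font_sizes(weights, min_pt=10.0, max_pt=36.0):
    """Map tag weights linearly onto the font-size range [min_pt, max_pt].

    `weights` is a dict {tag: numeric importance}. The linear rule and the
    default bounds are illustrative assumptions, not taken from the paper.
    """
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    span = hi - lo
    sizes = {}
    for tag, w in weights.items():
        # When all weights are equal, place every tag at the midpoint size.
        t = (w - lo) / span if span > 0 else 0.5
        sizes[tag] = min_pt + t * (max_pt - min_pt)
    return sizes


if __name__ == "__main__":
    print(font_sizes({"web": 5, "cloud": 3, "tag": 1}))
```

A linear rule is only one option; logarithmic or rank-based scaling could be substituted inside the loop without changing the interface.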
The first step in tag cloud creation is the preparation of the tags themselves: phrase selection, weighting, clustering, etc. (Fujimura et al., 2008; Lohmann et al., 2015; Nesi et al., 2016; Spyrou and Mylonas, 2016). Methods of digesting the text and extracting the tags belong to the area of text mining and are beyond the scope of this paper. Here it is assumed that the set of tags is given, and their rendering in two dimensions is studied. Tag clouds have been considered in scientific circles for more than 10 years. In the early stage, tag clouds could be managed with direct researcher attention for better applicability and visual results. However, this is not feasible for mass applications of web engineering and user-dedicated content. Therefore, tag cloud construction must be delegated to automatic tools tailored to the capabilities of browser clients. Tag clouds are used by web designers all over the world, often with poor results. The methods of tag cloud construction are often devised ad hoc, resulting in bad aesthetics or low usability. In this paper we analyze this problem and propose new methods that solve it with good aesthetics. A solution of the tag cloud construction problem must address at least three interdisciplinary challenges:
- modeling and algorithmically controlling tag cloud aesthetics,
- constructing the layout of tags by solving a 2D packing problem,
- advanced software engineering which meets soft real-time performance requirements and resource constraints of the browser platform.
The main contributions of the paper are as follows:
- algorithms for building tag clouds are surveyed and their taxonomy is proposed;
- usability studies are analyzed to identify areas of practical tag cloud application;
- the requirements and restrictions of tag clouds for browser exposition are analyzed, resulting in a formulation of the tag cloud construction problem as a combinatorial optimization problem with a dedicated irregular objective function;
- an objective function is invented as a pivotal element linking the optimization process with tag cloud aesthetics;
- rules of typography are used to control aesthetics;
- algorithms solving the problem are implemented, evaluated and selected for the target browser platform;
- a proof of concept is given that the role of a web designer may be automated by advanced optimization algorithms generating web pages online, on the client side.
Overall, we solve the problem of tag cloud construction for web exposition, starting from the analysis of the context and ending with practical algorithms.
The rest of this paper is organized as follows. In Section 2 a general taxonomy of tag clouds is proposed, and the approaches, algorithms, design options and choices taken in the past are surveyed. Requirements for tag clouds in web usage are determined in Section 3. Section 4 provides a formulation of the tag cloud construction problem as a combinatorial problem with constraints and irregular objective functions. Algorithms solving the problem are introduced in Section 5. These algorithms are tested for quality of solutions and conformance with the performance constraints of the browser platform. Results of the computational experiments are outlined in Section 6. The notations used in the paper are summarized in Table 2.
Section snippets
Related work survey
Although tag clouds seem to be a modern invention, their origins can be traced back at least to 1976 (Milgram and Jodelet, 1976). The early history of tag clouds is outlined in Viégas and Wattenberg (2008). Around 2003 they came into wide use on the Internet. In 2006–2009 they became bloated and were overused by many web designers without considering whether they fit the purpose. Consequently, they were criticized and their application declined. Currently, a new generation of tag cloud approaches is
Problem analysis
In this section we discuss the requirements for tag clouds used on the Internet, identify the corresponding 2D packing problem, and finally study the status quo of web browsers as a platform for rendering tag clouds.
Problem formulation
In this section we formulate the tag cloud construction problem (TCCP). The novelty of our approach lies in resorting to the rules of typography used to typeset readable and aesthetic text. We address here the issue of expressing tag cloud visual quality in mathematical terms. Unfortunately, mathematical models for canons of beauty are rare. Still, we will model tag cloud construction as a discrete optimization problem with a particular objective function.
We assume that the set of tags is given.
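To give a feel for the underlying packing problem, the following Python sketch places tag bounding boxes into a strip of fixed width using a classic level-oriented (shelf) heuristic, next-fit decreasing height. It is not the TCCP formulation or any of the algorithms of this paper: the names `Box`, `Placement` and `shelf_pack` are hypothetical, tags are assumed to be axis-aligned rectangles of known size, and the only quantity reported is the total strip height rather than a typography-based objective.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    tag: str
    width: float
    height: float


class Placement(NamedTuple):
    tag: str
    x: float  # left edge inside the strip
    y: float  # top edge of the shelf holding the tag


def shelf_pack(boxes: List[Box], strip_width: float) -> Tuple[List[Placement], float]:
    """Next-fit decreasing-height shelf packing (illustrative sketch).

    Returns the placements and the total height of the strip used.
    """
    placements: List[Placement] = []
    x = 0.0        # next free horizontal position on the current shelf
    shelf_y = 0.0  # vertical offset of the current shelf
    shelf_h = 0.0  # height of the current shelf (its tallest box)
    # Tallest boxes first, so the first box on a shelf defines its height.
    for box in sorted(boxes, key=lambda b: b.height, reverse=True):
        if box.width > strip_width:
            raise ValueError(f"tag {box.tag!r} is wider than the strip")
        if x + box.width > strip_width:
            # The box does not fit: close this shelf and open a new one below.
            shelf_y += shelf_h
            x = 0.0
            shelf_h = 0.0
        shelf_h = max(shelf_h, box.height)
        placements.append(Placement(box.tag, x, shelf_y))
        x += box.width
    return placements, shelf_y + shelf_h


if __name__ == "__main__":
    demo = [Box("web", 60, 30), Box("cloud", 50, 20),
            Box("tag", 40, 20), Box("ui", 30, 10)]
    layout, height = shelf_pack(demo, strip_width=100.0)
    for p in layout:
        print(p)
    print("strip height:", height)
```

A shelf heuristic of this kind runs in O(n log n) time because of the sort, which is one reason such methods are attractive under tight runtime limits; a real objective would additionally score the aesthetics of the resulting layout.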
Algorithms for tag cloud optimization
This section introduces algorithms for constructing tag clouds. All these algorithms must meet the requirement of very light computational demands imposed by the browser platform. Before proceeding to the details of the algorithms, let us explain their position in the tag preparation workflow (see Fig. 5). The tags and their weights are obtained by periodically analyzing the documents or other data sources for the considered field of application. A web designer composes a web page, and in
Evaluation of the algorithms
In this section we assess the performance of the algorithms in solving TCCP. Test instances are introduced first. Then, the desirable objective function is selected. Tuning of the tabu method is outlined next. Finally, we compare the heuristics in terms of solution quality and runtime. The goal is to verify the practical usability of the algorithms on the very demanding browser platform. Unless stated otherwise, all tests were performed on a PC with Intel [email protected] GHz CPU, 32 GB of RAM, Windows
Conclusions and future work
In this paper we analyzed the tag cloud construction problem for websites. While other tag cloud building problems attracted some interest in the past, the website application has seen only a few ad hoc approaches. We formulated tag cloud construction as a 2D strip packing problem with an irregular objective function, using rules of typography to control the aesthetics of the generated clouds. The requirements and restrictions of the field of application force building tag clouds on the client side which
Acknowledgment
The research was partially supported by the FNR (Luxembourg) and NCBiR (Poland), through IShOP project, INTER/POLLUX/13/6466384.
References (42)
- et al. A novel approach for comparing web sites by using MicroGenres. Eng. Appl. Artif. Intell. (2014)
- et al. Backtracking based iterated tabu search for equitable coloring. Eng. Appl. Artif. Intell. (2015)
- et al. Two-dimensional packing problems: A survey. European J. Oper. Res. (2002)
- et al. Optimization of column width in website layout for advertisement fit. European J. Oper. Res. (2013)
- et al. Geographical localization of web domains and organization addresses recognition by employing natural language processing, pattern matching and clustering. Eng. Appl. Artif. Intell. (2016)
- et al. A survey and comparison of guillotine heuristics for the 2D oriented offline strip packing problem. Discrete Optim. (2009)
- et al. A survey on Flickr multimedia research challenges. Eng. Appl. Artif. Intell. (2016)
- et al. Learning and navigating in hypertext: Navigational support by hierarchical menu or tag cloud? Comput. Hum. Behav. (2015)
- American Dialect Society, 2013. Hashtag is the 2012 Word of the Year. http://www.americandialect.org/hashtag-2012...
- et al. Shelf algorithms for two-dimensional packing problems. SIAM J. Comput. (1983)
- Seeing things in the clouds: the effect of visual features on tag cloud selections
- Two-dimensional cutting problem: Basic complexity results and algorithms for irregular shapes. Found. Control Eng.
- The Elements of Typographic Style
- Prefix tag clouds
- Evolving bin packing heuristics with genetic programming
- Morphable word clouds for time-varying text data visualization. IEEE Trans. Vis. Comput. Graphics
- Performance bounds for level-oriented two-dimensional packing algorithms. SIAM J. Comput.
- Context preserving dynamic word cloud visualization
- Glossary of Typesetting Terms