Abstract
A common task in research, education, and statistical practice is to present data comprehensively and intuitively in two-dimensional tables. Examples include tables in scientific papers, reports, and the popular press.
The data are often simple enough for users to reorder themselves. In many other cases, however, complex patterns in the data make it hard to find the rearrangement of rows and columns that gives the best readability.
We propose that row and column ordering be regarded as a combinatorial optimization problem and solved using evolutionary computation techniques. Genetic algorithms have already been proposed for this purpose in the literature; this paper proposes, for the first time, the use of estimation of distribution algorithms for table ordering. We also propose alternative representations of the problem that reduce its dimensionality. By learning a selective naive Bayes classifier, we determine how to jointly set the parameters of these algorithms to obtain good table orderings. The experimental examples in this paper use 2D tables.
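To make the combinatorial formulation concrete, the following is a minimal sketch of an estimation of distribution algorithm applied to table ordering. It is not the method of this paper: the quality measure `stress` (sum of squared differences between adjacent cells), the position-wise univariate model, and all names and parameter values (`eda_order`, `pop_size`, `n_select`, `generations`) are assumptions chosen for illustration only.

```python
import random


def stress(table, rows, cols):
    """Assumed criterion: sum of squared differences between vertically
    and horizontally adjacent cells of the reordered table (lower is better)."""
    total = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            if i + 1 < len(rows):
                total += (table[r][c] - table[rows[i + 1]][c]) ** 2
            if j + 1 < len(cols):
                total += (table[r][c] - table[r][cols[j + 1]]) ** 2
    return total


def sample_permutation(weights, rng):
    """Draw a permutation position by position from weights[pos][item],
    restricted to the items that have not been placed yet."""
    remaining = list(range(len(weights)))
    perm = []
    for pos in range(len(weights)):
        w = [weights[pos][item] for item in remaining]
        choice = rng.choices(remaining, weights=w, k=1)[0]
        perm.append(choice)
        remaining.remove(choice)
    return perm


def uniform_model(n):
    """All weights equal to 1, which also acts as smoothing when counts are added."""
    return [[1.0] * n for _ in range(n)]


def eda_order(table, pop_size=60, n_select=20, generations=30, seed=0):
    """Hypothetical univariate EDA over (row permutation, column permutation)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    row_w, col_w = uniform_model(n_rows), uniform_model(n_cols)

    # Start from the original ordering so the result is never worse than it.
    best = (list(range(n_rows)), list(range(n_cols)))
    best_score = stress(table, *best)

    for _ in range(generations):
        # 1. Sample a population from the current probabilistic model.
        population = [
            (sample_permutation(row_w, rng), sample_permutation(col_w, rng))
            for _ in range(pop_size)
        ]
        # 2. Select the individuals with the lowest stress.
        population.sort(key=lambda ind: stress(table, ind[0], ind[1]))
        selected = population[:n_select]
        top_score = stress(table, *selected[0])
        if top_score < best_score:
            best, best_score = selected[0], top_score
        # 3. Re-estimate the position-wise frequencies from the selected set.
        row_w, col_w = uniform_model(n_rows), uniform_model(n_cols)
        for rows, cols in selected:
            for pos, item in enumerate(rows):
                row_w[pos][item] += 1.0
            for pos, item in enumerate(cols):
                col_w[pos][item] += 1.0

    return best, best_score


if __name__ == "__main__":
    example = [[1, 9, 2, 8], [8, 1, 9, 2], [2, 8, 1, 9], [9, 2, 8, 1]]
    (rows, cols), score = eda_order(example)
    print(rows, cols, score)
```

The loop follows the generic sample, select, re-estimate cycle of estimation of distribution algorithms. A univariate model over positions is one of the simplest possible choices and ignores dependencies between positions; it is used here only to keep the sketch short.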
Cite this article
Bengoetxea, E., Larrañaga, P., Bielza, C. et al. Optimal row and column ordering to improve table interpretation using estimation of distribution algorithms. J Heuristics 17, 567–588 (2011). https://doi.org/10.1007/s10732-010-9145-z
DOI: https://doi.org/10.1007/s10732-010-9145-z