DOI: 10.1145/2934732.2934739
research-article

Automatic classification of web images as UML diagrams

Published: 14 June 2016

Abstract

Our purpose in this research is to develop a methodology to automatically and efficiently classify web images as UML static diagrams, and to produce a computer tool that implements this function. The tool receives as input a bitmap file (in any of several formats) and determines whether the image corresponds to a UML static diagram. The tool does not require the images to be explicitly or implicitly tagged as UML diagrams. It extracts graphical characteristics from each image (such as the grayscale histogram, the color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19,000 web images manually classified by experts. In this work we do not consider the textual contents of the images.
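The abstract describes a two-stage pipeline: compute global graphical features from each bitmap, then apply a combination of induced rules. The sketch below illustrates that idea under stated assumptions; it is not the authors' tool. Assumptions: images arrive already decoded as rows of (R, G, B) tuples (no bitmap-format handling); the three features (`bright_share`, `achromatic_share`, `color_cells`), the 8-bin histogram and the thresholds in `EXAMPLE_RULES` are all hypothetical; detection of elementary geometric forms is omitted; and `best_single_condition` is a simple decision-stump learner standing in for the rule-induction step, not the algorithm used in the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel in 0..255
Image = List[List[Pixel]]      # row-major grid of pixels
Rule = Tuple[str, str, float]  # (feature name, "<=" or ">", threshold)


def grayscale_histogram(img: Image, bins: int = 8) -> List[float]:
    """Normalized histogram of luminance (ITU-R BT.601 weights)."""
    counts = [0] * bins
    total = 0
    for row in img:
        for r, g, b in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance, 0..255
            counts[min(int(y * bins / 256), bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def extract_features(img: Image) -> Dict[str, float]:
    """Hypothetical global features of the kind the abstract mentions."""
    hist = grayscale_histogram(img)
    pixels = [p for row in img for p in row]
    n = len(pixels)
    # Share of near-achromatic pixels (white, black, grays):
    # all three channels lie within 16 levels of each other.
    achromatic = sum(1 for r, g, b in pixels if max(r, g, b) - min(r, g, b) <= 16) / n
    # Crude color-histogram statistic: occupied cells after quantizing
    # each channel to 4 levels (4 x 4 x 4 = 64 cells).
    color_cells = len({(r // 64, g // 64, b // 64) for r, g, b in pixels})
    return {
        "bright_share": hist[-1],  # fraction of pixels in the brightest bin
        "achromatic_share": achromatic,
        "color_cells": float(color_cells),
    }


def holds(features: Dict[str, float], cond: Rule) -> bool:
    """Evaluate one threshold condition against a feature vector."""
    name, op, threshold = cond
    value = features[name]
    return value <= threshold if op == "<=" else value > threshold


# Hypothetical rule set: an image is labeled a diagram if ALL conditions
# of ANY rule hold. Real thresholds would come from rule induction.
EXAMPLE_RULES: List[List[Rule]] = [
    [("bright_share", ">", 0.6), ("achromatic_share", ">", 0.9)],
    [("bright_share", ">", 0.4), ("color_cells", "<=", 6.0)],
]


def is_diagram(features: Dict[str, float], rules: List[List[Rule]] = EXAMPLE_RULES) -> bool:
    return any(all(holds(features, c) for c in rule) for rule in rules)


def best_single_condition(examples: Sequence[Tuple[Dict[str, float], bool]]) -> Optional[Rule]:
    """Decision stump: the single condition with the highest training accuracy."""
    best: Optional[Rule] = None
    best_acc = -1.0
    for name in sorted(examples[0][0]):
        # Candidate thresholds are the feature values observed in the data.
        for threshold in sorted({f[name] for f, _ in examples}):
            for op in ("<=", ">"):
                cond = (name, op, threshold)
                acc = sum(holds(f, cond) == label for f, label in examples) / len(examples)
                if acc > best_acc:
                    best, best_acc = cond, acc
    return best


if __name__ == "__main__":
    # A white canvas with one black row stands in for a line drawing.
    sketch = [[(255, 255, 255)] * 4 for _ in range(3)] + [[(0, 0, 0)] * 4]
    photo = [[(200, 30, 30)] * 4 for _ in range(4)]  # saturated, uniform red
    print(is_diagram(extract_features(sketch)))  # True
    print(is_diagram(extract_features(photo)))   # False
```

Encoding the rule set as a disjunction of conjunctions is one plausible reading of "a combination of rules"; in a real system the hand-picked thresholds would be replaced by rules learned from the manually classified sample.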

Published In

ACM Other conferences
CERI '16: Proceedings of the 4th Spanish Conference on Information Retrieval
June 2016
146 pages
ISBN: 9781450341417
DOI: 10.1145/2934732

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

• University of Granada

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

1. Image processing
2. Rule induction
3. UML diagram recognition

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

CERI '16

Acceptance Rates

CERI '16 paper acceptance rate: 18 of 27 submissions (67%)
Overall acceptance rate: 36 of 51 submissions (71%)

Article Metrics

• Downloads (last 12 months): 6
• Downloads (last 6 weeks): 0
Reflects downloads up to 17 Feb 2025

Cited By

• (2022) Xamã: Optical character recognition for multi-domain model management. Innovations in Systems and Software Engineering 20(3), pp. 225-249. DOI: 10.1007/s11334-022-00453-7. Online publication date: 27-Apr-2022.
• (2021) Classification of UML Diagrams to Support Software Engineering Education. 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pp. 102-107. DOI: 10.1109/ASEW52652.2021.00030. Online publication date: Nov-2021.
• (2020) Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques. Applied Sciences 10(7), 2406. DOI: 10.3390/app10072406. Online publication date: 1-Apr-2020.
• (2020) Exploring the efficacy of transfer learning in mining image-based software artifacts. Journal of Big Data 7(1). DOI: 10.1186/s40537-020-00335-4. Online publication date: 8-Aug-2020.
• (2020) Towards the optical character recognition of DSLs. Proceedings of the 13th ACM SIGPLAN International Conference on Software Language Engineering, pp. 126-132. DOI: 10.1145/3426425.3426937. Online publication date: 16-Nov-2020.
• (2020) Automatic Support for Multi-Domain Model Management. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 830-833. DOI: 10.1109/ICSME46990.2020.00104. Online publication date: Sep-2020.
• (2020) Suitability of Optical Character Recognition (OCR) for Multi-domain Model Management. Systems Modelling and Management, pp. 149-162. DOI: 10.1007/978-3-030-58167-1_11. Online publication date: 17-Oct-2020.
• (2019) Exploring the applicability of low-shot learning in mining software repositories. Journal of Big Data 6(1). DOI: 10.1186/s40537-019-0198-z. Online publication date: 6-May-2019.
• (2019) An LSTM-Based Neural Network Architecture for Model Transformations. 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems (MODELS), pp. 294-299. DOI: 10.1109/MODELS.2019.00013. Online publication date: Sep-2019.