DOI: 10.1145/1860559.1860624
poster

On Helmholtz's principle for documents processing

Published: 21 September 2010

Abstract

Keyword extraction is a fundamental problem in text data mining and document processing. A large number of document processing applications depend directly on the quality and speed of keyword extraction algorithms. In this article, a novel approach to rapid change detection in data streams and documents is developed. It is based on ideas from image processing and especially on the Helmholtz Principle from the Gestalt Theory of human perception. Applied to the problem of keyword extraction, it delivers fast and effective tools to identify meaningful keywords using parameter-free methods. We also define a level of meaningfulness of the keywords, which can be used to modify the set of keywords depending on application needs.
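The abstract does not spell out the meaningfulness measure, so the following is only a minimal sketch of how a Helmholtz-style test could look, not the authors' implementation. It assumes the usual "number of false alarms" (NFA) count from the a-contrario setting: if a word occurs K times in a corpus of N documents of roughly equal size, the expected number of documents containing it at least m times by pure chance is taken as C(K, m) / N^(m-1). A word is then kept when that expectation falls below a threshold. The function names, the whitespace tokenizer, the equal-size assumption, and the `epsilon` threshold are all illustrative choices.

```python
import math
from collections import Counter


def nfa(m: int, K: int, N: int) -> float:
    """Assumed number-of-false-alarms score: C(K, m) / N**(m - 1).

    m: occurrences of the word in the document under test
    K: occurrences of the word in the whole corpus
    N: number of (roughly equal-sized) documents in the corpus
    """
    return math.comb(K, m) / N ** (m - 1)


def meaningful_words(docs, epsilon=1.0):
    """For each document, map word -> meaning score for words with NFA < epsilon.

    The score -log(NFA) / m is an assumed 'level of meaningfulness':
    larger values mean the observed burst is less likely to be chance.
    """
    # Naive tokenization (lowercase + whitespace split), for illustration only.
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    corpus_counts = Counter(w for doc in tokenized for w in doc)

    result = []
    for doc in tokenized:
        kept = {}
        for word, m in Counter(doc).items():
            score = nfa(m, corpus_counts[word], N)
            if score < epsilon:
                kept[word] = -math.log(score) / m
        result.append(kept)
    return result


docs = [
    "alpha alpha alpha beta",
    "beta gamma",
    "beta delta",
    "beta epsilon",
]
keywords = meaningful_words(docs)
```

With this toy corpus only `alpha` is flagged in the first document: it is concentrated in one place, whereas `beta` is spread evenly across all four. A word seen once in a document can never pass the test under this formula, since its NFA equals K, which is at least 1. Raising or lowering `epsilon` is one way to grow or shrink the keyword set for a given application.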

    Published In

    DocEng '10: Proceedings of the 10th ACM symposium on Document engineering
    September 2010
    298 pages
    ISBN:9781450302319
    DOI:10.1145/1860559

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. gestalt
    2. helmholtz principle
    3. keyword extraction
    4. meaningful words
    5. rapid change detection

    Qualifiers

    • Poster

    Conference

    DocEng2010
    Sponsor:
    DocEng2010: ACM Symposium on Document Engineering
    September 21 - 24, 2010
    Manchester, United Kingdom

    Acceptance Rates

    DocEng '10 Paper Acceptance Rate 13 of 42 submissions, 31%;
    Overall Acceptance Rate 194 of 564 submissions, 34%

    Cited By

    • (2024) Die Informatik und die Krise. Informatik Spektrum. DOI: 10.1007/s00287-024-01567-x. Online publication date: 3-Jun-2024
    • (2018) İngilizce Dokümanlarda Tema ve Alt Kavramlar Tespit Modeli (Topic and Sub-Topic Detection Model in English Documents). Düzce Üniversitesi Bilim ve Teknoloji Dergisi 6:4 (754-764). DOI: 10.29130/dubited.420104. Online publication date: 1-Aug-2018
    • (2018) Helmholtz Principle on word embeddings for automatic document segmentation. Proceedings of the ACM Symposium on Document Engineering 2018 (1-4). DOI: 10.1145/3209280.3229103. Online publication date: 28-Aug-2018
    • (2018) Frequent Itemsets as Meaningful Events in Graphs for Summarizing Biomedical Texts. 2018 8th International Conference on Computer and Knowledge Engineering (ICCKE) (135-140). DOI: 10.1109/ICCKE.2018.8566651. Online publication date: Oct-2018
    • (2018) Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artificial Intelligence in Medicine 84 (101-116). DOI: 10.1016/j.artmed.2017.11.004. Online publication date: Jan-2018
    • (2017) Instance labeling in semi-supervised learning with meaning values of words. Engineering Applications of Artificial Intelligence 62:C (152-163). DOI: 10.1016/j.engappai.2017.04.003. Online publication date: 1-Jun-2017
    • (2017) On the Helmholtz Principle for Data Mining. Uncertainty Modeling (15-35). DOI: 10.1007/978-3-319-51052-1_2. Online publication date: 1-Feb-2017
    • (2016) A new hybrid semi-supervised algorithm for text classification with class-based semantics. Knowledge-Based Systems 108:C (50-64). DOI: 10.1016/j.knosys.2016.06.021. Online publication date: 15-Sep-2016
    • (2016) Detecting Unusual Behaviour and Mining Unstructured Data. UK Success Stories in Industrial Mathematics (181-187). DOI: 10.1007/978-3-319-25454-8_23. Online publication date: 5-Feb-2016
    • (2015) A novel classifier based on meaning for text classification. 2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA) (1-5). DOI: 10.1109/INISTA.2015.7276788. Online publication date: Sep-2015
