Enhancing Concept Extraction from Polish Texts with Rule Management

  • Conference paper
Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery (BDAS 2015, BDAS 2016)

Abstract

This paper presents a system for extracting concepts from unstructured Polish texts. Here, concepts are understood as n-grams whose words satisfy specific grammatical constraints. Concepts are detected and transformed to their normalized form by rules defined in a language that combines elements of colored and fuzzy Petri nets. We apply a user-friendly method for specifying samples of transformation patterns, which are then compiled into rules. To improve accuracy and performance, we recently introduced rule management mechanisms based on two relations between rules: partial refinement and covering. The implemented methods include filtering with metarules and removal of redundant rules (i.e., those covered by other rules). We report the results of experiments aimed at extracting specific concepts (actions) using a ruleset refactored with the developed rule management techniques.
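The idea of removing rules covered by other rules can be illustrated with a minimal sketch. It is not the paper's Petri-net-based rule language: the rule representation (one set of required grammatical tags per word of the n-gram), the tag names, and the helpers `rule`, `covers`, and `remove_redundant` are all assumptions made for illustration, and the covering relation is assumed to mean "matches everything the other rule matches".

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation (not from the paper): a rule holds, for each
# word of the n-gram, the set of grammatical tags that word must carry.
Rule = Tuple[FrozenSet[str], ...]


def rule(*positions: str) -> Rule:
    """Build a rule from space-separated tag strings, one per word."""
    return tuple(frozenset(p.split()) for p in positions)


def covers(general: Rule, specific: Rule) -> bool:
    """Assumed covering relation: `general` demands no more tags than
    `specific` at every position, so it matches whatever `specific` matches."""
    return len(general) == len(specific) and all(
        g <= s for g, s in zip(general, specific)
    )


def remove_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop every rule covered by another rule; of identical rules keep the first."""
    kept: List[Rule] = []
    for i, current in enumerate(rules):
        redundant = any(
            covers(other, current) and (other != current or j < i)
            for j, other in enumerate(rules)
            if j != i
        )
        if not redundant:
            kept.append(current)
    return kept


RULES = [
    rule("verb", "noun acc"),  # verb followed by a noun in the accusative
    rule("verb", "noun"),      # verb followed by any noun (more general)
    rule("adj", "noun"),       # adjective followed by a noun
    rule("verb", "noun"),      # exact duplicate of the second rule
]
PRUNED = remove_redundant(RULES)
```

In this sketch the first rule is dropped because the second one covers it, and the duplicate is dropped in favor of its first occurrence, leaving two rules to evaluate against each n-gram.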


Corresponding author

Correspondence to Piotr Szwed.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Szwed, P. (2016). Enhancing Concept Extraction from Polish Texts with Rule Management. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_27

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
