Abstract
This paper presents a system for extracting concepts from unstructured Polish texts. Concepts are understood here as n-grams whose words satisfy specific grammatical constraints. Concepts are detected and transformed to their normalized form by rules defined in a language that combines elements of colored and fuzzy Petri nets. We apply a user-friendly method for specifying samples of transformation patterns, which are then compiled into rules. To improve accuracy and performance, we recently introduced rule management mechanisms based on two relations between rules: partial refinement and covering. The implemented methods include filtering with metarules and removal of redundant rules (i.e., those covered by other rules). We report the results of experiments aimed at extracting specific concepts (actions) using a ruleset refactored with the developed rule management techniques.
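To make the idea of removing covered rules more concrete, the following is a minimal sketch and not the system's actual rule language. It assumes, purely for illustration, that a rule is a fixed-length sequence of sets of admissible grammatical tags, and that one rule covers another when it has the same length and admits every tag the other admits at each position. The names `Rule`, `covers`, and `remove_redundant`, as well as the tag labels, are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical rule model: one set of admissible grammatical tags per n-gram position.
Rule = Tuple[FrozenSet[str], ...]


def covers(general: Rule, specific: Rule) -> bool:
    """Assumed covering relation: `general` matches every n-gram `specific` matches."""
    return len(general) == len(specific) and all(
        s <= g for g, s in zip(general, specific)
    )


def remove_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop every rule covered by another rule; of identical rules keep the first."""
    kept: List[Rule] = []
    for i, rule in enumerate(rules):
        redundant = False
        for j, other in enumerate(rules):
            if i == j or not covers(other, rule):
                continue
            # Mutual covering means the rules are equivalent: keep the earlier one.
            if covers(rule, other) and i < j:
                continue
            redundant = True
            break
        if not redundant:
            kept.append(rule)
    return kept


# Illustrative tag labels only; they are not taken from the paper.
rules: List[Rule] = [
    (frozenset({"adj"}), frozenset({"subst"})),
    (frozenset({"adj", "ppas"}), frozenset({"subst"})),
    (frozenset({"subst"}), frozenset({"subst"})),
    (frozenset({"adj", "ppas"}), frozenset({"subst"})),
]
kept = remove_redundant(rules)
```

In this example the first rule is dropped because the second admits a superset of its tags, and the fourth is dropped as a duplicate of the second. The pairwise comparison is quadratic in the number of rules, which is acceptable for a sketch but may need indexing for large rulesets.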
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Szwed, P. (2016). Enhancing Concept Extraction from Polish Texts with Rule Management. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_27
DOI: https://doi.org/10.1007/978-3-319-34099-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science (R0)