Categorizing relational facts from the web with fuzzy rough sets

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Significant advances have been made in automatically constructing knowledge bases of relational facts derived from web corpora. These relational facts are linguistic in nature and are represented as ordered pairs of nouns, such as (Winnipeg, Canada), belonging to a category, such as City_Country. One major problem is that these facts are abundant but mostly unlabeled. Semi-supervised learning approaches have therefore been successful in building knowledge bases: a small number of labeled examples serve as seed (training) instances, and a large number of unlabeled instances are learned iteratively. In this paper, we propose a novel fuzzy rough set-based semi-supervised learning algorithm (FRL) for categorizing relational facts derived from a given corpus. The proposed FRL algorithm is compared with a tolerance rough set-based learner (TPL) and the coupled pattern learner (CPL). The same ontology, derived from a subset of the corpus of the Never-Ending Language Learner (NELL) system, was used in all of the experiments. The experiments demonstrate that the proposed FRL outperforms both TPL and CPL in terms of precision. The paper also addresses the concept drift problem by using mutual exclusion constraints. The contributions of this paper are: (i) the introduction of a formal fuzzy rough model for relations, (ii) a semi-supervised learning algorithm, (iii) an experimental comparison with two other machine learning algorithms, TPL and CPL, and (iv) a novel application of fuzzy rough sets.
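To make the underlying idea concrete, the following minimal sketch computes the standard fuzzy rough lower and upper approximations of a seed set of noun pairs. It is not the FRL algorithm of the paper. The toy contexts, the Jaccard similarity, the minimum t-norm, and the Kleene-Dienes implicator are all assumptions chosen for illustration, and every name in the code is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Pair = Tuple[str, str]

# Hypothetical toy data: each noun pair maps to the contextual patterns
# it co-occurs with. None of this comes from the paper's dataset.
CONTEXTS: Dict[Pair, FrozenSet[str]] = {
    ("Winnipeg", "Canada"): frozenset({"_ is a city in _", "_ , located in _"}),
    ("Toronto", "Canada"): frozenset({"_ is a city in _", "_ , located in _"}),
    ("Lyon", "France"): frozenset({"_ is a city in _"}),
    ("Messi", "Barcelona"): frozenset({"_ plays for _"}),
}

# Crisp seed membership for the category City_Country (assumed seed).
SEEDS: Dict[Pair, float] = {("Winnipeg", "Canada"): 1.0}


def similarity(x: Pair, y: Pair) -> float:
    """Fuzzy relation R(x, y): Jaccard overlap of context sets (an assumption)."""
    a, b = CONTEXTS[x], CONTEXTS[y]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def t_norm(a: float, b: float) -> float:
    """Minimum t-norm."""
    return min(a, b)


def implicator(a: float, b: float) -> float:
    """Kleene-Dienes implicator: max(1 - a, b)."""
    return max(1.0 - a, b)


def upper_approximation(y: Pair, universe: Iterable[Pair],
                        membership: Dict[Pair, float]) -> float:
    """sup over x of T(R(x, y), A(x)): how possibly y belongs to the category."""
    return max(t_norm(similarity(x, y), membership.get(x, 0.0)) for x in universe)


def lower_approximation(y: Pair, universe: Iterable[Pair],
                        membership: Dict[Pair, float]) -> float:
    """inf over x of I(R(x, y), A(x)): how certainly y belongs to the category."""
    return min(implicator(similarity(x, y), membership.get(x, 0.0)) for x in universe)


universe = list(CONTEXTS)
upper = {y: upper_approximation(y, universe, SEEDS) for y in universe}
lower = {y: lower_approximation(y, universe, SEEDS) for y in universe}
```

In this toy setting the upper approximation ranks unlabeled pairs by their similarity to the seed, which is the kind of score an iterative learner could use to promote candidates. The lower approximation of the seed itself is 0 because an unlabeled pair shares exactly the same contexts, so membership cannot be certain until that pair is labeled as well.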

Acknowledgements

Special thanks to Cenker Sengoz for sharing the dataset and for discussions regarding TPL. We are very grateful to Prof. Estevam R. Hruschka Jr. for the NELL dataset and Prof. Andrzej Skowron for helpful suggestions.

Author information

Corresponding author

Correspondence to Sheela Ramanna.

Additional information

This research has been supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant.

About this article

Cite this article

Bharadwaj, A., Ramanna, S. Categorizing relational facts from the web with fuzzy rough sets. Knowl Inf Syst 61, 1695–1713 (2019). https://doi.org/10.1007/s10115-018-1250-6

  • DOI: https://doi.org/10.1007/s10115-018-1250-6
