A Genetic Programming Approach for Learning Semantic Information Extraction Rules from News

  • Conference paper
Web Information Systems Engineering – WISE 2014 (WISE 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8786)

Abstract

The growing volume of data from news sources, combined with user-specific information needs, has recently led to many proposals for news personalization systems. These systems often process news data automatically into information, relying on underlying knowledge bases that contain concepts and their relations for specific domains. Information extraction rules are frequently used for this purpose, yet they are usually constructed manually. Because it is difficult to efficiently maintain a balance between precision and recall with a manual approach, we present a genetic programming-based approach for automatically learning semantic information extraction rules that extract events from (financial) news. Our evaluation results show that, compared to information extraction rules constructed by expert users, our rules yield a 27% higher F1-measure after the same amount of rule construction time.
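
The abstract reports its result as an F1-measure but does not define the measure on this page. As a minimal sketch, assuming the standard definition (the harmonic mean of precision and recall) and using hypothetical event labels invented purely for illustration, the score for a set of extracted events could be computed as follows. The function name `f1_measure` and the example data are assumptions, not taken from the paper.

```python
def f1_measure(extracted: set, gold: set) -> float:
    """Standard F1: harmonic mean of precision and recall.

    `extracted` holds the events produced by a rule set and `gold`
    holds the manually annotated events (both hypothetical here).
    """
    true_positives = len(extracted & gold)
    if true_positives == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, invented for illustration only.
gold = {"buy:A", "ceo:B", "profit:C", "merge:D"}
extracted = {"buy:A", "ceo:B", "loss:E"}
score = f1_measure(extracted, gold)
```

Because F1 penalizes an imbalance between precision and recall, a rule set that extracts few but correct events, or many but noisy ones, scores lower than one that balances the two, which is the trade-off the abstract describes.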

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

IJntema, W., Hogenboom, F., Frasincar, F., Vandic, D. (2014). A Genetic Programming Approach for Learning Semantic Information Extraction Rules from News. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_32

  • DOI: https://doi.org/10.1007/978-3-319-11749-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11748-5

  • Online ISBN: 978-3-319-11749-2

  • eBook Packages: Computer Science, Computer Science (R0)
