Part of the book series: Studies in Computational Intelligence (SCI, volume 467)

Abstract

Web pages are usually unstructured, so extracting information from them is not trivial. In this paper we describe the process of Information Extraction using researchers’ home pages as an example. For this purpose we applied support vector machine (SVM), conditional random field (CRF), and Markov logic network (MLN) models. The analysis covers English-language texts only.

Author information

Corresponding author

Correspondence to Piotr Andruszkiewicz.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Andruszkiewicz, P., Nachyła, B. (2013). Automatic Extraction of Profiles from Web Pages. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_25

  • DOI: https://doi.org/10.1007/978-3-642-35647-6_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35646-9

  • Online ISBN: 978-3-642-35647-6

  • eBook Packages: Engineering, Engineering (R0)
