
RDFa Based Annotation of Web Pages through Keyphrases Extraction

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2011 (OTM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7045)

Abstract

The goal of the Semantic Web is to create, on a global scale, a linked mesh of information that machines can process easily. Upgrading current Web pages to machine-understandable units of information relies on semantic annotation. A typical semantic annotation process includes three main tasks: (i) identifying an ontology that describes the domain of interest, (ii) discovering the concepts of that ontology in the target Web pages, and (iii) annotating each page with links to Web resources that describe its content. The goal is to support an ontology-aware agent in interpreting the target documents. In this paper, we present an approach to the automatic annotation of Web pages. Exploiting a data reverse engineering technique, our approach is capable of recognizing entities in Web pages, extracting keyphrases from them, and annotating such pages with RDFa tags that map the discovered entities to Linked Data repositories matching the extracted keyphrases. We have implemented the approach and evaluated its accuracy on real e-commerce Web sites.
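
The final annotation step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `annotate_with_rdfa`, the choice of `rdfs:seeAlso` as the linking property, and the DBpedia URIs in the usage note are all assumptions made for illustration, and the keyphrase-to-URI mapping is assumed to have been produced already by some earlier extraction and lookup step. The sketch only shows how matched keyphrases in a text fragment could be wrapped in RDFa `rel`/`resource` attributes.

```python
import html
import re


def annotate_with_rdfa(text, keyphrase_to_uri, rel="rdfs:seeAlso"):
    """Wrap each known keyphrase in a <span> carrying RDFa attributes.

    `keyphrase_to_uri` is a hypothetical mapping from keyphrase to a
    Linked Data URI; `rel` is an assumed choice of linking property.
    """
    if not keyphrase_to_uri:
        return html.escape(text)

    # Try longer phrases first so "digital camera" wins over "camera".
    phrases = sorted(keyphrase_to_uri, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {p.lower(): uri for p, uri in keyphrase_to_uri.items()}

    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the plain text between the previous match and this one.
        parts.append(html.escape(text[pos:match.start()]))
        uri = lookup[match.group(0).lower()]
        parts.append(
            '<span rel="{}" resource="{}">{}</span>'.format(
                html.escape(rel), html.escape(uri), html.escape(match.group(0))
            )
        )
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)
```

For example, calling `annotate_with_rdfa("Buy a digital camera", {"digital camera": "http://dbpedia.org/resource/Digital_camera"})` wraps the phrase in a `<span>` whose `resource` attribute points at the given URI. The enclosing page is assumed to declare or inherit the `rdfs` prefix; how entities are recognized and how keyphrases are matched to Linked Data repositories is outside this sketch.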




© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

De Virgilio, R. (2011). RDFa Based Annotation of Web Pages through Keyphrases Extraction. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2011. OTM 2011. Lecture Notes in Computer Science, vol 7045. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25106-1_18


  • DOI: https://doi.org/10.1007/978-3-642-25106-1_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25105-4

  • Online ISBN: 978-3-642-25106-1

  • eBook Packages: Computer Science (R0)
