A semi-structured information semantic annotation method for Web pages

  • Multi-Source Data Understanding (MSDU)
Neural Computing and Applications

Abstract

Web pages contain a large amount of semi-structured information. Annotating this information comprehensively and accurately with uniform semantics increases its value and supports the integration of information across Web sites. Based on the characteristics of semi-structured information on Web pages, a semantic annotation method that combines header recognition with data item classification is proposed. First, a description model is constructed for the domain to be annotated. Second, header recognition is used to annotate the data items on extracted pages. For data items that header recognition fails to annotate, feature vectors are constructed from the feature sets in the domain description model, and each item is annotated with the class predicted by a back-propagation neural network. The proposed method was tested on 19,657 data items from the agricultural product price domain and 8,089 data items from the recruitment information domain. The annotation precision was 97.39% and 95.67%, respectively, and the annotation recall was 95.41% and 95.67%, respectively. These results show that the proposed method can annotate semi-structured information on Web pages accurately and completely.
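The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The synonym table `HEADER_SYNONYMS`, the label names, and the function names are hypothetical. `toy_classifier` is a simple rule standing in for the trained back-propagation network and its feature vectors. The precision and recall definitions (correct over annotated items, and correct over all items) are the conventional ones and are assumed here, not taken from the paper.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical fragment of a domain description model:
# semantic label -> header strings that should map to it.
HEADER_SYNONYMS: Dict[str, List[str]] = {
    "product_name": ["product", "name", "variety"],
    "price": ["price", "average price"],
    "market": ["market", "wholesale market"],
}


def recognize_header(header: Optional[str]) -> Optional[str]:
    """Stage 1: return the label whose synonym list contains the header."""
    if not header:
        return None
    key = header.strip().lower()
    for label, synonyms in HEADER_SYNONYMS.items():
        if key in synonyms:
            return label
    return None


def annotate(
    items: Sequence[Tuple[Optional[str], str]],
    classify: Callable[[str], Optional[str]],
) -> List[Optional[str]]:
    """Annotate (header, value) pairs: header lookup first, classifier second."""
    labels: List[Optional[str]] = []
    for header, value in items:
        label = recognize_header(header)
        if label is None:
            # Stage 2: fall back to a classifier (assumed interface).
            label = classify(value)
        labels.append(label)
    return labels


def precision_recall(
    predicted: Sequence[Optional[str]], gold: Sequence[str]
) -> Tuple[float, float]:
    """Assumed definitions: precision over annotated items, recall over all items."""
    annotated = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in annotated if p == g)
    precision = correct / len(annotated) if annotated else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def toy_classifier(value: str) -> Optional[str]:
    """Hypothetical stand-in for the neural network: numeric values are prices."""
    try:
        float(value)
        return "price"
    except ValueError:
        return None


items = [("Product", "Cabbage"), ("Price", "2.50"), (None, "3.10"), (None, "Xinfadi")]
gold = ["product_name", "price", "price", "market"]
labels = annotate(items, toy_classifier)
precision, recall = precision_recall(labels, gold)
```

In this toy run the last item has no header and is not recognized by the stand-in classifier, so it stays unannotated. That lowers recall while leaving precision unchanged, which shows how the two figures can differ for the same set of data items.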

Acknowledgements

This research was supported by the Key Research and Development Program of Shandong Province—“Research and Demonstration on Accurate Monitoring and Control Technology of Facilities Vegetable Environment” (Grant No. 2017CXGC0201), the Transformation and Popularization Project of Agricultural Scientific and Technological Achievements in Tianjin—“Integrated Application of Core Information Technology for Early Warning, Diagnosis and Prevention of Greenhouse Vegetable Diseases” (Grant No. 201704070) and the “12th Five-Year” National Science and Technology Support Plan Project (Grant No. 2012BAD35B06).

Author information

Corresponding author

Correspondence to Qingling Duan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, L., Wang, T., Liu, Y. et al. A semi-structured information semantic annotation method for Web pages. Neural Comput & Applic 32, 6491–6501 (2020). https://doi.org/10.1007/s00521-018-03999-5

  • DOI: https://doi.org/10.1007/s00521-018-03999-5
