Learning DMEs from Positive and Negative Examples

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11448)

Abstract

The presence of a schema for XML documents has numerous advantages. Unfortunately, many XML documents in practice are not accompanied by a (valid) schema, so it is essential to devise algorithms that infer schemas from XML documents. The fundamental task in XML schema inference is learning regular expressions. In this paper we consider unordered XML, where the relative order among siblings is ignored, and focus on disjunctive multiplicity expressions (DMEs), a subclass of expressions proposed for unordered XML. Previous work in this direction lacks inference algorithms that support learning DMEs from both positive and negative examples. We provide an algorithm, based on genetic algorithms, that learns DMEs from both positive and negative examples.
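
To make the DME notion concrete, the following Python sketch checks whether an unordered word (a multiset of element labels) matches a DME, and scores a candidate DME against positive and negative examples. It assumes a DME grammar in which an expression is an unordered concatenation of clauses (a1^M1 | ... | an^Mn)^M, with multiplicities M in {1, ?, +, *} and each symbol occurring at most once. The names `Clause`, `dme_accepts`, and `fitness` are hypothetical, and the accuracy-based fitness is only a plausible stand-in for illustration, not the fitness function or genetic operators used in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from math import inf

# Assumed multiplicities, as (min, max) bounds on how many times a part repeats.
MULT = {"1": (1, 1), "?": (0, 1), "+": (1, inf), "*": (0, inf)}


@dataclass
class Clause:
    """One clause (a1^M1 | ... | an^Mn)^M of the assumed DME grammar."""
    alts: dict  # symbol -> inner multiplicity
    mult: str   # outer multiplicity M


def clause_accepts(clause, bag):
    """Can the clause's symbols in `bag` be split into k words of
    L(a1^M1 | ... | an^Mn), with k allowed by the outer multiplicity?"""
    p_min = p_max = 0  # bounds on the number of nonempty parts
    eps = any(MULT[m][0] == 0 for m in clause.alts.values())  # empty word in L(D)?
    for sym, m in clause.alts.items():
        c = bag.get(sym, 0)
        if c == 0:
            continue
        if m in ("1", "?"):
            # every nonempty part holds exactly one occurrence of sym
            p_min += c
            p_max += c
        else:
            # '+' or '*': a part may hold any positive number of occurrences
            p_min += 1
            p_max += c
    hi = inf if eps else p_max  # empty parts can pad k when eps holds
    lo_k, hi_k = MULT[clause.mult]
    return max(p_min, lo_k) <= min(hi, hi_k)


def dme_accepts(clauses, word):
    """Order is ignored: only the label counts of `word` matter."""
    bag = Counter(word)
    symbols = {s for cl in clauses for s in cl.alts}
    if any(s not in symbols for s in bag):
        return False
    return all(clause_accepts(cl, bag) for cl in clauses)


def fitness(clauses, positives, negatives):
    """Hypothetical GA fitness: fraction of examples classified correctly."""
    correct = sum(dme_accepts(clauses, w) for w in positives)
    correct += sum(not dme_accepts(clauses, w) for w in negatives)
    return correct / (len(positives) + len(negatives))


# Usage: E = (a | b)^+ || c^?, with single-character labels for brevity.
E = [Clause({"a": "1", "b": "1"}, "+"), Clause({"c": "1"}, "?")]
loose = [Clause({"a": "*", "b": "*", "c": "*"}, "*")]  # overly general candidate
positives = ["ab", "ba", "aac"]
negatives = ["c", "abd"]
print(fitness(E, positives, negatives))      # 1.0
print(fitness(loose, positives, negatives))  # 0.8
```

The overly general candidate accepts every positive example but also the negative example "c", which is the kind of over-generalization that negative examples let a learner penalize.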

Work supported by the National Natural Science Foundation of China under Grant Nos. 61872339 and 61472405.

Notes

  1. Unordered concatenation can be viewed as a weaker form of interleaving.

Author information

Correspondence to Haiming Chen.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Y., Dong, C., Chu, X., Chen, H. (2019). Learning DMEs from Positive and Negative Examples. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_61

  • DOI: https://doi.org/10.1007/978-3-030-18590-9_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18589-3

  • Online ISBN: 978-3-030-18590-9

  • eBook Packages: Computer Science, Computer Science (R0)