
Analyzer: A Framework for File Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6193)

Abstract

This paper introduces Analyzer, a complete framework for performing statistical analyses of real-world documents. Exploiting the results of such analyses is a classical way to optimize data processing in many areas. Although this intent is legitimate, ad hoc and dedicated analyses soon become obsolete, are usually built on insufficiently extensive collections, and are difficult to repeat. Analyzer is an easily extensible framework that helps the user gather documents, manage analyses, and browse computed reports. The paper discusses the proposed analysis model, standard application usage and features, and the basic aspects of the Analyzer architecture and implementation.

Supported by the Czech Science Foundation (GAČR), grant no. 201/09/P364, and the Ministry of Education of the Czech Republic, grant no. MSM0021620838.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Svoboda, M., Stárka, J., Sochna, J., Schejbal, J., Mlýnková, I. (2010). Analyzer: A Framework for File Analysis. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_23


  • DOI: https://doi.org/10.1007/978-3-642-14589-6_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14588-9

  • Online ISBN: 978-3-642-14589-6

  • eBook Packages: Computer Science, Computer Science (R0)
