Analyzer: A Framework for File Analysis

Svoboda, Martin; Stárka, Jakub; Sochna, Jan; Schejbal, Jiří; Mlýnková, Irena

doi:10.1007/978-3-642-14589-6_23

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6193)

Abstract

This paper introduces Analyzer, a complete framework for performing statistical analyses of real-world documents. Exploiting the results of such analyses is a classical way to optimize data processing in many areas. Although this intent is legitimate, ad hoc and dedicated analyses soon become obsolete, are usually built on insufficiently extensive collections, and are difficult to repeat. Analyzer is an easily extensible framework that helps the user gather documents, manage analyses, and browse computed reports. The paper discusses the proposed analysis model, standard application usage and features, and the basic aspects of the Analyzer architecture and implementation.
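To make the idea of a repeatable, extensible statistical analysis more concrete, the following is a minimal sketch and not Analyzer's actual API or architecture. Every name in it (`Analysis`, `register`, `run`, `element_count`, `max_depth`) is a hypothetical assumption chosen for illustration. It shows pluggable metrics applied to a small collection of XML documents and averaged into a report.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List
import xml.etree.ElementTree as ET

# Hypothetical types and names; this is not Analyzer's real interface.
Metric = Callable[[str], Dict[str, float]]


def element_count(text: str) -> Dict[str, float]:
    """Count all elements in one XML document, including the root."""
    root = ET.fromstring(text)
    return {"elements": float(sum(1 for _ in root.iter()))}


def max_depth(text: str) -> Dict[str, float]:
    """Compute the depth of the deepest element (the root has depth 1)."""
    def depth(node: ET.Element) -> int:
        return 1 + max((depth(child) for child in node), default=0)

    return {"max_depth": float(depth(ET.fromstring(text)))}


class Analysis:
    """A repeatable analysis: a list of metrics applied to every document."""

    def __init__(self) -> None:
        self.metrics: List[Metric] = []

    def register(self, metric: Metric) -> None:
        # New metrics can be plugged in without touching run().
        self.metrics.append(metric)

    def run(self, documents: Iterable[str]) -> Dict[str, float]:
        """Return the mean of each metric value over the collection."""
        totals: Dict[str, float] = defaultdict(float)
        count = 0
        for doc in documents:
            count += 1
            for metric in self.metrics:
                for name, value in metric(doc).items():
                    totals[name] += value
        if count == 0:
            return {}
        return {name: total / count for name, total in totals.items()}


# Usage: the same analysis can be rerun on any collection of documents.
analysis = Analysis()
analysis.register(element_count)
analysis.register(max_depth)
report = analysis.run(["<a><b/><c><d/></c></a>", "<a/>"])
print(report)
```

Keeping metrics as independent callables is one simple way to get extensibility; a real framework would also need to store documents and reports persistently, which this sketch omits.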

Supported by the Czech Science Foundation (GAČR), grant no. 201/09/P364, and the Ministry of Education of the Czech Republic, grant no. MSM0021620838.

Author information

Authors and Affiliations

Department of Software Engineering, Charles University in Prague, Czech Republic
Martin Svoboda, Jakub Stárka, Jan Sochna, Jiří Schejbal & Irena Mlýnková

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo, 606-8501, Kyoto, Japan
Masatoshi Yoshikawa
Information School, Renmin University of China, 100872, Beijing, China
Xiaofeng Meng
Graduate School of Engineering, University of Hyogo, 2167 Shosha, Himeji, 671-2280, Hyogo, Japan
Takayuki Yumoto
Graduate School of Informatics, Kyoto University, Yoshidahonmachi, Sakyo, 606-8501, Kyoto, Japan
Qiang Ma
Institute of HCI and Media Integration, Tsinghua University, 100084, Beijing, China
Lifeng Sun
Department of Information Science, Ochanomizu University, 2-1-1, Otsuka, Bunkyo-ku, 112-8610, Tokyo, Japan
Chiemi Watanabe

About this paper

Cite this paper

Svoboda, M., Stárka, J., Sochna, J., Schejbal, J., Mlýnková, I. (2010). Analyzer: A Framework for File Analysis. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_23

DOI: https://doi.org/10.1007/978-3-642-14589-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14588-9
Online ISBN: 978-3-642-14589-6
eBook Packages: Computer Science, Computer Science (R0)
