Elsevier

Journal of Systems and Software

Volume 122, December 2016, Pages 93-109

Securing native XML database-driven web applications from XQuery injection vulnerabilities

https://doi.org/10.1016/j.jss.2016.08.094

Highlights

  • Detects XQuery injection vulnerabilities in web applications using native XML DBs.

  • Implements a prototype system “XQueryFuzzer” based on the proposed approach.

  • Demonstrates the effectiveness of the prototype on benchmark web applications.

  • Three types of XQuery injection attacks unlisted in OWASP are identified.

Abstract

Database-driven web applications today are XML-based, as they handle highly diverse information and favor integration of data with other applications. Web applications have become the most popular way to deliver essential services to customers, and the increasing dependency of individuals on web applications makes them an attractive target for adversaries. Adversaries exploit vulnerabilities in database-driven applications to craft injection attacks, which include SQL, XQuery, and XPath injections. A large amount of work has been done on identifying SQL injection vulnerabilities, resulting in several tools for the purpose. However, only limited work has been done so far on identifying XML injection vulnerabilities, and the existing tools identify only those XML injection vulnerabilities that could lead to a specific type of attack. Hence, this work proposes a black-box fuzzing approach to detect different types of XQuery injection vulnerabilities in web applications driven by native XML databases. A prototype, XQueryFuzzer, is developed and tested on various vulnerable applications developed with BaseX as the native XML database. An experimental evaluation demonstrates that the prototype is effective in detecting XQuery injection vulnerabilities. Three new categories of attacks that are specific to XQuery but not listed in OWASP are identified during testing.

Introduction

Extensible Markup Language (XML) is a data representation that favors integration and interoperability between heterogeneous web applications. The information exchanged between applications in the form of XML documents can be processed efficiently when the documents are stored appropriately. These documents are stored in either an extended relational DBMS or a native XML database system (Chaudhri, Zicari, Rashid, 2003, Liu, Murthy, 2009). XQuery/XPath can be used as the query language for retrieving data from XML documents.

A native XML database (NXD) has an XML document as its fundamental unit of storage and defines a logical model based on the content of the XML document (Pavlovic-Lazetic, 2007). NXDs are employed in cases where the data involved do not fit the relational data model but fit the XML data model. NXDs are generally preferred for applications that hold highly diverse information, integrate information from different sets of applications, and handle rapidly evolving schemas. NXDs are also preferred for applications that work with a huge set of documents or large-sized documents (e.g., books, web pages, marketing brochures), and that manage long-running transactions in domains such as finance and pharmaceuticals (Bourret, 2009). Using relational databases and flat file systems to build such applications results in issues such as poor scalability and a lack of structured queries. These issues can be overcome by using NXDs with XQuery/XPath as the query language for processing. Some of the popular NXDs are BaseX, eXistDB, and MarkLogic.

NXDs find applications in a wide variety of domains such as document management systems, healthcare systems, financial applications, business-to-business transaction records, catalog data, and corporate information portals (Staken, 2001). Real-world business applications that employ NXDs to manage their content are Elsevier Science publishers, Las Vegas Sun publishers (Bourret, 2009), the Tibetan Buddhist Resource Center (TBRC) (Siegel and Retter, 2014), etc. The Tasmanian government websites use NXD for helping users to track legislation. NXDs are used to store various other types of documents such as drug information sheets, contracts, case law, and insurance claims. Commerzbank and Hewlett Packard use NXD for integration of information from a variety of sources to handle financial and business transaction data (Bourret, 2009). Healthcare applications prefer to store electronic health records (EHR) in NXDs for efficient storage and retrieval of information from the available medical records (i.e., scan reports, prescriptions, etc.) (de la Torre, Díaz, Antón, Díez, Sainz, López, Hornero, López, 2011, Lee, Tang, Choi, 2013, Al-Hamdani, 2010).

The existing literature reveals a growing demand for the use of NXDs in web applications. Various XML security standards (W3C; Hirsch), such as XML Encryption, XML Digital Signature, and the XML access-control markup language, are defined for preserving the confidentiality, integrity, and access-control mechanisms of XML documents. Nevertheless, when NXDs are used at the backend, any vulnerability in the source code of the application may allow an adversary to perform unwanted actions, resulting in the extraction of information from, or the modification of information in, the documents. As the content of highly sensitive applications in domains such as finance and healthcare is driven by NXDs, the security of NXDs is vitally important to ensure integrity, privacy, and confidentiality (Baviskar and Thilagam, 2011), and to make sure that information is used appropriately (Huang, 2003).

According to the Internet Security Threat Report by Symantec Corporation (Symantec, 2014), one in eight web applications has critical unpatched vulnerabilities. A vulnerability is a coding flaw in the application, and new types of attacks emerge from time to time to exploit these vulnerabilities. The security consortiums OWASP (Top10, 2013), SANS (2011), and WASC (Gordeychik, 2010) list injection vulnerabilities as the most prevalent flaw in web applications. Injection vulnerabilities allow an attacker to compromise the security of the application and to steal confidential information or inject malicious data into the application. Injection attacks such as SQL injection and XML injection exploit these vulnerabilities to extract data from, or insert data into, the database of the application. These attacks attempt to modify the query submitted to the database by providing malicious input to the web application server. If the web application submits SQL queries to a relational database, the attack is referred to as an SQL injection attack. If the web application submits XQuery to an XML database, it is referred to as an XQuery injection attack. The expression syntax of XQuery is a superset of that of XPath: XQuery 1.0 includes XPath 2.0 as a sublanguage. According to the Payment Card Industry Data Security Standard (PCI DSS) and the Common Vulnerability Scoring System (CVSS), XQuery and XPath injections are high-risk threats, and hence detection of the vulnerabilities that could lead to these injection attacks is of critical importance (Gordeychik, 2010).
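To make the mechanism concrete, the following minimal sketch shows how unescaped user input can change the meaning of an XQuery expression. The function `build_login_query`, the document name `users.xml`, and the element names are hypothetical and chosen only for illustration; they are not taken from the applications studied in this paper.

```python
def build_login_query(username: str, password: str) -> str:
    """Hypothetical vulnerable query builder: input is concatenated unescaped."""
    return (
        "for $u in doc('users.xml')/users/user"
        "[name='" + username + "' and password='" + password + "'] "
        "return $u"
    )


# Benign input: the predicate compares both fields against string literals.
benign = build_login_query("alice", "s3cret")

# Malicious input: the quote closes the password literal early, so the rest
# of the input is parsed as part of the predicate instead of as data.
attack = build_login_query("alice", "x' or '1'='1")

print(benign)
print(attack)
```

Because `and` binds more tightly than `or` in XPath, the predicate in the second query reads as `(name='alice' and password='x') or ('1'='1')`, which holds for every `user` element, so the password check is bypassed.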

Even though a large body of literature exists on preventing SQL injection (Huang, Yu, Hang, Tsai, Lee, Kuo, 2004, Huang, Tsai, Lin, Huang, Lee, Kuo, 2005, Halfond, Orso, 2005, Buehrer, Weide, Sivilotti, 2005, Xie, Aiken, 2006, Su, Wassermann, 2006, Kosuga, Kernel, Hanaoka, Hishiyama, Takahama, 2007, Thomas, Williams, 2007, Wassermann, Su, 2007, Liu, Yuan, Wijesekera, Stavrou, 2009, Bisht, Madhusudan, Venkatakrishnan, 2010, Scholte, Robertson, Balzarotti, Kirda, 2012, Lee, Jeong, Yeo, Moon, 2012, Jang, Choi, 2014), only a limited number of articles address the identification of XML injection vulnerabilities in web applications (Huang, Antunes, Laranjeiro, Vieira, Madeira, 2009, Antunes, Vieira, 2011, Asmawi, Affendey, Udzir, Mahmod, 2012). The existing works on SQL injection focus on detection/prevention of known patterns of SQL injection attacks. The proposed solutions for detecting SQL injection have their own pros and cons, which are described as follows. Secure programming imposes overhead on developers, who must implement the security guidelines during development (Bravenboer, Dolstra, Visser, 2007, XPath-Injection, Truelove, Svoboda). The signature-based approach does not prevent zero-day attacks, and suffers from false negatives when an attack query matches the structure of a legitimate query (Huang, Mitropoulos, Karakoidas, Louridas, Spinellis, 2011, Antunes, Vieira, 2011). The knowledge-based approach requires new training whenever modifications are made to the source code of the application (Huang, Tsai, Lin, Huang, Lee, Kuo, 2005, Mitropoulos, Karakoidas, Spinellis, 2009, Rosa, Santin, Malucelli, 2013). The accuracy of the machine learning approach depends on appropriate selection of the training set (Scholte, Robertson, Balzarotti, Kirda, 2012, Valeur, Mutz, Vigna, 2005, Menahem, Schclar, Rokach, Elovici, 2012, Chan, Lee, Heng, 2013).
Existing approaches for addressing XML injection concentrate on detection/prevention of vulnerabilities/attacks in web services only, and cover only certain types of XML injection attacks. Therefore, there is a demand for a system that is capable of detecting different kinds of XQuery injection vulnerabilities in native XML database-driven applications. Hence, this paper focuses on the development of a prototype for the detection of XQuery injection vulnerabilities in web applications.

Taking into account the limitations of existing approaches, a prototype is developed following a query model-based approach for identifying XQuery injection vulnerabilities. The existing query model-based approaches employ static analysis for preventing SQL injection attacks and involve code instrumentation for identifying and preventing the attacks, whereas the proposed approach is a fully automated black-box fuzzer that does not require access to the source code of the application and concentrates on identifying XQuery injection vulnerabilities. To the best of our knowledge, this is the first work that focuses on identifying XQuery injection vulnerabilities in web applications driven by native XML databases. The prototype detects different types of XQuery injection vulnerabilities as specified in the OWASP (2014) guidelines. While the existing approaches detect different kinds of predefined SQL injection attacks, we have identified three new categories of attack vectors which are not listed in OWASP.
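As a rough illustration of the general idea behind a query model check, the sketch below reduces a query to a structural skeleton by masking its string literals, then compares the skeleton obtained with a benign input against the one obtained with a test input. This is only one common way to realize such a check and is an assumption made for illustration, not the model used by the prototype; the names `skeleton`, `structure_changed`, and `build_email_query` are hypothetical.

```python
import re

# XQuery string literals in single or double quotes; a quote inside a
# literal is escaped by doubling it ('' or "").
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'|\"(?:[^\"]|\"\")*\"")


def skeleton(query: str) -> str:
    """Replace every string literal with '?' so only the structure remains."""
    return _STRING_LITERAL.sub("?", query)


def structure_changed(build_query, benign_value: str, test_value: str) -> bool:
    """True if the test input yields a different query structure than the benign one."""
    return skeleton(build_query(benign_value)) != skeleton(build_query(test_value))


def build_email_query(name: str) -> str:
    """Hypothetical vulnerable query builder used only for this sketch."""
    return "doc('users.xml')//user[name='" + name + "']/email"


print(skeleton(build_email_query("alice")))
print(structure_changed(build_email_query, "alice", "bob"))
print(structure_changed(build_email_query, "alice", "x' or '1'='1"))
```

The regular expression is deliberately simple: it does not account for quotes that appear inside XQuery comments, so a fuller treatment would need a proper tokenizer.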

The major contributions of this work are as follows:

  • We propose an automated black-box approach for identifying XQuery injection vulnerabilities in native XML database-driven web applications.

  • We implement a prototype system called XQueryFuzzer based on the proposed approach.

  • We propose an attack grammar to generate different types of XQuery attack strings for identifying vulnerabilities in the application, and prove the consistency and completeness of the grammar.

  • We identify three new categories of attacks, namely alternate encoding, evaluation function, and XQuery comment injection attacks, that are specific to XQuery but not listed in OWASP.

  • We evaluate the effectiveness of the proposed approach using web applications that are customized to use native XML database for storing data.
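To give a feel for how a grammar can drive attack-string generation, the following sketch enumerates every string derivable from a small context-free grammar. The grammar `TOY_GRAMMAR` and the function `expand` are hypothetical and far simpler than the attack grammar proposed in this work; they cover only tautology-style inputs for a single-quoted context.

```python
from itertools import product

# Toy grammar: each nonterminal maps to a list of alternative productions,
# and each production is a sequence of symbols. Any symbol that is not a
# key of the dictionary is treated as a terminal string.
TOY_GRAMMAR = {
    "ATTACK": [["VALUE", "QUOTE", "OP", "COND"]],
    "VALUE": [["x"], [""]],
    "QUOTE": [["'"]],
    "OP": [[" or "], [" and "]],
    "COND": [["'1'='1"], ["'a'!='b"]],
}


def expand(symbol: str, grammar: dict) -> list:
    """Return all terminal strings derivable from symbol (grammar must be non-recursive)."""
    if symbol not in grammar:
        return [symbol]  # terminal: yields itself
    results = []
    for production in grammar[symbol]:
        # Expand each symbol of the production, then join every combination.
        parts = [expand(s, grammar) for s in production]
        results.extend("".join(combo) for combo in product(*parts))
    return results


attack_strings = expand("ATTACK", TOY_GRAMMAR)
for s in attack_strings:
    print(repr(s))
```

Each generated string omits the final closing quote on purpose, because the quote that the application appends after the input closes the last literal.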

The remainder of this paper is organized as follows. Section 2 provides background on XQuery injection vulnerabilities. Section 3 presents the related work. The proposed approach and implementation details are described in Section 4. Section 5 discusses, with examples, the new categories of attacks identified. Section 6 provides information on the experimental setup and discusses the results. The paper is concluded in the last section.

Section snippets

Preliminaries

This section provides a simple example of an XQuery injection attack and discusses its various types.

Related work

This section provides insight into the work done so far on identifying/preventing injection vulnerabilities/attacks that target the database of the web application. With respect to SQL injection, works such as Huang, Tsai, Lin, Huang, Lee, Kuo, 2005, Huang, Yu, Hang, Tsai, Lee, Kuo, 2004; Xie and Aiken (2006); Kosuga et al. (2007); Wassermann and Su (2007) belong to the category of vulnerability detection. Works focusing on SQL injection attack prevention include Halfond and Orso (2005);

Prototype design

The primary objective of this work is to propose an automated approach and develop a prototype system XQueryFuzzer for detecting XQuery injection vulnerabilities based on OWASP (2014) guidelines in native XML database-driven web applications. The prototype uses a black-box fuzzer for discovering vulnerabilities. Black-box fuzzing ensures that the prototype does not require the availability of source code of the application. A summary of the processing steps is explained below.

Crawling: The

New category of XQuery injection attacks

Apart from the attacks specified in OWASP (2014), new kinds of attacks are always possible and an adversary can make use of the advancements offered by the query language to manipulate the database request for launching attacks. We exploited the vulnerabilities in the test applications to launch different kinds of XQuery injection attacks that are not specified in OWASP (2014). The three new categories of XQuery injection attacks identified during our testing are discussed in the following

Implementation and evaluation

This section describes the changes made to the web application architecture for capturing the XQueries, the applications considered for testing the prototype, and the evaluation results. The prototype XQueryFuzzer is built using the Django web framework with Python, and uses PostgreSQL as the database. Redis, a data structure server, is used for storing the queries in a FIFO queue. The prototype is designed to work only for web applications that use BaseX as the XML database, and requires

Conclusion and further work

XML injection has become a critical vulnerability with the increased use of XML databases by web applications. XML injection vulnerabilities need to be detected and corrected so that the web application is secure against various types of XQuery injection attacks. Hence, a prototype for identification of XQuery injection vulnerabilities has been developed. The prototype employs a crawler for identifying all possible points of injection, which are filled with malicious strings generated using an

Acknowledgments

This work is supported by the Ministry of Communications and Information Technology, Government of India and is part of the R&D project entitled “Development of Tool for detection of XML based injection vulnerabilities in web applications”, 2014–2016.

Nushafreen Palsetia received her B.E. degree in Computer Engineering in 2009 from University of Mumbai, India and M.Tech. degree in Computer Science and Engineering in 2015 from National Institute of Technology Karnataka, Surathkal, India. She is currently working as a Software Development Engineer with Dell R&D in the area of Storage Engineering.

References (77)

  • N. Antunes et al.

    Effective detection of SQL/XPath injection vulnerabilities in web services

    IEEE International Conference on Services Computing

    (2009)
  • N. Antunes et al.

    Enhancing penetration testing with attack signatures and interface monitoring for the detection of injection vulnerabilities in web services

    Services Computing (SCC), 2011 IEEE International Conference on

    (2011)
  • Arachni, 2016. Arachni - web application security scanner framework....
  • A. Asmawi et al.

    Model-based system architecture for preventing XPath injection in database-centric web services environment

    7th International Computing and Convergence Technology (ICCCT)

    (2012)
  • Auger, R., 2010. XQuery injection....
  • BaseX. BaseX-the XML database....
  • S. Baviskar et al.

    Protection of web users privacy by securing browser from web privacy attacks

    Int. J. Comput. Technol. Appl.

    (2011)
  • P. Bisht et al.

    Candid: Dynamic candidate evaluations for automatic prevention of SQL injection attacks

    ACM Trans. Inf. Syst. Secur. (TISSEC)

    (2010)
  • Bourret, R., 2009. Going native: Use cases for native XML databases....
  • M. Bravenboer et al.

    Preventing injection attacks with syntax embeddings

    Proceedings of the 6th International Conference on Generative Programming and Component Engineering

    (2007)
  • G. Buehrer et al.

    Using parse tree validation to prevent SQL injection attacks

    Proceedings of the 5th International Workshop on Software Engineering and Middleware

    (2005)
  • R. Chandrashekhar et al.

    SQL injection attack mechanisms and prevention techniques

    Advanced Computing, Networking and Security

    (2012)
  • A. Chaudhri et al.

    XML Data Management: Native XML and XML Enabled DataBase Systems

    (2003)
  • convert, 2016. Conversion module....
  • Django. Django-the web framework for perfectionists with deadlines....
  • eval, 2016. Xquery module....
  • eXistDB. eXistDB - the open source native XML database....
  • Forbes, T., 2014. Exploiting XPath injection vulnerabilities with XCat....
  • G. Gopalakrishnan

    Computation Engineering-Applied Automata Theory and Logic

    (2006)
  • Gordeychik, S., 2010. Web application security statistics. The Web Application Security Consortium....
  • W. Halfond et al.

    A classification of SQL-injection attacks and countermeasures

    Proceedings of the IEEE International Symposium on Secure Software Engineering

    (2006)
  • W.G. Halfond et al.

    Amnesia: analysis and monitoring for neutralizing SQL-injection attacks

    Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering

    (2005)
  • Hirsch, F., 2002. Getting started with XML security....
  • Huang, Y., 2003. Safeguarding a native XML database system....
  • Y.-W. Huang et al.

    Securing web application code by static analysis and runtime protection

    Proceedings of the 13th international conference on World Wide Web

    (2004)
  • JSpider, 2003. Jspider....
  • A. Klein

    Blind XPath Injection

    Technical Report

    (2005)
  • Y. Kosuga et al.

    Sania: Syntactic and semantic analysis for automated testing against SQL injection

    Twenty-Third Annual Computer Security Applications Conference, ACSAC 2007

    (2007)

    G. Deepa is currently a Ph.D. Scholar with the Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India. She received her M.E. degree in Computer Science and Engineering from Anna University, Chennai in 2012. Her research interests include Web application security, Data mining and Software testing.

    Furqan Ahmed Khan received his M.Tech. degree in Software Engineering from Galgotias University, Greater Noida, India in 2014. He is currently working as a project scientist in an R&D project supported by the Ministry of Communications and Information Technology, Government of India at National Institute of Technology Karnataka, Surathkal, India. His research interests include Web application security and Web application architecture.

    P. Santhi Thilagam is currently an Associate Professor in the Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal (NITK), India. She received her Ph.D. in Computer Science and Engineering from NITK in 2008. Her research interests include Distributed data management, Graph mining and Data security. She has published widely in various database and data mining conferences. She was the recipient of the BITES best PhD thesis award for the year 2009 in Computer Science and Engineering category. More details of her research can be obtained at http://www.cse.nitk.ac.in/faculty/p-santhi-thilagam.

    Alwyn R. Pais is currently an Assistant Professor in the Department of Computer Science and Engineering, National Institute of Technology Karnataka (NITK), Surathkal, India. He completed his B.Tech. (CSE) from Mangalore University, India, M.Tech. (CSE) from IIT Bombay, India and Ph.D. from NITK, Surathkal, India. His areas of interest include Information Security, Image Processing, Computer Vision, Wireless Sensor Networks, and Internet of Things (IoT).
