A novel JSON based regular expression language for pattern matching in the internet of things

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

The Internet of Things works by constantly sensing physical properties in the vicinity of the user, such as ambient light, sound, motion and temperature. The sensors involved produce huge volumes of data that must be sifted efficiently for the relevant events required to trigger certain actions. In addition, filtering has to be performed to ensure that privacy-sensitive confidential data is not leaked. Efficient and expressive pattern matching is thus a key enabling technology for the full realization of ambient and humanized computing. The bulk of research in this area has focused on the use of specialized hardware and on reducing the memory footprint. Unfortunately, there has been little work, if any, on optimizing the core elements of pattern matching: the regular expression (RE) language and the compilation process that converts patterns into internal data structures. The importance of writing good REs, so that compilation does not lead to unrealizable data structures, is comparatively poorly understood. In this research, we empirically compare different RE processing engines and demonstrate in practice that the compilation phase is highly memory-intensive and time-consuming compared with the matching phase, and hence is worth exploring for new techniques and optimizations. As a second important contribution, we propose a novel technique for defining regular expressions using JavaScript Object Notation (JSON). Our evaluation with carefully created patterns shows that the performance of the proposed technique is on par with competing approaches. It is also less ambiguous, extensible, more expressive and much more appropriate for defining large and complex patterns.
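To make the idea of a JSON-defined pattern concrete, the following minimal sketch may help. It is an illustration only: the abstract does not give the syntax of the proposed language, so the node types used here (`lit`, `class`, `seq`, `alt`, `repeat`), the function `to_regex` and the sample sensor string are all hypothetical. The sketch simply translates a JSON tree into a conventional pattern for Python's standard `re` module and keeps the compilation step separate from the matching step.

```python
import json
import re

# Hypothetical schema (an assumption, NOT the language proposed in the paper):
#   {"lit": "text"}                        literal string
#   {"class": "0-9"}                       character class body
#   {"seq": [node, ...]}                   concatenation
#   {"alt": [node, ...]}                   alternation
#   {"repeat": node, "min": m, "max": n}   repetition; omitting "max" means unbounded


def to_regex(node):
    """Translate one JSON pattern node into a conventional regex string."""
    if "lit" in node:
        return re.escape(node["lit"])
    if "class" in node:
        return "[" + node["class"] + "]"
    if "seq" in node:
        return "".join(to_regex(child) for child in node["seq"])
    if "alt" in node:
        return "(?:" + "|".join(to_regex(child) for child in node["alt"]) + ")"
    if "repeat" in node:
        lo = node.get("min", 0)
        hi = node.get("max")
        bound = "{%d,}" % lo if hi is None else "{%d,%d}" % (lo, hi)
        return "(?:" + to_regex(node["repeat"]) + ")" + bound
    raise ValueError(f"unknown node: {node!r}")


# A made-up sensor reading pattern: "temp=", one to three digits, then C or F.
PATTERN_JSON = """
{"seq": [
  {"lit": "temp="},
  {"repeat": {"class": "0-9"}, "min": 1, "max": 3},
  {"alt": [{"lit": "C"}, {"lit": "F"}]}
]}
"""

# Compilation phase: parse the JSON and build the engine's internal structure.
compiled = re.compile(to_regex(json.loads(PATTERN_JSON)))

# Matching phase: scan input with the already compiled pattern.
match = compiled.search("sensor7 temp=42C ok")
# match.group(0) is "temp=42C"
```

Because every operator is an explicit key, nesting and grouping are unambiguous without escaping rules or operator precedence, which is one plausible reading of the "less ambiguous" and "extensible" claims above. Adding a new construct would only require a new key and a new branch in `to_regex`.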

Acknowledgements

This research has been supported by DSR, King Faisal University, Saudi Arabia. We are grateful to Ms. Michela Becchi from the Department of Electrical and Computer Engineering at the University of Missouri, Columbia, for providing us with the Regular Expression Processor. We are also thankful to Prof. Andrew A. Chien from the Large Scale Systems Group of the University of Chicago for helpful discussions.

Author information

Corresponding author

Correspondence to Raihan ur Rasool.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research has been supported by DSR (Grant: 160088), King Faisal University, Saudi Arabia.

About this article

Cite this article

Rasool, R.u., Najam, M., Ahmad, H.F. et al. A novel JSON based regular expression language for pattern matching in the internet of things. J Ambient Intell Human Comput 10, 1463–1481 (2019). https://doi.org/10.1007/s12652-018-0869-1

  • DOI: https://doi.org/10.1007/s12652-018-0869-1
