Abstract
The Internet of Things works by constantly sensing physical properties in the vicinity of the user, such as ambient light, sound, motion and temperature. The sensors produce huge volumes of data that have to be sifted efficiently for the relevant events that trigger certain actions. In addition, filtering has to be performed to ensure that privacy-sensitive, confidential data is not leaked. Efficient and expressive pattern matching is thus a key enabling technology for the full realization of ambient and humanized computing. The bulk of research in this area has focused on the use of specialized hardware and on reducing the memory footprint. Unfortunately, there has been limited work, if any, on optimizing the core elements of pattern matching: the regular expression (RE) language and the compilation process that converts patterns into internal data structures. The importance of writing good REs, so that compilation does not lead to unrealizable data structures, is comparatively poorly understood. In this research, we empirically compare different RE processing engines and demonstrate in practice that the compilation phase is highly memory-intensive and time-consuming compared with the matching phase, and hence is worth exploring for new techniques and optimizations. As a second important contribution, we propose a novel technique for defining regular expressions using JavaScript Object Notation (JSON). Our evaluation with carefully created patterns shows that the performance of the proposed technique is on par with competing approaches. It is also less ambiguous, extensible, more expressive and much more appropriate for defining large and complex patterns.
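To make the two ideas in the abstract concrete, the following is a minimal sketch, not the language proposed in the article. The JSON schema (keys `lit`, `class`, `seq`, `alt`, `repeat`) and the helper names `to_regex`, `compile_json_pattern` and `time_phases` are assumptions made up for illustration. Python's backtracking `re` module stands in for the automata-based engines the article compares, so any timings it reports are not comparable to the article's results.

```python
import json
import re
import time

# Hypothetical JSON pattern description. The schema is an illustrative
# assumption, not the notation defined in the article.
PATTERN_JSON = """
{"seq": [
  {"lit": "temp="},
  {"repeat": {"of": {"class": "0-9"}, "min": 1, "max": 3}},
  {"alt": [{"lit": "C"}, {"lit": "F"}]}
]}
"""


def to_regex(node):
    """Translate one JSON node into a Python `re` pattern fragment."""
    if "lit" in node:  # literal text, escaped so it matches verbatim
        return re.escape(node["lit"])
    if "class" in node:  # character class such as 0-9
        return "[" + node["class"] + "]"
    if "seq" in node:  # concatenation of child nodes
        return "".join(to_regex(child) for child in node["seq"])
    if "alt" in node:  # alternation between child nodes
        return "(?:" + "|".join(to_regex(child) for child in node["alt"]) + ")"
    if "repeat" in node:  # bounded repetition of one child node
        spec = node["repeat"]
        return "(?:%s){%d,%d}" % (to_regex(spec["of"]), spec["min"], spec["max"])
    raise ValueError("unknown node: %r" % (node,))


def compile_json_pattern(text):
    """Compilation phase: parse the JSON and build the matcher."""
    return re.compile(to_regex(json.loads(text)))


def time_phases(text, payload):
    """Return (compile_seconds, match_seconds, found) for one pattern."""
    re.purge()  # clear the module cache so compilation is really measured
    t0 = time.perf_counter()
    compiled = compile_json_pattern(text)
    t1 = time.perf_counter()
    found = compiled.search(payload) is not None
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, found


matcher = compile_json_pattern(PATTERN_JSON)
```

Calling `time_phases(PATTERN_JSON, "sensor temp=23C ok")` reports the two phases separately, which is the kind of measurement needed to compare compilation cost against matching cost. Because every construct is a named key, a structured description like this can be validated and extended without touching regular expression metacharacters.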
Acknowledgements
This research has been supported by DSR (Grant: 160088), King Faisal University, Saudi Arabia. We are grateful to Ms. Michela Becchi of the Department of Electrical and Computer Engineering at the University of Missouri, Columbia, for providing us with the Regular Expression Processor. We are also thankful to Prof. Andrew A. Chien of the Large Scale Systems Group at the University of Chicago for helpful discussions.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Rasool, R.u., Najam, M., Ahmad, H.F. et al. A novel JSON based regular expression language for pattern matching in the internet of things. J Ambient Intell Human Comput 10, 1463–1481 (2019). https://doi.org/10.1007/s12652-018-0869-1
DOI: https://doi.org/10.1007/s12652-018-0869-1