Road transportation is the backbone of modern economies, yet every year it costs millions of deaths and injuries and trillions of dollars. Twitter is a powerful source of transportation information, but major challenges in big data management and Twitter analytics still need to be addressed. We propose Iktishaf, a big data tool built on Apache Spark for detecting traffic-related events from Twitter data in Saudi Arabia. It uses three machine learning (ML) algorithms to build multiple classifiers that detect eight event types. The classifiers are validated using widely used evaluation criteria and against external sources. The Iktishaf Stemmer improves text preprocessing, event detection, and the feature space. Using 2.5 million tweets, we detect events without prior knowledge of them, including the KSA National Day, a fire in Riyadh, rains in Makkah and Taif, and the inauguration of the Al-Haramain train. We are not aware of any work, apart from ours, that uses big data technologies to detect road traffic events from tweets in Arabic. Iktishaf provides hybrid human-ML methods and is a prime example of bringing together AI theory, big data processing, and human cognition to solve a practical problem.

Change history
23 October 2020
Springer Nature’s version of this paper was updated to present the correct Arabic characters in the text body.
The work carried out in this paper is supported by the HPC Center at King AbdulAziz University (KAU). The experiments reported were performed on the Aziz supercomputer at KAU.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Alomari, E., Katib, I. & Mehmood, R. Iktishaf: a Big Data Road-Traffic Event Detection Tool Using Twitter and Spark Machine Learning. Mobile Netw Appl 28, 603–618 (2023). https://doi.org/10.1007/s11036-020-01635-y
DOI: https://doi.org/10.1007/s11036-020-01635-y