Abstract
Social networking is now pervasive. Technology has made the world seem smaller by bringing individuals together, reconnecting people, and creating new opportunities for connection. However, the risk of data fraud has grown as well. Early fraud detection requires a deep analysis of tweets, which has been carried out here using Neo4J. The Twitter API receives requests from thousands of developers every day, so limits are imposed on the number of requests that may be made in order to manage this volume. The most common rate-limit window is 15 min: if an endpoint has a rate limit of 1000 requests/15 min, then at most 1000 requests are permitted in each 15-min window. When these limits are exceeded, an error is returned to the application instead of the requested data. This makes real-time tweet analysis, and therefore online updating of the Neo4J database, difficult. In the proposed method, a new time-based cache (TmCache) is first introduced between the database and the Twitter API. It removes the complexities of the Twitter API and reduces the analysis time by 85.36%. Furthermore, a machine learning approach, the Multinomial Naive Bayes classifier, is used to classify tweets as spam or non-spam, achieving 96.6% accuracy with the lowest training time. The proposed model enables accurate early detection of spam and can help reduce the risk of propagating agendas, which are a major threat to society in the current era of social networking.
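The abstract does not specify how TmCache works internally. The Python code below is a minimal sketch under two assumptions: TmCache behaves like a time-to-live (TTL) cache, and the API enforces a fixed 15-min window. It shows how such a cache can absorb repeated queries so that fewer requests count against a limit of 1000 requests/15 min. All names (`TimeBasedCache`, `FixedWindowLimiter`, `fake_search`, `ManualClock`) are hypothetical, and the TTL value is arbitrary. Serving stale data when the quota is exhausted is a design choice of this sketch and is not necessarily part of the authors' method.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

Clock = Callable[[], float]


class FixedWindowLimiter:
    """Assumed fixed-window policy: at most `limit` calls per `window` seconds (default 1000 per 15 min)."""

    def __init__(self, limit: int = 1000, window: float = 15 * 60,
                 clock: Clock = time.monotonic) -> None:
        self.limit, self.window, self.clock = limit, window, clock
        self.window_start = clock()
        self.used = 0

    def try_acquire(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new rate-limit window has started, so the quota resets.
            self.window_start, self.used = now, 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


class TimeBasedCache:
    """Hypothetical TTL cache sitting between the graph database and a rate-limited API."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl: float,
                 limiter: FixedWindowLimiter, clock: Clock = time.monotonic) -> None:
        self.fetch, self.ttl, self.limiter, self.clock = fetch, ttl, limiter, clock
        self.entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (fetched_at, value)
        self.hits = self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1  # fresh entry: no API request spent
            return entry[1]
        self.misses += 1
        if not self.limiter.try_acquire():
            # Quota exhausted: fall back to stale data (or None) instead of raising an error.
            return entry[1] if entry is not None else None
        value = self.fetch(key)
        self.entries[key] = (now, value)
        return value


# Usage with a manual clock so the example is deterministic.
class ManualClock:
    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = ManualClock()
api_calls = []


def fake_search(query: Hashable) -> str:
    api_calls.append(query)  # stands in for a real Twitter API request
    return f"tweets matching {query}"


cache = TimeBasedCache(fake_search, ttl=60,
                       limiter=FixedWindowLimiter(limit=2, clock=clock), clock=clock)
cache.get("#giveaway")          # miss: first API call
cache.get("#giveaway")          # hit: served from cache
clock.t = 61
cache.get("#giveaway")          # expired: second API call, quota now used up
clock.t = 130
stale = cache.get("#giveaway")  # expired, quota exhausted: stale value returned
```

A fixed window of 1000 requests/15 min averages roughly 1.1 requests per second. Every cache hit within the TTL is a lookup that does not consume that budget. This is the general mechanism by which a cache in front of the API can reduce both errors and waiting time.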
Data Availability
Not applicable.
Code Availability
Not applicable.
Funding
The authors would like to thank the Technical Education Quality Improvement Programme (TEQIP-III), for supporting the research.
Author information
Contributions
Sakshi Srivastava (MNNIT Allahabad) (a) participated in the analysis and interpretation of the data, (b) drafted the article or revised it critically for important intellectual content, and (c) gave approval to the final version. Anil Kumar Singh (MNNIT Allahabad) (a) drafted the article or revised it critically for important intellectual content and (b) gave approval to the final version. This manuscript has not been submitted to, nor is it under review at, another journal or other publishing venue.
Ethics declarations
Conflict of Interest
On behalf of all the authors, the corresponding author states that there is no conflict of interest.
Ethical Approval and Consent to participate
Not applicable.
Human and Animal Ethics
Not applicable.
Consent for Publication
I, Sakshi Srivastava, hereby confirm that the following conditions are fulfilled for the manuscript "Big Data Analysis using Time-Based Cache": (1) this material is the authors’ own original work and has not been previously published elsewhere; (2) the paper is not currently being considered for publication elsewhere.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Srivastava, S., Agrahari, S. & Singh, A.K. Early Spam Detection Using Time-Based Cache in Graph database. New Gener. Comput. 41, 607–634 (2023). https://doi.org/10.1007/s00354-023-00223-4