Early Spam Detection Using Time-Based Cache in Graph database

Srivastava, Sakshi; Agrahari, Supriya; Singh, Anil Kumar

doi:10.1007/s00354-023-00223-4

Published: 13 June 2023

New Generation Computing, Volume 41, pages 607–634 (2023)

Abstract

Social networking is now ubiquitous. Technology has made the world seem smaller by bringing individuals together, reconnecting people, and providing other opportunities for connection. However, the risk of data fraud has also increased. Detecting fraud early requires a deep analysis of tweets, which is carried out here using Neo4j. The Twitter API receives requests from thousands of developers every day, so limits are imposed on the number of requests that may be made. The most common rate-limit window is 15 minutes: an endpoint with a rate limit of 1000 requests/15 min permits 1000 requests over each 15-minute window. When these limits are exceeded, an error is returned to the application, which interrupts real-time tweet analysis and makes it difficult to update the Neo4j database online. The proposed method first introduces a new time-based cache (TmCache) between the database and the Twitter API, which removes the complexities of the Twitter API and reduces analysis time by 85.36%. A Multinomial Naive Bayes classifier, a machine learning-based approach, is then used to classify tweets as spam or non-spam with the lowest training time and 96.6% accuracy. The proposed model supports early and accurate spam detection and can help reduce the propagation of agendas that pose a major threat to society at large in the current era of social networking.
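
The general idea of placing a time-based cache in front of a rate-limited API can be illustrated with a minimal Python sketch. This is not the authors' TmCache, whose design is not described in the abstract; it is a generic time-to-live cache, and every name in it (`TimeBasedCache`, `fake_search`, `FakeClock`, the 900-second expiry chosen to match the 15-minute window) is a hypothetical choice made for illustration only.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TimeBasedCache:
    """Generic TTL cache in front of a rate-limited fetch function (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        ttl_seconds: float = 15 * 60,  # assumption: expire with the 15-minute window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}
        self.api_calls = 0  # how many requests actually reached the API

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None:
            stored_at, value = entry
            if now - stored_at < self._ttl:
                return value  # fresh entry: served without an API request
        # Missing or expired entry: spend one API request and remember the result.
        value = self._fetch(key)
        self.api_calls += 1
        self._store[key] = (now, value)
        return value


class FakeClock:
    """Manually advanced clock so expiry can be shown without waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def fake_search(query: Hashable) -> list:
    """Stand-in for a real API call; returns a canned result."""
    return [f"tweet about {query}"]


clock = FakeClock()
cache = TimeBasedCache(fake_search, ttl_seconds=900, clock=clock)
cache.get("covid")      # miss: one API request
cache.get("covid")      # hit: answered from the cache
clock.now = 901         # move past the 900-second expiry
cache.get("covid")      # expired: a second API request
print(cache.api_calls)  # prints 2
```

In this sketch, repeated queries inside the expiry window consume no rate-limit budget, which is the kind of saving a cache between the database and the API is meant to provide. How the actual TmCache chooses keys, expiry times, or eviction is not stated in the abstract.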

Data Availability

Not applicable.

Code Availability

Not applicable.

Funding

The authors would like to thank the Technical Education Quality Improvement Programme (TEQIP-III) for supporting the research.

Author information

Authors and Affiliations

Motilal Nehru National Institute of Technology, Prayagraj, Uttar Pradesh, India
Sakshi Srivastava, Supriya Agrahari & Anil Kumar Singh

Contributions

Sakshi Srivastava (MNNIT Allahabad) (a) participated in the analysis and interpretation of the data, (b) drafted the article or revised it critically for important intellectual content, and (c) gave approval to the final version. Anil Kumar Singh (MNNIT Allahabad) (a) drafted the article or revised it critically for important intellectual content and (b) gave approval to the final version. This manuscript has not been submitted to, nor is it under review at, another journal or other publishing venue.

Corresponding author

Correspondence to Sakshi Srivastava.

Ethics declarations

Conflict of Interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

Ethical Approval and Consent to participate

Not applicable.

Human and Animal Ethics

Not applicable.

Consent for Publication

I, Sakshi Srivastava, hereby assure that for the Big Data Analysis using Time-Based Cache, the following is fulfilled: (1) this material is the authors' own original work, which has not been previously published elsewhere; (2) the paper is not currently being considered for publication elsewhere.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srivastava, S., Agrahari, S. & Singh, A.K. Early Spam Detection Using Time-Based Cache in Graph database. New Gener. Comput. 41, 607–634 (2023). https://doi.org/10.1007/s00354-023-00223-4

Received: 29 August 2022
Accepted: 04 May 2023
Published: 13 June 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00354-023-00223-4
