Early Spam Detection Using Time-Based Cache in Graph database

New Generation Computing

Abstract

Social networking is now ubiquitous. Technology has made the world seem smaller by bringing individuals together, reconnecting people, and creating new opportunities for connection. However, the risk of data fraud has also increased. Early detection of fraud requires deep analysis of tweets, which is carried out here using Neo4j. The Twitter API receives requests from thousands of developers every day, so limits are imposed on the number of requests that may be made. The most common rate-limit window is 15 min: an endpoint with a rate limit of 1000 requests/15 min permits at most 1000 requests in each 15-min window. When these limits are exceeded, an error is returned to the application, which hampers real-time tweet analysis and makes online updating of the Neo4j database difficult. The proposed method first introduces a new time-based cache (TmCache) between the database and the Twitter API, which hides the complexities of the Twitter API and reduces analysis time by 85.36%. A Multinomial Naive Bayes classifier is then used to classify tweets as spam or non-spam, with the lowest training time and 96.6% accuracy. The proposed model supports accurate early detection of spam and can help reduce the propagation of agendas that pose a major threat to society in the current era of social networking.
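The internals of TmCache are not described in this abstract, so the following is only a generic sketch of the idea: a time-based (TTL) cache placed in front of a rate-limited endpoint, so that repeated queries inside the expiry window do not consume the request budget. Every name here (`RateLimitedAPI`, `TimeBasedCache`, `fetch`, `ttl_s`) is hypothetical, the API class is a stand-in rather than the real Twitter API, and the 60 s expiry is an assumed value.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class RateLimitExceeded(Exception):
    """Raised by the stand-in API when the per-window budget is spent."""


class RateLimitedAPI:
    """Hypothetical stand-in for a rate-limited endpoint (not the real Twitter API)."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit            # e.g. 1000 requests
        self.window_s = window_s      # e.g. 15 * 60 seconds
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def fetch(self, query: str) -> str:
        now = self.clock()
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_s:
            self.window_start = now
            self.used = 0
        if self.used >= self.limit:
            raise RateLimitExceeded(query)
        self.used += 1
        return f"result for {query}"


class TimeBasedCache:
    """Generic TTL cache: entries older than ttl_s are fetched again."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.ttl_s = ttl_s
        self.clock = clock
        self.store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1]           # fresh hit: no API request is spent
        value = self.fetch(key)       # miss or expired: one API request
        self.store[key] = (now, value)
        return value


if __name__ == "__main__":
    api = RateLimitedAPI(limit=1000, window_s=15 * 60)
    cache = TimeBasedCache(api.fetch, ttl_s=60)  # 60 s expiry is an assumption
    for _ in range(5000):
        cache.get("#covid19")
    print(api.used)  # requests actually sent to the stand-in API
```

In this sketch, 5000 lookups of the same key within the expiry window cost a single request, whereas calling `api.fetch` directly would hit the 1000-request limit. The choice of expiry trades freshness of the cached tweets against the number of requests spent.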

Data Availability

Not applicable.

Code Availability

Not applicable.

Funding

The authors would like to thank the Technical Education Quality Improvement Programme (TEQIP-III) for supporting the research.

Author information

Contributions

Sakshi Srivastava (MNNIT Allahabad) (a) participated in the analysis and interpretation of the data, (b) drafted the article or revised it critically for important intellectual content, and (c) approved the final version. Anil Kumar Singh (MNNIT Allahabad) (a) drafted the article or revised it critically for important intellectual content and (b) approved the final version. This manuscript has not been submitted to, nor is it under review at, another journal or other publishing venue.

Corresponding author

Correspondence to Sakshi Srivastava.

Ethics declarations

Conflict of Interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

Ethical Approval and Consent to participate

Not applicable.

Human and Animal Ethics

Not applicable.

Consent for Publication

I, Sakshi Srivastava, hereby affirm that the following holds for the manuscript "Big Data Analysis using Time-Based Cache": (1) this material is the authors’ own original work and has not been published elsewhere; (2) the paper is not currently being considered for publication elsewhere.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srivastava, S., Agrahari, S. & Singh, A.K. Early Spam Detection Using Time-Based Cache in Graph database. New Gener. Comput. 41, 607–634 (2023). https://doi.org/10.1007/s00354-023-00223-4

  • DOI: https://doi.org/10.1007/s00354-023-00223-4
