TINB: a topical interaction network builder from WWW

Wireless Networks

Abstract

A social network is a collection of people, generally called ‘actors’, who are connected to each other by some association criterion such as friendship, following, co-authorship, or working together. Interaction networks generalize social networks. With recent developments in data science, analytics has found applications in every significant area, such as the economy, general elections, epidemics, terrorism detection, clustering, and marketing. All of these areas require interaction data about various entities. Although social networks are a significant reservoir of such data, they cover only one segment of the information. A large amount of information is available on the web, but it is not useful for analytics in its raw form. This paper presents a framework that collects information from the WWW using a parameterized crawler and organizes the web pages into a social-network-like structure called an interaction network. The resulting interaction network is similar to a traditional social network in every aspect. Web pages are selected based on the context found in the vicinity of each URL, where the extent of that vicinity is set by predefined parameters. The proposed crawler is tested on several topics covering thousands of pages and achieves a harvest rate of more than 50 percent. Properties of the interaction network, such as degree distribution, clustering coefficient, modularity, distribution of communities, diameter, and PageRank, are investigated to establish that it behaves like a traditional social network. The idea of building an interaction network can be extended to new-age technologies such as IoT, big data, the deep web, and prediction models.
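
The abstract does not give the crawler's scoring rule or parameter names, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a link's context is its anchor text plus a fixed number of surrounding words, that relevance is the fraction of context words that are topic terms, and that harvest rate is the share of fetched pages judged relevant. The names `window`, `threshold`, `link_contexts`, `relevance`, `select_links`, `harvest_rate`, `build_network`, and `local_clustering` are hypothetical, as are the example URLs.

```python
import re
from collections import Counter

# Hypothetical, simplified patterns: real HTML calls for a proper parser.
ANCHOR_RE = re.compile(r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_contexts(html, window=5):
    """Yield (url, context): anchor words plus `window` words on each side."""
    for m in ANCHOR_RE.finditer(html):
        before = tokens(TAG_RE.sub(" ", html[:m.start()]))
        after = tokens(TAG_RE.sub(" ", html[m.end():]))
        anchor = tokens(TAG_RE.sub(" ", m.group(2)))
        left = before[-window:] if window > 0 else []
        yield m.group(1), left + anchor + after[:window]


def relevance(context, topic_terms):
    """Assumed score: fraction of context words that are topic terms."""
    if not context:
        return 0.0
    return sum(t in topic_terms for t in context) / len(context)


def select_links(html, topic_terms, window=5, threshold=0.2):
    """Keep the URLs whose context score reaches the assumed threshold."""
    return [url for url, ctx in link_contexts(html, window)
            if relevance(ctx, topic_terms) >= threshold]


def harvest_rate(relevant_flags):
    """Relevant pages fetched divided by all pages fetched."""
    flags = list(relevant_flags)
    return sum(flags) / len(flags) if flags else 0.0


def build_network(edges):
    """Undirected adjacency sets built from (source_url, target_url) pairs."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2 * links / (k * (k - 1))


if __name__ == "__main__":
    page = ('<p>Latest cricket match scores and cricket news: '
            '<a href="http://a.example/cricket">cricket scores</a> updated daily.</p>'
            '<p>Cheap flights and hotel deals '
            '<a href="http://b.example/travel">book now</a> for summer holidays.</p>')
    print(select_links(page, {"cricket", "match", "scores"}, window=3))
    net = build_network([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
    print(degree_distribution(net), local_clustering(net, "c"))
```

A wider `window` pulls in more surrounding text and a lower `threshold` admits more links, which would typically trade harvest rate for coverage. Measures such as modularity, diameter, and PageRank can be computed on the same adjacency structure but are omitted here for brevity.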



Notes

  1. https://dmoz-odp.org.


Author information


Corresponding author

Correspondence to Arun Solanki.


About this article


Cite this article

Srivastava, A., Pillai, A., Punj, D. et al. TINB: a topical interaction network builder from WWW. Wireless Netw 27, 589–608 (2021). https://doi.org/10.1007/s11276-020-02469-y


  • DOI: https://doi.org/10.1007/s11276-020-02469-y
