Research of network data mining based on reliability source under big data environment

  • Original Article
  • Neural Computing and Applications

Abstract

In the era of big data, researchers face vast amounts of network data, and only by identifying reliable data sources can they extract original data that is usable in scientific research. This paper builds a reliable network data mining model based on improvements to the PageRank algorithm, applying each improved algorithm in turn. The model is divided into three modules. The first uses PageRank and TrustRank to eliminate cheating webpages. The second uses TC-PageRank, which combines the topic relevancy between webpages with a time-difference weight, to select the webpages most closely related to the research topic. The third uses an improved HITS, which accounts for the similarity between a webpage and the research topic and for the amplifying effect of webpage links on authoritative webpages, to determine the authoritative webpages that serve as original data sources. In addition, partitioning the matrix operations with MapReduce reduces the time and space complexity of the algorithms. A comparative analysis of the algorithms verifies the feasibility and accuracy of the method.
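
The first module can be illustrated with a minimal sketch. It assumes the standard formulations of PageRank (uniform teleport vector) and TrustRank (teleport restricted to a hand-picked set of trusted seed pages). The toy graph, the seed set, the names `biased_pagerank` and `uniform`, and the 0.1 threshold rule are all hypothetical choices made for illustration; the abstract does not state how the paper combines the two scores.

```python
from typing import Dict, List, Set

# page -> pages it links to; every link target must also appear as a key
Graph = Dict[str, List[str]]


def biased_pagerank(graph: Graph, teleport: Dict[str, float],
                    damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power iteration for r = d * M r + (1 - d) * teleport."""
    pages = list(graph)
    rank = dict(teleport)  # start from the teleport distribution
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page, outlinks in graph.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    nxt[target] += share
            else:
                # dangling page: spread its score along the teleport vector
                for p in pages:
                    nxt[p] += damping * rank[page] * teleport[p]
        rank = nxt
    return rank


def uniform(pages: List[str], members: Set[str]) -> Dict[str, float]:
    """Teleport vector that is uniform over `members` and zero elsewhere."""
    return {p: (1.0 / len(members) if p in members else 0.0) for p in pages}


# Hypothetical toy web: a small honest cluster and a link farm boosting "spam".
toy_graph: Graph = {
    "a": ["b", "c"], "b": ["c"], "c": ["a"],
    "s1": ["spam"], "s2": ["spam"], "spam": ["s1", "s2"],
}
pages = list(toy_graph)
trusted_seeds = {"a"}  # assumption: seeds are chosen by manual review

pr = biased_pagerank(toy_graph, uniform(pages, set(pages)))    # PageRank
trust = biased_pagerank(toy_graph, uniform(pages, trusted_seeds))  # TrustRank

# Illustrative rule (an assumption): flag pages whose trust is small
# relative to their PageRank.
suspicious = sorted(p for p in pages if trust[p] < 0.1 * pr[p])
print(suspicious)
```

In this toy graph the link farm receives no trust because no path leads to it from the seed page, so it is flagged even though its PageRank is comparable to that of the honest pages.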

Acknowledgments

This study was funded by the National Natural Science Foundation of China (71302087) and Graduate Innovative Projects of Jiangsu Province in 2014 (KYZZ_0287).

Author information

Corresponding author

Correspondence to Jinhai Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Li, J., He, Y. & Ma, Y. Research of network data mining based on reliability source under big data environment. Neural Comput & Applic 28 (Suppl 1), 327–335 (2017). https://doi.org/10.1007/s00521-016-2349-x

  • DOI: https://doi.org/10.1007/s00521-016-2349-x
