Research of network data mining based on reliability source under big data environment

Li, Jinhai; He, Youshi; Ma, Yunlei

doi:10.1007/s00521-016-2349-x

Original Article
Published: 23 May 2016

Volume 28, pages 327–335, (2017)
Neural Computing and Applications

Jinhai Li¹,², Youshi He² & Yunlei Ma³

Abstract

In the era of big data, researchers face vast amounts of network data, and only by identifying reliable data sources can they extract original data that is usable in scientific research. This paper builds a reliable network data mining model from several improvements to the PageRank algorithm, applying each improved algorithm in turn. The model is divided into three modules. First, PageRank and TrustRank are used to eliminate cheating webpages. Second, the webpages most closely related to the research topic are refined by TC-PageRank, which combines the topic relevancy between webpages with a time-difference weight. Finally, the authoritative webpages of the original data source are determined by an improved HITS algorithm, which takes into account the similarity between a webpage and the research topic and the amplification that webpage links give to authoritative webpages. In addition, partitioning the matrix operations with MapReduce reduces the time and space complexity of the algorithms. The feasibility and accuracy of the method are verified by a comparative analysis of the algorithms.
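
As a rough illustration of the kind of computation the abstract refers to, the following is a minimal sketch of baseline PageRank by power iteration, with the source pages split into blocks that are processed in a map step and combined in a reduce step. It is not the paper's TC-PageRank or improved HITS, whose formulas are not given in this preview. The function name pagerank_blocked, the block_size parameter, the damping factor of 0.85 and the toy link graph are all assumptions made for illustration, and the map and reduce phases are simulated in a single process rather than run on a MapReduce cluster.

```python
from collections import defaultdict


def pagerank_blocked(links, damping=0.85, iterations=50, block_size=2):
    """Baseline PageRank (power iteration) over a dict {page: [outlinks]}.

    Source pages are partitioned into blocks; each block emits its rank
    contributions independently (map), and contributions are then summed
    per target page (reduce). Hypothetical helper, not the paper's method.
    """
    pages = sorted(set(links) | {t for outs in links.values() for t in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    # Partition the source pages into fixed-size blocks.
    blocks = [pages[i:i + block_size] for i in range(0, n, block_size)]

    for _ in range(iterations):
        emitted = []          # (target, contribution) pairs from the map step
        dangling_mass = 0.0   # rank held by pages with no outlinks
        for block in blocks:  # map: each block could run on a separate worker
            for src in block:
                outs = links.get(src, [])
                if not outs:
                    dangling_mass += rank[src]
                    continue
                share = rank[src] / len(outs)
                emitted.extend((t, share) for t in outs)

        incoming = defaultdict(float)
        for target, share in emitted:  # reduce: sum contributions per target
            incoming[target] += share

        # Dangling mass is spread uniformly so the ranks keep summing to 1.
        rank = {
            p: (1.0 - damping) / n + damping * (incoming[p] + dangling_mass / n)
            for p in pages
        }
    return rank


if __name__ == "__main__":
    toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank_blocked(toy_links).items()):
        print(page, round(score, 4))
```

In this sketch the block size changes only how the work is grouped, not the resulting ranks, which is the property that lets a partitioned matrix operation be distributed across workers.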

Acknowledgments

This study was funded by the National Natural Science Foundation of China (71302087) and Graduate Innovative Projects of Jiangsu Province in 2014 (KYZZ_0287).

Author information

Authors and Affiliations

Taizhou University, Taizhou, 225300, China
Jinhai Li
School of Management, Jiangsu University, Zhenjiang, 212013, China
Jinhai Li & Youshi He
Faculty of Science, Jiangsu University, Zhenjiang, 212013, China
Yunlei Ma

Corresponding author

Correspondence to Jinhai Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Li, J., He, Y. & Ma, Y. Research of network data mining based on reliability source under big data environment. Neural Comput & Applic 28 (Suppl 1), 327–335 (2017). https://doi.org/10.1007/s00521-016-2349-x

Received: 09 March 2015
Accepted: 09 May 2016
Published: 23 May 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s00521-016-2349-x
