Abstract
Recently, as the amount of genetic information has grown following the completion of the Human Genome Project, the management of bioinformatics information has come to the fore. However, because bioinformatics information comprises diverse kinds of genetic information, users cannot easily access and use it. This paper proposes a high-dimensionality information management scheme that uses the Bernoulli distribution to select the pieces of bioinformatics information that are most frequently used, so that users can easily access the information they prefer. The proposed scheme is a high-dimensionality priority selection approach for cases in which two or more pieces of bioinformatics information must be presented. In addition, because the scheme determines the priority order of information based on the kind, function, and characteristics of each piece of bioinformatics information, users can easily access bioinformatics information according to their intended use. In experiments, the proposed scheme achieved a bioinformatics information search success rate 11.6% higher than that of existing schemes, and the delay time of bioinformatics information services used by independent users was 17.3% shorter than that of existing schemes.
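The abstract does not spell out the model, so the following is only a minimal sketch of one plausible reading, under stated assumptions: each request is treated as a Bernoulli trial in which a piece of information is either used or not, its usage probability is estimated with a Beta-smoothed (Laplace) estimate, and items are ranked by that estimate multiplied by a hypothetical per-kind weight standing in for the kind/function/characteristic criteria. `InfoItem`, `bernoulli_estimate`, `select_priority`, and `kind_weights` are illustrative names, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class InfoItem:
    """Hypothetical record for one piece of bioinformatics information."""
    name: str
    kind: str      # illustrative category label, e.g. "sequence" or "annotation"
    uses: int      # requests in which the item was used (Bernoulli successes)
    requests: int  # total requests observed (Bernoulli trials)


def bernoulli_estimate(uses: int, requests: int,
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior-mean estimate of the Bernoulli usage probability p.

    With a Beta(alpha, beta) prior the posterior mean is
    (uses + alpha) / (requests + alpha + beta); alpha = beta = 1 is
    Laplace smoothing, which keeps rarely requested items away from 0 or 1.
    """
    if uses < 0 or requests < 0 or uses > requests:
        raise ValueError("need 0 <= uses <= requests")
    return (uses + alpha) / (requests + alpha + beta)


def select_priority(items, kind_weights, k=2):
    """Return the names of the top-k items by weighted usage probability.

    kind_weights is an assumed per-kind multiplier (unknown kinds get 1.0);
    k must be at least 2 because the scheme presents two or more items.
    """
    if k < 2:
        raise ValueError("priority selection presents two or more items")
    scored = [
        (bernoulli_estimate(it.uses, it.requests) * kind_weights.get(it.kind, 1.0),
         it.name)
        for it in items
    ]
    # Sort by score only; ties keep their input order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    catalog = [
        InfoItem("BRCA1 sequence", "sequence", uses=45, requests=50),
        InfoItem("TP53 annotation", "annotation", uses=30, requests=50),
        InfoItem("EGFR pathway", "pathway", uses=5, requests=50),
    ]
    print(select_priority(catalog, {"sequence": 1.0, "annotation": 1.2}, k=2))
```

With these made-up counts the script prints `['BRCA1 sequence', 'TP53 annotation']` (scores of about 0.885 and 0.715). The smoothing prior is a design choice of this sketch: without it, an item requested only once and used once would outrank a heavily used one.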

Acknowledgments
This research was supported by the Tongmyong University Research Grants 2016.
Cite this article
Jeong, YS., Shin, SS. & Han, KH. High-dimensionality priority selection scheme of bioinformatics information using Bernoulli distribution. Cluster Comput 20, 539–546 (2017). https://doi.org/10.1007/s10586-016-0622-5