Abstract
Retrieval of biological data is an active research area, and numerous pattern-matching algorithms, ranging from the brute-force approach to the most recent proposals, have been put forward for analyzing such data. Although it is well understood that a retrieval algorithm must execute quickly, very little attention has been paid to the factors that might affect its execution time. Factors such as pattern length, type of dataset, input size, and other related factors can affect execution time, but the magnitude of their effect remains unknown and unaddressed. This paper addresses the problem using a 2^k factorial design. The factorial technique is designed and implemented so as to give researchers new insight when proposing or developing algorithms for retrieving biological data. The study shows that the dominant factor in an algorithm's efficiency is pattern length, which accounts for 38.5% of the effect on execution time, followed by the type of dataset at 18%.
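To make the idea of attributing a percentage of execution-time variation to each factor concrete, the following is a minimal sketch of the standard sign-table calculation for a 2^k full factorial design. It is not the authors' implementation: the function name `factorial_effects`, the factor labels, and the four response values are hypothetical and chosen only for illustration, so the resulting percentages are unrelated to the 38.5% and 18% reported above.

```python
from itertools import combinations, product
from math import prod


def factorial_effects(factors, responses):
    """Allocate variation in a 2^k full factorial design (sign-table method).

    factors   : list of k factor names
    responses : dict mapping a tuple of k levels (-1 or +1) to one measured response
    returns   : dict mapping each effect (e.g. "A" or "A*B") to its percentage
                share of the total variation
    """
    k = len(factors)
    runs = list(product((-1, 1), repeat=k))
    if set(runs) != set(responses):
        raise ValueError("responses must cover all 2^k level combinations")
    n = len(runs)

    # Effect q for each main factor and each interaction: the average of the
    # responses weighted by the product of the corresponding sign columns.
    effects = {}
    for order in range(1, k + 1):
        for subset in combinations(range(k), order):
            name = "*".join(factors[i] for i in subset)
            effects[name] = sum(
                prod(run[i] for i in subset) * responses[run] for run in runs
            ) / n

    # Total variation SST = 2^k * (sum of squared effects); each effect
    # contributes 2^k * q^2 of it.
    sst = n * sum(q * q for q in effects.values())
    if sst == 0:
        raise ValueError("all responses are identical; nothing to allocate")
    return {name: 100 * n * q * q / sst for name, q in effects.items()}


# Hypothetical execution times for a 2^2 design
# (level -1 = low/short/first dataset, +1 = high/long/second dataset).
times = {(-1, -1): 15, (1, -1): 45, (-1, 1): 25, (1, 1): 75}
shares = factorial_effects(["pattern_length", "dataset_type"], times)
for effect, pct in shares.items():
    print(f"{effect}: {pct:.1f}% of variation")
```

With one measurement per level combination, as assumed here, there is no error term, so the shares of the main effects and interactions sum to 100%. Replicated runs would need an extra term for experimental error, which this sketch omits.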
Acknowledgements
This research is supported by FRGS FP003A-2017, Faculty of Computer Science and Information Technology, University of Malaya.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Parvez, H.M.S., Hakak, S., Gilkar, G.A., Abdur Rahman, M. (2020). Factorial Analysis of Biological Datasets. In: Uddin, M.S., Bansal, J.C. (eds) Proceedings of International Joint Conference on Computational Intelligence. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-13-7564-4_1
DOI: https://doi.org/10.1007/978-981-13-7564-4_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7563-7
Online ISBN: 978-981-13-7564-4
eBook Packages: Intelligent Technologies and Robotics (R0)