Abstract:
Motif discovery is the problem of finding common substrings in a set of biological strings. It can therefore be applied to find Transcription Factor Binding Sites (TFBS) that share common patterns (motifs). The Quorum Planted (l, d, q) Motif Search (qPMS) is a version of Planted (l, d) motif discovery in which the motif of length l occurs in at least q percent of the sequences with up to d mismatches. The recent introduction of technologies such as ChIP (chromatin immunoprecipitation) experiments poses further challenges for algorithm developers, as the outputs of such experiments contain thousands of sequences. ChIP-Seq (ChIP sequencing, used to analyze protein interactions with DNA) has gained considerable attention in the field. The focus of this paper is to present an approximate algorithm, quorum Strong Motif Finder (qSMF), that returns up to k highest ranked (strongest) motifs occurring in at least q percent of the data sequences. The proposed algorithm has been tested on large ChIP-Seq data sets that were sampled using the SamSelect algorithm. In comparisons with the FMotif algorithm, the experimental results show that qSMF is faster and returns predicted motifs similar to those published in the literature and to motifs discovered by a tool that uses the established motif finding algorithms AlignACE, MEME, MDscan, Trawler, and Weeder.
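To make the quorum condition in the abstract concrete (a motif of length l occurring in at least q percent of the sequences with up to d mismatches), the following is a minimal brute-force sketch of the check itself. It is not the qSMF algorithm, which the abstract does not describe in detail; the function names and the toy sequences are hypothetical and chosen only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def satisfies_quorum(motif: str, seqs: list[str], d: int, q: float) -> bool:
    """True if motif occurs (with up to d mismatches) in at least q percent of seqs."""
    hits = sum(occurs_within(motif, s, d) for s in seqs)
    return hits * 100 >= q * len(seqs)


if __name__ == "__main__":
    # Toy data (hypothetical): with d = 1 the motif matches 3 of 4 sequences.
    seqs = ["ACGTACGT", "TTACGATT", "GGGGGGGG", "ACCTAAAA"]
    print(satisfies_quorum("ACGT", seqs, d=1, q=75))
    print(satisfies_quorum("ACGT", seqs, d=1, q=80))
```

This check scans every window of every sequence, so it only verifies a given candidate motif; an actual motif search must also generate and rank candidates, which is where algorithms such as qSMF and FMotif differ.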
Date of Conference: 20-22 May 2019
Date Added to IEEE Xplore: 12 September 2019