DOI: 10.1145/2463372.2463532

Automatic string replace by examples

Published: 06 July 2013

Abstract

Search-and-replace is a text processing task that can be largely automated with regular expressions: the user describes, in a specific formal language, the regions to be modified (the search pattern) and the corresponding desired changes (the replacement expression). Writing and tuning these expressions requires considerable familiarity with the formalism and is typically a lengthy, error-prone process.
In this paper we propose a tool based on Genetic Programming (GP) that automatically generates both the search pattern and the replacement expression from examples alone. The user merely provides examples of the input text along with the desired output text and needs no knowledge of either the regular expression formalism or GP. We are not aware of any similar proposal. We experimentally evaluated our proposal on four search-and-replace tasks operating on real-world datasets and obtained good results, which suggests that the approach may be practically viable.
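
To make the task setting concrete, the following is a minimal sketch in Python of what "search-and-replace by examples" means. It is not the GP tool described in the paper: it only shows how a candidate pair (search pattern, replacement expression) can be checked against user-provided examples. The example texts, the function names `apply_candidate` and `count_reproduced`, and the two candidate pairs are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical examples: each pair is (input text, desired output text).
EXAMPLES = [
    ("Meeting on 07/06/2013 in Amsterdam", "Meeting on 2013-07-06 in Amsterdam"),
    ("Due 12/31/1999.", "Due 1999-12-31."),
    ("No date here", "No date here"),
]


def apply_candidate(pattern: str, replacement: str, text: str) -> str:
    """Apply one (search pattern, replacement expression) pair to a text."""
    return re.sub(pattern, replacement, text)


def count_reproduced(pattern: str, replacement: str, examples) -> int:
    """Count the examples whose desired output is reproduced exactly."""
    return sum(
        apply_candidate(pattern, replacement, source) == target
        for source, target in examples
    )


# Two hand-written candidates (assumptions for illustration only).
# The first reorders MM/DD/YYYY into YYYY-MM-DD using backreferences.
good = (r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2")
# The second matches only part of each date and swaps the wrong groups.
bad = (r"(\d+)/(\d+)", r"\2-\1")

for name, (pattern, replacement) in (("good", good), ("bad", bad)):
    score = count_reproduced(pattern, replacement, EXAMPLES)
    print(f"{name}: reproduces {score} of {len(EXAMPLES)} examples")
```

A search procedure such as GP would need some way to compare many candidates like these; counting exactly reproduced examples is only one simple possibility, and the abstract does not state which measure the authors use.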

Published In

GECCO '13: Proceedings of the 15th annual conference on Genetic and evolutionary computation
July 2013
1672 pages
ISBN: 9781450319638
DOI: 10.1145/2463372
Editor: Christian Blum
General Chair: Enrique Alba
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. genetic programming
  2. search-and-replace

Qualifiers

  • Research-article

Conference

GECCO '13: Genetic and Evolutionary Computation Conference
July 6 - 10, 2013
Amsterdam, The Netherlands

Acceptance Rates

GECCO '13 paper acceptance rate: 204 of 570 submissions (36%)
Overall acceptance rate: 1,669 of 4,410 submissions (38%)

Cited By

  • (2024) Computational peptide discovery with a genetic programming approach. Journal of Computer-Aided Molecular Design 38:1. DOI: 10.1007/s10822-024-00558-0. Online publication date: 3-Apr-2024
  • (2021) Automatic Search-and-Replace From Examples With Coevolutionary Genetic Programming. IEEE Transactions on Cybernetics 51:5 (2612-2624). DOI: 10.1109/TCYB.2019.2918337. Online publication date: May-2021
  • (2015) Data Quality Challenge. Journal of Data and Information Quality 6:4 (1-4). DOI: 10.1145/2786983. Online publication date: 19-Oct-2015
  • (2015) Learning Text Patterns Using Separate-and-Conquer Genetic Programming. Genetic Programming (16-27). DOI: 10.1007/978-3-319-16501-1_2. Online publication date: 15-Mar-2015
