Crawling and Extracting Process Data from the Web

Liu, Yaling; Agah, Arvin

doi:10.1007/978-3-642-03348-3_55

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

In this paper, we address the design and implementation of a supporting system for process-based searches. This supporting system can efficiently crawl the Web and extract processes from obtained data. The retrieved processes can then be used in a Process-Based Search Engine (PBSE). In this work, a process is defined as a sequence of activities for achieving a goal. A PBSE uses the extracted processes to transform an original query into multiple sub-queries, and then performs keyword search for each transformed sub-query. To facilitate effective process-based searches, a large number of high quality processes are required. This paper focuses on how to efficiently and effectively build a database of processes by exploring the Web.
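The query transformation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Process` class, the exact-match lookup on the goal, the rule that each activity becomes one sub-query, and the `keyword_search` callback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Process:
    """A process: a sequence of activities for achieving a goal."""
    goal: str
    activities: List[str]


def transform_query(query: str, processes: List[Process]) -> List[str]:
    """Turn a query into sub-queries using a stored process.

    Assumption: a process matches when its goal equals the query
    (ignoring case and surrounding whitespace), and each activity
    becomes one sub-query. Without a match, the query is kept as-is.
    """
    normalized = query.strip().lower()
    for process in processes:
        if process.goal.strip().lower() == normalized:
            return list(process.activities)
    return [query]


def process_based_search(
    query: str,
    processes: List[Process],
    keyword_search: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Run a keyword search for every sub-query and group results by sub-query.

    `keyword_search` is a hypothetical stand-in for any keyword search backend.
    """
    return {
        sub_query: keyword_search(sub_query)
        for sub_query in transform_query(query, processes)
    }
```

With a made-up process whose goal is "buy a house" and whose activities are "get mortgage pre-approval", "find a real estate agent", and "make an offer", the query "buy a house" would yield three keyword searches, one per activity. The quality of such results depends on the quality of the stored processes, which is why the paper concentrates on building the process database.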

Author information

Authors and Affiliations

Department of Electrical Engineering & Computer Science, The University of Kansas, 1520 West 15th Street, Lawrence, KS, 66045-7621, USA
Yaling Liu & Arvin Agah

Editor information

Editors and Affiliations

Knowledge Science & Engineering Institute, School of Education Technology, Beijing Normal University, Xinjiekouwai Ave. 19, 100875, Beijing, China
Ronghuai Huang
The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Qiang Yang
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
João Gama
School of Information, Zhongguancun, Renmin University, 100872, Beijing, China
Xiaofeng Meng
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, St. Lucia, Queensland, Australia
Xue Li

About this paper

Cite this paper

Liu, Y., Agah, A. (2009). Crawling and Extracting Process Data from the Web. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_55

DOI: https://doi.org/10.1007/978-3-642-03348-3_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03347-6
Online ISBN: 978-3-642-03348-3
eBook Packages: Computer Science, Computer Science (R0)