Crawling and Extracting Process Data from the Web

  • Conference paper
Advanced Data Mining and Applications (ADMA 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Abstract

In this paper, we address the design and implementation of a supporting system for process-based searches. This supporting system can efficiently crawl the Web and extract processes from obtained data. The retrieved processes can then be used in a Process-Based Search Engine (PBSE). In this work, a process is defined as a sequence of activities for achieving a goal. A PBSE uses the extracted processes to transform an original query into multiple sub-queries, and then performs keyword search for each transformed sub-query. To facilitate effective process-based searches, a large number of high quality processes are required. This paper focuses on how to efficiently and effectively build a database of processes by exploring the Web.
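The query transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Process` class, the `expand_query` function, the exact-match lookup on the goal, and the example process are all hypothetical, and the abstract does not say how sub-queries are actually formed. The sketch only assumes that each activity of a matching process becomes one keyword sub-query.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Process:
    """A process as defined in the abstract: a sequence of activities for achieving a goal."""
    goal: str
    activities: Tuple[str, ...]


def expand_query(query: str, processes: List[Process]) -> List[str]:
    """Turn a query into sub-queries, one per activity of a matching process.

    Assumption: a process matches when its goal equals the query
    (case-insensitive). If nothing matches, the original query is
    returned unchanged as the only sub-query.
    """
    normalized = query.strip().lower()
    for process in processes:
        if process.goal.lower() == normalized:
            return list(process.activities)
    return [query]


# Hypothetical process database with a single entry.
PROCESSES = [
    Process(
        goal="apply for a visa",
        activities=(
            "fill in the visa application form",
            "book an embassy appointment",
            "pay the visa fee",
        ),
    )
]

if __name__ == "__main__":
    for sub_query in expand_query("Apply for a visa", PROCESSES):
        # Each sub-query would then be sent to an ordinary keyword search.
        print(sub_query)
```

In this sketch the quality of the expansion depends entirely on the stored processes, which is consistent with the abstract's point that a large number of high quality processes is required.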

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, Y., Agah, A. (2009). Crawling and Extracting Process Data from the Web. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_55

  • DOI: https://doi.org/10.1007/978-3-642-03348-3_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03347-6

  • Online ISBN: 978-3-642-03348-3

  • eBook Packages: Computer Science, Computer Science (R0)
