Abstract
In this paper, we address the design and implementation of a supporting system for process-based searches. This supporting system can efficiently crawl the Web and extract processes from the obtained data. The retrieved processes can then be used in a Process-Based Search Engine (PBSE). In this work, a process is defined as a sequence of activities for achieving a goal. A PBSE uses the extracted processes to transform an original query into multiple sub-queries, and then performs a keyword search for each transformed sub-query. To facilitate effective process-based searches, a large number of high-quality processes are required. This paper focuses on how to efficiently and effectively build a database of processes by exploring the Web.
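The query transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Process` class, the `expand_query` function, the exact-match lookup on the goal, and the choice to use each activity verbatim as a sub-query are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """A process as defined in the abstract: a sequence of activities for a goal."""
    goal: str
    activities: tuple[str, ...]


def expand_query(query: str, processes: list[Process]) -> list[str]:
    """Turn a query into sub-queries using a stored process (hypothetical logic).

    Assumption: a process applies when its goal equals the query, ignoring
    case and surrounding whitespace. Each activity becomes one sub-query.
    If no process applies, the original query is returned unchanged.
    """
    normalized = query.strip().lower()
    for process in processes:
        if process.goal.strip().lower() == normalized:
            return list(process.activities)
    return [query]


# Hypothetical process database with a single hand-written entry.
process_db = [
    Process(
        goal="buy a house",
        activities=(
            "get mortgage pre-approval",
            "find a real estate agent",
            "make an offer on a house",
        ),
    )
]

for sub_query in expand_query("Buy a house", process_db):
    print(sub_query)  # each sub-query would then go to a keyword search
```

A real system would presumably need fuzzier goal matching and a much larger process database, which is the part the paper sets out to build by crawling the Web.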
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Y., Agah, A. (2009). Crawling and Extracting Process Data from the Web. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_55
DOI: https://doi.org/10.1007/978-3-642-03348-3_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03347-6
Online ISBN: 978-3-642-03348-3
eBook Packages: Computer Science, Computer Science (R0)