Abstract
Co-location mining is a classical problem in spatial pattern mining. Given a set of boolean spatial features, the goal is to find subsets of features that are frequently located together. It has wide applications in environmental management, public safety, transportation, and tourism. In recent years, many algorithms have been proposed to extract frequent co-locations. However, most solutions perform a “data-centered knowledge discovery” instead of an “expert-centered knowledge discovery”. Successfully providing useful and interpretable patterns to experts is still an open problem. In this setting, we propose a domain-driven co-location mining approach that combines constraint-based mining and cartographic visualization. Experts can push new domain constraints into the mining algorithm, resulting in more relevant patterns and more efficient extraction. Then, they can visualize solutions using a new concise and intuitive cartographic visualization of co-locations. Using this original visualization approach, they identify new interesting patterns, and use uninteresting ones to define new constraints and refine their analysis. These proposals have been integrated into a prototype based on the PostGIS geographic information system. Experiments have been conducted on a real geological dataset on soil erosion, and results have been validated by a domain expert.
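To make the problem statement concrete, the following is a minimal Python sketch of pairwise co-location mining with one expert constraint. It is not the prototype described above. It assumes the participation index, a prevalence measure commonly used in the co-location literature, and a plain Euclidean distance threshold as the neighborhood relation. The function names, the `excluded` parameter, and the toy points are all hypothetical.

```python
from itertools import combinations
from math import hypot


def neighbor_pairs(points, max_dist):
    """Return index pairs of points lying within max_dist of each other.

    Each point is a (feature, x, y) tuple; the distance threshold stands in
    for the neighborhood relation (an assumption of this sketch).
    """
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        (_, xi, yi), (_, xj, yj) = points[i], points[j]
        if hypot(xi - xj, yi - yj) <= max_dist:
            pairs.append((i, j))
    return pairs


def participation_index(points, feat_a, feat_b, max_dist):
    """Participation index of the size-2 co-location {feat_a, feat_b}.

    For each feature, compute the fraction of its instances that have a
    neighbor of the other feature, then take the minimum of the two ratios.
    The two features are assumed to be distinct.
    """
    involved = {feat_a: set(), feat_b: set()}
    for i, j in neighbor_pairs(points, max_dist):
        fi, fj = points[i][0], points[j][0]
        if {fi, fj} == {feat_a, feat_b}:
            involved[fi].add(i)
            involved[fj].add(j)
    totals = {f: sum(1 for p in points if p[0] == f) for f in (feat_a, feat_b)}
    if 0 in totals.values():
        return 0.0
    return min(len(involved[f]) / totals[f] for f in (feat_a, feat_b))


def frequent_pairs(points, max_dist, min_prev, excluded=frozenset()):
    """Return feature pairs whose participation index is at least min_prev.

    `excluded` is a hypothetical expert constraint: a set of frozenset pairs
    that are skipped before any prevalence computation takes place.
    """
    features = sorted({p[0] for p in points})
    result = {}
    for a, b in combinations(features, 2):
        if frozenset((a, b)) in excluded:
            continue  # constraint applied during mining, not as a post-filter
        prevalence = participation_index(points, a, b, max_dist)
        if prevalence >= min_prev:
            result[(a, b)] = prevalence
    return result


# Toy data, invented for illustration only.
points = [
    ("erosion", 0.0, 0.0), ("erosion", 5.0, 5.0),
    ("bare_soil", 0.0, 1.0), ("bare_soil", 5.0, 6.0),
    ("road", 0.5, 0.0), ("forest", 20.0, 20.0),
]
unconstrained = frequent_pairs(points, max_dist=1.5, min_prev=0.5)
constrained = frequent_pairs(
    points, max_dist=1.5, min_prev=0.5,
    excluded={frozenset(("erosion", "road"))},
)
```

Skipping an excluded pair before its prevalence is computed mirrors, in a very small way, the idea that constraints pushed into the mining step can both sharpen the result and reduce the work done.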
Acknowledgments
This work was funded by French contract ANR-2010-COSI-012-01 FOSTER.
Cite this article
Flouvat, F., Van Soc, JF.N., Desmier, E. et al. Domain-driven co-location mining. Geoinformatica 19, 147–183 (2015). https://doi.org/10.1007/s10707-014-0209-3
DOI: https://doi.org/10.1007/s10707-014-0209-3