ABSTRACT
In many practical situations, analytical tasks must be performed on complex heterogeneous data, often described by a domain ontology (DO). Such cases abound in life science fields such as agro-informatics, where observations and measurements on animals and plants are logged for subsequent mining. The data is naturally structured as one or more graphs, is unlabelled and has some missing values, and is therefore well suited to pattern mining. In our own precision farming project, aimed at decision support for dairy cow management, we mine milk production data for knowledge. In one task, we look for contrast patterns that explain the relative impact of independent production factors. To that end, we defined ontologically-generalized graph patterns (OGPs), a variety of generalized graph patterns in which vertices and edges are labelled by DO classes and properties, respectively. We also designed a mining methodology that reconciles OWL DOs, abstraction from RDF graphs, and literals in data. To address the well-known cost-related limitations of graph mining (exacerbated here by class/property specializations and data properties), we split the mining task into (1) mining of generic object property topology patterns and (2) label refinement. These steps focus on two sorts of OGPs, called topologies and class stars, respectively, which, after being mined separately, are (3) assembled into fully-fledged OGPs.
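To make the idea of ontological generalization concrete, here is a minimal sketch of how a single-edge pattern labelled with DO classes might be matched against typed data edges. It is not the paper's algorithm: the toy class hierarchy, the property name `eats`, and the helpers `ancestors`, `edge_matches`, and `support` are all hypothetical, and property specialization and data properties are left out for brevity.

```python
# Hypothetical toy ontology (assumption, not from the paper):
# maps each class to its direct superclass.
SUBCLASS_OF = {
    "Holstein": "Cow", "Jersey": "Cow", "Cow": "Animal",
    "Silage": "Feed", "Hay": "Feed",
}

def ancestors(cls):
    """Return cls together with all of its superclasses."""
    out = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        out.append(cls)
    return out

def edge_matches(pattern_edge, data_edge, types):
    """A pattern edge (C1, p, C2) matches a data edge (s, p, o) when the
    property is identical and the classes of s and o specialize C1 and C2."""
    c1, prop, c2 = pattern_edge
    s, data_prop, o = data_edge
    return (prop == data_prop
            and c1 in ancestors(types[s])
            and c2 in ancestors(types[o]))

def support(pattern_edge, graphs):
    """Fraction of data graphs that contain at least one matching edge."""
    hits = sum(
        any(edge_matches(pattern_edge, e, g["types"]) for e in g["edges"])
        for g in graphs
    )
    return hits / len(graphs)

# Two tiny data graphs with made-up individuals and types.
graphs = [
    {"types": {"c1": "Holstein", "f1": "Silage"},
     "edges": [("c1", "eats", "f1")]},
    {"types": {"c2": "Jersey", "f2": "Hay"},
     "edges": [("c2", "eats", "f2")]},
]

specific = support(("Holstein", "eats", "Silage"), graphs)
general = support(("Cow", "eats", "Feed"), graphs)
print(specific, general)
```

In this toy setting the generalized pattern (Cow, eats, Feed) covers both graphs while the specific one covers only the first, which illustrates why patterns labelled by more general classes can be frequent where fully specific ones are not.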
Index Terms
- Generalized graph pattern discovery in linked data with data properties and a domain ontology