Predicting Invariant Nodes in Large Scale Semantic Knowledge Graphs

Barsotti, Damian; Dominguez, Martin Ariel; Duboue, Pablo Ariel

doi:10.1007/978-3-319-90596-9_4


Part of the book series: Communications in Computer and Information Science (CCIS, volume 795)

Included in the following conference series:

Annual International Symposium on Information Management and Big Data


Abstract

Understanding and predicting how large scale knowledge graphs change over time has direct implications for the software and hardware associated with their maintenance and storage. An important subproblem is predicting invariant nodes, that is, nodes within the graph that will not have any edges deleted or changed (add-only nodes) or that will not have any edges added or changed (del-only nodes). Predicting add-only nodes correctly has practical importance, as such nodes can then be cached or represented using a more efficient data structure. This paper presents a logistic regression approach using attribute-values as features that, trained using Apache Spark, achieves over 90% precision on yearly DBpedia changes. The paper concludes by outlining how we plan to use these models for evaluating Natural Language Generation algorithms.
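The two ingredients described in the abstract, labeling invariant nodes from consecutive snapshots and scoring them with a logistic regression over attribute-value features, can be illustrated with a small single-machine sketch in plain Python. It is not the authors' Spark implementation. The triple layout, the treatment of an edge change as a deletion plus an addition, the `predicate=object` feature encoding, the SGD hyperparameters, and every function name and toy triple below are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def edges_by_node(triples):
    """Group (subject, predicate, object) triples into node -> {(predicate, object)}."""
    out = defaultdict(set)
    for s, p, o in triples:
        out[s].add((p, o))
    return out


def label_invariant(old, new):
    """Label every node of the old snapshot as add-only and/or del-only.

    Assumption: a changed edge is modeled as one deletion plus one addition,
    so a node with no changes at all counts as both add-only and del-only.
    """
    before, after = edges_by_node(old), edges_by_node(new)
    labels = {}
    for node, edges in before.items():
        later = after.get(node, set())
        labels[node] = {
            "add_only": edges <= later,  # no old edge was deleted
            "del_only": later <= edges,  # no new edge was added
        }
    return labels


def features(edges):
    """Hypothetical attribute-value encoding: one binary feature per 'predicate=object'."""
    return {f"{p}={o}" for p, o in edges}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(examples, epochs=50, lr=0.5):
    """Logistic regression fit by plain SGD on (feature_set, 0/1 label) pairs."""
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for feats, y in examples:
            g = sigmoid(b + sum(w[f] for f in feats)) - y  # d(log loss)/dz
            b -= lr * g
            for f in feats:
                w[f] -= lr * g
    return w, b


def predict(model, feats, threshold=0.5):
    """Return True when the model scores the node as add-only."""
    w, b = model
    return sigmoid(b + sum(w.get(f, 0.0) for f in feats)) >= threshold


def precision(pairs):
    """Precision over (predicted, actual) boolean pairs."""
    tp = sum(1 for pred, gold in pairs if pred and gold)
    fp = sum(1 for pred, gold in pairs if pred and not gold)
    return tp / (tp + fp) if tp + fp else 0.0


# Two toy "yearly" snapshots (hypothetical data, not taken from DBpedia).
old = [
    ("Berlin", "type", "City"), ("Berlin", "country", "Germany"),
    ("Alice", "type", "Person"), ("Alice", "age", "30"),
    ("Paris", "type", "City"), ("Paris", "country", "France"),
    ("Bob", "type", "Person"), ("Bob", "age", "41"),
]
new = [
    ("Berlin", "type", "City"), ("Berlin", "country", "Germany"),
    ("Berlin", "leader", "NewMayor"),  # edge added: Berlin stays add-only
    ("Alice", "type", "Person"), ("Alice", "age", "31"),  # value changed
    ("Paris", "type", "City"), ("Paris", "country", "France"),  # unchanged
    ("Bob", "type", "Person"), ("Bob", "age", "42"),  # value changed
]

labels = label_invariant(old, new)
before = edges_by_node(old)
examples = [(features(e), int(labels[n]["add_only"])) for n, e in before.items()]
model = train_logreg(examples)
scored = [(predict(model, features(e)), labels[n]["add_only"]) for n, e in before.items()]
print("add-only precision:", precision(scored))
```

In this toy run the precision is measured on the training nodes themselves; with real releases it would be measured on held-out nodes. Precision is the natural headline metric for this use case, presumably because a node wrongly treated as add-only (and cached accordingly) is the more expensive mistake.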


Notes

1. http://dbpedia.org
2. https://spark.apache.org/


Acknowledgments

The authors would like to thank the Secretaria de Ciencia y Tecnica of Cordoba Province for support and Annie Ying and the anonymous reviewers for helpful comments and suggestions. They would also like to extend their gratitude to the organizers of the SIMBig Symposium.

Author information

Authors and Affiliations

FaMAF-UNC, Cordoba, Argentina
Damian Barsotti, Martin Ariel Dominguez & Pablo Ariel Duboue


Corresponding author

Correspondence to Pablo Ariel Duboue.

Editor information

Editors and Affiliations

University of Florida, Gainesville, Florida, USA
Juan Antonio Lossio-Ventura
Universidad del Pacífico, Lima, Peru
Hugo Alatrista-Salas


About this paper

Cite this paper

Barsotti, D., Dominguez, M.A., Duboue, P.A. (2018). Predicting Invariant Nodes in Large Scale Semantic Knowledge Graphs. In: Lossio-Ventura, J., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2017. Communications in Computer and Information Science, vol 795. Springer, Cham. https://doi.org/10.1007/978-3-319-90596-9_4

DOI: https://doi.org/10.1007/978-3-319-90596-9_4
Published: 21 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90595-2
Online ISBN: 978-3-319-90596-9
eBook Packages: Computer Science, Computer Science (R0)