Abstract
From a semantic point of view, information is usually contained in small units called facts, which are typically smaller than sentences. Identifying these facts in a text is not a trivial task. We present a heuristic algorithm for extracting facts from sentences using a simple representation based on a relational data model. We focus our study on texts that by their nature contain many facts: structured textbooks. The algorithm relies on data produced by a syntactic analyzer. The extracted facts can be useful for information retrieval, automatic summarization, and similar tasks. Our experiments are conducted on Spanish texts. We obtained better results than similar methods.
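The abstract does not spell out the heuristics, so the following is only a minimal sketch of the general idea, not the authors' algorithm: facts are stored as rows of a simple relation (subject, predicate, object) and read off a dependency parse. The `Token` and `Fact` types, the `extract_facts` function, and the `subj`/`obj` relation labels are all hypothetical names chosen for illustration; they do not reflect the output format of any particular parser.

```python
from collections import namedtuple

# Hypothetical token format: id, surface form, id of the head token, relation label.
Token = namedtuple("Token", "id form head deprel")
# One row of the (assumed) fact relation.
Fact = namedtuple("Fact", "subject predicate obj")


def extract_facts(tokens):
    """Return one Fact per (subject, object) pair attached to the same head."""
    # Group dependents by the id of their head token.
    children = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok)

    facts = []
    for tok in tokens:
        deps = children.get(tok.id, [])
        subjects = [d.form for d in deps if d.deprel == "subj"]
        objects = [d.form for d in deps if d.deprel == "obj"]
        # A head with both a subject and an object yields a fact.
        for s in subjects:
            for o in objects:
                facts.append(Fact(s, tok.form, o))
    return facts


# Hand-built parse of "Colón descubrió América y Magallanes cruzó el Pacífico".
# The labels and attachments are assumptions made for this example.
SENTENCE = [
    Token(1, "Colón", 2, "subj"),
    Token(2, "descubrió", 0, "root"),
    Token(3, "América", 2, "obj"),
    Token(4, "y", 2, "coord"),
    Token(5, "Magallanes", 6, "subj"),
    Token(6, "cruzó", 4, "conj"),
    Token(7, "el", 8, "det"),
    Token(8, "Pacífico", 6, "obj"),
]

FACTS = extract_facts(SENTENCE)
```

In this toy example a single sentence yields two rows, which illustrates the claim that facts are typically smaller than sentences. A real system would need further heuristics for cases such as passives, copulas, and multi-word noun phrases.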
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sidorov, G., Herrera-de-la-Cruz, J.A., Galicia-Haro, S.N., Posadas-Durán, J.P., Chanona-Hernandez, L. (2011). Heuristic Algorithm for Extraction of Facts Using Relational Model and Syntactic Data. In: Batyrshin, I., Sidorov, G. (eds) Advances in Artificial Intelligence. MICAI 2011. Lecture Notes in Computer Science, vol 7094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25324-9_28
DOI: https://doi.org/10.1007/978-3-642-25324-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25323-2
Online ISBN: 978-3-642-25324-9
eBook Packages: Computer Science, Computer Science (R0)