Abstract
The large amount of multi-type and multi-source bridge data opens unprecedented opportunities for big data analytics to improve bridge deterioration prediction. Before the analytics can be applied, information fusion is needed to transform the heterogeneous data from different sources into a unified representation. Resolving the ambiguities in the named entities extracted from bridge inspection reports is one of the most important fusion tasks. The ambiguity stems from the use of different, and often ambiguous, surface forms to refer to the same target named entity. There is, thus, a need for named entity normalization (NEN) methods that can map these ambiguous surface forms to their canonical form, an identifier concept. However, existing NEN methods are limited in this regard: they mostly require pre-established knowledge (e.g., dictionaries or Wikipedia) and/or training data, and they mostly ignore the impact of the normalization on data analytics. To address this need, this paper proposes an unsupervised NEN method. It includes two main components: candidate identifier concept generation, based on the multi-grams of each named entity set, and candidate identifier concept ranking, based on a proposed ranking function. The function uses the TF-IDF (term frequency–inverse document frequency) weight and is further improved by considering the impacts of gram lengths and positions on the ranking. It aims to balance the abstractness and detailedness of the identifier concepts, so that the resulting data are neither too dense nor too sparse for the analytics. A set of experiments was conducted to evaluate the performance of the proposed method, which achieved an accuracy of 84.5%.
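The abstract does not state the exact ranking function, so the following is only a minimal Python sketch of the two components under explicit assumptions. It assumes that each named entity set is a list of surface forms already grouped as referring to the same entity, that candidates are all 1- to 3-grams, that TF is a gram's share of all grams in its set, and that IDF is computed across sets using the common smoothed form ln((1 + N)/(1 + df)) + 1. The length factor (gram length raised to an exponent `alpha`) and the position factor (mean relative end position raised to `beta`, favoring phrase-final grams, where English head nouns usually sit) are hypothetical stand-ins, not the authors' formulas. All function names are likewise hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram of a token list as (gram, start_index)."""
    return [(" ".join(tokens[i:i + n]), i) for i in range(len(tokens) - n + 1)]


def candidate_concepts(entity_set, max_n=3):
    """Generate candidate identifier concepts for one named entity set.

    Returns gram frequencies and, for each gram, the mean relative position
    of its last token within the surface form (1.0 = end of the phrase).
    """
    counts = Counter()
    positions = {}
    for surface_form in entity_set:
        tokens = surface_form.lower().split()
        for n in range(1, min(max_n, len(tokens)) + 1):
            for gram, start in ngrams(tokens, n):
                counts[gram] += 1
                positions.setdefault(gram, []).append((start + n) / len(tokens))
    mean_pos = {g: sum(p) / len(p) for g, p in positions.items()}
    return counts, mean_pos


def rank_concepts(entity_sets, max_n=3, alpha=0.5, beta=1.0):
    """Rank each set's candidates by TF-IDF times length and position factors.

    alpha and beta are hypothetical tuning exponents, not values from the paper.
    """
    per_set = [candidate_concepts(s, max_n) for s in entity_sets]
    num_sets = len(entity_sets)
    doc_freq = Counter()
    for counts, _ in per_set:
        doc_freq.update(counts.keys())  # number of sets containing each gram

    rankings = []
    for counts, mean_pos in per_set:
        total = sum(counts.values())
        scores = {}
        for gram, freq in counts.items():
            tf = freq / total
            idf = math.log((1 + num_sets) / (1 + doc_freq[gram])) + 1.0  # smoothed IDF
            length_factor = len(gram.split()) ** alpha  # favors more detailed grams
            position_factor = mean_pos[gram] ** beta  # favors phrase-final grams
            scores[gram] = tf * idf * length_factor * position_factor
        rankings.append(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
    return rankings


if __name__ == "__main__":
    # Hypothetical example: surface forms already grouped into two entity sets.
    example_sets = [
        ["concrete deck", "bridge deck", "deck"],
        ["bridge bearing", "expansion bearing", "bearing"],
    ]
    for ranking in rank_concepts(example_sets):
        print(ranking[0])  # top-ranked candidate identifier concept and its score
```

With the default parameters, the top-ranked candidates for the two example sets are "deck" and "bearing". The gram "bridge", which occurs in both sets, receives a lower IDF and therefore ranks below the set-specific "concrete". In this sketch, raising `alpha` pushes the ranking toward longer, more detailed concepts, while lowering it toward 0 favors short, frequent, more abstract ones. This is one way to expose the abstractness/detailedness trade-off that the abstract describes.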
Acknowledgements
This material is based upon work supported by the Strategic Research Initiatives (SRI) Program by the College of Engineering at the University of Illinois at Urbana-Champaign.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Liu, K., El-Gohary, N. (2018). Unsupervised Named Entity Normalization for Supporting Information Fusion for Big Bridge Data Analytics. In: Smith, I., Domer, B. (eds) Advanced Computing Strategies for Engineering. EG-ICE 2018. Lecture Notes in Computer Science, vol 10864. Springer, Cham. https://doi.org/10.1007/978-3-319-91638-5_7
DOI: https://doi.org/10.1007/978-3-319-91638-5_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91637-8
Online ISBN: 978-3-319-91638-5
eBook Packages: Computer Science, Computer Science (R0)