Abstract
In this paper we consider the key role of corpus homogeneity in the problem of domain adaptation. Domain adaptation is a research topic concerned with the portability of a linguistic tool, that is, how well the tool performs when it is applied outside the domain for which it was built. Because a linguistic tool is commonly developed for a specific domain, applying it to a different domain decreases its performance. Determining the homogeneity of the corpora involved is therefore crucial for minimising the cost of porting the tool. We examine the semantic relatedness between domains by analysing the co-occurrence of terms. By mapping the texts and their terms into a latent semantic space, we identify the underlying semantic similarity between different domains. We evaluate a collection of reviews from four different domains, and the results obtained so far show that our method is a plausible alternative for measuring the homogeneity of the collection.
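The abstract describes the method only at a high level. The following is a minimal sketch of the general latent semantic analysis (LSA) idea, assuming raw term counts, naive whitespace tokenisation, a truncated SVD computed with NumPy, and cosine similarity between per-domain centroids as the relatedness score. The function names (`term_document_matrix`, `lsa_document_vectors`, `domain_similarity`), the value of `k` and the toy reviews are hypothetical illustrations, not the author's implementation.

```python
# Minimal LSA sketch (assumed pipeline, not the paper's exact method).
from collections import Counter

import numpy as np


def term_document_matrix(docs):
    """Raw term counts (terms x documents) using naive whitespace tokenisation."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenised):
        for term, count in Counter(toks).items():
            matrix[index[term], j] = count
    return matrix, vocab


def lsa_document_vectors(matrix, k):
    """Project each document into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    if k > len(s):
        raise ValueError(f"k={k} exceeds the number of singular values ({len(s)})")
    # S_k V_k^T equals U_k^T M: one k-dimensional column per document.
    return (np.diag(s[:k]) @ vt[:k]).T  # shape: (n_docs, k)


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def domain_similarity(docs, labels, k=2):
    """Cosine similarity between the latent centroids of every pair of domains."""
    matrix, _ = term_document_matrix(docs)
    doc_vecs = lsa_document_vectors(matrix, k)
    domains = sorted(set(labels))
    centroids = {
        d: doc_vecs[[i for i, lab in enumerate(labels) if lab == d]].mean(axis=0)
        for d in domains
    }
    return {
        (a, b): cosine(centroids[a], centroids[b])
        for i, a in enumerate(domains)
        for b in domains[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical toy "reviews"; not the paper's data.
    reviews = ["plot story", "plot story story", "plot plot story", "blender knife"]
    labels = ["books", "books", "dvd", "kitchen"]
    for pair, sim in domain_similarity(reviews, labels, k=2).items():
        print(pair, round(sim, 3))
```

On this toy input the books and dvd snippets share their vocabulary and fall onto the same latent dimension (cosine close to 1), while the kitchen snippet shares no terms with them and stays orthogonal (cosine close to 0). A fuller pipeline would typically weight the counts (for example tf-idf or log-entropy) before the SVD and choose `k` empirically; the abstract does not state which choices the author made.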
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Uribe, D. (2014). LSA Based Approach to Domain Detection. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Human-Inspired Computing and Its Applications. MICAI 2014. Lecture Notes in Computer Science, vol 8856. Springer, Cham. https://doi.org/10.1007/978-3-319-13647-9_7
DOI: https://doi.org/10.1007/978-3-319-13647-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13646-2
Online ISBN: 978-3-319-13647-9
eBook Packages: Computer Science (R0)