Abstract
As web data integration matures, a new challenge arises: how to match relevant reviews to integrated database objects so that users get a more complete, holistic view of each entity. Based on the characteristics of web data integration and of reviews on the web, we propose a method based on 2-layer Conditional Random Fields (CRF) to match reviews to database objects. On the one hand, our method leverages the integrated structured entities and significantly reduces the dependence on manually labeled training data. On the other hand, we employ a semi-Markov CRF to recognize the structured entities and exploit a variety of entity-level and pattern-level recognition clues available in a database of entities and labeled reviews, thereby effectively handling the variety of entity mentions and improving the accuracy of entity recognition. Experiments in multiple domains show that our method is substantially superior to traditional tf-idf based methods as well as a recent language model-based method for the review matching problem.
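To make the segment-level idea concrete, the following is a minimal sketch of semi-Markov decoding: it scores whole token spans rather than single tokens, so a dictionary of entity names from the database can act as an entity-level clue. It is not the authors' implementation. The function names, the toy dictionary `ENTITY_NAMES`, and the hand-set scores in `toy_score` are all assumptions made for illustration; a trained semi-Markov CRF would learn feature weights from labeled reviews instead.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)

# Hypothetical entity names, standing in for an integrated database.
ENTITY_NAMES = {"golden dragon", "blue bottle cafe"}


def toy_score(tokens: List[str], start: int, end: int,
              label: str, prev_label: Optional[str]) -> float:
    """Hand-set segment score (an assumption, not learned weights)."""
    text = " ".join(tokens[start:end]).lower()
    if label == "ENTITY":
        # Reward spans that exactly match a database name, penalize others.
        return 2.0 * (end - start) if text in ENTITY_NAMES else -2.0
    # Non-entity segments are kept to single tokens.
    return 0.0 if end - start == 1 else -10.0


def semi_markov_decode(tokens: List[str], labels: List[str],
                       score: Callable[..., float],
                       max_len: int) -> List[Segment]:
    """Viterbi search over segmentations with segments of at most max_len tokens."""
    n = len(tokens)
    neg_inf = float("-inf")
    # best[i][y]: best score of a segmentation of tokens[:i] ending in label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            for length in range(1, min(max_len, i) + 1):
                j = i - length
                # At the start of the sequence there is no previous label.
                prevs = [(None, 0.0)] if j == 0 else [(p, best[j][p]) for p in labels]
                for prev_label, prev_score in prevs:
                    if prev_score == neg_inf:
                        continue
                    cand = prev_score + score(tokens, j, i, y, prev_label)
                    if cand > best[i][y]:
                        best[i][y] = cand
                        back[i][y] = (j, prev_label)
    # Follow back-pointers from the best final label.
    segments: List[Segment] = []
    if n == 0:
        return segments
    i, y = n, max(labels, key=lambda lab: best[n][lab])
    while i > 0:
        j, prev_label = back[i][y]
        segments.append((j, i, y))
        i, y = j, prev_label
    segments.reverse()
    return segments


review = "we loved the golden dragon last night".split()
for start, end, label in semi_markov_decode(review, ["ENTITY", "O"], toy_score, 3):
    print(label, review[start:end])
```

Because the score is computed per segment, the two tokens "golden dragon" are evaluated as one unit against the dictionary, which a token-level CRF cannot do directly.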
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yongxin, Z., Qingzhong, L., Dequan, W., Yanhui, D., Congli, L., Zhongmin, Y. (2015). Matching Reviews to Object Based on 2-Stage CRF. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science(), vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_8
DOI: https://doi.org/10.1007/978-3-319-25255-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25254-4
Online ISBN: 978-3-319-25255-1
eBook Packages: Computer Science, Computer Science (R0)