Abstract
Product entity resolution is an important part of online product search, where product entities coming from different websites need to be aggregated in the search results. In this paper, we propose an approach to product entity resolution that exploits the descriptive power of an ontology. Our algorithm uses similarity measures that are defined specifically for each type of product feature and learns the feature weights by means of a genetic algorithm. In the evaluation, we obtain F1-measures of 59% and 72% for the two product classes that we consider. These results are significantly better than those obtained by a state-of-the-art product entity resolution algorithm.
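The idea summarized in the abstract (a similarity measure per feature type, combined through weights that a genetic algorithm tunes against the F1-measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the similarity measures, the decision threshold, or the genetic operators, so the function names, the `difflib`-based string measure, the relative-difference numeric measure, the 0.8 threshold, and the crossover and mutation settings below are all assumptions chosen for illustration.

```python
import random
from difflib import SequenceMatcher

# Hypothetical per-type similarity measures (assumed, not from the paper).
def string_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def numeric_sim(a, b):
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))

def boolean_sim(a, b):
    return 1.0 if a == b else 0.0

# Dispatch on the exact Python type of the feature value.
SIMILARITY = {str: string_sim, float: numeric_sim, int: numeric_sim, bool: boolean_sim}

def product_similarity(p, q, weights):
    """Weighted average of feature similarities over features both products have."""
    shared = [f for f in weights if f in p and f in q]
    total = sum(weights[f] for f in shared)
    if total == 0:
        return 0.0
    score = sum(weights[f] * SIMILARITY[type(p[f])](p[f], q[f]) for f in shared)
    return score / total

def f1_score(predicted, actual):
    """Harmonic mean of precision and recall for boolean match decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def learn_weights(pairs, labels, features, threshold=0.8,
                  generations=30, pop_size=20, seed=0):
    """Toy genetic algorithm: keep the fitter half, refill by crossover and mutation."""
    rng = random.Random(seed)

    def fitness(genome):
        weights = dict(zip(features, genome))
        preds = [product_similarity(p, q, weights) >= threshold for p, q in pairs]
        return f1_score(preds, labels)

    population = [[rng.random() for _ in features] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            i = rng.randrange(len(child))                       # mutate one gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return dict(zip(features, best)), fitness(best)

# Made-up example data: two matching pairs and one non-matching pair.
pairs = [
    ({"title": "Acme TV 40 inch", "size": 40.0, "smart": True},
     {"title": "ACME 40 inch TV", "size": 40.0, "smart": True}),
    ({"title": "Acme TV 40 inch", "size": 40.0, "smart": True},
     {"title": "Bolt Monitor 24", "size": 24.0, "smart": False}),
    ({"title": "Bolt Monitor 24", "size": 24.0, "smart": False},
     {"title": "Bolt 24 Monitor", "size": 24.0, "smart": False}),
]
labels = [True, False, True]
weights, best_f1 = learn_weights(pairs, labels, ["title", "size", "smart"])
```

Here `weights` maps each feature name to its learned weight and `best_f1` is the F1-measure those weights reach on the toy pairs. A realistic setup would evaluate on held-out data rather than on the pairs used for learning.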
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Vermaas, R., Vandic, D., Frasincar, F. (2014). An Ontology-Based Approach for Product Entity Resolution on the Web. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_40
DOI: https://doi.org/10.1007/978-3-319-11749-2_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer Science (R0)