Abstract
The large number of Web data sources now available represents an important opportunity for Web users and for data-intensive Web applications. Nevertheless, selecting the most relevant data sources, and thus obtaining high-quality information, remains a challenging issue. This paper proposes an approach to data source selection based on the reputation of the data sources. The data quality literature defines reputation as a multi-dimensional quality attribute that measures the trustworthiness and importance of an information source.
This paper introduces a set of metrics that measure the reputation of a Web source by considering its authority, its relevance in a given context, and the quality of its content. These variables have been empirically assessed for the top 20 sources identified by Google in response to 100 queries in the tourism domain. In particular, Google’s ranking has been compared with the ranking obtained by means of a multi-dimensional source reputation index. Results show that the assessment of reputation represents a tangible aid to the selection of information sources and to the identification of reliable data.
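As a rough illustration of the idea, the sketch below combines three per-source scores (authority, relevance, content quality) into a single index and compares the resulting ranking with a search-engine ranking. The abstract does not give the paper's actual metrics, weights, or comparison statistic, so everything here is an assumption: the equal-weight mean, the [0, 1] score scale, the hypothetical source names and values, and the use of Kendall's tau as one common way to compare two rankings.

```python
from itertools import combinations


def reputation_index(scores, weights=None):
    """Weighted mean of per-dimension scores, each assumed to lie in [0, 1].

    Equal weights are an assumption; the paper's aggregation is not stated here.
    """
    if weights is None:
        weights = {dim: 1.0 for dim in scores}
    total = sum(weights[dim] for dim in scores)
    return sum(scores[dim] * weights[dim] for dim in scores) / total


def rank_by_reputation(sources):
    """Return source names ordered from highest to lowest reputation index."""
    return sorted(sources, key=lambda name: reputation_index(sources[name]),
                  reverse=True)


def kendall_tau(ranking_a, ranking_b):
    """Kendall rank correlation between two orderings of the same items (no ties)."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        # Positive product: both rankings order the pair the same way.
        product = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(ranking_a) * (len(ranking_a) - 1) / 2
    return (concordant - discordant) / pairs


# Hypothetical sources and scores, for illustration only.
sources = {
    "site_a": {"authority": 0.9, "relevance": 0.5, "quality": 0.6},
    "site_b": {"authority": 0.6, "relevance": 0.8, "quality": 0.7},
    "site_c": {"authority": 0.3, "relevance": 0.7, "quality": 0.8},
}
search_ranking = ["site_a", "site_b", "site_c"]  # assumed search-engine order

reputation_ranking = rank_by_reputation(sources)
tau = kendall_tau(search_ranking, reputation_ranking)
print(reputation_ranking, round(tau, 3))
```

A tau near 1 would mean the reputation index largely reproduces the search-engine order, while lower values indicate that the two rankings disagree on which sources to prefer.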
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barbagallo, D., Cappiello, C., Francalanci, C., Matera, M. (2011). Enhancing the Selection of Web Sources: A Reputation Based Approach. In: Filipe, J., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2010. Lecture Notes in Business Information Processing, vol 73. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19802-1_32
DOI: https://doi.org/10.1007/978-3-642-19802-1_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19801-4
Online ISBN: 978-3-642-19802-1
eBook Packages: Computer Science (R0)