ABSTRACT
This paper presents an approach for the automatic assessment of web sites in large-scale e-Government surveys. The approach aims to supplement, and to some extent replace, the human evaluation that is typically the core of these surveys.
The heart of the solution is a colony-inspired algorithm, called the lost sheep, which automatically locates targeted governmental material online. The algorithm centers on classifying link texts to determine whether a web page should be downloaded for further analysis.
The proposed algorithm is designed to work with minimal human interaction and to use the available resources as effectively as possible. With the lost sheep, the people carrying out a survey only provide sample data from a few web sites for each type of material sought. The algorithm then automatically locates the same type of material on the other web sites in the survey. In this way it significantly reduces the need for manual work in large-scale e-Government surveys.
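The idea of classifying link texts to decide whether a page is worth downloading can be illustrated with a small sketch. This is not the lost sheep algorithm itself, whose details the abstract does not give; it swaps in a simple naive Bayes classifier over link-text tokens with add-one smoothing. The class name, function names, sample link texts and example.org URLs are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a link text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LinkTextClassifier:
    """Tiny naive Bayes over link-text tokens (illustrative stand-in only)."""

    def __init__(self):
        # Token and example counts per label: True = targeted material.
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, link_text, is_target):
        self.token_counts[is_target].update(tokenize(link_text))
        self.doc_counts[is_target] += 1

    def _log_score(self, tokens, label):
        counts = self.token_counts[label]
        vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        total_tokens = sum(counts.values())
        total_docs = self.doc_counts[True] + self.doc_counts[False]
        # Smoothed class prior plus smoothed per-token likelihoods.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for token in tokens:
            score += math.log((counts[token] + 1) / (total_tokens + len(vocab) + 1))
        return score

    def should_download(self, link_text):
        tokens = tokenize(link_text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def build_example_classifier():
    """Train on hypothetical sample link texts from a few surveyed sites."""
    clf = LinkTextClassifier()
    for text in ["Municipal budget 2011", "Annual budget and accounts", "Budget documents"]:
        clf.train(text, True)
    for text in ["Contact us", "Opening hours", "News and events"]:
        clf.train(text, False)
    return clf


def select_links(clf, links):
    """Return the URLs whose link text is classified as targeted material."""
    return [url for text, url in links if clf.should_download(text)]


if __name__ == "__main__":
    clf = build_example_classifier()
    links = [
        ("Budget 2012", "http://example.org/budget"),          # hypothetical URL
        ("Contact the office", "http://example.org/contact"),  # hypothetical URL
    ]
    print(select_links(clf, links))
```

In this sketch the sample link texts play the role of the survey team's sample data, and `select_links` stands in for the decision of which pages on a previously unseen site to fetch.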
Towards automatic assessment of government web sites