research-article

Predicting Download Directories for Web Resources

Authors: George Valkanas, Dimitrios Gunopulos

WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14)

Article No. 8, pages 1-12

https://doi.org/10.1145/2611040.2611076

Published: 02 June 2014

Abstract

Browsing the web is one of the most common activities users engage in today, and downloading web resources of interest, such as images, documents, and music, is part of this process. However, users would rather temporarily save the resource to a default path they can easily access (e.g., their "Desktop") than select the directory where they will eventually place it. This implies that existing user interfaces are not as effective for this particular task as users would like them to be. Instead of proposing a different user interface, in this paper we address the problem at its core and propose a methodology that suggests the directory in which the user is most likely to (eventually) save the file. As a result, future interfaces can also benefit from our technique. We provide a formal definition of the problem and propose a classification framework to tackle it. We present our overall solution, the Directory Download PrediCtor, or DiDoCtor for short. We give experimental evidence of its effectiveness by implementing our approach as part of a widely used browser and evaluating it on real user activity. We also discuss lessons learned from this process regarding efficiency.
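The abstract frames download-directory prediction as a classification problem: given a resource being downloaded, predict which of the user's directories it will end up in. The record does not describe DiDoCtor's actual features or classifier, so the following is only a minimal sketch under stated assumptions. It uses two hypothetical features (the source domain and the file extension) and a multinomial naive Bayes ranker built from the Python standard library. Naive Bayes is a swapped-in technique chosen for brevity, not the paper's method. The predictor learns from past (URL, directory) pairs and returns the k most likely directories; the directory names are illustrative.

```python
import math
import os
from collections import Counter, defaultdict
from urllib.parse import urlparse


def features(url: str) -> list[str]:
    """Hypothetical features: the source domain and the file extension."""
    parsed = urlparse(url)
    ext = os.path.splitext(parsed.path)[1].lower() or "<none>"
    return [f"domain={parsed.netloc.lower()}", f"ext={ext}"]


class DirectoryPredictor:
    """Multinomial naive Bayes over past (URL, directory) downloads (illustrative only)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                          # Laplace smoothing constant
        self.class_counts: Counter = Counter()      # directory -> number of saves
        self.feature_counts = defaultdict(Counter)  # directory -> feature -> count
        self.vocab: set[str] = set()                # every feature seen so far

    def observe(self, url: str, directory: str) -> None:
        """Record that the resource at `url` was finally saved in `directory`."""
        self.class_counts[directory] += 1
        for f in features(url):
            self.feature_counts[directory][f] += 1
            self.vocab.add(f)

    def rank(self, url: str, k: int = 3) -> list[str]:
        """Return up to k directories, most likely first (by log-probability)."""
        total = sum(self.class_counts.values())
        scores = {}
        for d, n in self.class_counts.items():
            score = math.log(n / total)  # prior: how often d was chosen before
            # The +1 on the vocabulary size reserves smoothing mass for unseen features.
            denom = sum(self.feature_counts[d].values()) + self.alpha * (len(self.vocab) + 1)
            for f in features(url):
                score += math.log((self.feature_counts[d][f] + self.alpha) / denom)
            scores[d] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    p = DirectoryPredictor()
    p.observe("https://arxiv.org/pdf/1234.pdf", "~/Papers")
    p.observe("https://dl.acm.org/doi/pdf/x.pdf", "~/Papers")
    p.observe("https://example.com/photo.jpg", "~/Pictures")
    p.observe("https://flickr.com/a.png", "~/Pictures")
    print(p.rank("https://arxiv.org/pdf/5678.pdf"))  # ['~/Papers', '~/Pictures']
```

Returning a ranked list rather than a single label matches the goal of suggesting likely directories: an interface could offer the top few as shortcuts. A practical system would need richer features and attention to efficiency, which this sketch does not attempt.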


Index Terms

1. Human-centered computing
  1. Collaborative and social computing


Published In


WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14)

June 2014

506 pages

ISBN: 9781450325387

DOI: 10.1145/2611040

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

Aristotle University of Thessaloniki

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

WIMS '14

WIMS '14: 4th International Conference on Web Intelligence, Mining and Semantics

June 2 - 4, 2014

Thessaloniki, Greece

Acceptance Rates

WIMS '14 paper acceptance rate: 41 of 90 submissions (46%)

Overall acceptance rate: 140 of 278 submissions (50%)

Article Metrics

Total citations: 0
Total downloads: 62
Downloads (last 12 months): 3
Downloads (last 6 weeks): 0

Reflects downloads up to 18 Feb 2025
