Research article, DOI: 10.1145/2611040.2611076

Predicting Download Directories for Web Resources

Published: 02 June 2014

Abstract

Browsing the web is one of the most common activities users engage in today, and downloading web resources of interest, such as images, documents, and music, is part of this process. However, users often prefer to save a resource temporarily to a default, easily accessible path (e.g. their "Desktop") rather than select the directory where they will eventually place it. This suggests that existing user interfaces are not as effective for this task as users would like. Instead of proposing a different user interface, in this paper we address the problem at its core and propose a methodology that suggests the most likely directory in which the user would (eventually) save the file. Future interfaces can therefore also benefit from our technique. We provide a formal definition of the problem and propose a classification framework to tackle it. We present our overall solution, namely Directory Download PrediCtor, or DiDoCtor for short. We give experimental evidence of its effectiveness by implementing our approach as part of a widely used browser and evaluating it with real user activity. We also discuss lessons learned from this process regarding efficiency.


    Published In

    WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14)
    June 2014
    506 pages
    ISBN:9781450325387
    DOI:10.1145/2611040

    In-Cooperation

    • Aristotle University of Thessaloniki

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Directory Prediction
    2. UI assistance
    3. Web Browsing

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WIMS '14

    Acceptance Rates

    WIMS '14 Paper Acceptance Rate 41 of 90 submissions, 46%;
    Overall Acceptance Rate 140 of 278 submissions, 50%

