Volume 13 Number 5 (May. 2018)
JSW 2018 Vol.13(5): 300-316 ISSN: 1796-217X
doi: 10.17706/jsw.13.5.300-316

Web Crawling and Processing with Limited Resources for Business Intelligence and Analytics Applications

Loredana M. Genovese, Filippo Geraci*

Institute for Informatics and Telematics, CNR, Via G. Moruzzi, 1 Pisa, Italy.

Abstract—Business intelligence (BI) is the activity of extracting strategic information from big data. Its benefits for enterprises range from lower operating costs, thanks to a more sensible internal organization, to a more productive and better-informed decision process. To be effective, BI relies heavily on the availability of a huge amount of (possibly high-quality) data. The steady decrease in the cost of acquiring, storing and analyzing large knowledge bases has motivated big companies to invest in BI technologies. SMEs (small and medium-sized enterprises), by contrast, have so far been excluded from the benefits of BI because of their limited budgets and resources. In this paper we show that a satisfactory BI activity is possible even on a small budget. Our ultimate goal is not necessarily to propose novel solutions but to provide practitioners with a sort of hitchhiker’s guide to cost-effective web-based BI. In particular, we discuss how the Web can be used as a cheap yet reliable source of information, where crawling, data cleaning and classification can be achieved using a limited amount of CPU, storage space and bandwidth.

Index Terms—Big data analytics, business intelligence, spam detection, web classification, web crawling.


Cite: Loredana M. Genovese, Filippo Geraci, "Web Crawling and Processing with Limited Resources for Business Intelligence and Analytics Applications," Journal of Software, vol. 13, no. 5, pp. 300-316, 2018.
