ABSTRACT
Searching for articles on a specific subject is time-consuming when those articles are scattered across scientific journals, because it requires scouring many digital libraries and journal websites. Web scraping can make this process more efficient: a scraper extracts web page content into organized, structured datasets. This paper proposes a customized web scraper called "Research Scraper" that will extract content from scientific journal websites, allowing users to access all results from a single search interface. The proposed technique is simple to use and can help analyze publications in a specific field. This paper presents and explains the development steps, system design, and technologies that will be used in the implementation phase.
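As a rough illustration of the general idea described above (turning page markup into structured records that one search function can query), the following minimal sketch uses only the Python standard library. It is not the Research Scraper implementation: the sample markup, the `result` class name, and the `ResultParser`, `scrape`, and `search` names are all assumptions made for this example, and real journal sites will use different page structures.

```python
from html.parser import HTMLParser

# Hypothetical search-results markup; real journal sites differ.
SAMPLE_HTML = """
<ul>
  <li class="result"><a href="/doi/10.0000/example.1">Example Article on Web Scraping</a></li>
  <li class="result"><a href="/doi/10.0000/example.2">Example Article on Text Mining</a></li>
</ul>
"""


class ResultParser(HTMLParser):
    """Collects the link text and href of each anchor inside a 'result' item."""

    def __init__(self):
        super().__init__()
        self.records = []        # structured output: one dict per article
        self._in_result = False  # True while inside <li class="result">
        self._href = None        # href of the anchor currently being read
        self._text = []          # text fragments of the current anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "result":
            self._in_result = True
        elif tag == "a" and self._in_result:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.records.append({"title": title, "url": self._href})
            self._href = None
        elif tag == "li":
            self._in_result = False


def scrape(html):
    """Parse one page of markup into a list of {'title', 'url'} records."""
    parser = ResultParser()
    parser.feed(html)
    return parser.records


def search(records, keyword):
    """Case-insensitive keyword filter over the scraped titles."""
    keyword = keyword.lower()
    return [r for r in records if keyword in r["title"].lower()]


if __name__ == "__main__":
    for record in search(scrape(SAMPLE_HTML), "scraping"):
        print(record["title"], record["url"])
```

In a fuller system, records scraped from several sites could be merged into one list before `search` is applied, which is one way a single search interface over multiple sources might be arranged.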