LLM-Based Automating Product Information Retrieval for Industry Analysis: A Real-World Application

Liao, Chen; Cheng, Gang; Huang, Shilei; Yao, Lin

doi:10.1007/978-3-031-77954-1_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15426)

Included in the following conference series:

International Conference on Cognitive Computing

Abstract

In the rapidly evolving digital landscape, effective retrieval of product information from enterprise websites is crucial for enterprise research, industry analysis, and strategic planning, all of which rely on accurate and comprehensive data. In this context, a "product" is defined as any tangible item, solution, or service. This paper proposes a novel method for extracting product data (such as product name, category, description, and specifications) directly from company websites. Our approach leverages the capabilities of Large Language Models (LLMs) to enhance the accuracy and automation of the web-page information retrieval process. The adoption of LLMs allows for more sophisticated extraction and organization of data, overcoming the limitations of conventional methods. Currently, there is a notable absence of open-source or commercial databases that comprehensively cover enterprise products, which makes comparative studies difficult to conduct. Our proposed method aims to fill this gap by providing a tool for gathering product information that can be used to assess competitive differences.
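As an illustration only, the four fields named in the abstract could be modeled as a record, with a prompt that asks an LLM for JSON and a parser that validates the reply. The abstract does not specify prompts, models, or output formats, so everything in this sketch is an assumption: the names `ProductRecord`, `build_extraction_prompt`, and `parse_products` are hypothetical, the JSON output format is our choice, and the LLM call itself is deliberately left out.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProductRecord:
    """One product with the fields listed in the abstract (hypothetical schema)."""
    name: str
    category: str = ""
    description: str = ""
    specifications: Dict[str, str] = field(default_factory=dict)


def build_extraction_prompt(page_text: str) -> str:
    """Build a prompt asking a model to return products as a JSON array.

    The wording is an assumption, not the prompt used in the paper.
    """
    return (
        "Extract every product (tangible item, solution, or service) from the "
        "web page text below. Reply with a JSON array only. Each element must "
        "have the keys: name, category, description, specifications "
        "(an object mapping specification names to values).\n\n"
        "Web page text:\n" + page_text
    )


def parse_products(llm_output: str) -> List[ProductRecord]:
    """Parse the model's JSON reply, skipping entries without a product name."""
    data = json.loads(llm_output)
    if not isinstance(data, list):
        return []
    records = []
    for item in data:
        if not isinstance(item, dict) or not item.get("name"):
            continue  # an entry with no name is not usable as a product
        specs = item.get("specifications")
        if not isinstance(specs, dict):
            specs = {}
        records.append(
            ProductRecord(
                name=str(item["name"]).strip(),
                category=str(item.get("category") or "").strip(),
                description=str(item.get("description") or "").strip(),
                specifications={str(k): str(v) for k, v in specs.items()},
            )
        )
    return records
```

In use, the string returned by `build_extraction_prompt` would be sent to whichever model is available, and the reply passed to `parse_products`. Validating the reply before storing it guards against malformed or incomplete model output.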

1. Supported by Shenzhen Science and Technology Program (No. GJHZ20220913144201002).

2. Supported by IER Foundation 2022 (IERF202203).

Author information

Authors and Affiliations

IMSL Shenzhen Key Lab, PKU-HKUST Shenzhen Hong Kong Institution, Shenzhen, China
Chen Liao, Gang Cheng, Shilei Huang & Lin Yao
Shenzhen Raisound Technologies, Co., Ltd., Shenzhen, China
Chen Liao, Gang Cheng & Shilei Huang

Corresponding author

Correspondence to Lin Yao.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Harbin, China
Ruifeng Xu
SF Technology Co., Ltd, Shenzhen, China
Huan Chen
Hohai University, Nanjing, China
Yirui Wu
Shenzhen University, Shenzhen, China
Liang-Jie Zhang

About this paper

Cite this paper

Liao, C., Cheng, G., Huang, S., Yao, L. (2025). LLM-Based Automating Product Information Retrieval for Industry Analysis: A Real-World Application. In: Xu, R., Chen, H., Wu, Y., Zhang, LJ. (eds) Cognitive Computing - ICCC 2024. ICCC 2024. Lecture Notes in Computer Science, vol 15426. Springer, Cham. https://doi.org/10.1007/978-3-031-77954-1_8

DOI: https://doi.org/10.1007/978-3-031-77954-1_8
Published: 29 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77953-4
Online ISBN: 978-3-031-77954-1
eBook Packages: Computer Science, Computer Science (R0)
