
LLM-Based Automating Product Information Retrieval for Industry Analysis: A Real-World Application

  • Conference paper
Cognitive Computing - ICCC 2024 (ICCC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15426)

Abstract

In the rapidly evolving digital landscape, effective retrieval of product information from enterprise websites is crucial for enterprise research, industry analysis, and strategic planning, all of which rely on accurate and comprehensive data. In this context, a "product" is defined as any tangible item, solution, or service. This paper proposes a novel method for extracting such product data (for example, product name, category, description, and specifications) directly from company websites. Our approach leverages Large Language Models (LLMs) to improve both the accuracy and the degree of automation of web-page information retrieval. Using LLMs allows data to be extracted and organized in a more sophisticated way, overcoming the limitations of conventional methods. Currently, there is a notable absence of open-source or commercial databases that comprehensively cover enterprise products, which makes comparative studies difficult. Our proposed method aims to fill this gap by providing a tool for gathering product information that can be used to assess competitive differences.
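As a rough illustration of the kind of pipeline the abstract describes, the Python sketch below reduces a company web page to its visible text, asks an LLM for the four product fields named above, and validates the JSON it returns. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Product` schema, its `kind` field, the prompt wording, and the `llm` callable (any function that sends a prompt to a model and returns its text reply) are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, List

# Hypothetical schema: the four fields come from the abstract; "kind" encodes the
# paper's definition of a product as a tangible item, a solution, or a service.
PRODUCT_KINDS = {"tangible item", "solution", "service"}


@dataclass
class Product:
    name: str
    category: str
    description: str
    specifications: dict = field(default_factory=dict)
    kind: str = "tangible item"


class _VisibleText(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Reduce raw HTML to newline-separated visible text before prompting."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_prompt(text: str) -> str:
    # Hypothetical prompt wording; the paper's actual prompts are not shown here.
    return (
        "Extract every product (a tangible item, a solution, or a service) "
        "described on the company web page below. Reply with a JSON list of "
        'objects with keys "name", "category", "description", '
        '"specifications" (an object) and "kind" '
        '("tangible item", "solution" or "service").\n\n'
        "Page text:\n" + text
    )


def parse_products(reply: str) -> List[Product]:
    """Parse the model's JSON reply, dropping records that fail basic checks."""
    items = json.loads(reply)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of products")
    products = []
    for item in items:
        if not isinstance(item, dict) or not item.get("name"):
            continue  # unnamed or malformed record
        kind = item.get("kind", "tangible item")
        if kind not in PRODUCT_KINDS:
            continue  # outside the paper's product definition
        specs = item.get("specifications")
        products.append(Product(
            name=str(item["name"]),
            category=str(item.get("category", "")),
            description=str(item.get("description", "")),
            specifications=specs if isinstance(specs, dict) else {},
            kind=kind,
        ))
    return products


def extract_products(html: str, llm: Callable[[str], str]) -> List[Product]:
    """Hypothetical pipeline: HTML -> visible text -> prompt -> LLM -> Product list.

    `llm` is any function that sends a prompt to a model and returns its text reply.
    """
    return parse_products(llm(build_prompt(page_text(html))))
```

Because the model call is injected as a plain callable, the pipeline can be exercised offline with a stub that returns canned JSON, and the validation step in `parse_products` discards records that lack a name or whose `kind` falls outside the three product types the paper defines.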

1. Supported by Shenzhen Science and Technology Program (No. GJHZ20220913144201002).

2. Supported by IER Foundation 2022(IERF202203).

Author information

Corresponding author

Correspondence to Lin Yao.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liao, C., Cheng, G., Huang, S., Yao, L. (2025). LLM-Based Automating Product Information Retrieval for Industry Analysis: A Real-World Application. In: Xu, R., Chen, H., Wu, Y., Zhang, LJ. (eds) Cognitive Computing - ICCC 2024. ICCC 2024. Lecture Notes in Computer Science, vol 15426. Springer, Cham. https://doi.org/10.1007/978-3-031-77954-1_8

  • DOI: https://doi.org/10.1007/978-3-031-77954-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77953-4

  • Online ISBN: 978-3-031-77954-1

  • eBook Packages: Computer Science, Computer Science (R0)
