Profile generation from web sources: an information extraction system

Ranjan, Rishabh; Vathsala, H.; Koolagudi, Shashidhar G.

doi:10.1007/s13278-021-00827-y

Original Article
Published: 11 November 2021

Volume 12, article number 2 (2022)

Social Network Analysis and Mining

Rishabh Ranjan¹,
H. Vathsala² &
Shashidhar G. Koolagudi³

691 Accesses
4 Citations

Abstract

The Internet holds a vast collection of information, much of it unstructured. Sources such as social media, news articles, blogs, speeches and videos often contain information that could be used to build decision-making tools, such as reports about events and individuals. Using this information manually is a long and tedious process. Over the years, considerable research in data mining and natural language processing has aimed to make this volume of data easier to consume. The current work describes ProfileGen, an information extraction system that draws on a variety of these data sources to build a profile of a given person. The application has two parts. The first collects publicly available information from social media sites, news websites and blogs and compiles it into a corpus about the given person. The second ranks this information using machine learning techniques, so that it is presented in order of importance.
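The two-part design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the ranking model, so a simple unsupervised TF-IDF centrality score stands in for the machine learning step, and the function names (`build_corpus`, `rank_sentences`) and the toy input texts are assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_corpus(documents, person):
    """Part 1 (assumed): keep only the sentences that mention the person."""
    corpus = []
    for doc in documents:
        # Naive sentence split: end punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if sentence and person.lower() in sentence.lower():
                corpus.append(sentence)
    return corpus


def tfidf_vectors(sentences):
    """Represent each sentence as a sparse TF-IDF dictionary."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, so that every term keeps a positive weight.
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(sentences):
    """Part 2 (stand-in): score each sentence by its total similarity to
    the others, then sort from most to least central."""
    vectors = tfidf_vectors(sentences)
    scored = []
    for i, sentence in enumerate(sentences):
        score = sum(
            cosine(vectors[i], vectors[j])
            for j in range(len(sentences))
            if j != i
        )
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Toy input texts standing in for scraped web pages (hypothetical).
    pages = [
        "Ada Lovelace wrote notes on the Analytical Engine. The weather was mild.",
        "Ada Lovelace worked with Charles Babbage on the Analytical Engine.",
        "Ada Lovelace enjoyed music.",
    ]
    for score, sentence in rank_sentences(build_corpus(pages, "Ada Lovelace")):
        print(f"{score:.3f}  {sentence}")
```

Sentences that share vocabulary with many others rise to the top, which is only a rough proxy for importance. A trained ranking model, as the abstract describes, would replace `rank_sentences` while the corpus-building step stays the same.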

Author information

Authors and Affiliations

Birla Institute of Technology, Mesra, India
Rishabh Ranjan
Centre for Development of Advanced Computing, Bengaluru, India
H. Vathsala
National Institute of Technology Karnataka, Surathkal, India
Shashidhar G. Koolagudi

Corresponding author

Correspondence to H. Vathsala.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ranjan, R., Vathsala, H. & Koolagudi, S.G. Profile generation from web sources: an information extraction system. Soc. Netw. Anal. Min. 12, 2 (2022). https://doi.org/10.1007/s13278-021-00827-y

Received: 23 July 2020
Revised: 06 June 2021
Accepted: 03 July 2021
Published: 11 November 2021
DOI: https://doi.org/10.1007/s13278-021-00827-y
