Effective social post classifiers on top of search interfaces

Rivas, Ryan; Hristidis, Vagelis

doi:10.1007/s10618-021-00768-2

Published: 12 June 2021

Volume 35, pages 1809–1829, (2021)

Data Mining and Knowledge Discovery

Abstract

Applying text classification to find social media posts relevant to a topic of interest is the focus of a substantial amount of research. A key challenge is how to select a good training set of posts to label. This problem has traditionally been solved using active learning. However, active learning assumes access to all posts of the collection, which is not realistic in many cases, as social networks impose constraints on the number of posts that can be retrieved through their search APIs. To address this problem, which we refer to as the training post retrieval over constrained search interfaces problem, we propose several keyword selection algorithms that, given a topic, generate an effective set of keyword queries to submit to the search API. The returned posts are labeled and used as a training dataset to train post classifiers. Our experiments compare our proposed keyword selection algorithms to several baselines across various topics from three sources. The results show that the proposed methods generate superior training sets, as measured by the balanced accuracy of the trained classifiers.
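
To make the setting concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy in-memory corpus and a hypothetical `constrained_search` function standing in for a search API that returns at most `limit` posts per keyword query. The keywords are hand-picked for illustration, whereas the article proposes algorithms that select them. The sketch also shows balanced accuracy for the binary case, computed as the mean of the recall on relevant posts and the recall on irrelevant posts. Classifier training is omitted.

```python
from typing import List, Sequence, Tuple

# A post is (text, label); label 1 = relevant to the topic, 0 = not relevant.
Post = Tuple[str, int]


def constrained_search(corpus: List[Post], keyword: str, limit: int) -> List[Post]:
    """Hypothetical stand-in for a search API: at most `limit` posts containing `keyword`."""
    hits = [post for post in corpus if keyword in post[0].lower().split()]
    return hits[:limit]


def retrieve_training_set(corpus: List[Post], keywords: Sequence[str], limit: int) -> List[Post]:
    """Submit one query per keyword and pool the distinct posts that come back."""
    seen = set()
    training: List[Post] = []
    for keyword in keywords:
        for post in constrained_search(corpus, keyword, limit):
            if post[0] not in seen:  # skip posts already returned by an earlier query
                seen.add(post[0])
                training.append(post)
    return training


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary balanced accuracy: mean of positive-class and negative-class recall."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must appear in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy corpus (invented for illustration only).
corpus: List[Post] = [
    ("my migraine got worse after the new medication", 1),
    ("migraine relief tips that worked for me", 1),
    ("new phone battery lasts all day", 0),
    ("the medication recall made the news today", 0),
    ("weekend hiking photos from the trail", 0),
]

# Two keyword queries, each capped at two results by the simulated interface.
training = retrieve_training_set(corpus, ["migraine", "medication"], limit=2)
```

With these inputs the two queries return three distinct posts, one of which is irrelevant, so the pooled set contains both classes. Tightening `limit` shrinks the pool, which is the constraint that makes the choice of keywords matter.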

Data Availability Statement

The datasets used in our experiments were collected from DailyStrength, Reddit, and Misra (2018).

Notes

This value was used by the experiments in Wang et al. (2016).

References

Ahmad S, Asghar MZ, Alotaibi FM, Awan I (2019) Detection and classification of social media-based extremist affiliations using sentiment analysis techniques. Hum Cent Comput Inf Sci 9:24
Balsamo D, Bajardi P, Panisson A (2019) Firsthand opiates abuse on social media: monitoring geospatial patterns of interest through a digital cohort. Proc WWW 2019:2572–2579
Bissoyi S, Mishra BK, Patra MR (2016) Recommender systems in a patient centric social network - a survey. Proc SCOPES 2016:386–389
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010) The balanced accuracy and its posterior distribution. Proc ICPR 2010:3121–3124
Croft WB, Metzler D, Strohman T (2010) Search engines: information retrieval in practice. Addison-Wesley, Boston
de Lira VM, Macdonald C, Ounis I, Perego R, Renso C, Times VC (2019) Event attendance classification in social media. Inform Process Manag 56(3):687–703
Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. Proc SIGKDD 2008:213–220
Goudjil M, Koudil M, Bedda M, Ghoggali N (2018) A novel active learning method using SVM for text classification. Int J Automat Comput 15(3):290–298
Kim Y (2014) Convolutional neural networks for sentence classification. Proc EMNLP 2014:1746–1751
Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Li C, Xing J, Sun A, Ma Z (2016) Effective document labeling with very few seed words: a topic model approach. Proc CIKM 2016:85–94
Li C, Zhou W, Ji F, Duan Y, Chen H (2018) A deep relevance model for zero-shot document filtering. In: Proc 56th annu meeting ACL, pp 2300–2310
Li H, Liu B, Mukherjee A, Shao J (2014) Spotting fake reviews using positive-unlabeled learning. Comput Sist 18(3):467–475
Li R, Wang S, Cheng KC (2013) Towards social data platform: automatic topic-focused monitor for Twitter stream. Proc VLDB Endow 6(14):1966–1977
Li X, Liu B (2003) Learning to classify texts using positive and unlabeled data. Proc IJCAI 2003:587–592
Misra R (2018) News category dataset. ResearchGate. https://doi.org/10.13140/RG.2.2.20331.18729
Pearson K (1895) Note on regression and inheritance in the case of two parents. Proc R Soc Lond 58(347–352):240–242
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Pohl D, Bouchachia A, Hellwagner H (2018) Batch-based active learning: application to social media data for crisis management. Expert Syst Appl 93:232–244
Proskurnia J, Mavlyutov R, Castillo C, Aberer K, Cudre-Mauroux P (2017) Efficient document filtering using vector space topic expansion and pattern-mining: the case of event detection in microposts. Proc CIKM 2017:457–466
Rao J, Yang W, Zhang Y, Ture F, Lin J (2019) Multi-perspective relevance matching with hierarchical convnets for social media search. In: Proc 33rd AAAI conf artif intell, pp 232–240
Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proc LREC 2010 workshop new challenges NLP frameworks, pp 45–50
Rivas R, Sadah SA, Guo Y, Hristidis V (2020) Classification of health-related social media posts: evaluation of post content classifier models and analysis of user demographics. JMIR Pub Health Surv 6(2):e14952
Ruiz E, Hristidis V, Ipeirotis PG (2014) Efficient filtering on hidden document streams. In: Proc ICWSM
Sadri M, Mehrotra S, Yu Y (2016) Online adaptive topic focused tweet acquisition. Proc CIKM 2016:2353–2358
Shen S, Murzintcev N, Song C, Cheng C (2017) Information retrieval of a disaster event from cross-platform social media. Inf Discov Deliv 45(4):220–226
Smailovic J, Grcar M, Lavrac N, Znidarsic M (2014) Stream-based active learning for sentiment analysis in the financial domain. Inf Sci 285:181–203
Thorndike RL (1953) Who belongs in the family? Psychometrika 18(4):267–276
Wang S, Chen Z, Liu B, Emery S (2016) Identifying search keywords for finding relevant social media posts. In: Proc 30th AAAI conf artif intell, pp 3052–3058
Zhang Y, Lease M, Wallace BC (2017) Active discriminative text representation learning. In: Proc 31st AAAI conf artif intell, pp 3386–3392
Zhang Y, Zhao P, Cao J, Ma W, Huang J, Wu Q, Tan M (2018) Online adaptive asymmetric active learning for budgeted imbalanced data. Proc SIGKDD 2018:2768–2777

Funding

This work was supported by NSF Grants IIS-1838222 and IIS-1901379.

Author information

Authors and Affiliations

University of California, Riverside, Riverside, CA, USA
Ryan Rivas & Vagelis Hristidis

Corresponding author

Correspondence to Ryan Rivas.

Ethics declarations

Code availability

The code used in our experiments is available from: https://github.com/rriva002/Training-Post-Retrieval.

Additional information

Responsible editor: Annalisa Appice, Sergio Escalera, Jose A. Gamez, Heike Trautmann

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 416 KB)

About this article

Cite this article

Rivas, R., Hristidis, V. Effective social post classifiers on top of search interfaces. Data Min Knowl Disc 35, 1809–1829 (2021). https://doi.org/10.1007/s10618-021-00768-2

Received: 20 September 2020
Accepted: 20 May 2021
Published: 12 June 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10618-021-00768-2

Keywords
