
A novel framework for social web forums’ thread ranking based on semantics and post quality features

The Journal of Supercomputing

Abstract

Online discussion forums are a valuable source of knowledge. Users share and exchange ideas by posting content in the form of questions and answers. As the volume of forum content grows, finding relevant information becomes challenging, and knowledge management and quality assurance of this content become critically important. Although online discussion forums offer search services, in most cases only keyword search is provided. Keyword search techniques, such as cosine similarity, consider the lexical overlap between query and document terms; however, they do not consider the context or meaning of the terms and may therefore fail to retrieve relevant documents. Earlier content-based research efforts to improve thread retrieval performance were primarily based on the cosine similarity technique. This technique assigns term weights based on term frequency and inverse document frequency; however, it does not consider discussion semantics, which may lead to less effective document retrieval. To address these issues, we propose two thread ranking techniques for online discussion forums: (1) threads are ranked on the basis of a semantic similarity score between posts and (2) threads are ranked based on their participants’ reputation and their posts’ quality. The proposed work provides a performance comparison between semantic similarity techniques and cosine similarity techniques, along with reputation and post quality features, in the thread ranking process. Experimental results obtained using a real online forum dataset demonstrate that the proposed techniques significantly improve thread ranking performance.
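The limitation of keyword matching described in the abstract can be illustrated with a minimal sketch of TF-IDF cosine similarity. This is not the implementation used in the article: the whitespace tokenization, the smoothed IDF formula, the function names `tfidf_vectors` and `cosine`, and the example query and thread texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption of this sketch) so that a term
        # present in every document still gets a nonzero weight.
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical query and thread texts, tokenized on whitespace.
query = "how to repair a car engine".split()
threads = [
    "car engine repair tips".split(),      # shares terms with the query
    "fixing an automobile motor".split(),  # same topic, no shared terms
]

vecs = tfidf_vectors([query] + threads)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
print(scores)
```

In this sketch the second thread receives a score of zero even though it discusses the same topic, because it shares no surface terms with the query. That gap between lexical overlap and meaning is what the semantic similarity ranking in the abstract is meant to address.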



Notes

  1. https://wordnet.princeton.edu/.

  2. http://wordnet.princeton.edu.

  3. http://www.cyberemotions.eu/data.html.


Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2013R1A1A2061978).

Author information

Corresponding authors

Correspondence to Ch. Muhammad Shahzad Faisal or Seungmin Rho.

About this article

Cite this article

Faisal, C.M.S., Daud, A., Imran, F. et al. A novel framework for social web forums’ thread ranking based on semantics and post quality features. J Supercomput 72, 4276–4295 (2016). https://doi.org/10.1007/s11227-016-1839-z

  • DOI: https://doi.org/10.1007/s11227-016-1839-z
