Accuracy of inter-researcher similarity measures based on topical and social clues

Abstract

Scientific literature recommender systems (SLRSs) provide papers to researchers according to their scientific interests. These systems rely on inter-researcher similarity measures that are usually computed from publication contents (i.e., by extracting paper topics and citations). We highlight two major issues with this design. The required full-text access and processing are expensive and hardly feasible. Moreover, clues about meetings, encounters, and informal exchanges between researchers (a social dimension) have not been exploited to date. To tackle these issues, we propose an original SLRS based on a threefold contribution. First, we argue the case for defining inter-researcher similarity measures built on publicly available metadata. Second, we define topical and social measures that we combine to issue socio-topical recommendations. Third, we conduct an evaluation with 71 volunteer researchers to check researchers’ perceptions against socio-topical similarities. Experimental results show a significant 11.21% accuracy improvement of socio-topical recommendations over baseline topical recommendations.
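The paper’s formal definitions sit behind the paywall, so the following is only a minimal sketch of the socio-topical combination outlined above. It assumes, hypothetically, a bag-of-words cosine over paper titles as the topical clue, a Jaccard overlap of co-author sets as the social clue, and a linear fusion with a mixing weight alpha; the measures and weights actually used in the paper may differ.

```python
from collections import Counter
from math import sqrt

def topical_sim(titles_a, titles_b):
    """Cosine similarity between bag-of-words vectors built from paper titles
    (a stand-in for the paper's topical measure, not its actual definition)."""
    va = Counter(w for t in titles_a for w in t.lower().split())
    vb = Counter(w for t in titles_b for w in t.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def social_sim(coauthors_a, coauthors_b):
    """Jaccard overlap of co-author sets, a proxy for the social clue (assumption)."""
    union = coauthors_a | coauthors_b
    return len(coauthors_a & coauthors_b) / len(union) if union else 0.0

def socio_topical_sim(a, b, alpha=0.5):
    """Linear fusion of the two clues; alpha is a hypothetical mixing weight."""
    return (alpha * topical_sim(a["titles"], b["titles"])
            + (1 - alpha) * social_sim(a["coauthors"], b["coauthors"]))

# Usage: researchers described only by publicly available metadata.
r1 = {"titles": ["Measuring researcher similarity"], "coauthors": {"Smith", "Lee"}}
r2 = {"titles": ["Similarity measures for researchers"], "coauthors": {"Lee", "Chen"}}
print(socio_topical_sim(r1, r2))  # a score in [0, 1]
```

Linear fusion of several evidence sources follows the spirit of Fox and Shaw (1993) and Lee (1997), both cited in the reference list below.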

Notes

  1. It may be argued that Google Scholar (http://scholar.google.com) and derivatives such as ArnetMiner (Tang et al. 2008) (http://arnetminer.org) meet this need. These search engines are certainly helpful for finding documents related to a query (e.g., bibliometrics). However, they cannot take a researcher’s name as input and recommend papers or other researchers relevant to his/her overall scientific activity (as we intend to do in this paper).

  2. Either subscription-based, like the ACM Portal (http://portal.acm.org) and SpringerLink (http://springerlink.com), or free, like CiteSeerX (http://citeseerx.ist.psu.edu), DBLP (http://www.informatik.uni-trier.de/~ley/db), or arXiv (http://arxiv.org).

  3. http://www.ncbi.nlm.nih.gov/pubmed.

  4. TREC stands for the Text REtrieval Conference (see Voorhees and Harman 2005).

  5. Available for download at http://trec.nist.gov/trec_eval.

  6. http://dblp.uni-trier.de/xml.

  7. http://www.linkedin.com.

  8. A demonstration can be seen at http://www.irit.fr/~Guillaume.Cabanac/expeSimT.

  9. “in the absence of significance tests, performance differences of less than 5% must be disregarded […] broadly characterize performance differences, assumed significant, as noticeable if the difference is of the order of 5–10%, and as material if it is more than 10%.” Spärck Jones (1974), as cited by Sanderson (2010, p. 313).
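As a worked check of this rule of thumb (a hypothetical helper, not part of the paper), the 11.21% improvement reported in the abstract lands in the “material” band:

```python
def sparck_jones_band(diff_pct):
    """Classify a relative performance difference per Sparck Jones's rule of thumb."""
    if diff_pct < 5:
        return "disregard"   # under 5%: disregarded absent significance tests
    if diff_pct <= 10:
        return "noticeable"  # of the order of 5-10%: noticeable
    return "material"        # more than 10%: material

print(sparck_jones_band(11.21))  # -> material
```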

References

  • Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749. doi:10.1109/TKDE.2005.99.

  • Agarwal, N., Haque, E., Liu, H., & Parsons, L. (2005). Research paper recommender systems: A subspace clustering approach. In W. Fan, Z. Wu, & J. Yang (Eds.), WAIM’05: Proceedings of the 6th international conference on web-age information management. LNCS (Vol. 3739, pp. 475–491). New York: Springer. doi:10.1007/11563952_42.

  • Alonso, O., Rose, D. E., & Stewart, B. (2008). Crowdsourcing for relevance evaluation. SIGIR Forum, 42(2), 9–15. doi:10.1145/1480506.1480508.

  • Balabanović, M., & Shoham, Y. (1997). Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3), 66–72. doi:10.1145/245108.245124.

  • Belkin, N. J., & Croft, W. B. (1992). Information filtering and information retrieval: Two sides of the same coin? Communications of the ACM, 35(12), 29–38. doi:10.1145/138859.138861.

  • Ben Jabeur, L., Tamine, L., & Boughanem, M. (2010). A social model for Literature Access: Towards a weighted social network of authors. In RIAO’10: Proceedings of the 9th international conference on information retrieval and its applications. CDROM.

  • Biryukov, M. (2008). Co-author network analysis in DBLP: Classifying personal names. In MCO’08: Proceedings of the 2nd international conference on modelling, computation and optimization in information systems and management sciences. Communications in computer and information science (Vol. 14, pp. 399–408). New York: Springer. doi:10.1007/978-3-540-87477-5_43.

  • Bogers, T., & van den Bosch, A. (2008). Recommending scientific articles using CiteULike. In RecSys’08: Proceedings of the 4th ACM conference on recommender systems, ACM, New York, NY, USA (pp. 287–290). doi:10.1145/1454008.1454053.

  • Buckley, C., & Voorhees, E. M. (2000). Evaluating evaluation measure stability. In SIGIR’00: Proceedings of the 23rd international ACM SIGIR conference, ACM, New York, NY, USA (pp. 33–40). doi:10.1145/345508.345543.

  • Buckley, C., & Voorhees, E. M. (2005). Retrieval system evaluation. In E. M. Voorhees & D. K. Harman (Eds.), TREC: Experiment and evaluation in information retrieval (Chap. 3, pp. 53–75). Cambridge, MA: MIT Press.

  • Cazella, S. C., & Campos Alvares, L. O. (2005). Modeling user’s opinion relevance to recommending research papers. In UM’05: Proceedings of the 10th international conference on user modeling. LNCS (Vol. 3538, pp. 327–331). New York: Springer. doi:10.1007/11527886_42.

  • Cleverdon, C. W. (1962). Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. ASLIB Cranfield Research Project, Cranfield, UK.

  • Deng, H., King, I., & Lyu, M. R. (2008). Formal models for expert finding on DBLP bibliography data. In ICDM’08: Proceedings of the 8th IEEE international conference on data mining (pp. 163–172). Washington, DC: IEEE Computer Society. doi:10.1109/ICDM.2008.29.

  • Dolamic, L., & Savoy, J. (2010). When stopword lists make the difference. Journal of the American Society for Information Science and Technology, 61(1), 200–203. doi:10.1002/asi.21186.

  • Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets: Reasoning about a highly connected world. New York: Cambridge University Press.

  • Elmacioglu, E., & Lee, D. (2005). On six degrees of separation in DBLP-DB and more. SIGMOD Record, 34(2), 33–40. doi:10.1145/1083784.1083791.

  • Fox, C. (1989). A stop list for general text. SIGIR Forum, 24(1–2), 19–21. doi:10.1145/378881.378888.

  • Fox, E. A., & Shaw, J. A. (1993). Combination of multiple searches. In D. K. Harman (Ed.), TREC-1: Proceedings of the first text retrieval conference, NIST, Gaithersburg, MD, USA (pp. 243–252).

  • Garfield, E. (1955). Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122(3159), 108–111. doi:10.1126/science.122.3159.108.

  • Garfield, E. (1996). What is the primordial reference for the phrase ‘Publish or perish’? The Scientist, 10(12), 11. http://www.the-scientist.com/article/display/17052.

  • Garfield, E. (2006). The history and meaning of the journal impact factor. Journal of the American Medical Association, 295(1), 90–93. doi:10.1001/jama.295.1.90.

  • Glenisson, P., Glänzel, W., Janssens, F., & Moor, B. D. (2005a). Combining full text and bibliometric information in mapping scientific disciplines. Information Processing and Management, 41(6), 1548–1572. doi:10.1016/j.ipm.2005.03.021.

  • Glenisson, P., Glänzel, W., & Persson, O. (2005b). Combining full-text analysis and bibliometric indicators. A pilot study. Scientometrics, 63(1), 163–180. doi:10.1007/s11192-005-0208-0.

  • Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. B. (1992). Using collaborative filtering to weave an Information Tapestry. Communications of the ACM, 35(12), 61–70. doi:10.1145/138859.138867.

  • Gori, M., & Pucci, A. (2006). Research paper recommender systems: A random-walk based approach. In WI’06: Proceedings of the 5th IEEE/WIC/ACM international conference on web intelligence, IEEE Computer Society, Los Alamitos, CA, USA (pp. 778–781). doi:10.1109/WI.2006.149.

  • Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5–53. doi:10.1145/963770.963772.

  • Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572. doi:10.1073/pnas.0507655102.

  • Hirsch, J. E. (2010). An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Scientometrics, 85(3), 741–754. doi:10.1007/s11192-010-0193-9.

  • Huang, Z., Yan, Y., Qiu, Y., & Qiao, S. (2009). Exploring emergent semantic communities from DBLP bibliography database. In N. Memon & R. Alhajj (Eds.), ASONAM’09: Proceedings of the 1st international conference on advances in social network analysis and mining, IEEE Computer Society (pp. 219–224). doi:10.1109/ASONAM.2009.6.

  • Hubert, G., & Mothe, J. (2009). An adaptable search engine for multimodal information retrieval. Journal of the American Society for Information Science and Technology, 60(8), 1625–1634. doi:10.1002/asi.21091.

  • Hull, D. (1993). Using statistical testing in the evaluation of retrieval experiments. In SIGIR’93: Proceedings of the 16th annual international ACM SIGIR conference, ACM Press, New York, NY, USA (pp. 329–338). doi:10.1145/160688.160758

  • Hurtado Martín, G., Cornelis, C., & Naessens, H. (2009). Training a personal alert system for research information recommendation. In J. P. Carvalho, D. Dubois, U. Kaymak, & J. M. C. Sousa (Eds.), IFSA/EUSFLAT’09: Proceedings of the joint 2009 international Fuzzy systems association world congress and 2009 European Society of fuzzy logic and technology conference (pp. 408–413).

  • Hurtado Martín, G., Schockaert, S., Cornelis, C., & Naessens, H. (2010). Metadata impact on research paper similarity. In M. Lalmas, J. Jose, A. Rauber, F. Sebastiani, & I. Frommholz (Eds.), ECDL’10: Proceedings of the 14th European conference on research and advanced technology for digital libraries. LNCS (Vol. 6273, pp. 457–460). New York: Springer. doi:10.1007/978-3-642-15464-5_56.

  • Janas, J. M. (1977). Automatic recognition of the part-of-speech for English texts. Information Processing & Management, 13(4), 205–213. doi:10.1016/0306-4573(77)90001-2.

  • Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446. doi:10.1145/582415.582418.

  • Karoui, H., Kanawati, R., & Petrucci, L. (2006). COBRAS: Cooperative CBR system for bibliographical reference recommendation. In ECCBR’06: Proceedings of the 8th European conference on advances in case-based reasoning. LNCS (Vol. 4106, pp. 76–90). New York: Springer. doi:10.1007/11805816_8.

  • Kelly, D. (2009). Methods for evaluating interactive information retrieval systems with users. Foundations and Trends in Information Retrieval, 3(1–2), 1–224. doi:10.1561/1500000012.

  • Klas, C. P., & Fuhr, N. (2000). A new effective approach for categorizing web documents. In Proceedings of the 22nd BCS-IRSG colloquium on IR research.

  • Lee, J. H. (1997). Analyses of multiple evidence combination. In SIGIR’97: Proceedings of the 20th annual international ACM SIGIR conference, ACM Press, New York, NY, USA (pp. 267–276). doi:10.1145/258525.258587.

  • Ley, M. (2002). The DBLP computer science bibliography: Evolution, research issues, perspectives. In A. H. F. Laender & A. L. Oliveira (Eds.), SPIRE’02: Proceedings of the 9th international conference on string processing and information retrieval. LNCS (Vol. 2476, pp. 1–10). New York: Springer. doi:10.1007/3-540-45735-6_1.

  • Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 5–54.

  • Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press.

  • McNee, S. M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S. K., Rashid, A. M., Konstan, J. A., & Riedl, J. (2002). On the recommending of citations for research papers. In CSCW’02: Proceedings of the 2002 ACM conference on computer supported cooperative work, ACM, New York, NY, USA (pp. 116–125). doi:10.1145/587078.587096.

  • McNee, S. M., Kapoor, N., & Konstan, J. A. (2006). Don’t look stupid: Avoiding pitfalls when recommending research papers. In CSCW ’06: Proceedings of the 2006 20th anniversary conference on computer supported cooperative work, ACM, New York, NY, USA (pp. 171–180). doi:10.1145/1180875.1180903.

  • Micarelli, A., Sciarrone, F., & Marinilli, M. (2007). Web document modeling. In P. Brusilovsky, A. Kobsa, & W. Nejdl (Eds.), The adaptive web. LNCS (Vol. 4321, pp. 155–192). New York: Springer. doi:10.1007/978-3-540-72079-9_5.

  • Milgram, S. (1967). The small-world problem. Psychology Today, 1(1), 61–67.

  • Mimno, D., & McCallum, A. (2007). Mining a digital library for influential authors. In JCDL’07: Proceedings of the 7th ACM/IEEE-CS joint conference on digital libraries, ACM, New York, NY, USA (pp. 105–106). doi:10.1145/1255175.1255196.

  • Mittelbach, F., & Goossens, M. (2005). The LaTeX companion (2nd ed.). Boston, MA: Pearson Education.

  • Montaner, M., López, B., & de la Rosa, J. L. (2003). A taxonomy of recommender agents on the Internet. Artificial Intelligence Review, 19(4), 285–330. doi:10.1023/A:1022850703159.

  • Naak, A., Hage, H., & Aïmeur, E. (2009). A multi-criteria collaborative filtering approach for research paper recommendation in papyres. In MCETECH’09: Proceedings of the 4th international conference on E-technologies: Innovation in an open world. LNBIP (Vol. 26, pp. 25–39). New York: Springer. doi:10.1007/978-3-642-01187-0_3.

  • Porcel, C., López-Herrera, A. G., & Herrera-Viedma, E. (2009). A recommender system for research resources based on fuzzy linguistic modeling. Expert Systems with Applications, 36(3), 5173–5183. doi:10.1016/j.eswa.2008.06.038.

  • Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.

  • Powley, B., & Dale, R. (2007). Evidence-based information extraction for high accuracy citation and author name identification. In RIAO’07: Proceedings of the 8th conference on information retrieval and its applications. CID, CDROM.

  • Reips, U. D. (2002). Standards for Internet-based experimenting. Experimental Psychology, 49(4), 243–256. doi:10.1026//1618-3169.49.4.243.

  • Reips, U. D. (2007). The methodology of Internet-based experiments. In A. N. Joinson, K. Y. A. McKenna, T. Postmes, & U. D. Reips (Eds.), The Oxford handbook of Internet psychology. New York: Oxford University Press (Chap. 24, pp. 373–390).

  • Reips, U. D., & Lengler, R. (2005). The Web experiment list: A Web service for the recruitment of participants and archiving of Internet-based experiments. Behavior Research Methods, 37(2), 287–292.

  • Reitz, F., & Hoffmann, O. (2010). An analysis of the evolving coverage of computer science sub-fields in the DBLP digital library. In M. Lalmas, J. Jose, A. Rauber, F. Sebastiani, & I. Frommholz (Eds.), ECDL’10: Proceedings of the 14th European conference on research and advanced technology for digital libraries. LNCS (Vol. 6273, pp. 216–227). New York: Springer. doi:10.1007/978-3-642-15464-5_23.

  • Resnick, P., & Varian, H. R. (1997). Recommender systems. Communications of the ACM, 40(3), 56–58. doi:10.1145/245108.245121.

  • Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P., & Steyvers, M. (2010). Learning author-topic models from text corpora. ACM Transactions on Information Systems, 28(1), 4:1–4:38. doi:10.1145/1658377.1658381.

  • Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. In UAI’04: Proceedings of the 20th annual conference on uncertainty in artificial intelligence, AUAI Press, Arlington, Virginia (pp. 487–494).

  • Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513–523. doi:10.1016/0306-4573(88)90021-0.

  • Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620. doi:10.1145/361219.361220.

  • Sanderson, M. (2010). Test collection based evaluation of information retrieval systems. Foundations and Trends in Information Retrieval, 4(4), 247–375. doi:10.1561/1500000009.

  • Sanderson, M., & Zobel, J. (2005). Information retrieval system evaluation: Effort, sensitivity, and reliability. In SIGIR’05: Proceedings of the 28th annual international ACM SIGIR conference, ACM, New York, NY, USA (pp. 162–169). doi:10.1145/1076034.1076064.

  • Spärck Jones, K. (1973). Index term weighting. Information Storage and Retrieval, 9(11), 619–633. doi:10.1016/0020-0271(73)90043-0.

  • Spärck Jones, K. (1974). Automatic indexing. Journal of Documentation, 30(4), 393–432. doi:10.1108/eb026588.

  • Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25. doi:10.2307/2331554.

  • Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008). ArnetMiner: Extraction and mining of academic social networks. In KDD’08: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA (pp. 990–998). doi:10.1145/1401890.1402008.

  • Travers, J., & Milgram, S. (1969). An experimental study of the small world problem. Sociometry, 32(4), 425–443. doi:10.2307/2786545.

  • Tsatsaronis, G., Varlamis, I., Stamou, S., Nørvåg, K., & Vazirgiannis, M. (2009). Semantic relatedness hits bibliographic data. In WIDM’09: Proceedings of the 11th international workshop on Web information and data management, ACM, New York, NY, USA (pp. 87–90). doi:10.1145/1651587.1651607.

  • Voorhees, E. M. (2002). The philosophy of information retrieval evaluation. In C. Peters, M. Braschler, J. Gonzalo, & M. Kluck (Eds.), CLEF’01: Second workshop of the cross-language evaluation forum. LNCS (Vol. 2406, pp. 355–370). New York: Springer. doi:10.1007/3-540-45691-0_34.

  • Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and evaluation in information retrieval. Cambridge, MA: MIT Press.

  • Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83. doi:10.2307/3001968.

  • Yan, E., & Ding, Y. (2009). Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60(10), 2107–2118. doi:10.1002/asi.21128.

  • Yang, Z., Hong, L., & Davison, B. D. (2010). Topic-driven multi-type citation network analysis. In RIAO’10: Proceedings of the 9th international conference on information retrieval and its applications. CDROM.

  • Zamparelli, R. (1998). Internet publications: Pay-per-use or pay-per-subscription? In C. Nikolaou & C. Stephanidis (Eds.), ECDL’98: Proceedings of the 2nd European conference on research and advanced technology for digital libraries. LNCS (Vol. 1513, pp. 635–636). New York: Springer. doi:10.1007/3-540-49653-X_38.

  • Zhou, D., Orshanskiy, S. A., Zha, H., & Giles, C. L. (2007). Co-ranking authors and documents in a heterogeneous network. In ICDM’07: Proceedings of the 7th IEEE international conference on data mining (pp. 739–744). doi:10.1109/ICDM.2007.57.

Acknowledgments

The constructive criticisms and suggestions of the referees are warmly acknowledged. I am also grateful to the 71 volunteer researchers who took part in the experiment reported in this paper. Their feedback, comments, and insightful advice have been a source of stimulating thinking. Finally, I am indebted to Anaïs Lefeuvre for her involvement in this work as a research assistant.

Author information

Correspondence to Guillaume Cabanac.

Cite this article

Cabanac, G. Accuracy of inter-researcher similarity measures based on topical and social clues. Scientometrics 87, 597–620 (2011). https://doi.org/10.1007/s11192-011-0358-1
