Accuracy of inter-researcher similarity measures based on topical and social clues

Cabanac, Guillaume

doi:10.1007/s11192-011-0358-1

Published: 01 March 2011

Scientometrics, Volume 87, pages 597–620 (2011)

Abstract

Scientific literature recommender systems (SLRSs) provide papers to researchers according to their scientific interests. Systems rely on inter-researcher similarity measures that are usually computed according to publication contents (i.e., by extracting paper topics and citations). We highlight two major issues related to this design. The required full-text access and processing are expensive and hardly feasible. Moreover, clues about meetings, encounters, and informal exchanges between researchers (which are related to a social dimension) were not exploited to date. In order to tackle these issues, we propose an original SLRS based on a threefold contribution. First, we argue the case for defining inter-researcher similarity measures building on publicly available metadata. Second, we define topical and social measures that we combine together to issue socio-topical recommendations. Third, we conduct an evaluation with 71 volunteer researchers to check researchers’ perception against socio-topical similarities. Experimental results show a significant 11.21% accuracy improvement of socio-topical recommendations compared to baseline topical recommendations.
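The idea of combining a topical and a social measure can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the topical measure is a cosine similarity over words in paper titles, the social measure is a Jaccard overlap of co-author sets, and the combination is a weighted sum controlled by a hypothetical parameter `alpha`. The measures actually defined in the article may differ.

```python
import math
from collections import Counter


def topical_similarity(titles_a, titles_b):
    """Cosine similarity between bag-of-words vectors built from paper titles.

    Assumption: titles stand in for the publicly available metadata.
    """
    va = Counter(w for t in titles_a for w in t.lower().split())
    vb = Counter(w for t in titles_b for w in t.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def social_similarity(coauthors_a, coauthors_b):
    """Jaccard overlap of two co-author sets (an illustrative social clue)."""
    a, b = set(coauthors_a), set(coauthors_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def socio_topical_similarity(topical, social, alpha=0.5):
    """Weighted sum of the two scores; alpha is a hypothetical mixing weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * topical + (1.0 - alpha) * social


if __name__ == "__main__":
    t = topical_similarity(["Evaluating recommender systems"],
                           ["Recommender systems for research papers"])
    s = social_similarity({"Alice", "Bob"}, {"Bob", "Carol"})
    print(socio_topical_similarity(t, s, alpha=0.5))
```

With `alpha=1.0` the sketch reduces to a purely topical score, which corresponds to the kind of baseline the abstract compares against.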

Notes

It may be argued that Google Scholar (http://scholar.google.com) and derivatives such as ArnetMiner (Tang et al. 2008) (http://arnetminer.org) meet this need. These search engines are surely helpful for finding documents related to a query (e.g., bibliometrics). However, they do not accept a researcher’s name as input in order to recommend papers or other researchers relevant to his/her overall scientific activity (as we intend to do in this paper).
Subject to charges like the ACM Portal (http://portal.acm.org) and SpringerLink (http://springerlink.com) or free like CiteSeerX (http://citeseerx.ist.psu.edu), DBLP (http://www.informatik.uni-trier.de/~ley/db) or arXiv (http://arxiv.org).
http://www.ncbi.nlm.nih.gov/pubmed.
Trec stands for the Text REtrieval Conference (see Voorhees and Harman 2005).
Available for download at http://trec.nist.gov/trec_eval.
http://dblp.uni-trier.de/xml.
http://www.linkedin.com.
A demonstration can be seen at http://www.irit.fr/~Guillaume.Cabanac/expeSimT.
“in the absence of significance tests, performance differences of less than 5% must be disregarded … broadly characterize performance differences, assumed significant, as noticeable if the difference is of the order of 5–10%, and as material if it is more than 10%.” Spärck Jones (1974) as cited by Sanderson (2010, p. 313).
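The rule of thumb quoted in the last note can be written as a small function. This is a sketch of the quoted thresholds only; the function name and the returned labels are illustrative choices, and the handling of the exact boundaries (5% and 10%) is an assumption, since the quotation does not specify it.

```python
def characterize_difference(difference_pct):
    """Label a performance difference (in percent) using the quoted thresholds.

    Below 5%: disregarded; 5-10%: noticeable; above 10%: material.
    Boundary handling at exactly 5% and 10% is an assumption.
    """
    magnitude = abs(difference_pct)
    if magnitude < 5:
        return "disregard"
    if magnitude <= 10:
        return "noticeable"
    return "material"


if __name__ == "__main__":
    # The abstract reports an 11.21% accuracy improvement.
    print(characterize_difference(11.21))
```

Applied to the 11.21% improvement reported in the abstract, the function returns the "material" label. The quoted rule is meant for cases without significance tests, whereas the abstract describes the improvement as significant.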

References

Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749. doi:10.1109/TKDE.2005.99.
Agarwal, N., Haque, E., Liu, H., & Parsons, L. (2005). Research paper recommender systems: A subspace clustering approach. In W. Fan, Z. Wu, & J. Yang (Eds.), WAIM’05: Proceedings of the 6th international conference on web-age information management. LNCS (Vol. 3739, pp. 475–491). New York: Springer. doi:10.1007/11563952_42.
Alonso, O., Rose, D. E., & Stewart, B. (2008). Crowdsourcing for relevance evaluation. SIGIR Forum, 42(2), 9–15. doi:10.1145/1480506.1480508.
Balabanović, M., & Shoham, Y. (1997). Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3), 66–72. doi:10.1145/245108.245124.
Belkin, N. J., & Croft, W. B. (1992). Information filtering and information retrieval: Two sides of the same coin? Communications of the ACM, 35(12), 29–38. doi:10.1145/138859.138861.
Ben Jabeur, L., Tamine, L., & Boughanem, M. (2010). A social model for Literature Access: Towards a weighted social network of authors. In RIAO’10: Proceedings of the 9th international conference on information retrieval and its applications. CDROM.
Biryukov, M. (2008). Co-author network analysis in DBLP: Classifying personal names. In MCO’08: Proceedings of the 2nd international conference on modelling, computation and optimization in information systems and management sciences. Communications in computer and information science (Vol. 14, pp. 399–408). New York: Springer. doi:10.1007/978-3-540-87477-5_43.
Bogers, T., & van den Bosch, A. (2008). Recommending scientific articles using CiteULike. In RecSys’08: Proceedings of the 4th ACM conference on recommender systems, ACM, New York, NY, USA (pp. 287–290). doi:10.1145/1454008.1454053.
Buckley, C., & Voorhees, E. M. (2000). Evaluating evaluation measure stability. In SIGIR’00: Proceedings of the 23rd international ACM SIGIR conference, ACM, New York, NY, USA (pp. 33–40). doi:10.1145/345508.345543.
Buckley, C., & Voorhees, E. M. (2005). Retrieval system evaluation. In E. M. Voorhees & D. K. Harman (Eds.), TREC: Experiment and evaluation in information retrieval (Chap. 3, pp. 53–75). Cambridge, MA: MIT Press.
Cazella, S. C., & Campos Alvares, L. O. (2005). Modeling user’s opinion relevance to recommending research papers. In UM’05: Proceedings of the 10th international conference on user modeling. LNCS (Vol. 3538, pp. 327–331). New York: Springer. doi:10.1007/11527886_42.
Cleverdon, C. W. (1962). Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. ASLIB Cranfield Research Project, Cranfield, UK.
Deng, H., King, I., & Lyu, M. R. (2008). Formal models for expert finding on DBLP bibliography data. In ICDM’08: Proceedings of the 8th IEEE international conference on data mining (pp. 163–172). Washington, DC: IEEE Computer Society. doi:10.1109/ICDM.2008.29.
Dolamic, L., & Savoy, J. (2010). When stopword lists make the difference. Journal of the American Society for Information Science and Technology, 61(1), 200–203. doi:10.1002/asi.21186.
Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets: Reasoning about a highly connected world. New York: Cambridge University Press.
Elmacioglu, E., & Lee, D. (2005). On six degrees of separation in DBLP-DB and more. SIGMOD Record, 34(2), 33–40. doi:10.1145/1083784.1083791.
Fox, C. (1989). A stop list for general text. SIGIR Forum, 24(1–2), 19–21. doi:10.1145/378881.378888.
Fox, E. A., & Shaw, J. A. (1993). Combination of multiple searches. In D. K. Harman (Ed.), TREC-1: Proceedings of the first text retrieval conference, NIST, Gaithersburg, MD, USA (pp. 243–252).
Garfield, E. (1955). Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122(3159), 108–111. doi:10.1126/science.122.3159.108.
Garfield, E. (1996). What is the primordial reference for the phrase ‘Publish or perish’? The Scientist, 10(12), 11. http://www.the-scientist.com/article/display/17052.
Garfield, E. (2006). The history and meaning of the journal impact factor. Journal of the American Medical Association, 295(1), 90–93. doi:10.1001/jama.295.1.90.
Glenisson, P., Glänzel, W., Janssens, F., & Moor, B. D. (2005a). Combining full text and bibliometric information in mapping scientific disciplines. Information Processing and Management, 41(6), 1548–1572. doi:10.1016/j.ipm.2005.03.021.
Glenisson, P., Glänzel, W., & Persson, O. (2005b). Combining full-text analysis and bibliometric indicators. A pilot study. Scientometrics, 63(1), 163–180. doi:10.1007/s11192-005-0208-0.
Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. B. (1992). Using collaborative filtering to weave an Information Tapestry. Communications of the ACM, 35(12), 61–70. doi:10.1145/138859.138867.
Gori, M., & Pucci, A. (2006). Research paper recommender systems: A random-walk based approach. In WI’06: Proceedings of the 5th IEEE/WIC/ACM international conference on web intelligence, IEEE Computer Society, Los Alamitos, CA, USA (pp. 778–781). doi:10.1109/WI.2006.149.
Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5–53. doi:10.1145/963770.963772.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572. doi:10.1073/pnas.0507655102.
Hirsch, J. E. (2010). An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Scientometrics, 85(3), 741–754. doi:10.1007/s11192-010-0193-9.
Huang, Z., Yan, Y., Qiu, Y., & Qiao, S. (2009). Exploring emergent semantic communities from DBLP bibliography database. In N. Memon & R. Alhajj (Eds.), ASONAM’09: Proceedings of the 1st international conference on advances in social network analysis and mining, IEEE Computer Society (pp. 219–224). doi:ASONAM.2009.6.
Hubert, G., & Mothe, J. (2009). An adaptable search engine for multimodal information retrieval. Journal of the American Society for Information Science and Technology, 60(8), 1625–1634. doi:10.1002/asi.21091.
Hull, D. (1993). Using statistical testing in the evaluation of retrieval experiments. In SIGIR’93: Proceedings of the 16th annual international ACM SIGIR conference, ACM Press, New York, NY, USA (pp. 329–338). doi:10.1145/160688.160758.
Hurtado Martín, G., Cornelis, C., & Naessens, H. (2009). Training a personal alert system for research information recommendation. In J. P. Carvalho, D. Dubois, U. Kaymak, & J. M. C. Sousa (Eds.), IFSA/EUSFLAT’09: Proceedings of the joint 2009 international Fuzzy systems association world congress and 2009 European Society of fuzzy logic and technology conference (pp. 408–413).
Hurtado Martín, G., Schockaert, S., Cornelis, C., & Naessens, H. (2010). Metadata impact on research paper similarity. In M. Lalmas, J. Jose, A. Rauber, F. Sebastiani, & I. Frommholz (Eds.), ECDL’10: Proceedings of the 14th European conference on research and advanced technology for digital libraries. LNCS (Vol. 6273, pp. 457–460). New York: Springer. doi:10.1007/978-3-642-15464-5_56.
Janas, J. M. (1977). Automatic recognition of the part-of-speech for English texts. Information Processing & Management, 13(4), 205–213. doi:10.1016/0306-4573(77)90001-2.
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446. doi:10.1145/582415.582418.
Karoui, H., Kanawati, R., & Petrucci, L. (2006). COBRAS: Cooperative CBR system for bibliographical reference recommendation. In ECCBR’06: Proceedings of the 8th European conference on advances in case-based reasoning. LNCS (Vol. 4106, pp. 76–90). New York: Springer. doi:10.1007/11805816_8.
Kelly, D. (2009). Methods for evaluating interactive information retrieval systems with users. Foundations and Trends in Information Retrieval, 3(1–2), 1–224. doi:10.1561/1500000012.
Klas, C. P., & Fuhr, N. (2000). A new effective approach for categorizing web documents. In Proceedings of the 22nd BCS-IRSG colloquium on IR research.
Lee, J. H. (1997). Analyses of multiple evidence combination. In SIGIR’97: Proceedings of the 20th annual international ACM SIGIR conference, ACM Press, New York, NY, USA (pp. 267–276). doi:10.1145/258525.258587.
Ley, M. (2002). The DBLP computer science bibliography: Evolution, research issues, perspectives. In A. H. F. Laender & A. L. Oliveira (Eds.), SPIRE’02: Proceedings of the 9th international conference on string processing and information retrieval. LNCS (Vol. 2476, pp. 1–10). New York: Springer. doi:10.1007/3-540-45735-6_1.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 5–54.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press.
McNee, S. M., Albert, I., Cosley, D., Gopalkrishnan, P., Lam, S. K., Rashid, A. M., Konstan, J. A., & Riedl, J. (2002). On the recommending of citations for research papers. In CSCW’02: Proceedings of the 2002 ACM conference on computer supported cooperative work, ACM, New York, NY, USA (pp. 116–125). doi:10.1145/587078.587096.
McNee, S. M., Kapoor, N., & Konstan, J. A. (2006). Don’t look stupid: Avoiding pitfalls when recommending research papers. In CSCW ’06: Proceedings of the 2006 20th anniversary conference on computer supported cooperative work, ACM, New York, NY, USA (pp. 171–180). doi:10.1145/1180875.1180903.
Micarelli, A., Sciarrone, F., & Marinilli, M. (2007). In Web document modeling. LNCS (Vol. 4321, pp. 155–192). New York: Springer. doi:10.1007/978-3-540-72079-9_5.
Milgram, S. (1967). The small-world problem. Psychology Today, 1(1), 61–67.
Mimno, D., & McCallum, A. (2007). Mining a digital library for influential authors. In JCDL’07: Proceedings of the 7th ACM/IEEE-CS joint conference on digital libraries, ACM, New York, NY, USA (pp. 105–106). doi:10.1145/1255175.1255196.
Mittelbach, F., & Goossens, M. (2005). LaTeX companion (2nd ed.). Boston, MA: Pearson Education.
Montaner, M., López, B., & de la Rosa, J. L. (2003). A taxonomy of recommender agents on the Internet. Artificial Intelligence Review, 19(4), 285–330. doi:10.1023/A:1022850703159.
Naak, A., Hage, H., & Aïmeur, E. (2009). A multi-criteria collaborative filtering approach for research paper recommendation in papyres. In MCETECH’09: Proceedings of the 4th international conference on E-technologies: Innovation in an open world. LNBIP (Vol. 26, pp. 25–39). New York: Springer. doi:10.1007/978-3-642-01187-0_3.
Porcel, C., López-Herrera, A. G., & Herrera-Viedma, E. (2009). A recommender system for research resources based on fuzzy linguistic modeling. Expert Systems with Applications, 36(3), 5173–5183. doi:10.1016/j.eswa.2008.06.038.
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.
Powley, B., & Dale, R. (2007). Evidence-based information extraction for high accuracy citation and author name identification. In RIAO’07: Proceedings of the 8th conference on information retrieval and its applications. CID, CDROM.
Reips, U. D. (2002). Standards for Internet-based experimenting. Experimental Psychology, 49(4), 243–256. doi:10.1026//1618-3169.49.4.243.
Reips, U. D. (2007). The methodology of Internet-based experiments. In A. N. Joinson, K. Y. A. McKenna, T. Postmes, & U. D. Reips (Eds.), The Oxford handbook of Internet psychology. New York: Oxford University Press (Chap. 24, pp. 373–390).
Reips, U. D., & Lengler, R. (2005). The Web experiment list: A Web service for the recruitment of participants and archiving of Internet-based experiments. Behavior Research Methods, 37(2), 287–292.
Reitz, F., & Hoffmann, O. (2010). An analysis of the evolving coverage of computer science sub-fields in the DBLP digital library. In M. Lalmas, J. Jose, A. Rauber, F. Sebastiani, & I. Frommholz (Eds.), ECDL’10: Proceedings of the 14th European conference on research and advanced technology for digital libraries. LNCS (Vol. 6273, pp. 216–227). New York: Springer. doi:10.1007/978-3-642-15464-5_23.
Resnick, P., & Varian, H. R. (1997). Recommender systems. Communications of the ACM, 40(3), 56–58. doi:10.1145/245108.245121.
Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P., & Steyvers, M. (2010). Learning author-topic models from text corpora. ACM Transactions on Information Systems, 28(1), 4:1–4:38. doi:10.1145/1658377.1658381.
Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. In UAI’04: Proceedings of the 20th annual conference on uncertainty in artificial intelligence, AUAI Press, Arlington, Virginia (pp. 487–494).
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513–523. doi:10.1016/0306-4573(88)90021-0.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620. doi:10.1145/361219.361220.
Sanderson, M. (2010). Test collection based evaluation of information retrieval systems. Foundations and Trends in Information Retrieval, 4(4), 247–375. doi:10.1561/1500000009.
Sanderson, M., & Zobel, J. (2005). Information retrieval system evaluation: Effort, sensitivity, and reliability. In SIGIR’05: Proceedings of the 28th annual international ACM SIGIR conference, ACM, New York, NY, USA (pp. 162–169). doi:10.1145/1076034.1076064.
Spärck Jones, K. (1973). Index term weighting. Information Storage and Retrieval, 9(11), 619–633. doi:10.1016/0020-0271(73)90043-0.
Spärck Jones, K. (1974). Automatic indexing. Journal of Documentation, 30(4), 393–432. doi:10.1108/eb026588.
Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25. doi:10.2307/2331554.
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008). ArnetMiner: Extraction and mining of academic social networks. In KDD’08: Proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA (pp. 990–998). doi:10.1145/1401890.1402008.
Travers, J., & Milgram, S. (1969). An experimental study of the small world problem. Sociometry, 32(4), 425–443. doi:10.2307/2786545.
Tsatsaronis, G., Varlamis, I., Stamou, S., Nørvåg, K., & Vazirgiannis, M. (2009). Semantic relatedness hits bibliographic data. In WIDM’09: Proceeding of the 11th international workshop on Web information and data management, ACM, New York, NY, USA (pp. 87–90). doi:10.1145/1651587.1651607.
Voorhees, E. M. (2002). The philosophy of information retrieval evaluation. In C. Peters, M. Braschler, J. Gonzalo, & M. Kluck (Eds.), CLEF’01: Second workshop of the cross-language evaluation forum. LNCS (Vol. 2406, pp. 355–370). New York: Springer. doi:10.1007/3-540-45691-0_34.
Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and evaluation in information retrieval. Cambridge, MA: MIT Press.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83. doi:10.2307/3001968.
Yan, E., & Ding, Y. (2009). Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60(10), 2107–2118. doi:10.1002/asi.21128.
Yang, Z., Hong, L., & Davison, B. D. (2010). Topic-driven multi-type citation network analysis. In RIAO’10: Proceedings of the 9th international conference on information retrieval and its applications. CDROM.
Zamparelli, R. (1998). Internet publications: Pay-per-use or pay-per-subscription? In C. Nikolaou & C. Stephanidis (Eds.), ECDL’98: Proceedings of the 2nd European conference on research and advanced technology for digital libraries. LNCS (Vol. 1513, pp. 635–636). New York: Springer. doi:10.1007/3-540-49653-X_38.
Zhou, D., Orshanskiy, S. A., Zha, H., & Giles, C. L. (2007). Co-ranking authors and documents in a heterogeneous network. In ICDM’07: Proceedings of the 7th IEEE international conference on data mining (pp. 739–744). doi:10.1109/ICDM.2007.57.

Acknowledgments

The constructive criticisms and suggestions of the referees are warmly acknowledged. I am also grateful to the 71 volunteer researchers who took part in the experiment reported in this paper. Their feedback, comments, and insightful advice have been a source of stimulating thinking. Finally, I am indebted to Anaïs Lefeuvre for her involvement in this work as a research assistant.

Author information

Authors and Affiliations

Computer Science Department, IRIT UMR 5505 CNRS, University of Toulouse, 118 route de Narbonne, 31062, Toulouse Cedex 9, France
Guillaume Cabanac

Corresponding author

Correspondence to Guillaume Cabanac.

About this article

Cite this article

Cabanac, G. Accuracy of inter-researcher similarity measures based on topical and social clues. Scientometrics 87, 597–620 (2011). https://doi.org/10.1007/s11192-011-0358-1

Received: 14 November 2010
Published: 01 March 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s11192-011-0358-1
