Abstract
In this position paper, we comment on various approaches to the delineation of scientific fields or domains, a typical prerequisite for a wide class of bibliometric studies. There is growing evidence that this meso-level, which lies between the micro-level targets of typical IR and the large disciplines handled by macro-level bibliometric studies, benefits most from hybrid approaches. Firstly, delineation tasks gain from combining the a priori thinking of traditional IR, which typically involves clearly targeted expectations, with the a posteriori thinking of bibliometric mapping, where decisions build on an external structuring of the domain in a wider context. The combination of the two ways of thought is far from new: IR increasingly builds on bibliometric networks for query expansion, and bibliometrics builds on IR for evaluating and refining its outcomes. Secondly, delineation benefits from the multi-network perspective, which gives different representations of scientific topics; these representations usually converge all the more as the objects are dense and well separated. Focusing on two basic networks (words and citations), we discuss various sequences or combinations of operations. Bibliometrics and IR, especially when properly combined in multi-network approaches, provide an efficient toolbox for domain delineation studies. It should be recalled, however, that the context of such studies is often loaded with policy stakes that call for cautious supervision and consultation processes.
Notes
Among the Thomson Reuters nomenclatures, the "subject categories" of the SCI classification allow for overlaps, mostly in terms of journals or journal sections.
For a typology of IR models and the perspective of the "cognitive actor", see Ingwersen and Järvelin (2005).
Assume articles B and A share a theoretical background, and C and A share a domain of application. In bibliographic coupling, articles B and C may both be attracted by A on quite different semantic aspects, without any epistemic relation between them. The argument is already found in Martyn (1964). Even when mitigated by statistical aggregation, it expresses the cost attached to the statistical efficiency of bibliometric clustering. The use of hard clustering, simple and fast, worsens this limitation, whereas an overlapping technique might classify A twice, once with B and once with C. IR scholars have warned against the holistic character of several mapping techniques, a source of noise, including for query expansion purposes.
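The coupling argument in this note can be illustrated with a minimal Python sketch. The reference lists, the threshold, and the function names (coupling, hard_clusters, overlapping_groups) are hypothetical and chosen only for illustration. Single-link components stand in for hard clustering and linked pairs for an overlapping technique; neither is a method prescribed here.

```python
from itertools import combinations

# Hypothetical reference lists: A shares theory references with B
# and application references with C; B and C share nothing.
REFS = {
    "A": {"theory1", "theory2", "app1", "app2"},
    "B": {"theory1", "theory2", "theory3"},
    "C": {"app1", "app2", "app3"},
}


def coupling(refs):
    """Raw bibliographic coupling: number of references shared by each pair."""
    return {(x, y): len(refs[x] & refs[y]) for x, y in combinations(sorted(refs), 2)}


def hard_clusters(refs, threshold=1):
    """Hard clustering as connected components of the thresholded coupling graph."""
    parent = {doc: doc for doc in refs}

    def find(doc):
        # Follow parent links up to the representative of the component.
        while parent[doc] != doc:
            doc = parent[doc]
        return doc

    for (x, y), strength in coupling(refs).items():
        if strength >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for doc in refs:
        groups.setdefault(find(doc), set()).add(doc)
    return sorted(sorted(group) for group in groups.values())


def overlapping_groups(refs, threshold=1):
    """Crude overlapping alternative: every sufficiently coupled pair is a group."""
    return sorted([x, y] for (x, y), s in coupling(refs).items() if s >= threshold)
```

With these toy data, B and C share no reference, yet hard_clusters(REFS) places A, B and C in a single group, while overlapping_groups(REFS) keeps A once with B and once with C.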
We are indebted to an anonymous referee for stressing this point.
Even Latourian citations or negative citations do not add much noise to co-citation topics.
High precision is expected from a strategy focused on strong forms (heavy intersections), with a check of the specific words and references of the native clusters c and w; high recall is expected from a strategy based on the full content of c and w clusters with strong overlaps. Intermediate strategies can focus on intuitive groupings of areas along the diagonal sequence.
For example, by ruling out papers without a given number or proportion of specific references.
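The two preceding notes can be sketched as follows, assuming that citation clusters c and word clusters w are available as sets of paper identifiers. The Jaccard overlap, the thresholds min_overlap and min_share, and all function names are assumptions made for this illustration, not parameters taken from the studies cited.

```python
def overlap(c, w):
    """Jaccard overlap between a citation cluster c and a word cluster w."""
    union = c | w
    return len(c & w) / len(union) if union else 0.0


def delineate(c_clusters, w_clusters, min_overlap=0.3):
    """Return (precise, broad) paper sets from strongly overlapping c/w pairs.

    precise: intersections only (strong forms), aimed at high precision.
    broad:   full content of both clusters, aimed at high recall.
    """
    precise, broad = set(), set()
    for c in c_clusters.values():
        for w in w_clusters.values():
            if overlap(c, w) >= min_overlap:
                precise |= c & w
                broad |= c | w
    return precise, broad


def filter_by_specific_refs(papers, refs, specific_refs, min_share=0.5):
    """Rule out papers whose share of field-specific references is too low."""
    kept = set()
    for paper in papers:
        cited = refs.get(paper, set())
        if cited and len(cited & specific_refs) / len(cited) >= min_share:
            kept.add(paper)
    return kept


# Hypothetical toy data.
C_CLUSTERS = {"c1": {"p1", "p2", "p3", "p4"}, "c2": {"p7", "p8"}}
W_CLUSTERS = {"w1": {"p2", "p3", "p4", "p5"}, "w2": {"p8", "p9", "p10", "p11"}}
PAPER_REFS = {"p2": {"r1", "r2"}, "p3": {"r1", "r9"}, "p4": {"r7", "r8", "r9"}}
SPECIFIC_REFS = {"r1", "r2"}

precise, broad = delineate(C_CLUSTERS, W_CLUSTERS)
checked = filter_by_specific_refs(precise, PAPER_REFS, SPECIFIC_REFS)
```

In this toy case only the pair c1/w1 overlaps strongly enough: the precise set is its intersection, the broad set is its union, and the reference check further removes p4, which cites none of the specific references.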
References
Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD, 207.
Ahlgren, P., & Colliander, C. (2009). Document-document similarity approaches and science mapping: Experimental comparison of five approaches. Journal of Informetrics, 3(1), 49–63.
Archambault, E., Beauchesne, O. H., & Caruso, J. (2011). Towards a multilingual, comprehensive and open scientific journal ontology. In Proceedings of the 13th ISSI Conference, Durban, South Africa.
Barabasi, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Bassecoulard, E., & Zitt, M. (1999). Indicators in a research institute: A multi-level classification of scientific journals. Scientometrics, 44(3), 323–345.
Benzecri, J. P. (1973). La place de l’a priori. Encyclopedia Universalis, 17, Organum, 11–24.
Benzecri, J. P., et al. (1981). Pratique de l’analyse des données : Linguistique et lexicologie. Paris: Dunod.
Bergstrom, C. (2007). Eigenfactor: Measuring the value and prestige of scholarly journals. College & Research Libraries News, 68(5). www.ala.org/ala/acrl/acrlpubs/crlnews/backissues2007/may2007/eigenfactor.cfm.
Blair, D. C. (2003). Information retrieval and the philosophy of language. Annual Review of Information Science and Technology, 37, 3–50.
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), 10008.
Börner, K., Chen, C. M., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37, 179–255.
Börner, K., Glänzel, W., Scharnhorst, A., & van den Besselaar, P. (2011). Modeling science: studying the structure and dynamics of science. Scientometrics, 89, 347–348.
Bornmann, L., & Daniel, H. D. (2008). What do citation counts measure? A review of studies on citation behavior. Journal of Documentation, 64(1), 45–80.
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? JASIST, 61(12), 2389–2404.
Boyack, K., & Klavans, R. (2013). Creation of a highly detailed, dynamic, global model and map of science, forthcoming in JASIST. doi:10.1002/asi.22990.
Boyack, K., Small, H., & Klavans, R. (2013). Improving the accuracy of co-citation clustering using full text. JASIST, 64(9), 1759–1767.
Braam, R. R., Moed, H. F., & Van Raan, A. F. J. (1991). Mapping of science by combined co-citation and word analysis. I. Structural aspects. JASIS, 42(4), 233–251.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
Cadot, M., & Lelu, A. (2011). Combining Explicitness and Classifying Performance via MIDOVA Lossless Representation for Qualitative Datasets. International Journal on Advances in Software, 5(1–2), 1–16.
Callahan, A., Hockema, S., & Eysenbach, G. (2010). Contextual co-citation: Augmenting co-citation analysis and its applications. JASIST, 61(6), 1130–1143.
Callon, M., Courtial, J. P., Turner, W. A., & Bauin, S. (1983). From translations to problematic networks: An introduction to co-word analysis. Social Science Information, 22(2), 191–235.
Callon, M., Courtial, J. P., & Laville, F. (1991). Co-word analysis as a tool for describing the network of interactions between basic and technological research: The case of polymer chemistry. Scientometrics, 22(1), 155–205.
Carayol, N., & Roux, P. (2009). Knowledge flows and the geography of networks: A strategic model of small world formation. Journal of Economic Behavior & Organization, 71(2), 414–427.
Carpineto, C., & Romano, G. (2012). A survey of automatic query expansion in information retrieval. ACM-CSUR, 44(1), 1.
Chavalarias, D., & Cointet, J. P. (2013). Phylomemetic patterns in science evolution—The rise and fall of scientific fields. PLoS ONE, 8(2), e54847.
Chen, C. M. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. JASIS, 57(3), 359–377.
Chen, C. M., Ibekwe-Sanjuan, F., & Hou, J. (2010). The structure and dynamics of co-citation clusters: A multiple-perspective co-citation analysis. JASIST, 61(7), 1386–1409.
Cronin, B. (1984). The citation process: The role and significance of citations in scientific communication (p. 103). London: Taylor Graham.
de Beaver, D., & Rosen, R. (1979). Studies in scientific collaboration. Part II. Scientific co-authorship, research productivity and visibility in the French Scientific Elite, 1799–1830. Scientometrics, 1(2), 133–149.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. JASIST, 41(6), 391–407.
Elkiss, A., Shen, S., Fader, A., Erkan, G., States, D., & Radev, D. (2008). Blind men and elephants: What do citation summaries tell us about a research article? JASIST, 59(1), 51–62.
Eom, Y. H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS ONE, 6(9), e24926. doi:10.1371/journal.pone.0024926.
Garfield, E. (1967). Primordial concepts, citation indexing and historio-bibliography. Journal of Library History, 2, 235–249.
Garfield, E., & Sher, I. H. (1993). KeyWords Plus™: Algorithmic derivative indexing. JASIST, 44(5), 298–299.
Garfield, E., Pudovkin, A. I., & Istomin, V. S. (2003). Why do we need algorithmic historiography? JASIST, 54(5), 400–412.
Gilbert, G. N. (1977). Referencing as persuasion. Social Studies of Science, 7, 113–122.
Gilbert, N. (1997). A simulation of the structure of academic science. Sociological Research Online, 2(2), 3. http://www.socresonline.org.uk/socresonline/2/2/3.html.
Glänzel, W., & Czerwon, H. J. (1996). A new methodological approach to bibliographic coupling and its application to the national, regional and institutional level. Scientometrics, 37(2), 195–221.
Glänzel, W., & Schubert, A. (2003). A new classification of the science fields and subfields designed for scientometric evaluation purposes. Scientometrics, 56(3), 357–367.
Gläser, J., Lange, S., Laudel, G., & Schimank, U. (2010). The Limits of Universality: How field-specific epistemic conditions affect authority relations and their consequences. In R. Whitley, J. Gläser, & L. Engwall (Eds.), Reconfiguring knowledge production: Changing authority relationships in the sciences and their consequences for intellectual innovation (pp. 291–324). Oxford: Oxford University Press.
Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation, 57(6), 715–740.
Ingwersen, P., & Järvelin, K. (2005). The turn: Integration of information seeking and retrieval in context (p. 436). Berlin: Springer.
Janssens, F., Glänzel, W., & De Moor, B. (2008). A hybrid mapping of information science. Scientometrics, 75(3), 607–631.
Jardine, N., & van Rijsbergen, C. J. (1971). The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7, 217–240.
Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14, 10–25.
Kostoff, R. N., delRio, J. A., Humenik, J. A., Garcia, E. O., & Ramirez, A. M. (2001). Citation mining: Integrating text mining and bibliometrics for research user profiling. JASIST, 52(13), 1148–1156.
Larivière, V., Archambault, E., & Gingras, Y. (2008). Long-term variations in the aging of scientific literature: from exponential growth to steady-state science (1900–2004). JASIST, 59(2), 288–296.
Larsen, B. (2002). Exploiting citation overlaps for information retrieval: Generating a boomerang effect from the network of scientific papers. Scientometrics, 54(2), 155–178.
Latour, B. (1987). Science in action: How to follow Scientists and Engineers through society. Cambridge: Harvard University Press.
Laurens, P., Zitt, M., & Bassecoulard, E. (2010). Delineation of the genomics field by hybrid citation-lexical methods: Interaction with experts and validation process. Scientometrics, 82(3), 647–662.
Lelu, A. (1994). Clusters and factors: Neural algorithms for a novel representation of huge and highly multidimensional data sets. In E. Diday & Y. Lechevallier (Eds.), New approaches in classification and data analysis (pp. 241–248). Berlin: Springer.
Leydesdorff, L., & Cozzens, S. E. (1993). The delineation of specialties in terms of journals using the dynamic journal set of the Science Citation Index. Scientometrics, 26, 133–154.
Liu, S., & Chen, C. M. (2013). The differences between latent topics in abstracts and citation contexts of citing papers. JASIST, 64(3), 627–639.
Liu, X., Yu, S., Janssens, F., Glänzel, W., Moreau, Y., & De Moor, B. (2010). Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database. JASIST, 61(6), 1105–1119.
Marshakova, I. V. (1973). Document coupling system based on references taken from Science Citation Index (in Russian). Nauchno-Teknicheskaya Informatsiya, Ser. 2, 6.3.
Martyn, J. (1964). Bibliographic coupling. Journal of Documentation, 20(4), 236.
McCain, K. W. (1983). The author co-citation structure of macroeconomics. Scientometrics, 5(5), 277–289.
McCain, K. W. (1989). Descriptor and citation retrieval in the medical behavioral sciences literature: Retrieval overlaps and novelty distribution. JASIS, 40(2), 110–114.
Morris, S. A., Yen, G., Wu, Z., & Asnake, B. (2003). Time line visualization of research fronts. JASIST, 54(5), 413–422.
Mullins, N. C., Hargens, L. L., Hecht, P. K., & Kick, E. L. (1977). The group structure of co-citation clusters: A comparative study. American Sociological Review, 42, 552–562.
Mutschke, P., & Quan-Haase, A. (2001). Collaboration and cognitive structures in social science research fields: Towards socio-cognitive analysis in information systems. Scientometrics, 52(3), 487–502.
Mutschke, P., Mayr, P., Schaer, P., & Sure, Y. (2011). Science models as value-added services for scholarly information systems. Scientometrics, 89, 349–364.
Narin, F., Pinski, G., & Gee, H. H. (1976). Structure of the biomedical literature. Journal of the American Society for Information Science, 27(1), 25–45.
Narin, F., & Noma, E. (1985). Is technology becoming science? Scientometrics, 7(3), 369–381.
Noyons, E. C. M. (1999). Bibliometric mapping as a science policy and research management tool. Leiden: Leiden University DSWO Press.
Palacios-Huerta, I., & Volij, O. (2004). The measurement of intellectual influence. Econometrica, 72(3), 963–977.
Pao, M. L. (1993). Term and citation retrieval: A field study. Information Processing and Management, 29(1), 95–112.
Papadimitriou, C., Raghavan, P., Tamaki, H., & Vempala, S. (1998). Latent semantic indexing: A probabilistic analysis. In PODS: Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (pp. 159–168).
Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing and Management, 12, 297–312.
Polanco, X., Grivel, L., & Royauté, J. (1995). How to do things with terms in informetrics: Terminological variation and stabilization as science watch indicators. In M. Koenig (Ed.), Proceedings of the 5th ISSI International Conference (River Forest, IL, June 7–10, 1995) (pp. 435–444). Medford, NJ: Learned Information.
Price, D. J. de Solla. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
Price, D. J. de Solla. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.
Rafols, I., Porter, A. L., & Leydesdorff, L. (2010). Science overlay maps: A new tool for research policy and library management. JASIS, 61(9), 1871–1887.
Ritchie, A., Robertson, S., & Teufel, S. (2008). Comparing citation context for information retrieval. In CIKM’08: Proceedings of the 17th ACM conference on information and knowledge management (pp. 213–222).
Rocchio, J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313–323). Englewood Cliffs, NJ: Prentice-Hall.
Ross, N. C. M., & Wolfram, D. (2000). End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine. JASIST, 51(10), 949–958.
Rosvall, M., & Bergstrom, C. (2008). Maps of information flows reveal structures in complex networks. PNAS, 105, 1118.
Roth, C., & Cointet, J. P. (2010). Social and semantic coevolution in knowledge networks. Social Networks, 32(1), 16–29.
Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. JASIST, 41(4), 288–297.
Scharnhorst, A., Börner, K., & van den Besselaar, P. (Eds.). (2012). Models of science dynamics: Encounters between complexity theory and information sciences (Understanding Complex Systems). Berlin: Springer.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. JASIS, 24(4), 265–269.
Small, H. (1980). Co-citation context analysis and the structure of paradigms. Journal of Documentation, 36(3), 183–196.
Small, H. (2011). Interpreting maps of science using citation context sentiments: A preliminary investigation. Scientometrics, 87(2), 373–388.
Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In EMNLP ’06: Proceedings of the 2006 conference on empirical methods in natural language processing.
van den Besselaar, P., & Heimeriks, G. (2006). Mapping research topics using word-reference co-occurrences: A method and an exploratory case study. Scientometrics, 68(3), 377–393.
Waltman, L., & van Eck, N. J. (2012). A new methodology for constructing a publication-level classification system of science. JASIS, 63(12), 2378–2392.
Watts, C., & Gilbert, N. (2011). Does cumulative advantage affect collective learning in science? An agent-based simulation. Scientometrics, 89(1), 437–463.
White, H. D., & Griffith, B. C. (1981). Author co-citation: A literature measure of intellectual structure. JASIS, 32(3), 163–172.
Zitt, M., & Bassecoulard, E. (1996). Reassessment of co-citation methods for science indicators: Effect of methods improving recall rates. Scientometrics, 37(2), 223–244.
Zitt, M., & Bassecoulard, E. (2006). Delineating complex scientific fields by an hybrid lexical-citation method: An application to nanosciences. Information Processing and Management, 42(6), 1513–1531.
Zitt, M., Ramanana-Rahary, S., & Bassecoulard, E. (2005). Relativity of citation performance and excellence measures: From cross-field to cross-scale effects of field-normalisation. Scientometrics, 63(2), 373–401.
Zitt, M., Lelu, A., & Bassecoulard, E. (2011). Hybrid citation-word representations in science mapping: Portolan charts of research fields? JASIST, 62(1), 19–39. doi:10.1002/asi.21440.
Zitt, M., & Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor. JASIST, 59(11), 1856–1860.
Acknowledgments
The author thanks Alain Lelu, Université de Franche-Comté and Loria, Nancy, Elise Bassecoulard, formerly Inra-Lereco, and anonymous referees, for helpful remarks; and Patricia Laurens and Antoine Schoen, ESIEE, Marne la Vallée, for permission to use the genomics map from our previous joint work.
Cite this article
Zitt, M. Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation. Scientometrics 102, 2223–2245 (2015). https://doi.org/10.1007/s11192-014-1482-5
DOI: https://doi.org/10.1007/s11192-014-1482-5