Microsoft Academic: is the phoenix getting wings?


Abstract

In this article, we compare the publication and citation coverage of the new Microsoft Academic with that of the other major sources of bibliometric data: Google Scholar, Scopus, and the Web of Science, using a sample of 145 academics in five broad disciplinary areas: Life Sciences, Sciences, Engineering, Social Sciences, and Humanities. When using the more conservative linked citation counts for Microsoft Academic, this data source provides higher citation counts than both Scopus and the Web of Science for Engineering, the Social Sciences, and the Humanities, whereas citation counts for the Life Sciences and the Sciences are fairly similar across these three databases. Google Scholar still reports the highest citation counts for all disciplines. When using the more liberal estimated citation counts for Microsoft Academic, its average citation counts are higher than both Scopus and the Web of Science for all disciplines. For the Life Sciences, Microsoft Academic estimated citation counts are even higher than Google Scholar counts, whereas for the Sciences they are almost identical. For Engineering, Microsoft Academic estimated citation counts are 14% lower than Google Scholar citation counts; for the Social Sciences the gap is 23%. Only for the Humanities are they substantially (69%) lower than Google Scholar citation counts. Overall, this first large-scale comparative study suggests that the new incarnation of Microsoft Academic presents an excellent alternative for citation analysis. We therefore conclude that the Microsoft Academic Phoenix is undeniably growing wings; it might soon be ready to fly off and start its adult life in the field of research evaluation.


Notes

  1. Earlier articles using the same dataset (Harzing et al. 2014; Harzing and Alakangas 2016) included an error in the number of observations per discipline, which were accidentally reversed for the Sciences and the Life Sciences. This did not affect any of those articles' statistics or conclusions, but the error has been corrected for this paper. Furthermore, we had to remove one academic in the Life Sciences from the original sample of 146 academics, as his name was so common that it was impossible to achieve reliable search results.

  2. Once the queries were defined, repeating the Microsoft Academic searches took less than 10 minutes for the entire sample of 145 academics. Because of the much longer delays required between requests, Google Scholar searches took several hours, although they did not require continuous attention (see the first sketch following these notes). Scopus and Web of Science searches took up to a full day and required continuous attention, as the searches involved numerous steps for each individual academic.

  3. Scopus and the Web of Science also contain stray publications, and often, especially for authors with non-journal publications, a far larger number than Google Scholar and Microsoft Academic. However, strays are not shown when using the general search options most commonly employed for bibliometric studies. For the first author, Scopus reports no fewer than 442 secondary documents, in addition to the 71 documents shown in the general search. The Web of Science Cited Reference Search would have shown a similar number if she had not submitted weekly data change reports for years, requesting the merging of stray publications into their respective master records. For the first author's record, both databases thus have more stray publications than either Google Scholar or Microsoft Academic.

  4. Since MA sources publication records from the entire web, it often finds multiple versions of the same article, and in many cases they do not agree on the details. A machine-learning-based system corroborates the multiple accounts of the same publication, and only if a confidence threshold is passed does MA deem the record credible and assign a unique “paper entity ID” to it (a toy illustration of such threshold-based verification follows these notes). A citing paper can fail the test, and thus not receive an entity ID, if MA cannot verify its claimed publication venue or authorship. The same verification is conducted on each referenced article as well. A citation can fail the test for the same reasons, or if the paper's title has changed. If the test fails because of the publication date, the system can self-correct as more corroborative evidence is observed from the web crawl (Wang 2016).

  5. Estimated citation counts use a technique statisticians have developed to estimate the true size of a population when only a portion of it can be observed, but multiple samples can be taken. The mathematics allows one to take a portion of the data, count how many items have not been seen before, and infer what fraction of the population was sampled. MA's “linked” citations are a statistical sample of the true citations each paper receives. MA can also find other samples on the web, including GS, other publishers' websites, and so on. MA combines all of these as multiple samples and applies the size-estimation formula to them (the capture-recapture sketch following these notes shows the simplest two-sample form). Estimation quality is better when the statistics from the samples agree with one another; as a result, the variance in the estimated counts is not uniform. For fields that have done a better job of putting their publications online, the differences between MA and GS results are smaller (Wang 2016).
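
The data-collection contrast described in note 2 comes down to request pacing. Below is a minimal sketch of what a rate-limited query replay might look like; `fetch` and the delay value are placeholders invented for illustration, not part of any actual tool:

```python
import time

def replay_queries(queries, fetch, delay_seconds=60.0):
    """Replay saved author queries against a rate-limited source.
    `fetch` stands in for whatever callable executes a single query
    (hypothetical). The enforced pause between requests is what
    stretches a Google Scholar run to several hours, while a source
    without such limits finishes the same list in minutes."""
    results = {}
    for query in queries:
        results[query] = fetch(query)
        time.sleep(delay_seconds)  # pause before the next request
    return results
```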
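
Note 4 describes threshold-based record verification. The sketch below is a deliberately simplified, hypothetical stand-in: the field names, the agreement score, and the 0.75 threshold are all invented for illustration, since MA's real pipeline is a proprietary machine-learning system (Wang 2016):

```python
import hashlib
from collections import Counter

def corroborate(versions, threshold=0.75):
    """Assign an ID to a record only if independently crawled copies
    of it agree well enough. `versions` is a list of dicts with keys
    'title', 'venue', and 'authors' (author lists given as tuples so
    they can be counted); each dict is one crawled copy of what may
    be the same paper."""
    title_counts = Counter(v.get("title") for v in versions if v.get("title"))
    if not title_counts:
        return None  # no corroborated title: the record cannot pass
    # Score each field by the fraction of copies agreeing on its most
    # common non-missing value.
    scores = []
    for field in ("title", "venue", "authors"):
        counts = Counter(v.get(field) for v in versions if v.get(field) is not None)
        scores.append(max(counts.values()) / len(versions) if counts else 0.0)
    if sum(scores) / len(scores) < threshold:
        return None  # confidence too low: no "paper entity ID" assigned
    canonical_title = title_counts.most_common(1)[0][0]
    return hashlib.sha1(canonical_title.lower().encode()).hexdigest()[:12]
```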
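
The size-estimation idea in note 5 is in the spirit of classical capture-recapture. Here is a minimal worked sketch, assuming the simplest two-sample (Lincoln-Petersen) form; MA's actual estimator combines many samples and is not public (Wang 2016):

```python
def lincoln_petersen(sample_a, sample_b):
    """Two-sample capture-recapture estimate of a population's true
    size: N is approximately |A| * |B| / |A intersect B|. Here the
    population would be all citations a paper has actually received,
    and each sample is the set of citing-paper identifiers that one
    source happens to observe."""
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("samples share no items; estimator undefined")
    return len(a) * len(b) / overlap

# E.g. 60 linked citations, 50 citations seen on another site,
# 30 appearing in both: estimated true count = 60 * 50 / 30 = 100.
print(lincoln_petersen(range(60), range(30, 80)))  # -> 100.0
```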

References

  • Delgado-López-Cózar, E., & Repiso-Caballero, R. (2013). El impacto de las revistas de comunicación: comparando Google Scholar Metrics, Web of Science y Scopus [The impact of communication journals: Comparing Google Scholar Metrics, Web of Science and Scopus]. Comunicar: Revista Científica de Comunicación y Educación, 21(41), 45–52.

  • Harzing, A. W. (2007). Publish or Perish. http://www.harzing.com/pop.htm.

  • Harzing, A. W. (2016). Microsoft Academic (search): A Phoenix arisen from the ashes? Scientometrics, 108(3), 1637–1647.

  • Harzing, A. W., & Alakangas, S. (2016). Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison. Scientometrics, 106(2), 787–804.

  • Harzing, A. W., Alakangas, S., & Adams, D. (2014). hIa: An individual annual h-index to accommodate disciplinary and career length differences. Scientometrics, 99(3), 811–821.

  • Herrmannova, D., & Knoth, P. (2016). An analysis of the Microsoft Academic Graph. D-Lib Magazine, 22(9/10).

  • Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. arXiv:physics/0508025, v5, 29 September 2006.

  • Orduña-Malea, E., Martín-Martín, A., Ayllón, J. M., & Delgado López-Cózar, E. (2014). The silent fading of an academic search engine: The case of Microsoft Academic Search. Online Information Review, 38(7), 936–953.

  • Wang, K. (2016). Personal communication with Kuansan Wang, Managing Director at Microsoft Research Outreach, 31 October 2016.

  • Wildgaard, L. (2015). A comparison of 17 author-level bibliometric indicators for researchers in Astronomy, Environmental Science, Philosophy and Public Health in Web of Science and Google Scholar. Scientometrics, 104(3), 873–906.


Author information

Corresponding author

Correspondence to Anne-Wil Harzing.

About this article

Cite this article

Harzing, A. W., & Alakangas, S. Microsoft Academic: is the phoenix getting wings? Scientometrics 110, 371–383 (2017). https://doi.org/10.1007/s11192-016-2185-x

