research-article

The Neural Hype and Comparisons Against Weak Baselines

Published: 17 January 2019

Abstract

Recently, the machine learning community paused in a moment of self-reflection. In a widely discussed paper at ICLR 2018, Sculley et al. [13] wrote: "We observe that the rate of empirical advancement may not have been matched by consistent increase in the level of empirical rigor across the field as a whole." Their primary complaint is the development of a "research and publication culture that emphasizes wins" (emphasis in original), which typically means "demonstrating that a new method beats previous methods on a given task or benchmark". An apt description might be "leaderboard chasing", and for many vision and NLP tasks, this isn't a metaphor. There are literally centralized leaderboards that track incremental progress, down to the fifth decimal point, some persisting over years and accumulating dozens of entries.
Sculley et al. remind us that "the goal of science is not wins, but knowledge". The structure of the scientific enterprise today (pressure to publish, pace of progress, etc.) means that "winning" and "doing good science" are often not fully aligned. To wit, they cite a number of papers showing that recent advances in neural networks could very well be attributed to mundane issues like better hyperparameter optimization. Many results can't be reproduced, and some observed improvements might just be noise.
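
The point that small observed improvements might just be noise can be illustrated with a quick simulation. The sketch below uses hypothetical per-topic average-precision scores (not data from any actual system): a "new" run differs from a baseline only by zero-mean noise, and a paired randomization (sign-flip) test checks whether the apparent gain is distinguishable from chance.

```python
import random

random.seed(0)

# Hypothetical per-topic average-precision scores: a baseline and a "new"
# system whose per-topic differences are pure noise (no real improvement).
baseline = [random.gauss(0.25, 0.10) for _ in range(50)]
new_run = [b + random.gauss(0.0, 0.02) for b in baseline]

diffs = [n - b for n, b in zip(new_run, baseline)]
observed = sum(diffs) / len(diffs)

# Paired randomization test: under the null hypothesis the sign of each
# per-topic difference is arbitrary, so flip signs at random and count
# how often the shuffled mean difference is at least as extreme as the
# observed one.
trials = 10_000
extreme = sum(
    1
    for _ in range(trials)
    if abs(sum(d if random.random() < 0.5 else -d for d in diffs) / len(diffs))
    >= abs(observed)
)
p_value = extreme / trials
print(f"mean AP difference: {observed:+.4f}, p = {p_value:.3f}")
```

A small mean difference paired with a large p-value is exactly the kind of "win" that, aggregated over many papers against weak baselines, produces the improvements that don't add up [4].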

References

[1] N. Abdul-Jaleel, J. Allan, W. B. Croft, F. Diaz, L. Larkey, X. Li, D. Metzler, M. D. Smucker, T. Strohman, H. Turtle, and C. Wade. UMass at TREC 2004: Novelty and HARD. In Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, Maryland, 2004.
[2] J. Arguello, M. Crane, F. Diaz, J. Lin, and A. Trotman. Report on the SIGIR 2015 workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR). SIGIR Forum, 49(2):107--116, 2015.
[3] T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. EvaluatIR: An online tool for evaluating and comparing IR systems. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '09, page 833, 2009.
[4] T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. Improvements that don't add up: Ad-hoc retrieval results since 1998. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, pages 601--610, 2009.
[5] R. Benham, J. S. Culpepper, L. Gallagher, X. Lu, and J. Mackenzie. Towards efficient and effective query variant generation. In Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, Bertinoro, Italy, 2018.
[6] J. Dalton, L. Dietz, and J. Allan. Entity query feature expansion using knowledge base links. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '14, pages 365--374, New York, NY, USA, 2014. ACM.
[7] J. P. A. Ioannidis. Why most published research findings are false. PLoS Med, 2(8):e124, 2005.
[8] S. Kharazmi, F. Scholer, D. Vallet, and M. Sanderson. Examining additivity and weak baselines. ACM Transactions on Information Systems, 34(4):Article 23, 2016.
[9] J. Lin, M. Crane, A. Trotman, J. Callan, I. Chattopadhyaya, J. Foley, G. Ingersoll, C. Macdonald, and S. Vigna. Toward reproducible baselines: The open-source IR reproducibility challenge. In Proceedings of the 38th European Conference on Information Retrieval (ECIR 2016), pages 408--420, Padua, Italy, 2016.
[10] H. Mühleisen, T. Samar, J. Lin, and A. de Vries. Old dogs are great at new tricks: Column stores for IR prototyping. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '14, pages 863--866, 2014.
[11] A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. Van Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In Proceedings of the 8th Biennial Conference on Innovative Data Systems Research (CIDR 2017), Chaminade, California, 2017.
[12] T. Pfeiffer and R. Hoffmann. Large-scale assessment of the effect of popularity on the reliability of research. PLoS ONE, 4(6):e5996, 2009.
[13] D. Sculley, J. Snoek, A. Rahimi, and A. Wiltschko. Winner's curse? On pace, progress, and empirical rigor. In Proceedings of the 6th International Conference on Learning Representations, Workshop Track (ICLR 2018), 2018.
[14] A. Trotman, A. Puurula, and B. Burgess. Improvements to BM25 and language models examined. In Proceedings of the 2014 Australasian Document Computing Symposium, ADCS '14, pages 58:58--58:65, 2014.
[15] E. M. Voorhees. Overview of the TREC 2004 Robust Track. In Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, Maryland, 2004.
[16] P. Yang, H. Fang, and J. Lin. Anserini: Enabling the use of Lucene for information retrieval research. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, pages 1253--1256, 2017.
[17] P. Yang, H. Fang, and J. Lin. Anserini: Reproducible ranking baselines using Lucene. Journal of Data and Information Quality, 10(4):Article 16, 2018.

Published In

ACM SIGIR Forum, Volume 52, Issue 2
December 2018
177 pages
ISSN: 0163-5840
DOI: 10.1145/3308774
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
