research-article

The Neural Hype and Comparisons Against Weak Baselines

Published: 17 January 2019

Abstract

Recently, the machine learning community paused in a moment of self-reflection. In a widely discussed paper at ICLR 2018, Sculley et al. [13] wrote: "We observe that the rate of empirical advancement may not have been matched by consistent increase in the level of empirical rigor across the field as a whole." Their primary complaint is the development of a "research and publication culture that emphasizes wins" (emphasis in original), which typically means "demonstrating that a new method beats previous methods on a given task or benchmark". An apt description might be "leaderboard chasing", and for many vision and NLP tasks, this isn't a metaphor. There are literally centralized leaderboards that track incremental progress, down to the fifth decimal point, some persisting over years and accumulating dozens of entries.
Sculley et al. remind us that "the goal of science is not wins, but knowledge". The structure of the scientific enterprise today (pressure to publish, pace of progress, etc.) means that "winning" and "doing good science" are often not fully aligned. To wit, they cite a number of papers showing that recent advances in neural networks could very well be attributed to mundane issues like better hyperparameter optimization. Many results can't be reproduced, and some observed improvements might just be noise.
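
The point that small observed improvements might just be noise can be illustrated with a quick simulation. The sketch below uses hypothetical per-topic average-precision scores (not data from any actual system): a "new" run differs from a baseline only by zero-mean noise, and a paired randomization (sign-flip) test checks whether the apparent gain is distinguishable from chance.

```python
import random

random.seed(0)

# Hypothetical per-topic average-precision scores: a baseline and a "new"
# system whose per-topic differences are pure noise (no real improvement).
baseline = [random.gauss(0.25, 0.10) for _ in range(50)]
new_run = [b + random.gauss(0.0, 0.02) for b in baseline]

diffs = [n - b for n, b in zip(new_run, baseline)]
observed = sum(diffs) / len(diffs)

# Paired randomization test: under the null hypothesis the sign of each
# per-topic difference is arbitrary, so flip signs at random and count
# how often the shuffled mean difference is at least as extreme as the
# observed one.
trials = 10_000
extreme = sum(
    1
    for _ in range(trials)
    if abs(sum(d if random.random() < 0.5 else -d for d in diffs) / len(diffs))
    >= abs(observed)
)
p_value = extreme / trials
print(f"mean AP difference: {observed:+.4f}, p = {p_value:.3f}")
```

A small mean difference paired with a large p-value is exactly the kind of "win" that, aggregated over many papers against weak baselines, produces the improvements that don't add up [4].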

References

[1] N. Abdul-Jaleel, J. Allan, W. B. Croft, F. Diaz, L. Larkey, X. Li, D. Metzler, M. D. Smucker, T. Strohman, H. Turtle, and C. Wade. UMass at TREC 2004: Novelty and HARD. In Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, Maryland, 2004.
[2] J. Arguello, M. Crane, F. Diaz, J. Lin, and A. Trotman. Report on the SIGIR 2015 workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR). SIGIR Forum, 49(2):107--116, 2015.
[3] T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. EvaluatIR: An online tool for evaluating and comparing IR systems. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '09, page 833, 2009.
[4] T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. Improvements that don't add up: Ad-hoc retrieval results since 1998. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, pages 601--610, 2009.
[5] R. Benham, J. S. Culpepper, L. Gallagher, X. Lu, and J. Mackenzie. Towards efficient and effective query variant generation. In Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, Bertinoro, Italy, 2018.
[6] J. Dalton, L. Dietz, and J. Allan. Entity query feature expansion using knowledge base links. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '14, pages 365--374, New York, NY, USA, 2014. ACM.
[7] J. P. A. Ioannidis. Why most published research findings are false. PLoS Med, 2(8):e124, 2005.
[8] S. Kharazmi, F. Scholer, D. Vallet, and M. Sanderson. Examining additivity and weak baselines. ACM Transactions on Information Systems, 34(4):Article 23, 2016.
[9] J. Lin, M. Crane, A. Trotman, J. Callan, I. Chattopadhyaya, J. Foley, G. Ingersoll, C. Macdonald, and S. Vigna. Toward reproducible baselines: The open-source IR reproducibility challenge. In Proceedings of the 38th European Conference on Information Retrieval (ECIR 2016), pages 408--420, Padua, Italy, 2016.
[10] H. Mühleisen, T. Samar, J. Lin, and A. de Vries. Old dogs are great at new tricks: Column stores for IR prototyping. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '14, pages 863--866, 2014.
[11] A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. Van Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In Proceedings of the 8th Biennial Conference on Innovative Data Systems Research (CIDR 2017), Chaminade, California, 2017.
[12] T. Pfeiffer and R. Hoffmann. Large-scale assessment of the effect of popularity on the reliability of research. PLoS ONE, 4(6):e5996, 2009.
[13] D. Sculley, J. Snoek, A. Rahimi, and A. Wiltschko. Winner's curse? On pace, progress, and empirical rigor. In Proceedings of the 6th International Conference on Learning Representations, Workshop Track (ICLR 2018), 2018.
[14] A. Trotman, A. Puurula, and B. Burgess. Improvements to BM25 and language models examined. In Proceedings of the 2014 Australasian Document Computing Symposium, ADCS '14, pages 58:58--58:65, 2014.
[15] E. M. Voorhees. Overview of the TREC 2004 Robust Track. In Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, Maryland, 2004.
[16] P. Yang, H. Fang, and J. Lin. Anserini: Enabling the use of Lucene for information retrieval research. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, pages 1253--1256, 2017.
[17] P. Yang, H. Fang, and J. Lin. Anserini: Reproducible ranking baselines using Lucene. Journal of Data and Information Quality, 10(4):Article 16, 2018.

Published In

ACM SIGIR Forum, Volume 52, Issue 2
December 2018
177 pages
ISSN: 0163-5840
DOI: 10.1145/3308774
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
