Call for Papers

Deadline: October 30, 2018, 11:59 PM UTC

The one-day NeurIPS 2018 Workshop: Critiquing and Correcting Trends in Machine Learning calls for papers that critically examine current common practices and/or trends in methodology, datasets, empirical standards, publication models, or any other aspect of machine learning research. Though we are happy to receive papers that bring attention to problems for which there is no clear immediate remedy, we particularly encourage papers which propose a solution or indicate a way forward. Papers should motivate their arguments by describing gaps in the field. Crucially, this is not a venue for settling scores or character attacks, but for moving machine learning forward as a scientific discipline.

To help guide submissions, we have split up the call for papers into the following tracks. Please indicate the intended track when making your submission. Papers are welcome from all subfields of machine learning. If you have a paper which you feel falls within the remit of the workshop but does not clearly fit one of these tracks, please contact the organizers at:

Bad Practices (1-4 pages)
Papers that highlight common bad practices or unjustified assumptions at any stage of the research process. These can either be technical shortfalls in a particular machine learning subfield, or more procedural bad practices of the ilk of those discussed in [17]. When possible, papers should also try to highlight work which does not fall foul of these bad practices, as examples of how they can be avoided.

Flawed Intuitions or Unjustified Assumptions (3-4 pages)
Papers that call into question commonly held intuitions or provide clear evidence either for or against assumptions that are regularly taken for granted without proper justification. For example, we would like to see papers which provide empirical assessments to test out metrics, verify intuitions, or compare popular current approaches with historic baselines that may have unfairly fallen out of favour (see e.g. [2]). Such submissions are encouraged regardless of whether these assessments ultimately result in positive or negative results. We would also like to see work which provides results that make us rethink our intuitions or the assumptions we typically make.

Negative Results (3-4 pages)
Papers which show failure modes of existing algorithms or suggest new approaches which one might expect to perform well but which do not. The aim of the latter of these is to provide a venue for work which might otherwise go unpublished but which is still of interest to the community, for example by dissuading other researchers from similar ultimately unsuccessful approaches. Though it is inevitably preferable that papers are able to explain why the approach performs poorly, this is not essential if the paper is able to demonstrate why the negative result is of interest to the community in its own right.

Research Process (1-4 pages)
Papers which provide carefully thought-through critiques, provide discussion on, or suggest new approaches to areas such as the conference model, the reviewing process, the role of industry in research, open sourcing of code and data, institutional biases and discrimination in the field, research ethics, reproducibility standards, and allocation of conference tickets.

Debates (1-2 pages)
Short proposition papers which discuss issues affecting either all of machine learning or sizeable subfields (e.g. reinforcement learning, Bayesian methods, etc.). Selected papers will be used as the basis for instigating online forum debates before the workshop, leading up to live discussions on the day itself.

Open Problems (1-4 pages/short talks)
Papers that describe either (a) unresolved questions in existing fields that need to be addressed, (b) desirable operating characteristics for ML in particular application areas that have yet to be achieved, or (c) new frontiers of machine learning research that require rethinking current practices (e.g., error diagnosis for when many ML components are interoperating within a system, automating dataset collection/creation).

Submission Instructions
Papers should be submitted as PDFs using the NeurIPS LaTeX style file. Author names should be anonymized.

Acceptance and Registrations
All accepted papers will be made available through the workshop website and presented as a poster. Selected papers will also be given contributed talks. We are able to add a moderate number of accepted paper authors to the pool of reserved tickets. In the event that the number of accepted papers exceeds our reserved ticket allocation, places in the reserved ticket pool will be allocated based on review scores. We further have a small number of complimentary workshop registrations that will be handed out to selected papers. If any authors are unable to attend the workshop due to visa, ticketing, or funding issues, they will be allowed to provide a video presentation of their work that will be made available through the workshop website in lieu of a poster presentation.

Please submit papers here:


Recently there have been calls to make machine learning more reproducible, less hand-tailored, fair, and generally more thoughtful about how research is conducted and put into practice. These are hallmarks of a mature scientific field and will be crucial for machine learning to have the wide-ranging, positive impact it is expected to have. Without careful consideration, we as a field risk inflating expectations beyond what is possible. To address this, this workshop aims to better understand and to improve all stages of the research process in machine learning.

A number of recent papers have carefully considered trends in machine learning as well as the needs of the field when used in real-world scenarios [1-18]. Each of these works introspectively analyzes what we often take for granted as a field. Further, many propose solutions for moving forward. The goal of this workshop is to bring together researchers from all subfields of machine learning to highlight open problems and widespread dubious practices in the field, and crucially, to propose solutions. We hope to highlight issues and propose solutions in areas such as:
- Common practices [1, 8]
- Implicit technical and empirical assumptions that go unquestioned [2, 3, 5, 7, 11, 12, 13, 17, 18]
- Shortfalls in publication and reviewing setups [15, 16]
- Disconnects between research focus and application requirements [9, 10, 14]
- Surprising observations that make us rethink our research priorities [4, 6]

The workshop program is a collection of invited talks, alongside contributed posters and talks. For some of these talks, we plan a unique open format of 10 minutes of talk + 10 minutes of follow-up discussion. Additionally, a separate panel discussion will bring together researchers with a diverse set of viewpoints on the current challenges and potential solutions. During the panel, we will also open the conversation to the audience. The discussion will further be open to online questions, which will be solicited prior to the workshop.

A key expected outcome of the workshop is a collection of important open problems at all levels of machine learning research, along with a record of various bad practices that we should no longer consider to be acceptable. Further, we hope that the workshop will make inroads in how to address these problems, highlighting promising new frontiers for making machine learning practical, robust, reproducible, and fair when applied to real-world problems.

We are soliciting questions for the Panel Discussion here:

Schedule (Rooms 511 ABDE)

08:30 Opening remarks (Brooks Paige)
08:40 Invited talk: But it’s hard! (Zachary Lipton)
09:05 Invited talk: Applied Machine Learning at Facebook Scale: Separating Opportunity from Hype (Kim Hazelwood)
09:30 Contributed talk: Expanding search in the space of empirical ML (Bronwyn Woods)
09:40 Contributed talk: Opportunities for machine learning research to support fairness in industry practice (Kenneth Holstein)
09:50 Spotlights:
- Surprising Negative Results for Generative Adversarial Tree Search
- Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments
- Please Stop Explaining Black Box Models for High Stakes Decisions
- Theoretical guarantees and empirical evaluation: how large is the gap?
- Can VAEs Generate Novel Examples?
- Language GANs Falling Short
10:20 Poster Session 1 & Coffee break
11:10 Invited talk: Correcting and Critiquing Trends in Interpretable ML (Finale Doshi-Velez)
11:35 Invited talk: Knowing when not to trust: Reliable Prediction by Leveraging Causal Mechanisms (Suchi Saria)
12:00 Lunch
13:30 Invited talk: Are we making progress? (Sebastian Nowozin)
13:55 Contributed talk: Using Cumulative Distribution Based Performance Analysis to Benchmark Models (Scott Jordan)
14:05 Invited talk: Reviewing models, prepublication models, and other things that tend to make people upset (Charles Sutton)
14:30 Contributed talk: On Avoiding Tragedy of the Commons in the Peer Review Process (D. Sculley)
14:40 Spotlights:
- Conference ticket allocation via non-uniform random selection to address systemic biases
- Distilling Information from a Flood: A Possibility for the Use of Meta-Analysis and Systematic Review in Machine Learning Research [video link]
- What's in a name? It's time to nip NIPS
- Code as Scholarship: Extensible Software Experiments
15:00 Poster Session 2 & Coffee break
15:30 Panel on Research Process: Suchi Saria, Charles Sutton, Finale Doshi-Velez, Hanna Wallach, Rich Caruana, Zachary Lipton
- Feel free to add questions for the panel discussion:
- Panel Moderator: Tom Rainforth
16:30 Poster Session 3

Accepted Papers

Spotlight Talk: Surprising Negative Results for Generative Adversarial Tree Search [paper]
Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Emma Brunskill, Zachary Lipton and Animashree Anandkumar.
Deriving the Recurrent Neural Network Definition and RNN Unrolling Using Signal Processing [paper]
Alex Sherstinsky.
Critiquing Intuitions for Learning Rate Restarts [paper]
Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong and Richard Socher.
Lebesgue Regression [paper]
Yotam Hechtlinger, Niccolo Dalmasso, Alessandro Rinaldo and Larry Wasserman.
Oral: Opportunities for machine learning research to support fairness in industry practice [paper]
Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík and Hanna Wallach.
An Evaluation of the Human-Interpretability of Explanation [paper]
Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Samuel Gershman, Been Kim and Finale Doshi-Velez.
Spotlight Talk: Conference ticket allocation via non-uniform random selection to address systemic biases [paper]
Jessica Thompson, Laurent Dinh, Layla El Asri and Nicolas Le Roux.
Do Deep Convolutional Network Layers Need to be Trained End-to-End? [paper]
Eugene Belilovsky, Michael Eickenberg and Edouard Oyallon.
Characterising activation functions by their backward dynamics around forward fixed points [paper]
Pieter-Jan Hoedt, Sepp Hochreiter and Günter Klambauer.
Bad practices in evaluation methodology relevant to class-imbalanced problems [paper]
Jan Brabec and Lukas Machlica.
On the Evaluation of Common-Sense Reasoning in Natural Language Understanding [paper]
Paul Trichelair, Ali Emami, Jackie Cheung, Adam Trischler, Kaheer Suleman and Fernando Diaz.
Questioning the assumptions behind fairness solutions [paper]
Rebekah Overdorf, Bogdan Kulynych, Ero Balsa, Carmela Troncoso and Seda Guerses.
Rethinking Layer-wise Feature Amounts in Convolutional Neural Network Architectures [paper]
Martin Mundt, Sagnik Majumder, Tobias Weis and Visvanathan Ramesh.
Spotlight Talk: Distilling Information from a Flood: A Possibility for the Use of Meta-Analysis and Systematic Review in Machine Learning Research [paper]
Peter Henderson and Emma Brunskill.
Towards Optimal Design of Datasets and Validation Scheme for Autonomous Driving [paper]
Michal Uricar, Pavel Krizek, David Hurych and Senthil Yogamani.
Bridging the Generalization Gap: Training Robust Models on Confounded Biological Data [paper]
Tzu-Yu Liu, Ajay Kannan, Adam Drake, Marvin Bertin and Nathan Wan.
Spotlight Talk: Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments [paper]
Kaleigh Clary, Emma Tosch, John Foley and David Jensen.
Spotlight Talk: Please Stop Explaining Black Box Models for High Stakes Decisions [paper]
Cynthia Rudin.
Causal importance of orientation selectivity for generalization in image recognition [paper]
Jumpei Ukita.
Life at the edge: accelerators and obstacles to emerging ML-enabled research fields [paper]
Soukayna Mouatadid and Steve Easterbrook.
Generalization in Deep Reinforcement Learning [paper]
Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, David Jensen and Michael Littman.
On the Implicit Assumptions of GANs [paper]
Ke Li and Jitendra Malik.
Oral: Expanding search in the space of empirical ML [paper]
Bronwyn Woods.
Refactoring Machine Learning [paper]
Andrew Ross and Jessica Forde.
Evaluating Generative Adversarial Networks on Explicitly Parameterized Distributions [paper]
Shayne O'Brien, Matthew Groh and Abhimanyu Dubey.
The Sheepdog and the Telescope: Application Paradigms in Machine Learning [paper]
David Mimno.
Spotlight Talk: What's in a name? It's time to nip NIPS [paper]
Daniela Witten, Elana Fertig, Anima Anandkumar and Jeff Dean.
Spotlight Talk: Theoretical guarantees and empirical evaluation: how large is the gap? [paper]
Marina Meila and Yali Wan.
Dataset Bias in the Natural Sciences: A Case Study in Chemical Reaction Prediction and Synthesis Design [paper]
Ryan-Rhys Griffiths, Philippe Schwaller and Alpha Lee.
Oral: Avoiding a Tragedy of the Commons in the Peer Review Process [paper]
D. Sculley, Jasper Snoek and Alex Wiltschko.
Oral: Using Cumulative Distribution Based Performance Analysis to Benchmark Models [paper]
Scott Jordan, Daniel Cohen and Phillip Thomas.
Spotlight Talk: Can VAEs Generate Novel Examples? [paper]
Alican Bozkurt, Babak Esmaeili, Dana Brooks, Jennifer Dy and Jan-Willem van de Meent.
Visual Dialogue without Vision or Dialogue [paper]
Daniela Massiceti, Puneet Dokania, Siddharth Narayanaswamy and Phil Torr.
Spotlight Talk: Code as Scholarship: Extensible Software Experiments [paper]
Jessica Forde.
Spotlight Talk: Language GANs Falling Short [paper]
Massimo Caccia, Lucas Caccia, Laurent Charlin, William Fedus, Hugo Larochelle and Joelle Pineau.
Generalization in anti-causal learning [paper]
Niki Kilbertus, Giambattista Parascandolo and Bernhard Schölkopf.


Please direct any questions to


References

[1] Mania, H., Guy, A., & Recht, B. (2018). Simple random search provides a competitive approach to reinforcement learning. arXiv preprint arXiv:1803.07055.
[2] Rainforth, T., Kosiorek, A. R., Le, T. A., Maddison, C. J., Igl, M., Wood, F., & Teh, Y. W. (2018). Tighter variational bounds are not necessarily better. ICML.
[3] Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1521-1528). IEEE.
[4] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
[5] Mescheder, L., Geiger, A., & Nowozin, S. (2018). Which Training Methods for GANs do actually Converge? ICML.
[6] Daumé III, H. (2009). Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815.
[7] Urban, G., Geras, K. J., Kahou, S. E., Wang, O. A. S., Caruana, R., Mohamed, A., ... & Richardson, M. (2016). Do deep convolutional nets really need to be deep (or even convolutional)?
[8] Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., & Meger, D. (2017). Deep reinforcement learning that matters. arXiv preprint arXiv:1709.06560.
[9] Narayanan, M., Chen, E., He, J., Kim, B., Gershman, S., & Doshi-Velez, F. (2018). How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation. arXiv preprint arXiv:1802.00682.
[10] Schulam, P., & Saria, S. (2017). Reliable Decision Support using Counterfactual Models. NIPS.
[11] Rahimi, A. (2017). Let's take machine learning from alchemy to electricity. Test-of-time award presentation, NIPS.
[12] Lucic, M., Kurach, K., Michalski, M., Gelly, S., & Bousquet, O. (2018). Are GANs Created Equal? A Large-Scale Study. arXiv preprint arXiv:1711.10337.
[13] Le, T. A., Kosiorek, A. R., Siddharth, N., Teh, Y. W., & Wood, F. (2018). Revisiting Reweighted Wake-Sleep. arXiv preprint arXiv:1805.10469.
[14] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
[15] Sutton, C. (2018). Making unblinding manageable: Towards reconciling prepublication and double-blind review.
[16] Langford, J. (2018). ICML Board and Reviewer profiles.
[17] Lipton, Z. C., & Steinhardt, J. (2018). Troubling Trends in Machine Learning Scholarship. arXiv preprint arXiv:1807.03341.
[18] Kaushik, D., & Lipton, Z. C. (2018). How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks. arXiv preprint arXiv:1808.04926.