Research Highlights

My research focuses on two goals: (1) dataset uncertainty estimation and (2) synergy between artificial and human intelligence, using AI to enable human intelligence. To this end, I established confident learning, a family of theory and algorithms for characterizing, finding, and learning with label errors in datasets, and cleanlab, the official Python framework for machine learning and deep learning with noisy labels. For an overview of my published research, please visit Google Scholar.

In addition to my MIT research, I am Chief AI Scientist at Knowledge AI, the principal author of the L7 machine learning blog, a rapper who performs as PomDP the PhD rapper, and a contingent research scientist at Oculus Research.

I’ve helped tens to hundreds of researchers build affordable state-of-the-art deep learning machines. My work in this area is available on the L7 machine learning blog.

Selected papers / projects

Confident Learning: Estimating Uncertainty for Dataset Labels, Curtis G. Northcutt, Lu Jiang, & Isaac L. Chuang, arXiv pre-print, 2019. [paper | code | blog]

A family of theory and algorithms for characterizing, finding, and learning with label errors in datasets. Confident learning improves on state-of-the-art (2019) approaches for learning with noisy labels by 30% on CIFAR benchmarks in the presence of high label noise.
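To give a flavor of what "finding label errors" can mean in practice, here is a minimal sketch of one simplified idea: flag an example when a model is confident in a class other than its given label. It assumes out-of-sample predicted probabilities are already available for every example. The function names and the toy data are hypothetical and chosen for illustration; this is not cleanlab's API, and it omits most of what the paper covers.

```python
def per_class_thresholds(labels, probs, num_classes):
    """Threshold for class j: mean predicted P(class j) over examples labeled j."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled examples gets an unreachable threshold.
    return [sums[j] / counts[j] if counts[j] else float("inf")
            for j in range(num_classes)]


def find_label_issues(labels, probs, num_classes):
    """Return (index, given_label, suggested_label) for suspicious examples."""
    thresholds = per_class_thresholds(labels, probs, num_classes)
    issues = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes the model is "confident" about for this example.
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # the model is not confident in any class; skip
        guess = max(confident, key=lambda j: p[j])
        if guess != y:
            issues.append((i, y, guess))
    return issues


# Toy data (hypothetical): example 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.05, 0.95],
    [0.10, 0.90],
    [0.30, 0.70],
    [0.20, 0.80],
]
issues = find_label_issues(labels, probs, num_classes=2)
print(issues)
```

Using a per-class threshold rather than a single global cutoff is meant to keep classes that the model predicts with systematically lower confidence from being over-flagged.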

cleanlab [code | docs]

The official Python framework for machine learning and deep learning with noisy labels in datasets.

L7 Machine Learning Blog [l7.curtisnorthcutt.com]

An ML research blog focusing on deep learning, learning with noisy labels, and the synergy of machine and human learning.

Synergy of Machine Learning and Human Learning [slides]

A tutorial-style framing of topics in the field, with paired problem-and-solution slides for online education.

Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels, Curtis G. Northcutt, Tailin Wu, & Isaac L. Chuang, Uncertainty in Artificial Intelligence (UAI), 2017. [paper | code | arXiv]

A state-of-the-art, robust, time-efficient, general algorithm for classification with noisy labels.

Comment Ranking Diversification in Forum Discussions, Curtis G. Northcutt, Kimberly Leon, & Naichun Chen, Learning at Scale, 2017. [paper | code | free-access]

A simple re-ranking algorithm that improves the fairness and representation of diverse opinions in online commenting and forums.

Deterring cheating in online environments, Henry Corrigan-Gibbs, Nakull Gupta, Curtis G. Northcutt, Edward Cutrell, & William Thies, ACM Transactions on Computer-Human Interaction (TOCHI), 2015. [paper | free-access]

A neat set of experiments that use a “honeypot” to find cheaters in online courses and demonstrate experimentally that various honor codes and pre-exam warnings reduce student cheating.

Detecting and preventing “multiple-account” cheating in massive open online courses, Curtis G. Northcutt, Andrew Ho, & Isaac L. Chuang, Computers & Education, 2016. [paper | code | arXiv]

A simple-to-implement cheating detection algorithm used by MITx and HarvardX online courses to detect a widespread form of cheating in MOOCs.

Security of Cyber-Physical Systems: A Generalized Algorithm for Intrusion Detection and Determining Security Robustness of Cyber Physical Systems using Logical Truth Tables, Curtis G. Northcutt, Vanderbilt Undergraduate Research Journal, 2013. [paper]

A simple solution for intrusion detection in cyber-physical systems.