Workshop: Theory towards Brains, Machines and Minds

Title: Robust Deep Learning: Challenges and New Directions

Bo Han, RIKEN Center for Advanced Intelligence Project


It is challenging to train deep neural networks robustly with noisy labels, because the capacity of deep neural networks is so high that they can completely overfit these noisy labels. In this talk, I will introduce three orthogonal techniques for robust deep learning with noisy labels: from the data perspective, “estimating the noise transition matrix”; from the training perspective, “training on selected samples”; and from the regularization perspective, “conducting scaled stochastic gradient ascent”. First, as an approximation of real-world corruption, noisy labels are corrupted from ground-truth labels by an unknown noise transition matrix. Thus, the accuracy of classifiers can be improved by estimating this matrix. We present a human-assisted approach called “Masking”. Masking conveys human cognition of invalid class transitions and thereby speculates the structure of the noise transition matrix. Given this structural information, we only need to learn the noise transition probabilities, which reduces the estimation burden. Second, motivated by the memorization effects of deep networks, which show that networks fit clean instances first and then noisy ones, we present a new paradigm called “Co-teaching” that can combat even extremely noisy labels. We train two networks simultaneously. In each mini-batch, each network first filters out noisy instances based on the memorization effects. It then teaches the remaining instances to its peer network, which uses them to update its parameters. To tackle the consensus issue in Co-teaching, we propose a robust learning paradigm called “Co-teaching+”, which bridges the “Update by Disagreement” strategy with the original Co-teaching. Third, deep networks inevitably memorize some noisy labels, which degrades their generalization. We propose a meta-algorithm called “Pumpout” to overcome the problem of memorizing noisy labels. By using scaled stochastic gradient ascent, Pumpout actively squeezes the negative effects of noisy labels out of the training model, instead of passively forgetting these effects. We leverage Pumpout to robustify two representative methods: MentorNet and Backward Correction.
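
The sample exchange in Co-teaching can be illustrated with a minimal sketch. It assumes that "filtering based on the memorization effects" means keeping the instances with the smallest per-sample loss, which is a common reading but is not spelled out in this abstract. The function names, the `forget_rate` parameter, the rounding rule, and the loss values are all hypothetical and chosen only for illustration; no actual networks are trained here.

```python
import numpy as np

def small_loss_indices(losses, forget_rate):
    """Return indices of the (1 - forget_rate) fraction of samples with the smallest loss.

    Assumption: small loss is used as a proxy for "likely clean label".
    """
    losses = np.asarray(losses, dtype=float)
    num_keep = max(1, int(round((1.0 - forget_rate) * losses.size)))
    return np.argsort(losses, kind="stable")[:num_keep]

def co_teaching_exchange(losses_a, losses_b, forget_rate):
    """Each network selects its small-loss samples and hands them to its peer."""
    picked_by_a = small_loss_indices(losses_a, forget_rate)
    picked_by_b = small_loss_indices(losses_b, forget_rate)
    # Cross-update: network A is updated on B's picks, and vice versa.
    return {"update_a_on": picked_by_b, "update_b_on": picked_by_a}

# Hypothetical per-sample losses of the two networks on one mini-batch of six instances.
losses_a = [0.2, 2.5, 0.1, 0.4, 3.0, 0.3]
losses_b = [0.3, 0.2, 0.1, 2.8, 2.9, 0.5]
batch = co_teaching_exchange(losses_a, losses_b, forget_rate=0.5)
print(batch)
```

In a real training loop the two loss vectors would come from the two networks' forward passes on the same mini-batch, and each network would then take a gradient step only on the indices selected by its peer.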