Department Seminar Series

Theoretical understanding of learning dynamics in modern deep learning

13th June 2023, 13:00, Ashton Lecture Theatre
Dr. Zhanxing Zhu
Changping National Lab, China

Abstract

"Is deep learning alchemy or science?" has been a long-standing debate, since the success of deep learning relies mostly on engineering design and tricks and lacks a theoretical foundation. Unfortunately, the underlying mechanism of deep learning remains mysterious, which severely limits its further development from both theoretical and practical perspectives.

In this talk, I will introduce some of our attempts at theoretically understanding deep learning, mainly focusing on analyzing its training dynamics and tricks, including gradient descent, stochastic gradient descent (SGD), batch normalization and adversarial training. (1) We analyze the implicit regularization of gradient descent and SGD from both local and global points of view, i.e., explaining why SGD can find well-generalizing minima compared with other alternatives. (2) We comprehensively characterize the learning dynamics of SGD with batch normalization and weight decay, which we name Spherical Motion Dynamics, and show how these dynamics reach their equilibrium state. (3) We theoretically characterize the implicit bias of adversarial training, a scheme widely used to improve robustness against adversarial attacks. Our results theoretically justify the long-standing conjecture that adversarial training modifies the decision boundary by utilizing adversarial examples to improve robustness. These findings shed light on deep learning, taking a step toward opening this black box, and also inspire new algorithmic designs. Finally, I will outline a landscape for the next generation of deep learning for further discussion.