My name is Yang Song (宋飏, Sòng Yáng), and I am a third-year Ph.D. student in the Computer Science Department at Stanford University. My advisor is Stefano Ermon. Prior to joining Stanford, I obtained my Bachelor's degree in Mathematics and Physics from Tsinghua University. During my undergraduate studies, I was fortunate to work with Jun Zhu, Raquel Urtasun, Richard Zemel, and Alexander Schwing.

I am generally interested in machine learning theory and applications, especially in generative models and AI safety. You can contact me at A@B, where A = yangsong and B = cs.stanford.edu.

[My Google Scholar profile]



Computer Science Department, Stanford University, California, USA

  • Ph.D. student in Computer Science.
Jun. - Sep. 2017

Machine Intelligence and Perception Group, Microsoft Research, Cambridge, UK

  • Research internship advised by Dr. Nate Kushman.

Aug. 2012 - Aug. 2016

Department of Physics, Tsinghua University, Beijing, China

  • B.S. in Mathematics and Physics.
  • Research assistant in Prof. Jun Zhu's group.
Jul. - Sep. 2015

Machine Learning Group, Department of Computer Science, University of Toronto, Toronto, Canada

  • Research internship advised by Prof. Raquel Urtasun and Prof. Richard Zemel.

Jul. 2014

Melbourne Graduate School of Science, University of Melbourne, Melbourne, Australia

  • A special summer camp for the interdisciplinary study of Mathematics, Physics, and Chemistry.


Constructing Unrestricted Adversarial Examples with Generative Models

Yang Song, Rui Shu, Nate Kushman, Stefano Ermon

32nd Conference on Neural Information Processing Systems, Montréal, Canada. (NeurIPS 2018)

Adversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, we propose unrestricted adversarial examples, a new threat model where the attackers are not restricted to small norm-bounded perturbations. Different from perturbation-based attacks, we propose to synthesize unrestricted adversarial examples entirely from scratch using conditional generative models. Specifically, we first train an Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the class-conditional distribution over data samples. Then, conditioned on a desired class, we search over the AC-GAN latent space to find images that are likely under the generative model and are misclassified by a target classifier. We demonstrate through human evaluation that unrestricted adversarial examples generated this way are legitimate and belong to the desired class. Our empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.
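As a toy illustration of the search step, the sketch below replaces the AC-GAN and the target classifier with trivial 2-D stand-ins (the `generator`, `classifier`, and `search_unrestricted` names are hypothetical), and uses plain random search over the latent space rather than the gradient-based search used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a class-conditional "generator" that shifts a
# 2-D latent code according to the class label, and a linear "classifier".
def generator(z, y):
    return z + (2.0 * y - 1.0) * np.array([1.0, 0.0])

def classifier(x):
    return int(x[0] > 0)

def search_unrestricted(y_source, n_trials=2000):
    """Search the latent space for a sample generated for class y_source
    that the classifier nevertheless assigns to a different class."""
    for _ in range(n_trials):
        z = rng.normal(size=2)
        x = generator(z, y_source)
        if classifier(x) != y_source:
            return x  # an "unrestricted" adversarial example
    return None

adv = search_unrestricted(y_source=1)
```

Because the example is built from scratch rather than by perturbing a real input, no norm bound ties it to an existing data point, which is exactly what lets such examples evade perturbation-based defenses.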

Accelerating Natural Gradient with Higher-Order Invariance

Yang Song, Jiaming Song, Stefano Ermon

35th International Conference on Machine Learning, Stockholm, Sweden. (ICML 2018)

An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic-corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization, and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning.
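The plain natural-gradient update F^{-1} grad that this work builds on can be sketched on a toy model, a 1-D Gaussian with fixed variance, where the Fisher information has a closed form (the higher-order integrators and geodesic corrections from the paper are omitted here; step size and data are arbitrary choices):

```python
import numpy as np

# Fit the mean mu of N(mu, sigma^2), sigma fixed, by natural gradient descent.
data = np.array([1.0, 2.0, 3.0, 4.0])
sigma = 2.0
mu = 0.0
lr = 0.5

for _ in range(50):
    grad = -(data - mu).sum() / sigma**2   # d/dmu of the negative log-likelihood
    fisher = len(data) / sigma**2          # Fisher information for mu
    mu = mu - lr * grad / fisher           # natural gradient: F^{-1} * grad
```

Note that the preconditioned step F^{-1} grad here equals mu minus the sample mean, independent of sigma: rescaling the parameterization leaves the update unchanged, which is the (infinitesimal) invariance property the paper extends to finite step sizes.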

PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples

Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, Nate Kushman

6th International Conference on Learning Representations, Vancouver, Canada. (ICLR 2018)

Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of image classifiers? In this paper, we show empirically that adversarial examples mainly lie in the low probability regions of the training distribution, regardless of attack types and targeted models. Using statistical hypothesis testing, we find that modern neural density models are surprisingly good at detecting imperceptible image perturbations. Based on this discovery, we devise PixelDefend, a new approach that purifies a maliciously perturbed image by moving it back towards the distribution seen in the training data. The purified image is then run through an unmodified classifier, making our method agnostic to both the classifier and the attacking method. As a result, PixelDefend can be used to protect already deployed models and be combined with other model-specific defenses. Experiments show that our method greatly improves resilience across a wide variety of state-of-the-art attacking methods, increasing accuracy on the strongest attack from 63% to 84% for Fashion MNIST and from 32% to 70% for CIFAR-10.
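The purification idea can be sketched with an independent per-pixel Gaussian density standing in for the PixelCNN model used in the paper (a deliberately crude assumption): each pixel is moved toward its highest-density value without leaving an L-infinity ball of radius eps around the input:

```python
import numpy as np

# Hypothetical density model: per-pixel Gaussians centered at the training mean,
# so the highest-density value for each pixel is simply that mean.
train_mean = np.array([0.5, 0.5, 0.5, 0.5])

def purify(x, eps=0.1):
    """Move each pixel toward its highest-density value, constrained to
    stay within an L_inf ball of radius eps around the input."""
    return np.clip(train_mean, x - eps, x + eps)

x_adv = np.array([0.9, 0.1, 0.5, 0.45])   # a hypothetical perturbed input
x_pure = purify(x_adv)
```

The purified `x_pure` is then fed to the unmodified classifier; since purification only depends on the density model, the defense is agnostic to both the classifier and the attack.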

Kernel Bayesian Inference with Posterior Regularization

Yang Song, Jun Zhu, Yong Ren

30th Conference on Neural Information Processing Systems, Barcelona, Spain. (NIPS 2016)

We propose a vector-valued regression problem whose solution is equivalent to the reproducing kernel Hilbert space (RKHS) embedding of the Bayesian posterior distribution. This equivalence provides a new understanding of kernel Bayesian inference. Moreover, the optimization problem induces a new regularization for the posterior embedding estimator, which is faster than and has comparable performance to the squared regularization in kernel Bayes' rule. This regularization coincides with an earlier thresholding approach used in kernel POMDPs whose consistency remained to be established. Our theoretical work solves this open problem and provides consistency analysis in regression settings. Based on our optimization formulation, we propose a flexible Bayesian posterior regularization framework which for the first time enables us to put regularization at the distribution level. We apply this method to nonparametric state-space filtering tasks with extremely nonlinear dynamics and show performance gains over all other baselines.
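A minimal sketch of the empirical RKHS mean embedding that posterior embedding estimators build on, using an RBF kernel (function names and parameters here are illustrative, not the paper's estimator):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * (x - y)^2), 1-D inputs."""
    return np.exp(-gamma * (x - y) ** 2)

def mean_embedding(samples, grid, gamma=1.0):
    """Empirical kernel mean embedding evaluated at each grid point:
    mu_hat(t) = (1/n) * sum_i k(x_i, t)."""
    return np.array([rbf(samples, t, gamma).mean() for t in grid])

samples = np.array([0.0, 1.0, 2.0])
grid = np.array([1.0])
val = mean_embedding(samples, grid)
```

Kernel Bayes' rule manipulates such embeddings directly (conditioning happens in the RKHS), and the regularization discussed in the paper governs how the empirical embedding operator is inverted.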

Stochastic Gradient Geodesic MCMC Methods

Chang Liu, Jun Zhu, Yang Song

30th Conference on Neural Information Processing Systems, Barcelona, Spain. (NIPS 2016)

We propose two stochastic gradient MCMC methods for sampling from Bayesian posterior distributions defined on Riemann manifolds with a known geodesic flow, e.g. hyperspheres. Our methods are the first scalable sampling methods on these manifolds, with the aid of stochastic gradients. Novel dynamics are conceived and second-order integrators are developed. By adopting embedding techniques and the geodesic integrator, the methods require neither a global coordinate system of the manifold nor inner iterations. Synthetic experiments show the validity of the methods, and their application to the challenging inference task of spherical topic models indicates practical usability and efficiency.
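The geodesic flow that such integrators exploit has a closed form on the hypersphere. The sketch below is just that great-circle geodesic step, not the paper's sampler (the function name is hypothetical):

```python
import numpy as np

def sphere_geodesic(x, v, t):
    """Follow the great-circle geodesic on the unit sphere, starting at x
    with velocity v (projected to the tangent space at x), for time t."""
    v = v - (v @ x) * x                 # project v onto the tangent space at x
    speed = np.linalg.norm(v)
    if speed == 0.0:
        return x
    return np.cos(speed * t) * x + np.sin(speed * t) * v / speed
```

Because the update stays exactly on the manifold, no local coordinate chart or iterative projection is needed, which is what makes geodesic integrators attractive for sampling on hyperspheres.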

Training Deep Neural Networks via Direct Loss Minimization

Yang Song, Alexander Schwing, Richard Zemel, Raquel Urtasun

33rd International Conference on Machine Learning, New York City, USA. (ICML 2016)

Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application. In this paper we propose a direct loss minimization approach to train deep neural networks, which provably minimizes the application-specific loss function. This is often non-trivial, since these functions are neither smooth nor decomposable and thus are not amenable to optimization with standard gradient-based methods. We demonstrate the effectiveness of our approach in the context of maximizing average precision for ranking problems. Towards this goal, we develop a novel dynamic programming algorithm that can efficiently compute the weight updates. Our approach proves superior to a variety of baselines in the context of action classification and object detection, especially in the presence of label noise.
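One common finite-difference form of the direct-loss gradient, contrasting standard inference with loss-augmented inference, can be sketched for a toy multiclass linear model (the 0-1 task loss and all names here are illustrative choices, not the paper's average-precision setting or its dynamic programming algorithm):

```python
import numpy as np

def direct_loss_grad(w, feats, y_true, eps=0.1):
    """Finite-difference estimate of the task-loss gradient (positive
    update variant): compare features of the loss-augmented prediction
    against those of the standard prediction."""
    scores = feats @ w
    y_hat = np.argmax(scores)                        # standard inference
    task_loss = (np.arange(len(scores)) != y_true).astype(float)
    y_aug = np.argmax(scores + eps * task_loss)      # loss-augmented inference
    return (feats[y_aug] - feats[y_hat]) / eps
```

The gradient is nonzero only when the loss-augmented prediction differs from the standard one, so it pushes the weights directly against the application-specific loss even though that loss is neither smooth nor decomposable.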

Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

Yang Song, Jun Zhu

30th AAAI Conference on Artificial Intelligence, Phoenix, USA. (AAAI 2016)

Bayesian matrix completion has been studied based on a low-rank matrix factorization formulation with promising results. However, little work has been done on Bayesian matrix completion based on the more direct spectral regularization formulation. We fill this gap by presenting a novel Bayesian matrix completion method based on spectral regularization. In order to circumvent the difficulties of dealing with the orthonormality constraints of singular vectors, we derive a new equivalent form with relaxed constraints, which then leads us to design an adaptive version of spectral regularization feasible for Bayesian inference. Our Bayesian method requires no parameter tuning and can infer the number of latent factors automatically. Experiments on synthetic and real datasets demonstrate encouraging results on rank recovery and collaborative filtering, with notably good results for very sparse matrices.
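The (non-Bayesian) proximal step behind spectral regularization, singular-value soft-thresholding, gives a feel for the formulation the paper builds on and makes Bayesian (the function below is a classical sketch, not the paper's adaptive method):

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: shrink every singular value of M
    by tau and zero out those below tau. This is the proximal operator of
    the nuclear norm, the penalty used in spectral regularization."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Small singular values are eliminated entirely, which is how spectral regularization induces low rank; the Bayesian version in the paper replaces the fixed threshold tau with an adaptively inferred one, removing the need for parameter tuning.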