• Welcome!
    This is Jùnchéng (Billy) Lì

    Ph.D. Student at the Language Technologies Institute, Carnegie Mellon University


  • I am a

    I have worked on machine learning for audio and multimodal data.

    Google Scholar

About Me

What is my Story?

Hi, my name is Jùnchéng (Billy) Lì (励骏成). I came back from industry to finish my PhD at CMU in fall 2019. I have spent 7 years on topics related to deep learning, working on its applications to audio and multimodal data. Recently, I have worked on understanding deep learning's vulnerabilities and robustness. I have always been fascinated by the magical effects of deep neural networks; meanwhile, their unexplainable behaviors have kept haunting me. Throughout my research career, I have been influenced by various beliefs:
"Deep learning is the dark side, and convex optimization is the true justice."
"We have been wasting so much time building symbolic knowledge-based systems, and history proved that AlphaZero and BERT are the only real things that worked."
"Human knowledge is the cornerstone of AI; the only valid path to building AI is teaching machines to think like humans." ...
It is very tempting for young research Jedis to fall into believing any of these "dogmas", since each is very seductive to a certain group of people with a specific background.
However, I believe there is a fine balance, a bridge that spans all the communities: the ML community, the theory community, the NLP community, the speech community, and more. During my PhD, my goal is to build part of that bridge across the gap between theory and application.
I am convinced that good research is not necessarily impactful, but impactful research usually depends on excellent taste in topics, significant effort, bullet-proof writing, and the necessary PR.
As I grow more experienced, I also think research greatly resembles value investing. Not only do we need to diversify our portfolio, we also need to concentrate enough on topics with growth potential. We don't have unlimited time and resources, so we need to be patient and confident about whatever we choose to invest in. This process requires tremendous tenacity and a stable mindset to stomach the ups and downs. Never be arrogant, or you will be taught a lesson very soon!
I am currently on the academic job market for Fall 2023!

Robust Deep Learning

Multimodal Machine Learning

Audio/Speech Processing

Natural Language Processing

My Work

Recent Work

Towards Robust Large-Scale Audio/Visual Learning

Committee Members: Florian Metze (CMU, Meta), Shinji Watanabe (CMU), Emma Strubell (CMU), Daniel P Ellis (Google)

Audio Tagging Done Right

See the state-of-the-art of Audio Event Detection


On Adversarial Robustness of Large-scale Audio Visual Learning


Adversarial Music



Adversarial Camera Sticker


A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling

This work was a collaboration with Yun Wang (Maigo), and it later became part of his thesis.

Revisiting Disentanglement in VAE




Carnegie Mellon University
School of Computer Science
Language Technologies Institute


Carnegie Mellon University
School of Computer Science
Language Technologies Institute


Carnegie Mellon University College of Engineering


Carnegie Mellon University College of Engineering


Tongji University



Work Experience

Consultant 2022

Two Sigma

Quantitative research: curated evaluations and scoring of audio, video, and imagery data, particularly data with economic impact.

Deep Learning Research Engineer 2018-2019

Bosch Center for Artificial Intelligence

Built up my theoretical background and learned to look at ML from a different angle.
- Applied robust machine learning algorithms to Bosch's autonomous driving project, improving system robustness by 50% in bad weather conditions.
- Explored multimodal embeddings to make use of multi-sensor input, and transferred the technology to the Bosch business team.
- Developed an occupancy detection solution using an RGB-D sensor, and facilitated the transfer of the technology to the business unit.
- Applied representation learning to a Bosch dryer, improving energy efficiency by 5%.
- Generated 2 patents and publications at top-tier AI conferences.
- Mentored 2 interns and hired 5 members for the new team.

Research Intern 2014

Pittsburgh Port Authority

Built a website and managed a database to visualize transportation data for the city of Pittsburgh (covering the most recent 2 years), and analyzed the data to propose optimizations to the current resource allocation.

My Specialty

My Skills

I have been coding ML and general software projects for the past 6 years.

Recent Blog

Feb 18, 2022 | ML Blog

Machine Learning Pointers

Everything I learned about Machine Learning (Updating)

Aug 31, 2021 | Statistics

Theory Notebook

To Strengthen My Theoretical Foundation (Updating)

Apr 30, 2021 | Algorithm

Coding Interviews

Going back to the basics (Updating)

Feb 21, 2022 | Linear Algebra

Linear Algebra

Interesting things I learned about Linear Algebra (Updating)

What's New?

Here are some of my recent updates

The past 2 years were rough!

NeurIPS 2022

Will be presenting AudioMAE, joint work with Bernie Huang at Meta.


Our paper won the Best Student Paper Award! Here's the video of our presentation.

ISCSLP 2021 Tutorial

Tutorial on Robust Audio

Academic Paper Review


NeurIPS 2019 (Vancouver, BC)

"This piece of music could stop Amazon Alexa from working" (New Scientist)

ICML 2019 (Long Beach, CA)

Video presented at ICML 2019 about the adversarial camera sticker

ICASSP 2019 (Brighton, UK)

Presented slides for "A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling"

ICMR 2018 (Yokohama, Japan) Best Paper Award

Our paper won the Best Paper Award

ICASSP 2017 (New Orleans, LA)

Presented two pieces of work: Environment Sound Classification and VGG for Sound

Get in Touch


Office 6513, Gates Hillman Center, 5000 Forbes Ave, Pittsburgh, PA 15213