- Neural Network Ensembles, L.K. Hansen, P. Salamon, 1990
- Neural Network Ensembles, Cross Validation, and Active Learning, Anders Krogh, Jesper Vedelsby, 1995
- Combining labeled and unlabeled data with co-training, A. Blum, T. Mitchell, 1998
- Ensemble Methods in Machine Learning, Thomas G. Dietterich, 2000
- Model Compression, Cristian Buciluǎ, Rich Caruana, Alexandru Niculescu-Mizil, KDD 2006
- Learning with Pseudo-Ensembles, Philip Bachman, Ouais Alsharif, Doina Precup, 2014
- Dark knowledge, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2014
- Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, NIPS 2014 Workshop
- Distilling Model Knowledge, George Papamakarios, 2015
- Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization, Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, Leonid Sigal, 2015
- Learning Using Privileged Information: Similarity Control and Knowledge Transfer, Vladimir Vapnik, Rauf Izmailov, 2015
- FitNets: Hints for Thin Deep Nets, Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, ICLR 2015
- Adapting Models to Signal Degradation using Distillation, Jong-Chyi Su, Subhransu Maji, 2016
- Sequence-Level Knowledge Distillation, Yoon Kim, Alexander M. Rush, EMNLP 2016
- Knowledge Distillation for Small-footprint Highway Networks, Liang Lu, Michelle Guo, Steve Renals, 2016
- Deep Model Compression: Distilling Knowledge from Noisy Teachers, Bharat Bhusan Sau, Vineeth N. Balasubramanian, 2016
- Cross Modal Distillation for Supervision Transfer, Saurabh Gupta, Judy Hoffman, Jitendra Malik, CVPR 2016
- Unifying distillation and privileged information, David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik, ICLR 2016
- Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks, Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami, IEEE S&P 2016
- MobileID: Face Model Compression by Distilling Knowledge from Neurons, Ping Luo, Zhenyao Zhu, Ziwei Liu, Xiaogang Wang, Xiaoou Tang, AAAI 2016
- Recurrent Neural Network Training with Dark Knowledge Transfer, Zhiyuan Tang, Dong Wang, Zhiyong Zhang, 2016
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, Antti Tarvainen, Harri Valpola, NeurIPS 2017
- Learning from Noisy Labels with Distillation, Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li, ICCV 2017
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, Zehao Huang, Naiyan Wang, 2017
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, 2017
- Revisiting knowledge transfer for training object class detectors, Jasper Uijlings, Stefan Popov, Vittorio Ferrari, 2017
- Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net, Guorui Zhou, Ying Fan, Runpeng Cui, Weijie Bian, Xiaoqiang Zhu, Kun Gai, 2017
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks, Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017
- Knowledge Projection for Deep Neural Networks, Zhi Zhang, Guanghan Ning, Zhihai He, 2017
- Moonshine: Distilling with Cheap Convolutions, Elliot J. Crowley, Gavin Gray, Amos Storkey, 2017
- Distilling a Neural Network Into a Soft Decision Tree, Nicholas Frosst, Geoffrey Hinton, 2017
- Do deep convolutional nets really need to be deep and convolutional?, Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, ICLR 2017
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, Sergey Zagoruyko, Nikos Komodakis, ICLR 2017
- Data-Free Knowledge Distillation for Deep Neural Networks, Raphael Gontijo Lopes, Stefano Fenu, Thad Starner, 2017
- Local Affine Approximators for Improving Knowledge Transfer, Suraj Srinivas, Francois Fleuret, 2017
- Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model, Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra, NeurIPS 2017
- Learning Efficient Object Detection Models with Knowledge Distillation, Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, Manmohan Chandraker, NeurIPS 2017
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, Junho Yim, Donggyu Joo, Jihoon Bae, Junmo Kim, CVPR 2017
- Efficient Neural Architecture Search via Parameters Sharing, Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean, ICML 2018
- Interpreting Deep Classifiers by Visual Distillation of Dark Knowledge, Kai Xu, Dae Hoon Park, Chang Yi, Charles Sutton, 2018
- Defensive Collaborative Multi-task Training - Defending against Adversarial Attack towards Deep Neural Networks, Derek Wang, Chaoran Li, Sheng Wen, Yang Xiang, Wanlei Zhou, Surya Nepal, 2018
- Deep Co-Training for Semi-Supervised Image Recognition, Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, Alan Yuille, ECCV 2018
- Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples, Zihao Liu, Qi Liu, Tao Liu, Yanzhi Wang, Wujie Wen, 2018
- Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling, Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap-Pui Chau, Gang Wang, 2018
- Large scale distributed neural network training through online distillation, Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton, ICLR 2018
- Born Again Neural Networks, Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar, ICML 2018
- Knowledge Distillation by On-the-Fly Native Ensemble, Xu Lan, Xiatian Zhu, Shaogang Gong, NeurIPS 2018
- Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection, Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan, ACM MM 2018
- YASENN: Explaining Neural Networks via Partitioning Activation Sequences, Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin, 2018
- Learning to Steer by Mimicking Features from Heterogeneous Auxiliary Networks, Yuenan Hou, Zheng Ma, Chunxiao Liu, Chen Change Loy, 2018
- A Generalized Meta-loss function for regression and classification using privileged information, Amina Asif, Muhammad Dawood, Fayyaz ul Amir Afsar Minhas, 2018
- Learning Transferable Architectures for Scalable Image Recognition, Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le, CVPR 2018
- Data Distillation: Towards Omni-Supervised Learning, Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He, CVPR 2018
- Parallel WaveNet: Fast High-Fidelity Speech Synthesis, Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, ICML 2018
- Deep Mutual Learning, Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu, CVPR 2018
- Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation, Sarah Tan, Rich Caruana, Giles Hooker, Yin Lou, 2018
- Self-supervised knowledge distillation using singular value decomposition, Seung Hyun Lee, Dae Ha Kim, Byung Cheol Song, ECCV 2018
- KDGAN: Knowledge Distillation with Generative Adversarial Networks, Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi, NeurIPS 2018
- Efficient Video Classification Using Fewer Frames, Shweta Bhardwaj, Mukundhan Srinivasan, Mitesh M. Khapra, 2019
- Knowledge Adaptation for Efficient Semantic Segmentation, Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan, 2019
- Structured Knowledge Distillation for Semantic Segmentation, Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang, CVPR 2019
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin, 2019
- Relational Knowledge Distillation, Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho, CVPR 2019
- A Comprehensive Overhaul of Feature Distillation, Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi, ICCV 2019
- Learning Metrics from Teachers: Compact Networks for Image Embedding, Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa, 2019
- Variational Information Distillation for Knowledge Transfer, Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai, CVPR 2019
- Knowledge Distillation via Route Constrained Optimization, Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu, ICCV 2019
- Knowledge Squeezed Adversarial Network Compression, Shu Changyong, Li Peng, Xie Yuan, Qu Yanyun, Dai Longquan, Ma Lizhuang, 2019
- Knowledge Flow: Improve Upon Your Teachers, Iou-Jen Liu, Jian Peng, Alexander G. Schwing, 2019
- Correlation Congruence for Knowledge Distillation, Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang, ICCV 2019
- Data-Free Learning of Student Networks, Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian, ICCV 2019
- Ensemble Distribution Distillation, Andrey Malinin, Bruno Mlodozeniec, Mark Gales, 2019
- Zero-Shot Knowledge Distillation in Deep Networks, Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty, ICML 2019
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation, Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma, ICCV 2019
- Deep Face Recognition Model Compression via Knowledge Transfer and Distillation, Jayashree Karlekar, Jiashi Feng, Zi Sian Wong, Sugiri Pranata, 2019
- Distilling Object Detectors with Fine-grained Feature Imitation, Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng, CVPR 2019
- When Does Label Smoothing Help?, Rafael Müller, Simon Kornblith, Geoffrey Hinton, NeurIPS 2019
- Graph-based Knowledge Distillation by Multi-head Attention Network, Seunghyun Lee, Byung Cheol Song, 2019
- Similarity-Preserving Knowledge Distillation, Frederick Tung, Greg Mori, ICCV 2019
- BAM! Born-Again Multi-Task Networks for Natural Language Understanding, Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le, ACL 2019
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation, Yuenan Hou, Zheng Ma, Chunxiao Liu, Chen Change Loy, ICCV 2019
- Self-Knowledge Distillation in Natural Language Processing, Sangchul Hahn, Heeyoul Choi, 2019
- Patient Knowledge Distillation for BERT Model Compression, Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu, EMNLP 2019
- Positive-Unlabeled Compression on the Cloud, Yixing Xu, Yunhe Wang, Hanting Chen, Kai Han, Chunjing Xu, Dacheng Tao, Chang Xu, 2019
- On the Efficacy of Knowledge Distillation, Jang Hyun Cho, Bharath Hariharan, 2019
- Improving Generalization and Robustness with Noisy Collaboration in Knowledge Distillation, Elahe Arani, Fahad Sarfraz, Bahram Zonooz, 2019
- Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework, Srinidhi Hegde, Ranjitha Prasad, Ramya Hebbalaguppe, Vishwajith Kumar, 2019
- Knowledge Distillation from Internal Representations, Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo, 2019
- Rethinking Data Augmentation: Self-Supervision and Self-Distillation, Hankook Lee, Sung Ju Hwang, Jinwoo Shin, 2019
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf, NeurIPS 2019 EMC^2 Workshop
- Preparing Lessons: Improve Knowledge Distillation with Better Supervision, Tiancheng Wen, Shenqi Lai, Xueming Qian, 2019
- Stagewise Knowledge Distillation, Akshay Kulkarni, Navid Panchi, Shital Chiddarwar, 2019
- Graph Representation Learning via Multi-task Knowledge Distillation, Jiaqi Ma, Qiaozhu Mei, 2019
- Deep geometric knowledge distillation with graphs, Carlos Lassance, Myriam Bontonou, Ghouthi Boukli Hacene, Vincent Gripon, Jian Tang, Antonio Ortega, 2019
- MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks, Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai, 2019
- The State of Knowledge Distillation for Classification, Fabian Ruffy, Karanbir Chahal, 2019
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary, Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, AAAI 2019
- Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons, Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, AAAI 2019
- Dataset Distillation, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros, ICLR 2019
- Fast Human Pose Estimation, Feng Zhang, Xiatian Zhu, Mao Ye, CVPR 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning, Zhiqiang Shen, Zhankui He, Xiangyang Xue, AAAI 2019
- Distillation-Based Training for Multi-Exit Architectures, Mary Phuong, Christoph H. Lampert, ICCV 2019
- Knowledge Distillation via Instance Relationship Graph, Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li, Yunqiang Duan, CVPR 2019
- Retaining Privileged Information for Multi-Task Learning, Fengyi Tang, Cao Xiao, Fei Wang, Jiayu Zhou, Li-Wei Lehman, KDD 2019
- Residual Knowledge Distillation, Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy, 2020
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing, Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou, EMNLP 2020
- Subclass Distillation, Rafael Müller, Simon Kornblith, Geoffrey Hinton, 2020
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou, NeurIPS 2020
- Regularizing Class-wise Predictions via Self-knowledge Distillation, Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin, CVPR 2020
- GAN Compression: Efficient Architectures for Interactive Conditional GANs, Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han, CVPR 2020
- Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks, Lin Wang, Kuk-Jin Yoon, 2020
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou, ACL 2020
- FastBERT: a Self-distilling BERT with Adaptive Inference Time, Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju, ACL 2020
- Channel Distillation: Channel-Wise Attention for Knowledge Distillation, Zaida Zhou, Chaoran Zhuge, Xinwei Guan, Wen Liu, 2020
- ResKD: Residual-Guided Knowledge Distillation, Xuewei Li, Songyuan Li, Bourahla Omar, Fei Wu, Xi Li, 2020
- Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko, NeurIPS 2020
- Knowledge Distillation Meets Self-Supervision, Guodong Xu, Ziwei Liu, Xiaoxiao Li, Chen Change Loy, ECCV 2020
- MGD: Matching Guided Distillation, Kaiyu Yue, Jiangfan Deng, Feng Zhou, ECCV 2020
- MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks, Zhiqiang Shen, Marios Savvides, 2020
- Reducing the Teacher-Student Gap via Spherical Knowledge Distillation, Jia Guo, Minghao Chen, Yao Hu, Chen Zhu, Xiaofei He, Deng Cai, 2020
- Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher, Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Hassan Ghasemzadeh, AAAI 2020
- Contrastive Representation Distillation, Yonglong Tian, Dilip Krishnan, Phillip Isola, ICLR 2020
- Revisit Knowledge Distillation: a Teacher-free Framework, Li Yuan, Francis E.H. Tay, Guilin Li, Tao Wang, Jiashi Feng, CVPR 2020
- Self-training with Noisy Student improves ImageNet classification, Qizhe Xie, Eduard Hovy, Minh-Thang Luong, Quoc V. Le, CVPR 2020
- TinyBERT: Distilling BERT for Natural Language Understanding, Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu, Findings of EMNLP 2020
- Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion, Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz, CVPR 2020
- General Instance Distillation for Object Detection, Xing Dai, Zeren Jiang, Zhao Wu, Yiping Bao, Zhicheng Wang, Si Liu, Erjin Zhou, CVPR 2021
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation, Mingi Ji, Seungjae Shin, Seunghyun Hwang, Gibeom Park, Il-Chul Moon, CVPR 2021
- Complementary Relation Contrastive Distillation, Jinguo Zhu, Shixiang Tang, Dapeng Chen, Shijie Yu, Yakun Liu, Aijun Yang, Mingzhe Rong, Xiaohua Wang, CVPR 2021
- Emerging Properties in Self-Supervised Vision Transformers (DINO), Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin, ICCV 2021
- MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis, Sergei Belousov, 2021
- Distilling Knowledge via Knowledge Review, Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia, CVPR 2021
- Hierarchical Self-supervised Augmented Knowledge Distillation, Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu, IJCAI 2021
- Causal Distillation for Language Models, Zhengxuan Wu, Atticus Geiger, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman, 2021
- Training data-efficient image transformers & distillation through attention (DeiT), Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou, ICML 2021
- Cross-Layer Distillation with Semantic Calibration, Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Yan Feng, Chun Chen, AAAI 2021
- Exploring Simple Siamese Representation Learning, Xinlei Chen, Kaiming He, CVPR 2021
- Channel-wise Knowledge Distillation for Dense Prediction, Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen, ICCV 2021
- MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers, Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei, Findings of ACL-IJCNLP 2021
- Knowledge Distillation: A Survey, Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao, 2021
- data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language, Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli, ICML 2022
- Progressive Distillation for Fast Sampling of Diffusion Models, Tim Salimans, Jonathan Ho, ICLR 2022
- How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting, Alessio Monti, Angelo Porrello, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara, 2022
- Decoupled Knowledge Distillation, Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang, CVPR 2022
- Knowledge Distillation with the Reused Teacher Classifier (SimKD), Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen, CVPR 2022
- Masked Generative Distillation, Zhendong Yang, Zhe Li, Mingqi Shao, Dachuan Shi, Zehuan Yuan, Chun Yuan, ECCV 2022
- Knowledge Distillation from A Stronger Teacher, Tao Huang, Shan You, Fei Wang, Chen Qian, Chang Xu, NeurIPS 2022
- Localization Distillation for Dense Object Detection, Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Wangmeng Zuo, Qibin Hou, Ming-Ming Cheng, CVPR 2022
- iBOT: Image BERT Pre-Training with Online Tokenizer, Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong, ICLR 2022
- Focal and Global Knowledge Distillation for Detectors, Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan, CVPR 2022
- Symbolic Knowledge Distillation: from General Language Models to Commonsense Models, Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi, NAACL 2022
- Information Theoretic Representation Distillation, Roy Miles, Adrian Lopez Rodriguez, Krystian Mikolajczyk, BMVC 2022
- Consistency Models, Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever, ICML 2023
- TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation, David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu, 2023
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister, Findings of ACL 2023
- UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition, Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, Hoifung Poon, 2023
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference, Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao, 2023
- Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models, Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri, Sachin Mehta, Mehrdad Farajtabar, Mohammad Rastegari, Oncel Tuzel, 2023
- MobileSAMv2: Faster Segment Anything to Everything, Chaoning Zhang, Dongshen Han, Sheng Zheng, Jinwoo Choi, Tae-Ho Kim, Choong Seon Hong, 2023
- On Distillation of Guided Diffusion Models, Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans, CVPR 2023
- Considerations When Learning Additive Explanations for Black-Box Models, Sarah Tan, Giles Hooker, Paul Koch, Albert Gordo, Rich Caruana, 2023
- Curriculum Temperature for Knowledge Distillation, Zheng Li, Xiang Li, Lingfeng Yang, Borui Zhao, Renjie Song, Lei Luo, Jun Li, Jian Yang, AAAI 2023
- Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping, Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, Tat-Jen Cham, 2024
- Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation, Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang, ICML 2024
- Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation, Jonas Kohler, Albert Pumarola, Edgar Schönfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet, 2024
- Improved Distribution Matching Distillation for Fast Image Synthesis, Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, NeurIPS 2024
- Transferring Knowledge from Large Foundation Models to Small Downstream Models, Shikai Qiu, Boran Han, Danielle C. Maddix, Shuai Zhang, Yuyang Wang, Andrew Gordon Wilson, 2024
- DεpS: Delayed ε-Shrinking for Faster Once-For-All Training, Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov, 2024
- Simple Unsupervised Knowledge Distillation With Space Similarity, Aditya Singh, Haohan Wang, 2024
- Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment, Tianyu Peng, Jiajun Zhang, 2024
- Generative Prompt Internalization, Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo, 2024
- ScaleKD: Strong Vision Transformers Could Be Excellent Teachers, Jiawei Fan, Chao Li, Xiaolong Liu, Anbang Yao, NeurIPS 2024
- Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation, Jiaming Lv, Haoyuan Yang, Peihua Li, NeurIPS 2024
- Adversarial Diffusion Distillation, Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach, ECCV 2024
- One-step Diffusion with Distribution Matching Distillation, Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, Taesung Park, CVPR 2024
- MiniLLM: On-Policy Distillation of Large Language Models, Yuxian Gu, Li Dong, Furu Wei, Minlie Huang, ICLR 2024
- On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes, Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos, Matthieu Geist, Olivier Bachem, ICLR 2024
- Improved Techniques for Training Consistency Models, Yang Song, Prafulla Dhariwal, ICLR 2024
- Logit Standardization in Knowledge Distillation, Shangquan Sun, Wenqi Ren, Jingzhi Li, Rui Wang, Xiaochun Cao, CVPR 2024
- CLIP-KD: An Empirical Study of CLIP Model Distillation, Chuanguang Yang, Zhulin An, Libo Huang, Junyu Bi, Xinqiang Yu, Han Yang, Boyu Diao, Yongjun Xu, CVPR 2024
- VkD: Improving Knowledge Distillation using Orthogonal Projections, Roy Miles, Ismail Elezi, Jiankang Deng, CVPR 2024
- Understanding the Role of the Projector in Knowledge Distillation, Roy Miles, Krystian Mikolajczyk, AAAI 2024
- Precision Shaking and DORPO: Conceptual Foundations of LLM Knowledge Distillation Methods, Áron Cserveni, 2024
- MaKD: Multi-aspect Knowledge Distillation with Large Language Model, Taegyeong Lee, et al., 2025
- A Comprehensive Survey on Knowledge Distillation, Amir M. Mansourian, Rozhan Ahmadi, Masoud Ghafouri, Amir Mohammad Babaei, Elaheh Badali Golezani, Zeynab Yasamani Ghamchi, Vida Ramezanian, Alireza Taherian, Kimia Dinashi, Amirali Miri, Shohreh Kasaei, 2025
- Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching, Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti, 2025
- Autoregressive Distillation of Diffusion Transformers, Yeongmin Kim, Sotiris Anagnostidis, Yuming Du, Edgar Schönfeld, Jonas Kohler, Markos Georgopoulos, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, 2025
- Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions, Luyang Fang, Xiaowei Yu, Jiazhang Cai, Yongkai Chen, Shushan Wu, Zhengliang Liu, Zhenyuan Yang, Haoran Lu, Xilin Gong, Yufang Liu, Terry Ma, Wei Ruan, Ali Abbasi, Jing Zhang, Tao Wang, Ehsan Latif, Weihang You, Hanqi Jiang, Wei Liu, Wei Zhang, Soheil Kolouri, Xiaoming Zhai, Dajiang Zhu, Wenxuan Zhong, Tianming Liu, Ping Ma, 2025
- CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation, Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu, 2025
- Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models, Tiezheng Zhang, Yitong Li, Yu-cheng Chou, Jieneng Chen, Alan Yuille, Chen Wei, Junfei Xiao, 2025
- Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations, Faisal Hamman, Pasan Dissanayake, Yanjun Fu, Sanghamitra Dutta, 2025
- EchoDFKD: Data-Free Knowledge Distillation for Cardiac Ultrasound Segmentation using Synthetic Data, Grégoire Petit, Nathan Palluau, Axel Bauer, Clemens Dlaska, 2025
- Scaling Reasoning Efficiently via Relaxed On-Policy Distillation, Jongwoo Ko, Sara Abdali, Young Jin Kim, Tianyi Chen, Pashmina Cameron, 2026
- An Empirical Study of Knowledge Distillation for Code Understanding Tasks, Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao, 2026
- To Distill or Not to Distill: When Knowledge Transfer Undermines Safety of LLMs, 2026
- Dark knowledge, https://www.ttic.edu/dls-2014-2015/, Geoffrey Hinton, 2014
- Model Compression, Rich Caruana, 2016