Publications

Understanding the Role of the Projector in Knowledge Distillation

Published in Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-24), 2024

We explored a novel perspective on knowledge distillation through the training dynamics of the projector weights, and proposed a very simple distillation pipeline that attains a new state of the art for the data-efficient training of transformer models.

Recommended citation: Miles, R., & Mikolajczyk, K. (2024). Understanding the Role of the Projector in Knowledge Distillation. AAAI. https://arxiv.org/abs/2303.11098

MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation

Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’23), 2023

We introduced a novel method for semi-supervised video object segmentation on mobile devices that unifies representation distillation and contrastive learning. We achieved competitive performance while running up to 5× faster and with 32× fewer parameters.

Recommended citation: Miles, R., Yucel, M. K., Manganelli, B., & Saa-Garriga, A. (2023). MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation. CVPR. https://arxiv.org/abs/2303.07815

Information Theoretic Representation Distillation

Published in Proceedings of BMVA British Machine Vision Conference (BMVC’22), 2022

In this work, we proposed an information-theoretic framework for representation distillation. Within this framework, we introduced two novel distillation losses that are very simple and computationally inexpensive to adopt in most deep learning pipelines. We showed the superiority of our approach over methods of similar computational cost on standard classification benchmarks. Furthermore, we demonstrated the applicability of our method to binary networks, beginning to bridge the performance gap between full-precision and binary networks.

Recommended citation: Miles, R., Rodríguez, A. L., & Mikolajczyk, K. (2022). Information Theoretic Representation Distillation. BMVC. https://bmvc2022.mpi-inf.mpg.de/0385.pdf

Reconstructing Pruned Filters using Cheap Spatial Transformations

Published in ICCV 2023 Workshop on Resource Efficient Deep Learning for Computer Vision, 2023

We present an efficient alternative to the convolutional layer using cheap spatial transformations. This construction exploits the inherent spatial redundancy of learned convolutional filters to enable much greater parameter efficiency while maintaining the top-end accuracy of their dense counterparts. Training these networks is modelled as a generalised pruning problem, whereby the pruned filters are replaced with cheap transformations of the non-pruned filters.
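The core idea can be sketched in a few lines: instead of storing a pruned filter's weights, reconstruct it as a cheap spatial transformation of a kept filter. The specific transformation set below (a horizontal flip) is purely illustrative, not the paper's configuration:

```python
def hflip(f):
    # Cheap spatial transformation: horizontal flip of a k x k filter,
    # applied row by row. Costs no stored parameters of its own.
    return [row[::-1] for row in f]

# A kept (non-pruned) 3x3 filter
kept = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# A pruned filter is reconstructed from the kept one at no parameter cost
reconstructed = hflip(kept)
print(reconstructed)  # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
```

Any parameter-free transform (flips, rotations, shifts) could play this role; the point is that the reconstructed filter reuses existing weights rather than storing new ones.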

Recommended citation: Miles, R., & Mikolajczyk, K. (2023). Reconstructing Pruned Filters using Cheap Spatial Transformations. ICCVW. https://arxiv.org/abs/2110.12844

Compression of descriptor models for mobile applications

Published in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

In this paper, we demonstrated the accuracy/performance trade-offs of applying various factorisation and network-compression methods to CNN models used for local feature extraction. We proposed a novel Convolution-Depthwise-Pointwise (CDP) layer, consisting of a partitioned low- and full-rank decomposition of the weights that matches the naturally emergent structure of the pre-trained weights. Allocating dense connectivity to a subset of the input features helps maintain top-end descriptor accuracy. We further demonstrated the generalisability of this idea to larger architectures, namely the SuperPoint model. In both cases, we compressed the models significantly, with minimal to no accuracy degradation.
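As a back-of-the-envelope illustration of why such a partitioned decomposition saves parameters, the sketch below compares a dense convolution against a CDP-style split (the dense-channel ratio and layer sizes here are assumptions for illustration, not the paper's configuration):

```python
def conv_params(c_in, c_out, k):
    # Dense convolution: one k x k filter per (input, output) channel pair
    return c_in * c_out * k * k

def cdp_params(c_in, c_out, k, dense_ratio=0.25):
    # Illustrative CDP-style split: a full-rank conv over a fraction of the
    # input channels, plus a depthwise + pointwise (low-rank) factorisation
    # over the rest. The 0.25 dense ratio is an assumed value.
    c_dense = int(c_in * dense_ratio)
    c_rest = c_in - c_dense
    dense = c_dense * c_out * k * k   # full-rank path
    depthwise = c_rest * k * k        # one k x k filter per remaining channel
    pointwise = c_rest * c_out        # 1x1 channel mixing
    return dense + depthwise + pointwise

full = conv_params(128, 128, 3)
cdp = cdp_params(128, 128, 3)
print(full, cdp, round(full / cdp, 2))  # 147456 50016 2.95
```

Under these assumed sizes, the partitioned layer uses roughly a third of the parameters while keeping dense connectivity for a quarter of the input channels.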

Recommended citation: Miles, R., & Mikolajczyk, K. (2021). Compression of descriptor models for mobile applications. ICASSP. https://ieeexplore.ieee.org/document/9414416

Cascaded channel pruning using hierarchical self-distillation

Published in Proceedings of BMVA British Machine Vision Conference (BMVC’20), 2020

In this paper, we propose an approach for filter-level pruning with hierarchical knowledge distillation based on the teacher, teaching-assistant, and student framework. Our method makes use of teaching assistants at intermediate pruning levels that share the same architecture and weights as the target student. We propose to prune each model independently using the gradient information from its corresponding teacher. By considering the relative sizes of each student-teacher pair, this formulation provides a natural trade-off between the capacity gap for knowledge distillation and the bias of the filter-saliency updates. Our results show improvements in attainable accuracy and model compression on the CIFAR-10 and ImageNet classification tasks using the VGG-16 and ResNet-50 architectures.

Recommended citation: Miles, R., & Mikolajczyk, K. (2020). Cascaded channel pruning using hierarchical self-distillation. BMVC. https://www.bmvc2020-conference.com/assets/papers/0525.pdf