I am currently a Research Scientist at Huawei Noah's Ark Lab UK. My research focuses on building efficient computer vision models and deploying them on resource-constrained devices, such as mobile phones.

News

May 23, 2024 VeLoRA was accepted to NeurIPS 2024. [code]
Mar 1, 2024 Started a permanent role as a research scientist at Huawei!
Feb 26, 2024 VkD was accepted to CVPR 2024. [code / poster]
Dec 12, 2023 Passed my PhD with minor corrections. [thesis, pdf]
Dec 9, 2023 SRD was accepted to AAAI 2024. [code, poster]
Jul 1, 2023 Started my internship at Huawei Noah's Ark Lab in the Computer Vision team.
Feb 27, 2023 MobileVOS was accepted to CVPR 2023. [poster]

Publications

Region-based Cluster Discrimination for Visual Representation Learning

Yin Xie, Kaicheng Yang, Xiang An, Kun Wu, Yongle Zhao, Weimo Deng, Zimin Ran, Yumeng Wang, Ziyong Feng, Roy Miles, Ismail Elezi and Jiankang Deng
In ICCV 2025 (highlight)

Propose a novel Locality-Aware Cluster Contrastive Learning strategy. Our approach leverages local feature clustering and contrastive learning to improve the model's ability to understand and represent localized information.

ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs

Yin Xie, Kaicheng Yang, Peirou Liang, Xiang An, Yongle Zhao, Yumeng Wang, Ziyong Feng, Roy Miles, Ismail Elezi, Jiankang Deng
arXiv preprint, 2025

Introduce a visual comprehension stage, which we call ViCToR (Visual Comprehension via Token Reconstruction), a novel pretraining framework for LMMs. ViCToR employs a learnable visual token pool and utilizes the Hungarian matching algorithm to select semantically relevant tokens from this pool for visual token replacement.

VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

Roy Miles, Pradyumna Reddy, Ismail Elezi and Jiankang Deng
In NeurIPS 2024

Identify and characterise the important components needed for effective model convergence using gradient descent. In doing so, we find that the intermediate activations used for backpropagation can be heavily compressed without incurring any degradation in performance.

Understanding the Role of the Projector in Knowledge Distillation

Roy Miles and Krystian Mikolajczyk
In AAAI 2024

Revisit the efficacy of knowledge distillation as a function matching and metric learning problem. In doing so, we identify three key ingredients: the normalisation, the soft maximum function, and the projection layers.

Learning to Project for Cross-Task Knowledge Distillation

Dylan Auty*, Roy Miles*, Benedikt Kolbeinsson*, Krystian Mikolajczyk
In BMVC 2024

Many KD methods prove ineffective in the cross-task setting, where the teacher and student are trained on different tasks. To address this limitation, we propose a simple modification: an inverted projection. We show that this drop-in replacement for a standard projector is effective because it learns to disregard any task-specific features that might degrade the student's performance.

MobileVOS: Real-time Video Object Segmentation

Roy Miles, Mehmet Kerim Yucel, Bruno Manganelli and Albert Saa-Garriga
In CVPR 2023

Tackle the problem of semi-supervised video object segmentation on resource-constrained devices, such as mobile phones. We formulate this as a distillation task and demonstrate that small space-time-memory networks with finite memory can achieve results competitive with the state of the art at a fraction of the computational cost.

Reconstructing Pruned Filters using Cheap Spatial Transformations

Roy Miles and Krystian Mikolajczyk
In ICCVW 2024

We present an efficient alternative to the convolutional layer using cheap spatial transformations. This construction exploits an inherent spatial redundancy in the learned convolutional filters to enable much greater parameter efficiency, while maintaining the top-end accuracy of their dense counterparts.

Information Theoretic Representation Distillation

Roy Miles*, Adrian Lopez Rodriguez*, Krystian Mikolajczyk
In BMVC 2022

Introduce two distinct complementary losses inspired by a cheap entropy-like estimator. These losses aim to maximise the correlation and mutual information between the student and teacher representations.

Compression of Descriptor Models for Mobile Applications

Roy Miles, Krystian Mikolajczyk
In ICASSP 2021

This paper explicitly addresses the practical performance metrics of the state-of-the-art HardNet model. We observe a significant redundancy in the learned weights, which we exploit through the use of depthwise separable layers and an efficient Tucker decomposition.

Cascaded Channel Pruning using Hierarchical Self-Distillation

Roy Miles, Krystian Mikolajczyk
In BMVC 2020

Propose an approach for filter-level pruning with hierarchical knowledge distillation based on the teacher, teaching-assistant, and student framework. Our method makes use of teaching assistants at intermediate pruning levels that share the same architecture and weights as the target student.

See my Google Scholar for a full list.

Research and Industry Experience

Senior Research Scientist — Huawei Noah's Ark Lab

01/2023 — Present
  • Foundational research on knowledge distillation for vision-language models.
  • Involved in the recruitment and interview stages for several permanent and internship positions.
  • Led and collaborated on several successful research projects.
  • Award for outstanding individual contributor in 2024.
  • Supervised several interns working on multi-modal learning.
  • Integrated and landed research on parameter-efficient fine-tuning with the product camera teams.

Research Scientist Intern - Samsung Research UK

06/2022 — 01/2023
  • Semi-supervised video object segmentation for mobile devices.
  • Novel unification of representation distillation and contrastive learning.
  • Achieved competitive performance while running up to 5× faster with 32× fewer parameters.
  • Integrated research code into a Samsung mobile product.
  • SRUK 2023 Best Paper Award for MobileVOS presented at CVPR. [blog]

Software Engineer Intern - Waymont Consulting

06/2017 — 09/2017
  • Developed a responsive web GUI with a C++ backend to automate signal processing tests.
  • Replaced a previously manual testing system, increasing efficiency a hundredfold.

Software Engineer Intern - Toshiba Research Europe

06/2016 — 09/2016
  • Designed out-of-tree blocks in C++ using the GNURadio signal processing environment.
  • Characterised and linearised a communication channel between two USRP devices.
  • Utilised the RC-5 Infrared standard for short-range and low-data rate IoT applications.

Software Engineer Intern - Toshiba Research Europe

06/2015 — 09/2015
  • Developed a GUI to operate a motor-driven positioner in conjunction with a VNA.
  • Measured various antenna patterns within an anechoic chamber.
  • Wrote VB.NET code to synchronise two X-Y positioners for analysing propagation channels.

Education

PhD, Computer Vision and Machine Learning — Imperial College London
10/2018 — 12/2023
MEng, Electrical and Electronic Engineering — University of Bristol
09/2014 — 07/2018

Patents

  • Real-Time Video Object Segmentation. Patent #GB2626221. Samsung Research UK.
  • Region-aware Foundational Vision Model. Huawei Noah's Ark Lab UK. Patent Pending.
  • Visual Grounding for Vision-Language Models. Huawei Noah's Ark Lab UK. Patent Pending.
  • Training-Free Image Retouching. Huawei Noah's Ark Lab UK. Patent Pending.

Reviewing Experience

• ICCV 2025 • AAAI 2025 • PAMI 2023, 2024 • ECCV 2022, 2024 • CVPR 2023, 2024, 2025 • NeurIPS 2023, 2024 • BMVC 2021, 2022 • Neurocomputing 2024 • ICSOC 2024 • Neural Networks 2025

NeurIPS 2024 Outstanding Reviewer
