Fun Projects
Robotics
Deep Reinforcement Learning
PROBLEM
Reinforcement learning agents perform well in simulation but often fail to transfer to the real world.
CONTRIBUTION
I showed that by randomising everything in the simulation (textures, lighting, colours, etc.) and adding noise to the sensor data, we can train a more robust deep learning model that can be deployed in the real world.
Paper published in collaboration with
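A minimal sketch of the domain-randomisation idea, not the exact setup from the paper: the simulator API and the helper names (`sample_visual_params`, `noisy_observation`, `env`) are illustrative assumptions. Each episode draws fresh visual parameters and perturbs the sensor stream, so the policy never overfits to one rendering of the world.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_visual_params():
    """Draw a fresh set of scene parameters for one training episode."""
    return {
        "texture_id": rng.integers(0, 1000),             # random surface texture
        "light_intensity": rng.uniform(0.2, 2.0),        # random lighting
        "object_rgb": rng.uniform(0.0, 1.0, size=3),     # random object colour
        "camera_jitter": rng.normal(0.0, 0.02, size=3),  # small camera pose offset
    }

def noisy_observation(obs: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Add Gaussian noise to raw sensor data before the agent sees it."""
    return obs + rng.normal(0.0, sigma, size=obs.shape)

# Usage inside a training loop (the environment API here is assumed):
# for episode in range(num_episodes):
#     env.reset(**sample_visual_params())
#     obs = noisy_observation(env.observe())
```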
Depth Estimation
3D Computer Vision
PROBLEM
Many deep learning models have been proposed for 3D reconstruction from video. However, they are temporally and geometrically inconsistent, resulting in flicker artefacts.
CONTRIBUTION
Created a novel video representation learning model that jointly estimates 3D scene flow, depth and camera pose from standard video input. Trained in a self-supervised way (no ground truth required).
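A minimal sketch of the kind of self-supervised signal involved, not the exact published formulation: assuming PyTorch, a shared 3x3 intrinsics matrix `K`, depth of shape (B, 1, H, W) and a (B, 4, 4) relative pose, the predicted depth and pose warp the next frame back onto the current one, and the photometric error supervises the networks without any ground truth.

```python
import torch
import torch.nn.functional as F

def photometric_loss(img_t, img_t1, depth_t, pose_t_to_t1, K):
    """Warp img_t1 into frame t via predicted depth and pose, compare with img_t."""
    B, _, H, W = img_t.shape
    # Pixel grid in homogeneous coordinates (3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().reshape(3, -1)
    # Back-project to 3D rays, scale by predicted depth, apply predicted pose.
    cam = (torch.linalg.inv(K) @ pix) * depth_t.reshape(B, 1, -1)
    R, t = pose_t_to_t1[:, :3, :3], pose_t_to_t1[:, :3, 3:]
    cam = R @ cam + t
    # Project into frame t+1 and normalise coordinates to [-1, 1] for grid_sample.
    proj = K @ cam
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    u = 2 * uv[:, 0] / (W - 1) - 1
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    warped = F.grid_sample(img_t1, grid, align_corners=True)
    # L1 photometric error jointly trains the depth and pose networks.
    return F.l1_loss(warped, img_t)
```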
Production Pipeline
Computer Vision
PROBLEM
Autonomous vehicles require multiple deep learning models running in parallel to guide vehicle motion. Running them efficiently is paramount for deployment.
CONTRIBUTION
Created Hitachi's multi-task computer vision pipeline, performing monocular depth estimation, object detection, tracking and privacy preservation, all running at 60 fps at 720p in the cloud. We also presented the project to the UK Department for Transport and raised a £100K grant for further support.
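A rough sketch of the multi-task pattern, not Hitachi's actual code: one shared backbone is computed once per frame and lightweight task heads reuse its features, which is what makes running several tasks at 60 fps viable. All layer sizes and head designs below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, feat_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(  # shared feature extractor, run once
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(feat_dim, 1, 1)                 # monocular depth
        self.detect_head = nn.Conv2d(feat_dim, num_classes + 4, 1)  # class + box
        self.privacy_head = nn.Conv2d(feat_dim, 1, 1)  # mask for faces/plates

    def forward(self, x):
        f = self.backbone(x)  # computed once, shared by every task head
        return {
            "depth": self.depth_head(f),
            "detections": self.detect_head(f),
            "privacy_mask": torch.sigmoid(self.privacy_head(f)),
        }

out = MultiTaskNet()(torch.randn(1, 3, 720, 1280))  # one 720p frame
```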
Representation Learning
Visual Representation Learning
PROBLEM
Self-supervised representation learning has improved the downstream performance of many vision transformer models. However, very few works explore using high-level semantics for learning representations.
CONTRIBUTION
Proposed a new loss function and model architecture that improve a computer vision model's semantic understanding using generative modelling, resulting in state-of-the-art downstream performance on ImageNet classification and COCO object detection.
Publication Accepted at CVPR 2023
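A generic masked-generative objective in the spirit described above, not the published CVPR loss: most patches are hidden, the rest are encoded, and the reconstruction error alone trains the representation. `encoder` and `decoder` are assumed callables, with `decoder(encoder(vis))` returning predictions for all N patches of dimension D.

```python
import torch
import torch.nn.functional as F

def masked_recon_loss(encoder, decoder, img, patch=16, mask_ratio=0.75):
    """Hide most patches, reconstruct them from the visible ones."""
    B, C, H, W = img.shape
    # Split the image into non-overlapping patches: (B, N, C*patch*patch).
    p = img.unfold(2, patch, patch).unfold(3, patch, patch)
    p = (p.reshape(B, C, -1, patch, patch)
          .permute(0, 2, 1, 3, 4)
          .reshape(B, -1, C * patch * patch))
    N, D = p.shape[1], p.shape[2]
    keep = int(N * (1 - mask_ratio))
    order = torch.rand(B, N).argsort(dim=1)          # random patch shuffle
    visible, hidden = order[:, :keep], order[:, keep:]
    vis = torch.gather(p, 1, visible[..., None].expand(-1, -1, D))
    pred = decoder(encoder(vis))                     # assumed output: (B, N, D)
    tgt = torch.gather(p, 1, hidden[..., None].expand(-1, -1, D))
    pred_hidden = torch.gather(pred, 1, hidden[..., None].expand(-1, -1, D))
    return F.mse_loss(pred_hidden, tgt)              # loss only on masked patches
```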
Generative Modelling
Vision Language Models
PROBLEM
Inspired by OpenAI's CLIP and DALL-E, I wanted to create a similar model capable of generating high-quality art from a text description, despite having a fraction of the resources and training data.
CONTRIBUTION
To that end, I created a text-to-image art generation model by piecing together popular works from vision-language learning (CLIP), generative modelling (VQ-GAN), super-resolution (SwinIR) and depth estimation (MiDaS). Trained on ArtStation data for just 3 days, it produces output art of high semantic quality.
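A simplified sketch of the CLIP-guided generation loop at the heart of such a pipeline (the full system also chains SwinIR upscaling and MiDaS depth): `vqgan.decode`, the CLIP encoders and the latent shape are assumed placeholders. The latent is optimised so the decoded image matches the text prompt under CLIP.

```python
import torch

def generate(vqgan, clip_image_enc, clip_text_enc, prompt_tokens,
             steps=300, lr=0.1):
    txt = clip_text_enc(prompt_tokens)                   # target text embedding
    z = torch.randn(1, 256, 16, 16, requires_grad=True)  # VQ-GAN latent grid
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = vqgan.decode(z)                            # latent -> image
        sim = torch.cosine_similarity(clip_image_enc(img), txt, dim=-1)
        loss = -sim.mean()                               # maximise CLIP similarity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vqgan.decode(z).detach()                      # final artwork
```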





