Fun Projects

Robotics

Deep Reinforcement Learning

PROBLEM

Reinforcement learning agents work well in simulation but often fail to transfer to the real world.

CONTRIBUTION

I showed that by randomising everything in the simulation (textures, lighting, colours, etc.) and adding noise to the sensor data, we can train a more robust deep learning model that remains deployable in the real world.

Paper published in collaboration with
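As a rough illustration of the idea (not the published setup), the sketch below randomises scene properties before each simulated episode and perturbs sensor readings; the `sim` handle and its attribute names are hypothetical placeholders, not a real simulator API.

```python
import numpy as np

def randomise_episode(sim, rng=None):
    """Draw one set of randomised scene parameters before a simulated episode.

    `sim` and its attributes are hypothetical placeholders for a simulator API.
    """
    if rng is None:
        rng = np.random.default_rng()
    sim.set_texture(rng.choice(sim.available_textures))       # random surface texture
    sim.set_light_intensity(rng.uniform(0.2, 2.0))            # lighting strength
    sim.set_light_colour(rng.uniform(0.5, 1.0, size=3))       # RGB light tint
    sim.set_object_colours(rng.uniform(0.0, 1.0, size=(sim.num_objects, 3)))
    sim.set_camera_jitter(rng.normal(0.0, 0.01, size=3))      # small camera pose offset

def noisy_observation(obs, rng=None, sigma=0.01):
    """Add Gaussian noise to sensor readings so the policy cannot overfit
    perfectly clean simulated observations."""
    if rng is None:
        rng = np.random.default_rng()
    return obs + rng.normal(0.0, sigma, size=obs.shape)
```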

Depth Estimation

3D Computer Vision

PROBLEM

Many deep learning models have been proposed for 3D reconstruction from video. However, they are temporally and geometrically inconsistent, resulting in flicker artefacts.

CONTRIBUTION

Created a novel video representation learning model that jointly estimates 3D scene flow, depth and camera pose from a standard video input, trained in a self-supervised way (no ground truth required).
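A minimal sketch of the self-supervised signal this kind of model typically relies on, assuming PyTorch: the next frame is warped into the current view using predicted depth and relative camera pose, and the photometric difference drives training. The scene-flow term and SSIM component are omitted for brevity, so this is an illustration of the training signal, not the full method.

```python
import torch
import torch.nn.functional as F

def photometric_loss(frame_t, frame_t1_warped):
    """L1 photometric error between a frame and its reconstruction warped
    from the next frame (SSIM term omitted for brevity)."""
    return (frame_t - frame_t1_warped).abs().mean()

def reproject(depth_t, pose_t_to_t1, intrinsics, frame_t1):
    """Warp frame t+1 back into the view of frame t using predicted depth and
    relative camera pose. Shapes: depth (B,1,H,W), pose (B,4,4), K (B,3,3)."""
    b, _, h, w = depth_t.shape
    device = depth_t.device
    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=device, dtype=torch.float32),
        torch.arange(w, device=device, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1).expand(b, -1, -1)
    # Back-project to 3D with the predicted depth, move to the other camera, project.
    cam = torch.linalg.inv(intrinsics) @ pix * depth_t.reshape(b, 1, -1)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, device=device)], dim=1)
    cam2 = (pose_t_to_t1 @ cam_h)[:, :3]
    proj = intrinsics @ cam2
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalise pixel coordinates to [-1, 1] for grid_sample.
    u = 2.0 * uv[:, 0] / (w - 1) - 1.0
    v = 2.0 * uv[:, 1] / (h - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(b, h, w, 2)
    return F.grid_sample(frame_t1, grid, align_corners=True)
```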

Production Pipeline

Computer Vision

PROBLEM

Autonomous vehicles require multiple deep learning models running in parallel to guide vehicle motion. Running them efficiently is paramount for deployment.

CONTRIBUTION

Created Hitachi's multi-task computer vision pipeline, performing monocular depth estimation, object detection, tracking and privacy preservation, all operating at 60 fps at 720p in the cloud. We also presented the project to the UK Department for Transport and raised a £100K grant for further support.
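The production code is not reproduced here; the sketch below only illustrates the general shared-backbone, multi-head design that makes serving several tasks from one forward pass cheap (PyTorch/torchvision assumed; the head definitions are simplified placeholders).

```python
import torch
import torch.nn as nn
import torchvision

class MultiTaskPerception(nn.Module):
    """Shared backbone with lightweight per-task heads, so one forward pass
    serves several perception tasks instead of running separate networks."""

    def __init__(self, num_classes=10):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Keep everything up to the last conv stage: (B, 512, H/32, W/32) features.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.depth_head = nn.Conv2d(512, 1, kernel_size=1)          # dense depth logits
        self.detect_head = nn.Conv2d(512, num_classes + 4, 1)       # class scores + box offsets per cell
        self.privacy_head = nn.Conv2d(512, 1, kernel_size=1)        # face/plate mask for blurring

    def forward(self, x):
        feats = self.encoder(x)
        return {
            "depth": self.depth_head(feats),
            "detections": self.detect_head(feats),
            "privacy_mask": torch.sigmoid(self.privacy_head(feats)),
        }
```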

Representation Learning

Visual Representation Learning

PROBLEM

Self-supervised representation learning has improved the downstream performance of many vision transformer models. However, very few works explore using high-level semantics for learning representations.

CONTRIBUTION

Proposed a new loss function and model architecture that improve a computer vision model's semantic understanding using generative modelling, resulting in state-of-the-art downstream performance on ImageNet classification and COCO object detection.
Publication Accepted at CVPR 2023
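The published loss and architecture are not reproduced here; purely as a generic illustration, the snippet below shows the common pattern of coupling a representation-learning objective with a generative reconstruction term. All names (`encoder`, `decoder`, `repr_loss_fn`) are placeholders.

```python
import torch
import torch.nn.functional as F

def joint_objective(encoder, decoder, images, repr_loss_fn, lam=1.0):
    """Illustrative combined objective: a standard representation loss plus a
    generative (reconstruction) term weighted by `lam`."""
    z = encoder(images)                       # latent representations
    recon = decoder(z)                        # generative reconstruction of the input
    recon_loss = F.mse_loss(recon, images)    # pixel-level reconstruction error
    return repr_loss_fn(z) + lam * recon_loss
```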

Generative Modelling

Vision Language Models

PROBLEM

Inspired by OpenAI's CLIP and DALL·E, I wanted to create a similar model capable of generating high-quality art from a text description, despite having a fraction of the resources and training data.

CONTRIBUTION

To that end, I created a text-to-image art generation model by piecing together popular works in vision-language learning (CLIP), generative modelling (VQ-GAN), super-resolution (SwinIR) and depth modelling (MiDaS). Trained on ArtStation data for just 3 days, it produces output art of high semantic quality.
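A condensed sketch of the CLIP-guided optimisation loop at the heart of such a pipeline, assuming OpenAI's `clip` package and a `vqgan_decode` callable that maps latents to images in [0, 1]; the super-resolution (SwinIR) and depth (MiDaS) stages are omitted, so this is a simplified stand-in rather than the full system.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP (github.com/openai/CLIP)

def clip_guided_generation(vqgan_decode, latents, prompt, steps=300, lr=0.1,
                           device="cuda"):
    """Optimise VQGAN-style latents so the decoded image matches a text prompt
    under CLIP. `vqgan_decode` is assumed to map latents -> (1, 3, H, W) images
    in [0, 1]."""
    model, _ = clip.load("ViT-B/32", device=device)
    model = model.float()  # keep everything in fp32 for simplicity

    # Encode the prompt once; only the image latents are optimised.
    text = model.encode_text(clip.tokenize([prompt]).to(device)).detach()
    text = text / text.norm(dim=-1, keepdim=True)

    latents = latents.clone().requires_grad_(True)
    opt = torch.optim.Adam([latents], lr=lr)
    for _ in range(steps):
        img = vqgan_decode(latents)
        # CLIP expects 224x224 inputs; resize and (for brevity) skip its normalisation.
        img224 = F.interpolate(img, size=(224, 224), mode="bilinear",
                               align_corners=False)
        image_feat = model.encode_image(img224)
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        loss = -(image_feat * text).sum()     # maximise cosine similarity with the prompt
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vqgan_decode(latents).detach()
```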