GPU & CUDA: technical insights into a panoramic stitcher

VideoStitch leverages nVidia’s CUDA technology to be the fastest available stitching engine on the market.

If you have compared VideoStitch with previous-generation tools, you have seen that it is significantly faster: depending on your graphics card and settings, it can be more than 20x faster, with no quality difference compared with existing solutions. While we think VideoStitch will lead the way for a new generation of fast stitching engines, choosing CUDA was not obvious when we began coding. Here is an insider’s view of the rationale behind this choice.


CUDA is a framework for GPU computing. According to nVidia:

What is GPU computing?

GPU computing is the use of a GPU (graphics processing unit) together with a CPU to accelerate general-purpose scientific and engineering applications.
GPU computing offers unprecedented application performance by offloading compute-intensive portions of the application to the GPU, while the remainder of the code still runs on the CPU. From a user’s perspective, applications simply run significantly faster.

CPU + GPU is a powerful combination because CPUs consist of a few cores optimized for serial processing, while GPUs consist of thousands of smaller, more efficient cores designed for parallel performance. Serial portions of the code run on the CPU while parallel portions run on the GPU:

[Figure: multiple cores, CPU vs GPU]
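
To put rough numbers on the offloading idea quoted above, here is a minimal sketch based on Amdahl's law, which is not mentioned in the quote and is added here only as an illustration. It estimates the overall speedup when a given fraction of the run time is moved to a faster device. The function name and the sample values are assumptions, not VideoStitch measurements.

```python
def amdahl_speedup(parallel_fraction: float, gpu_factor: float) -> float:
    """Overall speedup when `parallel_fraction` of the run time
    is accelerated by `gpu_factor` and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if gpu_factor <= 0.0:
        raise ValueError("gpu_factor must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / gpu_factor)


if __name__ == "__main__":
    # Hypothetical values: the parallel part runs 100x faster on the GPU.
    for fraction in (0.50, 0.90, 0.99):
        speedup = amdahl_speedup(fraction, 100.0)
        print(f"parallel fraction {fraction:.2f}: {speedup:.1f}x overall")
```

The takeaway is that the serial remainder sets the ceiling: if only half of the work can be offloaded, the overall gain stays below 2x no matter how fast the GPU is.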

The VideoStitch team is dedicated to open software. This means we aim for as much compatibility as possible with other software by using open formats. VideoStitch also makes use of open-source software and has already proudly contributed patches to improve it. We think openness can lead to better technology and competitive prices for our end users.

However, the choice of CUDA was not made from this perspective. CUDA is a proprietary technology, available only with nVidia graphics cards. Alternative technologies exist, such as OpenCL or OpenACC, but none has proved to be both as mature and as fast as CUDA. We did not want to compromise on performance, because our ultimate goal is to build real-time applications at large resolutions. As technology evolves, rest assured that we are keeping an eye on open standards and that we care about them.



Performance is at the heart of our concerns. Stitching is a demanding process in terms of both CPU and memory consumption. The VideoStitch team includes performance experts who optimize every aspect of it. We chose the most modern technologies to get the finest control over parallelism and memory allocation.

GPUs, with their thousands of cores, are really efficient when it comes to highly parallel tasks. They are well suited to repetitive image processing and numerical simulations, whereas stitching is a complex set of mathematical algorithms. This meant we had to be inventive and come up with a new stitching-engine design that scales on such highly parallel devices. When we began coding, it was not obvious that we could get so much improvement. But we got more and more enthusiastic as time went by.
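
To illustrate what "highly parallel" means for image work, here is a minimal sketch in plain Python (not CUDA, and not VideoStitch's actual algorithm) that blends two overlapping image rows. Each output pixel depends only on the two input pixels at the same position, so on a GPU every pixel could be handed to its own thread. The function names and the linear weighting are assumptions chosen for illustration.

```python
from __future__ import annotations


def blend_pixel(left: float, right: float, weight: float) -> float:
    """Mix two pixel values; weight=0 keeps `left`, weight=1 keeps `right`."""
    return (1.0 - weight) * left + weight * right


def blend_row(left_row: list[float], right_row: list[float]) -> list[float]:
    """Blend two equally sized rows with a weight that ramps
    linearly from 0 (first pixel) to 1 (last pixel)."""
    if len(left_row) != len(right_row):
        raise ValueError("rows must have the same length")
    n = len(left_row)
    out = []
    for x in range(n):
        # Each iteration reads and writes only position x, so the
        # iterations are independent and could run in any order.
        weight = x / (n - 1) if n > 1 else 0.5
        out.append(blend_pixel(left_row[x], right_row[x], weight))
    return out


if __name__ == "__main__":
    print(blend_row([0.0, 0.0, 0.0], [10.0, 10.0, 10.0]))
```

The sequential loop is only there because this runs on a CPU. The harder part of a real stitcher is the work that does not decompose this neatly, which is why the engine design had to be rethought.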

VideoStitch changes the way you work. Working on the GPU leaves room for CPU work: you can render on the GPU while keeping your CPU free for other tasks. Or you can launch a CPU task in the background and still work in VideoStitch with good performance! What’s more, some versions of VideoStitch can take advantage of several GPUs on your system (up to eight, actually).
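
How VideoStitch distributes work across several GPUs is not described here. As a purely hypothetical sketch of one common approach, the output image can be cut into contiguous bands of rows, one band per device. The function name and the band-based strategy are assumptions for illustration only.

```python
from __future__ import annotations


def split_rows(height: int, num_devices: int) -> list[tuple[int, int]]:
    """Return one (start, stop) row range per device.
    Band sizes differ by at most one row; `stop` is exclusive."""
    if height < 0:
        raise ValueError("height must not be negative")
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    base, extra = divmod(height, num_devices)
    ranges = []
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional row each.
        stop = start + base + (1 if device < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


if __name__ == "__main__":
    # Hypothetical 2048-row panorama split across eight devices.
    for device, (start, stop) in enumerate(split_rows(2048, 8)):
        print(f"device {device}: rows {start} to {stop - 1}")
```

Other strategies, such as assigning whole frames to each device, are just as plausible. The point is only that the bands do not overlap and together cover the whole image.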

Still, we make no compromise on quality. Most of the state-of-the-art algorithms can be found in the public domain, and the team is led by experts in 360° video who know what really makes the difference. What I can say is that we plan to add more useful and fun features in the near future!


How to choose a graphics card

We wrote a page to help you choose the graphics card that fits your needs.
Please tell us what you think about this article and the resources we share with you. We’ll be pleased to hear your opinion!



Romain Bouqueau is a core engineer on the VideoStitch solution and an expert in high-performance computing.