Efficient Raytracing of Sparse Voxel Octrees on an FPGA
Title
Move your mouse over any of the terms in the title to get an explanation of the term.
Efficient
Efficient just makes the thesis sound better.. I'm just kidding. The thesis is a continuation of a semester project which was called "Ray Casting of Sparse Voxel Octrees on an FPGA". The project was more of a proof of concept, and the design was very inefficient. The master's thesis focused on taking the design from a simple, barely working, slow implementation, to a robust and efficient one. Substantial efforts were made to find how parameters like cache sizes and number of cores affected the speed and resource use of the design.
Raytracing
Raytracing is a technique for drawing 3D images. In computer games, this is done by your GPU in a process called rasterisation. Rasterisation is sometimes called a "bag of tricks", as it doesn't attempt to simulate the behaviour of light. Raytracing, on the other hand, simulates light by tracing light rays in reverse. You follow a light ray from the camera and out into your 3D world, like laser beams. You find where the beam hits, and sample the color at that point. You will usually also trace a ray to relevant light sources to find out how bright the point is.
Sparse Voxel Octrees
Voxels are to 3D models what pixels are to 2D images. They represent a tiny cube of the world. A Sparse Voxel Octree (SVO) is just a data structure for storing voxels in an efficient way. The basic idea is that most of our world is empty air, or invisible internals of walls, bodies, etc. Sparse Voxel Octrees attempts to minimize the amount of data you use to represent empty or invisible spaces. An Octree divides the space in eight equal sub-spaces, or octants. If an octant is non-empty, you recursively divide it into eight new octants. This continues untill the octant is as small as you want it.
FPGA
Field-Programmable Gate Arrays are devices that can be used to make prototypes of digital integrated circuits. They are essentially the stem cells of integrated circuits; they can be configure to become functionally equivelant to any digital circuit.
In my thesis I implemented the raytracing technique on an FPGA in order to show that it could plausibly be implemented in a GPU, so that we could have hardware to accelerate raytracing in games and other graphical applications.
Video
In the following video I give a quick presentation of the thesis, show the software I've been using for verification and debugging, and show the final system running on the FPGA.
Source
Software Workbench
The SVO workbench was used to investigate the algorithms for raytracing of SVOs. It is a very inefficient software implementation, as it was just written to be as identical to the FPGA implementation as possible. It contains code to generate SVOs. It reads the voxel models in the binvox format. It also implements raytracing/rasterisation hybrid rendering. It has no use as an end-user application, but if you're playing with raytracing of SVOs yourself, maybe the code could be of use to you.
Get source
FPGA Implementation
This project is based on the ORPSoC system-on-chip platform. Specifically the sandbox version of Stefan Kristiansson's version of ORPSoC for Digilent Atlys.
The final system can render a 640x480 image in about 350ms at 50MHz. There is still very much room for improvement, and the system is dependent on data generated by the software workbench.
Get sourceORLink
I needed to transfer data to/from the data bus on the FPGA. To accomplish this I used FPGALink. But I needed an interface for the Wishbone bus used by ORPSoC, so I wrote a custom module in Verilog. I also wrote a command-line tool in C to help program the FPGA, and transfer data to/from memory.
Get sourceIllustrations
The SoC module that performs the ray tracing algorithm. It consists of a slave module which is addressable by the CPU, and can be used to configure the module through a few registers. It will also send an interrupt to the CPU when the ray tracing task is done. The scheduler/controller unit with load ray data from memory and setup one of the cores to trace that ray. In this system there could be 1 to 4 cores which can work in parallell. The scheduler and the cores all share a single memory channel, and the memory controller is responsible for arbitrating the memory requests while caching the requests from the cores.
This illustration just shows that all requests from the cores goes through a cache, which cache octree data.
The core of the ray tracing module is a state machine which controls a collection of datapaths. With the current control scheme it requires at least three 32-bit comparators, seven 32-bit adders, and two counters. Although further pipelining is of course possible.
This shows the state machine which controls the core of the ray tracing module. Please see the thesis for a more thorough description of the state machine. This should give you a rough idea of the simplicity of the algorithm though.
The final setup used to test the ray tracing system. Sparse Voxels Octrees and pre-calculated ray data was generated on a PC. These were uploaded along to the memory on the FPGA prototype board, along with a small piece of OpenRISC software. Ray tracing was started by sending a signal over USART, and when the operation is done, the OpenRISC software will report the results back over USART, while the rendered image is displayed over an HDMI port.