Auspiciously Wild
What if we build our virtual worlds out of grains of sand rather than sheets of paper? Woul it be efficient if we build hardware expressly for that task? Could we play with sand and paper at the same time?

Efficient Raytracing of Sparse Voxel Octrees on an FPGA

Title

Move your mouse over any of the terms in the title to get an explanation of the term.

Efficient

Efficient just makes the thesis sound better.. I'm just kidding. The thesis is a continuation of a semester project which was called "Ray Casting of Sparse Voxel Octrees on an FPGA". The project was more of a proof of concept, and the design was very inefficient. The master's thesis focused on taking the design from a simple, barely working, slow implementation, to a robust and efficient one. Substantial efforts were made to find how parameters like cache sizes and number of cores affected the speed and resource use of the design.

Raytracing

Raytracing is a technique for drawing 3D images. In computer games, this is done by your GPU in a process called rasterisation. Rasterisation is sometimes called a "bag of tricks", as it doesn't attempt to simulate the behaviour of light. Raytracing, on the other hand, simulates light by tracing light rays in reverse. You follow a light ray from the camera and out into your 3D world, like laser beams. You find where the beam hits, and sample the color at that point. You will usually also trace a ray to relevant light sources to find out how bright the point is.

Sparse Voxel Octrees

Voxels are to 3D models what pixels are to 2D images. They represent a tiny cube of the world. A Sparse Voxel Octree (SVO) is just a data structure for storing voxels in an efficient way. The basic idea is that most of our world is empty air, or invisible internals of walls, bodies, etc. Sparse Voxel Octrees attempts to minimize the amount of data you use to represent empty or invisible spaces. An Octree divides the space in eight equal sub-spaces, or octants. If an octant is non-empty, you recursively divide it into eight new octants. This continues untill the octant is as small as you want it.

FPGA

Field-Programmable Gate Arrays are devices that can be used to make prototypes of digital integrated circuits. They are essentially the stem cells of integrated circuits; they can be configure to become functionally equivelant to any digital circuit.

In my thesis I implemented the raytracing technique on an FPGA in order to show that it could plausibly be implemented in a GPU, so that we could have hardware to accelerate raytracing in games and other graphical applications.

Video

In the following video I give a quick presentation of the thesis, show the software I've been using for verification and debugging, and show the final system running on the FPGA.

Get Thesis

Source

Software Workbench

The SVO workbench was used to investigate the algorithms for raytracing of SVOs. It is a very inefficient software implementation, as it was just written to be as identical to the FPGA implementation as possible. It contains code to generate SVOs. It reads the voxel models in the binvox format. It also implements raytracing/rasterisation hybrid rendering. It has no use as an end-user application, but if you're playing with raytracing of SVOs yourself, maybe the code could be of use to you.

Get source

FPGA Implementation

This project is based on the ORPSoC system-on-chip platform. Specifically the sandbox version of Stefan Kristiansson's version of ORPSoC for Digilent Atlys.

The final system can render a 640x480 image in about 350ms at 50MHz. There is still very much room for improvement, and the system is dependent on data generated by the software workbench.

Get source

ORLink

I needed to transfer data to/from the data bus on the FPGA. To accomplish this I used FPGALink. But I needed an interface for the Wishbone bus used by ORPSoC, so I wrote a custom module in Verilog. I also wrote a command-line tool in C to help program the FPGA, and transfer data to/from memory.

Get source

Illustrations

The SoC module that performs the ray tracing algorithm. It consists of a slave module which is addressable by the CPU, and can be used to configure the module through a few registers. It will also send an interrupt to the CPU when the ray tracing task is done. The scheduler/controller unit with load ray data from memory and setup one of the cores to trace that ray. In this system there could be 1 to 4 cores which can work in parallell. The scheduler and the cores all share a single memory channel, and the memory controller is responsible for arbitrating the memory requests while caching the requests from the cores.

This illustration just shows that all requests from the cores goes through a cache, which cache octree data.

The core of the ray tracing module is a state machine which controls a collection of datapaths. With the current control scheme it requires at least three 32-bit comparators, seven 32-bit adders, and two counters. Although further pipelining is of course possible.

This shows the state machine which controls the core of the ray tracing module. Please see the thesis for a more thorough description of the state machine. This should give you a rough idea of the simplicity of the algorithm though.

The final setup used to test the ray tracing system. Sparse Voxels Octrees and pre-calculated ray data was generated on a PC. These were uploaded along to the memory on the FPGA prototype board, along with a small piece of OpenRISC software. Ray tracing was started by sending a signal over USART, and when the operation is done, the OpenRISC software will report the results back over USART, while the rendered image is displayed over an HDMI port.

This website uses many experimental browser features. Newest version of Chrome, Opera, Safari, or Firefox recommended.