CUDA Path Tracer
- Github Repo
- Tested on: Windows 10, AMD Ryzen 5800HS with Radeon Graphics CPU @ 3.20GHz, 16GB RAM, NVIDIA GeForce RTX 3060 Laptop 8GB
Implemented Features
- Core
- Stream Compaction
- Diffuse & Specular
- Jittering (Antialiasing)
- First Bounce Cache
- Sort by material
- Load gltf
- BVH & SAH
- Texture mapping & bump mapping
- Environment Mapping
- Microfacet BSDF
- Emissive BSDF (with Emissive Texture)
- Direct Lighting
- Multiple Importance Sampling
- Depth of Field
- Tone Mapping & Gamma Correction
Demos
- Emissive Robot Car
- Metal Bunny
- Texture Mapping & Bump Mapping
- Multiple Robots (Depth of Field)
- Indoor Scene
- Video Demo
gltf
Load
The scene format supported by this pathtracer is gltf, chosen for its expressive representation of 3D scenes. Please view this page for more details about gltf.
During development, most test scenes were exported directly from Blender, which made testing much more flexible.
scenes/pathtracer_robots_demo.glb
Link
BVH
On the host, a BVH can be constructed and traversed recursively. In this project, however, our code runs on the GPU. Although recent CUDA versions allow recursive device functions, recursion is risky in a performance-critical ray tracer: it can slow down the kernel because it may require a dynamically sized stack.
Thanks to this paper, this pathtracer adopts MTBVH, a stack-free BVH construction and traversal algorithm.
This pathtracer only implements a simplified version of MTBVH: instead of constructing 6 BVHs and selecting one at runtime, only 1 BVH is constructed. This implies the pathtracer still has room for further speedup.
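The key idea of a stack-free traversal can be sketched on the host. In the flattened layout below (struct and function names are illustrative, not the project's actual code), each node stores the index of the next node to visit on a box hit and on a box miss, so the traversal is a single loop with no stack:

```cpp
#include <functional>
#include <vector>

// Hypothetical flattened node: hit/miss links replace the traversal stack.
struct LinearNode {
    int hitLink;   // next node index if the ray hits this box (-1 = done)
    int missLink;  // next node index if the ray misses this box (-1 = done)
    int primId;    // primitive index for leaves, -1 for interior nodes
};

// Stack-free traversal: follow hit/miss links in one loop.
// hitsBox and hitPrim stand in for the real box/triangle intersection tests.
inline void traverseStackFree(const std::vector<LinearNode>& nodes,
                              const std::function<bool(int)>& hitsBox,
                              const std::function<void(int)>& hitPrim) {
    int cur = nodes.empty() ? -1 : 0;
    while (cur != -1) {
        const LinearNode& n = nodes[cur];
        if (hitsBox(cur)) {
            if (n.primId >= 0) hitPrim(n.primId);  // leaf: test its primitive
            cur = n.hitLink;                       // descend / advance
        } else {
            cur = n.missLink;                      // skip the whole subtree
        }
    }
}
```

On the GPU this avoids both recursion and a per-thread stack in local memory, which is exactly why MTBVH suits device-side traversal.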
- With BVH & Without BVH:
| With BVH | Without BVH |
|---|---|
| ![]() | ![]() |

As expected, the speedup is huge: up to 40x. With a more complex scene, BVH should give an even higher speedup.
Texture Mapping & Bump Mapping
To enhance the details of mesh surfaces and geometries, texture mapping is a must. We have not yet implemented mipmapping on the GPU, though it should not be difficult to add.
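The filtering math behind a texture fetch can be shown in a minimal host-side sketch. This is a single-channel bilinear lookup assuming UVs in [0, 1] with clamp addressing; the real device code would read from a `cudaTextureObject_t` or a raw buffer, and all names here are illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Row-major single-channel image, w * h texels.
struct Image1f {
    int w, h;
    std::vector<float> texels;
};

inline float sampleBilinear(const Image1f& img, float u, float v) {
    // Map UV to continuous texel coordinates, centered on texel centers.
    float x = u * img.w - 0.5f;
    float y = v * img.h - 0.5f;
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    float fx = x - x0, fy = y - y0;
    auto fetch = [&](int xi, int yi) {
        xi = std::clamp(xi, 0, img.w - 1);  // clamp addressing
        yi = std::clamp(yi, 0, img.h - 1);
        return img.texels[yi * img.w + xi];
    };
    // Blend the four neighboring texels.
    float top = fetch(x0, y0) * (1 - fx) + fetch(x0 + 1, y0) * fx;
    float bot = fetch(x0, y0 + 1) * (1 - fx) + fetch(x0 + 1, y0 + 1) * fx;
    return top * (1 - fy) + bot * fy;
}
```

Bump mapping reuses the same lookup on a height or normal texture and perturbs the shading normal with the result.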
scenes/pathtracer_test_texture.glb
Link
| Before bump mapping | After bump mapping |
|---|---|
| ![]() | ![]() |
Microfacet BSDF
To support a variety of materials, BSDFs more complicated than pure diffuse/specular are required. Here we first implement the classic microfacet BSDF to extend the material capability of this pathtracer.
This pathtracer uses a microfacet implementation based on pbrt.
Metalness = 1. Roughness 0 to 1 from left to right.
Please note that the sphere used here is not an actual sphere but an icosphere.
scenes/pathtracer_test_microfacet.glb
Link
With texture mapping implemented, we can now use the metallicRoughness texture. Luckily, gltf has good support for the metallic workflow.
scenes/pathtracer_robot.glb
Link
Direct Lighting & MIS
To stress the convergence speedup from MIS, Russian roulette is disabled in this part's renderings.
A tiny dark stripe is visible in some renderings. This is because, by default, double-sided lighting is not allowed in this pathtracer.
By default, the number of light samples is set to 3.
When sampling the direction of the next bounce, we use BSDF importance sampling most of the time. It improves convergence for specular materials, since the sampling strategy closely matches the expected radiance distribution on the hemisphere. For diffuse/matte surfaces, however, this strategy is suboptimal: the dominant factor in the radiance distribution of such materials is the lights rather than the outgoing rays. Thus, sampling the lights is also a valuable strategy for speeding up convergence on rough surfaces.
In this demo scene, 3 metal planes are lit by 4 cube lights. When we sample only the BSDF, the expected radiance on the metal planes converges well. When we sample only the lights, the rougher parts of the scene, such as the white back wall, converge faster. Hence, we want a sampling strategy that combines the advantages of both: multiple importance sampling.
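MIS combines the two estimators by weighting each sample so that whichever strategy has the higher pdf for a given direction dominates the estimate. A minimal sketch of the power heuristic (beta = 2), as recommended in pbrt:

```cpp
#include <cmath>

// Power heuristic with beta = 2: weight for a sample drawn from
// strategy f (nf samples, pdf fPdf) against strategy g (ng, gPdf).
inline double powerHeuristic(int nf, double fPdf, int ng, double gPdf) {
    double f = nf * fPdf, g = ng * gPdf;
    return (f * f) / (f * f + g * g);
}
```

Each path vertex then adds two weighted contributions: a light sample weighted by `powerHeuristic(nLight, lightPdf, nBsdf, bsdfPdf)` and a BSDF sample weighted with the arguments swapped; the two weights for any direction sum to 1, so the estimator stays unbiased.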
scenes/pathtracer_mis_demo.glb
Link
| Only sample BSDF (500 spp) | Only sample light (500 spp) | MIS (500 spp) |
|---|---|---|
| ![]() | ![]() | ![]() |
For more details about this part, see this part of pbrt or this post of mine.
Tested on the bunny scene; faster convergence can be observed.
scenes/pathtracer_bunny_mis.glb
Link
| Without MIS (256 spp) | With MIS (256 spp) |
|---|---|
| ![]() | ![]() |

| Without MIS (5k spp) | With MIS (5k spp) |
|---|---|
| ![]() | ![]() |
Depth of Field
For depth of field, we define two variables: focal_length and aperture.
More details can be viewed in this post.
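A thin-lens sketch of how those two parameters act on a camera ray: the origin is jittered across the aperture disk and the ray is re-aimed at the point on the focal plane. Names and the exact disk parameterization (`lensU`, `lensV` from a unit-disk sample) are illustrative, not the project's actual API:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Thin-lens ray: jitter the origin on the lens, keep the focal-plane
// point fixed. dir is assumed normalized; right/up form the camera basis.
inline Vec3 thinLensDirection(Vec3 origin, Vec3 dir,
                              double focal_length, double aperture,
                              double lensU, double lensV,
                              Vec3 right, Vec3 up,
                              Vec3* newOrigin) {
    // Point that stays in focus: where the original ray meets the focal plane.
    Vec3 focus = { origin.x + dir.x * focal_length,
                   origin.y + dir.y * focal_length,
                   origin.z + dir.z * focal_length };
    // Offset the origin on the lens; aperture scales the disk radius.
    Vec3 o = { origin.x + (right.x * lensU + up.x * lensV) * aperture,
               origin.y + (right.y * lensU + up.y * lensV) * aperture,
               origin.z + (right.z * lensU + up.z * lensV) * aperture };
    // Re-aim at the in-focus point and renormalize.
    Vec3 d = { focus.x - o.x, focus.y - o.y, focus.z - o.z };
    double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    d.x /= len; d.y /= len; d.z /= len;
    *newOrigin = o;
    return d;
}
```

Points on the focal plane receive all lens samples at the same pixel and stay sharp; points off that plane are smeared across the disk, and a larger aperture widens the blur.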
| Depth of Field (Aperture = 0.3) |
|---|
| ![]() |
Future (If possible)
CUDA Side
- More cuda optimization
- Bank conflict
- Loop unroll
- Light sample loop (if multiple light rays)
- Higher parallelism (Use streams?)
- Tile-based raytracing
- Potentially, it should increase rendering speed, as it maximizes locality within one pixel/tile. No more realtime camera movement though.
Render Side
- Adaptive Sampling
- Mipmap
- ReSTIR
- Refraction
- True BSDF (Add some subsurface scattering if possible?)
- Volume Rendering (ready for NeRF)