CUDA Path Tracer
- Github Repo
- Tested on: Windows 10, AMD Ryzen 5800HS with Radeon Graphics CPU @ 3.20GHz, 16GB RAM, NVIDIA GeForce RTX 3060 Laptop 8GB
Implemented Features
- Core
- Stream Compaction
- Diffuse & Specular
- Jittering (Antialiasing)
- First Bounce Cache
- Sort by material
- Load gltf
- BVH & SAH
- Texture mapping & bump mapping
- Environment Mapping
- Microfacet BSDF
- Emissive BSDF (with Emissive Texture)
- Direct Lighting
- Multiple Importance Sampling
- Depth of Field
- Tone Mapping & Gamma Correction
Demos
- Emissive Robot Car
- Metal Bunny
- Texture Mapping & Bump Mapping
- Multiple Robots (Depth of Field)
- Indoor Scene
- Video Demo
gltf
Load
The scene format supported by this pathtracer is gltf, chosen for its expressive representation of 3D scenes. Please view this page for more details about gltf.
During development, most test scenes were exported directly from Blender, which made testing much more flexible.
scenes/pathtracer_robots_demo.glb
Link
BVH
On the host, a BVH can be constructed and traversed recursively. In this project, however, our code runs on the GPU. Although recent CUDA versions allow recursive device functions, recursion is risky in a performance-critical ray tracer: it can slow down the kernel because it may require a dynamically sized stack.
Thanks to this paper, this pathtracer adopts MTBVH, a stack-free BVH construction and traversal algorithm.
This pathtracer only implements a simplified version of MTBVH: instead of constructing 6 BVHs and selecting one at runtime, only 1 BVH is constructed. This implies the pathtracer still has room for further speedup.
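The key idea of a stack-free traversal can be sketched on the host. In the flattened layout below (struct and function names are illustrative, not the project's actual code), each node stores the index of the next node to visit on a box hit and on a box miss, so the traversal is a single loop with no stack:

```cpp
#include <functional>
#include <vector>

// Hypothetical flattened node: hit/miss links replace the traversal stack.
struct LinearNode {
    int hitLink;   // next node index if the ray hits this box (-1 = done)
    int missLink;  // next node index if the ray misses this box (-1 = done)
    int primId;    // primitive index for leaves, -1 for interior nodes
};

// Stack-free traversal: follow hit/miss links in one loop.
// hitsBox and hitPrim stand in for the real box/triangle intersection tests.
inline void traverseStackFree(const std::vector<LinearNode>& nodes,
                              const std::function<bool(int)>& hitsBox,
                              const std::function<void(int)>& hitPrim) {
    int cur = nodes.empty() ? -1 : 0;
    while (cur != -1) {
        const LinearNode& n = nodes[cur];
        if (hitsBox(cur)) {
            if (n.primId >= 0) hitPrim(n.primId);  // leaf: test its primitive
            cur = n.hitLink;                       // descend / advance
        } else {
            cur = n.missLink;                      // skip the whole subtree
        }
    }
}
```

On the GPU this avoids both recursion and a per-thread stack in local memory, which is exactly why MTBVH suits device-side traversal.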
- With BVH & Without BVH:
| With BVH | Without BVH |
|---|---|
| ![]() | ![]() |

As expected, the speedup is huge: up to 40x. With a more complex scene, BVH should give an even higher speedup.
Texture Mapping & Bump Mapping
To enhance the details of mesh surfaces and geometries, texture mapping is a must. We have not yet implemented mipmapping on the GPU, though it should not be difficult to add.
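The filtering math behind a texture fetch can be shown in a minimal host-side sketch. This is a single-channel bilinear lookup assuming UVs in [0, 1] with clamp addressing; the real device code would read from a `cudaTextureObject_t` or a raw buffer, and all names here are illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Row-major single-channel image, w * h texels.
struct Image1f {
    int w, h;
    std::vector<float> texels;
};

inline float sampleBilinear(const Image1f& img, float u, float v) {
    // Map UV to continuous texel coordinates, centered on texel centers.
    float x = u * img.w - 0.5f;
    float y = v * img.h - 0.5f;
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    float fx = x - x0, fy = y - y0;
    auto fetch = [&](int xi, int yi) {
        xi = std::clamp(xi, 0, img.w - 1);  // clamp addressing
        yi = std::clamp(yi, 0, img.h - 1);
        return img.texels[yi * img.w + xi];
    };
    // Blend the four neighboring texels.
    float top = fetch(x0, y0) * (1 - fx) + fetch(x0 + 1, y0) * fx;
    float bot = fetch(x0, y0 + 1) * (1 - fx) + fetch(x0 + 1, y0 + 1) * fx;
    return top * (1 - fy) + bot * fy;
}
```

Bump mapping reuses the same lookup on a height or normal texture and perturbs the shading normal with the result.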
scenes/pathtracer_test_texture.glb
Link
| Before bump mapping | After bump mapping |
|---|---|
| ![]() | ![]() |
Microfacet BSDF
To support a variety of materials, BSDFs more complicated than pure diffuse/specular are required. Here we first implement the classic microfacet BSDF to extend the material capability of this pathtracer.
This pathtracer uses a microfacet implementation based on pbrt.
Metalness = 1. Roughness 0 to 1 from left to right.
Please note that the sphere used here is not an actual sphere but an icosphere.
scenes/pathtracer_test_microfacet.glb
Link
With texture mapping implemented, we can now use the metallicRoughness texture. Luckily, gltf has good support for the metallic workflow.
scenes/pathtracer_robot.glb
Link
Direct Lighting & MIS
To stress the convergence speedup from MIS, Russian roulette is disabled in this part's renderings.
A tiny dark stripe is visible in some renderings. This is because, by default, double-sided lighting is not allowed in this pathtracer.
By default, the number of light samples is set to 3.
When sampling the direction of the next bounce, we use BSDF importance sampling most of the time. It improves convergence for specular materials, since the sampling strategy closely matches the expected radiance distribution on the hemisphere. For diffuse/matte surfaces, however, this strategy is suboptimal: the dominant factor in the radiance distribution of such materials is the lights rather than the outgoing rays. Thus, sampling the lights is also a valuable strategy for speeding up convergence on rough surfaces.
In this demo scene, 3 metal planes are lit by 4 cube lights. When we sample only the BSDF, the expected radiance on the metal planes converges well. When we sample only the lights, the rougher parts of the scene, such as the white back wall, converge faster. Hence, we want a sampling strategy that combines the advantages of both: multiple importance sampling.
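MIS combines the two estimators by weighting each sample so that whichever strategy has the higher pdf for a given direction dominates the estimate. A minimal sketch of the power heuristic (beta = 2), as recommended in pbrt:

```cpp
#include <cmath>

// Power heuristic with beta = 2: weight for a sample drawn from
// strategy f (nf samples, pdf fPdf) against strategy g (ng, gPdf).
inline double powerHeuristic(int nf, double fPdf, int ng, double gPdf) {
    double f = nf * fPdf, g = ng * gPdf;
    return (f * f) / (f * f + g * g);
}
```

Each path vertex then adds two weighted contributions: a light sample weighted by `powerHeuristic(nLight, lightPdf, nBsdf, bsdfPdf)` and a BSDF sample weighted with the arguments swapped; the two weights for any direction sum to 1, so the estimator stays unbiased.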
scenes/pathtracer_mis_demo.glb
Link
| Only sample BSDF (500 spp) | Only sample light (500 spp) | MIS (500 spp) |
|---|---|---|
| ![]() | ![]() | ![]() |
For more details about this part, see this part of pbrt or this post of mine.
Tested on the bunny scene; faster convergence can be observed.
scenes/pathtracer_bunny_mis.glb
Link
| Without MIS (256 spp) | With MIS (256 spp) |
|---|---|
| ![]() | ![]() |

| Without MIS (5k spp) | With MIS (5k spp) |
|---|---|
| ![]() | ![]() |
Depth of Field
For depth of field, we define two variables: focal_length and aperture.
More details can be viewed in this post.
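A thin-lens sketch of how those two parameters act on a camera ray: the origin is jittered across the aperture disk and the ray is re-aimed at the point on the focal plane. Names and the exact disk parameterization (`lensU`, `lensV` from a unit-disk sample) are illustrative, not the project's actual API:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Thin-lens ray: jitter the origin on the lens, keep the focal-plane
// point fixed. dir is assumed normalized; right/up form the camera basis.
inline Vec3 thinLensDirection(Vec3 origin, Vec3 dir,
                              double focal_length, double aperture,
                              double lensU, double lensV,
                              Vec3 right, Vec3 up,
                              Vec3* newOrigin) {
    // Point that stays in focus: where the original ray meets the focal plane.
    Vec3 focus = { origin.x + dir.x * focal_length,
                   origin.y + dir.y * focal_length,
                   origin.z + dir.z * focal_length };
    // Offset the origin on the lens; aperture scales the disk radius.
    Vec3 o = { origin.x + (right.x * lensU + up.x * lensV) * aperture,
               origin.y + (right.y * lensU + up.y * lensV) * aperture,
               origin.z + (right.z * lensU + up.z * lensV) * aperture };
    // Re-aim at the in-focus point and renormalize.
    Vec3 d = { focus.x - o.x, focus.y - o.y, focus.z - o.z };
    double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    d.x /= len; d.y /= len; d.z /= len;
    *newOrigin = o;
    return d;
}
```

Points on the focal plane receive all lens samples at the same pixel and stay sharp; points off that plane are smeared across the disk, and a larger aperture widens the blur.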
| Depth of Field (Aperture = 0.3) |
|---|
| ![]() |
Future (If possible)
CUDA Side
- More cuda optimization
- Bank conflict
- Loop unroll
- Light sample loop (if multiple light rays)
- Higher parallelism (Use streams?)
- Tile-based raytracing
- Potentially, it should increase rendering speed, as it maximizes locality within one pixel/tile. No more realtime camera movement though.
Render Side
- Adaptive Sampling
- Mipmap
- ReSTIR
- Refraction
- True BSDF (Add some subsurface scattering if possible?)
- Volume Rendering (ready for NeRF)