November 27, 23
Slide Overview
■Overview
Strand Rendering
- A method for rendering anti-aliased hair will be presented.
Strand Lighting
- We will present an overview, implementation methods and optimization of multi-scattering, which is an important effect to achieve realistic hair saturation, depth, and volume.
Note: This is the content of the publicly available CAPCOM Open Conference Professional RE:2023 videos, converted to slides with some minor modifications.
■Prerequisites
Assumes knowledge about ray tracing and/or experience implementing it.
I'll show you just a little bit of the content!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CAPCOM Open Conference Professional RE:2023
https://www.capcom-games.com/coc/2023/
Check the official Twitter for the latest information on CAPCOM R&D!
https://twitter.com/capcom_randd
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
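This is CAPCOM's official account, run by the Technology Research Division that develops RE ENGINE, CAPCOM Co., Ltd.'s flagship game engine. We publish the materials from talks given at our past technical conferences and other events.
[CAPCOM Open Conference Professional RE:2023] https://www.capcom-games.com/coc/2023/
[CAPCOM Open Conference RE:2022] https://www.capcom.co.jp/RE2022/
[CAPCOM Open Conference RE:2019] http://www.capcom.co.jp/RE2019/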
Resident Evil 4 Hair Discussion
This presentation covers the Strand hair representation in Resident Evil 4 and consists of two parts. ©CAPCOM
Strand Rendering
Hello everyone. My presentation is on Strand Rendering.
History of Hair
Translucent hair era
• Resident Evil 6 on PS3 / XBOX360
• Kajiya + Forward
• Z-PrePass opaque areas and draw them as translucent
Shader graph era
• Resident Evil 7 --> present
• Free material setting
• Deferred, so Dithering is used to create translucency
• Do what you can to make hair look right with the existing parameters
Improvement of hair quality is desired
• Limitations of TAA anti-aliasing
• Inability to handle lighting properly, etc.
The history of hair is very long, but I will base my explanation on Leon, as that's what I worked on. In the PS3/XBOX360 era, we used translucent forward rendering. At that time, we Z-PrePassed the opaque areas to optimize the process. We also switched between shaders based on the number of lights, among other optimizations.
Next is the PS4/XBOX One era. From Resident Evil 7 to the present, shader graphs have let artists author shaders freely. The lighting has also become richer, but the increased resolution makes semi-transparent rendering difficult, so we render in Deferred with a Z-PrePass and Dithering. However, many have asked for better hair quality. RE2 and RE4 on the right are both Deferred and look very nice, but there are issues with the quality of TAA anti-aliasing, and the lighting is not represented correctly.
The Coming of the Age of One Hair at a Time
There was a very impressive soccer game on PS5!
• Ray tracing is in demand, but improving character quality is also important!
What is required for high quality hair
• Hair with anti-aliasing
• Believable lighting
• Dynamic simulation
One day, I saw a very famous soccer game rendering nice hair on PS5. The RE ENGINE rendering unit was mainly working on improving the look and feel of the PS5 generation with ray tracing, but the team also recognized the importance of improving the quality of the characters. The detailed information in EA's "Physics and Rendering Behind Frostbite's Hair," presented at Digital Dragons 2020, was particularly helpful for our strand development. So, high quality hair is required. Now, I will talk about hair with anti-aliasing.
Final Results
The finished product looks like this. The image above was captured at game runtime. We load the hair via Alembic. In RE4, we used Ornatrix to create the hair.
Overall Flow
Execute the following loop in Strand rendering:
• Hair Classification and Culling
• Lighting
• Hardware Rendering
• Software Semi-Transparent Drawing
Let's look at the whole process. Strand works in units of hair clumps. The hairs are rendered by first classifying and culling, then lighting, then hardware drawing, and then software semi-transparent drawing.
Hair Classification and Culling
Hair classification by projected area of hair width
• Hardware drawing for large line segments
• Hair that is thin and needs transparency is done with software drawing
• Magenta is drawn by hardware, green by software
[Figure: classification results at 1080p, 4K, and 8K]
Now, for hair classification. We classify hairs by the projected area of the hair's width. If you look at the images, the color changes from green to magenta as the resolution increases. At low resolution the hairs are rendered in software, and at high resolution they are rendered in hardware. A detailed explanation is given later, in the section on software semi-transparent drawing.
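A minimal sketch of the classification rule, assuming a pinhole camera and a hypothetical 1-pixel threshold (the talk does not state the actual cutoff). It shows why the same hair moves from the software path to the hardware path as resolution grows: projected width scales linearly with screen height. The names RasterPath, projectedWidthPx, and classifySegment are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Which rasterizer a hair segment is routed to.
enum class RasterPath { Hardware, Software };

// Approximate on-screen width (pixels) of a hair of world-space width
// `worldWidth` at view-space depth `viewDepth`, for a pinhole camera with
// vertical field of view `fovY` (radians) and `screenHeight` pixels.
inline float projectedWidthPx(float worldWidth, float viewDepth,
                              float fovY, float screenHeight) {
    const float focalPx = 0.5f * screenHeight / std::tan(0.5f * fovY);
    return worldWidth * focalPx / viewDepth;
}

// Hypothetical rule: segments at least `thresholdPx` wide go to the hardware
// billboard path; thinner ones go to the software anti-aliased path.
inline RasterPath classifySegment(float worldWidth, float viewDepth, float fovY,
                                  float screenHeight, float thresholdPx = 1.0f) {
    return projectedWidthPx(worldWidth, viewDepth, fovY, screenHeight) >= thresholdPx
               ? RasterPath::Hardware
               : RasterPath::Software;
}
```

For a 0.1 mm hair at 0.5 m with a 40° field of view, this sketch gives about 0.3 px at 1080 lines (software) and about 1.2 px at 4320 lines (hardware), matching the trend in the figures.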
FHD: Software rendering is dominant
In FHD, software rendering is dominant.
4K: Hardware rendering is increasing
In 4K, hardware rendering increases.
8K: Mostly hardware rendering
In 8K, hardware rendering does almost everything.
Lighting
Implemented with reference to Strand-based Hair Rendering in Frostbite [19]
• TT's Azimuthal Roughness LUT was approximated in-house
Illumination results are stored as vertex color
• Separate rasterization and lighting
• Easily adaptable to transparency
Result
• Image shows Single Scattering results for direct illumination
• I don't know if that's correct, but it looks like it
• Multiple scattering required except for black hair
• Details in the section on Strand Lighting
[Figure: Single Scattering result]
Next is lighting. Lighting is based on "Strand-based Hair Rendering in Frostbite," published in 2019. The lighting results are computed before drawing and stored as vertex colors. Because the lighting result is computed up front, it can be used by both hardware and software rendering. On the right is the result of Single Scattering with direct illumination. I honestly don't know if it's correct, but it looks plausible. Except when drawing black hair, you need to calculate Multiple Scattering, and the actual game uses multiple scattering. More details are given in the Strand Lighting part.
Hardware Rendering
Drawing large areas with the software rasterizer is costly
Draw as a polygon facing the camera
• Calculate exact motion velocity for each strand of hair
• Anti-aliasing relies on TAA
Next is hardware rendering. No special optimization is performed here. Originally, we intended to do everything with the software rasterizer, but hardware support was added because drawing lines with width in software is expensive. Each hair is drawn as a billboard facing the camera, as shown in the figure below. The correct motion velocity is calculated for each hair, so anti-aliasing can rely on TAA.
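The camera-facing billboard can be sketched as follows. This is an illustrative CPU version, not RE ENGINE's shader: the quad is widened along cross(tangent, toCamera), and the names Vec3 and expandSegmentToBillboard are hypothetical. Per-hair motion velocity is omitted.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 v) { return v * (1.0f / std::sqrt(dot(v, v))); }

// Expands hair segment p0-p1 (world-space widths w0, w1) into a quad facing the
// camera. The side vector is perpendicular to both the strand and the view
// direction; it degenerates if the strand points straight at the camera.
// Output order: p0-left, p0-right, p1-left, p1-right (two triangles).
inline void expandSegmentToBillboard(Vec3 p0, Vec3 p1, float w0, float w1,
                                     Vec3 cameraPos, Vec3 out[4]) {
    const Vec3 tangent = normalize(p1 - p0);
    const Vec3 mid = (p0 + p1) * 0.5f;
    const Vec3 toCamera = normalize(cameraPos - mid);
    const Vec3 side = normalize(cross(tangent, toCamera));
    out[0] = p0 - side * (0.5f * w0);
    out[1] = p0 + side * (0.5f * w0);
    out[2] = p1 - side * (0.5f * w1);
    out[3] = p1 + side * (0.5f * w1);
}
```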
Low resolution tends to break up lines
Thus, hair can be drawn with the hardware rasterizer. It looks about right, but the lines are a bit choppy and have gaps in places.
Result of software drawing and compositing
Compositing this with the software drawing described earlier gives a nice look.
Result of software drawing and compositing
Let's zoom in for a comparison. On the left is hardware + TAA. On the right is the software rasterizer.
Preparation for Software Drawing
Classification and culling at the beginning
Prepare for drawing with 4 Dispatches
1st: Depth Creation
・Create a reduced depth buffer (1/16 the width and 1/16 the height) for occlusion culling of line segments
2nd: Calculate memory size to register line segments in Froxels
・Rasterize the hairs and write them in Froxels
・Calculate the size required to store the indices
・Counting is simply accomplished with InterlockedAdd
3rd: Allocate line memory to Froxels
・Get an offset address into the linear buffer based on the size stored in the Froxel
・Write the offset address to the Froxel and reset the Froxel counter to 0
4th: Assign line indices to Froxels
・Rasterize the hair line segments again and register each segment using the offset address and count
The above process works at scale on any IHV
There are four passes in preparing for software rendering. The first creates a 1/16-size depth buffer for occlusion culling of the hair strands. The second calculates each Froxel's size for registering the hair line segments: we rasterize the hair and use InterlockedAdd on the Froxel to increment a counter. The third allocates memory. For each Froxel, we allocate an offset address into a linear buffer that stores the result; the offset is obtained by an Interlocked add of the Froxel's counted size onto the buffer's counter. We also reset the Froxel's counter at this point.
Finally, we perform another software raster pass and store the re-drawn line indices in the linear buffer. The destination is determined by the Froxel's offset address and an Interlocked increment. Rasterizing twice is wasteful, but we use this approach because it scales on any IHV.
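The count/allocate/fill scheme of passes 2 to 4 is essentially a two-pass bucket sort. Below is a single-threaded CPU sketch using std::atomic in place of the GPU's Interlocked operations. Which froxels each segment touches is given as input rather than rasterized, and all names (FroxelBins, binSegments) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

struct FroxelBins {
    std::vector<uint32_t> offset;   // start of each froxel's range in `indices`
    std::vector<uint32_t> count;    // number of segment indices in that range
    std::vector<uint32_t> indices;  // linear buffer of segment indices
};

// touched[s] lists the froxels that segment s covers.
inline FroxelBins binSegments(const std::vector<std::vector<uint32_t>>& touched,
                              uint32_t froxelCount) {
    std::vector<std::atomic<uint32_t>> counter(froxelCount);
    for (auto& c : counter) c.store(0);

    // Pass 2: count segments per froxel (InterlockedAdd on the GPU).
    for (const auto& froxels : touched)
        for (uint32_t f : froxels) counter[f].fetch_add(1);

    // Pass 3: reserve a range per froxel from a global bump allocator, then
    // reset the froxel counter so pass 4 can reuse it as a write cursor.
    std::atomic<uint32_t> bump{0};
    FroxelBins bins;
    bins.offset.resize(froxelCount);
    bins.count.resize(froxelCount);
    for (uint32_t f = 0; f < froxelCount; ++f) {
        bins.count[f] = counter[f].load();
        bins.offset[f] = bump.fetch_add(bins.count[f]);
        counter[f].store(0);
    }
    bins.indices.resize(bump.load());

    // Pass 4: "rasterize" again and write each index at offset + slot.
    for (uint32_t s = 0; s < touched.size(); ++s)
        for (uint32_t f : touched[s])
            bins.indices[bins.offset[f] + counter[f].fetch_add(1)] = s;
    return bins;
}
```

On the GPU, both the allocation order in pass 3 and the slot order in pass 4 depend on thread scheduling, so only the set of indices per froxel is deterministic, not their order.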
Software Semi-transparent Drawing
Perform rasterization with a single compute shader
• Each thread group processes one tile of 16x16 pixels
• One thread is responsible for one strand of hair
• Calculate translucency from the projected area using the width at each vertex of the line segment
• When drawing a line segment, a depth test is also performed using the depth of each vertex
• Illumination result is the linearly interpolated vertex color
• Line drawing is processed in 2D using anti-aliased lines
• Provides clean anti-aliasing
• Rasterize Froxels from front to back
• This lets us terminate the process once the tiles are sufficiently opaque
For software semi-transparency, we perform rasterization in a single compute shader. A tile is 16x16 pixels, and one thread group runs per tile. One thread is responsible for one line segment. The width stored in the segment's vertices is used to calculate the translucency ratio from the projected area. The line segment itself is drawn in 2D with an anti-aliased line algorithm to produce a clean, smooth line. Froxels are rasterized from front to back so that we can terminate early once the pixels inside the tile are sufficiently opaque.
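For the anti-aliased line itself, the reference list cites Xiaolin Wu's algorithm. The sketch below is a simplified Wu-style line: it omits the full algorithm's fractional endpoint handling and simply splits unit coverage between the two pixels straddling the ideal line. Scaling that coverage by the hair's projected width to get translucency, and the depth test, are left out; the plot callback is a hypothetical hook.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>

// plot(x, y, coverage) receives coverage in [0, 1]; coverage per step sums to 1.
inline void wuLine(float x0, float y0, float x1, float y1,
                   const std::function<void(int, int, float)>& plot) {
    // Iterate along the major axis so every step advances exactly one pixel.
    const bool steep = std::fabs(y1 - y0) > std::fabs(x1 - x0);
    if (steep) { std::swap(x0, y0); std::swap(x1, y1); }
    if (x0 > x1) { std::swap(x0, x1); std::swap(y0, y1); }
    const float dx = x1 - x0;
    const float gradient = dx == 0.0f ? 1.0f : (y1 - y0) / dx;
    const int xEnd = static_cast<int>(std::round(x1));
    for (int x = static_cast<int>(std::round(x0)); x <= xEnd; ++x) {
        const float y = y0 + gradient * (x - x0);
        const int yi = static_cast<int>(std::floor(y));
        const float frac = y - yi;  // distance below the ideal line's pixel
        if (steep) {
            plot(yi, x, 1.0f - frac);
            plot(yi + 1, x, frac);
        } else {
            plot(x, yi, 1.0f - frac);
            plot(x, yi + 1, frac);
        }
    }
}
```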
Semi-transparent in Any Order
Multi-Layer Alpha Blending (MLAB) is implemented to realize semi-transparent rendering
• A type of Order Independent Transparency (OIT)
MLAB
• Holds N working buffers (layers)
• Using the depth of the input pixel, a sort is performed that preserves color, etc., in order from the front
• The result is composited with transparency on the last layer
• RE4 uses 8 layers, so depth and color order is effectively guaranteed for up to 7 layers
Advantages
• Simple implementation
• Fixed memory size
Drawback
• Need to guarantee atomicity (originally used Pixel Sync (RasterOrderedView))
The points to be rasterized are translucent and have depth and color. If they are simply drawn in processing order, the result won't look very nice. Therefore, to represent translucency correctly, we implemented Multi-Layer Alpha Blending, which was published in 2014. It is a type of Order Independent Transparency and a simple algorithm: it keeps N buffers and uses the depth of the input pixel to store colors, etc., in order from the front, which amounts to a kind of sort. Fragments that fall outside the sorted layers are semi-transparently merged into the last layer.
Since RE4 uses 8 layers, up to 7 are guaranteed to be in depth order. The advantage of this method is its simple implementation, and it works with a fixed memory size. The tradeoff is that atomicity must be guaranteed. Originally, a feature called Rasterizer Ordered Views was used for this.
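To make the layer logic concrete, here is a sequential CPU reference of MLAB under a few assumptions: reverse-Z depth (larger is nearer), premultiplied color, and a transmittance value per layer. Fragments bubble through the N layers, and whatever falls off the end is merged under the last layer using the same formula as the compositing code later in this section. It ignores atomicity entirely, which is exactly what the GPU version has to add. Frag and MlabPixel are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Reverse-Z convention: larger depth = closer to the camera.
struct Frag {
    float depth = 0.0f;         // 0 = far plane, so empty layers sort last
    float r = 0, g = 0, b = 0;  // premultiplied color
    float t = 1.0f;             // transmittance (1 = fully transparent)
};

template <int N>
struct MlabPixel {
    std::array<Frag, N> layer{};  // kept in front-to-back order

    void insert(Frag f) {
        // Nearer fragments take the slot; the displaced one moves further back.
        for (int i = 0; i < N; ++i)
            if (f.depth > layer[i].depth) std::swap(f, layer[i]);
        // The fragment that fell off lies behind the last layer: merge it under.
        Frag& last = layer[N - 1];
        last.r += f.r * last.t;
        last.g += f.g * last.t;
        last.b += f.b * last.t;
        last.t *= f.t;
    }

    // Front-to-back resolve; writes color to rgb and returns total transmittance.
    float resolve(float rgb[3]) const {
        float t = 1.0f;
        rgb[0] = rgb[1] = rgb[2] = 0.0f;
        for (const Frag& f : layer) {
            rgb[0] += f.r * t; rgb[1] += f.g * t; rgb[2] += f.b * t;
            t *= f.t;
        }
        return t;
    }
};
```

With N = 2 and three fragments inserted out of order, the spilled fragment is the farthest one, so the result here equals an exact sort; once several fragments spill, their relative order inside the last layer is no longer guaranteed.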
MLAB Implementation
Implement the work buffer in GroupSharedMemory
• GroupSharedMemory has very high bandwidth
• Compresses and retains 5 elements (depth, transmittance, and RGB color) in 64 bits
• 16-bit depth + 16-bit Transmittance + 32-bit Color (R11G11B10Float)
Atomicity Assurance
• Switched to 64-bit InterlockedMax
• Semi-transparency compositing
• Implemented with CompareAndSwap and a loop
In this case, the work buffer is implemented in GroupSharedMemory. GroupSharedMemory has very high bandwidth on the console, so it runs faster than normal memory writes. Next, to guarantee atomicity, a total of 5 elements (depth, transmittance, and the three color channels) are compressed into 64 bits, so a single 64-bit InterlockedMax can update a whole layer. Translucent compositing is implemented with CompareAndSwap and a loop.
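A possible encoding of the 64-bit layer value is sketched below. Only the field widths come from the slide; the bit order (depth in the top bits so that a plain integer max compares depth first), the unorm quantization, and the truncating float-to-R11G11B10F conversion are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Truncating float -> unsigned small float (5-bit exponent, `mantBits` mantissa).
// Negatives and denormals become 0; overflow clamps to the largest finite value.
inline uint32_t toSmallFloat(float v, int mantBits) {
    if (!(v > 0.0f)) return 0;
    uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    const int exp = static_cast<int>((bits >> 23) & 0xff) - 127 + 15;  // rebias
    if (exp <= 0) return 0;
    if (exp >= 31) return (30u << mantBits) | ((1u << mantBits) - 1);
    return (static_cast<uint32_t>(exp) << mantBits) | ((bits & 0x7fffff) >> (23 - mantBits));
}

// R in bits 0-10, G in 11-21, B in 22-31 (as in DXGI R11G11B10_FLOAT).
inline uint32_t packR11G11B10F(float r, float g, float b) {
    return toSmallFloat(r, 6) | (toSmallFloat(g, 6) << 11) | (toSmallFloat(b, 5) << 22);
}

// Assumed layout: [63:48] depth, [47:32] transmittance, [31:0] color.
// With reverse-Z, a larger packed value means a nearer fragment.
inline uint64_t packFragment(float depth, float transmittance, uint32_t color) {
    const auto q16 = [](float x) {
        return static_cast<uint64_t>(std::lround(std::fmin(std::fmax(x, 0.0f), 1.0f) * 65535.0f));
    };
    return (q16(depth) << 48) | (q16(transmittance) << 32) | color;
}
```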
Atomicity Assurance
GPUs supporting 64-bit Interlocked
InterlockedMax or InterlockedMin (depending on the depth direction) completes the sort
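// Depth sits in the high bits, so the max keeps the nearer fragment (Reverse Depth).
for (int i = 0; i < OIT_LAYER_COUNT; i++) {
    InterlockedMax(image[x][y].layer[i].u64, fragment.u64, old_fragment.u64);
    if (compare(fragment, old_fragment))  // true if fragment replaced old_fragment in this layer
        fragment = old_fragment;          // carry the displaced fragment on to the next layer
}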
For atomicity assurance, simply use 64-bit Interlocked operations.
RE ENGINE uses Reverse Depth, so InterlockedMax is used to sort and exchange values so that higher values (nearer fragments) are placed in front.
It works as shown.
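On the CPU, the same insertion loop can be written with std::atomic. std::atomic<uint64_t> only gains fetch_max in C++26, so this sketch emulates InterlockedMax with a compare-exchange loop. The function names are hypothetical, and fragments are assumed to be packed with depth in the high bits, as described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Emulated InterlockedMax: stores max(dst, value) atomically, returns the old value.
inline uint64_t atomicFetchMax(std::atomic<uint64_t>& dst, uint64_t value) {
    uint64_t prev = dst.load();
    while (prev < value && !dst.compare_exchange_weak(prev, value)) {
        // On failure compare_exchange_weak reloads `prev`; retry while we still win.
    }
    return prev;
}

// Mirrors the slide's loop: each layer keeps the larger (nearer) packed value and
// the smaller one travels on. Returns the fragment that fell off the last layer.
template <std::size_t N>
uint64_t insertFragment(std::atomic<uint64_t> (&layers)[N], uint64_t fragment) {
    for (std::size_t i = 0; i < N; ++i) {
        const uint64_t old = atomicFetchMax(layers[i], fragment);
        if (fragment > old) fragment = old;  // we took the slot; carry the displaced value
    }
    return fragment;
}
```

Because every step is a single atomic max, concurrent inserts never lose a fragment; each one ends up either in a layer or as the returned spill.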
If 64-bit Atomic is Not Supported
GPUs supporting only 32-bit Interlocked
• Information is propagated by exchange, but consistency is not guaranteed
• It may not be correct, but it ensures that color information is not lost
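for (int i = 0; i < OIT_LAYER_COUNT; i++) {
    // Sort on the upper 32 bits (depth + transmittance) only.
    InterlockedMax(image[x][y].layer[i].u32.y, fragment.u32.y, old_fragment.u32.y);
    if (compare(fragment, old_fragment)) {
        // Swap the color half separately; another thread may touch it in between.
        InterlockedExchange(image[x][y].layer[i].u32.x, fragment.u32.x, old_fragment.u32.x);
        fragment = old_fragment;
    }
}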
Certain hardware may not support 64-bit atomics.
On GPUs that only support 32-bit Interlocked operations, the results are likely to look correct, though they will not be exact.
Our workaround is to first perform a 32-bit InterlockedMax.
If the previous value is smaller than the current fragment (that is, the fragment took the slot), InterlockedExchange is performed to swap in the new color.
There is a time lag between InterlockedMax and InterlockedExchange, so the result may be indeterminate, but we do the exchange to propagate the information.
Does this give us the correct result?
No, but it's better than no approach at all.
Semi-transparency Compositing
Implemented with loop and compare and swap
Performs simple compositing at the end of Multi-Layer Alpha Blending
float4 c1 = getMLABFragmentColor(fragment);
uint count = 0;
do{
Fragment f0 = imageLayer[x][y].layer[OIT_LAYER_COUNT - 1];
float4 c0 = getMLABFragmentColor(f0);
float4 mergedColor;
mergedColor.rgb = c0.rgb + c1.rgb * c0.a;
mergedColor.a = c0.a * c1.a;
Fragment v = setupMLABFragment(mergedColor, getMLABFragmentDepth(f0));
Fragment ret;
InterlockedCompareExchange(
imageLayer[x][y].layer[OIT_LAYER_COUNT - 1].u64,
f0.u64,
v.u64,
ret.u64);
count += (ret.u64 != f0.u64) ? 1 : 0xfff; //Exit if exchange is successful, loop if not
} while (count < 256); // Bounded retries: give up after 256 failed attempts
Now, the last step is to handle the fragments that fall outside the layers.
They are composited with transparency into the last Multi-Layer Alpha Blending layer using CompareAndSwap.
Beyond this last layer, the semi-transparent blend is more likely to be incorrect in some areas.
However, the result appears to be roughly right.
This is probably because hair is composed of roughly the same color.
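A C++ sketch of the same compare-and-swap retry loop follows. To keep it short it uses a hypothetical single-channel layout (depth in the upper 32 bits, unorm16 color and transmittance below) rather than the RGB packing used in RE4; the merge formula matches the mergedColor computation in the slide code.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical layout: [63:32] depth (kept), [31:16] color unorm16, [15:0] transmittance unorm16.
inline float unpackUnorm16(uint64_t v, int shift) {
    return static_cast<float>((v >> shift) & 0xffffu) / 65535.0f;
}
inline uint64_t packUnorm16(float x) {
    return static_cast<uint64_t>(std::lround(std::fmin(std::fmax(x, 0.0f), 1.0f) * 65535.0f));
}

// Incoming fragment lies behind the last layer: c = c0 + c1 * t0, t = t0 * t1.
inline uint64_t mergeUnder(uint64_t layer, uint64_t incoming) {
    const float c0 = unpackUnorm16(layer, 16), t0 = unpackUnorm16(layer, 0);
    const float c1 = unpackUnorm16(incoming, 16), t1 = unpackUnorm16(incoming, 0);
    return (layer & 0xffffffff00000000ull) | (packUnorm16(c0 + c1 * t0) << 16) |
           packUnorm16(t0 * t1);
}

// Retry until our merge is published on top of the latest value, or give up
// after maxAttempts failures (returns false), like the slide's bounded do/while.
template <typename Merge>
bool compositeIntoLastLayer(std::atomic<uint64_t>& last, uint64_t incoming,
                            Merge merge, int maxAttempts = 256) {
    uint64_t expected = last.load();
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (last.compare_exchange_strong(expected, merge(expected, incoming))) return true;
        // On failure, `expected` now holds the current value; recompute and retry.
    }
    return false;
}
```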
OIT 64-bit Interlocked
This is OIT made with 64-bit Interlocked. It looks good.
OIT 32-bit Interlocked
This is OIT made with 32-bit Interlocked. It looks mostly good.
Composite to Final Buffer
TAA is suppressed because translucency is already resolved
• Output Responsive Antialiasing, and output depth gated by a Transmittance threshold
• Calculate motion velocity from appropriate joints (head or neck) using depth
We'll now composite the result produced by the software rasterizer. The software rasterizer output is already anti-aliased. To avoid unwanted blurring from TAA, Transmittance is used as the Responsive AA mask; otherwise, excessive smearing occurs and the image looks unstable. Depth output is also switched on or off with a Transmittance threshold so that post-effects such as depth of field take the hair into account. The depth of the foremost hair is used as the output depth. Finally, there is motion blur support. The software rasterizer calculates motion velocity from specific joints only. It does not reflect the movement of individual hairs; instead, it uses the movement of the neck to save processing time (in fact, it would not be feasible otherwise).
Performance
Stage                 PS5 1920p @ CBR   PS5 2160p @ CBR
Setup                 0.478 ms          0.526 ms
Software Drawing      2.890 ms          2.986 ms
Hardware Rendering    0.285 ms          0.496 ms
Total                 3.653 ms          4.008 ms
Let's look at the rasterization-only time for a cutscene like the one on the right. With checkerboard rendering at 4K (2160p), rendering the hair takes about 4.0 ms. With framerate priority (1920p), it takes approximately 3.6 ms. Costs that scale with the number of characters, as well as simulation and lighting calculations, come on top of this. These are all GPU times; the CPU does no work here. The hair consumes a lot of GPU time relative to its screen area, but it looks good.
Optimization: LOD
Implement hair reduction in hair LOD
• Shuffle the order of hairs with random numbers and change the number of hairs displayed by the LOD percentage
• Scale hair thickness by the LOD percentage
• Reduce the amount of hair in-game outside cutscenes to prioritize gameplay
[Figure: hair LOD at 100%, 50%, 25%, 12.5%, and 6.25%]
Finally, optimization. As the previous example shows, hair is GPU-intensive. For optimization, we added a function that thickens the hair as the percentage of displayed hair is reduced. This keeps the look consistent while improving performance. During gameplay in RE4 on consoles, 70% of the hair is always removed, so only 30% is displayed, for performance reasons.
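One way to implement the shuffled LOD is sketched below, under assumptions: a fixed per-asset permutation from std::shuffle with an arbitrary seed, and a width scale of 1 / fraction so that total projected coverage stays roughly constant. The talk only says thickness is scaled by the LOD percentage, so the exact scale factor, like the names shuffledHairOrder, HairLod, and selectLod, is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Fixed random permutation built once per hair asset, so lowering the LOD
// removes hairs evenly across the groom instead of whole regions.
inline std::vector<uint32_t> shuffledHairOrder(uint32_t hairCount, uint32_t seed = 1234) {
    std::vector<uint32_t> order(hairCount);
    std::iota(order.begin(), order.end(), 0u);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

struct HairLod {
    uint32_t drawCount;  // draw order[0 .. drawCount)
    float widthScale;    // multiply every drawn strand's width by this
};

inline HairLod selectLod(uint32_t hairCount, float fraction) {
    // Keep at least one hair and never more than all of them.
    fraction = std::clamp(fraction, 1.0f / static_cast<float>(std::max(hairCount, 1u)), 1.0f);
    const auto drawCount = static_cast<uint32_t>(std::ceil(fraction * hairCount));
    return {drawCount, 1.0f / fraction};
}
```

With the in-game setting described above (70% of the hair removed), this sketch would draw the first 30% of the permuted order with widths scaled by 1 / 0.3.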
Summary and Future Goals
High quality hair rendering is now possible
• Anti-aliasing
• Even handles hair thickness
Future Goals
• Automated hair reduction
• Help to improve performance and appearance
• Improved anti-aliasing during hardware drawing
• Combine ConservativeRasterizer and GBAA and composite as OIT?
• Further acceleration of software rendering
• Somewhat fast, but there is room for algorithmic optimization
• Froxels are processed from the front, but we could divide them into N pieces, execute, and composite, etc.
• Ray tracing
• Currently shows a non-strand mesh
Now, to summarize. We can now represent anti-aliased hair using a software rasterizer, thick hair is also supported, and high quality shading has been added. Future tasks include improving performance and quality by automating hair reduction. OIT during hardware rendering is also under consideration, as well as other optimizations. We also plan to support ray tracing: ray tracing currently sees a non-strand mesh, so we are looking into representing strands in ray tracing with fewer hairs, as described in the LOD section.
References
• How Frostbite is Advancing the Future of Hair Rendering Technology (ea.com), 2020. https://www.ea.com/frostbite/news/the-future-of-hair-rendering-technology-in-frostbite
Hair Shading
• Strand-based Hair Rendering in Frostbite, Sebastian Tafuri, 2019
• Physically Based Hair Shading in Unreal, Brian Karis, 2016
Anti-aliased Line
• https://en.wikipedia.org/wiki/Xiaolin_Wu's_line_algorithm
Order Independent Transparency
• Multi-Layer Alpha Blending, Marco Salvi and Karthik Vaidyanathan, 2014. https://software.intel.com/content/dam/develop/external/us/en/documentsf/i3d14-mlab-preprint.pdf
• Practical Order Independent Transparency, Johan Köhler, Treyarch, 2016. https://research.activision.com/content/dam/atvi/activision/atvi-touchui/research/tech-reports/docs/PracticalOIT.pdf
This concludes Part I. We will continue with Part II, Strand Lighting.
Strand Lighting
I will now introduce Strand Lighting.
Agenda
・Strand Lighting Final Results
・Multiple Scattering
・Dual Scattering
・Indirect Lighting
・Shadows
・Shader Graph
・Processing Load Measurement
・Future Challenges and Prospects
The agenda is shown here.
Strand Lighting Final Results
First, I would like to show you the final result of Strand Lighting before going into detail. This is part of a cutscene from RE4, where Strand Lighting was applied to Leon's and Ashley's hair. Strand Lighting improves the quality of the hair, giving it realistic saturation, depth, and volume.
Multiple Scattering
Light is scattered countless times while passing through multiple hairs
・Light hair color results from scattering
・Need to calculate the contribution of countless different light paths
・Difficult to achieve in real time
Let's begin with the fundamental concept of hair shading: multiple scattering. The figure on the right shows light from the sun entering and being scattered by vertical strands of brown hair. The light undergoes complex multiple scattering, being scattered countless times as it passes through the hair before reaching the eye. This effect is important for a realistic sense of depth and volume, and it matters most for light-colored hair, which looks light precisely because of this scattering. However, achieving it requires countless scattering calculations for every hair, which is difficult, especially for games that require real-time performance.
Multiple Scattering
Comparison with and without multiple scattering
[Figure: No multiple scattering / With multiple scattering]
Here is a comparison of RE4's Leon with and without multiple scattering. The left image, without multiple scattering, does not express the bright hair color, and the depth and volume of the hair look unnatural. On the right, with multiple scattering, the bright hair color is expressed with realistic saturation, depth, and volume.
Dual Scattering
Approximate multiple scattering
・Global Multiple Scattering (Ψ^G)
$$\Psi^G(x,\omega_d,\omega_i) \approx T_f(x,\omega_d)\,S_f(x,\omega_d,\omega_i)$$
$$S_f(x,\omega_d,\omega_i) = \frac{g\left(\theta_h,\ \sigma_f^2(x,\omega_d)\right)}{\pi\cos\theta_d}$$
$$T_f(x,\omega_d) \approx d_f\,a_f(\theta_d)^n, \qquad \sigma_f^2(x,\omega_d) \approx \beta_f^2(\theta_d)\,n$$
・Local Multiple Scattering (Ψ^L)
$$\Psi^L(x,\omega_d,\omega_i)\,f_s(\omega_i,\omega_o) \approx d_b\,f_{back}(\omega_i,\omega_o)$$
$$f_{back}(\omega_i,\omega_o) = \frac{2A_b(\theta)\,g\left(\theta_h-\Delta_b(\theta),\ \sigma_b^2(\theta)\right)}{\pi\cos^2\theta_d}$$
References:
・Dual Scattering Approximation for Fast Multiple Scattering in Hair
・Efficient Implementation of the Dual Scattering Model in RenderMan
[Figure: Global multiple scattering and local multiple scattering along a light path to the eye]
I will now explain the implementation of multiple scattering. As mentioned earlier, multiple scattering is computationally expensive and difficult to implement in a game, so an approximation is necessary. In RE4, multiple scattering is implemented with an approximation called Dual Scattering, which simplifies this complex physical phenomenon by dividing it into two components: Global Multiple Scattering and Local Multiple Scattering. Global Multiple Scattering approximates the scattering from the point where light enters the hair up to the shading point, as shown in the blue box in the figure on the right.
By treating only forward scattering, which contributes the most, and by assuming that the hairs along the path all point in the same direction, much of the calculation can be precomputed. The equation in red is the approximation; everything except the n term (the number of hairs between the light source and the shading point) can be precomputed. Local Multiple Scattering, on the other hand, approximates the scattering in the vicinity of the shading point, as shown in the teal oval in the figure on the right. This term has a significant effect on hair color, especially for light-colored hair. Since Global Multiple Scattering handles forward scattering, this term must include at least one backward scattering event. For simplicity, it can also be precomputed by assuming that the shading point is surrounded by hair oriented in the same direction. If you are interested in these Dual Scattering techniques, please see the included references.
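The Global Multiple Scattering term can be evaluated per color channel once n is known. The sketch below follows the approximations on the slide (T_f ≈ d_f a_f(θ_d)^n, σ_f² ≈ β_f²(θ_d) n, S_f = g(θ_h, σ_f²) / (π cos θ_d)) with g a unit-area Gaussian. a_f and β_f² are passed in as plain values that would normally come from precomputed tables, and the n = 0 early-out is an assumption.

```cpp
#include <cassert>
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Unit-area Gaussian g(x, variance).
inline float gaussian(float x, float variance) {
    return std::exp(-x * x / (2.0f * variance)) / std::sqrt(2.0f * kPi * variance);
}

// Global multiple scattering, one channel:
//   T_f       ~= d_f * a_f(theta_d)^n        forward transmittance through n hairs
//   sigma_f^2 ~= beta_f^2(theta_d) * n       accumulated forward spread
//   S_f        = g(theta_h, sigma_f^2) / (pi * cos(theta_d))
// n <= 0 means no hair in front of the point; this sketch leaves that case
// to the single-scattering (direct) path and returns 0.
inline float globalMultipleScattering(float n, float thetaH, float thetaD,
                                      float af, float betaF2, float df) {
    if (n <= 0.0f) return 0.0f;
    const float Tf = df * std::pow(af, n);
    const float sigmaF2 = betaF2 * n;
    const float Sf = gaussian(thetaH, sigmaF2) / (kPi * std::cos(thetaD));
    return Tf * Sf;
}
```

Only n changes per shading point, which is why the talk's Forward Scattering Maps only need to deliver a hair count.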
Dual Scattering
Number of hairs: the n term in the Global Multiple Scattering calculation
・Utilize Forward Scattering Maps
- Voxel resolution is 128 x 128 x 128
- Ray-march the voxels from the shading point to the light source
→ Thanks to the approximate formula, Tf and σf² need not be stored in the voxels; only hair opacity is drawn
→ No need to create a Deep Opacity Map for each light source
Head and body scattering occlusion
・Occlude between the GBuffer depth and a depth offset (e.g., 30 cm)
Lighting calculations are per hair strand vertex
Next is the number n of hairs between the shading point and the light source, which appears in the Global Multiple Scattering calculation described earlier. We use Forward Scattering Maps, as described in the reference paper. However, Tf and σf² are not kept in the voxels; only the hair opacity is drawn into them. The opacity is normalized from 0 to 1 as the proportion of the voxel occupied by the hair's thickness and is drawn additively. The voxel grid is 128x128x128. When shading, the voxels from the shading point to the light source are ray-marched, and the total hair opacity is obtained and converted to a number of hairs.
We chose this method over Deep Opacity Maps for two reasons. First, it eliminates the need to draw a shadow map for each light source and then build multiple Deep Opacity Map layers, which reduces geometry drawing cost. Second, Tf and σf² need not be stored; only hair opacity is drawn, which makes the data easy to handle and memory-efficient. Head and body scattering occlusion is handled by treating the region between the GBuffer depth and the depth offset as the scattering occlusion area. Lighting calculations are performed per hair strand segment vertex.
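Estimating n from the opacity volume can be sketched as a fixed-step march of about one voxel per step toward the light. Assumptions: positions are already in voxel units of the 128³ grid, lookups use the nearest voxel, the shading point's own voxel is skipped, and the accumulated normalized opacity is used directly as the hair count (the talk does not spell out the conversion). OpacityVolume and hairCountToLight are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int kRes = 128;  // 128 x 128 x 128, as in the talk

// Additively splatted hair opacity (0..1 per hair crossing a voxel).
struct OpacityVolume {
    std::vector<float> opacity = std::vector<float>(kRes * kRes * kRes, 0.0f);
    static std::size_t index(int x, int y, int z) {
        return (static_cast<std::size_t>(z) * kRes + y) * kRes + x;
    }
    float at(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= kRes || y >= kRes || z >= kRes) return 0.0f;
        return opacity[index(x, y, z)];
    }
};

// Approximate number of hairs between shading point p and the light.
inline float hairCountToLight(const OpacityVolume& vol, const float p[3], const float light[3]) {
    const float d[3] = {light[0] - p[0], light[1] - p[1], light[2] - p[2]};
    const float len = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    const int steps = static_cast<int>(len);  // about one sample per voxel
    float n = 0.0f;
    for (int i = 1; i <= steps; ++i) {       // i = 0 would be the point's own voxel
        const float t = static_cast<float>(i) / len;
        n += vol.at(static_cast<int>(std::floor(p[0] + d[0] * t)),
                    static_cast<int>(std::floor(p[1] + d[1] * t)),
                    static_cast<int>(std::floor(p[2] + d[2] * t)));
    }
    return n;
}
```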
Dual Scattering
Scattering occlusion comparison
[Figure: No occlusion / With occlusion]
Here is a comparison of scattering occlusion. The left image is without occlusion, and the right image is occluded using GBuffer depth. You can see that the lighting of the hair is occluded by the head and body.
Dual Scattering
Comparison with and without Dual Scattering
[Figure: Single Scattering / Dual Scattering / Pseudo Scattering (geometry)]
Next, I will show how the result changes with and without Dual Scattering. The left image uses only single scattering, and the middle image uses dual scattering. The right image shows polygonal hair with pseudo anisotropic specular and rim lighting. In the left image, since no multiple scattering is calculated, the hair looks flat overall with no attenuation due to scattering, and its color does not come through. In the middle image, dual scattering shows more attenuation due to scattering than the other two and expresses realistic saturation, depth, and volume. I think it gives a smooth and natural look.
Indirect Lighting
Compatible with Light Probes, Local Cubemaps, and IBL
Radiance is evaluated in two directions based on the camera viewpoint
・Normal + Binormal direction
・Normal - Binormal direction
[Figure: Tangent, Normal, and Binormal at a strand, with the two radiance directions]
Next, I will introduce how indirect lighting is handled. Indirect lighting includes Light Probes, Local Cubemaps, and IBL. Calculating their effect from all directions in real time with multiple scattering is difficult, so we approximate it. The indirect light direction is determined from the camera's point of view. In the figure on the right, the strand hair direction is the Tangent, the camera direction is the Normal, and the Binormal is the cross product of the Tangent and Normal vectors. Multiple scattering lighting is evaluated in two directions: Normal + Binormal and Normal - Binormal. That's the whole calculation.
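The two indirect lighting directions can be computed as below. Orthogonalizing the camera direction against the strand tangent (Gram-Schmidt) is an assumption made so that the Binormal comes out unit length; the tangent is assumed normalized, and Vec3 and indirectDirections are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 v) { return scale(v, 1.0f / std::sqrt(dot(v, v))); }

// Normal = camera direction made perpendicular to the strand,
// Binormal = cross(Tangent, Normal); outputs normalize(N + B) and normalize(N - B).
inline void indirectDirections(Vec3 tangent, Vec3 toCamera, Vec3& dirA, Vec3& dirB) {
    const Vec3 n = normalize(sub(toCamera, scale(tangent, dot(toCamera, tangent))));
    const Vec3 b = cross(tangent, n);
    dirA = normalize(add(n, b));
    dirB = normalize(sub(n, b));
}
```

Each direction would then be used to sample the light probe, cubemap, or IBL radiance that feeds the multiple scattering evaluation.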
Shadows
Shadows falling on strand hair
・Issue: because lighting is per vertex, pixel-dithered half-shadows are visible
・Solution: Exponential Shadow Maps
- Half-shadows are cast smoothly on the hair
Next, I'll talk about how shadows are handled for strand hair. Initially, shadows falling on strand hair were computed with the screen-space pixel dither method, as with opaque objects. But because lighting was per vertex, even the slightest movement of the viewer or the hair caused violent flicker in the half-shadowed areas. To solve this, Exponential Shadow Maps were used to smooth out the shadow gradient. They were also beneficial in terms of load, as the shadow map is fetched only once per strand segment vertex.
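Exponential Shadow Maps store exp(c · z) of the occluder depth and compare it against the receiver with a single fetch, giving a smooth falloff instead of a binary test. A minimal sketch, assuming linear depth in [0, 1] (larger is farther from the light) and a hypothetical sharpness c = 80; the function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr float kEsmC = 80.0f;  // hypothetical sharpness constant

// Value written to (and possibly blurred in) the shadow map.
inline float esmStore(float occluderDepth) { return std::exp(kEsmC * occluderDepth); }

// One fetch per vertex: exp(c * occluder) * exp(-c * receiver), clamped to [0, 1].
// Receivers in front of the occluder get 1; behind it, visibility falls off smoothly.
inline float esmVisibility(float storedMoment, float receiverDepth) {
    return std::clamp(storedMoment * std::exp(-kEsmC * receiverDepth), 0.0f, 1.0f);
}
```

Larger c sharpens the falloff, but exp(c) overflows 32-bit floats beyond roughly c = 88, which is one reason c is kept moderate.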
Shadows
Shadows cast by strand hair
・Strand hair is drawn as lines into the shadow map
・Not all hairs are drawn (default 20%)
Next are the shadows cast by strand hair. In the shader, the segment vertices of the strand hairs are drawn as lines into the shadow map. Since strand hairs cover few pixels in the shadow map and do not need much precision, the number of lines is reduced to minimize the processing load. By default, 20% of all strands are drawn into the shadow map.
Shader Graph
Shader graph for strand hair, per vertex segment
・Hair color is specified in BaseColor
・BaseColor is converted to Absorption
・Normals and roughness are editable
Reference: A Practical and Controllable Hair and Fur Model for Production Path Tracing
This is an introduction to the material shaders used for shading strand hair. As with other shaders, strand hair has material shader support that artists can work with. The shader can be edited per strand hair vertex segment. BaseColor is used as the hair color input; internally, it is converted to Absorption and used in the multiple scattering calculation. If you are interested in the details of this conversion, please refer to the reference shown. Normals and various roughness values are also handled as main input parameters.
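For reference, the color-to-absorption fit published in the cited Chiang et al. 2016 paper (also used by PBRT-v3) can be written as below. The slides do not show RE4's exact conversion, so treat this as the paper's mapping rather than the engine's; `absorptionFromColor` is a hypothetical name, it works on one channel at a time (apply it to R, G, and B of BaseColor), and `betaN` is the azimuthal roughness in [0, 1].

```cpp
#include <cassert>
#include <cmath>

// Absorption coefficient sigma_a for one color channel, from a desired hair
// color c in (0, 1] and azimuthal roughness betaN in [0, 1], using the
// polynomial fit from Chiang et al. 2016.
float absorptionFromColor(float c, float betaN) {
    assert(c > 0.0f && c <= 1.0f);  // log(0) is undefined
    const float b = betaN;
    const float denom = 5.969f - 0.215f * b + 2.532f * b * b
                      - 10.73f * b * b * b + 5.574f * b * b * b * b
                      + 0.245f * b * b * b * b * b;
    const float x = std::log(c) / denom;
    return x * x;
}
```

A white channel (c = 1) gives zero absorption, and darker colors give larger absorption; at βn = 0.3, for example, c = 0.5 maps to about 0.0139.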
Processing Load Measurement
Conditions: 10 spotlights; ambient light from two directions based on the camera direction
PS5 GPU processing load (lighting processing only), Dual Scattering
Strands: 23,115
Segment vertices: 266,937
No shadow map: approx. 1.32 ms
With shadow map: approx. 2.25 ms
Here is the processing load of Dual Scattering. The measurement conditions were 10 spotlights and ambient lighting from the two camera-based directions I mentioned earlier, measured on PS5 for lighting only. There are about 23,000 strands and about 270,000 segment vertices. The GPU processing load is about 1.32 ms without a shadow map and about 2.25 ms with one.
Future Challenges and Prospects
・Seamless increase/decrease of hair segment vertices used for lighting
・Ray-traced ambient lighting support
・Acceleration Structure support for ray tracing
・Contact shadow support
・Improved quality and optimization of indirect lighting
I will discuss the main issues and prospects for the future. The first is seamlessly increasing and decreasing the number of hair segment vertices used for lighting. We would like to prepare the segment vertices used for lighting strand hair separately and make it possible to change their number seamlessly, to balance lighting quality against load. We believe this technique will also be useful for LOD. Second, we would like to support ray-traced ambient lighting and apply it to strand hair. The third is Acceleration Structure support for ray tracing, which would let strand hair be taken into account by ray tracing when ambient light is computed.
The fourth is support for contact shadows. Currently there are no fine shadows between the hair and the face, so the hair lacks a sense of grounding on the face. The fifth concerns the approximation explained in the indirect lighting section: its quality is still poor in some areas, and we hope to improve and optimize it further.
References
・Dual Scattering Approximation for Fast Multiple Scattering in Hair, Arno Zinke, 2008
  http://www.cemyuksel.com/research/dualscattering/
・Efficient Implementation of the Dual Scattering Model in RenderMan, Iman Sadeghi, 2010
  https://media.disneyanimation.com/uploads/production/publication_asset/24/asset/2_DualScatteringImplementation.pdf
・A Practical and Controllable Hair and Fur Model for Production Path Tracing, Matt Jen-Yuan Chiang, 2016
  https://media.disneyanimation.com/uploads/production/publication_asset/152/asset/eurographics2016Fur_Smaller.pdf
・Physically Based Hair Shading in Unreal, Brian Karis, 2016
・Strand-based Hair Rendering in Frostbite, Sebastian Tafuri, 2019
Thank you for your attention.
Extra: Accuracy
・RGB111110Float is used for OIT colors
・RGB999e5 gives better fidelity
・Not used yet, due to additional processing cost and only minor differences in appearance
As an aside, we use RGB111110Float for OIT colors. Fidelity is better with RGB999e5, in which the three channels share a single exponent. Because each channel then has a 9-bit mantissa, compared with 6 bits for red and green and 5 bits for blue in RGB111110Float, a more accurate result can be expected. However, we don't currently use it, because the calculation cost is higher and the visual difference is not noticeable except in A/B comparisons.
[Figure: Extra: Accuracy, RGB111110Float (FHD)]
[Figure: Extra: Accuracy, RGB999e5 (FHD)]
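To show where RGB999e5's extra precision and extra calculation cost come from, here is a CPU-side sketch of shared-exponent packing. It follows the encoding described in the OpenGL EXT_texture_shared_exponent specification (9-bit mantissas in bits 0-26, 5-bit shared exponent with bias 15 in bits 27-31, the layout used by DXGI_FORMAT_R9G9B9E5_SHAREDEXP); it is not the RE4 OIT code, and the function names are hypothetical. The per-pixel max, log2, and rounding fix-up are the work that RGB111110Float does not need.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kMantissaBits = 9;   // mantissa bits per channel
constexpr int kExpBias = 15;       // bias of the shared 5-bit exponent
constexpr int kMaxBiasedExp = 31;  // largest 5-bit exponent value

// Clamp a channel to the representable range [0, (511/512) * 2^16].
float clampChannel(float x) {
    const float maxValue = std::ldexp(511.0f / 512.0f, kMaxBiasedExp - kExpBias);
    return std::min(std::max(x, 0.0f), maxValue);
}

uint32_t packRGB9E5(float r, float g, float b) {
    r = clampChannel(r);
    g = clampChannel(g);
    b = clampChannel(b);
    const float maxc = std::max(r, std::max(g, b));
    // Shared exponent from the largest channel (floor(log2) guarded for zero).
    int sharedExp = -kExpBias - 1;
    if (maxc > 0.0f)
        sharedExp = std::max(sharedExp, static_cast<int>(std::floor(std::log2(maxc))));
    sharedExp += 1 + kExpBias;
    float scale = std::ldexp(1.0f, sharedExp - kExpBias - kMantissaBits);
    // If rounding the largest channel reaches 512, it no longer fits in 9 bits.
    if (static_cast<int>(std::floor(maxc / scale + 0.5f)) == (1 << kMantissaBits)) {
        ++sharedExp;
        scale *= 2.0f;
    }
    assert(sharedExp >= 0 && sharedExp <= kMaxBiasedExp);
    auto quantize = [scale](float c) {
        return static_cast<uint32_t>(std::floor(c / scale + 0.5f));
    };
    return quantize(r) | (quantize(g) << 9) | (quantize(b) << 18) |
           (static_cast<uint32_t>(sharedExp) << 27);
}

void unpackRGB9E5(uint32_t v, float& r, float& g, float& b) {
    const int sharedExp = static_cast<int>(v >> 27);
    const float scale = std::ldexp(1.0f, sharedExp - kExpBias - kMantissaBits);
    r = static_cast<float>(v & 0x1FFu) * scale;
    g = static_cast<float>((v >> 9) & 0x1FFu) * scale;
    b = static_cast<float>((v >> 18) & 0x1FFu) * scale;
}
```

Because all three channels share the largest channel's exponent, a dim channel next to a bright one loses relative precision; that trade-off is why shared-exponent formats suit colors whose channels are of similar magnitude.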
Extra: Division
・The number of divisions is determined from the projected length of the line segment
・Division uses Catmull-Rom (not implemented for RE4's release)
This is a prototype in which the number of divisions is determined from the projected length of each line segment. Dividing with Catmull-Rom produces a curve that passes through all of the original vertices. The division decision can also be made during the classification stage, so that the density of shading points can be decoupled from geometric fineness. This feature wasn't implemented in the RE4 release. If the Manhattan distance between the two projected endpoints of a hair segment exceeds 8, the segment is divided. Because Catmull-Rom also provides derivatives, smooth, continuous lines can be produced even with a hardware rasterizer.
Reference: Centripetal Catmull–Rom spline, Wikipedia, 2023
https://en.wikipedia.org/wiki/Centripetal_Catmull%E2%80%93Rom_spline
[Figure: Extra: Division, with Catmull-Rom]
[Figure: Extra: Division, without Catmull-Rom]
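The division step can be sketched in two parts: decide how many sub-segments a projected segment needs, then evaluate a centripetal Catmull-Rom curve (the variant in the cited reference, alpha = 0.5, Barry-Goldman formulation) between its two inner control points. The talk only says a segment is divided when its projected Manhattan length exceeds 8; the "one sub-segment per 8 units" count below is an assumption, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// wa * a + wb * b
Vec3 blend(Vec3 a, float wa, Vec3 b, float wb) {
    return {a.x * wa + b.x * wb, a.y * wa + b.y * wb, a.z * wa + b.z * wb};
}

// Hypothetical rule: no division up to a projected Manhattan length of
// 'threshold', otherwise one sub-segment per 'threshold' units.
int divisionCount(float x0, float y0, float x1, float y1, float threshold = 8.0f) {
    assert(threshold > 0.0f);
    const float manhattan = std::fabs(x1 - x0) + std::fabs(y1 - y0);
    if (manhattan <= threshold) return 1;
    return static_cast<int>(std::ceil(manhattan / threshold));
}

// Next knot: t + |b - a|^0.5 (centripetal parameterization). The epsilon
// guards against coincident control points.
float nextKnot(float t, Vec3 a, Vec3 b) {
    const float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return t + std::max(std::sqrt(dist), 1e-6f);
}

// Point on the segment p1..p2 for u in [0, 1]; p0 and p3 are the neighbors.
Vec3 centripetalCatmullRom(Vec3 p0, Vec3 p1, Vec3 p2, Vec3 p3, float u) {
    const float t0 = 0.0f;
    const float t1 = nextKnot(t0, p0, p1);
    const float t2 = nextKnot(t1, p1, p2);
    const float t3 = nextKnot(t2, p2, p3);
    const float t = t1 + (t2 - t1) * u;
    const Vec3 a1 = blend(p0, (t1 - t) / (t1 - t0), p1, (t - t0) / (t1 - t0));
    const Vec3 a2 = blend(p1, (t2 - t) / (t2 - t1), p2, (t - t1) / (t2 - t1));
    const Vec3 a3 = blend(p2, (t3 - t) / (t3 - t2), p3, (t - t2) / (t3 - t2));
    const Vec3 b1 = blend(a1, (t2 - t) / (t2 - t0), a2, (t - t0) / (t2 - t0));
    const Vec3 b2 = blend(a2, (t3 - t) / (t3 - t1), a3, (t - t1) / (t3 - t1));
    return blend(b1, (t2 - t) / (t2 - t1), b2, (t - t1) / (t2 - t1));
}
```

A segment would be emitted as `divisionCount(...)` lines whose endpoints are `centripetalCatmullRom(..., k / count)` for k = 0..count. Since u = 0 and u = 1 return p1 and p2, the subdivided curve still passes through every original hair vertex.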
Extra: IHV Characteristics
Considering loading GroupShared memory with Interlocked for safety
Possibility of GroupShared memory load results being cached
• Occurred when implementing without GroupMemoryBarrierWithGroupSync
• Particular caution needed in the CompareAndSwap loop
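float4 c1 = getMLABFragmentColor(fragment);
uint count = 0;
do {
    // Plain load: on some GPUs the result may come from a stale cache.
    // Fragment f0 = imageLayer[x][y].layer[OIT_LAYER_COUNT - 1];
    // Atomic load instead: InterlockedMax with 0 leaves the unsigned value
    // unchanged and returns its current contents in f0.u64.
    Fragment f0;
    InterlockedMax(imageLayer[x][y].layer[OIT_LAYER_COUNT - 1].u64, 0, f0.u64);
    float4 c0 = getMLABFragmentColor(f0);
    // Merge the new fragment (c1) behind the stored one (c0); alpha acts as transmittance.
    float4 mergedColor;
    mergedColor.rgb = c0.rgb + c1.rgb * c0.a;
    mergedColor.a = c0.a * c1.a;
    // ... (the rest of the CompareAndSwap loop is not shown on the slide)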
When aiming to support as many GPUs as possible, it's worth considering loading GroupShared memory with Interlocked instructions.
RE4 was tested on various GPUs, and we found that in some cases GroupShared memory load results may be cached.
We had to be especially careful when reading it inside the CompareAndSwap loop that swaps colors.
However, this depends on the characteristics of specific GPUs, so treat it as a possibility to keep in mind.
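The read-merge-CompareAndSwap pattern can be illustrated on the CPU with `std::atomic`. This is an analogue, not RE4's HLSL: `Frag`, `packFrag`, `unpackFrag`, `mergeBehind`, and `mergeIntoLastLayer` are hypothetical, and a single luminance value stands in for RGB. In C++ the read at the top of the loop has to be atomic anyway (a plain read would be a data race), which mirrors the slide's advice to read with an Interlocked operation rather than a plain GroupShared load.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit fragment: premultiplied luminance and transmittance.
struct Frag { float lum; float trans; };
static_assert(sizeof(Frag) == sizeof(uint64_t), "Frag must fit in 64 bits");

uint64_t packFrag(Frag f) { uint64_t v; std::memcpy(&v, &f, sizeof v); return v; }
Frag unpackFrag(uint64_t v) { Frag f; std::memcpy(&f, &v, sizeof f); return f; }

// Same merge as the slide: the incoming fragment lies behind the stored one.
Frag mergeBehind(Frag front, Frag back) {
    return {front.lum + back.lum * front.trans, front.trans * back.trans};
}

void mergeIntoLastLayer(std::atomic<uint64_t>& slot, Frag incoming) {
    uint64_t expected = slot.load();  // atomic read of the current contents
    for (;;) {
        const Frag merged = mergeBehind(unpackFrag(expected), incoming);
        // Publish only if the slot is unchanged since the read; on failure,
        // 'expected' is refreshed with the current value and we retry.
        if (slot.compare_exchange_weak(expected, packFrag(merged))) return;
    }
}
```

The important detail is that a failed exchange feeds the freshly observed value into the next iteration. If that value came from a stale cached load instead, the loop could keep merging against outdated data, which is the kind of problem the slide warns about.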
Extra: Indirect Lighting
・Use the dominant directions among multiple directions
・Up to 8 directions; more directions raise both quality and processing load
Here's another method we considered for indirect lighting. It examines the ambient light all around the strand from the strand's center, selects the dominant light directions (from 2 up to 8), and performs the multi-scattering light source calculation for each of those directions. However, in some situations this method needs 6 or more iterations to reach the desired quality, so RE4 does not use it.
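The selection step can be sketched as picking the k brightest of a set of ambient samples taken around the strand. How the samples are placed around the strand is not specified in the talk, so the sketch leaves that to the caller. `dominantDirections` is a hypothetical name, and clamping k to the slide's 2 to 8 range is this example's choice.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Given luminance samples of ambient light in directions around a strand,
// return the indices of the k brightest directions, brightest first.
// k is clamped to [2, 8] (the range on the slide) and to the sample count.
std::vector<int> dominantDirections(const std::vector<float>& luminance, int k) {
    assert(!luminance.empty());
    k = std::max(2, std::min(k, 8));
    k = std::min(k, static_cast<int>(luminance.size()));
    std::vector<int> idx(luminance.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return luminance[a] > luminance[b]; });
    idx.resize(k);
    return idx;
}
```

Each returned index would then drive one multi-scattering light evaluation, so the cost grows linearly with k, which matches the slide's note that more directions raise both quality and load.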
Extra: Other Optimization Methods
・Using the guide hairs meant for simulation for lighting
[Figure: Guide Hair Lighting vs. Normal Hair Lighting]
Here are some other optimization methods. Since lighting is computed per vertex, the load grows with the number of vertices. To reduce it, one method performs lighting on the guide hairs used for simulation and transfers the results to the strand hairs used for rendering. This method was not used in RE4 because the shading is a bit flat, but we believe it will be useful in the future because it reduces the processing load considerably. (We will show the benchmarks later.)
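One way the lighting could be transferred from guide hairs to rendered strands is sketched below. It rests on an assumption the talk does not confirm: each rendered strand is bound to one or more guides with blend weights (as is common for simulation-driven hair), and each of its vertices takes the weighted guide lighting at the same normalized position along the strand. All names are hypothetical, and scalar lighting stands in for RGB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Lighting computed per vertex of one guide hair (scalar for brevity).
using GuideLighting = std::vector<float>;

// Guide lighting at normalized position s in [0, 1], linearly interpolated.
float sampleAlong(const GuideLighting& g, float s) {
    assert(g.size() >= 2);
    const float x = s * static_cast<float>(g.size() - 1);
    const std::size_t i = std::min<std::size_t>(static_cast<std::size_t>(x), g.size() - 2);
    const float f = x - static_cast<float>(i);
    return g[i] * (1.0f - f) + g[i + 1] * f;
}

// Per-vertex lighting for a rendered strand with 'vertexCount' vertices,
// blended from its guides with 'weights' (assumed to sum to 1).
std::vector<float> transferLighting(const std::vector<GuideLighting>& guides,
                                    const std::vector<float>& weights,
                                    std::size_t vertexCount) {
    assert(guides.size() == weights.size() && vertexCount >= 2);
    std::vector<float> out(vertexCount, 0.0f);
    for (std::size_t v = 0; v < vertexCount; ++v) {
        const float s = static_cast<float>(v) / static_cast<float>(vertexCount - 1);
        for (std::size_t k = 0; k < guides.size(); ++k)
            out[v] += weights[k] * sampleAlong(guides[k], s);
    }
    return out;
}
```

Because every rendered vertex only interpolates precomputed guide values, the lighting cost scales with the guide vertex count. The interpolation is also why the result can look flatter than lighting every strand individually.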
Extra: Other Optimization Methods
LOD that increases/decreases the number of strand hairs drawn
・Calculate the percentage of the screen covered by the strand hair's AABB (ScreenRate, where 1.0 is the entire screen)
・Calculate the LOD value as a ratio from the maximum (RateMax) and minimum (RateMin) area percentages and the minimum LOD value (MinLOD):
LODLevel = (ScreenRate - RateMin) / (RateMax - RateMin) + MinLOD
[Figure: LOD value 0.1]
The second method automatically increases or decreases the number of strand hairs according to their on-screen display area. First, the percentage of the screen covered by the strand hairs' AABB is calculated; the LOD value is then computed with the formula above from the maximum and minimum display-area values and the minimum LOD value. This draws more strand hairs when the hair appears large on screen and fewer when it appears small, reducing the overall processing load of strand hair. In addition, when hair is drawn into voxels as explained earlier, the opacity is increased to compensate for strands removed by LOD, so that the overall opacity does not change with LOD. This feature came too late to be added to RE4.
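The LOD steps can be sketched as below. The `lodLevel` formula is the one on the slide. Three parts are assumptions of this example: measuring ScreenRate from the AABB's clipped extent in normalized device coordinates, clamping the LOD to [MinLOD, 1], and the transmittance-preserving opacity compensation (the talk says only that opacity is increased). The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// ScreenRate: fraction of the screen covered by the projected AABB, given its
// extent in normalized device coordinates ([-1, 1] per axis, full area 4).
float screenRate(float minX, float minY, float maxX, float maxY) {
    const float w = std::max(0.0f, std::min(maxX, 1.0f) - std::max(minX, -1.0f));
    const float h = std::max(0.0f, std::min(maxY, 1.0f) - std::max(minY, -1.0f));
    return (w * h) / 4.0f;
}

// LODLevel = (ScreenRate - RateMin) / (RateMax - RateMin) + MinLOD (slide),
// clamped to [MinLOD, 1] (assumption).
float lodLevel(float rate, float rateMin, float rateMax, float minLod) {
    assert(rateMax > rateMin);
    const float lod = (rate - rateMin) / (rateMax - rateMin) + minLod;
    return std::min(1.0f, std::max(minLod, lod));
}

// Opacity per drawn strand when only a fraction 'lod' of strands is drawn,
// chosen so that (1 - alpha')^(lod * N) == (1 - alpha)^N for any N, i.e. the
// combined transmittance of overlapping strands is preserved (assumed model).
float compensatedOpacity(float alpha, float lod) {
    assert(lod > 0.0f && lod <= 1.0f);
    return 1.0f - std::pow(1.0f - alpha, 1.0f / lod);
}
```

For example, with RateMin = 0.05, RateMax = 0.5, and MinLOD = 0.1, hair covering 27.5% of the screen gets an LOD of 0.6, and halving the strand count raises a per-strand opacity of 0.3 to 0.51.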
Extra: Processing Load Measurement
Conditions: 10 spotlights; ambient light from two directions based on the camera direction
PS5 GPU processing load (lighting processing only)
                   Dual Scattering    Dual Scattering (guide hair)
Strands            23,115             1,420
Segment vertices   266,937            17,614
No shadow map      approx. 1.32 ms    approx. 0.12 ms
With shadow map    approx. 2.25 ms    approx. 0.18 ms
This is a comparison of the lighting processing load with and without guide hair. The measurement conditions were 10 spotlights and ambient light from two directions based on the camera direction, and all non-lighting processing was disabled. On the left is Dual Scattering and on the right is Dual Scattering with guide hair. For lighting without guide hair on the left, there are about 23,000 strands and about 270,000 segment vertices, resulting in a GPU processing load of about 1.32 ms. For lighting with guide hair on the right, there are about 1,400 strands and about 18,000 segment vertices, resulting in a GPU processing load of about 0.12 ms. Lighting with guide hair costs about 1/10 as much as lighting without it.
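The ratios behind "about 1/10" can be checked directly from the table. This is plain arithmetic on the approximate figures above; the struct and function names are illustrative only.

```cpp
#include <cassert>

// Figures from the measurement table (PS5, lighting only, approximate).
struct Measurement { double strands, segmentVertices, msNoShadow, msWithShadow; };

constexpr Measurement kFull = {23115, 266937, 1.32, 2.25};
constexpr Measurement kGuide = {1420, 17614, 0.12, 0.18};

// Average lighting cost per segment vertex in nanoseconds (1 ms = 1e6 ns).
double nsPerVertex(const Measurement& m, bool withShadow) {
    return (withShadow ? m.msWithShadow : m.msNoShadow) * 1.0e6 / m.segmentVertices;
}

// How many times cheaper the guide-hair variant is.
double speedup(bool withShadow) {
    return withShadow ? kFull.msWithShadow / kGuide.msWithShadow
                      : kFull.msNoShadow / kGuide.msNoShadow;
}
```

The guide-hair path is about 11x cheaper without a shadow map and 12.5x cheaper with one, while lighting about 15x fewer segment vertices. The average cost per vertex is therefore somewhat higher for guide hair (about 6.8 ns versus 4.9 ns without shadows). That gap might reflect fixed per-pass overhead, but the slides do not break the cost down, so this is only an interpretation.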