November 27, 2023
Slide overview
■Overview
This talk introduces the rendering techniques RE ENGINE uses on 8th and 9th generation consoles, including a bindless rendering optimization that reduces CPU load.
Rendering techniques planned for future titles are also introduced.
Note: This slideshow was converted from the publicly available CAPCOM Open Conference Professional RE:2023 video, with some minor modifications.
■Prerequisites
Assumes knowledge of graphics APIs and an interest in real-time CG rendering.
I'll show you just a little bit of the content!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CAPCOM Open Conference Professional RE:2023
https://www.capcom-games.com/coc/2023/
Check the official Twitter for the latest information on CAPCOM R&D!
https://twitter.com/capcom_randd
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
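This is the official Capcom account of the technical research division that develops RE ENGINE, the game engine Capcom Co., Ltd. is proud of.
We publish the materials from talks given at past technical conferences and similar events.
[CAPCOM Open Conference Professional RE:2023] https://www.capcom-games.com/coc/2023/
[CAPCOM Open Conference RE:2022] https://www.capcom.co.jp/RE2022/
[CAPCOM Open Conference RE:2019] http://www.capcom.co.jp/RE2019/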
Is Rendering Still Evolving?
Is rendering still evolving? I will be speaking on this topic.
©CAPCOM
Table of Contents
Ray Tracing
Bindless
Visual Improvements
Future Features
I will discuss the evolution of RE ENGINE rendering following this outline.
Table of Contents
Ray Tracing
Bindless
Visual Improvements
Future Features
First up is Ray Tracing.
Ray Tracing
Implemented in DMC5 SE and EXOPRIMAL among others
• Future titles will also use it
This talk covers Materials and Acceleration Structures
• Denoiser, etc., covered in the Advances in Ray Tracing presentation
RE ENGINE started supporting Ray Tracing just in time for the launch of 9th generation consoles.
Devil May Cry 5 Special Edition used Acceleration Structures for characters and backgrounds as much as possible.
Also, for the first time in RE ENGINE history, it used Ray Tracing for GI and reflections.
Resident Evil Village was limited to using it for backgrounds only, but utilized it for GI and reflections.
In addition, Resident Evil 2, 3, and 7, which added Ray Tracing support in an update, used it for GI and reflections, including characters.
Resident Evil 4 supports it only for reflections, and Exoprimal supports it only in cutscenes.
Each title decides what to use Ray Tracing for, depending on runtime performance and what is being represented.
I will talk about Materials and Acceleration Structures.
Concepts such as the Denoiser will be covered in the Advances in Ray Tracing presentation.
Ray Tracing Implementation and Principles
Stable operation on any platform, any IHV
• Using Inline Ray Tracing (RayQuery)
• With inline ray tracing, it is difficult to use multiple arbitrary shaders the way DXR does
• Use only a single material function
• Possibility to use conventional optimization knowledge
• Can be a substitute for arbitrary shaders if they support bindless resources
No impact on the art pipeline
• Once ray tracing is enabled, it replaces the traditional approximation functionality
• Replaces Indirect Illumination of Opaque Meshes, etc.
• Implemented without modifying Assets that have already been created
• Minimize modifications to meshes and materials whenever possible
Ray Tracing must be stable on any platform and any IHV.
That is why RE ENGINE uses Inline Ray Tracing.
With Inline Ray Tracing, it is difficult to use arbitrary shaders the way DXR does.
Currently released titles use only a single material function.
This lets us apply conventional optimization techniques.
Material support has to allow arbitrary textures and UVs.
Bindless resources are used to represent a variety of materials.
It is also important that Ray Tracing does not affect the art pipeline.
Once Ray Tracing is enabled, it replaces traditional functionality.
For example, it can replace indirect lighting on opaque objects.
To make this possible, the implementation works without modifying assets that have already been created.
In practice, some configuration changes have to be made.
Modifications to meshes and materials are kept to a minimum.
Ray Tracing Material-Related Aspects
In RE ENGINE, artists create using Shader Graph
• Presets are used for the naming conventions for Texture and Shader variables
• Some are specific to the game development team and will be specially addressed
[Figure labels: Ray tracing imitation shader, Background shader]
Next is converting materials to support Ray Tracing.
In RE ENGINE, artists use shader graphs to create shaders.
To support Ray Tracing, materials essentially need the left image's graph ported to the right's.
Texture and shader variable names follow a naming convention that we can use.
For the most part, they're selected from presets in the shader graph.
However, some are specific to a game's development team and require special handling.
Binding of Existing Materials and Ray Tracing
Related settings defined in JSON files
[Figure: Rasterize (left) vs. Ray Tracing (right)]
"LayerMap": {
    "Redirects": [
        { "Type": "L1L2XX", "Name": "LayerMaskOcclusionMap" },
        { "Type": "L1L2XX", "Name": "LayerMaskOcclusionMap" }
    ]
},
"Layer1AlbedoMap": {
    "Redirects": [
        { "Type": "ALBD", "Name": "Snow_ColorRoughnessMap" },
        { "Type": "ALBD", "Name": "BaseDirecticMap1" },
        { "Type": "ALBM", "Name": "BaseMetaMap1" }
    ]
},
"LayerNormalMap": {
    "Redirects": [
        { "Type": "NRRC", "Name": "NormalRoughnessCavityMapBase" },
        { "Type": "NRRC", "Name": "NormalRoughnessCavityMap1" },
        { "Name": "NormalRoughnessMap1" }
    ]
},
"Layer2AlbedoMap": {
    "Redirects": [
        { "Type": "ALBD", "Name": "Snow_ColorRoughnessMap" },
        { "Type": "ALBD", "Name": "BaseDirecticMap2" },
        { "Type": "ALBM", "Name": "BaseMetalMap2" }
    ]
},
Json
The rebinding that lets a title's materials support Ray Tracing is defined in JSON, as shown on the right.
Let's take a look at the results.
The image shows a test performed with randomly selected assets from Resident Evil Village.
It's mirrored down the center of the screen.
The left side shows the traditional rasterizer display.
The right side shows the Ray Tracing results for comparison.
The results are somewhat similar.
Some materials, such as certain decals, are not supported in the Ray Tracing version.
For example, the bloodstains on the table.
That's all for materials.
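One way to read the redirect table on this slide: for each ray tracing material slot, the listed names are tried in order and the first texture that the rasterizer material actually provides is bound. The following is a minimal Python sketch under that assumption. The first-match rule, the `resolve_slot` helper, and the sample material are hypothetical and are not taken from RE ENGINE.

```python
import json

# Excerpt of the redirect table from the slide (one slot only).
REDIRECTS = json.loads("""
{
  "Layer1AlbedoMap": {
    "Redirects": [
      {"Type": "ALBD", "Name": "Snow_ColorRoughnessMap"},
      {"Type": "ALBD", "Name": "BaseDirecticMap1"},
      {"Type": "ALBM", "Name": "BaseMetaMap1"}
    ]
  }
}
""")

def resolve_slot(slot, material_textures, table=REDIRECTS):
    """Return (type, name) of the first redirect the material provides, else None.

    Assumption: redirects are tried top to bottom and the first match wins.
    """
    for entry in table[slot]["Redirects"]:
        if entry["Name"] in material_textures:
            return entry.get("Type"), entry["Name"]
    return None

# Hypothetical rasterizer material: texture name -> texture handle.
material = {"BaseMetaMap1": 12, "NormalRoughnessMap1": 13}
print(resolve_slot("Layer1AlbedoMap", material))  # ('ALBM', 'BaseMetaMap1')
```

A slot with no matching texture returns `None`, which is where an engine would fall back to a default texture or skip the material.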
Acceleration Structures
Ray tracing requires Acceleration Structures, data structures built from per-vertex data
Mesh, etc. requires Bottom Level Acceleration Structure (BLAS)
• With Static Meshes, a single BLAS can be reused with instancing
• Environments, Props
• For Dynamic Meshes, a BLAS is created uniquely for each object
• Need enough memory for the post-deformation position buffer and the Acceleration Structure (AS)
• Deforming geometry also requires update work, costing GPU processing time for Refit or Build of the AS
• Skinning Mesh, Destruction, etc.
Increasing Dynamic Mesh count consumes memory and GPU processing
Next, Acceleration Structures.
Ray Tracing requires a type of data structure called an Acceleration Structure.
Meshes and similar geometry need a Bottom Level Acceleration Structure (BLAS).
For Static Meshes, a single BLAS can be reused with instancing and treated similarly to an existing mesh.
Static Meshes can be used for background objects or fixtures.
For Dynamic Meshes, you need to create a unique BLAS for each object.
This requires enough memory for the post-deformation position buffer and for the Acceleration Structure.
Also, GPU processing time is required to perform Refit or Build on Acceleration Structures.
This applies to skinning meshes.
As you can tell, Ray Tracing consumes more memory and GPU processing time as the number of Dynamic Meshes increases.
Therefore, special handling is needed for Dynamic Meshes.
Single-Joint Meshes Demoted to Static Meshes
Interactable objects, primarily doors
• Reduces BLAS counts, reduces skinning time
Let's deal with the easy problems first.
As we investigated the number of active skinning joints throughout the game, we discovered an interesting fact.
We found that there are many skinning meshes that contain only one joint in the model, such as doors.
By automatically converting these to Static Meshes, we were able to reduce the memory required for BLAS creation and the processing time on the GPU.
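The demotion rule described above can be sketched as a simple classification pass. This is an illustrative Python sketch: the `Mesh` record and `classify` helper are hypothetical, and it assumes every vertex of a one-joint mesh is fully weighted to that joint, so the mesh moves rigidly.

```python
from dataclasses import dataclass

@dataclass
class Mesh:
    name: str
    joint_count: int   # number of skinning joints the mesh references

def classify(mesh):
    """Treat meshes with at most one joint as static for ray tracing.

    Under the rigid-motion assumption the shape never deforms, so one shared
    BLAS is enough and no per-object skinned copy has to be built or refitted.
    """
    return "static" if mesh.joint_count <= 1 else "dynamic"

meshes = [Mesh("door", 1), Mesh("villager", 120), Mesh("crate", 0)]
dynamic = [m.name for m in meshes if classify(m) == "dynamic"]
print(dynamic)  # ['villager']
```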
Dynamic Meshes
For most human-shaped models using skinning functions, Refit alone is enough
Refit alone is not sufficient for Destruction meshes that use Skinning
• Update at regular intervals?
The next step is to deal with complex meshes.
This section deals with skinning meshes and destruction effects.
For skinning meshes, simply updating the AABBs, usually called Refit, is sufficient.
But what about destruction meshes?
Capcom traditionally uses Skinning Meshes to create destruction effects.
Let's take a look at the video.
This video is an example of a destruction effect performed using only Refit.
This test uses a GeForce RTX 2070 SUPER to compute Ambient Occlusion.
It gets slower and slower as time goes on.
This shows that using only Refit results in a lower quality BLAS and lower Ray Tracing performance.
More Stable BLAS Update Evaluation
Roughly reproduce GPU's BLASes on CPU
• Reuse Bounding Box of Skinning Mesh to build BVH for evaluation
• Bounding Boxes are created from each Joint's AABB
• The maximum number of joints in a single mesh is 256 in Resident Evil Village
• Uses LinearBVH for the BVH
A coarse reproduction of the GPU's BLAS on the CPU was used to introduce a more stable BLAS update evaluation.
The detailed AABBs used to calculate the bounding box of the Skinning Mesh are reused to construct a simplified BVH for evaluation purposes.
The image on the left shows the final bounding box of the character.
The middle image shows the AABBs of the body joints, and the right image shows the AABBs of the head joints.
From these AABBs, the surface area heuristic (SAH) is calculated using LinearBVH.
When the SAH exceeds a threshold value, an update is made.
For more details, please refer to the GDC document.
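The evaluation above can be illustrated with a small CPU-side sketch in Python. It is not RE ENGINE's implementation and it makes several assumptions: the tree is built by sorting joint AABBs by Morton code and halving ranges (a simplification of LinearBVH), the SAH cost is the summed node surface area divided by the root area, and a rebuild is requested when the refitted tree costs more than a hypothetical 1.15 times a freshly built one. All function names are illustrative.

```python
def union(a, b):
    """Smallest AABB containing boxes a and b; a box is (min_xyz, max_xyz)."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def area(box):
    dx, dy, dz = (box[1][i] - box[0][i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def morton3(x, y, z, bits=10):
    """Interleave the low `bits` bits of three integer coordinates."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

def build_topology(boxes):
    """Sort joints by the Morton code of their centroid, then halve ranges.

    Simplification: a full LinearBVH splits at the highest differing code bit.
    """
    scene = boxes[0]
    for box in boxes[1:]:
        scene = union(scene, box)

    def key(i):
        cell = []
        for axis in range(3):
            lo, hi = scene[0][axis], scene[1][axis]
            centre = 0.5 * (boxes[i][0][axis] + boxes[i][1][axis])
            t = (centre - lo) / (hi - lo) if hi > lo else 0.0
            cell.append(min(1023, int(t * 1024)))
        return morton3(*cell)

    def split(ids):
        if len(ids) == 1:
            return ids[0]
        mid = len(ids) // 2
        return (split(ids[:mid]), split(ids[mid:]))

    return split(sorted(range(len(boxes)), key=key))

def sah_cost(node, boxes):
    """Refit a fixed topology to `boxes`; return summed node area over root area."""
    areas = []

    def refit(n):
        box = boxes[n] if isinstance(n, int) else union(refit(n[0]), refit(n[1]))
        areas.append(area(box))
        return box

    root = refit(node)
    return sum(areas) / area(root)

def needs_rebuild(topology, boxes, threshold=1.15):
    """Rebuild when the refitted tree costs clearly more than a fresh one."""
    return sah_cost(topology, boxes) > threshold * sah_cost(build_topology(boxes), boxes)

def cube(x):
    return ((x, 0.0, 0.0), (x + 1.0, 1.0, 1.0))

rest = [cube(0.0), cube(2.0), cube(4.0), cube(6.0)]        # joints in a row
topology = build_topology(rest)
scattered = [cube(0.0), cube(4.0), cube(2.0), cube(6.0)]   # joints 1 and 2 swapped
print(needs_rebuild(topology, rest), needs_rebuild(topology, scattered))  # False True
```

In the example, swapping two joints makes sibling boxes overlap under the old topology, so the refitted cost rises from 82/30 to 98/30 and crosses the threshold. Because the tree covers at most a few hundred joint AABBs, running this check on the CPU every frame is cheap compared with a GPU Build.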
More Stable BLAS Update Evaluation
[Figure: Refit only (left) vs. combination of build and refit using AABB's SAH (right)]
The left side shows only Refit, while the right side uses the CPU-side BVH to determine the build timing.
The figure on the right shows that the rebuild is performed only when the GPU processing time increases to some extent, improving performance and providing a stable frame rate.
Although this method won't have the same effect everywhere, it is quite a practical method.
BLAS Optimization
Async Compute supported on PC since the Ray Tracing update
• Async Compute required for Dynamic Mesh BLAS updates
The next step is optimization.
For Resident Evil Village, the PC version of RE ENGINE did not support Async Compute.
The Ray Tracing updates for Resident Evil 2, 3, and 7 added Async Compute support to the PC version to improve the efficiency of BLAS updates for dynamic objects.
Since synchronization is not possible with CommandList, the PC version only uses it for intensive tasks.
When there are many Dynamic Meshes, refit and build of BLAS should be run in Async Compute to improve performance.
This concludes the section on creation of Acceleration Structures for games.
Ray Tracing Applications: Light Probes
Conventionally, penetrating light is a problem with multiple bounces
• Reusing the cube map capture process resulted in poor quality
Ray Tracing for intersection determination
• Stable results can be created even with multiple bounces
Next, let's look at some applications of ray tracing.
First is Light Probes.
Conventionally, RE ENGINE reuses cube map captures to create Light Probes.
However, there was a problem with light penetrating in multi-bounce.
This is because the result baked in the first pass is interpolated in the second pass, and if the interpolation cannot be properly occluded, the result will look incorrect.
When baked with Ray Tracing, it is just light tracing.
It can handle occlusion properly, so multi-bounce can be represented correctly.
Also, by using shadow rays, you can achieve shadows that are more stable than those from a shadow map, and they can be produced faster.
Application of Ray Tracing: Lightmaps
Lightmaps used in background in some stages in Street Fighter 6
• Ray Tracing is used to bake in the engine
In addition, Street Fighter 6's team requested to use lightmaps for some of the stages.
They also wanted direct lighting to be baked into those lightmaps.
On the left is using lightmaps, and on the right is using Light Probes.
Again, Ray Tracing is used for baking.
The orange circled areas show the bake including the lighting.
The shadows are baked in as well.
Application of Ray Tracing: Lightmaps
This is what it looks like without the textures.
Application of Ray Tracing: Signed Distance Fields
Ray Tracing can be easily used for SDF creation as well
• Used to bake scene data in games
[Figure labels: SDFAO, Initial SDF]
Finally, Ray Tracing is also used to bake Signed Distance Fields.
Baking in-game scenes can also be easily accomplished using Ray Tracing.
Hardware Ray Tracing is a useful feature not only for the game at runtime, but also as a development tool.
That concludes my discussion of Ray Tracing.
Table of Contents
Ray Tracing
Bindless
Visual Improvements
Future Features
Next, let's talk about bindless.
Bindless is an approach in which textures, buffers, etc. are not explicitly bound through the Graphics API.
Bindless
Resident Evil Village
• Titles that straddle two generations of consoles:
• 8th and 9th
• A lot of CPU time is consumed by CommandList creation
• Resource binding is a Data Driven implementation using Shader Reflection
• Low Descriptor Table reuse between Batches
• Unique combinations of shaders and materials in Descriptor Tables reduce reuse potential
Represent Mesh materials using Bindless
RE ENGINE began to support bindless at the same time that Ray Tracing was introduced.
It was used in a more generalized form in Resident Evil Village.
This title was released on 8th and 9th generation consoles.
Resource binding for each Graphics API that RE ENGINE supports is done with a data-driven approach using reflection on compiled shaders.
As shader graphs came to use more resources, creating CommandLists for the GPU became more time-consuming.
Much of this is due to the fact that shaders bind different textures on a per-material basis.
This resulted in poor performance because it was almost impossible to reuse the Descriptor Table between batches.
This was especially problematic on Xbox One.
Therefore, we improved the Descriptor Table reuse rate by using Bindless to represent the mesh materials.
Bindless Coverage
Explained in terms of DirectX 12 for generality
• From ShaderModel 6.0 to 6.5, access Bindless Resources via Space
• In ShaderModel 6.6, access Bindless Resources with ResourceDescriptorHeap
Bindless is for ShaderResourceView (SRV) only
• Access Texture from Shader with Handle
• Streaming Textures also updated based on Index
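#if !defined(HEAP_DIRECTLY_INDEXED)
.....
// ShaderModel 6.0 to 6.5: unbounded texture array in a dedicated register space
Texture2D<float4> BindlessTexture2D[] : register(t0, space4);
.....
#endif
// Returns the texture that a bindless handle (descriptor index) refers to
Texture2D<float4> getBindlessTexture2D(uint handle) {
#if defined(HEAP_DIRECTLY_INDEXED)
    // ShaderModel 6.6: index the descriptor heap directly
    return ResourceDescriptorHeap[NonUniformResourceIndex(handle)];
#else
    return BindlessTexture2D[NonUniformResourceIndex(handle)];
#endif
}
.....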
I will explain this in terms of DirectX 12 to keep the discussion general.
In RE ENGINE, from ShaderModel 6.0 to 6.5, the bindless range is specified by Space.
Relatively new titles have moved to ShaderModel 6.6 and use the ResourceDescriptorHeap if available.
Access as bindless is limited to ShaderResourceView (SRV) only.
Access to the texture from the shader is via a handle.
This is advantageous for handling streaming texture updates, as the handle itself remains unchanged while the contents are replaced.
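The point about streaming can be illustrated with a toy model in Python: the material stores only an index, and streaming replaces what that index refers to. The `DescriptorHeap` class below is a hypothetical stand-in written for this sketch, not a Direct3D 12 API, and the descriptor strings are placeholders.

```python
class DescriptorHeap:
    """Toy stand-in for a shader-visible descriptor heap."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.free = list(range(capacity - 1, -1, -1))   # pop() yields 0, 1, 2, ...

    def allocate(self, descriptor):
        handle = self.free.pop()
        self.slots[handle] = descriptor
        return handle

    def update(self, handle, descriptor):
        # Streaming swaps what the slot points at; the handle stays the same.
        self.slots[handle] = descriptor

heap = DescriptorHeap(capacity=4)
albedo = heap.allocate("rock_albedo, mips 6..10")   # only low-resolution mips resident
material = {"BaseMetalMap": albedo}                 # the material stores only the handle
heap.update(albedo, "rock_albedo, mips 0..10")      # full-resolution mips streamed in
print(material["BaseMetalMap"], heap.slots[material["BaseMetalMap"]])
```

Nothing stored in the material changes when the texture is streamed, which is why no material data has to be rewritten on a streaming update.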
So how do we make materials bindless?
Making Shader Graph Parameters Bindless
Output structure for Bindless when generating code from Shader Graph
• Make user-defined constant values and textures Bindless
• Bindless/non-bindless can be easily toggled in project settings
RE ENGINE uses a Shader Graph, so we can output the structures for bindless during code generation.
Only user-defined parameters and textures will be made bindless.
Also, the use of bindless/non-bindless can be toggled in the project settings.
Making Material Variables Bindless
Conventional ConstantBuffer
• Create parameters according to the shader graph
cbuffer UserMaterial {
    float4 VAR_BaseColor;
    float4 VAR_EmissiveColor;
    float4 VAR_Bulb_Color;
    float4 VAR_LampShade_Color;
    float VAR_EmissiveIntensity;
    float VAR_EmissiveRate;
    float VAR_EmissiveRateGamma;
    float VAR_Metallic;
    …….
};
The first step is to make the parameters bindless.
Conventionally, they are defined as a Constant Buffer.
This Constant Buffer is created from the shader graph parameters.
Making Material Variables Bindless
Change from ConstantBuffer to a huge StructuredBuffer
• StructuredBuffer<uint4> BindlessBuffer;
• About 32 MiB
• All parameters are defined in static global variables
static float4 VAR_BaseColor;
static float4 VAR_EmissiveColor;
static float VAR_EmissiveIntensity;
static float VAR_EmissiveRate;
static float VAR_EmissiveRateGamma;
static float VAR_Metallic;
static float VAR_Roughness;
static float VAR_Translucency;
static float VAR_UV_Select;
static float VAR_use_Bulb;
static float4 VAR_Bulb_Color;
static float VAR_Bulb_Intensity;
static float VAR_Bulb_Pow_Rate;
static float VAR_Bulb_Levels_min;
….
void initializeUserMaterialConstantsFromBindlessBuffer(uint BindlessOffsetByte){
    // Each BindlessBuffer element is a uint4 (16 bytes)
    const uint index = BindlessOffsetByte / 16;
    const uint4 block0 = BindlessBuffer[index + 0];
    const uint4 block1 = BindlessBuffer[index + 1];
    ……
    const uint4 block7 = BindlessBuffer[index + 7];
    const uint4 block8 = BindlessBuffer[index + 8];
    // Reinterpret the raw 32-bit words as floats
    VAR_BaseColor.x = asfloat(block0.x);
    VAR_BaseColor.y = asfloat(block0.y);
    VAR_BaseColor.z = asfloat(block0.z);
    VAR_BaseColor.w = asfloat(block0.w);
    VAR_EmissiveColor.x = asfloat(block1.x);
    VAR_EmissiveColor.y = asfloat(block1.y);
    VAR_EmissiveColor.z = asfloat(block1.z);
    VAR_EmissiveColor.w = asfloat(block1.w);
    ……
}
Next, we change from a ConstantBuffer to a huge StructuredBuffer.
In Resident Evil Village, we made the StructuredBuffer accessible in uint4 units.
About 32 MiB of memory is allocated.
All shader parameters are defined as static global variables, and an initialization function is used to initialize the parameters.
Initialization is done by passing a byte offset to a dedicated function.
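The layout used by the initialization function can be mimicked on the CPU to see how the byte offset, the 16-byte blocks, and the `asfloat` reinterpretation fit together. This is a Python sketch: `write_material`, `read_block`, and `load_material` are hypothetical helpers, only two of the parameters are shown, and a 256-byte `bytearray` stands in for the roughly 32 MiB buffer.

```python
import struct

BLOCK = 16  # one uint4 element of the StructuredBuffer is 16 bytes

def write_material(buffer, offset_bytes, base_color, emissive_color):
    """Store two float4 parameters as raw 32-bit words at a 16-byte aligned offset."""
    assert offset_bytes % BLOCK == 0
    struct.pack_into("<4f", buffer, offset_bytes, *base_color)
    struct.pack_into("<4f", buffer, offset_bytes + BLOCK, *emissive_color)

def read_block(buffer, index):
    """Counterpart of BindlessBuffer[index]: four unsigned 32-bit integers."""
    return struct.unpack_from("<4I", buffer, index * BLOCK)

def asfloat(u):
    """Reinterpret a 32-bit unsigned integer as a float, like HLSL asfloat."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def load_material(buffer, offset_bytes):
    index = offset_bytes // BLOCK
    block0 = read_block(buffer, index + 0)
    block1 = read_block(buffer, index + 1)
    return {
        "VAR_BaseColor": tuple(asfloat(u) for u in block0),
        "VAR_EmissiveColor": tuple(asfloat(u) for u in block1),
    }

buffer = bytearray(256)
write_material(buffer, 64, (1.0, 0.5, 0.25, 1.0), (0.0, 2.0, 0.0, 1.0))
print(load_material(buffer, 64))
```

Storing everything as `uint4` and reinterpreting on load lets one buffer hold floats, integers, and texture handles side by side.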
Making Textures Bindless
Keep texture handle as part of the parameters
• Can maintain arbitrary parameters, so we can use what we need
static Texture2D BaseMetalMap;
static Texture2D NormalRoughnessMap;
static Texture2D EmissiveMap;
….
void initializePSTextureFromBindlessBuffer(uint BindlessOffsetByte){
    const uint index = (BindlessOffsetByte / 16);
    const uint4 block0 = (BindlessBuffer[index + 9].block);
    BaseMetalMap = getBindlessTexture2D(block0.x);
    NormalRoughnessMap = getBindlessTexture2D(block0.y);
    EmissiveMap = getBindlessTexture2D(block0.z);
}
The next step is to make the textures bindless.
Textures are also defined as static variables.
As with the parameters, we restore the texture descriptors via a byte offset.
This means that the texture indices are kept immediately after the existing constant buffer data.
Creating and Updating Bindless Information
Use a BindlessBuffer of about 32 MiB as a HeapAllocator
• Allocates memory with the required number of parameters according to the Shader Graph generation algorithm
• The information required for material rendering is the byte offset of that allocated memory
• We refer to it as BindlessMaterialHandle
• Update BindlessBuffer on GPU using BindlessMaterialHandle
• Update memory structures, similar to CPU memcpy
The actual creation and updating of the bindless information is done on the CPU side, which hands out the memory ranges.
Currently, the Heap Allocator for Video Memory is reused.
Following the shader graph generation algorithm, the size is determined from the number of parameters and textures required, and memory is allocated.
The byte offset of this memory is referred to as BindlessMaterialHandle.
The BindlessMaterialHandle is used to update the BindlessBuffer on the GPU.
This updates the memory structure in the same way as memcpy on the CPU.
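The idea of handing out byte offsets as material handles can be sketched with a small first-fit allocator in Python. The talk says the existing Video Memory heap allocator is reused, so the `BindlessHeap` class below is only a hypothetical illustration. The sizing rule (4 bytes per scalar parameter or texture handle, rounded up to 16 bytes) is an assumption, and freed holes are not merged.

```python
class BindlessHeap:
    """First-fit allocator over a byte range; returned offsets act as handles."""
    ALIGN = 16   # keep every allocation aligned to one uint4 element

    def __init__(self, size):
        self.free = [(0, size)]            # sorted list of (offset, length) holes

    def allocate(self, param_count, texture_count):
        # Assumption: 4 bytes per scalar parameter and per texture handle.
        size = 4 * (param_count + texture_count)
        size = (size + self.ALIGN - 1) // self.ALIGN * self.ALIGN
        for i, (offset, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    del self.free[i]
                else:
                    self.free[i] = (offset + size, length - size)
                return offset, size
        raise MemoryError("bindless buffer exhausted")

    def release(self, offset, size):
        self.free.append((offset, size))
        self.free.sort()

heap = BindlessHeap(32 * 1024 * 1024)
a, a_size = heap.allocate(param_count=36, texture_count=3)   # 156 bytes, rounded to 160
b, b_size = heap.allocate(param_count=8, texture_count=1)    # 36 bytes, rounded to 48
heap.release(a, a_size)
c, c_size = heap.allocate(param_count=4, texture_count=0)    # fits in the freed hole
print(a, b, c)  # 0 160 0
```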
Draw Call Modification
RE ENGINE can set 32-bit Root Constant for Draw and Dispatch
• Equivalent to D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS
• Parameters are set sequentially for each batch
• Platforms that don't support it use Constant Buffer
Single item Draw Call
• Set BindlessMaterialHandle directly as Root Constants
Instancing Draw Calls
• Set the BindlessMaterialHandle in the instancing management buffer
Now, finally, a modification of the conventional draw call.
Since the parameters are contained in a specific StructuredBuffer on the GPU, it is necessary to specify an offset to allow reading from outside.
RE ENGINE allows an immediate 32-bit value to be passed into Draw or Dispatch as a Root Constant.
This is set sequentially for each batch.
For platforms that do not support this feature, it operates as a Constant Buffer.
For a single DrawCall using the Root Constant, specify the BindlessMaterialHandle directly.
For Instancing DrawCalls, specify BindlessMaterialHandle in the Instancing management buffer.
CPU Optimization Results
Reduced from 10.3 ms for non-bindless to 8.2 ms for bindless
• Resource Barrier creation time: reduced from 2.1 ms to 1.79 ms
• Gbuffer CommandList creation time: reduced from 3.3 ms to 2.1 ms
Let's look at the results of CPU optimization.
Measurements were taken on an Xbox One development build at the Village location shown in the image.
Because the measurements were taken manually, they are not exactly at the same location or at the same time.
Here is the comparison.
Resource Barrier creation time is now 1.79 ms, down from 2.1 ms.
This is a reduction of approximately 0.3 ms.
Next, let's look at the heavier Gbuffer CommandList creation.
Gbuffer CommandList creation has gone from 3.3 ms to 2.1 ms.
The reason is that the traditional Gbuffer had 927 batches and 864 unique Descriptor Tables.
The bindless Gbuffer has 936 batches and 96 unique Descriptor Tables.
Descriptor Tables are now reused, skipping the Descriptor Table creation time.
Going bindless improved the CPU's CommandList creation time.
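The figures quoted above can be turned into reuse rates and relative savings with a few lines of arithmetic. The Python sketch below uses only the numbers from the slide; defining the reuse rate as one minus unique tables over batches is an interpretation made for this sketch, not a metric from the talk.

```python
def reuse_rate(batches, unique_tables):
    """Fraction of batches that did not need a newly created Descriptor Table."""
    return 1.0 - unique_tables / batches

conventional = reuse_rate(927, 864)      # 63 of 927 batches reuse a table
bindless = reuse_rate(936, 96)           # 840 of 936 batches reuse a table
gbuffer_saving = (3.3 - 2.1) / 3.3       # Gbuffer CommandList creation time
total_saving = (10.3 - 8.2) / 10.3       # overall time quoted on the slide

print(f"{conventional:.1%} {bindless:.1%} {gbuffer_saving:.1%} {total_saving:.1%}")
# 6.8% 89.7% 36.4% 20.4%
```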
GPU Optimization
CPU performance improved, GPU performance reduced
• The cause is that each parameter is going from the StructuredBuffer into VGPRs
• Noticeable during Instanced drawing
Explicitly perform scalarization to move from VGPRs to SGPRs
• Scalarization per BindlessMaterialHandle
              Conventional   Bindless    Bindless+Scalarization
12 textures   68 VGPRs       124 VGPRs   72 VGPRs
3 textures    32 VGPRs       60 VGPRs    56 VGPRs
The move to Bindless improved CPU performance, but reduced GPU performance.
The reason for this is that Constant Buffer, Texture, etc. are now routed through StructuredBuffer.
The most noticeable change is in instancing.
When instancing, the index is not recognized as uniform in Pixel Shaders.
As a result, the register usage tends to increase.
In this case, we will introduce Scalarization, which is often used for AMD-focused optimization.
Here is a table of register usage for each shader on the Xbox One, taken from a development scene in Village.
The non-bindless shader used 68 VGPRs and the bindless shader used 124 VGPRs, but the optimization reduced the register usage back to 72 VGPRs.
This improves GPU parallelism and memory access.
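Scalarization per BindlessMaterialHandle is commonly written as a loop that picks the handle of the first active lane, processes every lane sharing that handle, and repeats until no lanes remain. The Python simulation below illustrates that control flow on the CPU. It is a sketch of the general technique, not RE ENGINE's shader code; in HLSL the first-lane read corresponds to `WaveReadLaneFirst`, and `shade_wave` and its shading formula are made up for the example.

```python
def shade_wave(handles, load_material):
    """Process one wave group by group so each material load uses a uniform handle.

    `handles` holds the per-lane BindlessMaterialHandle. Returns the per-lane
    results and the number of material loads that were performed.
    """
    results = [None] * len(handles)
    active = list(range(len(handles)))
    loads = 0
    while active:
        uniform_handle = handles[active[0]]        # value taken from the first active lane
        material = load_material(uniform_handle)   # one load shared by the whole group
        loads += 1
        remaining = []
        for lane in active:
            if handles[lane] == uniform_handle:
                results[lane] = material["scale"] * lane   # placeholder shading work
            else:
                remaining.append(lane)
        active = remaining
    return results, loads

materials = {0: {"scale": 1.0}, 160: {"scale": 2.0}}
results, loads = shade_wave([0, 0, 160, 0, 160, 160, 0, 0], materials.__getitem__)
print(results, loads)
```

Eight lanes with two distinct handles need only two material loads. Because the handle is uniform inside each iteration, a compiler can keep the loaded parameters in scalar registers instead of per-lane vector registers.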
Other Examples of Bindless Applications
Mostly used to reduce Video Memory
• Local Cube map (Reflection Probe)
  • Local Cube maps don't need to be copied to Texture Array etc. during runtime
  • Arbitrary resolution can be retained
• Forward Renderer decals
  • Previously a large Texture2DAtlas
  • Free choice of resolution, Texture compression, Mipmap retention
• Signed Distance Fields
  • Atlas-like 3DTexture is not required
  • Can be held in different accuracies (BC4, R8Unorm, R16Float...)
Let's look at some examples of Bindless usage.
Most of them focus on reducing video memory use.
Bindless allows the process to refer to the resource as it is, whereas previously it had to copy and hold it in internal memory.
This makes data management more direct and reduces memory.
Bindless Side Effects
Instability due to bindless
• All managed on GPU; uninitialized data or update errors cause instability
• No hope of checking status from data break or crash dump
Available Weapons
• NVIDIA Aftermath + NVIDIA Nsight Graphics
  • Minidump output per instruction of the shader running at the location of the problem
  • Aftermath is enabled by default during RE ENGINE game development
• Console debugging function
GPU crashes require specialized knowledge
• Automatic control whenever possible
• It is important to use a common interface
Do not use Bindless except for performance or memory-critical processes
So far, I've talked about the advantages of bindless, but there are some downsides to using it.
Conventional descriptors, for example, are managed entirely by the CPU.
This is more stable thanks to debugging functions and a debugging layer.
However, with bindless, everything is managed on the GPU, so uninitialized data or update errors can lead to instability.
Instability leads to GPU crashes.
On the user side, it is difficult to debug because you are dealing with something that is managed entirely on the GPU.
For PCs, the most useful tool is NVIDIA's Aftermath.
RE ENGINE always enables Aftermath during development and generates a minidump when a GPU crash occurs.
This minidump can help identify problems in shader code and point out buffer overruns, which can narrow down the scope of the problem to a certain extent.
The minidump is especially useful for QA to investigate problems.
Bindless Side Effects
Consoles also have excellent debugging capabilities, which are useful for issues that are not platform-specific.
However, even then, GPU crashes require specialized knowledge.
No matter how automated the Bindless management mechanism is, there will still be unexpected cases that lead to improper memory access and cause GPU crashes.
It is important to use a common interface whenever possible and to generalize the protections as much as possible.
Therefore, I personally think it is best not to use bindless except for processes where individual execution time or memory is critical.
Stability is very important.
Table of Contents
Ray Tracing
Bindless
Visual Improvements
Future Features
Now, moving on.
We're always working to improve the visual appearance, but here is one improvement that has been particularly effective.
Addressing the Invisible Normals Problem
Invisible normals form as a result of shaders
• Primarily caused by normal map
[Figure labels: Environmental maps and plane, Environment and Normal Maps]
This section introduces how to deal with the invisible normal problem.
Without a normal map, the visible surfaces most likely have visible normals.
Reflected images do not reflect anything below the horizontal plane.
However, when a normal map is applied, the image will look like the figure on the right.
The areas that appear dark in some places are the result of invisible normals, which reference the environment map below the horizon.
Addressing the Invisible Normals Problem
Visualize invisible areas by inner product of line of sight and normal
This problem is immediately apparent in the inner product (dot product) of the line of sight and the normal.
The left image is a visualization of the normal map, and the right is the dot product of the line of sight and the normal.
These wouldn't usually be visible.
We, too, have been aware of this problem for a long time, but have ignored it.
However, we are now well into the ninth generation, so we wanted to do something about it.
Addressing the Invisible Normals Problem
Corrected invisible normals to be visible when shading
• Introduced in Resident Evil 4
We have therefore added the ability to use geometry normals to correct shading normals to prevent improper results.
This feature can be turned on or off on a shader-by-shader basis, and is used by many shaders.
This eliminates invisible normals and provides more stable shading results.
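The talk does not spell out the correction formula. One plausible way to use the geometry normal, shown here purely as an assumption, is to blend the shading normal toward the geometry normal just far enough that it faces the viewer again. In this Python sketch the `correct_normal` function, the `eps` margin, and the sample vectors are all illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def correct_normal(shading_n, geometry_n, view, eps=1e-3):
    """Blend the shading normal toward the geometry normal until it faces the viewer.

    `view` is the unit vector from the surface point toward the camera.
    """
    nv = dot(shading_n, view)
    if nv >= eps:
        return shading_n                 # already visible, nothing to correct
    gv = dot(geometry_n, view)
    if gv <= eps:
        return shading_n                 # the geometry itself faces away; leave as is
    t = (eps - nv) / (gv - nv)           # blend weight where the dot product reaches eps
    blended = tuple((1.0 - t) * s + t * g for s, g in zip(shading_n, geometry_n))
    return normalize(blended)

geometry_n = (0.0, 0.0, 1.0)
view = (0.0, 0.8, 0.6)                   # unit vector from the surface to the camera
shading_n = (0.0, -0.8, 0.6)             # normal-mapped normal tilted away from the camera
corrected = correct_normal(shading_n, geometry_n, view)
print(dot(shading_n, view) < 0.0, dot(corrected, view) > 0.0)  # True True
```

Normals that already face the viewer pass through unchanged, so the correction only touches the pixels that would otherwise sample the environment below the horizon.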
Conventional Normals + Direct Lighting
These are the results of conventional normals and direct lighting.
Corrected Normals + Direct Lighting
Here's direct lighting after the modification.
The visual issues are less noticeable.
There are pros and cons to the correction method, but the elimination of the invisible normals provides a stable and consistent result.
Table of Contents
Ray Tracing
Bindless
Visual Improvements
Future Features
Finally, let's talk about future features.
Mesh Shader
Divide vertices into units called meshlets and process them
• Units of 64 triangles/128 triangles
Advantages
• Flexible compression of vertex buffers
  • Only one value needs to be stored for the UV or vertex color for all vertices in a meshlet
  • Contributes to memory usage reduction
• Culling
  • Culling is also possible with Compute Shaders and Amplification Shaders
  • Clear performance improvement during shadow mapping
Disadvantage
• Unsupported hardware
  • Imitate the effect with Vertex Shaders
  • Hard to get performance without a new GPU
Future features include Mesh Shader support.
Titles currently in development are rendering backgrounds using Mesh Shaders.
The main reason for introducing Mesh Shaders is the flexible compression of the vertex buffer.
Mesh Shaders reduce Video Memory usage by compressing and decompressing data separated by rendering units called Meshlets.
In many cases, data compression of 40% or more is possible, including position quantization.
In addition, the GPU occlusion culling used until now has a granularity of 768 triangles.
Meshlets are in units of 128 triangles, allowing for finer culling.
Hardware that cannot use Mesh Shaders will run code that mimics them in vertex shaders, but with poor vertex reuse.
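Two of the ideas above, splitting a mesh into meshlets of at most 128 triangles and quantizing positions relative to each meshlet, can be sketched in Python as follows. The greedy in-order split, the `max_vertices` limit of 256, and the 16-bit quantization are assumptions for illustration and are not the engine's actual meshlet builder or vertex format.

```python
def build_meshlets(triangles, max_triangles=128, max_vertices=256):
    """Greedy split: append triangles in order until a limit would be exceeded.

    Returns a list of (global_vertex_ids, local_triangles) pairs.
    """
    meshlets, verts, tris = [], {}, []
    for tri in triangles:
        new = [v for v in set(tri) if v not in verts]
        if tris and (len(tris) == max_triangles or len(verts) + len(new) > max_vertices):
            meshlets.append((list(verts), tris))
            verts, tris = {}, []
        for v in tri:
            verts.setdefault(v, len(verts))        # global index -> local index
        tris.append(tuple(verts[v] for v in tri))
    if tris:
        meshlets.append((list(verts), tris))
    return meshlets

def quantize(positions, bits=16):
    """Store positions as integers relative to the meshlet's bounding box."""
    lo = [min(p[a] for p in positions) for a in range(3)]
    hi = [max(p[a] for p in positions) for a in range(3)]
    scale = (1 << bits) - 1

    def q(p):
        return tuple(
            round((p[a] - lo[a]) / (hi[a] - lo[a]) * scale) if hi[a] > lo[a] else 0
            for a in range(3)
        )

    return lo, hi, [q(p) for p in positions]

strip = [(i, i + 1, i + 2) for i in range(300)]    # 300 triangles sharing vertices
meshlets = build_meshlets(strip)
print([len(tris) for _, tris in meshlets])         # [128, 128, 44]
print(quantize([(0.0, 0.0, 0.0), (1.0, 2.0, 4.0)])[2])
```

With 16 bits per axis a position takes 6 bytes instead of the 12 bytes of three 32-bit floats, at the cost of storing one bounding box per meshlet. The smaller the meshlet's bounds, the less precision is lost.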
Visibility Buffer
No need for double processing of vertices
• Output PrimitiveID and InstanceID from Mesh Shader and Vertex Shader
• Dramatically reduce processing time due to lower overdraw
• Merge and execute in 1 Draw Call if they are the same shader
• Material is set bindlessly
• Reduces CPU processing time
Next, we want to introduce the Visibility Buffer.
This has already been introduced in many game engines, but it eliminates the need for double vertex processing.
This is advantageous to avoid the increased pixel load caused by overdraw, which is a problem with regular Gbuffers.
In addition, materials are handled bindlessly, as we have already discussed.
Vertex data can also be restored from the Visibility Buffer.
This means that different vertices and different materials can be executed together in one draw call, as long as they are the same shader.
Using Mesh Shaders and Visibility Buffers reduces the GPU and CPU load considerably.
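A Visibility Buffer stores just enough per pixel to find the triangle and its material later. The Python sketch below packs an InstanceID and a PrimitiveID into one 32-bit value and then groups pixels by material handle. The 20/12 bit split and the `instance_to_material` lookup are hypothetical choices made for this illustration.

```python
PRIMITIVE_BITS = 12   # hypothetical split of one 32-bit texel
INSTANCE_BITS = 20

def pack_visibility(instance_id, primitive_id):
    assert 0 <= primitive_id < (1 << PRIMITIVE_BITS)
    assert 0 <= instance_id < (1 << INSTANCE_BITS)
    return (instance_id << PRIMITIVE_BITS) | primitive_id

def unpack_visibility(texel):
    return texel >> PRIMITIVE_BITS, texel & ((1 << PRIMITIVE_BITS) - 1)

def resolve(vis_buffer, instance_to_material):
    """Group visible pixels by material handle so each pixel is shaded exactly once."""
    groups = {}
    for pixel, texel in enumerate(vis_buffer):
        instance_id, primitive_id = unpack_visibility(texel)
        handle = instance_to_material[instance_id]
        groups.setdefault(handle, []).append((pixel, primitive_id))
    return groups

vis_buffer = [pack_visibility(0, 5), pack_visibility(1, 7), pack_visibility(0, 6)]
groups = resolve(vis_buffer, instance_to_material={0: 160, 1: 0})
print(groups)  # {160: [(0, 5), (2, 6)], 0: [(1, 7)]}
```

Only the front-most triangle per pixel survives in the buffer, so material evaluation runs once per pixel regardless of how much overdraw occurred while rasterizing.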
Summary
• Addition of Ray Tracing support and ninth generation-like features
• Performance and memory improvement using bindless
• Improved a long-standing visual defect
• Future memory reductions and ways to reduce the number of draw calls and GPU load
To summarize.
Ray Tracing support and ninth generation-like features have been added.
Performance and memory improvements have been realized using bindless.
We have also improved a visual defect that had been around for a long time.
Mesh Shader and Visibility Buffer are poised to be our tools for the future.
RE ENGINE's rendering capabilities are still evolving!
Future Challenges
Safer Bindless Operation
Visibility Buffer, complete transition to Deferred Texturing
• Would contribute many performance improvements
Optimization for each IHV
Thank you for your attention!
Future challenges include more stable Bindless operations.
There is also the transition to Visibility Buffer and Deferred Texturing for rendering.
Games with complex shaders and large numbers of vertices will see considerable performance benefits.
There are also more optimizations for each IHV.
We will continue to learn more about the hardware and adapt optimizations for each of them.
This is the end of this presentation.
I hope the contents of this lecture will be useful to you.
Extra: Visibility Buffer + Variable Rate Shading
We want to improve capabilities with Variable Rate Shading
• Using the equivalent of Tier 1 causes resolution loss up to the edge of polygons
[Figure labels: Hardware VRS Tier 1 2x2, Full Resolution]
This is something that we are testing and may not actually use in the end.
We want to optimize the Visibility Buffer a bit more.
If you just naively use Tier 1 Variable Rate Shading, the result is lowered resolution right up to the edge of each polygon.
This is an example using hardware Variable Rate Shading, which is designed for geometry.
Extra: Visibility Buffer Software VRS
Maintain Visibility Buffer with MSAA or Interleave
• In testing, MSAA was faster in processing the Visibility Buffer
• Gbuffer's Visibility Buffer processing cost is reduced, but Resolve cost is high
[Figure labels: Full Resolution, Hardware VRS Tier 1 2x2, Software VRS Tier 1 2x2]
This is something that we are testing and may not actually use in the end.
We are also working on Software VRS by expanding Material IDs to MSAA and running Gbuffer with Depth Equal and Sample Rate.
In testing, MSAA gave better results than Interleave, but considering that the manual Resolve has to be run for each Gbuffer, and given the visual degradation and decompression cost, it is not a meaningful form of VRS at present.
A little more experimentation may be needed.