■Overview
HDR Grading
- A case study of transitioning from SDR grading to HDR grading by utilizing OpenColorIO.
ShellFur
- A case study of utilizing an old technology to achieve next-generation quality.
Signed Distance Field
- We will present features realized using SDF and examples of its optimization.
Note: This is the contents of the publicly available CAPCOM Open Conference Professional RE:2023 videos, converted to slideshows, with some minor modifications.
■Prerequisites
Assumes some knowledge of rendering.
I'll show you just a little bit of the content!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CAPCOM Open Conference Professional RE:2023
https://www.capcom-games.com/coc/2023/
Check the official Twitter for the latest information on CAPCOM R&D!
https://twitter.com/capcom_randd
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
New Rendering Features Rundown (HDR Grading, Shell Fur, Signed Distance Field) This is "New Rendering Features Rundown: HDR Grading, Shell Fur, Signed Distance Field." First, the HDR Grading section: I will discuss a case study of an HDR grading implementation using OpenColorIO.
Agenda: Overview of old and new pipelines; HDR grading pipeline support; GUI changes due to the migration to the new pipeline; Optimization; Artists' grading environment; Future outlook. Here is today's agenda. First, an overview and background of the pipeline before and after the transition to HDR grading. Then the details of the HDR grading pipeline, GUI support for the transition, optimization, the artists' grading environment, and future prospects. That will be the flow of the talk.
Background: HDR output was supported, but grading was still done in SDR • Workaround during LUT access to avoid luminance clamping • A request for HDR-first grading was raised. First, as background on the transition to HDR grading: the engine originally supported HDR output, but it was limited to SDR grading. As will be explained later, when accessing the LUT used for grading during HDR output, a workaround was performed to avoid luminance clamping. The content itself was also being produced SDR-first, and there was a strong demand for HDR-first production. To be clear, in this session the term "HDR grading" refers to supporting RGB values higher than 1 when processing screen tones and color grading.
Old Pipeline SDR Output / New Pipeline SDR Output. Before we get into the explanation of the pipeline, here is a visual comparison of the SDR output of the new and old pipelines. You can see that the new pipeline displays the sky and wall without blowing out, and you can also see the difference in the color of the red fruit.
Old Pipeline: LUT format is BT.709+sRGB • Workaround for HDR output, since it is not possible to input a value greater than 1. Pipeline flow: HDR Scene Render (BT.709 Linear) → HDR PostEffect → Exposure and Tonemap → SDR ColorGrading (sRGB LUT) → UI → sRGB Output (SDR Display) / HDR10 Output (HDR Display). This is the old pipeline. The LUT format used for color grading is BT.709 sRGB. After the grading process is complete, UI rendering is performed. The output is converted to each display space, such as sRGB and HDR10.
LUT Workaround for HDR Output: If the luminance was 1.0 or higher, change how the LUT is accessed and used • Look up the LUT with the color divided by luminance, then multiply the return value by luminance. (Comparison images: with LUT clamping / avoiding LUT clamping.) If the luminance of the color to be input to the LUT is 1.0 or higher, it is divided by the luminance before being looked up in the LUT, and the value that comes back from the LUT is multiplied by the luminance. That is how we worked around the HDR output LUT access issue mentioned earlier. While this does avoid clamping, it means the artists had no control over the grading of areas above 1.0 luminance.
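A minimal sketch of that workaround in HLSL, assuming BT.709 luma weights and illustrative resource names (our reading of the slide, not the engine's actual shader):

// Old-pipeline LUT clamp workaround (sketch; names are illustrative)
Texture3D<float4> GradingLut;
SamplerState      TrilinearClamp;

float3 GradeWithClampWorkaround(float3 color)
{
    // Assumed BT.709 luma weights; max() with 1 leaves SDR-range colors untouched
    float luminance = max(dot(color, float3(0.2126, 0.7152, 0.0722)), 1.0);
    float3 lutInput = color / luminance;  // normalized into [0,1], safe to sample
    float3 graded = GradingLut.SampleLevel(TrilinearClamp, lutInput, 0).rgb;
    return graded * luminance;            // restore the HDR range after grading
}

This avoids the clamp, but as noted above, the look of anything brighter than 1.0 is dictated by the luminance scale rather than the artist.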
New Pipeline: Conversion using OpenColorIO+ACES • Converted to ACES AP1 (hereafter AP1) just before grading • Display output via ACES RRT+ODT • LUT changed to AP1+ACEScc (or ACEScct). Pipeline flow: HDR Scene Render (BT.709 Linear) → HDR PostEffect → OCIO GradingPass (BT.709 → AP1) → Exposure (AP1 Linear) → HDR ColorGrading (ACEScc/ACEScct LUT) → BT.2446 → OCIO SDR OutputPass (RRT and ODT) → UI (sRGB) → SDR Display, or OCIO HDR OutputPass (RRT and ODT) → UI (HDR10) → HDR Display. This is the new pipeline. It incorporates a conversion pass using OpenColorIO+ACES, and the color space for grading has been changed. The conversion pass consists of a gamut conversion (BT.709 to AP1) before grading, and the ACES RRT+ODT conversion to display space. Where we used to grade with BT.709+sRGB, it is now AP1+ACEScc/ACEScct at the time of grading. Also, due to the effect of the RRT+ODT conversion, the order of UI rendering was changed to be after the conversion to display space.
Comparison of Old and New Pipelines. (The old and new pipeline diagrams from the previous two slides, shown side by side.)
LUT Format: Created with AP1+ACEScc/ACEScct • The main grading tool is DaVinci Resolve • FP16 • Size is chosen arbitrarily by the artist • Most choose 33x33x33. Let me explain a little more about LUTs. Artists primarily use DaVinci Resolve to grade with AP1+ACEScc/ACEScct. If multiple LUTs are used at the same time they must all be the same size, but the size itself is not restricted and can be selected arbitrarily by the artist. Most LUTs use the DaVinci Resolve preset size of 33 in height, width, and depth.
LUT Shaper: Titles can choose ACEScc or ACEScct • Different distribution of dark areas. As for the LUT shaper, ACEScc or ACEScct can be selected per title. ACEScct has a different distribution of dark areas due to a toe added to the ACEScc curve. For the conversion formulas, please refer to the ACES Technical Documentation (acescentral.com).
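For reference, here is a sketch of the two shaper encodings, transcribed from the ACES specifications (not engine code); the only difference is the linear toe that ACEScct splices in below its cut point:

// ACEScc: purely logarithmic encoding (S-2014-003)
float LinearToACEScc(float x)
{
    if (x <= 0.0)
        return -0.3584474886;  // (log2(2^-16) + 9.72) / 17.52
    if (x < 3.0517578125e-05)  // 2^-15
        return (log2(1.52587890625e-05 + x * 0.5) + 9.72) / 17.52;
    return (log2(x) + 9.72) / 17.52;
}

// ACEScct: identical log segment, but with a linear toe in the darks (S-2016-001)
float LinearToACEScct(float x)
{
    if (x <= 0.0078125)
        return 10.5402377416545 * x + 0.0729055341958355;
    return (log2(x) + 9.72) / 17.52;
}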
New Pipeline Concepts: Emphasis on working with DCC tools • Ease of introduction • Facilitates HDR/SDR output compatibility. Let's move on to the details of the new pipeline. The new pipeline emphasizes integration with DCC tools, simplifies introduction, and makes conversion to SDR output easier when content is produced HDR-first. That was our concept.
Emphasis on Integration with DCC Tools: Grading is done in DCC tools • Various settings need to be matched between the engine and the DCC tool • The format should be easy to handle, as it may be used by various teams within the company • The engine adapts to the DCC tool side • Adopt the widely used OpenColorIO+ACES format. Let me start with the first point of the concept, integration with DCC tools. The reason this is so important is that grading is not done in-engine, but in DCC tools such as DaVinci Resolve. Naturally, in order to match the results in the engine with the results in the grading tool, the various color transformations must match. Since the format is used by various teams in the company, it should be easy to handle, to avoid unnecessary problems such as mismatched pictures due to differences in settings. This is why we chose OpenColorIO: it is used for sharing settings even between DCC tools. We went with ACES for the same reason: widespread adoption across DCC tools.
OpenColorIO: Open-source color management solution used in the film industry and by DCC tools • Generates various transformations based on a config (.ocio) and LUTs • Color settings can be shared among DCCs • We use the configs provided by ACES. OpenColorIO is an open-source color management solution used in the film industry and by DCC tools. It can generate various conversions from .ocio files and LUTs. This mechanism facilitates the sharing of color settings among DCCs. For the configs, we use the ones published on GitHub by the Academy Software Foundation.
Academy Color Encoding System (ACES): Video production color management pipeline • Various color spaces and conversions are defined • We use the AP1 color space and the RRT and ODT. (Image credit: MIKE SEYMOUR, fxguide, "The art of digital color," August 23, 2011, https://www.fxguide.com/fxfeatured/the-art-of-digital-color/ (accessed August 24, 2023).) The Academy Color Encoding System, or ACES, was developed by the Academy of Motion Picture Arts and Sciences. It is a color management pipeline used mainly in the film industry. The ACES pipeline is divided into several stages. IDT (Input Device Transform): the characteristics of the camera, production environment, etc., are removed, and the image is converted to a neutral state. ACES working space: rendering and grading occur in a wide-gamut color space called AP1. RRT (Reference Rendering Transform): tone mapping is performed here to prepare the image for the ODT, and so-called film-look effects are applied. ODT (Output Device Transform): converts images to suit target displays such as sRGB and HDR10. The engine incorporates the AP1 color space, the RRT, and the ODT.
Relationship Diagram between Engine and DCC: RE ENGINE (BT.709→AP1, RRT+ODT via OCIO) and DCC tools (BT.709→AP1, RRT+ODT via the same ACES config). The relationship between OpenColorIO and ACES can be summarized as follows: both the engine and DCC tools such as DaVinci Resolve use OpenColorIO. By using the same ACES config file, the various conversions can be performed easily and consistently.
Incorporating OpenColorIO
Managed in two types of files
• .ocio: OpenColorIO original file
• .ocioc: File that specifies the conversion to use from the .ocio file
.ocio
.ocioc
displays:
  ACES:
    - !<View> {name: sRGB, colorspace: Output - sRGB}
    - !<View> {name: DCDM, colorspace: Output - DCDM}
    - !<View> {name: DCDM P3D60 Limited, colorspace: Output - DCDM ...}
    - !<View> {name: P3-D60, colorspace: Output - P3-D60}
"OCIOConfigPath" : "config.ocio",
"DisplayName" : "ACES",
"ViewName" : "sRGB"
float3 v4 = OCIO_lut3d_1.SampleLevel(TrilinearClamp, nextInd, 0).rgb;
if (frac.r >= frac.g)
{
    if (frac.g >= frac.b)
    {
        nextInd = baseInd + float3(0.0, 0.0, 0.0153846154);
        float3 v2 = OCIO_lut3d_1.SampleLevel(TrilinearClamp, nextInd, 0).rgb;
        nextInd = baseInd + float3(0.0, 0.0153846154, 0.0153846154);
The following describes the embedding of OpenColorIO. Two types of files are used. The .ocio file is the OpenColorIO file itself, as originally distributed, and describes the various conversions divided into Displays and Views. The .ocio file and its attached LUTs are the same as those used in the DCC tool. The .ocioc is a file that specifies which of the transformations in the .ocio file to use; the artist edits this file and specifies the transformations they wish to use. In this example, by specifying ACES as the Display and sRGB as the View in the .ocioc file, the shader code, LUT, and transformation matrix are generated in advance and used at runtime.
Incorporating OpenColorIO: Specify the .ocioc file to use for each pass. Once the .ocioc file is created, specify the .ocioc file to be used for each conversion pass. In this example, the .ocioc file specifying the conversion from BT.709 to AP1 is set on GradingPass, the conversion before grading. For SDROutputPass, the conversion pass to the SDR display, a .ocioc file specifying the RRT and ODT conversion to sRGB is set.
Incorporating OpenColorIO
Specify the source color space with scene_linear if you want to perform arbitrary gamut conversion
• Rules can be changed in the OpenColorIO library
.ocio

roles:
  scene_linear: Utility - Linear - sRGB
  texture_paint: ACES - ACEScc

- !<View> {name: Transform_AP1, colorspace: ACES - ACEScg}

.ocioc

"OCIOConfigPath" : "config_RenderingSpace_sRGB.ocio",
"DisplayName" : "ACES",
"ViewName" : "Transform_AP1"

Generated gamut conversion shader (excerpt):

{
    float4 res = float4(outColor.rgb.r, outColor.rgb.g, outColor.rgb.b, outColor.a);
    float4 tmp = res;
    res = mul(tmp, float4x4(0.61311571346586224, 0.070197158243296642, 0.0206190 /* ...truncated on the slide... */));
    outColor.rgb = float3(res.x, res.y, res.z);
    outColor.a = res.w;
}
It also supports arbitrary gamut conversion. We can generate a conversion whose source color space is the one specified in scene_linear in the .ocio file, and whose destination is the color space of the specified View. The BT.709 to AP1 conversion performed by GradingPass, described above, is created in this way. Because scene_linear in the distributed ACES .ocio file is ACES - ACEScg, we maintain a duplicate .ocio file with scene_linear changed to BT.709.
Advantages/Disadvantages of OpenColorIO+ACES. Advantages: • Various conversions can be easily used/shared • If used properly, the correctness of the conversion is assured • Easy to update: on the engine side, update the OpenColorIO library; artists update the .ocio files. Disadvantages: • Not very flexible • Due to ACES, the RRT and ODT coming as a set is unwieldy. One advantage of OpenColorIO+ACES is that various conversions can be easily used and shared without the engine programmers' help. Once created, settings can be shared with other teams, making it easy to share settings not only between DCC tools but also between teams. The correctness of the conversion itself is also ensured, so there are no differences between DCC tools, etc. Another advantage is that once the system is established, it is easy to update: if the DCC tool is updated, the engine side updates the OpenColorIO library as needed, and the artist only needs to update the .ocio file. One disadvantage is the lack of flexibility; it is hard to intervene in the generated transformations. In particular, the fact that the RRT and ODT are combined in a set is unwieldy, and it causes the GUI issues we will discuss later.
Ease of Introduction: Wide gamut before grading • Minimize the range of influence when introducing or changing the color gamut • Some of the lighting is not fully compatible with a wide color gamut... Is there any benefit if only the grading is wide gamut? • It can help prevent oversaturation in high-luminance areas. The second concept is to facilitate introduction into titles that are already in production. We tried to reduce the impact as much as possible to make adoption easier. As part of this effort, only the grading section is converted to a wide gamut. It would be ideal to convert to wide gamut from the lighting stage, but for in-progress titles the influence on lighting, textures, etc., is too broad. By providing a wide-gamut pass just before grading, wide-gamut rendering is not required and the impact of introduction is minimized. This is partly because wide-gamut rendering is not yet fully supported. Also, the benefits of wide gamut can be obtained even if only the grading is wide gamut.
BT.709 / AP1 comparison images: For example, it is difficult to saturate colors in brightly lit areas with BT.709; with AP1, the difference in brightness remains visible even in fully saturated areas.
Facilitates HDR/SDR Output Compatibility: Artists grade HDR-first • SDR output produces scenes that are overall too bright • ACES RRT+ODT does not guarantee HDR/SDR picture compatibility • Separate support was needed for SDR output compatibility. The third part of the concept is that when artists grade HDR-first, SDR output compatibility should be easy to obtain. The problem was that when grading HDR-first, some scenes were too bright in SDR output. We hoped to achieve HDR/SDR compatibility with ACES RRT+ODT, but it was not fully guaranteed, so separate support was needed for SDR output compatibility.
HDR/SDR Output Compatibility: Should we also grade for SDR output and provide a dedicated SDR LUT? • Difficult to manage and produce • An easier method would be nice. Of course, it would be ideal to prepare a separately graded LUT for SDR output. However, this would double the production and management work for LUTs and would be impractical.
BT.2446 Method C: Consists of a linear section + log curve section • Adjust exposure scaling, log curve starting point, and strength • The original specification assumes HLG, but results are good even with AP1+Linear. Reference: Report ITU-R BT.2446-1, "Methods for conversion of high dynamic range content to standard dynamic range content and vice-versa," https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BT.2446-1-2021-PDF-E.pdf. Therefore we used BT.2446, a conversion designed for broadcasting that converts HDR content to SDR content. The report defines several conversion methods; we chose Method C. The conversion is simple, consisting of a linear section and a log curve section. Artists adjust the exposure scaling and the log curve's start point and strength. The original specification is intended for HLG, but results were good even with AP1+Linear.
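As a rough illustration of that shape (a sketch only: the linear + log structure described above with hypothetical parameter names, not the ITU-R BT.2446-1 Method C reference implementation):

// Linear section + log rolloff per channel (hypothetical parameters standing in
// for the artist-facing exposure scale, log curve start point, and strength)
float CompressToSDR(float x, float exposureScale, float kneeStart, float curveStrength)
{
    x *= exposureScale;  // exposure scaling applied before the curve
    if (x <= kneeStart)
        return x;        // linear section passes through unchanged
    // log section compresses highlights; value and slope are continuous at the knee
    return kneeStart + log(1.0 + (x - kneeStart) * curveStrength) / curveStrength;
}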
BT.2446 Method C: Hue preservation was made optional for the artist • The degree of saturation can also be controlled • Some artists preferred saturation. (Comparison images: hue preservation ON / OFF.) We made the hue preservation function in BT.2446 Method C optional after requests from artists. The degree of saturation when it is enabled is adjustable, too.
BT.2446 OFF / BT.2446 ON: Here is an example of applying BT.2446. Although it cannot compare with the HDR output image, we were able to match the impression except for the highlights.
Old Pipeline SDR Output: Here is a comparison of the final output of the new and old pipelines, in several situations.
New Pipeline SDR Output.
Old Pipeline SDR Output.
New Pipeline SDR Output.
Old Pipeline SDR Output.
New Pipeline SDR Output.
GUI Support: Put GUI rendering later in the pipeline to prevent it being affected by the RRT • Blending the GUI with the scene then becomes a problem. Now, about the changes we made to the GUI when migrating to the new pipeline. When outputting SDR and HDR using OpenColorIO, ACES' RRT and ODT are processed together as a set. The RRT includes tone mapping to obtain a film look, so if RRT+ODT is applied after drawing the GUI, the brightness of the GUI is affected. Therefore, we changed the GUI to be drawn after RRT+ODT. However, doing so causes problems with blending the GUI and the scene, because the scene has already been converted to its respective display space and is no longer linear. If the GUI is opaque, we can simply convert it to the display space and draw it as-is. But if the GUI uses alpha blending, etc., simply blending it with the scene image produces an unintended appearance.
Old pipeline appearance: Here is how the old pipeline looks. On the far left is an opaque GUI; the GUI on the far right uses alpha blending.
New pipeline, normal blend: Here is how the new pipeline looks. There is no difference in the opaque GUI, but the GUI with alpha blending looks different.
GUI Support: Opaque/alpha-blended GUI is drawn in BT.709+Linear to a working buffer • Format is R8G8B8A8 Unorm sRGB, because an A channel is required • Blend with a compute shader • Opaque: convert to display space and overwrite • Alpha blend: the scene image is converted back to linear, blended with the GUI, and output to display space again. So, the opaque/alpha-blended GUI is drawn in BT.709+Linear to a working buffer, and we use a compute shader to convert it to display space while compositing it with the scene image. An opaque GUI is converted to display space and overwritten. An alpha-blended GUI is blended with the scene image after converting the scene back to linear, and the result is output back to display space. The reason opaque GUIs also go through the work buffer is to normalize the results between opaque and alpha-blended GUIs.
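A minimal sketch of that composite as a compute shader, with stand-in encode/decode helpers (the real pass would use the proper sRGB or HDR10 transforms, and the engine's actual blending may differ):

// Stand-ins for the display encode/decode; a gamma 2.2 approximation here,
// where the real pipeline would use exact sRGB or PQ transforms
float3 ToDisplaySpace(float3 c) { return pow(abs(c), 1.0 / 2.2); }
float3 ToLinear(float3 c)       { return pow(abs(c), 2.2); }

Texture2D<float4>   GuiBuffer;   // GUI drawn in BT.709 + Linear (premultiplied alpha assumed)
RWTexture2D<float4> SceneImage;  // scene already converted to display space

[numthreads(8, 8, 1)]
void CompositeGui(uint2 id : SV_DispatchThreadID)
{
    float4 gui = GuiBuffer[id];
    if (gui.a >= 1.0)
    {
        // Opaque GUI: convert to display space and overwrite the scene
        SceneImage[id] = float4(ToDisplaySpace(gui.rgb), 1.0);
    }
    else if (gui.a > 0.0)
    {
        // Alpha-blended GUI: bring the scene back to linear, blend, re-encode
        float3 scene = ToLinear(SceneImage[id].rgb);
        float3 blended = gui.rgb + scene * (1.0 - gui.a);
        SceneImage[id] = float4(ToDisplaySpace(blended), 1.0);
    }
}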
Old pipeline appearance: Let's compare the results of this process. Here is how the old pipeline looks.
New pipeline, after: Here is the result using the working-buffer process in the new pipeline.
New pipeline, before.
What about additive blending? Artistically, the consistency of the composite with the scene image is not very important • Only additive blending is drawn directly • Visual adjustments were requested for the pipeline changeover. As for additive blending, and this is just our GUI artists' opinion, we do not worry too much about the consistency of the blended result with the scene image. So for additive blending we simply draw directly onto the scene. We asked the artists to adjust the appearance of additively composited GUI when migrating to the new pipeline.
Optimization: RRT+ODT is slow on PS4, about 2.2 ms at 1080p • High speed is essential for things that are constantly being processed. Next, let's move on to optimization. The RRT+ODT pass, using the shader code and LUTs generated from OpenColorIO, takes about 2.2 ms on PS4 at 1080p. RRT+ODT runs constantly during gameplay, so a 2.2 ms load is unbearable.
Runtime LUT Baking: Bake RRT+ODT plus the brightness adjustment process • Size: 64x64x64 • Format: R10G10B10A2 Unorm • After RRT+ODT, values do not exceed 1, so Unorm is OK • Chose ACEScc for the shaper, because no difference was visually apparent • Shortened from 2.2 ms to about 0.4 ms in the same environment • Baking itself takes about 0.12 ms. So we decided to bake RRT+ODT, along with the existing brightness adjustment, into a LUT at runtime. When a change to the .ocioc file or to the brightness adjustment parameters is detected, the LUT baking process runs for one frame. The LUT size is 64x64x64 and the format is R10G10B10A2 Unorm; the values after RRT+ODT are within 0..1, so the Unorm format can be used. ACEScc was selected as the shaper into the LUT because no visual difference was apparent. This reduced RRT+ODT from 2.2 ms to about 0.4 ms on PS4 at 1080p. The baking process, which runs for just one frame after a change is detected, takes only about 0.12 ms.
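A hedged sketch of such a bake pass (the bindings, the assumed shaper range, and the RRTandODT stub are our own illustration; in the engine the RRT+ODT body would be the OpenColorIO-generated code plus the brightness adjustment):

RWTexture3D<float4> BakedLut;  // 64x64x64, R10G10B10A2_UNORM

// ACEScc decode (matches the branch-reduced version shown later in this talk)
float ACESccToLinear(float v)
{
    if (v <= -0.30137)
        return exp2(17.52 * v - 8.72) - exp2(-15.0);
    if (v < 1.468)
        return exp2(17.52 * v - 9.72);
    return 65504.0;
}

// Stub: stands in for the OCIO-generated RRT+ODT plus brightness adjustment
float3 RRTandODT(float3 linearColor) { return linearColor; }

[numthreads(4, 4, 4)]
void BakeOutputLut(uint3 id : SV_DispatchThreadID)
{
    // LUT coordinate -> ACEScc shaper value -> scene-linear color
    // (assuming the LUT coordinate maps directly to ACEScc [0,1])
    float3 cc = (id + 0.5) / 64.0;
    float3 lin = float3(ACESccToLinear(cc.x), ACESccToLinear(cc.y), ACESccToLinear(cc.z));

    // Output of RRT+ODT stays within 0..1, so Unorm storage is safe
    BakedLut[id] = float4(saturate(RRTandODT(lin)), 1.0);
}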
Appendix: Removal of the gamut conversion pass. Removed the BT.709 to AP1 conversion running as an independent pass • Integrated into the Exposure or Color Grading pass, when they are used. The pass that ran the OpenColorIO-generated shader just for gamut conversion was also taking about 0.2 ms on a base PS4 at 1080p, so we integrated it into the Exposure or Color Grading passes.
Remove gamut conversion pass
Conversion matrices are obtained from the OpenColorIO library
• At first we parsed the generated shader code to retrieve them, but stopped when we needed detailed transformations such as ToXYZ matrices
// Retrieve the conversion matrices
auto matList = std::vector<OCIOMat4>();
for (int i = 0; i < transformNum; ++i)
{
    const auto transformFromMetaData = groupTransform->getTransform(i);
    const auto type = transformFromMetaData->getTransformType();
    if (type == OCIO::TransformType::TRANSFORM_TYPE_MATRIX)
    {
        const auto transform = OCIO::DynamicPtrCast<const OCIO::MatrixTransform>(transformFromMetaData);
        double m44[16];
        transform->getMatrix(m44);
        matList.push_back(m44);
    }
}
(Debugger view of the retrieved FOCIOTransform: R0 = {0.616116, 0.0701972, 0.0206191, 0}, R1 = {0.33951, 0.916355, 0.10958, 0}, R2 = {0.0473748, 0.0134491, 0.869801, 0}.)
As for the gamut conversion matrices, we initially obtained them by parsing the generated shader code. However, in some cases we needed not only the matrix converting the current color space to the target color space, but also the matrix converting to XYZ space, so we switched to obtaining them via the OpenColorIO library API.
Remove LinearToACEScc branch: the negative-value conditional branch can be ignored when the function is used as a shaper into the LUT.

if (value <= 0 && !ignoreNegativeValue)
    return -0.358447;
else if (value < pow(2, -15))
    return 0.554795 + 0.0570776 * log2(pow(2, -16) + value / 2);
else
    return 0.554795 + 0.0570776 * log2(value);
Remove ACESccToLinear branch: conditions due to negative values are ignored as well.

if (value <= -0.30137 && !ignoreRestrictCondition)
    return pow(2, 17.52 * value - 8.72) - pow(2, -15);
else if (value < 1.468 || ignoreRestrictCondition)
    return pow(2, 17.52 * value - 9.72);
else
    return 65504;
Artists' Grading Environment: Live grading • Output AP1+ACEScc/ACEScct video from the engine and stream it to the DCC tool. LUT creation flow: RE ENGINE (AP1 + ACEScc/ACEScct + Exposure) → capture board → DaVinci Resolve (grading) → export LUT. Last but not least, I would like to conclude with an introduction to the artists' grading environment and our future prospects. Our artists' grading environment is based on live grading, in which the engine footage is sent to DaVinci Resolve and grading is performed while the game is running. The advantage of this method is that it is excellent for iteration, since you can adjust and check the appearance of the game while it is running. The LUT creation process is as follows: the PC where live grading is performed is equipped with an additional graphics card and a capture card. The engine applies Exposure in AP1+ACEScc/ACEScct and outputs the video from immediately before the grading process. DaVinci Resolve receives the video sent via the capture card and uses it as the source for grading.
Future Outlook: Utilization of Reference Gamut Compression (RGC) • Selectable as a Builtin Transform since OpenColorIO 2.1.0 • Compresses out-of-gamut colors into the gamut • Compression can be controlled by parameters. (Images: RGC OFF / RGC ON. Reference: ACES Technical Documentation, "ACES Reference Gamut Compression Implementation Guide," June 16, 2022, https://docs.acescentral.com/guides/rgc-implementation/#parametricversion-implementation-specifications (accessed August 24, 2023).) In the future, we would like to make use of ACES Reference Gamut Compression, an out-of-gamut color correction function included in ACES. It was created to replace the Blue Light Artifact LMT (Look Modification Transform). It is available in OpenColorIO from version 2.1.0 and can be used by selecting it as a Builtin Transform. The image on the right side of the slide shows the DaVinci Resolve settings screen, where various parameters control the compression threshold and degree.
Future Outlook: Wide color gamut in lighting • The foundation for conversion of albedo and light is somewhat complete • Handling of pre-baked lighting assets such as IBLs, light probes, and local cube maps is not yet decided • Must consider compatibility with past titles. Another future consideration is wide-color-gamut lighting. The foundation for converting albedo and lights has been established, but the handling of pre-baked lighting assets such as IBLs and light probes has not yet been worked out. Compatibility needs to be considered, not only with titles currently in production but with past titles as well. We would like to resolve these issues and support wide-color-gamut lighting. That is all I have to say about HDR Grading. Thank you very much.
So easy! So fluffy! Introducing: Shell Fur! In this session, we will present how we approached fur expression in RE ENGINE.
Table of Contents: Explanation of the basic concept of Shell Fur with regard to fur methodology; Introduction of the Shell Fur functionality required for high-quality fur; Shell Fur optimization methods; Final thoughts. This is the agenda: we'll describe the fur representation method, introduce the functionality, give some optimization examples, and then sum everything up.
Table of Contents (first section). First, I will explain which method is used for fur representation and why we chose it. I will then discuss the basic specifications of Shell Fur.
Fur Methodology: Fur requirements • In the traditional production flow, artists made stacked meshes by hand • We want to grow the fur programmatically • We want to automatically control processing load • We want a lot of fur on a lot of furry characters. Traditionally, artists created fur by manually stacking layered meshes; the images show the main character's jacket from Resident Evil Village and a Popo from Monster Hunter Rise. This method increased memory usage and made it difficult to control processing loads. Our requirements for replacing it were: programmatic fur, with automatic processing-cost control, that can display a lot of fur on a large number of characters. A low-memory, efficient method had to be chosen.
Fur Methodology: Shell Fur layers meshes and uses AlphaTest to draw hairs in slices to represent fur. HairCard grows polygon strips on the mesh surface and uses AlphaTest to draw hair in bundles. Strand draws individual hairs one by one (more details in the "Resident Evil 4 Hair Discussion" lectures at this conference). There are several methods of fur representation in real-time rendering. The first is Shell Fur, the subject of this session: it layers meshes, and the fur is represented as slices cut out with AlphaTest. Next is HairCard, used mainly for hair: strip-shaped polygons are grown on the surface of the mesh, and the hair is drawn in bundles. Finally, Strand draws hairs one by one; the details are covered in the "Resident Evil 4 Hair Discussion" sessions, so I will only touch on them here.
Fur Methodology (continued): Shell Fur: 😄 Cost-to-volume ratio is good 😄 Can reuse the vertex buffer of the source mesh 😞 Worried that the slices will be obvious. HairCard: grows polygon strips on the mesh surface and uses AlphaTest to draw hair in bundles. Strand: draws individual hairs one by one (more details in the "StrandLighting" and "StrandRaster" lectures at this conference). We chose Shell Fur. There were several reasons, but the main one was the ability to reuse the vertex buffer of the original mesh. It is also the most memory-friendly method, which was a requirement since there is a demand to output a large amount of fur.
Fur Methodology: Shell Fur • Layers meshes and uses AlphaTest to draw hairs in slices to represent fur. Let's review how Shell Fur works. The mechanism is fairly simple: the mesh is layered like a mille-feuille and drawn, with each layer extruded in the normal direction. The extruded layers are then cut out with AlphaTest, and the fur is represented as one slice per mesh. As you can see in the image, the slices are noticeable when viewed from the side; this is the biggest drawback of Shell Fur.
History of Shell Fur: In use since the PS2 era, and still used on current hardware. (References: 『3Dゲームファンのための「ワンダと巨像」グラフィックス講座』, GAME Watch, December 7, 2005, https://game.watch.impress.co.jp/docs/20051207/3dwa.htm (accessed July 6, 2023); Shadow of the Colossus, Sony Interactive Entertainment, 2005; 『「ラチェット&クランク パラレル・トラブル」開発秘話』, PlayStation.Blog, May 12, 2021, https://blog.ja.playstation.com/2021/05/12/20210512-ratchet/ (accessed July 6, 2023); Ratchet & Clank: Rift Apart, Sony Interactive Entertainment, 2021.) Shell Fur has a long history and was a feasible fur expression even on hardware without programmable shaders. A famous example is Shadow of the Colossus, where this method covered the colossi with fur. It is an old method but still in use on current hardware: in Ratchet & Clank: Rift Apart, the fur on Ratchet's body is drawn by the same mechanism as Shell Fur, and you cannot see the slices at all. The basic concept is the same, but the number of layered meshes is much larger thanks to improved hardware specs. We aimed to match this quality with a large number of characters.
Table of Contents (second section). Although Shell Fur is cheap, we can't afford to have it look or feel cheap. Next, we will introduce the features we prepared to ensure high quality.
Shell Fur Feature Introduction: • Dynamically changing layer count • Can make any mesh fluffy • Grooming function for fur direction • Simulation functionality • Vertex shader support for a wide range of artist requests. We'll present five of RE ENGINE's Shell Fur features.
Shell Fur Feature Introduction: Dynamically changing layer count • Increase or decrease the number of layers depending on camera distance • Can withstand camera close-ups • Automatically scales processing costs, so you can confidently use it in multiple places in the same scene. The dynamic layer count was the top priority when implementing Shell Fur. To mitigate the visible slices, Shell Fur's weak point, we simply increase the layer count: when the camera is close, the maximum number of layers is used, and as the distance increases the number of layers is reduced to cut processing load. This is the most important feature for aggressively placing multiple furred objects in a scene, since it automatically scales the processing cost.
Shell Fur Feature Introduction: Dynamically changing layer count • We tested three reduction methods: (1) reduction from the hair ends: 😄 you can't see the slices 😞 you lose the silhouette; (2) reduction from the hair roots: 😄 the silhouette stays together 😞 you can see some gaps; (3) skip every second layer when reducing: 😄 the silhouette is maintained to a degree 😞 slices are still quite visible. In the skip-one-layer method, we started from the tips, keeping the odd-numbered layers and eliminating the even-numbered layers until we reached the root of the hair. At implementation time, I expected the third method to be best at avoiding visible slices while maintaining the fur's silhouette. As it turned out, however, the first method of removing layers from the tips was the most visually pleasing. With Shell Fur, slice visibility is the biggest problem, and as far as the silhouette is concerned, as long as the fuzzy feeling remains there is no problem. I hope this is helpful when implementing the method.
Shell Fur Feature Introduction: Can make any mesh fluffy • Shell Fur is implemented as a Component that can be added to GameObjects • Just add the Component and the mesh becomes furry; no other steps. (Note: the assets in the images use a Felyne from Monster Hunter Rise; the fur functionality was not used on Felynes in the actual game.) Shell Fur is designed to be as easy to use as possible. It is provided as a component that can be added to GameObjects, making setup easy. The original mesh assets do not need to be modified, and it can be used on any kind of mesh. The image shows a Felyne from Monster Hunter Rise with fur added as a test.
Shell Fur Feature Introduction: Can make any mesh fluffy • The Shell Fur function itself only provides the layers • Making it look like fur with AlphaTest is done on the shader side. (Images: just after adding the Shell Fur component / after using AlphaTest in a shader to cut the hairs out.) We separate the responsibilities: the Shell Fur component layers the meshes, while the material side slices the hairs into rings with AlphaTest. Artists already had their own materials for layered meshes, so we made it possible to use those materials.
Shell Fur Feature Introduction: Grooming function for fur direction. Simply adding fur is far from a complete representation. To complete the fur's texture, we created a grooming tool and implemented a painting function in the engine. The details of the paint tool are presented in the conference session "Creating a Real-Time, In-Engine Painting Tool." Here is a short introductory video.
The grooming tool allows you to paint the length and flow of the fur, and to adjust detailed fur parameters, such as fur movement during simulation, in-engine.
Shell Fur Feature Introduction: Grooming function for fur direction • Grooming information is stored in a texture: R = FlowX, G = FlowY, B = FurLength • Bending is controlled by two parameters: a gentle bend toward the end of the hair, or bending only at the root. The grooming tool provides a paint function for creating grooming textures; the flow and fur-length information is embedded in the texture. The vertex shader fetches the grooming texture and animates the vertices in the flow direction. Two bending methods are available: one that bends the hair itself, and one that bends only at the root of the fur.
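A sketch of how a vertex shader might consume that texture layout (helper and parameter names are our own; the engine's actual deformation is more involved):

// Offset one shell layer's vertex using the grooming texel (R = FlowX,
// G = FlowY, B = FurLength). layerRatio is 0 at the root layer, 1 at the tip.
float3 OffsetFurVertex(float3 position, float3 normal, float3 tangent, float3 bitangent,
                       float4 grooming, float layerRatio, float maxFurLength)
{
    float2 flow = grooming.rg * 2.0 - 1.0;     // decode flow from [0,1] to [-1,1]
    float  furLen = grooming.b * maxFurLength; // painted per-texel fur length

    // "Gentle bend toward the end of the hair": blend the extrusion direction
    // from the surface normal toward the tangent-space flow as the layers go up
    float3 flowDir = normalize(tangent * flow.x + bitangent * flow.y + normal * 1e-3);
    float3 dir = normalize(lerp(normal, flowDir, layerRatio * layerRatio));

    return position + dir * furLen * layerRatio;  // extrude this layer's slice
}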
Shell Fur Feature Introduction: Simulation functionality • Simulation influenced by motion, gravity, wind, etc. • The simulation is considered as one hair per vertex chain and was implemented referencing XPBD • Collisions with the mesh surface and between hairs are ignored. (References: [MBMJ07] Position Based Dynamics, Journal of Visual Communication and Image Representation, Volume 18, Issue 2, April 2007, https://matthias-research.github.io/pages/publications/posBasedDyn.pdf; [MMN16] Miles Macklin, Matthias Müller, Nuttapong Chentanez: XPBD: Position-Based Simulation of Compliant Constrained Dynamics, Motion, Interaction and Games (MIG 2016), https://matthias-research.github.io/pages/publications/XPBD.pdf.) The simulation is influenced by motion, gravity, wind, and so on, and can be turned on and off. The implementation is very simple, referencing extended Position Based Dynamics. Since low processing cost is the priority, we ignore hair-to-hair and hair-to-skin collisions; this is fine because the simulation is designed for short body hairs.
Shell Fur Feature Introduction: Simulation functionality • Set hair direction • Keep the current velocity, and correct the position according to the spring coefficient and damping value • Constraint: the hairs are constrained from the root, so we can simply solve them in order from the root outward from the mesh surface. Briefly, here is how it is implemented. Each vertex retains its current velocity; mass is ignored, as it is equal within a single hair. The initial position from the grooming information is used as a constraint, and the position is corrected toward it according to the defined spring coefficient and damping value. The constraint runs in a single direction, from the root to the tip of the hair, so solving the vertices in order from the root poses no problem.
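A hedged sketch of that update for one vertex (illustrative names; a simplified spring plus a PBD-style distance constraint, not the engine's XPBD code):

struct FurVertex { float3 position; float3 velocity; };

// Solved in order from root to tip, so parentPosition is already updated.
// Mass is ignored since it is equal within a single hair.
void SimulateVertex(inout FurVertex v, float3 restPosition, float3 parentPosition,
                    float segmentLength, float3 externalForce,
                    float springCoeff, float damping, float dt)
{
    // Spring toward the groomed rest position, plus gravity/wind, with damping
    float3 accel = (restPosition - v.position) * springCoeff + externalForce;
    v.velocity = (v.velocity + accel * dt) * damping;
    v.position += v.velocity * dt;

    // Distance constraint to the parent keeps the hair's segment length
    float3 offset = v.position - parentPosition;
    v.position = parentPosition + normalize(offset) * segmentLength;
}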
Shell Fur Feature Introduction: Vertex shader support for a wide range of artist requests • Grooming textures alone only support a single-directional tilt, but the vertex shader can manipulate fur on a per-layer basis, expanding the range of expression • In the engine's shader editor the shader is treated as a vertex shader, but compute shaders are actually used under the hood • The output position can be used as a constraint condition in the simulation. The fur pixel shader had already been freely customizable by artists in the shader editor, but there was demand for more freedom in adjusting the fur. The grooming function alone could only tilt each hair in a single direction; with vertex shader support, it is now possible to manipulate each layer. In the engine's shader editor we treat them as vertex shaders, but for Shell Fur a compute shader actually deforms the vertices. The position output by the vertex shader is treated as a constraint condition and can be used in conjunction with the simulation.
Shell Fur Feature Introduction: Results. This is the result of preparing these features and having the artists utilize them. I think we satisfied, to some extent, the request for high-quality, fluffy fur without obvious slices, and these features are being actively used in scenes.
Table of Contents (third section). Next, I will talk about optimization.
Shell Fur Optimization Methods: • Batch drawing using Instancing and MDI • Occlusion culling • Memory reduction. We will discuss three topics: batch drawing, occlusion culling, and memory reduction.
Shell Fur Optimization Methods: Batch drawing using Instancing and MDI • Instancing: Shell Fur is a great fit for simple mesh-replication instancing • Use InstanceCount in the ArgumentBuffer to set the number of layers, and a vertex shader to offset toward the hair tips • Draw from the tip layer inward to avoid overdraw as much as possible.

struct DrawIndexedInstancedArguments
{
    uint IndexCountPerInstance;
    uint InstanceCount;        // only InstanceCount is changed; the other
    uint StartIndexLocation;   // parameters are the same as for normal drawing
    uint BaseVertexLocation;
    uint StartInstanceLocation;
};

Instanced draws or multi-draw-indirect (MDI) are used for batch drawing. Shell Fur is a great fit for instanced drawing because it only requires inflating the original mesh outward. We set the instance count to the number of layers and offset toward the hair tips in the vertex shader. The draw order is from the tips of the hairs inward, to avoid overdraw as much as possible.
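The per-layer offset itself can be as small as this (a sketch; instance 0 is taken as the outermost layer to match the tip-first draw order mentioned above):

// Instanced shell extrusion: the instance index selects the layer
float3 ShellOffset(float3 position, float3 normal, uint instanceId,
                   uint layerCount, float furLength)
{
    // instance 0 = tip layer, last instance = root layer (tip drawn first,
    // so early-Z rejects the hidden inner layers and limits overdraw)
    float layerRatio = 1.0 - (float)instanceId / (float)layerCount;
    return position + normal * furLength * layerRatio;
}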
Shell Fur Optimization Methods: Batch drawing using Instancing and MDI • Instancing: 😄 memory-friendly because the original mesh can be reused 😞 high vertex shader cost, since grooming calculation results cannot be cached • MDI: allocate a vertex buffer per layer and cache the results of the grooming calculations; separate ArgumentBuffer for each layer, drawn with MultiDraw 😄 grooming calculations can be cached, which is friendly to vertex shader cost 😞 memory is tight because a vertex buffer is allocated for each layer • Both types are available. Instancing is memory-friendly, but it cannot cache the grooming vertex offsets, so the vertex shader must redo the grooming calculations each time, increasing processing cost. If you allocate a vertex buffer per layer and cache the grooming calculation, you reduce vertex shader cost, but allocating that many vertex buffers is memory-intensive. So we offer two options: a memory-friendly type using instanced draws, and a processing-cost-friendly type using multi-draw-indirect. When the simulation function is used, multi-draw-indirect mode is forced, because the per-layer vertex data must be kept.
Shell Fur Optimization Methods: MDI • Organize ArgumentBuffers by layer and specify the number of layers to draw in MaxCommandCount at the draw call. (Diagram: the normal mesh vertex buffer layout holds one buffer per LOD and material; the MDI layout holds one buffer per LOD, material, and shell layer, with one DrawArgs entry per layer, grouped by LOD and material.) This section describes the vertex buffer layout and the argument buffer layout for multi-draw-indirect. The number of vertex buffers needed is the number of LODs and materials times the number of layers, so the buffers are allocated as shown in the matrix in the middle of the figure; the top row shows the normal mesh layout. Prepare the argument buffers as an array of LODs and materials x number of layers, grouped in layer order by LOD and material. At draw call time, set the argument buffer offset for the LOD and material you want to draw, and specify the number of layers in MaxCommandCount. Shell Fur can then be drawn in a single draw.
Shell Fur Optimization Methods: Batch drawing using Instancing and MDI.

                               Instancing   MDI
VRAM consumption (16 layers)   0 MB         3.18 MB
VRAM consumption (32 layers)   0 MB         7.72 MB
GPU load (16 layers)           1.2 ms       1.0 ms
GPU load (32 layers)           2.5 ms       2.0 ms

Measuring conditions: 6,097 vertices, all layers drawn, 1920x1080 resolution, measured on PS4. Here are the measurements by type. Instanced draws require no additional memory because the mesh asset's vertex buffer is reused. Multi-draw-indirect consumes about 3 MB of VRAM for roughly 6,000 vertices and 16 layers; normally this would be about 8 MB, but certain measures were taken to reduce memory consumption (described later). Instanced drawing is 20-25% more GPU-intensive because vertex calculations cannot be cached. These measurements were taken with all layers drawn and a large screen occupancy, so they are fairly demanding.
Shell Fur Optimization Methods: Occlusion culling • Expand the bounding box from the base mesh by the length of the hairs • Occlusion culling is particularly effective for Shell Fur because of the large vertex count and AlphaTest • While culling, calculate the number of layers according to distance and write it to the CountBuffer (for instancing, write it to InstanceCount). Shell Fur tends to be expensive relative to its drawing area, since the vertex count is high and the alpha-tested drawing is massive, so occlusion culling is important; disabling the drawing in advance is highly effective. The bounding box being tested is extended by the maximum length of the fur to prevent overflow. The layer count based on distance can be determined during occlusion culling and written to the CountBuffer, offloading more work to the GPU.
Shell Fur Optimization Methods: Memory reduction • Simply allocating vertex buffers for each layer costs layers x original buffers... but the layer count changes with camera distance, so you only need to allocate the maximum possible for each LOD (diagram: maximum layer count per LOD, from 8 at LOD0 down to LOD3; the layer count does not change between LODs 0 and 1) • Allocate the vertex buffer only for the parts that grow fur, per material. Measures like these cut memory usage to half the initial amount. When using multi-draw-indirect, reserving memory is unavoidable, but the memory cost was significant and needed to be addressed. Shell Fur's layer count changes with camera distance, so we linked it to the LOD and reserved vertex buffers for the minimum number of layers required per LOD: if the maximum layer count is 8, LOD0 gets 8, LOD1 gets 7, LOD2 gets 4, and so on. Also, fur can be set per material, so buffers are allocated only for the vertices of the relevant parts. After these and other minor measures, we reduced memory allocation to less than half of what it was when we first started using the system.
Table of Contents (final section). Now for a summary.
Final Thoughts: It's an old technique, but with the variable layer count, the grooming system, etc., we achieved high-quality fur expression. Outlook: currently a fur-only function, but the layered drawing technique could also be used for moss and other expressions. Shell Fur is an old technique, but we were able to express high-quality fur by varying the number of layers, expressing the fur texture through grooming, and so on. Currently it is specialized for fur, but I think the layered drawing technique has the potential to be used for other types of expression, such as moss and roughness. I hope this session will be of some help to you. That concludes my presentation.
Signed Distance Field: Introduction and Optimization. The next topic is the introduction and optimization of Signed Distance Fields.
Table of Contents: Why Signed Distance Field (SDF)?; Examples of utilization; Data structure; Optimization; Future outlook. The presentation proceeds as follows: first, how we came to introduce Signed Distance Fields; next, specific examples of their use; then how the SDF is represented as data, the challenges we faced, and how we optimized it; finally, future prospects.
Background: In the past, RE ENGINE used a lot of pre-baked assets, which assumed that light sources and objects would not move • Sparse Shadow Tree, Light Map, etc. • To support games where the time of day changes, new methods were needed • Ray tracing or Signed Distance Field. First, the background to the introduction. Many of the games made with RE ENGINE up to now have had a static time of day, which meant we could use techniques such as Sparse Shadow Tree and Light Map that assume immobile light sources. However, to create an open-world game with time variation, a new method that does not require pre-baking became necessary. Real-time ray tracing has recently attracted attention as a lighting method that needs no pre-baking; on the other hand, the reality is that it is difficult to take full advantage of ray tracing on today's prevalent hardware. Instead, we decided to develop a lightweight method using Signed Distance Fields.
What is a Signed Distance Field (SDF)? A field which defines the distance to an object: positive outside (the surface is closest), negative inside (the back side is closest), zero at the surface. First, a brief description of Signed Distance Fields; because of its length, the term will be abbreviated as SDF hereafter. An SDF is a field in which the distance to the nearest surface is defined for arbitrary coordinates. The value is positive outside the object, negative inside, and exactly zero at the surface.
Ray March Using SDF: Just repeat a simple operation — get a value from the SDF and advance the ray by that distance. From the SDF, we can obtain the maximum distance a ray can advance without penetrating an object, which makes ray marching efficient. For example, assume a ray is advancing toward a light source from the tip of this arrow. The range of motion indicated by the SDF at that point is this circle, so we advance the ray to its edge. In the next step, the distance is again obtained from the SDF and the ray advances by that amount, and the operation repeats. At some point the value retrieved from the SDF is the shortest distance to a wall; continuing to advance, the ray collides exactly with the wall. At the wall's surface the SDF value is 0, a collision, meaning the light source is blocked. This simple and efficient ray marching is used for dynamic shadows and so on.
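The loop really is that simple; here is a minimal sketch (SampleSDF is a stand-in for however the engine fetches the distance field, illustrated with an analytic sphere):

// Stand-in SDF fetch: a unit sphere at the origin; the engine would sample
// its distance field volume instead
float SampleSDF(float3 p) { return length(p) - 1.0; }

// March from a surface point toward the light; returns 0 if blocked, 1 if lit
float TraceShadowRay(float3 origin, float3 lightDir, float maxDistance)
{
    float t = 0.01;  // small offset so we start just off the surface
    while (t < maxDistance)
    {
        float d = SampleSDF(origin + lightDir * t);
        if (d <= 0.0)
            return 0.0;  // reached a surface: the light is occluded
        t += d;          // safe step: the largest advance that cannot penetrate
    }
    return 1.0;          // escaped to maxDistance without hitting anything
}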
Examples of SDF Utilization: SDF Shadow; SDF Ambient Occlusion; Other. The following are examples of SDF use within RE ENGINE.
SDF Shadow: Dynamically generates shadows in the distance • Assumed range is approximately 4,000 meters. First, SDF Shadow. It is positioned as a dynamic shadow for distant scenery, used with directional lights. In response to requests from engine users, SDF Shadow was designed to handle distances up to 4,000 meters.
SDF Shadow: Consists of a 3-step compute shader process • Fixed number of draw calls regardless of the number of instances • Runs in Async Compute behind shadow map drawing. Steps: Tile Classification → Instance Culling → Draw Shadow. SDF Shadow consists of three main steps. First, each screen tile is examined to determine whether it is covered by SDF Shadow. Next, a ray is extended from each such tile in the direction of the light, enumerating only those SDFs that may collide. Finally, the SDF shadow is drawn from this information: starting from the visible pixels, we ray march toward the light source using the SDF to check for shading. Only these three compute shaders run regardless of the object count, which keeps draw calls fixed, and they execute in Async Compute while shadow maps and other passes are being processed, so the processing load is easy to hide.
SDF Shadow vs. Cascade Shadow Map. Cascade Shadow Map: • High accuracy in the near field • Accuracy is poor at long distances unless the resolution is very large • Too many vertices are included, so processing tends to be slow • Areas not visible from the camera are easily wasted. SDF Shadow: • Only visible pixels are processed, so there is less waste • No extreme blurring even when extended into the distance • Lack of detail is noticeable in the near view, because it depends on the accuracy of the SDF. Comparing with Cascade Shadow Map as a way to cover a wide shadow range: Cascade Shadow Map renders very detailed shadows in the near field, but applying it out to 4,000 meters leaves the far field very blurred, and it tends to be slow because of the large number of vertices. SDF Shadow, on the other hand, runs only on visible pixels, so it is not expensive even when updated every frame, and since it does not go through a shadow map it maintains a certain level of accuracy even in distant views. Its weakness is that it relies on the accuracy of the SDF, so the loss of detail is noticeable up close.
SDF Shadow: Cascade Shadow Map for the near view, SDF Shadow for the far view. Based on these strengths and weaknesses, RE ENGINE uses both shadow types. The blue area in this image is covered by SDF Shadow; Cascade Shadow is applied to the area in front of it, closer to the camera. Since SDF Shadow is designed for distant scenery, it produces fuzzy shadows from the low-precision SDF: the soft shadows are not rough when viewed from a distance, but they are not suitable for close-ups, so Cascade Shadow Map is used in the foreground. This also improved the quality of the near-view shadows.
SDF Shadow
The drawing area of SDF Shadow is half of the screen at most
• In a scene with 10,000 instances, 2.5 ms in wide open areas (PS4)
The more the distant view is obstructed, the faster it gets
• Within 0.5 ms in forests and cities
When viewed at the eye level of the player character, SDF Shadow's drawing area covers about half of the screen at most. At that point, the processing time of SDF Shadow is roughly 2.5 ms on PS4. This was measured with Async Compute turned off, so the impact on actual performance is smaller. In situations where the distant view is mostly occluded, such as in a city or a forest, the SDF Shadow area shrinks and the processing time drops to about 0.5 ms.
©CAPCOM 92
SDF Ambient Occlusion
Can obtain occlusion information that is not available in screen space
Approx. 1 ms on PS4 (combined with SSAO)
• Ray march distance is about 2 meters
• 540p (half the height and width of the screen resolution)
Next, I'll introduce ambient occlusion using SDF. We achieved high-quality AO by using information about the surrounding environment obtained from the SDF. Processed together with SSAO, the processing time is 1 ms on PS4. Here, the ray march distance was 2 meters and the resolution was half the height and width of 1080p.
©CAPCOM 93
SDF Ambient Occlusion
Multiple rays are cast from the object surface over a hemisphere to determine the degree of occlusion
• 3 per thread, 4x4 deinterleave ⇒ 48 directions in total
Not just a binary 0-or-1 ray march result: partial shadowing that takes into account the distance traveled along the way
For AO, multiple rays are cast from the surface of the object over a hemisphere, and the surface is shaded according to how many of the rays are occluded. Three rays are cast per thread and processed with a 4x4 deinterleave, so occlusion is evaluated in 48 directions in total. Because we want smooth shading even with a small ray count, the ray march result is not just 0 or 1 for hit or miss, but a penumbra value that accounts for the distance traveled along the ray.
©CAPCOM 94
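The slides do not give the exact penumbra formula, but a common way to obtain a soft 0-to-1 value from a single sphere-traced ray, in the spirit described above, is to track the minimum "angular clearance" d/t along the march. A hypothetical C++ sketch (all names and constants are illustrative):

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };
static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

float sampleSDF(Vec3 p); // hypothetical scene-distance query

// Occlusion for one hemisphere ray: 0 = fully blocked, 1 = fully open.
// Rays that pass close to geometry return intermediate values, giving a
// penumbra instead of a binary hit/miss result.
float rayOcclusion(Vec3 origin, Vec3 dir, float maxDistance, float sharpness)
{
    float visibility = 1.0f;
    float t = 0.02f;                 // small offset to avoid self-hits
    for (int i = 0; i < 16 && t < maxDistance; ++i)
    {
        float d = sampleSDF(add(origin, mul(dir, t)));
        if (d < 0.001f)
            return 0.0f;             // hard hit: fully occluded
        visibility = std::min(visibility, sharpness * d / t);
        t += d;
    }
    return std::max(visibility, 0.0f);
}
```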
SDF Ambient Occlusion
The final result is blurred and the light and dark tones are smoothed
• The strength of this method is that it does not rely on temporal blending and can produce a stable image within a single frame
At the resolution stage of the deinterleave, noise is noticeable, as shown on the left. A blur is then applied together with SSAO, producing the smooth result shown on the right. Temporal blending is often used when a large number of rays is required; in this case, however, we were able to achieve smooth AO without resorting to it. The advantage of this method is that the image is completed within a single frame and the result is always stable.
©CAPCOM 95
SDF Ambient Occlusion
Increasing the ray march distance requires more sample directions
• Currently, the maximum length is kept to 3 meters in consideration of performance and quality
Ray march distance 2 m / Ray march distance 7 m
However, there is a drawback when the ray march distance is increased: examining longer distances raises the processing time, and the ray count becomes insufficient, causing quality problems. In the image on the right, where the ray march distance is increased to 7 meters, uneven coloring is noticeable at a distance from the tree. Currently, the ray march distance is kept within 3 meters, on the assumption that it will be used at short range.
©CAPCOM 96
Other
Global Illumination light leakage prevention
Use within shaders created by RE ENGINE users
We also use SDF for a variety of other purposes. For example, it has been used to prevent light leakage in Global Illumination. We also expose interfaces for shaders implemented by engine users. In this screenshot, SDF is used to represent whitecaps at the edge of a wave. In this way, SDF can be used in shaders for simple hit detection, color painting based on distance, and other applications.
©CAPCOM 97
Data Structure
Each mesh has an SDF as a 3D texture
Pre-baked in RE ENGINE, as generation takes time
Since it is a 3D texture, thin shapes and curves are difficult to reproduce
Next, we explain how SDF is represented as data in RE ENGINE. Each mesh asset has one SDF as a 3D texture. Rays are sent in multiple directions from the center of each voxel using ray tracing, and the distance of the shortest ray hit is stored in that voxel. Since SDF generation is time-consuming, it is not performed in real time; the SDF is pre-baked in RE ENGINE. The image on the right is a debug display of the SDF: a ray is cast from the viewpoint and ray marched to near the surface, and the color coding shows the number of steps it took to collide. Thin and curvy shapes like the ribbon are not handled well; the original shape is not reproduced faithfully.
©CAPCOM 98
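As a sketch of the offline bake described above (the ray tracing and sampling hooks are hypothetical placeholders, and sign determination, i.e., inside versus outside, is omitted for brevity):

```cpp
#include <algorithm>
#include <cfloat>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical hooks standing in for the engine's offline tooling:
float castRay(Vec3 origin, Vec3 dir);   // distance to first hit, FLT_MAX if none
Vec3  voxelCenter(int x, int y, int z); // world-space center of a voxel
Vec3  rayDirection(int i, int count);   // i-th of 'count' well-spread directions

// Bake one mesh's SDF volume: each voxel stores the shortest hit distance
// found over many ray directions from its center.
std::vector<float> bakeSDF(int resX, int resY, int resZ, int rayCount)
{
    std::vector<float> sdf(resX * resY * resZ);
    for (int z = 0; z < resZ; ++z)
        for (int y = 0; y < resY; ++y)
            for (int x = 0; x < resX; ++x)
            {
                Vec3 center = voxelCenter(x, y, z);
                float nearest = FLT_MAX;
                for (int i = 0; i < rayCount; ++i)
                    nearest = std::min(nearest, castRay(center, rayDirection(i, rayCount)));
                sdf[(z * resY + y) * resX + x] = nearest;
            }
    return sdf;
}
```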
Data Structure
Over 1,000 SDF textures loaded simultaneously
Because we want to reference everything in a single shader run, we go bindless
In large scenes, more than 1,000 SDF textures may be loaded at a time. Since we want to reference them all in a single shader, we treat them as bindless resources.
©CAPCOM 99
Data Structure
Information for culling
• Attribute bits
  • Dynamic, below a size threshold, non-negative, etc.
• AABB in world space
Information used for texture samples
• Matrix converting world coordinates to UV
• Index of the bindless resource
• Value range
The data used to access each instance's SDF is divided into two parts. Shaders that only perform culling refer to the attribute bits and the world-space AABB. The attribute bits carry information such as whether the object is moving. The bit mask first determines whether the instance needs checking at all, and then the AABB is tested against the search area. Shaders that actually sample the SDF texture refer to a second set of data: a matrix that converts world coordinates to the UV space of the 3D texture, the index of the bindless resource, and the value range of the SDF.
©CAPCOM 100
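For illustration, a hypothetical layout of the two data sets (field names and packing are my own, not RE ENGINE's actual structures):

```cpp
#include <cstdint>

// Read by culling-only shaders: kept small so the broad-phase stays cheap.
struct SDFCullingData
{
    uint32_t attributeBits; // e.g., dynamic?, large enough?, can ever be hit?
    float    aabbMin[3];    // world-space AABB used for the search-area test
    float    aabbMax[3];
};

// Read only by shaders that actually sample the SDF texture.
struct SDFSampleData
{
    float    worldToUV[4][4]; // world coordinates -> UVW of the 3D texture
    uint32_t bindlessIndex;   // which of the 1000+ bindless SDF textures
    float    minValue;        // value range of the stored SDF, needed to
    float    maxValue;        // turn a normalized texel back into a distance
};
```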
Optimization
Memory
Processing Load
Next, we will present the optimizations we have worked on so far, divided into memory and processing load.
©CAPCOM 101
Optimization: Memory
3D textures tend to increase VRAM usage
Initially we used 16-bit float, which takes several hundred MB of VRAM
First, regarding memory: as mentioned earlier, SDF is represented as a 3D texture, so you can imagine that VRAM usage tends to be large. Initially, we used 16-bit float textures. However, this became a problem in large scenes, where hundreds of MB of SDF data were constantly loaded.
©CAPCOM 102
Optimization: Memory
Currently, a smaller format is used
• Texture format is R8Unorm
• BC4 compression applied
This compresses it down to about 30% of its 16-bit counterpart
To save memory, R8Unorm is now used as the texture format, with BC4 compression applied. With this change, the same scene requires only about 30% of the previous amount of VRAM. Since the format is Unorm, a raw sample yields a value in the 0–1 range. Therefore, after sampling, the value must be linearly remapped back to a distance using the minimum and maximum of the original values.
©CAPCOM 103
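A one-line sketch of that remap, using the per-texture value range carried in the sample data shown earlier (names are illustrative):

```cpp
// An R8Unorm/BC4 sample returns a normalized value in [0,1]; remap it
// back to a signed distance using the texture's stored value range.
float decodeSDFSample(float normalized, float minValue, float maxValue)
{
    return minValue + normalized * (maxValue - minValue); // 0 -> min, 1 -> max
}
```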
Optimization: Memory
Only maintain the SDF of the LOD 0 shape
When the drawn LOD differs, the start of the ray march is often buried at a negative position
(Figure: sign grid showing what happens if the same SDF is used with a different LOD)
In addition, the SDF is generated only for the LOD 0 geometry. If the shape changes due to LOD switching, however, the result can change as well. For example, as circled on the right side of this figure, the center of a voxel can lie outside the object while its stored value is negative. This occurs frequently. Nevertheless, preparing a separate SDF for each LOD would strain VRAM, so we decided to use the same SDF for all LODs.
©CAPCOM 104
Optimization: Memory
Get the SDF once at the start of the ray march; if the first value is negative, increase the offset accordingly to avoid self-intersection
No offset / With offset
If the SDF is reused with a different LOD than the one it was baked from, self-intersection occurs in some places. In AO, for example, the problem shows up as the black stains in the screenshot on the left. However, it can be avoided with a simple workaround: as in the figure on the right, if the voxel at the start of the ray march is negative, the start position is offset along the normal by an amount matching that negative value. That alone avoided the problem in most cases.
©CAPCOM 105
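A minimal sketch of the workaround, assuming a `sampleSDF` query and the surface normal at the ray start (all names are hypothetical):

```cpp
struct Vec3 { float x, y, z; };
static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

float sampleSDF(Vec3 p); // hypothetical scene-distance query

// If the ray start is "buried" (negative SDF because the drawn LOD differs
// from the baked LOD 0 shape), push it out along the normal by that amount.
Vec3 offsetRayStart(Vec3 position, Vec3 normal)
{
    float d = sampleSDF(position);
    if (d < 0.0f)
        position = add(position, mul(normal, -d)); // move back outside
    return position;
}
```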
Optimization: Processing Load
Main causes of increased processing load
• Too many instances
• Massive overlap in spheres of influence
Next, performance optimization. As the use of SDF has increased, we have faced two main problems: too many instances, and overlap between areas of influence.
©CAPCOM 106
Instance Count Reduction
Classify instances
• Evaluate importance when registering SDF instances, and set bits
  • Size of AABB
  • Minimum value contained in the texture
  • Static object or not
• Check whether the bits required for a given pass's accuracy are set
A scene can contain over 50,000 instances, and simply examining all of them would be too heavy. So when an SDF instance is first registered, we check the size of its AABB and the minimum value in its texture, and set bits indicating its valid distance, which features it should be used in, and so on. For example, if the AABB is too small, or the texture's minimum value is greater than the termination threshold, the instance can be excluded from ray march targets. To find instances in the vicinity, a simple method is used to narrow down the candidates. It may have been down to our implementation, but searching with a BVH proved costly and tended to be slower than simpler structures such as a uniform grid.
©CAPCOM 107
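For illustration, a hypothetical version of the bit classification (the bit names and the filtering rule are my own sketch of what the slides describe):

```cpp
#include <cstdint>

// Bits assigned once, when an SDF instance is registered.
enum SDFAttributeBits : uint32_t
{
    kStatic          = 1u << 0, // object does not move
    kLargeEnough     = 1u << 1, // AABB big enough to matter
    kCanBeHit        = 1u << 2, // texture min value below the termination threshold
    kUsableForShadow = 1u << 3, // accurate enough for SDF Shadow
    kUsableForAO     = 1u << 4, // accurate enough for SDF AO
};

// Broad-phase filter: each pass declares the bits it requires; instances
// missing any of them are skipped before the (costlier) AABB test.
bool passesBitFilter(uint32_t instanceBits, uint32_t requiredBits)
{
    return (instanceBits & requiredBits) == requiredBits;
}
```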
Overlap Problem
We need to examine every candidate that may hold information about the nearest neighbor
(Figure: the ★ marks the nearest surface to this coordinate)
Now, a description of the overlap problem between instances' areas of influence. For the red point in the image, for example, an object slightly further away has a closer surface than the object whose texture range contains that point. Considering that the AABBs include margins, as in this case, distant instances may well contain closer surfaces. Therefore, even just to get the shortest distance at a single point, a large number of textures must be examined in dense areas.
©CAPCOM 108
Reduction of Overlap
Limit the distance that can be traveled in one step of a ray march
• Search efficiency in open spaces deteriorates
• In return, overlap is dramatically improved
Limit maximum distance of one step
To reduce the number of candidates to examine, we limit the maximum distance a ray may travel in a single step. Although this has the disadvantage of increasing the number of ray march steps, the overlap problem was greatly mitigated. In the image on the left, a large number of textures that might hold the shortest distance must be examined. On the right, where the maximum step distance is restricted, examining only two textures is enough to guarantee no penetration, although the true shortest distance is no longer guaranteed.
©CAPCOM 109
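The change itself is tiny; a sketch of one capped step (with `MAX_STEP` as an illustrative tuning constant, not a documented engine value):

```cpp
#include <algorithm>

float sampleSDF(float t); // hypothetical: scene distance at parameter t along the ray

// One step of the capped ray march. The SDF may report a large free
// distance, but we never advance more than MAX_STEP, so only instances
// within MAX_STEP of the current point can matter for this step.
float advance(float t)
{
    const float MAX_STEP = 1.0f; // illustrative cap, in meters
    return t + std::min(sampleSDF(t), MAX_STEP);
}
```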
Reduction of Overlap
Create overlap-resolved SDFs in several clipmaps
• Up to four 3D textures with 128x64x128 resolution
• Dynamically created at runtime
• Although the accuracy is low, the nearest neighbor distance can be determined with only one sample
The interface provided to engine users and SDF AO refer to the clipmap SDFs
Another overlap-avoidance mechanism is to store overlap-resolved SDFs in several clipmaps. Because the clipmaps are generated dynamically at runtime, changes to the scene layout are reflected immediately. The accuracy is lower than that of the original textures, but the nearest neighbor distance can be determined with a single sample. The SDF interface provided to engine users, as well as AO, which sends rays in many directions, refer to the clipmaps.
©CAPCOM 110
Reduction of Overlap
Clipmaps are differentially updated
• Clipmaps are divided into several grids, which are used as update units
• Each grid has a checksum, and is updated when a change is detected
• A full update takes more than 20 ms on PS4, but a differential update takes less than 1 ms
(Figure: uninitialized area created by camera movement; areas where additions / deletions / transformations are detected)
Sampling clipmaps is cheap, but creating them can be very costly: rebuilding everything each frame would cost over 20 ms on PS4, so we update them differentially. When an object is added, deleted, or moved, the checksum of each grid cell overlapping the affected area changes, and only that minimal area is rebuilt. When the reference position moves, the clipmap follows via UV scrolling: areas already covered in the previous frame continue to be used, and new data is created only for the areas newly entering the clipmap range.
©CAPCOM 111
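A minimal sketch of the checksum-driven differential update (grid layout, hashing, and rebuild are hypothetical placeholders):

```cpp
#include <cstdint>
#include <vector>

uint64_t computeChecksum(int cellIndex); // hypothetical hash of the instances
                                         // (ids + transforms) overlapping a cell
void     rebuildCell(int cellIndex);     // hypothetical re-voxelization of one cell

// The clipmap is split into grid cells that act as update units; only the
// cells whose checksum changed since the last frame are rebuilt.
void updateClipmap(std::vector<uint64_t>& checksums)
{
    for (int i = 0; i < static_cast<int>(checksums.size()); ++i)
    {
        uint64_t now = computeChecksum(i);
        if (now != checksums[i])   // an addition / deletion / movement was detected
        {
            rebuildCell(i);        // rebuild only this cell
            checksums[i] = now;
        }
    }
}
```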
Future Outlook
Issues
• Doesn't support joints
• Too many unique meshes strain VRAM
Potential new features
• Application to dynamic GI
That covers the current state of our SDF implementation. The main issue at present is that it is difficult to increase the number of variations. For example, an object deformed with joints and placed along the terrain is not covered by SDF. Another problem is that too many unique meshes drive up VRAM usage. Baking unique, statically terrain-deformed meshes into a single octree-like structure might increase flexibility. Testing is also underway on new dynamic Global Illumination. We would like to continue developing new features using SDF as well as optimizing SDF itself. That's all for my presentation on SDF.
©CAPCOM 112
Overall Summary
HDR Grading
• Changed to a pipeline using OpenColorIO + ACES
• Emphasis on sharing settings, ease of deployment, and compatibility with HDR/SDR output
• Some GUI support was needed due to the pipeline changes
Shell Fur
• Introduced to the engine as a lightweight fur representation method
• Wide range of expression through high user customizability
• Users need to control whether the memory-reduction or processing-reduction variant is used
Signed Distance Field
• Introduced to the engine as a lightweight ray marching method
• Can be used flexibly in various situations: shadows, AO, user shaders, etc.
• Memory usage and overlapping areas of influence can be problems, and require some ingenuity
I will conclude with this summary of the presentation. Thank you for your attention.
©CAPCOM 113