November 27, 23
Slide overview
■Overview
While HLSL has earned its place as a first-class citizen among shader languages, translators are essential for cross-platform use.
We will introduce a shader translator, developed in-house for RE ENGINE, that leverages DirectX Shader Compiler technology. We will discuss the internal history of translators, how they have been used in the development of our titles, why we do not use existing translators, and the results that can only be achieved by in-house production.
Note: This slideshow was converted from the publicly available CAPCOM Open Conference Professional RE:2023 videos, with some minor modifications.
■Prerequisites
Assumes an interest in language processing systems (compilers/interpreters).
I'll show you just a little bit of the content!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CAPCOM Open Conference Professional RE:2023
https://www.capcom-games.com/coc/2023/
Check the official Twitter for the latest information on CAPCOM R&D!
https://twitter.com/capcom_randd
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
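This is the official Capcom account run by the Technical Research Division, which develops RE ENGINE, the game engine Capcom takes pride in. We publish presentation materials from past technical conferences and similar events. [CAPCOM Open Conference Professional RE:2023] https://www.capcom-games.com/coc/2023/ [CAPCOM Open Conference RE:2022] https://www.capcom.co.jp/RE2022/ [CAPCOM Open Conference RE:2019] http://www.capcom.co.jp/RE2019/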
Shader Translator: Implementation and Use I will now start my presentation titled "Shader Translator: Implementation and Use." ©CAPCOM 1
Contents of this session
Describes the in-house shader translators we used in "Monster Hunter Rise" and "Monster Hunter Rise: Sunbreak"
In this session, we will talk about RE ENGINE's shader translators. In particular, I will explain how we developed and used a shader translator for Monster Hunter Rise and Monster Hunter Rise: Sunbreak on Nintendo Switch.
Agenda
• Overview of RE ENGINE Shaders
• Shader Translator Implementation
• Optimization Case Studies
• Summary and Future Prospects
This is our agenda. After an overview of RE ENGINE's shaders and multi-platform support, we will discuss the implementation of our in-house shader translators and optimization examples.
Overview of RE ENGINE Shaders
I will begin with an overview of RE ENGINE's shaders.
RE ENGINE Shaders
Shader code written in HLSL by programmers
Shader assets created by artists in the node-based editor
⇒ In the end, all written in HLSL
There are two main types of shaders in RE ENGINE. One is shader code written in HLSL by the engine programmers, and the other is shader assets created by artists in the node-based editor. Shader assets created in the node-based editor are eventually converted to HLSL source code. Therefore, all shaders are written in HLSL.
Multi-platform Support for Shaders
Platforms requiring something other than HLSL
• Nintendo Switch
• Vulkan
• Metal
⇒ Conversion is handled by translator
As you know, RE ENGINE is a multi-platform game engine. Therefore, for platforms that require shaders in something other than HLSL, a translator is needed to convert the shaders. Such platforms include Nintendo Switch, Vulkan, and Metal.
Shader Translators in the Wider World
• DirectX Shader Compiler (DXC)
• HLSL Cross Compiler (HLSLcc)
• Glslang
• SPIRV-Cross
Here we will look at some of the shader translators that exist in the world.
DirectX Shader Compiler (DXC)
HLSL compiler for DirectX
Open source project forked from LLVM/Clang
Compiles from HLSL to DXIL
• DXIL : Shader intermediate language for DirectX
[Diagram: HLSL → DXC → DXIL → DirectX]
First, let me introduce DirectX Shader Compiler. DirectX Shader Compiler is an HLSL compiler for DirectX developed by Microsoft. It will be referred to hereafter by the abbreviation, DXC. DXC is developed as an open source project forked from the LLVM/Clang project. DXC compiles HLSL to DXIL. DXIL is a shader intermediate language for DirectX. DXIL is a subset of LLVM IR and is like an abstracted assembly language.
DirectX Shader Compiler (DXC)
HLSL compiler for DirectX
Open source project forked from LLVM/Clang
Compiles from HLSL to DXIL
• DXIL : Shader intermediate language for DirectX
SPIR-V code generation is also possible
• SPIR-V : Shader intermediate language for Vulkan
[Diagram: HLSL → DXC → DXIL → DirectX, and HLSL → DXC → SPV → Vulkan]
DXC is also capable of SPIR-V code generation thanks to Google's contribution. SPIR-V is a shader intermediate language for Vulkan. Like DXIL, SPIR-V is inspired by LLVM IR and resembles an assembly language. Since SPIR-V is also supported, DXC can be used not only for DirectX but also for Vulkan, making DXC probably the most widely used tool today.
HLSL Cross Compiler (HLSLcc)
HLSL to GLSL translator
Compiles HLSL first with FXC, then the resulting DXBC is converted to GLSL
• FXC : HLSL compiler for DirectX before DXC
• DXBC : Shader intermediate language for DirectX before DXIL
Functions from HLSL Shader Model 6.0 or later are not available
• Wave Intrinsics, etc.
[Diagram: HLSL → FXC → DXBC → HLSLcc → GLSL]
Next, we introduce HLSL Cross Compiler. HLSL Cross Compiler is a translator from HLSL to GLSL. As of 2023, it is almost no longer used, but until around 2017, when DirectX Shader Compiler was introduced, it was used in various game engines. In the following, it will be referred to as HLSLcc for short. HLSLcc compiles HLSL once with FXC and converts the resulting DXBC to GLSL. FXC is the legacy shader compiler for DirectX that preceded DXC and was used until the DirectX 11 era. DXBC is the FXC counterpart of DXIL: the shader intermediate language used up to the DirectX 11 era. Because HLSLcc is based on legacy FXC, it has the disadvantage that it does not support HLSL Shader Model 6.0 or later. For example, Wave Intrinsics cannot be used.
Glslang
GLSL compiler for Vulkan
Compiles from GLSL to SPIR-V
[Diagram: GLSL → Glslang → SPV → Vulkan]
Next is Glslang. Glslang is a GLSL compiler for Vulkan. It is developed by the Khronos Group and comes with the Vulkan SDK. Glslang supports GLSL to SPIR-V compilation.
Glslang
GLSL compiler for Vulkan
Compiles from GLSL to SPIR-V
Compilation from HLSL is also supported
• However, only up to HLSL Shader Model 5.0
[Diagram: GLSL → Glslang → SPV → Vulkan, with HLSL as a second input to Glslang]
It also supports compiling from HLSL, but only up to HLSL Shader Model 5.0. It is currently probably the most widely used GLSL compiler for Vulkan.
SPIRV-Cross
Translates from SPIR-V to various shader languages
[Diagram: SPV → SPIRV-Cross → GLSL (Vulkan), HLSL (DirectX), MSL (Metal)]
Finally, we introduce SPIRV-Cross. SPIRV-Cross is a tool to convert SPIR-V to various shader languages. It can convert to GLSL, HLSL, MSL, etc. The last one, MSL, is an abbreviation for Metal Shading Language.
RE ENGINE's Approach
Nintendo Switch ⇒ In-house shader translator (HLSL → GLSL) (contents of the second half of this session)
Vulkan ⇒ DXC SPIR-V code generation (HLSL → SPIR-V)
Metal ⇒ DXC SPIR-V code generation + SPIRV-Cross (HLSL → SPIR-V → MSL)
As we mentioned, there are many shader translators in the world. With that in mind, these are the approaches RE ENGINE currently employs for multi-platform shader support. They are listed from top to bottom in the order in which RE ENGINE came to support each platform. First, for Nintendo Switch, we do not use any of the shader translators introduced earlier, but an in-house shader translator. Next, Vulkan uses DXC's SPIR-V code generation. Finally, Metal uses the same DXC SPIR-V code generation, followed by conversion to MSL using SPIRV-Cross. RE ENGINE uses this approach to achieve multi-platform support for HLSL shaders. In the second half of this session, we will discuss our in-house shader translator for Nintendo Switch.
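The Vulkan and Metal paths can be pictured as command lines. The following is only a rough sketch: the talk does not show RE ENGINE's actual invocations, so the file names, the shader profile, and the helper names vulkan_commands and metal_commands are hypothetical. The flags used (-spirv and -Fo for dxc, --msl and --output for spirv-cross) are the standard options of the public tools.

```python
# Sketch: build (but do not run) the tool invocations for the two
# off-the-shelf paths. All paths and names here are illustrative.

def vulkan_commands(hlsl_path, entry, profile, spv_path):
    """HLSL -> SPIR-V using DXC's SPIR-V code generation."""
    return [["dxc", "-spirv", "-T", profile, "-E", entry,
             hlsl_path, "-Fo", spv_path]]


def metal_commands(hlsl_path, entry, profile, spv_path, msl_path):
    """HLSL -> SPIR-V with DXC, then SPIR-V -> MSL with SPIRV-Cross."""
    return vulkan_commands(hlsl_path, entry, profile, spv_path) + [
        ["spirv-cross", "--msl", spv_path, "--output", msl_path]
    ]


if __name__ == "__main__":
    for cmd in metal_commands("shader.hlsl", "main", "ps_6_0",
                              "shader.spv", "shader.metal"):
        print(" ".join(cmd))
```

The Metal path simply appends one more step to the Vulkan path, which mirrors the HLSL → SPIR-V → MSL chain listed on the slide.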
History of In-house Shader Translator Development
Circa 2017, RE ENGINE's Nintendo Switch support required conversion to GLSL
Initially used HLSLcc
• Difficult to debug
Before I start, I would like to discuss the background of the development of the in-house shader translator. Around 2017, we needed to convert from HLSL to GLSL for RE ENGINE's Nintendo Switch support. Initially, HLSLcc was used. However, HLSLcc had many defects, and the GLSL resulting from the conversion was difficult to debug due to its low readability.
Problems with HLSLcc: Difficult to Debug
HLSLcc converts source code that has already been compiled to DXBC into GLSL, which is difficult to debug because it is very different from the original source code
[Figure: original HLSL and converted GLSL side by side]
Since HLSLcc converts source code that has already been compiled to DXBC into GLSL, the converted source code is very different in form from the original, making debugging difficult. For example, the variable names in the original HLSL are lost in the converted GLSL code. Also, at a glance, it is not clear which part of the code corresponds to which part of the original HLSL. In this situation, it is very difficult to quickly identify the cause if there is a problem somewhere.
History of In-house Shader Translator Development
Circa 2017, RE ENGINE's Nintendo Switch support required conversion to GLSL
Initially used HLSLcc
• Difficult to debug
• Functions from HLSL Shader Model 6.0 or later are not available
DXC SPIR-V code generation was not yet stable
⇒ Needed a translator with both ease of debugging and stability, so we started in-house production
There was also a concern that HLSLcc could not take full advantage of GPU performance because it could not use SM6.0 or later functions. Finally, DXC's SPIR-V code generation was not adopted because it had just been released at the time and its operation was not as stable as it is now. Therefore, a translator with both ease of debugging and stability was needed, so we decided to start in-house production. That's the development background.
Shader Translator Implementation
I will now describe the implementation of the in-house shader translator.
Overall Process Flow
[Flow diagram: Input HLSL → DXC → Formatted HLSL → DXC → Abstract Syntax Tree (AST) → In-house translator → Output GLSL. Frontend: HLSL to AST. Backend: AST to GLSL.]
To begin with, the overall processing flow of the in-house shader translator is as shown. For convenience, I'll refer to the part from HLSL to the abstract syntax tree (AST) output as the "frontend" and the part from the AST to the output of GLSL as the "backend." Let's take a closer look.
1. Code Formatting with DXC's HLSL Rewriter
$ dxr.exe -remove-unused-globals -HV 2021 -enable-16bit-types -T ps_6_7 -E CopyColorPS input.hlsl
Input HLSL (left):
Texture2D< float4 > CopyUtilityHDRImage;
Texture2D<float>     ReadonlyDepth;
float loadReadOnlyDepth(int3 pos)
{
    float depth = ReadonlyDepth.Load(pos.xyz).x;
    return depth;
}
float4 CopyColorPS(float4 vs_out: SV_Position): SV_Target
{
    return CopyUtilityHDRImage.Load( uint3(vs_out.xy,0));
}
Formatted HLSL (right):
Texture2D<float4> CopyUtilityHDRImage;
float4 CopyColorPS(float4 vs_out : SV_Position) : SV_Target {
    return CopyUtilityHDRImage.Load(uint3(vs_out.xy, 0));
}
The first step is to use DXC's HLSL Rewriter to perform HLSL code formatting.
The executable file for HLSL Rewriter is named dxr.exe.
The option "-remove-unused-globals" removes global definitions that are not referenced from the entry point as part of the formatting.
The HLSL on the left, when formatted with HLSL Rewriter, turns into the HLSL on the right.
Unused global variables and function definitions that are not referenced by the entry point have been removed.
2. Output Abstract Syntax Tree with DXC
$ dxc.exe -ast-dump -HV 2021 -enable-16bit-types -T ps_6_7 -E CopyColorPS refined.hlsl
Texture2D<float4> CopyUtilityHDRImage;
float4 CopyColorPS(float4 vs_out : SV_Position) : SV_Target {
    return CopyUtilityHDRImage.Load(uint3(vs_out.xy, 0));
}
HLSL
Next, we use DXC again to output an abstract syntax tree from the HLSL we just formatted.
The executable file we're using is dxc.exe.
The option "-ast-dump" can be used to output the abstract syntax tree.
The abstract syntax tree is output in plain text as shown on the right.
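The two frontend steps can be chained from a small driver script. The sketch below is an illustration under stated assumptions, not the tool RE ENGINE uses: the flags are the ones shown on the slides, while the function names and the assumption that dxr.exe and dxc.exe write their results to standard output are mine.

```python
import subprocess

# Flags shared by both frontend steps, as shown on the slides.
COMMON_FLAGS = ["-HV", "2021", "-enable-16bit-types"]


def rewriter_command(source, entry, profile):
    """Step 1: format the HLSL and drop globals unused by the entry point."""
    return ["dxr.exe", "-remove-unused-globals", *COMMON_FLAGS,
            "-T", profile, "-E", entry, source]


def ast_dump_command(source, entry, profile):
    """Step 2: dump the abstract syntax tree of the formatted HLSL."""
    return ["dxc.exe", "-ast-dump", *COMMON_FLAGS,
            "-T", profile, "-E", entry, source]


def run_frontend(source, entry, profile, formatted_path="refined.hlsl"):
    """Run both steps and return the AST dump as text.

    Assumes both tools print their result to stdout (hypothetical setup).
    """
    formatted = subprocess.run(rewriter_command(source, entry, profile),
                               check=True, capture_output=True, text=True).stdout
    with open(formatted_path, "w", encoding="utf-8") as f:
        f.write(formatted)
    return subprocess.run(ast_dump_command(formatted_path, entry, profile),
                          check=True, capture_output=True, text=True).stdout
```

Calling run_frontend("input.hlsl", "CopyColorPS", "ps_6_7") would then hand the plain-text AST to the backend.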
3. Output GLSL from Abstract Syntax Tree
#version 460 core
#extension GL_NV_gpu_shader5 : enable
#extension GL_NV_bindless_texture : enable
layout(std430) uniform;
layout(std430) buffer;
layout(binding = 0) uniform texture2D CopyUtilityHDRImage;
layout(location = 0) out vec4 out_Target0;
// CopyColorPS
void main()
{
    vec4 vs_out = vec4(gl_FragCoord.xyz, 1.0/gl_FragCoord.w);
    const ivec3 autogen_TempVar0 = ivec3(uvec3(vs_out.xy, 0));
    out_Target0 = texelFetch(sampler2D(uint64_t(CopyUtilityHDRImage)),
                             autogen_TempVar0.xy, autogen_TempVar0.z);
    return;
}
GLSL
And finally, GLSL is output from the abstract syntax tree.
This is the overall process flow.
Advantages of Using DXC on the Frontend
DXC is used for the frontend
Advantages
• No need to write your own HLSL parser
• DXC abstract syntax trees also contain type information
• Easy to keep up with the latest HLSL specifications
As you have just seen, we are using DXC for the frontend. This has several advantages. The first advantage is that we do not have to write our own HLSL parser. This is a considerable advantage because it is quite difficult to write your own parser for a C-like language. Another advantage is that the DXC abstract syntax tree also includes type information. This type information can be used in the GLSL conversion explained later. Finally, it makes it easy to keep up with the latest HLSL specifications. For the frontend part, it is only necessary to update DXC. That's all for the frontend.
Backend Implementation
[Flow diagram: AST (plain text) → Parse → Tree Data Structure → Traverse Abstract Syntax Tree (Preprocess, then Output GLSL) → Output GLSL. Node labels shown in the example tree: TranslationUnitDecl, VarDecl, IntegerLiteral, FunctionDecl, ParmVarDecl, CompoundStmt, BinaryOperator, DeclRefExpr]
Preprocess
• Collection of global definitions (resource variables, constant buffers, structures, functions)
• Memory layout calculations for constant buffers and structures
• Simple control flow analysis within each function
• Check side effects of each function
I will now explain the backend implementation in detail. First, the abstract syntax tree is output as plain text from the frontend, and this text is parsed to restore the tree data structure. Then, the translator traverses the tree and outputs GLSL strings sequentially. The actual traversal is done twice, once for pre-processing and once for GLSL output. I won't go into the pre-processing today, but it includes collecting global definitions, calculating memory layouts, simple control flow analysis, and checking for side effects.
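Restoring the tree from the plain-text dump could look roughly like the sketch below. It is a minimal illustration, assuming the usual Clang-style dump layout in which each nesting level adds two characters of tree-drawing prefix; the Node class and the function names are hypothetical and not taken from the in-house translator.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str      # node kind, e.g. "VarDecl"
    detail: str    # rest of the dump line (source range, name, type)
    children: list = field(default_factory=list)


# Tree-drawing prefix ("|", "`", "-", spaces), the node kind, then the rest.
LINE = re.compile(r"^([|`\- ]*)(\w+)\s*(.*)$")


def parse_ast_dump(text):
    """Rebuild a tree from an indented AST dump."""
    root = Node("Root", "")
    stack = [(-1, root)]               # (depth, node) along the current path
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:                  # blank line or guide-only line
            continue
        prefix, kind, detail = m.groups()
        depth = len(prefix) // 2       # two prefix characters per level
        while stack[-1][0] >= depth:   # climb back up to the parent
            stack.pop()
        node = Node(kind, detail)
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root


def walk(node):
    """Depth-first traversal, usable for both the preprocess and output passes."""
    yield node
    for child in node.children:
        yield from walk(child)


SAMPLE = """\
|-VarDecl <line:6:1, col:11> col:11 used ColorTex 'Texture2D<vector<float, 4> >'
|-FunctionDecl <line:8:1, line:11:1> line:8:8 getColor 'float4 (uint2)'
| |-ParmVarDecl <col:17, col:23> col:23 used pos 'uint2':'vector<unsigned int, 2>'
| `-CompoundStmt <line:9:1, line:11:1>
|   `-ReturnStmt <line:10:5, col:39>
"""

tree = parse_ast_dump(SAMPLE)
# Pass 1 (preprocess): collect the kinds of the global definitions.
global_kinds = [n.kind for n in tree.children]
```

A second call to walk(tree) would then drive the GLSL output pass, with the information gathered in the first pass available to it.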
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
    return ColorTex.Load(uint3(pos, 0));
}
HLSL
|-VarDecl <line:6:1, col:11> col:11 used ColorTex 'Texture2D<vector<float, 4> >'
|-FunctionDecl <line:8:1, line:11:1> line:8:8 getColor 'float4 (uint2)'
| |-ParmVarDecl <col:17, col:23> col:23 used pos 'uint2':'vector<unsigned int, 2>'
| `-CompoundStmt <line:9:1, line:11:1>
|   `-ReturnStmt <line:10:5, col:39>
|     `-CXXMemberCallExpr <col:12, col:39> 'vector<float, 4>'
|       |-MemberExpr <col:12, col:21> '<bound member function type>' .Load
|       | `-DeclRefExpr <col:12> 'Texture2D<vector<float, 4> >' lvalue Var 'ColorTex' 'Texture2D<vector<float, 4> >'
|       `-ImplicitCastExpr <col:26, col:38> 'vector<int, 3>' <HLSLCC_IntegralCast>
|         `-CXXFunctionalCastExpr <col:26, col:38> 'uint3':'vector<unsigned int, 3>' functional cast to uint3 <NoOp>
|           `-InitListExpr <col:31, col:38> 'uint3':'vector<unsigned int, 3>'
|             |-ImplicitCastExpr <col:32> 'uint2':'vector<unsigned int, 2>' <LValueToRValue>
|             | `-DeclRefExpr <col:32> 'uint2':'vector<unsigned int, 2>' lvalue ParmVar 'pos' 'uint2':'vector<unsigned int, 2>'
|             `-ImplicitCastExpr <col:37> 'unsigned int' <IntegralCast>
|               `-IntegerLiteral <col:37> 'literal int' 0
AST
Let me show you a simple conversion example.
We will walk through, step by step, how the very simple HLSL shown above is converted to GLSL.
The tree shown above is the AST of that HLSL.
The first line of the HLSL is the texture variable declaration.
Variable declarations are represented by VarDecl nodes in the AST.
The node carries the variable name (ColorTex) and its type information ('Texture2D<vector<float, 4> >').
uniform texture2D ColorTex;
GLSL
When converted to GLSL, it looks like this.
The variable names are used as they are, and the texture type is converted from the HLSL type to the GLSL type appropriately.
In this case, the textures are converted to GLSL texture2D types based on Vulkan's separate textures and samplers specification.
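As a rough illustration of this step, the sketch below pulls the name and type out of a VarDecl line and emits the GLSL declaration. The regular expression, the one-entry lookup table, and the function names are assumptions made for this example; the real translator presumably covers many more resource types and layout qualifiers.

```python
import re

# Hypothetical lookup table: HLSL resource type -> GLSL separate-texture type.
TEXTURE_TYPES = {"Texture2D": "texture2D"}

# In the dump, the declared name is the word right before the first quoted type.
DECL = re.compile(r"(\w+) '([^']*)'")


def parse_decl(detail):
    """Return (name, hlsl_type) from the text of a VarDecl/ParmVarDecl node."""
    m = DECL.search(detail)
    if m is None:
        raise ValueError(f"no declaration found in: {detail!r}")
    return m.group(1), m.group(2)


def emit_texture_decl(detail):
    """Emit a GLSL uniform declaration for a texture VarDecl."""
    name, hlsl_type = parse_decl(detail)
    base = hlsl_type.split("<", 1)[0]      # "Texture2D<...>" -> "Texture2D"
    return f"uniform {TEXTURE_TYPES[base]} {name};"


line = "<line:6:1, col:11> col:11 used ColorTex 'Texture2D<vector<float, 4> >'"
print(emit_texture_decl(line))
```

The variable name passes through untouched, which is what keeps the generated GLSL easy to match against the original HLSL.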
Next is the function declaration.
Function declarations are represented in the AST by FunctionDecl nodes.
The node carries the function name (getColor) and the function type ('float4 (uint2)').
Each parameter is represented by a child ParmVarDecl node.
It carries the parameter name (pos) and the parameter type ('uint2').
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
GLSL
When converted to GLSL, it looks like this.
The function and parameter names are kept as they are; only the type names are converted from HLSL types to GLSL types.
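The type-name conversion for scalars and vectors can be sketched as a small mapping. This is an illustrative example only: the helper names are hypothetical, and it covers just the basic float, int, uint, and bool types, not matrices, 16-bit types, or structures.

```python
import re

# GLSL vector prefixes for each HLSL scalar base type.
VECTOR_PREFIX = {"float": "vec", "int": "ivec", "uint": "uvec", "bool": "bvec"}

HLSL_TYPE = re.compile(r"(float|int|uint|bool)([234])?")


def glsl_type(hlsl_type):
    """Map an HLSL scalar or vector type name to its GLSL spelling."""
    m = HLSL_TYPE.fullmatch(hlsl_type)
    if m is None:
        raise ValueError(f"unsupported type in this sketch: {hlsl_type}")
    base, size = m.groups()
    return base if size is None else f"{VECTOR_PREFIX[base]}{size}"


def emit_signature(name, return_type, params):
    """Emit a GLSL function header; params is a list of (hlsl_type, name)."""
    args = ", ".join(f"{glsl_type(t)} {p}" for t, p in params)
    return f"{glsl_type(return_type)} {name}({args})"


print(emit_signature("getColor", "float4", [("uint2", "pos")]))
```

With the FunctionDecl and ParmVarDecl data from the example, this yields the header shown on the slide.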
Next are the braces in the body of the function.
In the AST, this is the CompoundStmt node.
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
GLSL
They're also converted directly into braces in the GLSL.
Next is the return statement.
In the AST, it is the ReturnStmt node.
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
return
GLSL
The return keyword is converted as-is.
Next is the Load from the texture variable.
This is a member function call and is represented by a CXXMemberCallExpr node.
Information on the member function being called is contained in the child MemberExpr node.
The object on which the function is called is given by the DeclRefExpr node beneath it.
A DeclRefExpr node appears wherever a variable is used in an expression.
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
return texelFetch(sampler2D(ColorTex),
GLSL
When converted to GLSL, it looks like this.
The HLSL texture's Load function is replaced by the texelFetch function in GLSL.
The first argument is converted to sampler2D type using the AST's type information.
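One way to picture this replacement is a table keyed by the object's type and the member function name. The sketch below is hypothetical: the table, its single entry, and the function name are assumptions for illustration, and it emits only the head of the call, since the coordinate argument needs separate handling.

```python
# Hypothetical table:
# (HLSL resource type, method) -> (GLSL function, combined sampler constructor)
METHODS = {("Texture2D", "Load"): ("texelFetch", "sampler2D")}


def emit_call_head(object_name, object_type, method):
    """Emit the GLSL function name and its first argument for a texture call.

    object_type is the type string taken from the DeclRefExpr node,
    method is the name taken from the MemberExpr node.
    """
    base = object_type.split("<", 1)[0]    # "Texture2D<...>" -> "Texture2D"
    func, sampler = METHODS[(base, method)]
    return f"{func}({sampler}({object_name})"


print(emit_call_head("ColorTex", "Texture2D<vector<float, 4> >", "Load"))
```

Because the lookup uses the type recorded in the AST, the same method name on a different resource type could map to a different GLSL function without any extra analysis.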
Next is the actual argument part.
In AST, it is represented as such.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
!?
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
return texelFetch(sampler2D(ColorTex), ???
GLSL
We would like to convert this to GLSL, but it will not work as-is.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
return texelFetch(sampler2D(ColorTex),
GLSL
texelFetch(sampler2D sampler, ivec2 P, int lod);
Texture2D::Load(int3 Location);
This is the type signature of the HLSL Load function and the GLSL texelFetch function.
Since the two functions do not take the same number of arguments, the components of the HLSL argument must be distributed across the GLSL arguments when converting.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
return texelFetch(sampler2D(ColorTex),
GLSL
So we'd like to create a temporary variable, but the expression we've output so far would get in the way.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
return texelFetch(sampler2D(ColorTex),
GLSL
Evacuate
return texelFetch(sampler2D(ColorTex),
Let's put this somewhere safe for now.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
const ivec3 temp =
GLSL
Create temporary variable
Evacuated: return texelFetch(sampler2D(ColorTex),
Then, create a temporary variable.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
const ivec3 temp = ivec3(uvec3(pos, 0));
GLSL
Evacuated: return texelFetch(sampler2D(ColorTex),
Assign the expression for the actual argument to it.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
const ivec3 temp = ivec3(uvec3(pos, 0));
return texelFetch(sampler2D(ColorTex),
GLSL
Restore
Then, restore the expression that was just evacuated,
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
GLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
const ivec3 temp = ivec3(uvec3(pos, 0));
return texelFetch(sampler2D(ColorTex), temp.xy, temp.z);
And distribute the values of the temporary variable.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
GLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
const ivec3 temp = ivec3(uvec3(pos, 0));
return texelFetch(sampler2D(ColorTex), temp.xy, temp.z);
}
Finally, when exiting the CompoundStmt node, we close the braces.
Simple Conversion Example
Texture2D ColorTex;
float4 getColor(uint2 pos)
{
return ColorTex.Load(uint3(pos, 0));
}
HLSL
GLSL
uniform texture2D ColorTex;
vec4 getColor(uvec2 pos)
{
const ivec3 temp = ivec3(uvec3(pos, 0));
return texelFetch(sampler2D(ColorTex), temp.xy, temp.z);
}
That's it.
The above process is how we output GLSL sequentially from the AST.
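As a rough illustration of the evacuate, create, restore, and distribute steps in this example, here is a minimal Python sketch. The Emitter class, insert_temp, and emit_load are hypothetical names invented for this sketch; they are not RE ENGINE's actual translator API, and the real translator walks the Clang AST rather than taking strings.

```python
class Emitter:
    """Accumulates GLSL text statement by statement (hypothetical helper)."""

    def __init__(self):
        self.lines = []       # statements that are already complete
        self.current = ""     # the statement currently being written
        self.temp_count = 0

    def write(self, text):
        self.current += text

    def end_statement(self):
        self.lines.append(self.current)
        self.current = ""

    def insert_temp(self, glsl_type, init_expr):
        # Evacuate the partially written statement.
        saved = self.current
        self.current = ""
        # Emit the temporary variable as its own statement.
        name = f"temp{self.temp_count}"
        self.temp_count += 1
        self.write(f"const {glsl_type} {name} = {init_expr};")
        self.end_statement()
        # Restore the evacuated text and continue writing after it.
        self.current = saved
        return name


def emit_load(em, texture, location_expr):
    # Texture2D::Load(int3) becomes texelFetch(sampler2D, ivec2, int).
    em.write(f"return texelFetch(sampler2D({texture}), ")
    temp = em.insert_temp("ivec3", f"ivec3({location_expr})")
    # Distribute the components of the temporary across the two arguments.
    em.write(f"{temp}.xy, {temp}.z);")
    em.end_statement()


em = Emitter()
emit_load(em, "ColorTex", "uvec3(pos, 0)")
print("\n".join(em.lines))
```

The sketch names the temporary temp0 rather than temp, but otherwise produces the same two statements as the slide.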
Delimited Continuations
The idea of cutting and pasting code when inserting temporary variables was inspired by the
"let insertion with control operator" technique in the study of functional languages.
Control Operator = "operator that manipulates continuation"
• Continuation = "the remaining computation after a given computation has been performed."
• Delimited Continuation = "a part of the remaining computation"
• Examples: exception, coroutine, call/cc, shift/reset, control/prompt
float calc(float x, float y, float z)
{
float a = x * 3;
float b = foo(2 * y, 3 * z);
return a + b;
}
float calc(float x, float y, float z)
{
float a = x * 3;
float b = foo(2 * y, 3 * z);
return a + b;
}
The remaining computation of
"3 * z" = Continuation
A part of the remaining computation of
"3 * z" = Delimited Continuation
I would like to digress a little here to introduce the concept used in implementing the conversion.
The idea of cutting and pasting code when inserting temporary variables is inspired by the "let insertion using control operator" technique in the study of functional languages.
Many of you may not be familiar with this method, so I will briefly explain it here.
A control operator is an operator that manipulates the continuation of a program.
Continuation refers to the concept of "the remaining computation after a certain computation."
And in particular, "a part of the remaining computation" is called a delimited continuation.
The yellow-highlighted area in the lower left figure represents the remaining computation, or continuation, when trying to calculate "3 * z."
The yellow-highlighted area in the lower right image represents a delimited continuation.
Delimitation can be done however you like, but here we are delimiting by statement.
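To make the two highlighted regions concrete, the following Python sketch writes each of them as an ordinary function that receives the value of "3 * z". This is only an illustration: foo is a placeholder because the slide does not define it, and the sketch assumes that "x * 3" and "2 * y" have already been evaluated when "3 * z" is computed.

```python
def foo(p, q):
    # Placeholder body; the slide does not define foo.
    return p + q


def calc(x, y, z):
    a = x * 3
    b = foo(2 * y, 3 * z)
    return a + b


def continuation_of_3z(a, first_arg):
    # Everything calc still has to do once "3 * z" has produced the value v.
    def k(v):
        b = foo(first_arg, v)
        return a + b
    return k


def delimited_continuation_of_3z(first_arg):
    # Only the rest of the current statement: float b = foo(2 * y, v);
    def k(v):
        return foo(first_arg, v)
    return k


x, y, z = 1.0, 2.0, 3.0
whole = continuation_of_3z(x * 3, 2 * y)(3 * z)
part = delimited_continuation_of_3z(2 * y)(3 * z)
print(whole, part)
```

Applying the full continuation gives the same result as calling calc, while the delimited continuation stops at the value assigned to b.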
Examples of control operators that we game programmers are familiar with are exceptions and coroutines.
Let me explain a little about exceptions, too.
Continuation is represented by a stack frame during actual program execution.
Local variables and the program counter, which represents the remaining computation to be resumed after a function call, are both stored in the stack frame.
Since exceptions manipulate stack frames through stack unwinding, you can imagine them as operators that manipulate continuation.
Another example is the well-known call/cc in the Scheme language.
Finally, shift/reset and control/prompt operators are also typical examples.
These have recently been incorporated into the Haskell compiler as first-class language features.
Let Insertion
Let insertion = "operation to replace an expression with a temporary variable"
(fun x -> x * x) (1 + 2)
let t = (1 + 2) in (fun x -> x * x) t
Cut and paste code when inserting temporary variables
• Implementation-wise, it is "just string manipulation"
• Conceptually, it is "program transformation through the operation of delimited continuation"
Research findings in the field of functional languages ⇒ Basis of thinking for correct program conversion
Let insertion is an "operation to replace an expression with a temporary variable" during program generation.
This is the same operation that was performed in the previous example.
Cutting and pasting code to insert temporary variables is implemented as plain string manipulation, but conceptually it is program transformation through operations on delimited continuations.
Research in the field of functional programming serves as the basis for thinking about correct program transformation, that is, transformation that does not change the evaluation results of the program.
There is no particular necessity to think in this way, but since I had knowledge of research in the field of functional languages, I used it in the implementation.
In fact, it is quite difficult to correctly convert a program in a C-like procedural language with side effects, so I think this way of thinking has been very useful.
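As a small illustration of let insertion during program generation, the sketch below names every subexpression with a temporary variable while keeping the left-to-right evaluation order. The tuple-based expression format and the let_insert function are assumptions made for this sketch and are unrelated to the translator's real data structures.

```python
import itertools


def let_insert(expr, stmts, counter):
    """Return a name or literal for expr, appending temporary declarations to stmts."""
    if not isinstance(expr, tuple):
        return str(expr)                      # variable name or literal
    op, *args = expr
    # Visit operands left to right so side effects keep their original order.
    atoms = [let_insert(arg, stmts, counter) for arg in args]
    if op == "call":
        callee, *call_args = atoms
        rhs = f"{callee}({', '.join(call_args)})"
    else:
        lhs, rhs_atom = atoms
        rhs = f"{lhs} {op} {rhs_atom}"
    name = f"t{next(counter)}"
    stmts.append(f"float {name} = {rhs};")
    return name


stmts = []
result = let_insert(("call", "foo", ("*", 2, "y"), ("*", 3, "z")), stmts, itertools.count())
print("\n".join(stmts))
```

For foo(2 * y, 3 * z), the two arguments are declared first, in order, and the call itself is bound last.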
Primitive Data Type Mapping
HLSL        GLSL
float       float
float3      vec3
float3x3    mat3
float3x4    mat3x4
int         int
int3        ivec3
half        float16_t
half3       f16vec3
Back to the main topic. This is how the primitive data types are mapped.
There are no special difficulties, but we need to be a little careful about handling matrices.
Primitive Data Type Mapping
HLSL        GLSL
float       float
float3      vec3
float3x3    mat3
float3x4    mat3x4
int         int
int3        ivec3
half        float16_t
half3       f16vec3
HLSL: Row x Column, e.g. mul(float4(pos, 1), worldMat)
GLSL: Column x Row, e.g. worldMat * vec4(pos, 1.0)
The two numbers in a matrix type name have opposite meanings in HLSL and GLSL.
In HLSL they mean "row x column," while in GLSL they mean "column x row."
RE ENGINE stores HLSL matrices in row-major order, whereas GLSL is basically column-major, so the memory layouts match.
As a result, GLSL always handles matrices in a transposed state.
Of course, the results would not match if left like this, so we handle multiplication by swapping the order of the operands.
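The reason swapping the operands compensates for the transposition can be checked numerically. The sketch below uses plain Python lists; the matrix values and helper names are made up for illustration, with the translation placed in the last row as in the row-vector convention.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def row_vec_times_mat(v, m):
    # HLSL-style mul(v, M): v is treated as a row vector.
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def mat_times_col_vec(m, v):
    # GLSL-style M * v: v is treated as a column vector.
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


world_hlsl = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [5, 6, 7, 1],   # translation
]
pos = [1, 2, 3, 1]

# GLSL sees the same memory as the transposed matrix.
world_glsl = transpose(world_hlsl)

hlsl_result = row_vec_times_mat(pos, world_hlsl)
glsl_result = mat_times_col_vec(world_glsl, pos)
print(hlsl_result, glsl_result)
```

Both products give the same vector, which is the identity v M = (M^T v^T)^T that the operand swap relies on.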
Resource Type Mapping
HLSL                     GLSL
cbuffer A { ... }        uniform A { ... };
SamplerState             sampler
Texture2D                texture2D
RWTexture2D              image2D
ByteAddressBuffer        readonly buffer { uint[]; };
StructuredBuffer<T>      readonly buffer { T[]; };
RWByteAddressBuffer      buffer { uint[] };
RWStructuredBuffer<T>    buffer { T[] };
Next is resource type mapping.
There are no special difficulties here either.
Constant Buffer is converted to Uniform Buffer, and SamplerState and Texture are converted to sampler and texture.
RWTexture is converted to Image in GLSL.
Finally, ByteAddressBuffer and StructuredBuffer are converted to Shader Storage Buffer in GLSL.
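As a sketch of how such a mapping might be applied to declarations, the function below rewrites a few HLSL resource declarations into the GLSL forms that appear in this talk. The function name and the regular expression are assumptions for illustration, only the texture and buffer rows are covered, and a real translator would work from AST type information rather than from text.

```python
import re

DECL = re.compile(r"\s*(\w+)(?:<\s*(\w+)\s*>)?\s+(\w+)\s*;\s*")


def translate_resource(decl):
    """Translate one HLSL resource declaration into a GLSL declaration (sketch)."""
    m = DECL.fullmatch(decl)
    if m is None:
        raise ValueError(f"unsupported declaration: {decl!r}")
    kind, elem, name = m.groups()
    if kind == "Texture2D":
        return f"uniform texture2D {name};"
    writable = kind.startswith("RW")
    base = kind[2:] if writable else kind
    if base == "ByteAddressBuffer":
        elem = "uint"
    elif base != "StructuredBuffer" or elem is None:
        raise ValueError(f"unsupported resource type: {kind}")
    qualifier = "" if writable else "readonly "
    # The "<name>_Buffer" member name follows the examples shown in the slides.
    return f"{qualifier}buffer {name} {{ {elem} {name}_Buffer[]; }};"


print(translate_resource("StructuredBuffer<BoundingAABB> AABBSRV;"))
print(translate_resource("ByteAddressBuffer Input;"))
```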
Conversion of Input/Output Interfaces
HLSL semantics are converted to GLSL input/output variables
struct FullScreenTriangleTexVSOut {
float4 positionViewport : SV_Position;
float2 texCoord : TEXCOORD0;
};
HLSL
FullScreenTriangleTexVSOut
FullScreenTriangleTexVS(uint vertexID : SV_VertexID)
{
FullScreenTriangleTexVSOut vsout;
// ...
vsout.texCoord = grid;
return vsout;
}
struct FullScreenTriangleTexVSOut {
vec4 positionViewport;
vec2 texCoord;
};
GLSL
layout(location = 1) out vec2 out_TEXCOORD0;
void main()
{
// input
uint vertexID = uint(gl_VertexID - gl_BaseVertex);
FullScreenTriangleTexVSOut vsout;
// ...
vsout.texCoord = grid;
// output
gl_Position = vsout.positionViewport;
out_TEXCOORD0 = vsout.texCoord;
return;
}
I'll also briefly describe the conversion of input/output interfaces.
Input/output between shader stages is specified by semantics in HLSL, but in GLSL, it is specified by the location of input/output variables.
System-value semantics in HLSL are converted to built-in variables in GLSL.
And the input/output conversion is done like this.
Mapping of HLSL Semantics to GLSL Built-in Variables
HLSL                                     GLSL
SV_Position (as vertex shader output)    gl_Position
SV_Position (as pixel shader input)      vec4(gl_FragCoord.xyz, 1.0/gl_FragCoord.w)
SV_Depth                                 gl_FragDepth
SV_VertexID                              gl_VertexID - gl_BaseVertex
SV_InstanceID                            gl_InstanceID
SV_PrimitiveID                           gl_PrimitiveID
SV_GroupID                               gl_WorkGroupID
SV_GroupThreadID                         gl_LocalInvocationID
SV_DispatchThreadID                      gl_GlobalInvocationID
SV_GroupIndex                            gl_LocalInvocationIndex
This is the mapping between HLSL semantics and GLSL built-in variables.
I have only included some examples in this list, but this is how the conversion is done.
When Special Handling is Required in GLSL
• Dealing with Different Memory Layouts
• Functions that Take and/or Return Buffer Type Values
Basically, the conversion to GLSL can be handled in the way I have described so far.
However, there are cases where special handling is required.
Dealing with Different Memory Layouts
HLSL float3 is 4-byte aligned
GLSL vec3 is 16-byte aligned (std430)
HLSL
struct BoundingAABB {
float3 center;
float3 extent;
};
StructuredBuffer<BoundingAABB> AABBSRV;
GLSL
struct BoundingAABB {
vec3 center;
vec3 extent;
};
readonly buffer AABBSRV { BoundingAABB AABBSRV_Buffer[]; };
[Figure: memory layout of BoundingAABB (center, extent) in HLSL and in GLSL; the GLSL layout contains unintended 4-byte padding.]
The first is to deal with different memory layouts.
For example, the HLSL float3 type becomes vec3 type when converted to GLSL as it is.
However, HLSL's float3 is 4-byte aligned, while GLSL's vec3 is 16-byte aligned.
In this example, two variables of type float3 are declared in the structure BoundingAABB.
But as shown in the figure below, unintended padding is created by simply converting as is, resulting in different evaluation results
between HLSL and GLSL.
Dealing with Different Memory Layouts
⇒ Dealing with vector types by converting them to array types
struct BoundingAABB
{
float center[3]; // was: vec3 center;
float extent[3]; // was: vec3 extent;
};
readonly buffer AABBSRV { BoundingAABB AABBSRV_Buffer[]; };
GLSL
Insert a glue function for conversion where it references an arrayed member
vec3 floatArrayToVector3(float[3] a) { return vec3(a[0], a[1], a[2]); }
float[3] vector3ToFloatArray(vec3 v) { return float[3](v.x, v.y, v.z); }
GLSL
vec3 center = floatArrayToVector3(AABBSRV_Buffer[i].center);
BoundingAABB aabb;
aabb.center = vector3ToFloatArray(center);
If you can use GL_EXT_scalar_block_layout extension, you can just use "layout(scalar) buffer;"
Such cases are handled by converting vector types to array types.
Thus, the vec3 type is converted to a 3-element float array type.
The glue function for the conversion is automatically generated and inserted where the arrayed member is referenced, so that the
conversion matches up.
This is the automatically generated glue function.
And where it references an arrayed member, it inserts calls to these functions.
As a side note, if you can use the GL_EXT_scalar_block_layout extension, you can simply declare "layout(scalar) buffer;" at the beginning.
The scalar layout allows the memory layout to be exactly the same as in HLSL, so arrayization is no longer necessary.
That's all for how to deal with different memory layouts.
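The effect of the alignment rules can be checked with a little arithmetic. The sketch below computes member offsets and struct size for BoundingAABB under three rule sets. The size and alignment tables are written out by hand from the rules described above (4-byte alignment for HLSL float3, 16-byte alignment for std430 vec3, 4-byte element stride for a std430 float array), and the function names are hypothetical.

```python
def align_up(offset, alignment):
    return (offset + alignment - 1) // alignment * alignment


def layout(members, rules):
    """Return (offsets, struct_size) for members given type -> (size, alignment)."""
    offsets = {}
    offset = 0
    struct_alignment = 1
    for name, type_name in members:
        size, alignment = rules[type_name]
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
        struct_alignment = max(struct_alignment, alignment)
    return offsets, align_up(offset, struct_alignment)


HLSL_RULES = {"float3": (12, 4)}
STD430_RULES = {"vec3": (12, 16), "float[3]": (12, 4)}

hlsl = layout([("center", "float3"), ("extent", "float3")], HLSL_RULES)
glsl_vec3 = layout([("center", "vec3"), ("extent", "vec3")], STD430_RULES)
glsl_array = layout([("center", "float[3]"), ("extent", "float[3]")], STD430_RULES)
print(hlsl, glsl_vec3, glsl_array)
```

With vec3 members the struct grows from 24 to 32 bytes and extent moves from offset 12 to 16, while the float[3] version reproduces the HLSL offsets.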
Functions that Take and/or Return Buffer Type Values
In GLSL, objects of type buffer cannot be passed as function arguments, returned as return values,
or assigned to local variables (= not first-class objects)
• Possible if using GL_EXT_buffer_reference extension or GL_NV_shader_buffer_load extension
ByteAddressBuffer Input;
ByteAddressBuffer getInputBuffer()
{
return Input;
}
float readBuffer(ByteAddressBuffer buf, uint offset)
{
return asfloat(buf.Load(offset));
}
uint addr = id * 4;
ByteAddressBuffer input = getInputBuffer();
float v = readBuffer(input, addr) * 10.0f;
HLSL
readonly buffer Input { uint Input_Buffer[]; };
uint* getInputBuffer()
{
return Input_Buffer; // compilation error
}
float readBuffer(uint* buf, uint offset)
{
return uintBitsToFloat(buf[offset]);
}
GLSL
uint addr = id * 4;
uint* input = getInputBuffer();
float v = readBuffer(input, addr) * 10.0;
⇒ Addressed by inlining
Another problem is that in GLSL, objects of type buffer cannot be passed as function arguments, returned as return values, or assigned to local variables.
In other words, in GLSL, a buffer type is not a first-class object.
A "first-class object" is generally defined as an object that can be passed as a function argument, returned as a return value, or assigned
to a local variable.
In the HLSL on the left, a ByteAddressBuffer is returned as the return value of a function, but it cannot be converted to GLSL as it is.
The GLSL on the right is written with the intention of converting it in this way, but in reality it results in a compile error.
As I found out later, it is possible to use the GL_EXT_buffer_reference extension or the GL_NV_shader_buffer_load extension to write GLSL like the one on the right.
However, the in-house shader translator handles such functions by inlining them.
I will say in advance that this method was the result of trial and error at the time, and in retrospect, I do not think it is the easiest way to handle such functions.
In fact, inlining is quite difficult to implement.
If I were doing it again now, I would adopt a conversion using the GL_NV_shader_buffer_load extension.
Example of Inlining
GLSL
readonly buffer Input { uint Input_Buffer[]; };
uint addr = id * 4;
uint* input = getInputBuffer();
float v = readBuffer(input, addr) * 10.0;
uint* getInputBuffer()
{
return Input_Buffer;
}
float readBuffer(uint* buf, uint offset)
{
return uintBitsToFloat(buf[offset]);
}
Variable    Value
This section describes the steps for inlining.
First, to propagate the contents of a variable of type buffer, we will track all variable declarations and assignments, as a typical interpreter implementation would do.
We will look at each in turn.
Example of Inlining
Variable    Value
addr        id * 4
The first addr variable declaration registers the variable name and value in a table like this.
Example of Inlining
Variable    Value
addr        id * 4
input
The input variable declaration in the next line is registered in the same way.
Example of Inlining
Variable    Value
addr        id * 4
input
Table = Lexical Scope, Row = Lexical Binding
Thus, the table represents the lexical scope, and each row represents a lexical binding.
Example of Inlining
Inlining
uint* getInputBuffer()
{
return Input_Buffer;
}
Next, getInputBuffer is a function that returns a buffer type, so we want to expand it inline.
Example of Inlining
uint* input = getInputBuffer();
Because we have a continuation here...
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; uint* input = getInputBuffer(); float v = readBuffer(input, addr) * 10.0; uint* input = ; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 GLSL input Evacuate We'll take it out for now. 67 ©CAPCOM 67
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; getInputBuffer() float v = readBuffer(input, addr) * 10.0; uint* input = ; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 GLSL input Then inlining is performed. 68 ©CAPCOM 68
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } float v = readBuffer(input, addr) * 10.0; uint* input = ; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 GLSL input It goes like this. 69 ©CAPCOM 69
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } float v = readBuffer(input, addr) * 10.0; uint* input = ; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 GLSL input First, create a temporary variable to store the return value of the function. The buffer type is commented out because it can't be handled as a local variable like that in GLSL. 70 ©CAPCOM 70
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } float v = readBuffer(input, addr) * 10.0; uint* input = ; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 GLSL temp0 input This is also registered in the table. 71 ©CAPCOM 71
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } float v = readBuffer(input, addr) * 10.0; uint* input = ; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 GLSL temp0 input Next, the return statement after inlining is replaced with an assignment statement to the temporary variable just created. 72 ©CAPCOM 72
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } float v = readBuffer(input, addr) * 10.0; uint* input = ; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer GLSL input The value of this assignment is also registered in the table. 73 ©CAPCOM 73
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = ; float v = readBuffer(input, addr) * 10.0; uint* input = ; Restore uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer GLSL input Then restore the continuation. 74 ©CAPCOM 74
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float v = readBuffer(input, addr) * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer GLSL input The hole in the continuation is replaced by the temporary variable that contains the return value. 75 ©CAPCOM 75
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float v = readBuffer(input, addr) * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL This assignment is also registered in the table. 76 ©CAPCOM 76
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float v = readBuffer(input, addr) * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL v The same applies to the next part, and the following variable declaration is registered in the table. 77 ©CAPCOM 77
Example of Inlining GLSL readonly buffer Input { uint Input_Buffer[]; }; uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float v = readBuffer(input, addr) * 10.0; Inlining uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL v Next, since this function takes a buffer type as an argument, it also requires inlining. 78 ©CAPCOM 78
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float v = readBuffer(input, addr) * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL v Again, here the continuation needs to be evacuated first. 79 ©CAPCOM 79
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float v = readBuffer(input, addr) * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL v float v = * 10.0; Evacuate Evacuated. 80 ©CAPCOM 80
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; readBuffer(input, addr) uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL v float v = * 10.0; Then inlining is performed. 81 ©CAPCOM 81
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(buf[offset]); } uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL v float v = * 10.0; It goes like this. 82 ©CAPCOM 82
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(buf[offset]); } uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL v float v = * 10.0; As before, we first create a temporary variable to store the return value of the function. 83 ©CAPCOM 83
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(buf[offset]); } uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL temp1 float v = * 10.0; v It goes in the table. 84 ©CAPCOM 84
Example of Inlining
GLSL
readonly buffer Input { uint Input_Buffer[]; };
uint addr = id * 4;
//uint* temp0;
{ // inlining 'getInputBuffer'
  //temp0 = Input_Buffer;
}
//uint* input = temp0;
float temp1;
{ // inlining 'readBuffer'
  //uint* buf = input;
  offset = addr;
  temp1 = uintBitsToFloat(buf[offset]);
}
GLSL
uint* getInputBuffer() { return Input_Buffer; }
float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); }
Variable | Value
addr | id * 4
temp0 | Input_Buffer
input | temp0
temp1 |
v |
Evacuated continuation: float v = * 10.0;
This part assigns the variables that come from the function's arguments. These are also registered in the table, but...
©CAPCOM 85
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(buf[offset]); } uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL temp1 float v = * 10.0; Create and register a new Lexical Scope v buf input offset addr Since it is a different scope, create a new lexical scope and register it there. 86 ©CAPCOM 86
Example of Inlining
GLSL
readonly buffer Input { uint Input_Buffer[]; };
uint addr = id * 4;
//uint* temp0;
{ // inlining 'getInputBuffer'
  //temp0 = Input_Buffer;
}
//uint* input = temp0;
float temp1;
{ // inlining 'readBuffer'
  //uint* buf = input;
  offset = addr;
  temp1 = uintBitsToFloat(buf[offset]);
}
GLSL
uint* getInputBuffer() { return Input_Buffer; }
float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); }
Variable | Value
addr | id * 4
temp0 | Input_Buffer
input | temp0
temp1 |
v |
Variable | Value (new lexical scope)
buf | input
offset | addr
Evacuated continuation: float v = * 10.0;
The key point is how to handle this buf variable. It is a buffer-type variable, but GLSL cannot handle local variables of buffer type as is, so it needs to be replaced. So we trace back through the table to find the name of the actual resource variable that is assigned to it.
©CAPCOM 87
Example of Inlining
GLSL
readonly buffer Input { uint Input_Buffer[]; };
uint addr = id * 4;
//uint* temp0;
{ // inlining 'getInputBuffer'
  //temp0 = Input_Buffer;
}
//uint* input = temp0;
float temp1;
{ // inlining 'readBuffer'
  //uint* buf = input;
  offset = addr;
  temp1 = uintBitsToFloat(Input_Buffer[offset]);
}
GLSL
uint* getInputBuffer() { return Input_Buffer; }
float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); }
Variable | Value
addr | id * 4
temp0 | Input_Buffer
input | temp0
temp1 |
v |
Variable | Value (new lexical scope)
buf | input
offset | addr
Evacuated continuation: float v = * 10.0;
The buf variable is then replaced with the actual resource variable name that was found. This is how buffer variables can be passed as function arguments.
©CAPCOM 88
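The trace-back through the table can be sketched roughly as follows. This is a minimal Python illustration, not the translator's actual code: the `Scope` class, its `bind`, `lookup` and `resolve` methods, and the `resources` set are hypothetical names introduced here. It also assumes that variable names are unique across scopes (no shadowing).

```python
class Scope:
    """One lexical scope: a table of variable -> bound value, plus a parent link."""

    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}

    def bind(self, name, value=None):
        # Register a lexical binding; the value stays None until an assignment is seen.
        self.bindings[name] = value

    def lookup(self, name):
        # Search this scope first, then the enclosing scopes.
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        return None

    def resolve(self, name, resources):
        # Follow bindings until the name is an actual resource variable.
        seen = set()
        while name not in resources:
            if name in seen:
                raise ValueError(f"cyclic binding for {name!r}")
            seen.add(name)
            value = self.lookup(name)
            if value is None:
                raise KeyError(f"{name!r} does not resolve to a resource")
            name = value
        return name


# Table contents taken from the slide.
outer = Scope()
outer.bind("addr", "id * 4")
outer.bind("temp0", "Input_Buffer")
outer.bind("input", "temp0")
inner = Scope(parent=outer)  # scope created while inlining readBuffer
inner.bind("buf", "input")
inner.bind("offset", "addr")

print(inner.resolve("buf", resources={"Input_Buffer"}))  # prints Input_Buffer
```

Here `buf` resolves through `input` and `temp0` to `Input_Buffer`, which is the name substituted into the inlined body.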
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(Input_Buffer[offset]); } uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } Variable Value addr id * 4 temp0 Input_Buffer input temp0 GLSL temp1 float v = * 10.0; v buf input offset addr Let's look at the rest. 89 ©CAPCOM 89
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(Input_Buffer[offset]); } float v = * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } GLSL Variable Value addr id * 4 temp0 Input_Buffer input temp0 temp1 uintBitsToFloat(Input_Buffer[offset]) v buf input offset addr Register this assignment in the table. 90 ©CAPCOM 90
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(Input_Buffer[offset]); } float v = * 10.0; Delete Lexical Scope as it exits the scope uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } GLSL Variable Value addr id * 4 temp0 Input_Buffer input temp0 temp1 uintBitsToFloat(Input_Buffer[offset]) v buf input offset addr When exiting a scope, the lexical scope is deleted. 91 ©CAPCOM 91
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(Input_Buffer[offset]); } float v = * 10.0; float v = * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } GLSL Variable Value addr id * 4 temp0 Input_Buffer input temp0 temp1 uintBitsToFloat(Input_Buffer[offset]) v Restore Then restore the continuation. 92 ©CAPCOM 92
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(Input_Buffer[offset]); } float v = temp1 * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } GLSL Variable Value addr id * 4 temp0 Input_Buffer input temp0 temp1 uintBitsToFloat(Input_Buffer[offset]) v Replace the hole in the continuation with the temporary variable that contains the return value. 93 ©CAPCOM 93
Example of Inlining readonly buffer Input { uint Input_Buffer[]; }; GLSL uint addr = id * 4; //uint* temp0; { // inlining 'getInputBuffer' //temp0 = Input_Buffer; } //uint* input = temp0; float temp1; { // inlining 'readBuffer' //uint* buf = input; offset = addr; temp1 = uintBitsToFloat(Input_Buffer[offset]); } float v = temp1 * 10.0; uint* getInputBuffer() { return Input_Buffer; } float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); } GLSL Variable Value addr id * 4 temp0 Input_Buffer input temp0 temp1 uintBitsToFloat(Input_Buffer[offset]) v temp1 * 10.0 Register the assignment to the table... 94 ©CAPCOM 94
Example of Inlining
GLSL
readonly buffer Input { uint Input_Buffer[]; };
uint addr = id * 4;
//uint* temp0;
{ // inlining 'getInputBuffer'
  //temp0 = Input_Buffer;
}
//uint* input = temp0;
float temp1;
{ // inlining 'readBuffer'
  //uint* buf = input;
  offset = addr;
  temp1 = uintBitsToFloat(Input_Buffer[offset]);
}
float v = temp1 * 10.0;
GLSL
uint* getInputBuffer() { return Input_Buffer; }
float readBuffer(uint* buf, uint offset) { return uintBitsToFloat(buf[offset]); }
Variable | Value
addr | id * 4
temp0 | Input_Buffer
input | temp0
temp1 | uintBitsToFloat(Input_Buffer[offset])
v | temp1 * 10.0
And we're done. That's the procedure used to perform inlining at the source code level.
©CAPCOM 95
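The evacuate, inline, restore sequence from the walkthrough can be sketched on plain strings. This is a simplified Python illustration under several assumptions: the callee body is a single return statement, argument binding and buffer-name replacement have already been done by the caller, and `inline_call`, `HOLE`, `temp_ids` and `prologue` are hypothetical names, not the translator's API. A real implementation would work on the AST rather than on text.

```python
import itertools

HOLE = "<hole>"  # placeholder marking where the call used to be


def inline_call(temp_ids, statement, call_text, return_type, name,
                return_expr, prologue=()):
    """Inline one call whose callee body is a single `return <expr>;`."""
    # 1. Evacuate the continuation: the statement with a hole in place of the call.
    continuation = statement.replace(call_text, HOLE, 1)
    # 2. Create a temporary for the return value and emit the inlined block,
    #    with the return statement turned into an assignment to the temporary.
    temp = f"temp{next(temp_ids)}"
    lines = [f"{return_type} {temp};", f"{{ // inlining '{name}'"]
    lines += [f"    {line}" for line in prologue]
    lines.append(f"    {temp} = {return_expr};")
    lines.append("}")
    # 3. Restore the continuation, filling the hole with the temporary.
    lines.append(continuation.replace(HOLE, temp, 1))
    return lines


ids = itertools.count(1)
lines = inline_call(
    ids,
    statement="float v = readBuffer(input, addr) * 10.0;",
    call_text="readBuffer(input, addr)",
    return_type="float",
    name="readBuffer",
    return_expr="uintBitsToFloat(Input_Buffer[offset])",
    prologue=["uint offset = addr;"],
)
print("\n".join(lines))
```

The last emitted line is `float v = temp1 * 10.0;`, matching the final state of the slide.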
Optimization Case Study I will now present an example of optimization using an in-house shader translator. 96 ©CAPCOM 96
GPU Optimization through Inlining
By inlining all functions, GPU performance improved
Function calls with no side effects are allowed to have order of computation changed
⇒ Accidentally found a case that improves performance
Before
return (v.z <= 0.0) ? signNotZeroMultiply(p, (vec2(1.0) - abs(p.yx))) : p;
This ternary operator was a divergent branch
After
vec2 temp = signNotZeroMultiply(p, (vec2(1.0) - abs(p.yx)));
return (v.z <= 0.0) ? temp : p;
// inlining 'signNotZeroMultiply'
The divergent branch disappears and latency improves
Usually an area left to shader compiler optimization
⇒ Shader compilers are not always excellent ...
First, here is an example of GPU optimization via inlining.
We took the inlining I just described, applied it to all functions, and found that GPU performance improved in some cases.
The inlining we implemented is allowed to reorder computation for function calls that have no side effects.
As a result, inlining sometimes changes the order of computation, and we found, by chance, cases where this improved performance.
Here is a concrete example.
This is an example of a function call in a ternary operator.
After applying inlining, the order of computation of the ternary operator and the function call is swapped, as shown under "After".
In the compiled result before the change, this ternary operator became a divergent branch.
After the change, the divergent branch disappears and latency is improved.
I think this kind of optimization is an area that really should be left to the shader compiler.
However, since shader compilers are not always excellent, sometimes such rewrites can be effective.
©CAPCOM
97
Results of GPU Optimization through Inlining
GPU processing time
Before | After | Diff | Diff %
29.26 ms | 29.12 ms | -0.14 ms | -0.47 %
(average of the total over 30 frames)
These are the measurement results from a scene in Monster Hunter Rise. We're comparing the GPU processing time per frame. It was 29.26 ms before inlining and 29.12 ms afterwards, an improvement of 0.14 ms.
©CAPCOM 98
Before (29.26 ms)
This is before inlining,
©CAPCOM 99
After (29.12 ms)
and here is after.
©CAPCOM 100
CPU Optimization with Static Sampler
Replaces fixed-state samplers with constants, eliminating binding processing on the CPU
Before: (Normal) Sampler
GLSL
uniform texture2D BaseMap;
uniform sampler BilinearWrap;
vec4 color = texture(sampler2D(BaseMap, BilinearWrap), uv);
After: Static Sampler
GLSL
uniform texture2D BaseMap;
const uint64_t BilinearWrap = 1;
vec4 color = texture(sampler2D(BaseMap, sampler(BilinearWrap)), uv);
Register a fixed BilinearWrap sampler to the first SamplerPool
Specialized implementation for NVIDIA GPUs using the GL_NV_bindless_texture extension
⇒ Non-standard vendor extensions are often not supported by DXC or Glslang
Next is an example of CPU optimization with a static sampler. Fixed-state samplers can be replaced with constants in shaders to eliminate binding on the CPU. In this example, the BilinearWrap sampler is replaced with a constant 1 of type uint64_t. The BilinearWrap sampler is then registered as a fixed value in the first SamplerPool. In this way, the sampling result is the same as before the change, but the sampler binding process can be removed. This implementation is specific to NVIDIA GPUs and uses the GL_NV_bindless_texture extension. It is the implementation that provides the best performance on the Nintendo Switch. Non-standard vendor extensions such as this are often not supported by DXC or Glslang, but an in-house translator has the flexibility to incorporate them. This is an optimization technique that can only be achieved with an in-house translator.
©CAPCOM 101
CPU Optimization Results with Static Sampler
Metric | Before | After | Diff | Diff %
Total processing time for bind functions | 0.40 ms | 0.11 ms | -0.29 ms | -72.5 %
Number of bind function calls | 109 | 11 | -98 | -89.9 %
(average of the total over 120 frames)
These are the measurement results from a Monster Hunter Rise: Sunbreak scene. The total processing time and number of calls to the sampler's bind function are compared. Before the change it took 0.40 ms, and after the change it took 0.11 ms. The number of calls also decreased from 109 to 11. The addition of NPC allies called "Followers" in Monster Hunter Rise: Sunbreak required additional CPU processing optimization. CPU optimization with static samplers helped to reduce the processing load for this purpose.
©CAPCOM 102
GPU Micro-Optimization: bitfieldInsert
Engine library functions whose interface is slightly different from the GLSL built-in functions
Simple distribution of arguments won't work
Engine library functions (mask: bit mask)
uint VIA_BFI(uint mask, uint insert, uint base) { return (mask & insert) | (~mask & base); }
GLSL built-in functions (offset: offset, bits: number of bits)
uint bitfieldInsert(uint base, uint insert, int offset, int bits);
⇒ Slightly reduces the number of instructions by automatically converting arguments according to their expressions and values
VIA_BFI(3 << offset, insert, base); ⇒ bitfieldInsert(base, insert >> offset, offset, 2);
VIA_BFI(0x70, insert, base); ⇒ bitfieldInsert(base, insert >> 4, 4, 3);
VIA_BFI(0x55, insert, base); ⇒ VIA_BFI(0x55, insert, base);
Here's a very minor GPU optimization example.
GLSL has a built-in function called bitfieldInsert that sets the value of a bitfield.
The RE ENGINE library also has a function that performs a similar operation, but its interface is slightly different, so there are cases where it can't be converted as is.
The RE ENGINE library function has an interface that matches AMD's ISA, and the range of the bitfield is specified by a bitmask.
The GLSL built-in function, on the other hand, specifies the range of the bitfield by an offset and a number of bits.
Simply passing the arguments through does not translate well.
In such cases, the arguments are automatically converted to fit the interface according to the form and value of the argument expression.
The engine's library function would still work correctly, but we do this because the number of instructions can be slightly reduced by using GLSL's built-in function.
In some cases, such as the third one, the conversion may not be possible depending on the argument values.
In such cases, the call is left as is. This kind of micro-optimization is also incorporated for the convenience of the engine.
©CAPCOM
103
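The constant-mask cases on this slide can be sketched as follows. This is a minimal Python illustration, assuming 32-bit masks; `mask_to_bitfield` and `convert_via_bfi` are hypothetical helper names, and only literal masks are covered. The `3 << offset` case, which depends on the form of the expression, is not handled here.

```python
def mask_to_bitfield(mask):
    """Return (offset, bits) if the 32-bit mask is one contiguous run of ones, else None."""
    mask &= 0xFFFFFFFF
    if mask == 0:
        return None
    offset = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    run = mask >> offset
    if run & (run + 1) != 0:  # contiguous only if run has the form 2**k - 1
        return None
    return offset, run.bit_length()


def convert_via_bfi(mask, insert="insert", base="base"):
    """Rewrite a VIA_BFI call with a constant mask, or leave it as is."""
    field = mask_to_bitfield(mask)
    if field is None:
        # Non-contiguous mask: bitfieldInsert cannot express it.
        return f"VIA_BFI({mask:#x}, {insert}, {base});"
    offset, bits = field
    return f"bitfieldInsert({base}, {insert} >> {offset}, {offset}, {bits});"


print(convert_via_bfi(0x70))  # bitfieldInsert(base, insert >> 4, 4, 3);
print(convert_via_bfi(0x55))  # VIA_BFI(0x55, insert, base);
```

The mask 0x55 has gaps between its set bits, so no single offset and bit count describes it and the call is kept unchanged, as in the third case on the slide.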
Inline GLSL
Writing string literals is allowed in HLSL
⇒ GLSL can be written in string literals within HLSL
(just remove the double quotes when converting)
HLSL
uint wavePrefixCountBits(uint mask)
{
  "return bitCount(mask & gl_ThreadLtMaskNV);";
  return 0;
}
GLSL
uint wavePrefixCountBits(uint mask)
{
  return bitCount(mask & gl_ThreadLtMaskNV);
  return 0;
}
No need to touch the DXC
|-FunctionDecl <line:12:1, line:15:1> line:12:6 used wavePrefixCountBits 'uint (uint)'
| |-ParmVarDecl <col:26, col:31> col:31 mask 'uint':'unsigned int'
| `-CompoundStmt <col:37, line:15:1>
|
|-StringLiteral <line:13:3> 'literal string' lvalue "return bitCount(mask & gl_ThreadLtMaskNV);"
|
`-ReturnStmt <line:14:3, col:10>
⇒ Useful for a little GLSL experimentation
Finally, we introduce inline GLSL.
HLSL allows string literals to be written in source code.
We make use of this to support writing GLSL as string literals in HLSL.
The implementation of the conversion is very simple: when we find a string literal, we simply remove the double quotation marks.
This provides a simple way to inline GLSL.
No modifications to DXC are required. Since string literals are already allowed, they are parsed as StringLiteral nodes.
This is useful for experimenting, as it allows writing inline GLSL in HLSL without modifying the translator.
I used this in some places for platform-specific optimizations.
The implementation of the wavePrefixCountBits function above is an example.
This concludes the section on optimization case studies.
©CAPCOM
104
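The quote-stripping step can be sketched on a toy AST. This is a minimal Python illustration, assuming each statement is a `(kind, payload)` pair; `emit_statement` and this tuple shape are hypothetical and are not DXC's API. Escape sequences inside the literal are not handled.

```python
def emit_statement(stmt):
    """Emit GLSL text for one statement of a toy AST given as (kind, payload)."""
    kind, payload = stmt
    if kind == "StringLiteral":
        # Inline GLSL: drop the surrounding double quotes and emit the contents verbatim.
        return payload[1:-1]
    if kind == "ReturnStmt":
        return f"return {payload};"
    raise NotImplementedError(kind)


# Body of wavePrefixCountBits from the slide, as it would appear after parsing.
body = [
    ("StringLiteral", '"return bitCount(mask & gl_ThreadLtMaskNV);"'),
    ("ReturnStmt", "0"),
]
print("\n".join(emit_statement(s) for s in body))
```

The trailing `return 0;` is still emitted after the inline GLSL, as in the slide's GLSL output, where it is unreachable.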
Summary and Future Prospects Summary and Future Prospects. 105 ©CAPCOM 105
Summary
Shader translators can be made if you work hard
• Labor saving by using DXC for the frontend, fairly foolproof implementation for the backend
Shader translators are very hard to implement
• GLSL language features are not as extensive as HLSL
• There are so many language extensions in GLSL that it is difficult to grasp the features available
⇒ If you know how to use it, you can access all the GPU's functions
Results only possible with in-house production
• Bespoke conversion for the conveniences of the engine and/or platform
• GPU optimization through inlining
• CPU optimization using platform-specific extensions
This is a summary of this session. First, we showed that shader translators can be made if you try hard enough. The approach of direct conversion between high-level languages, from HLSL to GLSL, is a method that anyone can think of, but I don't hear about real implementations very often. This might be because, being a niche subject, implementations just aren't publicized much. Either way, with hard work, you can implement something that can be used in production. The frontend can be made without much effort by using DXC. The backend is a fairly simple concept, but it works well. However, the actual implementation was very difficult. Since the language features of GLSL are not as extensive as those of HLSL, various implementations were required to fill the gaps in the translator. We had to do a lot of hard work, such as dealing with different memory layouts and inlining for handling buffer type variables.
©CAPCOM 106
Summary
Knowledge of the field of functional languages came in handy. It was also difficult to grasp the features available in GLSL, since there are many language extensions. However, I think it was a good thing that I was able to become familiar with various language extensions through the development of the translator. Once you master it, you can access all GPU functions, so you can implement platform-specific features that are not possible with general-purpose shader translators. The first result of in-house production was that we were able to create a conversion that perfectly matched the engine and platform. This may seem obvious, but it is an important point. We were also able to flexibly handle special platform-specific functions that are not supported by general-purpose shader translators. Other examples include GPU optimization using inlining, and CPU optimization using platform-specific extensions, as described in the optimization examples.
©CAPCOM 107
Future Prospects
Transition to DXC SPIR-V code generation
• As of 2023, it works fine in most cases
• In-house shader translators have a hard time keeping up with and maintaining HLSL evolution
• Familiarity with DXC codebase through development of in-house shader translators
• Looking to customize if necessary
More active use of SPIR-V
• Look at lower layers for further optimization
• Incorporate vendor extensions as needed
⇒ High-quality game production with both convenience and performance
Now, for our future prospects. Although we have talked about in-house shader translators in this session, we are actually considering moving to DXC's SPIR-V code generation in the future. As of 2023, DXC's SPIR-V code generation works fine in most cases. In fact, we use it in our RE ENGINE implementations of Vulkan and Metal, and it works without issue. Also, HLSL has evolved remarkably in recent years. C++ language features such as templates and operator overloading have been actively incorporated. In-house shader translators have to keep up with these changes, which is costly to maintain. And although I didn't talk about it in this session, when we developed our in-house shader translators, we actually made some modifications to DXC itself. Through this process, we have gained a certain understanding of the DXC codebase. We would like to consider the transition with a view to customizing it as necessary.
©CAPCOM 108
Future Prospects
Finally, we are looking at more aggressive use of SPIR-V. We would like to take a lower layer approach for further optimization. We will also incorporate our own vendor extensions as needed. This is not to say that the development of in-house shader translators was a waste of time. Development of the translator has greatly deepened our understanding of HLSL and GLSL, and we now have a better understanding of the characteristics of shader compilers. In the future, we would like to approach lower layers based on the knowledge gained from this development. I would like to continue to support high-quality game production through research and development of technologies that combine convenience and performance.
©CAPCOM 109
References
OpenGL 4.60 Quick Reference Card
https://www.khronos.org/files/opengl46-quick-reference-card.pdf
OpenGL Shading Language 4.60 Specification
https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf
Vulkan 1.3 Specification
https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html
Kai Nacke. "Learn LLVM 12". 2021.
Ikuo Nakata. "コンパイラの構成と最適化". 2nd edition, 2009.
N.D. Jones, C.K. Gomard, and P. Sestoft. "Partial Evaluation and Automatic Program Generation". 1993.
Kenichi Asai and Oleg Kiselyov. "Introduction to Programming with Shift and Reset". CW 2011 Tutorial.
!7942: Native, first-class, delimited continuations – Glasgow Haskell Compiler / GHC
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7942
Here are the references used.
©CAPCOM 110
Thank you for your attention That is all. Thank you very much for your attention. 111 ©CAPCOM 111