D3D12 Learning Notes
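(My computer got water damage and this whole document was lost with it, which hurts. Only the first few lines I wrote survived.)
A temporary summary of my understanding follows.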
Hello World
Describe the process of drawing a triangle, which is the D3D equivalent of printf("Hello World!").
overview
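step 1: initialize the descriptor heap
A descriptor is a description of a GPU resource (texture, buffer, sampler).
Descriptor Heap: create the descriptor heaps. There can be many descriptor heaps; each one is a contiguous block of memory that holds multiple descriptors.
There are three main kinds of heap:
- CBV/SRV/UAV: constant buffer view, shader resource view, unordered access view. This heap can be made GPU-visible (shader-visible). A contiguous range of its descriptors is what a descriptor table refers to, and descriptor tables are bound through the root signature. Small resources that change often, such as the world matrix, go in a constant buffer; other resources go through an SRV.
- Sampler Descriptor Heap: holds the samplers used for texture sampling in shaders; it can also be GPU-visible.
- RTV/DSV descriptor heap (Render Target View / Depth Stencil View Descriptor Heap): used to set the render target and the depth buffer. CPU-visible only; it does not need to be bound to a descriptor table.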
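Since a descriptor heap is one contiguous block, finding the n-th descriptor is just pointer arithmetic. The block below is a minimal portable sketch of that idea, not real D3D12 code: `CpuHandle` and `OffsetHandle` are hypothetical stand-ins I made up so it compiles without the Windows headers. In a real program the stride would come from `ID3D12Device::GetDescriptorHandleIncrementSize`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE (hypothetical, illustration only).
struct CpuHandle {
    std::size_t ptr;
};

// Handle of the descriptor at `index` in a heap that starts at `heapStart`.
// `incrementSize` is the per-descriptor stride for that heap type.
inline CpuHandle OffsetHandle(CpuHandle heapStart, std::uint32_t index,
                              std::uint32_t incrementSize) {
    assert(incrementSize > 0);  // a zero stride would alias every descriptor
    return CpuHandle{ heapStart.ptr + static_cast<std::size_t>(index) * incrementSize };
}
```

For example, with a heap starting at address 1000 and a stride of 32 bytes, descriptor 3 sits at 1000 + 3 * 32 = 1096.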
step 2: constant buffer
- Upload it to an upload buffer; a default buffer is not needed.
drawcall:
step 1: vertex and input layout
First thing is to create a custom vertex structure.
struct Vertex
{
    XMFLOAT3 Pos;
    XMFLOAT3 Normal;
    XMFLOAT2 Tex0;
    XMFLOAT2 Tex1;
};
After that, we need to give D3D a description that tells it what each component of the vertex (e.g. Pos, Normal) means. That description is a D3D12_INPUT_LAYOUT_DESC.
- It is an array of D3D12_INPUT_ELEMENT_DESC entries plus an element count, with one entry per vertex component.
Each parameter is described below:
- SemanticName: the name of the component, used to map the element to the vertex shader's input signature.
- SemanticIndex: the index attached to the semantic name, e.g. Pos becomes POSITION0.
- Format: the data format of the component, because XMFLOAT3 alone does not tell D3D enough.
- InputSlot: the vertex buffer slot the component comes from. There are 16 slots (0 to 15); I will leave the details for later.
- AlignedByteOffset: the byte offset of the component from the start of the vertex. A float is 4 bytes, so Normal, which follows the three floats of Pos, starts at offset 3 x 4 = 12.
- InputSlotClass: left at its default (per-vertex data) for now.
Whole process:
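As a rough stand-in for the whole process (this is my own portable sketch, not the original listing), the block below builds a layout table for the Vertex struct and lets the compiler compute the byte offsets. `Float3`, `Float2`, `ElementDesc` and `kLayout` are hypothetical names; they only mimic XMFLOAT3, XMFLOAT2 and a trimmed-down D3D12_INPUT_ELEMENT_DESC so the code compiles without DirectX headers.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for XMFLOAT3 / XMFLOAT2 (assumed to be tightly packed floats).
struct Float3 { float x, y, z; };
struct Float2 { float x, y; };

struct Vertex {
    Float3 Pos;
    Float3 Normal;
    Float2 Tex0;
    Float2 Tex1;
};

// Hypothetical, trimmed-down mirror of D3D12_INPUT_ELEMENT_DESC.
struct ElementDesc {
    const char* SemanticName;
    unsigned    SemanticIndex;
    unsigned    AlignedByteOffset;
};

// One entry per vertex component; offsets are taken from the struct itself.
static const ElementDesc kLayout[] = {
    { "POSITION", 0, static_cast<unsigned>(offsetof(Vertex, Pos))    },
    { "NORMAL",   0, static_cast<unsigned>(offsetof(Vertex, Normal)) },
    { "TEXCOORD", 0, static_cast<unsigned>(offsetof(Vertex, Tex0))   },
    { "TEXCOORD", 1, static_cast<unsigned>(offsetof(Vertex, Tex1))   },
};

// Usage example: Normal follows the three 4-byte floats of Pos.
inline void CheckLayout() {
    assert(kLayout[1].AlignedByteOffset == 12);
}
```

Note how the two texture coordinates share the semantic name and differ only in SemanticIndex. Using offsetof instead of hand-written numbers keeps the table correct if the struct changes.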
step 2: vertex buffer
For the GPU to access the vertex array, the vertices need to be placed in a GPU resource (ID3D12Resource) called a buffer. It is a simpler structure than a texture.
For a buffer, the width refers to the number of bytes in the buffer. If it holds 64 floats in total, the width should be 64*sizeof(float).
let GPU access the data
$\textbf{If}$ the vertex buffer is static (house, road, etc.), we should put it into the default heap, D3D12_HEAP_TYPE_DEFAULT. But the default heap can only be accessed by the GPU, so we need a way for the CPU to push the data to the GPU. This is where an upload buffer, D3D12_HEAP_TYPE_UPLOAD, comes in. The path is $\textbf{system memory}\rightarrow\textbf{upload heap}\rightarrow\textbf{default heap}$.
$\textbf{Whole default heap process}$
Wrapping the upload heap -> default heap copy in a helper function makes life easier: to store a vertex buffer we just call that helper, for example with the vertices of a cube.
bind the buffer to the pipeline
- We need a vertex buffer view to find the vertex buffer resource. In my understanding, it is something like a pointer: it lets us find the position and size of the vertex buffer when we need it.
typedef struct D3D12_VERTEX_BUFFER_VIEW
{
    D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
    UINT SizeInBytes;
    UINT StrideInBytes;
} D3D12_VERTEX_BUFFER_VIEW;
- BufferLocation: the GPU virtual address of the vertex buffer we want to view (similar to what & gives you in C).
- SizeInBytes: the number of bytes to view in the vertex buffer, starting from BufferLocation.
- StrideInBytes: the size of each vertex, in bytes.
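To make the three fields concrete, here is a small portable sketch of how they relate for the Vertex layout from step 1. `VertexBufferView` and `MakeView` are hypothetical stand-ins (a plain integer replaces D3D12_GPU_VIRTUAL_ADDRESS), and the address passed in is an arbitrary example value, not something a real resource returned.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the Vertex in step 1: 3 + 3 + 2 + 2 floats.
struct Vertex {
    float Pos[3];
    float Normal[3];
    float Tex0[2];
    float Tex1[2];
};

// Hypothetical mirror of D3D12_VERTEX_BUFFER_VIEW with a plain integer address.
struct VertexBufferView {
    std::uint64_t BufferLocation;
    std::uint32_t SizeInBytes;
    std::uint32_t StrideInBytes;
};

// Fill in a view for `vertexCount` vertices stored at `gpuAddress`.
inline VertexBufferView MakeView(std::uint64_t gpuAddress, std::uint32_t vertexCount) {
    assert(vertexCount > 0);
    VertexBufferView view{};
    view.BufferLocation = gpuAddress;
    view.StrideInBytes  = static_cast<std::uint32_t>(sizeof(Vertex));
    view.SizeInBytes    = vertexCount * view.StrideInBytes;
    return view;
}
```

For a cube with 8 vertices of 40 bytes each, the stride is 40 and the size is 8 * 40 = 320 bytes, which is also the buffer width discussed earlier.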
- After creating the vertex buffer and its view, bind the view to an input slot of the pipeline (in D3D12 this is done with ID3D12GraphicsCommandList::IASetVertexBuffers).
- The step above does not mean we have already drawn anything; it only puts the resource into the target slot. DrawInstanced is the function that performs the draw step.
void ID3D12GraphicsCommandList::DrawInstanced(
    UINT VertexCountPerInstance,
    UINT InstanceCount,
    UINT StartVertexLocation,
    UINT StartInstanceLocation);
step 3: constant buffer
A constant buffer is also a type of GPU resource (ID3D12Resource).
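One detail worth a sketch: D3D12 requires the size of a constant buffer view to be a multiple of 256 bytes, so the size of the CPU-side struct usually has to be rounded up. The helper below shows the usual bit trick; the name `CalcConstantBufferByteSize` is my own choice for illustration and is not part of the D3D12 API.

```cpp
#include <cassert>
#include <cstdint>

// Round `byteSize` up to the next multiple of 256.
// Adding 255 pushes any partial block over the boundary,
// and masking off the low 8 bits snaps back down to it.
inline std::uint32_t CalcConstantBufferByteSize(std::uint32_t byteSize) {
    return (byteSize + 255u) & ~255u;
}

// Usage example: a 64-byte world matrix still occupies one 256-byte block.
inline void ExampleConstantBufferSize() {
    assert(CalcConstantBufferByteSize(64) == 256);
}
```

A size that is already aligned stays unchanged (256 stays 256), while 300 bytes rounds up to 512.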
graphics pipeline in DirectX 12
- Fixed-function stages (blue): we cannot change how they process data, but we can configure them through the DirectX 12 API. They are like machines in a factory.
- Programmable stages (green): we can write a shader program in HLSL to define exactly how data is processed. They are like a programmable robot in a factory.
Input-Assembler (IA) Stage: reads primitive data from user-defined vertex and index buffers and assembles that data into geometric primitives.
Vertex Shader (VS) Stage: transforms the vertex data from object space into clip space.
Hull Shader (HS) Stage: responsible for determining how much an input control patch should be tessellated by the tessellation stage.
basics of D3D12
COM
- backward compatible.
- language-independent.
Use WRL's ComPtr to manage the lifetime of COM objects; it works like a smart pointer.
PSO
Pipeline State Object. In my understanding it is analogous to the concept of a draw state: it cannot change during a draw call, so it is a 'setup' done before the rendering process begins.
Render Item
A lightweight structure that holds what we need to draw one shape; render items are stored grouped by PSO.
useful functions
- IDXGIFactory (DirectX Graphics Infrastructure)
Enumerates adapters and creates the swap chain.
GRS_THROW_IF_FAILED(CreateDXGIFactory2(nDXGIFactoryFlags, IID_PPV_ARGS(&pIDXGIFactory5)));
creating resources
CreateCommittedResource
implicit heap: the heap object cannot be obtained by the application. The call creates the resource together with its heap, so we can use it directly without building the heap manually, but it is hard to control the details of the heap.
CreatePlacedResource
CreateReservedResource
Heap
- DEFAULT: matches buffers created with D3Dxx_USAGE = Default in earlier APIs. Only the GPU can access the data; the CPU cannot access it directly, which means it usually lives in $\textbf{video memory}$. Put data that rarely changes here, such as textures.
- UPLOAD: the CPU cannot write into the DEFAULT heap directly, so the upload heap is used to stage data on its way into the DEFAULT heap. It is "read only" for the GPU and "write only" for the CPU.
- READBACK: the opposite of UPLOAD: the GPU writes and the CPU reads.
Resource Barrier
Handles the synchronization problem between the copy engine and the graphics command engine. E.g. a texture is large, so the $\textbf{memcopy}$ needs some time to finish. The graphics command engine does not know that and may already issue a $\textbf{Draw Call}$ that uses the texture, which leads to an unfinished texture being rendered.
In my understanding, because commands are executed on the GPU in serial order, a resource barrier is just like the divider bar at a supermarket checkout counter.
Adapter
Used to look for an adapter (graphics card).
Command List, Command Allocator, Command Queue
ComPtr: releases the COM object automatically when the ComPtr goes out of scope and the object is no longer needed, which helps prevent memory leaks.
Command Allocator: creates and manages the memory that backs (supports) a command list. Every command list needs a command allocator, and each command allocator can be used by only one recording command list at a time.
Command List: the CPU records a list of commands to be executed by the GPU, such as state changes, resource barriers, drawing operations…
Command Queue: the interface through which the CPU submits recorded command lists to the GPU for execution. The GPU can start executing the commands once the CPU has put the command list in the queue.
Fence
A marker that lets the CPU know when the GPU has finished its work, so the two can be synchronized.
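The idea behind a fence is a counter that only goes up: the CPU asks for a value to be signaled after the submitted work, and later checks whether the completed value has reached it. The toy model below illustrates only that counting logic on a single thread. `FakeFence`, `FakeQueue`, `GpuCatchUp` and `IsDone` are hypothetical names, not the D3D12 API; in real code the corresponding calls are ID3D12CommandQueue::Signal and ID3D12Fence::GetCompletedValue.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a fence: the last value the "GPU" has reached.
struct FakeFence {
    std::uint64_t completed = 0;
};

// Toy stand-in for a command queue that hands out increasing fence values.
struct FakeQueue {
    std::uint64_t lastSignaled = 0;

    // CPU side: mark "everything submitted so far" with a new, larger value.
    std::uint64_t Signal() { return ++lastSignaled; }

    // Pretend the GPU has finished all submitted work.
    void GpuCatchUp(FakeFence& fence) { fence.completed = lastSignaled; }
};

// The CPU may reuse resources guarded by `value` only once this returns true.
inline bool IsDone(const FakeFence& fence, std::uint64_t value) {
    return fence.completed >= value;
}

// Usage example: the work is not done until the GPU catches up.
inline void ExampleFence() {
    FakeQueue queue;
    FakeFence fence;
    const std::uint64_t target = queue.Signal();
    assert(!IsDone(fence, target));
    queue.GpuCatchUp(fence);
    assert(IsDone(fence, target));
}
```

Comparing with >= rather than == matters: the GPU may already be past the value we are waiting for.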
Swap Chain
The buffer count must be at least 2 when using the flip presentation model.
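With two or more buffers, the application cycles through them: after each present, the next buffer in the ring becomes the back buffer. A minimal sketch of that bookkeeping is below. `NextBackBuffer` is a hypothetical helper of mine; a real program can also ask the swap chain directly through IDXGISwapChain3::GetCurrentBackBufferIndex.

```cpp
#include <cassert>

// Index of the buffer to render into after presenting `current`.
inline unsigned NextBackBuffer(unsigned current, unsigned bufferCount) {
    assert(bufferCount >= 2);      // flip model needs at least two buffers
    assert(current < bufferCount);
    return (current + 1) % bufferCount;
}
```

With two buffers the index simply alternates 0, 1, 0, 1; with three it goes 0, 1, 2, 0.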
Transformation Pipeline
- World Transform: converts each 3D model's local coordinates into world coordinates.
- View Transform: $V = T \cdot R_z \cdot R_y \cdot R_x$
- Projection Transform:
- Clip transform: discards the parts that are outside the camera's view.
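Putting the steps together, and assuming the row-vector convention that DirectXMath uses (the symbols $W$, $P$ and $\mathbf{p}$ are my own notation, with $W$ the world matrix and $P$ the projection matrix), a point travels through the pipeline roughly as:

$$\mathbf{p}_{clip} = \mathbf{p}_{local} \cdot W \cdot V \cdot P$$

Writing $\mathbf{p}_{clip} = (x_c, y_c, z_c, w_c)$, a point survives clipping in D3D when

$$-w_c \le x_c \le w_c, \qquad -w_c \le y_c \le w_c, \qquad 0 \le z_c \le w_c$$

and the perspective divide then gives the normalized device coordinates $(x_c / w_c,\; y_c / w_c,\; z_c / w_c)$.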
Render Target View(RTV):
After several months, my understanding of the RTV is now:
Its purpose is simply to tell the GPU where in the back buffer to render before the swap. Without an RTV, the GPU would not know where the rendered pixels should be sent.
Root signatures
It describes the root constants, CBVs (constant buffer views), SRVs, UAVs, samplers, etc. that the shaders expect, and the shader registers they are bound to.
glossary of CG
mipmap: a set of copies of the same picture at progressively lower resolutions, because objects viewed from far away do not need that much detail.
SRV (shader resource view): wraps textures in a format that shaders can access. Read-only. For example: a single texture, individual arrays, planes, or colors from a mipmapped texture, a 3D texture, a 1D texture color gradient, etc.
UAV (unordered access view): like an SRV, but it can be read or written in any order, and can even be read/written simultaneously by multiple threads without generating memory conflicts.
root signatures: link command lists to the resources the shaders require. A root signature determines the types of data the shaders should expect, but does not define the actual memory or data. A graphics command list has both a graphics and a compute root signature; a compute command list has just one compute root signature. These root signatures are independent of each other.
Resource: everything the GPU can operate on is a resource in D3D12, represented by 'ID3D12Resource': render targets (including back buffers), textures, vertex buffers, index buffers…
G-SYNC: synchronizes the display's refresh with the graphics card's output.
Windows Advanced Rasterization Platform (WARP): if no available GPU is found, the system can run the same D3D12 steps on the CPU through WARP. It can stand in for the GPU's rendering methods, such as rasterization, ray tracing…
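As a small worked example for the mipmap entry above: a full mip chain keeps halving the larger dimension until it reaches 1, so the number of levels is floor(log2(max(width, height))) + 1. The helper below is a portable sketch of that count; `MipLevelCount` is a hypothetical name, not a D3D12 function.

```cpp
#include <cassert>

// Number of levels in a full mip chain for a width x height texture.
inline unsigned MipLevelCount(unsigned width, unsigned height) {
    assert(width > 0 && height > 0);
    unsigned largest = width > height ? width : height;
    unsigned levels = 1;           // level 0 is the full-size image
    while (largest > 1) {
        largest /= 2;              // each level halves the size, rounding down
        ++levels;
    }
    return levels;
}
```

A 256 x 256 texture has 9 levels (256, 128, ..., 1), and a 1 x 1 texture has just one.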