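I have recently been looking into GPU instancing. With 1000 spheres rendered using instancing, the Frame Debugger shows the batch being broken,
and the reason given is: The previous instanced draw call has reached its maximum instance count.
That left me confused: how is the batch size determined? I wanted a rough idea of the maximum number of instances per batch,
so I referred to https://github.com/vanCopper/Unity-GPU-Instancing
It explains that for instancing on PC with DX, the constant buffer is 64 KB = 65536 bytes.
Because the shader uses UNITY_VERTEX_INPUT_INSTANCE_ID, which adds an instance ID to the struct, Unity stores two matrices per instance in the constant buffers:
object-to-world: transforms from object space to world space
world-to-object: the inverse transform, used for normal calculations
Each matrix is 64 bytes (4 x 4 floats of 4 bytes each),
so max instance count = int(65536 / 128) - 1 = 511 (the division gives 512; the -1 is there to match what Unity actually reports)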
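The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the -1 is the empirical adjustment observed in Unity, not something taken from Unity's source. It also estimates the fewest batches 1000 spheres would need at that limit.

```python
import math

CBUFFER_BYTES = 65536   # 64 KB constant buffer (DX on PC, per the reference)
MATRIX_BYTES = 64       # one 4x4 float matrix: 16 floats * 4 bytes


def max_instances(per_instance_bytes: int) -> int:
    """Empirical estimate: instances that fit in one constant buffer, minus 1."""
    return CBUFFER_BYTES // per_instance_bytes - 1


def min_batches(instance_total: int, limit: int) -> int:
    """Fewest draw calls needed if each one holds at most `limit` instances."""
    return math.ceil(instance_total / limit)


# object-to-world + world-to-object = 128 bytes per instance
limit = max_instances(2 * MATRIX_BYTES)
print(limit)                      # 511
print(min_batches(1000, limit))   # 2
```

With a limit of 511, 1000 spheres cannot fit in a single instanced draw call, which is consistent with the batch break seen in the Frame Debugger.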
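(This test does not use Graphics.DrawMeshInstanced, only ordinary instancing, and it ignores the 300-vertex limit on mobile platforms; the sphere has more than 300 vertices, and testing on PC does not have that many restrictions.)
Figure: the 1000 small spheres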
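// Per-instance property buffer used by the shader
UNITY_INSTANCING_BUFFER_START(Props)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
UNITY_INSTANCING_BUFFER_END(Props)
// (rest omitted here)
fixed4 frag (v2f i) : SV_Target
{
    // Needed to read instanced properties in the fragment shader; assumes v2f
    // declares UNITY_VERTEX_INPUT_INSTANCE_ID and the vertex shader transfers it
    UNITY_SETUP_INSTANCE_ID(i);
    // sample the texture
    fixed4 col = tex2D(_MainTex, i.uv);
    // apply fog
    UNITY_APPLY_FOG(i.fogCoord, col);
    // The property must actually be read and used in the calculation;
    // otherwise the maximum instance count does not change
    float4 c1 = UNITY_ACCESS_INSTANCED_PROP(Props, _Color);
    return col * c1;
}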
The result above looks about right, so let me add a few more properties and see.
// Instanced properties
UNITY_INSTANCING_BUFFER_START(Props)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color1)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color2)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color3)
UNITY_INSTANCING_BUFFER_END(Props)
float4 * 4 = 16 bytes * 4 = 64 bytes, and the instance count is still 511.
Adding more properties: float4 * 9 = 16 bytes * 9 = 144 bytes
// Instanced properties
UNITY_INSTANCING_BUFFER_START(Props)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color1)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color2)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color3)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color4)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color5)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color6)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color7)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color8)
UNITY_INSTANCING_BUFFER_END(Props)
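The count dropped to 454.
From testing (no screenshots, too tiring; try it yourself if you are interested), whenever the instanced properties total <= 128 bytes per instance, the count stays at 511.
Once they exceed 128 bytes, the count depends on the property size, rounded up to a 16-byte boundary, because DX constant buffers are 16-byte aligned.
454 = int(65536 / 144) - 1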
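The pattern observed so far can be written as a small Python sketch. It assumes, based only on the test results above and not on Unity's source, that the limit is driven by whichever is larger: the 128 bytes of matrices or the per-instance property bytes. The function name is hypothetical.

```python
CBUFFER_BYTES = 65536    # 64 KB constant buffer
MATRICES_BYTES = 128     # object-to-world + world-to-object
FLOAT4_BYTES = 16


def limit_for_float4_props(float4_count: int) -> int:
    """Estimated max instance count for N float4 instanced properties."""
    props_bytes = float4_count * FLOAT4_BYTES
    # Assumption from the tests: the larger per-instance footprint wins.
    limiting_bytes = max(MATRICES_BYTES, props_bytes)
    return CBUFFER_BYTES // limiting_bytes - 1


for n in (1, 4, 8, 9):
    print(n, n * FLOAT4_BYTES, limit_for_float4_props(n))
# 1 16 511
# 4 64 511
# 8 128 511
# 9 144 454
```

Only the 1, 4 and 9 property cases were reported as measured here; the 8 property row is what the "<= 128 bytes" rule predicts.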
Next, to verify the 16-byte alignment,
add one more float property
// Instanced properties
UNITY_INSTANCING_BUFFER_START(Props)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color1)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color2)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color3)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color4)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color5)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color6)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color7)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color8)
    // Added line: float = 4 bytes
    UNITY_DEFINE_INSTANCED_PROP(float, _Color9)
UNITY_INSTANCING_BUFFER_END(Props)
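The count dropped to 408.
So, continuing the calculation: with 128-bit alignment, the extra 4-byte float is padded up to 16 bytes, giving float4 * 9 + 16 bytes = 160 bytes
int(65536 / 160) - 1 = 408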
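The padding step can be sketched in Python as follows. The helper names are hypothetical, and the model (round the per-instance property size up to 16 bytes, then divide) is inferred from the measurements in this post rather than from Unity's implementation.

```python
CBUFFER_BYTES = 65536   # 64 KB constant buffer
MATRICES_BYTES = 128    # two 4x4 float matrices per instance
ALIGNMENT = 16          # 128-bit alignment


def align_up(size: int, alignment: int = ALIGNMENT) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def limit_with_padding(props_bytes: int) -> int:
    """Estimated max instance count after padding the properties to 16 bytes."""
    padded = align_up(props_bytes)
    return CBUFFER_BYTES // max(MATRICES_BYTES, padded) - 1


raw_bytes = 9 * 16 + 4                 # nine float4 plus one float = 148
print(align_up(raw_bytes))             # 160
print(limit_with_padding(raw_bytes))   # 408
print(CBUFFER_BYTES // raw_bytes - 1)  # 441 (unpadded, does not match Unity)
```

Dividing by the unpadded 148 bytes would predict 441, which does not match the observed 408, so the test supports the 16-byte padding explanation.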
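That is roughly how the calculation works. These numbers come purely from testing; I do not know what the actual source code does.
Reference: https://github.com/vanCopper/Unity-GPU-Instancing
Author: 无加冰嘅可乐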