Quick Profiling Guide#
Game Thread#
General#
-
Display RenderBudget:
-Budget BebylonPerf
-
Freeze Game Thread
-Pause
-
Check Game Thread Perf
-stat Game
-
Pause Rendering
-show Rendering
Animation#
- Skeletal Meshes
show SkeletalMeshes
r.EnableMorphTargets
r.SkinCache.Mode
a.URO.Enable
a.URO.ForceAnimRate
a.URO.ForceInterpolation
Physics#
- Toggle All Collision
Might need to implement these.
- Toggle Generate All Overlap Events
Might need to implement this, either in Python or possibly with the editor console command that sets a property on all actors/objects: set <classname> <propertyName> <value>
- Toggle Anim Dynamics
p.AnimDynamics
p.AnimDynamicsWind
p.AnimDynamicsRestrictLOD
p.RagdollPhysics
- Visualize by:
TraceTag
or TraceTagAll
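The `set <classname> <propertyName> <value>` idea noted above for toggling overlap events could be scripted roughly as follows. This is a minimal sketch that only builds the command strings. The class list and the `bGenerateOverlapEvents` property name are assumptions chosen for illustration, and the helper `build_set_commands` is hypothetical; the resulting lines would still need to be pasted into the console or sent through whatever scripting hook the project uses.

```python
def build_set_commands(class_names, property_name, value):
    """Return one `set <classname> <propertyName> <value>` line per class."""
    commands = []
    for class_name in class_names:
        # Python renders booleans as True/False, which matches the console syntax.
        commands.append(f"set {class_name} {property_name} {value}")
    return commands


# Hypothetical class list and property name, used only for illustration.
commands = build_set_commands(
    ["StaticMeshComponent", "SkeletalMeshComponent"],
    "bGenerateOverlapEvents",
    False,
)
for command in commands:
    print(command)
```

Generating the lines first makes it easy to review what will change before running anything against a live level.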
GPU Profiling#
pause
- Pauses the game thread; then use the show command to profile rendering
FreezeFrame 0.5
- Freezes/Pauses game after a delay. Custom function in UCheatManager
- Test if GPU Bottleneck:
r.screenpercentage=20
=> fast test to see if the GPU is the bottleneck
show Rendering
(verify that this actually disables rendering)
- Test if Texture Bandwidth is problem:
- Replace all textures with 2x2 textures (need to implement)
-
Test if Texture MipMaps are appropriate
-Visualize mipmap scale
-Visualize UV scale
-
Test quad overdraw/small triangle size
show QuadOverdraw
-
Test Overdraw is problem
-show Translucency
-show SeparateTranslucency
-
Test Meshes bottleneck
show StaticMeshes
show InstancedStaticMeshes
show SkeletalMeshes
r.ForceLOD
- Animation Compute Shaders:
r.SkinCache.Mode
r.MorphTarget.Mode
- Test if Lighting is bottleneck
-
Toggle All Lighting
show Lighting
(verify that this actually disables rendering)
ToggleLight
(verify that this actually disables rendering)
show DeferredLighting
(verify that this actually disables rendering)
show VisualizeLightCulling
-
Toggle Static Lighting
r.AllowStaticLighting
show DirectLighting
(verify that this actually disables rendering)
-
Toggle Dynamic Lighting
show DirectionalLights
show PointLights
show SpotLights
show SkyLighting
-
Toggle Lighting Components
- Direct Lighting
show DirectLighting
r.SimpleDynamicLighting
- Ambient Occlusion
show AmbientOcclusion
show Diffuse
show Specular
- Global Illumination
show GlobalIllumination
show SubsurfaceScattering
- Indirect Lighting Cache
r.IndirectLightingCache
show IndirectLightingCache
(verify)
- Reflection Environment
show ReflectionEnvironment
-
Test if Shader Complexity Bound:
-ToggleForceDefaultMaterial
-show Materials
-
Test FX System
-
Toggle Particles
(find all commands to turn them off completely)
-
Toggle Particle Simulation
r.GPUParticle.Simulate
r.GPUParticle.FixDeltaSeconds
FX.FreezeGPUSimulation
FX.FreezeParticleSimulation
FX.RestartAll
-
Toggle Particle Rendering
show Particles
-
Turn Off CPU Particles
-
Turn Off GPU Particles
FX.AllowGPUParticles
-
Misc Particle Commands
PARTICLE
KILLPARTICLES
DUMPPARTICLEMEM
CountDisabledParticleItems
LISTPARTICLESYSTEMS
PARTICLEMESHUSAGE
DUMPPARTICLECOUNTS
TRACKPARTICLERENDERINGSTATS
-
Toggle Decals
- Test If Post Processing
- show PostProcessing
- show PostProcessMaterial (this is for toggling custom postprocessing materials which are usually very expensive)
- show AntiAliasing
- show Decals
- Disable rendering features in order of priority with r.LimitRenderingFeatures=FeatureLevel. Feature Levels:
- AntiAliasing
- EyeAdaptation
- SeparateTranslucency
- DepthOfField
- AmbientOcclusion
- CameraImperfections
- Decals
- LensFlares
- Bloom
- ColorGrading
- Tonemapper
- Refraction
- ReflectionEnvironment
- AmbientCubemap
- MotionBlur
- DirectLighting
- Lighting
- Translucency
- TextRender
- Particles
- SkeletalMeshes
- StaticMeshes
- BSP
- Paper2DSprites
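To make the feature-level list above easier to reason about, here is a small sketch that maps a `r.LimitRenderingFeatures` value to the features it would switch off. It assumes that level N disables the first N entries in the order listed above and that values of 0 or less disable nothing; that ordering should be checked against the engine version in use. The helper name `disabled_features` is hypothetical.

```python
# Order copied from the feature-level list in this guide.
FEATURE_ORDER = [
    "AntiAliasing", "EyeAdaptation", "SeparateTranslucency", "DepthOfField",
    "AmbientOcclusion", "CameraImperfections", "Decals", "LensFlares",
    "Bloom", "ColorGrading", "Tonemapper", "Refraction",
    "ReflectionEnvironment", "AmbientCubemap", "MotionBlur", "DirectLighting",
    "Lighting", "Translucency", "TextRender", "Particles",
    "SkeletalMeshes", "StaticMeshes", "BSP", "Paper2DSprites",
]


def disabled_features(level):
    """Return the features assumed to be off at the given level."""
    if level <= 0:
        return []
    # Assumption: each level disables one more feature, in list order.
    return FEATURE_ORDER[:level]


print(disabled_features(3))
```

Stepping the level up one at a time and noting where the frame time drops points at the feature worth investigating with the more specific show flags.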
Draw Thread#
- Look at Draw Call Counter and make sure it's within budget
- stat RHI
- stat SceneRendering
- Look at triangle counts. You can do show [object category] to turn off big groups of objects to see where triangle counts are coming from
- show shadows
- show dynamicshadows
-
Freeze Rendering
- r.RenderTimeFrozen
- FreezeRendering
- FREEZESTREAMING
- FREEZEALL (freezes rendering & level streaming)
- PAUSERENDERCLOCK
- FX.FreezeGPUSimulation
- FX.FreezeParticleSimulation
-
Inspect Draw Lists:
- r.DumpDrawListStats
- Occlusion/Visibility Culling:
- Use:
- stat initviews - Displays information on how long visibility culling took and how effective it was. Visible section count is the single most important stat with respect to rendering thread performance, and that is dominated by Visible Static Mesh Elements under STAT INITVIEWS, but Visible Dynamic Primitives also factors in
- FIX
- show Bounds
- DumpVisibleActors
- r.VisualizeOccludedPrimitives
- showflag.visualizeculling
- show bounds
-
Check if driver overhead is cause
- stat d3d11rhi
-
GPU/CPU Stalls or Pipeline Bubbles
- Do a RenderDoc/NSight capture, grab timings, and see if the perf goes up. If it does, the problem is a sync point
- stat scenerendering to look at stats
- Launch GPUView to drill into specifics
VR Specific#
- Launch Oculus Performance HUD Tool
- Should be accessible with a console command from UE4
- Disable ASW
- Look at these timings compared to emulate stereo mode. These are accurate GPU timings
- Targets:
- Should have <= 1 dropped frame per 5 seconds
- Should have GPU render time ~10ms
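The two VR targets above can be turned into a quick pass/fail check on captured numbers. The sketch below assumes the inputs were collected by hand from the performance HUD. The 1 ms tolerance around the "~10ms" GPU target is an assumption, and the function name `meets_vr_targets` is hypothetical.

```python
def meets_vr_targets(dropped_frames, capture_seconds, gpu_ms_samples,
                     gpu_budget_ms=10.0, tolerance_ms=1.0):
    """Check a capture against the dropped-frame and GPU-time targets."""
    # Target 1: at most one dropped frame per 5 seconds of capture.
    drop_rate_ok = dropped_frames <= capture_seconds / 5.0
    # Target 2: average GPU render time near the budget (tolerance is assumed).
    average_gpu_ms = sum(gpu_ms_samples) / len(gpu_ms_samples)
    gpu_ok = average_gpu_ms <= gpu_budget_ms + tolerance_ms
    return drop_rate_ok and gpu_ok


# Example: 10 second capture, 2 dropped frames, three GPU timings in ms.
print(meets_vr_targets(2, 10.0, [9.5, 10.2, 10.0]))
```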
Performance Tuning#
Tunable Optimizations#
-
Ticking
- tick.AllowAsyncComponentTicks
- tick.AllowConcurrentTickQueue
- tick.AllowAsyncTickDispatch
- tick.AllowAsyncTickCleanup
-
Toggle Occlusion Queries
- r.AllowOcclusionQueries
- r.DownsampledOcclusionQueries
- r.NumBufferedOcclusionQueries
- r.OcclusionQueryLocation (does nothing in forward)
-
Toggle HZB:
- r.HZBOcclusion=0
- EXPERIMENTAL!! r.DoInitViewsLightingAfterPrepass
-
Toggle EarlyZPass settings:
- r.EarlyZPass=1
- r.EarlyZPassMovable=True
- r.EarlyZPassOnlyMaterialMasking
- r.MinScreenRadiusForDepthPrepass=0.3
- r.CustomDepth.Order
-
Animation Update and Evaluation
a.ParallelAnimEvaluation
a.ParallelAnimUpdate
a.ForceParallelAnimUpdate
-
Compute Skinning
- r.SkinCache.Mode=1
- r.SkinCache.CompileShaders=1
- r.MorphTarget.Mode=1
- r.SkinCache.MaxGPUElementsPerFrame (can't find this)
- r.SkinCache.BufferSize (can't find this)
- r.SkinCache.NumTangentIntermediateBuffers
- r.SkinCache.SceneMemoryLimitInMB
-
FX
- FX.AllowGPUSorting
- FX.AllowCulling
- FX.AllowAsyncTick
- FX.EarlyScheduleAsync
- FX.GPUCollisionDepthBounds
- FX.MaxParticleTilePreAllocation
- FX.ParticleCollisionIgnoreInvisibleTime
- FX.ParticleSlackGPU
-
Render Target settings
- r.ClearSceneMethod=1
- r.SceneColorFormat=3
- r.GBufferFormat=1
-
Lighting & GI
- r.Cache.LightingCacheMovableObjectAllocationSize
- r.Cache.LightingCacheDimension
- r.Cache.UpdatePrimsTaskEnabled
- r.MinScreenRadiusForLights
- r.MinScreenRadiusForDepthPrepass
-
Misc
- r.Forward.LightGridPixelSize
- r.Forward.LightGridSizeZ
- r.Forward.MaxCulledLightsPerCell
- r.Forward.LightLinkedListCulling
- r.DeferUniformBufferUpdatesUntilVisible
- r.UseParallelGetDynamicMeshElementsTasks
- r.Tonemapper.Quality
Quality Trade-Offs#
- Toggle TranslucentLightingVolume settings
- r.TranslucentLightingVolume
- r.TranslucentVolumeMinFOV
- r.TranslucentVolumeFOVSnapFactor
- r.TranslucencyVolumeBlur
- r.TranslucencyLightingVolumeDim
- r.TranslucencyLightingVolumeInnerDistance
-
r.TranslucencyLightingVolumeOuterDistance
(Inner and Outer distance are the ones to change to get around the popping)
-
Toggle Custom Depth
-
r.CustomDepth=0
-
Toggle Separate Translucency
- r.SeparateTranslucency=False
- r.SeparateTranslucencyAutoDownsample=1
- r.SeparateTranslucencyScreenPercentage=100
- r.SeparateTranslucencyDurationDownsampleThreshold=1
-
r.SeparateTranslucencyDurationUpsampleThreshold=0.25
-
RenderTargets & PostProcessing
- r.DBuffer
- r.Atmosphere
- r.CapsuleShadows
- r.ContactShadows
-
r.HighQualityLightMaps
-
AA
- r.DefaultFeature.AntiAliasing=3
- r.MSAA.CompositingSampleCount=4
- r.MSAACount=4 (0 => Temporal AA, 1 => no MSAA, 2/4/8 => MSAA sample count)
- r.WideCustomResolve
-
r.DoTiledReflections
-
DBuffer
-
r.DBuffer=False
-
GI
- r.Cache.UpdateEveryFrame
-
r.Cache.SampleTransitionSpeed
-
Misc Graphics Quality:
- r.FastBlurThreshold=0
- r.BloomQuality=1
- r.MaxAnisotropy=8
-
r.LightFunctionQuality
-
Skinning:
- r.GPUSkin.Limit2BoneInfluences
-
r.SkinCache.RecomputeTangents
-
FX:
- FX.GPUCollisionDepthBounds=250
- FX.MaxCPUParticlesPerEmitter=1000
- FX.MaxGPUParticlesSpawnedPerFrame=524288
- FX.GPUSpawnWarningThreshold=10000
- r.GPUParticle.FixDeltaSeconds
- r.GPUParticle.FixTolerance
- r.GPUParticle.MaxNumIterations
-
r.ParticleLightQuality
-
Reflection Captures
- r.ReflectionEnvironment
- r.ReflectionCaptureResolution=128
- r.ReflectionEnvironmentBeginMixingRoughness=0.1
- r.ReflectionEnvironmentEndMixingRoughness=0.3
- r.ReflectionEnvironmentLightmapMixBasedOnRoughness
- r.ReflectionEnvironmentLightmapMixing
-
r.ReflectionEnvironmentLightmapMixLargestWeight=10000
-
Big Kludges:
- r.pd=1
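Once a set of trade-offs has been chosen from the lists above, it can help to keep them in one place rather than retyping them in the console. The sketch below renders a dictionary of console variables as an ini-style block. The `[SystemSettings]` section name is an assumption about where the project stores its overrides, the helper `render_cvar_section` is hypothetical, and the values are simply the example values quoted in this guide, not recommendations.

```python
def render_cvar_section(cvars, section="SystemSettings"):
    """Render console variables as `name=value` lines under an ini section."""
    lines = [f"[{section}]"]
    for name, value in cvars.items():
        lines.append(f"{name}={value}")
    return "\n".join(lines)


# Example values taken from the lists in this guide.
chosen = {
    "r.SeparateTranslucency": "False",
    "r.MSAACount": 4,
    "r.BloomQuality": 1,
    "r.MaxAnisotropy": 8,
}
ini_text = render_cvar_section(chosen)
print(ini_text)
```

Keeping the overrides in a reviewed text block also makes it easier to diff two tuning passes against each other.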
Detailed Root Cause Analysis#
Overview#
Common stat options: [-ms=5.0][-root=empty] [leaf=empty][-depth=maxint] [-nodisplay]
stat groupname[+] - toggles displaying stats group, + enables hierarchical display
stat namedmarker #markername# - adds a custom marker to the stats stream
stat hier -group=groupname [-sortby=name][-maxhistoryframes=60] [-reset][-maxdepth=4]
stat group list|listall|enable name|disable name|none|all|default - manages enabling/disabling recording of the stats groups. Doing stat [groupname] automatically enables that group
stat none - visually turn off all stats (recording is still active)
-
Find perf offending causers:
stat slow [-ms=1.0][-depth=4] - toggles displaying the game and render thread stats
stat dumpevents [-ms=0.2][-all] - dumps events history for slow events, -all adds other threads besides game and render
-
After narrowing down, dump specific stat group frame
stat dumpframe [-ms=5.0][-root=empty] [leaf=empty][-depth=maxint] - dumps a frame of stats
stat dumpframe -ms=.001 -root=initviews
stat dumpframe -ms=.001 -root=shadow
Get more consistent stats:
stat dumpave|dumpmax|dumpsum [-start | -stop | -num=30][-ms=5.0] [-root=empty][leaf=empty] [-depth=maxint] - aggregate stats over multiple frames
-
Hitches
stat dumphitches [-start | -stop | no explicit option toggles ] - toggles dumping hitches
t.HitchThreshold to set threshold
-
Record to disk
stat startfile - starts dumping a capture
stat stopfile - stops dumping a capture (regular, raw, memory)
stat startfileraw - starts dumping a raw capture
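Because the `stat dumpframe` options above are easy to mistype, a small helper that assembles the command string can be handy when running the same drill-down repeatedly. This is only a sketch: `build_dumpframe_command` is a hypothetical helper, it covers just the `-ms`, `-root`, and `-depth` options listed above, and it does not send anything to the engine.

```python
def build_dumpframe_command(ms=5.0, root=None, depth=None):
    """Assemble a `stat dumpframe` console command from optional filters."""
    parts = ["stat", "dumpframe", f"-ms={ms}"]
    if root:
        # Restrict the dump to one stat subtree, e.g. initviews or shadow.
        parts.append(f"-root={root}")
    if depth is not None:
        parts.append(f"-depth={depth}")
    return " ".join(parts)


# Mirrors the initviews and shadow drill-downs shown above.
for root_name in ["initviews", "shadow"]:
    print(build_dumpframe_command(ms=0.001, root=root_name))
```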
General#
- Game Thread:
- stat Game
- tick.LogTicks
- dumpticks
- tick.showPrerequistes
-
Threading Stalls
- stat Threading
- stat CPUStalls
-
Engine UObject System/Constructing UObjects/PostInit/Allocation/etc:
- stat Object
- stat ObjectVerbose
- stat GC
-
Game Thread Scene Update:
- stat Component
- stat UObjects
- stat SceneUpdate (only the GT timers)
- stat Character
- stat Tickables (things like movieplayer, timermanager, etc)
- Tick.LogTicks = 1 or dumpticks
-
Triangle Count/Frame/Render/Game/GPU timings:
- stat Engine
- stat RHI
- stat SceneRendering
- RenderViewFamily = Render Thread
- InitViews = Culling, dependent on how many objects (not just visible) in the scene
-
Inspect CPU:
- stat dumpcpu
- stat ServerCPU
- stat CPUStalls
-
Perf By Tick Functions/Tasks/"Job System":
- stat TaskGraphTasks
- stat Tickables
- stat TickGroups
-
Animation:
- stat Anim
- stat MorphTarget
- stat MovieSceneEval
- stat GPUSkinCache
- stat Particles
- ANIMSEQSTATS
-
Physics:
- stat Physics
- stat PhysXTasks
- stat Collision
- stat CollisionVerbose
- stat CollisionTags
- stat Character
- stat ImmediatePhysics
-
FX
- stat Particles
- stat ParticleMem
- stat GPUParticles
- stat Emitters
- stat BeamParticles
- stat MeshParticles
- stat TrailParticles
- DUMPPARTICLECOUNTS
- DUMPPARTICLEMEM
- PARTICLEMESHUSAGE
- LISTPARTICLESYSTEMS
-
Misc
- stat Quick
- r.DisplayInternals
Render Thread#
- DrawThread/Scene Update Stalls:
- stat SceneRendering
- stat SceneUpdate
-
D3D Driver overhead:
- stat d3d11rhi
-
Render Thread Command Marshalling from Game Thread
- stat RenderThreadCommands
- stat RHICmdList
- stat CommandListMarkers
- stat ParallelCommandListMarkers
- stat LightRendering
-
Dump Material/Shader info
- DumpMaterialStats: Dump material information
- DumpShaderStats: Dump shader information
- DumpShaderPipelineStats: Dump shader pipeline information
-
Visibility Culling & Primitive Component count:
- stat initviews
- Displays information on how long visibility culling took and how effective it was. Visible section count is the single most important stat with respect to rendering thread performance, and that is dominated by Visible Static Mesh Elements under STAT INITVIEWS, but Visible Dynamic Primitives also factors in.
- show camerafrustums
- show bounds
GPU#
-
GPU
- stat GPU
- stat RHI (GPU Memory Pressure)
-
Texture Bandwidth
- showMipLevels
- VisRT
- r.VisualizeTexturePool
- ListTextures
- ListStreamingTextures
-
GI
- r.Cache.DrawInterpolationPoints
- r.Cache.DrawDirectionalShadowing
- r.Cache.DrawLightingSamples
-
Post-Processing
- r.ListSceneColorMaterials
-
VR
- stat OculusHMD
- stat Oculus
-
Misc
- r.GPUBusyWait
- SynthBenchmark
Advanced#
-
Hitches
-
stat dumphitches
-
CauseHitches
-
Memory
- TODO: Add a Button to explain how to Launch MTuner
- TODO: Add a Button to explain how to Launch igmemtrace
memreport [-full]
stat dumpnonframe [groupname]
stat toggledebug
stat TextureGroup
stat TexturePool
stat LLMPlatform
stat LLM
stat LLMMalloc
stat LLMRHI
stat LLMAssets
stat Memory
stat MemoryPlatform
stat MemoryAllocator
stat MemoryStaticMesh
stat SceneMemory
memreport -fullprof
- Misc
stat dumpnonframe [groupname]
stat Levels
stat LoadTime
stat LoadTimeVerbose
stat AsyncLoad
stat AsyncLoadGameThread
stat Streaming / stat streaming sortby=name
stat StreamingDetails
PauseTextureStreaming
DumpLightmapSizeOnDisk
r.DumpRenderTargetPoolMemory
rhi.DumpMemory
r.RenderTargetPool.Events
r.RenderTargetPoolMin