Options

This section describes in detail all available configuration options exposed by the profiling layer.

output

Type:: enum
Default:: overlay

Type of the output used for presenting the profiling data. By default the layer creates an overlay in the application’s window and intercepts incoming input events when the mouse cursor is on top of the overlay. This is the most convenient output that also supports all features of the layer.

The following outputs are available:

overlay: An overlay is displayed in the application’s window. It intercepts incoming input events to allow the user to interact with it and browse through the profiling data. The overlay displays live performance data, which can be paused and resumed at any time. It also provides useful widgets showing post-processed data, such as command queue utilization graphs, top pipelines list, drawcall statistics, shader code disassembly, and more.
trace_file: The profiling data is written directly to a trace file in the JSON format. It is useful when profiling applications that don’t present the rendered image in a window, such as command line applications and compute-only workloads. The data is limited to timestamp query results only.

output_trace_file

Type:: path
Default:: empty

When output is set to trace_file, this option allows to override the default file name and location of the output trace file.

enable_performance_query_ext

Type:: bool
Default:: true

Enables VK_INTEL_performance_query extension and collects detailed performance metrics from Intel graphics cards. The metrics are collected at VkCommandBuffer level and then aggregated into the entire frame. The scope of available metrics depends on the driver and the GPU used for measurements.

enable_pipeline_executable_properties_ext

Type:: bool
Default:: false

Enables VK_KHR_pipeline_executable_properties extension and collects detailed information about the pipeline shader stages, including shader statistics and internal representations, if available.

Enabling this option may slightly impact shader compilation time and memory usage.

enable_render_pass_begin_end_profiling

Type:: bool
Default:: false

When sampling_mode is set to render_pass or higher, setting this option will enable profiling of vkCmdBeginRenderPass and vkCmdEndRenderPass commands.

capture_indirect_args

Type:: bool
Default:: false

When enabled, the layer will capture indirect draw and dispatch arguments from the command buffer. This allows to analyze the actual parameters used for indirect draws and dispatches, which can be useful for debugging and performance analysis.

The option has significant performance and memory overhead due to additional copying of the indirect argument buffers to the host memory.

set_stable_power_state

Type:: bool
Default:: true

Uses DirectX12 API to set the GPU to a stable power state before profiling. This can help to reduce variability in performance measurements caused by power state changes during the profiling session. The option is only applicable for Windows platforms only.

enable_threading

Type:: bool
Default:: true

Enables multithreading support in the profiling layer.

When enabled, the layer will use multiple threads to process profiling data, which can significantly improve performance and reduce overhead. However, it may cause frequent context switches when the application is heavily multithreaded, which can lead to performance degradation in some cases.

It is recommended to leave this option enabled, but it can be disabled for specific use cases where multithreading is not beneficial.

sampling_mode

Type:: enum
Default:: drawcall

Defines the granularity of the profiling data collected by the layer.

The following sampling modes are available:

drawcall: The layer will collect profiling data for each draw call and dispatch command. This is the most detailed mode and provides the best insight into the performance of individual rendering commands. However, it may have higher overhead, especially for applications with a large number of draw calls or dispatches.
pipeline: The layer will collect profiling data for each pipeline, inserting timestamp queries when a new pipeline state is used. This mode provides a good balance between detail and overhead, allowing to analyze performance of individual pipelines without the overhead of collecting data for each draw call or dispatch command.
render_pass: The layer will collect profiling data for each render pass, dynamic rendering pass, compute or transfer commands pass, inserting timestamp queries at boundaries of those passes.
command_buffer: The layer will collect profiling data for each command buffer, placing timestamp queries at the beginning and end of each command buffer. This is the most coarse-grained mode supported by the layer.

frame_delimiter

Type:: enum
Default:: present

Defines the granularity of the frame boundaries used for profiling data.

The following frame delimiters are available:

present: The layer will delimit frames at swapchain present operations. This is the default mode and is recommended for most applications that use swapchains for rendering.
submit: The layer will delimit frames at command buffer submission operations. This mode is useful for applications that do not use swapchains or each submission should be considered as a separate frame.

frame_count

Type:: int
Default:: 1

The number of frames to profile. When output is set to overlay, this option controls how many frames of profiling data are displayed in the overlay. When output is set to trace_file, this option controls how many frames of profiling data are written to the trace file.

When this option is set to 0, the layer will profile all frames until the profiling session is stopped manually.

ref_pipelines

Type:: path
Default:: empty

ref_metrics

Type:: path
Default:: empty