Intel® VTune™ Amplifier 2018 Help
Intel® VTune™ Amplifier collects call stack information during User-Mode Sampling and Tracing Collection, or during Hardware Event-based Sampling Collection when stack collection is enabled. Use the callstacks report to see how the hot functions are called. This report type focuses on call sequences, starting from the functions that take the most CPU time.
You can use the -column option to filter the callstacks report and focus on a specific metric, for example:
$ amplxe-cl -report callstacks -r r001ah -column="CPI Rate"
To display a list of columns available for the callstacks report, enter:
$ amplxe-cl -report callstacks -r <result_dir> column=?
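The column filter above can be wrapped in a small shell helper when the same report is requested for several results. The following is a minimal sketch, not part of the product: the function name callstacks_by_column and the AMPLXE_CL override variable are hypothetical, introduced here only so the assembled command can be dry-run without the tool installed. The only amplxe-cl options it uses are those shown above (-report callstacks, -r, -column).

```shell
#!/bin/sh
# Hypothetical helper (not part of VTune Amplifier): print a callstacks
# report restricted to a single metric column.
# AMPLXE_CL is an assumed override; it defaults to amplxe-cl.
callstacks_by_column() {
    result_dir=$1   # result directory, for example r001ah
    column=$2       # column name, for example "CPI Rate"
    "${AMPLXE_CL:-amplxe-cl}" -report callstacks -r "$result_dir" -column="$column"
}

# Dry run: with AMPLXE_CL=echo the arguments are printed instead of executed.
AMPLXE_CL=echo callstacks_by_column r001ah "CPI Rate"
```

Quoting "$column" keeps a multi-word column name such as CPI Rate together as one argument.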
Example 1: Callstacks Report with Limited Items
The following example generates a callstacks report for the most recent analysis result and limits the number of functions and function stacks to 5 items.
$ amplxe-cl -report callstacks -limit 5
On Windows*:
Function        Function Stack     CPU Time  Module             Function (Full)                  Source File        Start Address
--------------  -----------------  --------  -----------------  -------------------------------  -----------------  -------------
grid_intersect                       5.436s  analyze_locks.exe  grid_intersect                   grid.cpp           0x40d340
                intersect_objects    1.918s  analyze_locks.exe  intersect_objects(struct ray *)  intersect.cpp      0x402840
                shader                   0s  analyze_locks.exe  shader(struct ray *)             shade.cpp          0x404730
                trace                    0s  analyze_locks.exe  trace(struct ray *)              trace_rest.cpp     0x402370
                render_one_pixel         0s  analyze_locks.exe  render_one_pixel                 analyze_locks.cpp  0x401db0
...
On Linux*:
Function              Function Stack     CPU Time  Module                 Function (Full)           Source File        Start Address
--------------------  -----------------  --------  ---------------------  ------------------------  -----------------  -------------
initialize_2D_buffer                      22.746s  tachyon_find_hotspots  initialize_2D_buffer      find_hotspots.cpp  0x4018f0
                      render_one_pixel    22.746s  tachyon_find_hotspots  render_one_pixel          find_hotspots.cpp  0x401950
                      draw_trace               0s  tachyon_find_hotspots  draw_trace(void)          find_hotspots.cpp  0x401d70
                      thread_trace             0s  tachyon_find_hotspots  thread_trace(thr_parms*)  find_hotspots.cpp  0x401ef0
                      trace_shm                0s  tachyon_find_hotspots  trace_shm                 trace_rest.cpp     0x410a20
                      trace_region             0s  tachyon_find_hotspots  trace_region              trace_rest.cpp     0x410aa0
                      rt_renderscene           0s  tachyon_find_hotspots  rt_renderscene(void*)     api.cpp            0x402360
                      tachyon_video            0s  tachyon_find_hotspots  tachyon_video             video.cpp          0x402240
                      main                     0s  tachyon_find_hotspots  main                      video.cpp          0x4013e0
                      __libc_start_main        0s  libc.so.6              __libc_start_main         libc-start.c       0x21dd0
                      _start                   0s  tachyon_find_hotspots  _start                    [Unknown]          0x40149c
grid_intersect                             7.282s  tachyon_find_hotspots  grid_intersect            grid.cpp           0x408930
                      intersect_objects    2.756s  tachyon_find_hotspots  intersect_objects(ray*)   intersect.cpp      0x40a400
                      shader                   0s  tachyon_find_hotspots  shader(ray*)              shade.cpp          0x40eae0
...
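When only the rows that carry time are of interest, the text report can be post-processed with standard tools. The following is a minimal sketch under stated assumptions: hot_frames is a hypothetical helper name, the input is the plain-text report layout shown above, and the first whitespace-separated token of the form <number>s on a row is taken to be the time column. Rows without such a token (the header, the separator, and "...") are skipped. The sample rows in the here-document are copied from the Windows* output above.

```shell
#!/bin/sh
# Hypothetical helper: read callstacks report text on stdin and print
# "<seconds> <name>" for every row whose time value is greater than zero.
# Assumption: the first token that looks like "<number>s" is the time column.
hot_frames() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+(\.[0-9]+)?s$/) {
                # Strip the trailing "s" and convert to a number.
                t = substr($i, 1, length($i) - 1) + 0
                if (t > 0 && i > 1) {
                    # Everything before the time token is the frame name.
                    name = $1
                    for (j = 2; j < i; j++) name = name " " $j
                    printf "%.3f %s\n", t, name
                }
                break
            }
        }
    }'
}

# Demo on rows copied from the report above, hottest first.
hot_frames <<'EOF' | sort -rn
grid_intersect 5.436s analyze_locks.exe grid_intersect grid.cpp 0x40d340
intersect_objects 1.918s analyze_locks.exe intersect_objects(struct ray *) intersect.cpp 0x402840
shader 0s analyze_locks.exe shader(struct ray *) shade.cpp 0x404730
EOF
```

With the tool installed, the same filter could be fed directly, for example: amplxe-cl -report callstacks -limit 5 | hot_frames | sort -rn. Because awk splits on runs of whitespace, the helper does not depend on the exact column alignment.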
Example 2: Callstacks Report with Callstack Grouping
The following example generates a callstacks report for the r001lw result, grouped by function call stacks.
$ amplxe-cl -report callstacks -r r001lw -group-by callstack
On Windows*:
Function/Function Stack                    Wait Time  Module             Function (Full)
-----------------------------------------  ---------  -----------------  -----------------------------------------
tbb::internal::acquire_binsem_using_event    20.005s  tbb.dll            tbb::internal::acquire_binsem_using_event
func@0x10003350                              13.857s  gdiplus.dll        func@0x10003350
func@0x1000c1f0                                   0s  gdiplus.dll        func@0x1000c1f0
BaseThreadInitThunk                               0s  KERNEL32.DLL       BaseThreadInitThunk
func@0x6b2dacf0                                   0s  ntdll.dll          func@0x6b2dacf0
func@0x6b2daccf                                   0s  ntdll.dll          func@0x6b2daccf
video::main_loop                             10.111s  analyze_locks.exe  video::main_loop(void)
main                                              0s  analyze_locks.exe  main
WinMain                                           0s  analyze_locks.exe  WinMain
_tmainCRTStartup                                  0s  analyze_locks.exe  _tmainCRTStartup
[Unknown stack frame(s)]                          0s  [Unknown]          [Unknown stack frame(s)]
BaseThreadInitThunk                               0s  KERNEL32.DLL       BaseThreadInitThunk
func@0x6b2dacf0                                   0s  ntdll.dll          func@0x6b2dacf0
...
On Linux*:
Function/Function Stack            Wait Time  Module                 Function (Full)
---------------------------------  ---------  ---------------------  -----------------------------------------------------------
draw_task::operator()                98.698s  tachyon_analyze_locks  draw_task::operator()(tbb::blocked_range<int> const&) const
tbb::interface6::internal                 0s  tachyon_analyze_locks  tbb::interface6::internal
execute<tbb::interface6::internal         0s  tachyon_analyze_locks  execute::interface6::internal
[TBB parallel_for on draw_task]           0s  tachyon_analyze_locks  tbb::interface6::internal::execute(void)
[TBB Dispatch Loop]                       0s  libtbb.so.2            tbb::internal::local_wait_for_all(tbb::task&, tbb::task*)
...