
Commit d0cc183

committed
Docs manual for powermetrics
1 parent 02f88f0 commit d0cc183

File tree

1 file changed

+351
-0
lines changed


docs/powermetrics.manual

@@ -0,0 +1,351 @@
powermetrics(1)            General Commands Manual            powermetrics(1)

NAME
     powermetrics

SYNOPSIS
     powermetrics [-i sample_interval_ms] [-r order] [-t wakeup_cost]
                  [-o output_file] [-n sample_count]

DESCRIPTION
     powermetrics gathers and displays CPU usage statistics (divided into
     time spent in user mode and supervisor mode), timer and interrupt
     wakeup frequency (total and, for near-idle workloads, those that
     resulted in package idle exits), and, on supported platforms, interrupt
     frequencies (categorized by CPU number), package C-state statistics (an
     indication of the time the core complex and integrated graphics, if
     any, spent in low-power idle states), and the CPU frequency
     distribution during the sample. The tool may also display estimated
     power consumed by various SoC subsystems, such as the CPU, GPU, and ANE
     (Apple Neural Engine). Note: average power values reported by
     powermetrics are estimates and may be inaccurate; they should not be
     used to compare devices, but can be used to help optimize apps for
     energy efficiency.

     -h, --help
             Print help message.

     -s samplers, --samplers samplers
             Comma-separated list of samplers and sampler groups. Run with
             -h to see a list of samplers and sampler groups. Specifying
             "default" will display the default set, and specifying "all"
             will display all supported samplers.

     -o file, --output-file file
             Output to file instead of stdout.

     -b size, --buffer-size size
             Set output buffer size (0=none, 1=line).

     -i N, --sample-rate N
             Sample every N ms (0=disabled) [default: 5000ms].

     -n N, --sample-count N
             Obtain N periodic samples (0=infinite) [default: 0].

     -t N, --wakeup-cost N
             Assume package idle wakeups have a CPU time cost of N us when
             using hybrid sort orders that combine idle wakeups with
             time-based metrics.
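
     The manual does not publish the exact hybrid weighting; the following
     is a minimal sketch of the idea only, with a hypothetical process list
     and a hypothetical -t value, in which each package idle wakeup is
     billed a fixed CPU-time cost before sorting:

```python
# Illustrative only: powermetrics does not document its exact hybrid
# formula. This sketch bills each package idle wakeup a fixed CPU-time
# cost (the -t/--wakeup-cost value, in microseconds) and sorts on the sum.

WAKEUP_COST_US = 200  # hypothetical -t value

processes = [
    {"name": "WindowServer", "cputime_us": 900_000, "wakeups": 50},
    {"name": "backupd",      "cputime_us": 100_000, "wakeups": 9_000},
]

def hybrid_key(p):
    # CPU time actually measured, plus an estimated cost per idle wakeup
    return p["cputime_us"] + p["wakeups"] * WAKEUP_COST_US

ranked = sorted(processes, key=hybrid_key, reverse=True)
print([p["name"] for p in ranked])  # wakeup-heavy process ranks first
```

     With a large enough per-wakeup cost, a process that is nearly idle in
     CPU time but wakes the package frequently outranks a busier process.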

     -r method, --order method
             Order the process list using the specified method [default:
             composite]

             [pid]        process identifier
             [wakeups]    total package idle wakeups (alias: -W)
             [cputime]    total CPU time used (alias: -C)
             [composite]  energy number, see --show-process-energy
                          (alias: -O)

     -f format, --format format
             Display data in the specified format [default: text]

             [text]   human-readable text output
             [plist]  machine-readable property list, NUL-separated
70+
--aa _N, ----ppoowweerraavvgg _N
71+
Display poweravg every _N samples (0=disabled) [default: 10]
72+
73+
----hhiiddee--ccppuu--dduuttyy--ccyyccllee
74+
Hide CPU duty cycle data
75+
76+
----sshhooww--iinniittiiaall--uussaaggee
77+
Print initial sample for entire uptime
78+
79+
----sshhooww--uussaaggee--ssuummmmaarryy
80+
Print final usage summary when exiting
81+
82+
----sshhooww--ppssttaatteess
83+
Show pstate distribution. Only available on certain hardware.
84+
85+
----sshhooww--pplliimmiittss
86+
Show plimits, forced idle and RMBS. Only available on certain
87+
hardware.
88+
89+
----sshhooww--ccppuu--qqooss
90+
Show per cpu QOS breakdowns.
91+
92+
----sshhooww--pprroocceessss--ccooaalliittiioonn
93+
Group processes by coalitions and show per coalition information.
94+
Processes that have exited during the sample will still have
95+
their time billed to the coalition, making this useful for
96+
disambiguating DEAD_TASK time.
97+
98+
----sshhooww--rreessppoonnssiibbllee--ppiidd
99+
Show responsible pid for xpc services and parent pid
100+
101+
----sshhooww--pprroocceessss--wwaaiitt--ttiimmeess
102+
Show per-process sfi wait time info
103+
104+
----sshhooww--pprroocceessss--qqooss--ttiieerrss
105+
Show per-process qos latency and throughput tier
106+
107+
----sshhooww--pprroocceessss--iioo
108+
Show per-process io information
109+
110+
----sshhooww--pprroocceessss--ggppuu
111+
Show per-process gpu time. This is only available on certain
112+
hardware.
113+
114+
----sshhooww--pprroocceessss--nneettssttaattss
115+
Show per-process network information
116+
117+
----sshhooww--pprroocceessss--qqooss
118+
Show QOS times aggregated by process. Per thread information is
119+
not available.
120+
121+
----sshhooww--pprroocceessss--eenneerrggyy
122+
Show per-process energy impact number. This number is a rough
123+
proxy for the total energy the process uses, including CPU, GPU,
124+
disk io and networking. The weighting of each is platform
125+
specific. Enabling this implicitly enables sampling of all the
126+
above per-process statistics.
127+
128+
----sshhooww--pprroocceessss--ssaammpp--nnoorrmm
129+
Show CPU time normailzed by the sample window, rather than the
130+
process start time. For example a process that launched 1 second
131+
before the end of a 5 second sample window and ran continuously
132+
until the end of the window will show up as 200 ms/s here and
133+
1000 ms/s in the regular column.
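
     The arithmetic in that example can be checked directly:

```python
# Reproducing the --show-process-samp-norm example: a process launches 1 s
# before the end of a 5 s sample window and runs continuously until the
# window closes.
window_ms = 5000   # sample window length
alive_ms = 1000    # time since the process launched
cpu_ms = 1000      # CPU time it consumed (ran continuously)

# Normalized by the sample window (--show-process-samp-norm):
samp_norm = cpu_ms / (window_ms / 1000)   # ms of CPU per second of window
# Normalized by the process lifetime (the regular column):
proc_norm = cpu_ms / (alive_ms / 1000)    # ms of CPU per second alive

print(samp_norm, proc_norm)  # 200.0 vs 1000.0 ms/s
```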
134+
135+
----sshhooww--pprroocceessss--iippcc
136+
Show per-process Instructions and cycles on ARM machines. Use
137+
with --show-process-amp to show cluster stats.
138+
139+
----sshhooww--aallll
140+
Enables all samplers and displays all the available information
141+
for each sampler.
142+
143+
This tool also implements special behavior upon receipt of certain
144+
signals to aid with the automated collection of data:
145+
146+
SIGINFO
147+
take an immediate sample
148+
SIGIO
149+
flush any buffered output
150+
SIGINT/SIGTERM/SIGHUP
151+
stop sampling and exit
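
     This signal behavior lends itself to scripted collection. A sketch of
     the control pattern, using a sleeping Python child as a stand-in for
     powermetrics (which requires macOS and, typically, root):

```python
# Sketch of driving a long-running sampler via signals. A sleeping Python
# child stands in for powermetrics here; in real use you would Popen
# something like ["sudo", "powermetrics", "-i", "1000"] and send SIGINFO
# (immediate sample) or SIGIO (flush buffers) while it runs.
import signal
import subprocess
import sys
import time

child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"]
)
time.sleep(0.5)                    # let the "sampler" run briefly
child.send_signal(signal.SIGINT)   # SIGINT/SIGTERM/SIGHUP: stop and exit
child.wait(timeout=10)
print("exit code:", child.returncode)
```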

OUTPUT
   Guidelines for energy reduction

     CPU time, deadlines and interrupt wakeups: lower is better

     Interrupt counts: lower is better

     C-state residency: higher is better

   Running Tasks

     1. CPU time consumed by threads assigned to that process, broken down
     into time spent in user space and kernel mode.

     2. Counts of "short" timers (where the time-to-deadline was < 5
     milliseconds in the future at the point of timer creation) which woke
     up threads from that process. High-frequency timers, which typically
     have short time-to-deadlines, can result in significant energy
     consumption.

     3. A count of total interrupt-level wakeups which resulted in
     dispatching a thread from the process in question. For example, if a
     thread were blocked in a usleep() system call, a timer interrupt would
     cause that thread to be dispatched, and would increment this counter.
     For workloads with a significant idle component, this metric is useful
     to study in conjunction with the package idle exit metric reported
     below.

     4. A count of "package idle exits" induced by timers/device interrupts
     which awakened threads from the process in question. This is a subset
     of the interrupt wakeup count. Timers and other interrupts that
     trigger "package idle exits" have a greater impact on energy
     consumption than other interrupts. With the exception of some Mac Pro
     systems, Mac and iOS systems are typically single-package systems,
     wherein all CPUs are part of a single processor complex (typically a
     single IC die) with shared logic that can include (depending on system
     specifics) shared last-level caches, an integrated memory controller,
     etc. When all CPUs in the package are idle, the hardware can
     power-gate significant portions of the shared logic in addition to
     each individual processor's logic, as well as take measures such as
     placing DRAM into self-refresh (also referred to as auto-refresh),
     placing interconnects into lower-power states, etc. Hence a timer or
     interrupt that triggers an exit from this package idle state results
     in a greater increase in power than a timer that fires when the CPU in
     question is already executing. The process initiating a package idle
     wakeup may also be the "prime mover", i.e. it may be the trigger for
     further activity in its own or other processes. This metric is most
     useful when the system is relatively idle, as with typical light
     workloads such as web browsing and movie playback; with heavier
     workloads, CPU activity can be high enough that package idle entry is
     relatively rare, masking package idle exits due to the process/thread
     in question.

     5. If any processes arrived and vanished during the inter-sample
     interval, or a previously sampled process vanished, their statistics
     are reflected in the row labeled "DEAD_TASKS". This can identify
     issues involving transient processes which may be spawned too
     frequently. dtrace ("execsnoop") or other tools can then be used to
     identify the transient processes in question. Running powermetrics in
     coalition mode (see below) will also help track down transient process
     issues, by billing the coalition to which the process belongs.

   Interrupt Distribution

     The interrupts sampler reports interrupt frequencies, classified by
     interrupt vector and associated device, on a per-CPU basis. Mac OS
     currently assigns all device interrupts to CPU0, but timers and
     interprocessor interrupts can occur on other CPUs. Interrupt
     frequencies can be useful in identifying misconfigured devices or
     areas of improvement in interrupt load, and can serve as a proxy for
     identifying device activity across the sample interval. For example,
     during a network-heavy workload, an increase in interrupts associated
     with AirPort wireless ("ARPT") or wired ethernet ("ETH0", "ETH1",
     etc.) is not unexpected. However, if the interrupt frequency for a
     given device is non-zero when the device is not active (e.g. if "HDAU"
     interrupts, for High Definition Audio, occur even when no audio is
     playing), that may indicate a driver error. The int_sources sampler
     attributes interrupts to the responsible InterruptEventSources, which
     helps disambiguate the cause of an interrupt if the vector serves more
     than one source.

   Battery Statistics

     The battery sampler reports battery discharge rates, current and
     maximum charge levels, cycle counts and degradation from design
     capacity across the interval in question, if a delta was reported by
     the battery management unit. Note that the battery controller data may
     arrive out of phase with respect to powermetrics samples, which can
     cause aliasing issues across short sample intervals. Discharge rates
     across discontinuities such as sleep/wake may also be inaccurate on
     some systems; however, the rate of change of the total charge level
     across longer intervals is a useful indicator of total system load.
     powermetrics does not filter discharge rates for A/C
     connect/disconnect events, system sleep residency, etc. Battery
     discharge rates are typically not comparable across machine models.

   Processor Energy Usage

     The cpu_power sampler reports data derived from the Intel energy
     models; as of the Sandy Bridge microarchitecture, the Intel power
     control unit internally maintains an energy consumption model whose
     details are proprietary, but which is likely based on duty cycles for
     individual execution units, current voltage/frequency, etc. These
     numbers are not strictly accurate but are correlated with actual
     energy consumption. This section lists: power dissipated by the
     processor package, which includes the CPU cores, the integrated GPU
     and the system agent (integrated memory controller, last-level cache),
     and separately, CPU core power and GT (integrated GPU) power (the
     latter two in a forthcoming version). The energy model data is
     generally not comparable across machine models.

     The cpu_power sampler next reports, on processors with Nehalem and
     newer microarchitectures, hardware-derived processor frequency and
     idle residency information, labeled "P-states" and "C-states"
     respectively in Intel terminology.

     C-states are further classified into "package C-states" and per-core
     C-states. The processor enters a C-state in the scheduler's idle loop,
     which results in clock-gating or power-gating the CPU core and,
     potentially, package logic, considerably reducing power dissipation.
     High package C-state residency is a goal to strive for, as energy
     consumption of the CPU complex, integrated memory controller (if any)
     and DRAM is significantly reduced when in a package C-state. Package
     C-states occur when all CPU cores within the package are idle and the
     on-die integrated GPU, if any (Sandy Bridge mobile and beyond), is
     also idle. powermetrics reports package C-state residency as a
     fraction of the time sampled. This is available on Nehalem and newer
     microarchitectures. Note that some systems, such as Mac Pros, do not
     enable package C-states.

     powermetrics also reports per-core C-state residencies, signifying
     when the core in question (which can include multiple SMTs or
     "hyperthreads") is idle, as well as active/inactive duty cycle
     histograms for each logical processor within the core. This is
     available on Nehalem and newer microarchitectures.

     This section also lists the average clock frequency at which the given
     logical processor executed when not idle within the sampled interval,
     expressed both as an absolute frequency in MHz and as a percentage of
     the nominal rated frequency. These average frequencies can vary due to
     the operating system's demand-based dynamic voltage and frequency
     scaling. Some systems can execute at frequencies greater than the
     nominal or "P1" frequency, which is termed "turbo mode" on Intel
     systems. Such operation will manifest as > 100% of nominal frequency.
     Lengthy execution in turbo mode is typically energy-inefficient, as
     those frequencies have high voltage requirements, resulting in a
     roughly quadratic increase in power that is not outweighed by the
     reduction in execution time. Current systems typically have a single
     voltage/frequency domain per package, but as the processors can
     execute out of phase, they may display different average execution
     frequencies.
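
     The percentage-of-nominal figure is straightforward to compute; values
     above 100% indicate time spent in turbo mode (hypothetical numbers):

```python
# Sketch: expressing an average busy frequency as a percentage of the
# nominal (P1) rated frequency, as powermetrics reports it. Both inputs
# here are hypothetical.
nominal_mhz = 2400       # hypothetical rated P1 frequency
avg_busy_mhz = 2760      # hypothetical average frequency while not idle

pct_of_nominal = 100.0 * avg_busy_mhz / nominal_mhz
print(f"{pct_of_nominal:.0f}% of nominal")  # > 100% implies turbo residency
```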

   Disk Usage and Network Activity

     The network and disk samplers report deltas in disk and network
     activity that occurred during the sample. Additionally specifying
     --show-process-netstats and --show-process-io will provide this
     information on a per-process basis in the tasks sampler.

   Backlight Level

     The battery sampler also reports the instantaneous value of the
     backlight luminosity level. This value is likely not comparable across
     systems and machine models, but can be useful when comparing scenarios
     on a given system.

   Devices

     The devices sampler reports, for each device, the time spent in each
     of the device's states over the course of the sample. The meaning of
     the different states is specific to each device. powermetrics denotes
     low-power states with an "L", device-usable states with a "U" and
     power-on states with an "O".

   SMC

     The smc sampler displays information supplied by the System Management
     Controller. On supported platforms, this includes fan speed and
     information from various temperature sensors. These are instantaneous
     values taken at the end of the sample window, and do not necessarily
     reflect the values at other times in the window.

   Thermal

     The thermal sampler displays the current thermal pressure the system
     is under. This is an instantaneous value taken at the end of the
     sample window, and does not necessarily reflect the value at other
     times in the window.

   SFI

     The sfi sampler shows system-wide selective forced idle statistics.
     Selective forced idle is a mechanism the operating system uses to
     limit system power while minimizing user impact, by throttling certain
     threads on the system. Each thread belongs to an SFI class, and this
     sampler displays how much each SFI class is currently being throttled
     (or nothing if no class is throttled). These are instantaneous values
     taken at the end of the sample window, and do not necessarily reflect
     the values at other times in the window. To get SFI wait time
     statistics on a per-process basis, use --show-process-wait-times.

KNOWN ISSUES
     Changes in system time and sleep/wake can cause minor inaccuracies in
     reported CPU time.

Darwin                            May 1, 2012                         Darwin
