Scott Lurndal wrote:
"Paul A. Clayton" <paaronclayton@gmail.com> writes:
On 3/24/24 4:39 PM, Scott Lurndal wrote:
There is a significant demand for performance monitoring. Note
that in addition to standard performance monitoring registers,
AArch64 also (optionally) supports statistical profiling and
out-of-band instruction tracing (ETF). The demand from users
is such that all those features are present in most designs.
My 66000 Architecture defines 8 performance counters at each layer of
the design: cores get 8 counters, L1s get 8 counters, L3s get 8
counters, Interconnect gets 8 counters, Memory Controller gets 8 counters,
PCIe root gets 8 counters--and every instance multiplies the counters.
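To make the "every instance multiplies the counters" point concrete, here is a minimal sketch of the arithmetic. Only the figure of 8 counters per instance comes from the post; the instance counts in `example_soc` are hypothetical and chosen purely for illustration.

```python
# Only "8 counters per instance" is taken from the post above.
COUNTERS_PER_INSTANCE = 8


def total_counters(instances_per_layer):
    """Return (per-layer counter totals, grand total) for a layer -> instance-count map."""
    per_layer = {
        layer: count * COUNTERS_PER_INSTANCE
        for layer, count in instances_per_layer.items()
    }
    return per_layer, sum(per_layer.values())


# Hypothetical chip configuration (assumed numbers, not from the thread).
example_soc = {
    "core": 16,
    "L1": 32,
    "L3": 4,
    "interconnect": 1,
    "memory controller": 4,
    "PCIe root": 2,
}

per_layer, total = total_counters(example_soc)
for layer, count in per_layer.items():
    print(f"{layer:18s} {count:4d}")
print(f"{'total':18s} {total:4d}")
```

Even a modest configuration like this one ends up with several hundred counters, which is why the per-instance multiplication matters.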
scott@slp53.sl.home (Scott Lurndal) writes:
There is a significant demand for performance monitoring. Note
that in addition to standard performance monitoring registers,
AArch64 also (optionally) supports statistical profiling and
out-of-band instruction tracing (ETF). The demand from users
is such that all those features are present in most designs.
Interesting. I would have expected that the likes of me are few and
far between, and easy to ignore for a big company like ARM, Intel or AMD.
My theory was that the CPU manufacturers put performance monitoring
counters in CPUs in order to understand the performance of real-world programs themselves, and how they should tweak the successor core to
relieve it of bottlenecks.
- anton
In article <2024Mar25.193535@mips.complang.tuwien.ac.at>,
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
There is a significant demand for performance monitoring. Note
that in addition to standard performance monitoring registers,
AArch64 also (optionally) supports statistical profiling and
out-of-band instruction tracing (ETF). The demand from users
is such that all those features are present in most designs.
Interesting. I would have expected that the likes of me are few and
far between, and easy to ignore for a big company like ARM, Intel
or AMD.
The question is if "users" to ARM Holdings are actual end-users, or the
SoC manufacturers who build chips incorporating AArch64 cores. I'd expect
most of the latter to want those features so that they can understand the
performance of their silicon better.
Anton Ertl wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
There is a significant demand for performance monitoring. Note
that in addition to standard performance monitoring registers,
AArch64 also (optionally) supports statistical profiling and
out-of-band instruction tracing (ETF). The demand from users
is such that all those features are present in most designs.
Interesting. I would have expected that the likes of me are few and
far between, and easy to ignore for a big company like ARM, Intel or AMD.
My theory was that the CPU manufacturers put performance monitoring
counters in CPUs in order to understand the performance of real-world
programs themselves, and how they should tweak the successor core to
relieve it of bottlenecks.
Having reverse engineered the original Pentium EMON counters I got a
meeting with Intel about their next cpu (the PentiumPro), what I was
told about the Pentium was that this chip was the first one which was
too complicated to create/sell an In-Circuit Emulator (ICE) version, so
instead they added a bunch of counters for near-zero overhead monitoring
and depended on a bit-serial read-out when they needed to dump all state
for debugging. (I have forgotten the proper term for that interface! :-( )
The question is if "users" to ARM Holdings are actual end-users, or the
SoC manufacturers who build chips incorporating AArch64 cores. I'd expect
most of the latter to want those features so that they can understand the
performance of their silicon better.
The biggest demand is from the OS vendors. Hardware folks have
simulation and emulators.
Look at vtune, for example.
Terje Mathisen <terje.mathisen@tmsw.no> writes:
Having reverse engineered the original Pentium EMON counters I got a
meeting with Intel about their next cpu (the PentiumPro), what I was
told about the Pentium was that this chip was the first one which was
too complicated to create/sell an In-Circuit Emulator (ICE) version, so
instead they added a bunch of counters for near-zero overhead monitoring
and depended on a bit-serial read-out when they needed to dump all state
for debugging. (I have forgotten the proper term for that interface! :-( )
Scan chains. The modern interface to scan chains (which we used on the
mainframes in the late '70s/early '80s) is JTAG.
scott@slp53.sl.home (Scott Lurndal) writes:
The biggest demand is from the OS vendors. Hardware folks have
simulation and emulators.
You don't want to use a full-blown microarchitectural emulator for a
long-running program.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
scott@slp53.sl.home (Scott Lurndal) writes:
The biggest demand is from the OS vendors. Hardware folks have
simulation and emulators.
You don't want to use a full-blown microarchitectural emulator for a
long-running program.
Generally hardware folks don't run 'long-running programs' when
analyzing performance, they use the emulator for determining latencies,
bandwidths and efficacy of cache coherency algorithms and
cache prefetchers.
Their target is not application analysis.
scott@slp53.sl.home (Scott Lurndal) writes:
Their target is not application analysis.
This sounds like hardware folks that are only concerned with
memory-bound programs.
scott@slp53.sl.home (Scott Lurndal) writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
scott@slp53.sl.home (Scott Lurndal) writes:
The biggest demand is from the OS vendors. Hardware folks have
simulation and emulators.
You don't want to use a full-blown microarchitectural emulator for a
long-running program.
Generally hardware folks don't run 'long-running programs' when
analyzing performance, they use the emulator for determining latencies,
bandwidths and efficacy of cache coherency algorithms and
cache prefetchers.
Their target is not application analysis.
This sounds like hardware folks that are only concerned with
memory-bound programs.
I OTOH expect that designers of out-of-order (and in-order) cores
analyse the performance of various programs to find out where the
bottlenecks of their microarchitectures are in benchmarks and
applications that people look at to determine which CPU to buy. And
that's why we not only just have PMCs for memory accesses, but also
for branch prediction accuracy, functional unit utilization, scheduler utilization, etc.
- anton
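
As a rough illustration of how core-side counters such as the branch prediction PMCs mentioned above are usually turned into something readable, the sketch below derives a few common ratios from raw counts. The field names and the sample numbers are assumptions made up for this example; they are not tied to any particular CPU's event list or to any profiling tool's output.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw counts read over one measurement interval (hypothetical field names)."""
    cycles: int
    instructions: int
    branches: int
    branch_misses: int


def ratio(numerator, denominator):
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(sample):
    """Turn raw counts into the ratios people typically look at."""
    return {
        # Instructions retired per cycle.
        "ipc": ratio(sample.instructions, sample.cycles),
        # Fraction of branches that were mispredicted.
        "branch_miss_rate": ratio(sample.branch_misses, sample.branches),
        # Branch mispredictions per thousand instructions.
        "branch_mpki": 1000 * ratio(sample.branch_misses, sample.instructions),
    }


# Made-up counts, for illustration only.
sample = CounterSample(
    cycles=2_000_000,
    instructions=3_000_000,
    branches=600_000,
    branch_misses=12_000,
)
metrics = derived_metrics(sample)
for name, value in metrics.items():
    print(f"{name:18s} {value:.3f}")
```

A low IPC together with a high misprediction rate would point at the branch predictor rather than the memory system, which is the kind of distinction the non-memory counters make possible.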