anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
- anton
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.
Well, and the secondary irony that it is mainly cost-added for FMUL,
whereas FADD almost invariably has the necessary support hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some cases.
So, FMAC is a single unit that costs more than both units taken
separately, and with a higher latency.
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in
hardware. The additional hardware cost (or the cost of trapping
and software emulation) has been the only argument against
denormals that I ever encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals
became a low cost addition. {And that has been my point--you seem
to have forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.
It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Well, and the secondary irony that it is mainly cost-added for
FMUL, whereas FADD almost invariably has the necessary support
hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some
cases.
The add stage after the multiplication tree is <essentially> 2× as
wide. FMUL needs a 108-bit 2-input adder
FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
The multiplication tree is the same, normalizer is larger.
So, FMAC is a single unit that costs more than both units taken separately, and with a higher latency.
Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).
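A rough arithmetic check of where widths of this order come from, as a sketch only. The helper names `product_width` and `fma_window_width` are hypothetical, and the "addend may sit up to p plus a couple of guard bits to the left of the product" accounting is an assumed textbook-style model, not something stated in the post. The count gives 106 and 161 bits for binary64, close to the 108 and 160 quoted above; the difference presumably comes down to how guard bits are counted and how the upper part is split between adder and incrementor.

```python
def product_width(p: int) -> int:
    """Bits needed for the exact product of two p-bit significands."""
    largest = (1 << p) - 1          # all-ones p-bit significand
    return (largest * largest).bit_length()


def fma_window_width(p: int, guard: int = 2) -> int:
    """Assumed model: the addend can be aligned up to p + guard bits
    to the left of the 2p-bit product before the product degenerates
    into a sticky bit, so the add window spans p + guard + 2p bits."""
    return p + guard + product_width(p)


for name, p in (("binary32", 24), ("binary64", 53)):
    print(name, product_width(p), fma_window_width(p))
```

Under this model the FMAC add window is about three significands wide, against two for a plain FMUL, which is consistent with the larger adder and normalizer described above.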
Quadi, have your computer architectures included IBM 360 floating point support? There is probably more demand for that than for 36-bit these
days.
On Thu, 19 Feb 2026 17:30:50 GMT
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in
hardware. The additional hardware cost (or the cost of trapping
and software emulation) has been the only argument against
denormals that I ever encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals
became a low cost addition. {And that has been my point--you seem
to have forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.
It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Well, and the secondary irony that it is mainly cost-added for
FMUL, whereas FADD almost invariably has the necessary support
hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some
cases.
The add stage after the multiplication tree is <essentially> 2× as
wide. FMUL needs a 108-bit 2-input adder
FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
The multiplication tree is the same, normalizer is larger.
So, FMAC is a single unit that costs more than both units taken separately, and with a higher latency.
Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).
Arm Inc. application processor cores have FMAC latency=4 for the
multiplicands, but 2 for the accumulator.
Maybe we should switch to 18-bit bytes to support UNICODE.
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.
It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Well, and the secondary irony that it is mainly cost-added for FMUL,
whereas FADD almost invariably has the necessary support hardware already.
But:
FMUL is expensive operation + cheap normalizer (if no denormals);
FADD is cheap operation with expensive normalizer.
FMAC then is gluing the costs of the two units together, but:
With roughly the latency of both;
The need to be significantly wider internally to deal with some cases.
The add stage after the multiplication tree is <essentially> 2× as wide. FMUL needs a 108-bit 2-input adder
FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
The multiplication tree is the same, normalizer is larger.
So, FMAC is a single unit that costs more than both units taken
separately, and with a higher latency.
Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).
On 2/19/2026 11:30 AM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.
It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Likely depends.
Can use the trick of bumping to the next size up and use that for computation.
So, for Binary32 compute it as Binary64, and for Binary64 compute it as Binary128.
Neither of those work!
BGB wrote:
On 2/19/2026 11:30 AM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 2/12/2026 11:09 AM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
{{One can STILL argue whether
deNormals were a plus or a minus in IEEE}}
I am surprised to read that from you, who has always written that
denormals can be implemented cheaply and efficiently in hardware. The
additional hardware cost (or the cost of trapping and software
emulation) has been the only argument against denormals that I ever
encountered.
It is only after IEEE 754-2008 came with FMAC that deNormals became
a low cost addition. {And that has been my point--you seem to have
forgotten the -2008 part or the argument}
And, can note, this is assuming that one actually pays the cost of
native hardware FMAC.
It is exceedingly difficult to get an IEEE quality rounded result if
not done in HW.
Likely depends.
Can use the trick of bumping to the next size up and use that for
computation.
So, for Binary32 compute it as Binary64, and for Binary64 compute it
as Binary128.
Neither of those work!
I believed this to be true but I was shown the error of my thinking by
more knowledgeable people in the 754 working group. I.e. they had a very
simple/small example where doing the calculation in the next higher
precision would still cause double rounding errors.
Also note that Mitch has stated multiple times that you need ~160
mantissa bits during FMAC double calculations.
Terje
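A minimal sketch of the double-rounding problem for the Binary32 case, assuming the emulation is done as an ordinary Binary64 multiply and add followed by a conversion to Binary32. The operand values are my own illustration (chosen so that a*b + c lands just above a Binary32 halfway point), not the working group's example, and `to_binary32` and `round_to_bits` are hypothetical helpers written for this sketch.

```python
import struct
from fractions import Fraction


def to_binary32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_bits(q: Fraction, p: int) -> Fraction:
    """Round a positive rational to a p-bit significand, ties to even.
    Exponent range is ignored (no overflow or denormal handling)."""
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if Fraction(2) ** e > q:
        e -= 1                      # now 2**e <= q < 2**(e + 1)
    ulp = Fraction(2) ** (e - p + 1)
    return round(q / ulp) * ulp     # round() on a Fraction is ties-to-even


# Illustrative operands, all exactly representable in binary32.
# a * b == 2**-24 + 2**-60, because 4097 * 16773121 == 2**36 + 1.
a = 4097.0 * 2.0 ** -30
b = 16773121.0 * 2.0 ** -30
c = 1.0
assert all(to_binary32(v) == v for v in (a, b, c))

# Emulated FMA: the product is exact in binary64, but the add rounds to
# 53 bits and drops the 2**-60 term, leaving an exact binary32 tie.
naive = to_binary32(a * b + c)

# Reference: exact rational result rounded once to 24 bits.
exact = Fraction(a) * Fraction(b) + Fraction(c)
correct = float(round_to_bits(exact, 24))

print(naive)             # 1.0
print(correct)
print(naive == correct)  # False
```

The two results differ by one Binary32 ulp: the first rounding turns a value slightly above the halfway point into an exact tie, and the second rounding then resolves the tie the wrong way. The Binary64-via-Binary128 case has the same shape with wider significands.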
The add stage after the multiplication tree is <essentially> 2× as wide. FMUL needs a 108-bit 2-input adder
FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
The multiplication tree is the same, normalizer is larger.