The latest C2x draft N2596 says for isnormal:
The isnormal macro determines whether its argument value is normal
(neither zero, subnormal, infinite, nor NaN). [...]
This wording is the same as in C99. But there may be values that are
neither normal, zero, subnormal, infinite, nor NaN, e.g. for long double
on PowerPC, where
the double-double format is used. This is allowed by 5.2.4.2.2p4:
"and values that are not floating-point numbers, such as infinities
and NaNs" ("such as", not limited to). Note that these additional
values may be in the normal range, or outside (with an absolute value
less than the minimum positive normal number or greater than the
maximum normal number).
What should the behavior be for these values, in particular when
they are in the normal range, i.e. with an absolute value between
the minimum positive normal number and the maximum normal number?
7.12p12, describing the number classification macros, allows that
"Additional implementation-defined floating-point classifications, with
macro definitions beginning with FP_ and an uppercase letter, may also
be specified by the implementation."
7.12.3.1p2 says
"The fpclassify macro classifies its argument value as NaN, infinite,
normal, subnormal, zero, or into another implementation-defined category."
7.12.3.5p3 says
"The isnormal macro returns a nonzero value if and only if its argument
has a normal value."
Since the standard explicitly allows that there may be other implementation-defined categories, 7.12.3.5p2 and 7.12.3.5p3 conflict on
an implementation where other categories are supported. I would
recommend that this discrepancy be resolved in favor of 7.12.3.5p3.
In article <sdripc$n9b$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
7.12p12, describing the number classification macros, allows that
"Additional implementation-defined floating-point classifications, with
macro definitions beginning with FP_ and an uppercase letter, may also
be specified by the implementation."
7.12.3.1p2 says
"The fpclassify macro classifies its argument value as NaN, infinite,
normal, subnormal, zero, or into another implementation-defined category."
7.12.3.5p3 says
"The isnormal macro returns a nonzero value if and only if its argument
has a normal value."
Since the standard explicitly allows that there may be other
implementation-defined categories, 7.12.3.5p2 and 7.12.3.5p3 conflict on
an implementation where other categories are supported. I would
recommend that this discrepancy be resolved in favor of 7.12.3.5p3.
Now I'm wondering about the practical consequences. The fact that there
may exist non-FP numbers between the minimum positive normal number
and the maximum one may have been overlooked by everyone, and users
might use isnormal() to check whether the value is finite and larger
than the minimum positive normal number in absolute value.
GCC assumes the following definition in gcc/builtins.c:
    /* isnormal(x) -> isgreaterequal(fabs(x),DBL_MIN) &
                      islessequal(fabs(x),DBL_MAX).  */
which is probably what the users expect, and a more useful definition
in practice. Thoughts?
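Spelled out as C, that fallback could look like the sketch below (my own
rendering of the comment, not the actual GCC code; the function name is
made up):

    #include <math.h>
    #include <float.h>

    /* Range-check reading of isnormal(x) for double, as described in the
       gcc/builtins.c comment: finite and DBL_MIN <= fabs(x) <= DBL_MAX.
       isgreaterequal/islessequal return 0 on unordered operands, so NaN
       yields 0 without raising "invalid". */
    static int isnormal_range_check(double x)
    {
        return isgreaterequal(fabs(x), DBL_MIN)
               && islessequal(fabs(x), DBL_MAX);
    }

For a double-double long double in the normal range but carrying extra
precision, the analogous long double check returns nonzero, which matches
this "range" reading of isnormal.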
I think it would be highly inappropriate for isnormal(x) to produce a
different result than (fpclassify(x) == FP_NORMAL) for any value of x.
I was not familiar with the term "double-double". A check on Wikipedia
led me to <https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic>,
but that page doesn't describe it in sufficient detail to clarify why
there might be a classification problem. Could you give an example of a
value that cannot be classified as either infinite, NaN, normal,
subnormal, or zero? In particular, I'm not sure what you mean by a
"non-FP number".
In article <sdun42$9co$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
[...] In particular, I'm not sure what you mean by a "non-FP number".
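For concreteness, here is the kind of value in question (a sketch of my
own, assuming the PowerPC double-double long double and a classification
that looks only at the high double, as glibc appears to do):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Exactly representable as the pair (1.0, 2^-200): the two halves
           need not be adjacent, so the value carries about 201 significand
           bits, far more than LDBL_MANT_DIG = 106. */
        long double x = 1.0L + 0x1p-200L;

        printf("%d %d\n", fpclassify(x) == FP_NORMAL, isnormal(x) != 0);
        /* Expected to print "1 1" on such an implementation. */
        return 0;
    }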
So x is a number with more than the 106-bit precision of normal
numbers, and both fpclassify(x) and isnormal(x) regard it as a
"normal number", which should actually be interpreted as being
in the range of the normal numbers.
Note: LDBL_MANT_DIG = 106 because with this format, there are
107-bit numbers that cannot be represented exactly (near the
overflow threshold).
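To illustrate that note (my own examples, assuming the usual double-double
pairing (hi, lo) with |lo| <= ulp(hi)/2):

    /* Most 107-bit significands are reachable by giving the low half the
       opposite sign: 1 - 2^-107 is the pair (1.0, -0x1p-107) and needs
       107 significand bits. */
    long double ok = 1.0L - 0x1p-107L;

    /* Near the overflow threshold the trick fails: the 107-bit value
       2^1024 - 2^917 would need either hi = 2^1024 (which overflows) or
       hi = DBL_MAX with lo = 2^971 - 2^917, whose significand needs 54
       bits and so is not a double.  Hence LDBL_MANT_DIG is 106, the
       largest p for which every p-bit significand in range fits. */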
In article <20210730150844$4673@zira.vinc17.org>,
Vincent Lefevre <vincent-news@vinc17.net> wrote:
In article <sdun42$9co$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
[...] In particular, I'm not sure what you mean by a "non-FP number".
[...]
So x is a number with more than the 106-bit precision of normal
numbers, and both fpclassify(x) and isnormal(x) regard it as a
"normal number", which should actually be interpreted as being
in the range of the normal numbers.
Note: LDBL_MANT_DIG = 106 because with this format, there are
107-bit numbers that cannot be represented exactly (near the
overflow threshold).
Note about this point: in the ISO C model defined in 5.2.4.2.2,
the sum goes from k = 1 to p. Thus this model does not allow
floating-point numbers to have more than a p-digit precision,
where p = LDBL_MANT_DIG for long double (see the *_MANT_DIG
definitions). ... Increasing the value of p here would mean that
some floating-point numbers would not be exactly representable
(actually many of them), which is forbidden (the text from the
ISO C standard is not very clear, but this seems rather obvious,
otherwise this could invalidate common error analysis).
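For reference, the 5.2.4.2.2p2 model in question is (transcribed into
LaTeX notation):

    x = s \, b^e \sum_{k=1}^{p} f_k \, b^{-k}, \qquad e_{\min} \le e \le e_{\max}

with sign s = +1 or -1, base b = FLT_RADIX, precision p, and digits f_k
that are nonnegative integers less than b.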
So x is a number with more than the 106-bit precision of normal
numbers, and both fpclassify(x) and isnormal(x) regard it as a
"normal number", which should actually be interpreted as being
in the range of the normal numbers.
The standard frequently applies the adjective "representable" to
floating point numbers - that would be redundant if all floating point numbers were required to be representable. I think the format you
describe should be considered to have p (and therefore, LDBL_MANT_DIG)
large enough to include all representable numbers, even if that would
mean that not all floating point numbers are representable.
"In addition to normalized floating-point numbers ( f_1 > 0 if x ≠ 0), floating types may be able to contain other kinds of floating-point
numbers, such as subnormal floating-point numbers (x ≠ 0, e = e_min ,
f_1 = 0) and unnormalized floating-point numbers (x ≠ 0, e > e_min , f_1
= 0), and values that are not floating-point numbers, such as infinities
and NaNs." (5.2.4.2.2p3)
(I've use underscores to indicate subscripts in the original text).
The phrases "subnormal floating point numbers" and "unnormalized
floating -point numbers" are italicized, an ISO convention indicating
that the containing sentence is the official definition of those terms.
Oddly enough, "normalized floating-point numbers" is not italicized,
despite being followed by a similar description. Normalized, subnormal,
and unnormalized floating point numbers are all defined/described in
terms of whether f_1, the leading base-b digit, is zero. The lower order base-b digits have no role to play any of those
definitions/descriptions. It doesn't matter how many of those other
digits there are, and it therefore shouldn't matter if that number is variable.
Therefore, I would guess that a double-double value of the form a+b,
where fabs(a) > fabs(b), should be classified as normal iff a is normal,
and as subnormal iff a is subnormal - which would fit the behavior you describe for the implementation you were using.
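That guess can be written out as a short sketch (my own code, not from
any implementation; it assumes that converting to double recovers the
high half, which holds when |b| <= ulp(a)/2):

    #include <math.h>

    /* Classify a double-double long double by its high half. */
    static int dd_fpclassify(long double x)
    {
        return fpclassify((double) x);
    }

    static int dd_isnormal(long double x)
    {
        return dd_fpclassify(x) == FP_NORMAL;
    }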
On Fri, 30 Jul 2021 15:24:22 UTC, Vincent Lefevre <vincent-news@vinc17.net> wrote:
So x is a number with more than the 106-bit precision of normal
numbers, and both fpclassify(x) and isnormal(x) regard it as a
"normal number", which should actually be interpreted as being
in the range of the normal numbers.
isnormal() is not a test of whether the number is normalized.
Your program's results match the C standard.
In article <se1v53$47u$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
The standard frequently applies the adjective "representable" to
floating point numbers - that would be redundant if all floating point numbers were required to be representable. I think the format you
describe should be considered to have p (and therefore, LDBL_MANT_DIG) large enough to include all representable numbers, even if that would
mean that not all floating point numbers are representable.
I think that this would be rather useless in practice (completely
unusable for error analysis). And what if an implementation chooses to
represent pi exactly (with a special encoding, as part of the
"values that are not floating-point numbers")?
Until now, *_MANT_DIG has always meant that all FP numbers from the
model are representable, AFAIK. That's probably why for long double
on PowerPC (double-double format), LDBL_MANT_DIG is 106 and not 107,
while almost all 107-bit FP numbers are representable (this fails
only near the overflow threshold).
But since f_1 is defined by the formula in 5.2.4.2.2p2, this means that
it is defined only with no more than p digits, where p = LDBL_MANT_DIG
for long double.
Therefore, I would guess that a double-double value of the form a+b,
where fabs(a) > fabs(b), should be classified as normal iff a is normal,
and as subnormal iff a is subnormal - which would fit the behavior you
describe for the implementation you were using.
The standard would still need to extend the definition of f_1 to
a number of digits larger than p (possibly infinite).
In article <se1v53$47u$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
The standard frequently applies the adjective "representable" to
floating point numbers - that would be redundant if all floating point
numbers were required to be representable. I think the format you
describe should be considered to have p (and therefore, LDBL_MANT_DIG)
large enough to include all representable numbers, even if that would
mean that not all floating point numbers are representable.
I think that this would be rather useless in practice (completely
unusable for error analysis). And what if an implementation chooses to
represent pi exactly (with a special encoding, as part of the
"values that are not floating-point numbers")?
Until now, *_MANT_DIG has always meant that all FP numbers from the
model are representable, AFAIK. ...
The phrases "subnormal floating point numbers" and "unnormalized
floating -point numbers" are italicized, an ISO convention indicating
that the containing sentence is the official definition of those terms.
Oddly enough, "normalized floating-point numbers" is not italicized,
despite being followed by a similar description. Normalized, subnormal,
and unnormalized floating point numbers are all defined/described in
terms of whether f_1, the leading base-b digit, is zero. The lower order
base-b digits have no role to play any of those
definitions/descriptions. It doesn't matter how many of those other
digits there are, and it therefore shouldn't matter if that number is
variable.
But since f_1 is defined by the formula in 5.2.4.2.2p2, this means that
it is defined only with no more than p digits, where p = LDBL_MANT_DIG
for long double.
Therefore, I would guess that a double-double value of the form a+b,
where fabs(a) > fabs(b), should be classified as normal iff a is normal,
and as subnormal iff a is subnormal - which would fit the behavior you
describe for the implementation you were using.
The standard would still need to extend the definition of f_1 to
a number of digits larger than p (possibly infinite).
AFAICS double-double is a poor fit to the standard. Your implementation
made a sane choice. From the point of view of standard purity, your
implementation should probably have a conformance statement explaining
what LDBL_MANT_DIG means and that double-double does not fit the
standard model.
On 8/1/21 7:08 AM, Vincent Lefevre wrote:
In article <se1v53$47u$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
The standard frequently applies the adjective "representable" to
floating point numbers - that would be redundant if all floating point
numbers were required to be representable. I think the format you
describe should be considered to have p (and therefore, LDBL_MANT_DIG)
large enough to include all representable numbers, even if that would
mean that not all floating point numbers are representable.
I think that this would be rather useless in practice (completely
unusable for error analysis). And if an implementation chooses to
represent pi exactly (with a special encoding, as part of the
"values that are not floating-point numbers")?
Until now, *_MANT_DIG has always meant that all FP numbers from the
model are representable, AFAIK. ...
When you use a floating-point format that is not a good fit to the
standard's model, you're inherently going to be breaking some
assumptions that are based upon that model - the only question is,
which ones.
Error analysis for double-double will necessarily be quite different
from the error analysis that applies to a format that fits the
standard's model, regardless of what value you choose for LDBL_MANT_DIG.
The consequences of this decision are therefore relatively limited
outside of 5.2.4.2.2. Many standard library functions that take floating point arguments have defined behavior only when passed a floating-point number.
In some cases, the behavior is also defined if they are passed a
NaN or an infinity.
The phrases "subnormal floating point numbers" and "unnormalized
floating -point numbers" are italicized, an ISO convention indicating
that the containing sentence is the official definition of those terms.
Oddly enough, "normalized floating-point numbers" is not italicized,
despite being followed by a similar description. Normalized, subnormal,
and unnormalized floating point numbers are all defined/described in
terms of whether f_1, the leading base-b digit, is zero. The lower order >> base-b digits have no role to play any of those
definitions/descriptions. It doesn't matter how many of those other
digits there are, and it therefore shouldn't matter if that number is
variable.
But since f_1 is defined by the formula in 5.2.4.2.2p2, this means that
it is defined only with no more than p digits, where p = LDBL_MANT_DIG
for long double.
No, f_1 can trivially be identified as the leading base-b digit,
regardless of whether or not there are too many base-b digits for the representation to qualify as a floating-point number.
Therefore, I would guess that a double-double value of the form a+b,
where fabs(a) > fabs(b), should be classified as normal iff a is normal,
and as subnormal iff a is subnormal - which would fit the behavior you
describe for the implementation you were using.
The standard would still need to extend the definition of f_1 to
a number of digits larger than p (possibly infinite).
Why would it be infinite?
In article <se79uh$ejo$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 8/1/21 7:08 AM, Vincent Lefevre wrote:
Until now, *_MANT_DIG has always meant that all FP numbers from the
model are representable, AFAIK. ...
When you use a floating-point format that is not a good fit to the
standard's model, you're inherently going to be breaking some
assumptions that are based upon that model - the only question is,
which ones.
This is FUD. Double-double is a good fit to the standard's model.
There are additional numbers (as allowed by the standard), so that
this won't follow the same behavior as IEEE FP, but so does
contraction of FP expressions.
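The contraction point, as a concrete sketch (standard C pragma; whether
contraction actually happens is implementation-defined):

    /* With FP_CONTRACT ON, a*b + c may be evaluated as a fused
       multiply-add with a single rounding, so the result can be "finer"
       than what two IEEE-rounded operations would give - analogous to
       double-double carrying more precision than the 106-bit model. */
    double muladd(double a, double b, double c)
    {
    #pragma STDC FP_CONTRACT ON
        return a * b + c;
    }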
Error analysis for double-double will necessarily be quite different
from the error analysis that applies to a format that fits the
standard's model, regardless of what value you choose for LDBL_MANT_DIG.
Assuming some error bound in ulp (which must be done whatever the
format), the error analysis will be the same.
The consequences of this decision are therefore relatively limited
outside of 5.2.4.2.2. Many standard library functions that take floating
point arguments have defined behavior only when passed a floating-point
number.
The main ones, like expl, have a defined behavior for other real
values: "The exp functions compute the base-e exponential of x.
A range error occurs if the magnitude of x is too large."
Note that x is *not* required to be a floating-point number.
In some cases, the behavior is also defined if they are passed a
NaN or an infinity.
Only in Annex F, AFAIK.
But since f_1 is defined by the formula in 5.2.4.2.2p2, this means that
it is defined only with no more than p digits, where p = LDBL_MANT_DIG
for long double.
No, f_1 can trivially be identified as the leading base-b digit,
regardless of whether or not there are too many base-b digits for the
representation to qualify as a floating-point number.
It can, but this is not what the standard says. This can be solved
if the p above the sum symbol in the formula is replaced by infinity.
Therefore, I would guess that a double-double value of the form a+b,
where fabs(a) > fabs(b), should be classified as normal iff a is normal,
and as subnormal iff a is subnormal - which would fit the behavior you
describe for the implementation you were using.
The standard would still need to extend the definition of f_1 to
a number of digits larger than p (possibly infinite).
Why would it be infinite?
So that the definition remains valid with a format that chooses to
make pi exactly representable, for instance.
On 8/2/21 3:05 PM, Vincent Lefevre wrote:
In article <se79uh$ejo$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 8/1/21 7:08 AM, Vincent Lefevre wrote:
Until now, *_MANT_DIG has always meant that all FP numbers from the
model are representable, AFAIK. ...
When you use a floating-point format that is not a good fit to the
standard's model, you're inherently going to be breaking some
assumptions that are based upon that model - the only question is,
which ones.
This is FUD. Double-double is a good fit to the standard's model.
There are additional numbers (as allowed by the standard), so that
this won't follow the same behavior as IEEE FP, but so does
contraction of FP expressions.
We're clearly using "good fit" in different senses. I would consider a floating point format to be a "perfect fit" to the C standard's model if every number representable in that format qualifies as a floating-point number, and every number that qualifies as a floating point number is representable in that format. "Goodness of fit" would depend upon how
closely any given format approaches that ideal. There's no single value
of the model parameters that makes both of those statements even come
close to being true for double-double format.
[...]
Error analysis for double-double will necessarily be quite different
from the error analysis that applies to a format that fits the
standard's model, regardless of what value you choose for LDBL_MANT_DIG.
Assuming some error bound in ulp (which must be done whatever the
format), the error analysis will be the same.
Given a number in double-double format represented by a+b, where
fabs(a) > fabs(b) && fabs(b) > 0, with x being the largest power of
FLT_RADIX that is no larger than b, 1 ulp will be DBL_EPSILON*x.
That expression will vary over a very large dynamic range while the
number being represented by a+b changes only negligibly.
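Written as code, that is (my own rendering of the sentence above;
ilogb(b) gives the exponent of the largest power of FLT_RADIX not
exceeding fabs(b), and b is assumed finite and nonzero, with
FLT_RADIX == 2):

    #include <math.h>
    #include <float.h>

    /* 1 ulp of the low half b, in the sense described above. */
    static double ulp_of_low_half(double b)
    {
        return DBL_EPSILON * ldexp(1.0, ilogb(b));
    }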
The consequences of this decision are therefore relatively limited
outside of 5.2.4.2.2. Many standard library functions that take floating
point arguments have defined behavior only when passed a floating-point
number.
The main ones, like expl, have a defined behavior for other real
values: "The exp functions compute the base-e exponential of x.
A range error occurs if the magnitude of x is too large."
Note that x is *not* required to be a floating-point number.
The frexp(), ldexp(), and fabs() functions have specified behavior only
when the double argument qualifies as a floating-point number.
The printf() family of functions has specified behavior only for
floating-point numbers, NaNs and infinities, when using f, F, e, E, g,
G, a, or A formats.
The ceil() and floor() functions have a return value which is required
to be a floating-point number.
The scanf() family of functions (when using f, F, e, E, g, G, a, or A
formats) and strtod() are required to return floating-point numbers,
NaNs or infinities.
And, of course, this also applies to the float, long double, and complex
versions of all of those functions, and to the wide-character versions
of the text functions.
In some cases, the behavior is also defined if they are passed a
NaN or an infinity.
Only in Annex F, AFAIK.
No, the descriptions of printf() and scanf() families of functions also
allow for NaNs and infinities.
...
But since f_1 is defined by the formula in 5.2.4.2.2p2, this means that
it is defined only with no more than p digits, where p = LDBL_MANT_DIG
for long double.
No, f_1 can trivially be identified as the leading base-b digit,
regardless of whether or not there are too many base-b digits for the
representation to qualify as a floating-point number.
It can, but this is not what the standard says. This can be solved
if the p above the sum symbol in the formula is replaced by infinity.
The C standard defines what normalized, subnormal, and unnormalized
floating-point numbers are; it does not define what those terms mean for
things that could qualify as floating-point numbers, but only if
LDBL_MANT_DIG were larger. However, the extension of those concepts to
such representations is trivial. I think that increasing the value of
LDBL_MANT_DIG to allow them to qualify as floating-point numbers is the
more appropriate approach.
If p is replaced by infinity, it no longer defines a floating point
format. Such formats are defined, in part, by their finite maximum value
of p.
In article <se9ui6$m1q$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 8/2/21 3:05 PM, Vincent Lefevre wrote:
[...]
This is FUD. Double-double is a good fit to the standard's model.
There are additional numbers (as allowed by the standard), so that
this won't follow the same behavior as IEEE FP, but so does
contraction of FP expressions.
We're clearly using "good fit" in different senses. I would consider a
floating point format to be a "perfect fit" to the C standard's model if
every number representable in that format qualifies as a floating-point
number, and every number that qualifies as a floating point number is
representable in that format. "Goodness of fit" would depend upon how
closely any given format approaches that ideal. There's no single value
of the model parameters that makes both of those statements even come
close to being true for double-double format.
The C standard does not have a notion of "goodness of fit" to the FP
model. There are additional representable numbers, but they are not a
problem: for the error analysis, they will typically be ignored (though
in some cases, one can take advantage of them).
Assuming some error bound in ulp (which must be done whatever the
format), the error analysis will be the same.
Given a number in double-double format represented by a+b, where
fabs(a) > fabs(b) && fabs(b) > 0, with x being the largest power of
FLT_RADIX that is no larger than b, 1 ulp will be DBL_EPSILON*x.
That expression will vary over a very large dynamic range while the
number being represented by a+b changes only negligibly.
The notion of ulp is defined (and usable in practice) only with a floating-point format.
The frexp(), ldexp(), and fabs() functions have specified behavior only
when the double argument qualifies as a floating point number.
Well, in the particular case of double-double, they can easily be
generalized to non-floating-point numbers, with some drawbacks:
frexp and ldexp may introduce rounding. In practice, they can just
be defined by the implementation. This is not worse than the fact
that the standard doesn't define the accuracy of the floating-point operations and functions.
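As a sketch of the kind of implementation-defined generalization meant
here (my own code with a made-up name; zeros, infinities and NaNs are
ignored, and the comments mark where rounding can creep in):

    #include <math.h>

    /* frexp-like decomposition of a double-double long double: take the
       exponent from the high half, then scale the whole value.  The
       scaling can round, because the low half may be pushed below the
       smallest subnormal double (2^-1074) and lose bits. */
    static long double dd_frexpl(long double value, int *exp)
    {
        double hi = (double) value;   /* high half */
        *exp = ilogb(hi) + 1;         /* so the result lies in [1/2, 1) */
        return ldexpl(value, -*exp);  /* may round the low half */
    }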
The printf() family of functions has specified behavior only for
floating-point numbers, NaNs and infinities, when using f, F, e, E, g,
G, a, or A formats.
The ceil() and floor() functions have a return value which is required
to be a floating-point number.
The scanf() family of functions (when using f, F, e, E, g, G, a, or A
formats) and strtod() are required to return floating-point numbers,
NaNs or infinities.
And, of course, this also applies to the float, long double, and complex
versions of all of those functions, and to the wide-character versions
of the text functions.
Ditto, they can be defined by the implementation.
No, the descriptions of printf() and scanf() families of functions also
allow for NaNs and infinities.
OK, but this is not very useful, as you need Annex F or definitions
by the implementation to know how infinities and NaNs are handled
by most operations, starting with the basic arithmetic operations.
The C standard defines what normalized, subnormal, and unnormalized
floating-point numbers are; it does not define what those terms mean for
things that could qualify as floating-point numbers, but only if
LDBL_MANT_DIG were larger. However, the extension of those concepts to
such representations is trivial. I think that increasing the value of
LDBL_MANT_DIG to allow them to qualify as floating-point numbers is the
more appropriate approach.
If p is replaced by infinity, it no longer defines a floating point
format. Such formats are defined, in part, by their finite maximum value
of p.
So, you mean that to follow the definition of "normal" used with
double-double on PowerPC, LDBL_MANT_DIG needs to be increased to
a large value (something like e_max - e_min + 53), even though
not all floating-point values would be representable?
But then, various specifications would be incorrect (see the worked
numbers after this list), such as:
* LDBL_MAX, as (1 - b^(-p)) b^e_max would not be representable
(p would be too large).
* LDBL_EPSILON would no longer make any sense and would not be
representable, as b^(1-p) would be too small.
* frexp, because the condition "value equals x times 2^(*exp)" could
not always be satisfied (that's probably why the standard says "If
value is not a floating-point number [...]", which is OK if all
floating-point numbers are assumed to be exactly representable).
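Plugging in concrete numbers (my own arithmetic, taking the double
parameters e_max = 1024 and e_min = -1021, so that the suggested
e_max - e_min + 53 gives a hypothetical p = 1024 + 1021 + 53 = 2098):

    (1 - b^(-p)) b^e_max = 2^1024 - 2^(1024-2098) = 2^1024 - 2^(-1074),

which is not representable: the high double would have to round up to
2^1024 (overflow), and no 53-bit low double can make up the difference
from DBL_MAX. And

    b^(1-p) = 2^(-2097) < 2^(-1074),

which is below the smallest positive representable value, so
LDBL_EPSILON could not be representable either.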
On 8/3/21 1:01 PM, Vincent Lefevre wrote:
In article <se9ui6$m1q$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 8/2/21 3:05 PM, Vincent Lefevre wrote:
[...]
This is FUD. Double-double is a good fit to the standard's model.
There are additional numbers (as allowed by the standard), so that
this won't follow the same behavior as IEEE FP, but so does
contraction of FP expressions.
We're clearly using "good fit" in different senses. I would consider a
floating point format to be a "perfect fit" to the C standard's model if >> every number representable in that format qualifies as a floating-point
number, and every number that qualifies as a floating point number is
representable in that format. "Goodness of fit" would depend upon how
closely any given format approaches that ideal. There's no single value
of the model parameters that makes both of those statements even come
close to being true for double-double format.
The C standard does not have a notion of "goodness of fit",
Agreed - this is simply a matter of ordinary English usage, not
standard-defined usage. When using the standard's model to describe a
particular format runs into problems (see below), that format is a bad
fit to the model.
...
Assuming some error bound in ulp (which must be done whatever the
format), the error analysis will be the same.
Given a number in double-double format represented by a+b, where
fabs(a) > fabs(b) && fabs(b) > 0, with x being the largest power of
FLT_RADIX that is no larger than b, 1 ulp will be DBL_EPSILON*x.
That expression will vary over a very large dynamic range while the
number being represented by a+b changes only negligibly.
The notion of ulp is defined (and usable in practice) only with a floating-point format.
It's perfectly well-defined as the difference between two consecutive
representable numbers. That definition ties directly into the way that
the term is used. If you want to use a number larger than that one, you
shouldn't call it ulp.
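Taken literally for double-double, that definition gives, for example (a
sketch; the exact step depends on how nextafterl is implemented for the
format):

    #include <math.h>

    /* Difference between 1.0L and the next representable long double.
       For PowerPC double-double the value just above the pair (1.0, 0)
       can be (1.0, 2^-1074), so this difference can be as small as
       0x1p-1074L - far below 2^-105, the step the 106-bit model would
       suggest. */
    long double step_at_one = nextafterl(1.0L, 2.0L) - 1.0L;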
Ditto, they can be defined by the implementation.
There would be no need for the behavior to be implementation-defined if
the proper choice had been made for LDBL_MANT_DIG.
The only reason I brought that up was because the simplest statement of
my point, "only defined if passed a floating-point number", did not apply
to those functions. The point is still that those functions have
gratuitously unspecified behavior just because you don't want to choose
a more accurate value for LDBL_MANT_DIG.
But then, various specifications would be incorrect, such as:
* LDBL_MAX, as (1 - b^(-p)) b^emax would not be representable
(p would be too large).
As I said earlier, the standard's textual descriptions are the ones that
are normative - the formulas merely show what the correct value would be
when using a floating-point format that is a good fit to the standard's
model.
LDBL_MAX is, by definition, the "maximum representable finite
floating-point number". By definition, it must be representable.
If I understand double-double format correctly, the maximum
representable value is represented by a+b when a and b are both
DBL_MAX, [...]
[...]
* LDBL_EPSILON would no longer make any sense and would not be
representable, as b^(1-p) would be too small.
LDBL_EPSILON is "the difference between 1 and the least value greater
than 1 that is representable in the given floating point type,".
* frexp, because the condition "value equals x times 2^(*exp)" could
not always be satisfied (that's probably why the standard says "If
value is not a floating-point number [...]", which is OK if all
floating-point numbers are assumed to be exactly representable).
Again, that's merely a sign that double-double is not a good fit to the standard's model.
In article <691aadf4-74c7-e739-d93a-5907d00f6bb5@alumni.caltech.edu>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
The only reason I brought that up was because the simplest statement of
my point "only defined if passed a floating point number", did not apply
to those functions. The point is still that those functions have gratuitously unspecified behavior just because you don't want to choose
a more accurate value for LDBL_MANT_DIG.
This is very silly. I did *not* choose the value of LDBL_MANT_DIG.
It was probably chosen a long time ago by the initial vendor
of the compiler for PowerPC (IBM?), and this value has been used
by various compilers, including GCC, for a long time. And AFAIK,
no one has found any issue with it.
Choosing a larger value could potentially break many programs that use
LDBL_MANT_DIG. It would also defeat the purpose of the minimum values
for *_MANT_DIG required by the standard. For instance, the standard says
that DBL_MANT_DIG must be at least 53, so that the implementer is not
tempted to define a low-precision double type. But if it is allowed
to choose arbitrarily large values for DBL_MANT_DIG, not reflecting the
real precision, what's the point?
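As an example of the kind of code that such a change could break (a
sketch of my own, not taken from any particular program):

    #include <float.h>
    #include <math.h>

    /* A one-ulp-style relative error bound derived from LDBL_MANT_DIG,
       as error analyses commonly do.  With the current value (106) this
       is 0x1p-105L; with a hypothetical LDBL_MANT_DIG around 2098 the
       expression would underflow to 0 and the bound would be useless. */
    static long double relative_bound(void)
    {
        return ldexpl(1.0L, 1 - LDBL_MANT_DIG);
    }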