From Newsgroup: comp.lang.c
On 10/22/2025 12:25 PM, Thiago Adams wrote:
On 10/22/2025 2:23 PM, Thiago Adams wrote:
On 10/22/2025 1:42 PM, BGB wrote:
On 10/22/2025 7:45 AM, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
In my project, with my own compiler, I have made some use of it...
In the use case I have for _BitInt(N), N is dynamic, so I am not planning
to use it.
In my compiler, only constant N is allowed.
N is allowed over a range of 1 to 16383, though anything large is
generally implemented with runtime calls:
  1..64: Mapped to integer operations.
  65..128: Mapped to 128-bit integer operations.
    Optional partial support in my ISA.
    Rest is runtime calls.
  129..256: Runtime calls for 256-bit integer ops.
  257+: Runtime calls for generic large integers.
Storage is padded to a multiple of 128 bits, with 16-byte alignment.
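
As a rough sketch of the width ranges above (not the actual BGBCC code): the enum and function names here are made up for illustration, and applying the 128-bit padding rule via a single helper is an assumption about how the storage rule would be computed.

```c
#include <assert.h>

/* Hypothetical lowering classes; only the width ranges come from the post. */
enum bitint_lowering {
    LOWER_NATIVE,      /* 1..64: ordinary integer operations */
    LOWER_INT128,      /* 65..128: 128-bit integer operations */
    LOWER_RT256,       /* 129..256: runtime calls for 256-bit ops */
    LOWER_RT_GENERIC,  /* 257..16383: runtime calls for generic large integers */
    LOWER_INVALID      /* outside the accepted range */
};

/* Pick a lowering strategy for _BitInt(n) from its constant width. */
static enum bitint_lowering classify_bitint(unsigned n)
{
    if (n < 1 || n > 16383) return LOWER_INVALID;
    if (n <= 64)  return LOWER_NATIVE;
    if (n <= 128) return LOWER_INT128;
    if (n <= 256) return LOWER_RT256;
    return LOWER_RT_GENERIC;
}

/* Storage in bytes when padded up to a multiple of 128 bits (16 bytes). */
static unsigned padded_storage_bytes(unsigned n)
{
    assert(n >= 1 && n <= 16383);
    return ((n + 127u) / 128u) * 16u;
}
```

So, under these assumptions, a _BitInt(257) would take the generic runtime path and occupy 48 bytes.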
In my compiler:
Largest fully-supported integer type is 128 bits.
__int128, __uint128, unsigned __int128
Partial handling exists for 256-bit values, but they are not exposed as
their own types. Stuff for very large integers is mostly untested.
Ironically, while it does support large integer constants, its support
for very large integer constants generally involves representing them
inside the compiler as string literals (Base85 encoded).
IIRC, there is a limit of 128 bits for decimal literals though (so going larger is only really possible with hexadecimal).
Contrast, say:
GCC: Refuses to support integer types over 64 bits on most targets tested;
Clang: Sorta works, but has a lot of limitations, like the inability to
have 128-bit integer literals.
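
For what it is worth, the usual workaround when 128-bit literals are unavailable is to assemble the value from two 64-bit halves. A minimal sketch, assuming a target where the compiler provides the unsigned __int128 extension; the u128 typedef and make_u128 name are just for illustration:

```c
#include <stdint.h>

/* Assumes the compiler provides the unsigned __int128 extension. */
typedef unsigned __int128 u128;

/* Build a 128-bit value from its high and low 64-bit halves. */
static u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | (u128)lo;
}
```

Since both arguments can be ordinary 64-bit hexadecimal literals, this sidesteps the literal limitation at the cost of readability.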
Also maybe fun is the wonk that UTF-8 string literals in BGBCC are
effectively double-encoded. Though, actual scheme is a little more complicated:
00: Escaped as 2-byte (C0-80).
01..7F: As-is
0080..00FF: Encodes Bytes 0x80..0xFF;
0100..06FF: Pass Through
0700..077F: Encodes 00..7F byte followed by 00.
0780..07FF: Encodes 0080..00FF.
0800..7FFF: Pass Through
8000..FFFF: Interpreted as a 2-byte pair (80..FF followed by 00..FF).
Some of this is an attempt to reduce the relative inefficiency of the
double-encoding scheme (the naive approach would effectively double the
encoded size of each codepoint, whereas this scheme has a worst case of
1.5x but on-average closer to 1x).
The above scheme might also slightly compact data expressed in string
literals if it happens to resemble these patterns (happens to match
UTF-8 byte sequences).
As noted, the ASCII byte followed by 00 is to try to avoid bloat for
string literals like "S\0o\0m\0e\0 \0S\0t\0r\0i\0n\0g\0\0" (sometimes
seen, most often in old code originally written for the Win32 API; in
the era when MS thought it was a good idea to move parts of the Win32
API over to UCS-2 / UTF-16 but not yet bothering to add UCS-2 string
literals to MSVC...).
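
One possible reading of the table above, as a sketch of the expansion step (inner codepoint to literal bytes), not the actual BGBCC code. Assumptions: the outer UTF-8 layer has already been decoded into codepoints; "Pass Through" means the codepoint stands for its own standard UTF-8 bytes; 0780..07FF stands for the real codepoints U+0080..U+00FF; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit the standard UTF-8 encoding of a BMP codepoint; returns byte count. */
static size_t put_utf8(uint32_t cp, uint8_t *out)
{
    if (cp < 0x80) { out[0] = (uint8_t)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
}

/* Expand one inner codepoint into the literal bytes it stands for.
   out needs room for 3 bytes. Returns 0 for values above FFFF,
   which this sketch does not handle. */
static size_t expand_codepoint(uint32_t cp, uint8_t *out)
{
    assert(out != NULL);
    if (cp <= 0xFF) {                /* 00, 01..7F, 80..FF: one raw byte */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp <= 0x6FF) return put_utf8(cp, out);   /* pass through */
    if (cp <= 0x77F) {               /* ASCII byte followed by 00 */
        out[0] = (uint8_t)(cp - 0x700);
        out[1] = 0x00;
        return 2;
    }
    if (cp <= 0x7FF)                 /* assumed: real U+0080..U+00FF */
        return put_utf8(cp - 0x700, out);
    if (cp <= 0x7FFF) return put_utf8(cp, out);  /* pass through */
    if (cp <= 0xFFFF) {              /* raw byte pair: 80..FF, then 00..FF */
        out[0] = (uint8_t)(cp >> 8);
        out[1] = (uint8_t)(cp & 0xFF);
        return 2;
    }
    return 0;
}
```

Under this reading, the inner codepoint 0753 expands to 'S' followed by a 00 byte, which is the UCS-2-in-a-char-string case described above.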
For UTF-16 literals, it is basically M-UTF-8.
Note that non-BMP codepoints are:
Double encoded, for UTF-8 literals;
Encoded as surrogate pairs for UTF-16 (or UTF-32) literals.
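
The surrogate-pair split itself is the standard UTF-16 one; a small sketch for reference (the function name is made up, and this is not taken from the compiler):

```c
#include <assert.h>
#include <stdint.h>

/* Split a non-BMP codepoint (010000..10FFFF) into a UTF-16 surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                          /* 20 significant bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));  /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* bottom 10 bits */
}
```

For example, U+1F600 splits into D83D followed by DE00.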
Where, for the base-level encoding, values above 010000 may instead potentially encode intra-string LZ matches (as a way to compactify large string literals and text blobs). Though, this is optional and not
enabled ATM IIRC (not always 100% stable; and edge cases here may turn
large strings into confetti).
Though, for large numbers or similar encoded via strings, generally the
most space-efficient way ATM is Base85 or similar.
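
To illustrate the space argument, here is a minimal Base85 encoder sketch. The post does not say which Base85 alphabet is used, so this assumes the Ascii85-style one ('!' through 'u'), handles only inputs whose length is a multiple of 4, and skips the usual 'z' shortcut; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode len bytes (len must be a multiple of 4) as Ascii85-style text.
   out must hold len / 4 * 5 + 1 chars. Returns the number of chars written,
   not counting the terminating NUL. */
static size_t base85_encode(const uint8_t *in, size_t len, char *out)
{
    size_t o = 0;
    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        /* Pack 4 bytes big-endian into one 32-bit value. */
        uint32_t v = ((uint32_t)in[i] << 24) | ((uint32_t)in[i + 1] << 16) |
                     ((uint32_t)in[i + 2] << 8) | (uint32_t)in[i + 3];
        /* Emit 5 base-85 digits, most significant first. */
        for (int k = 4; k >= 0; k--) {
            out[o + k] = (char)('!' + v % 85);
            v /= 85;
        }
        o += 5;
    }
    out[o] = '\0';
    return o;
}
```

With this, a 128-bit constant (16 bytes) comes out as 20 characters, versus 32 hexadecimal digits.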
...