Forum: War Ensemble BBS

OT: unicode (Was: Re: Upcoming gfortran 15 will contain unsigned numbers)

From Wolfgang Agnes@wagnes@example.com to comp.lang.fortran on Mon Nov 25 08:35:48 2024

From Newsgroup: comp.lang.fortran

Lawrence D'Oliveiro <ldo@nz.invalid> writes:

> On Sat, 23 Nov 2024 09:18:11 -0300, Wolfgang Agnes wrote:
>
>> How about UCS-2?
>
> “UCS-2” was the name of the encoding back when it was assumed that Unicode
> was always going to be just 16 bits. After the coding was extended, those
> “surrogate” ranges were introduced, to allow representation of the extra
> characters within a 16-bit encoding, and so “UCS-2” was renamed to
> “UTF-16”.
>
> In short, “UTF-16” is basically “UCS-2 with surrogates”.

Nice to know! Thanks. So, UCS means ``Universal Character Set''. I
thought it was a whole different character set. It's a bit difficult to understand ``surrogates''. So many definitions come up such as ``Basic Multilingual Plane''. Can you explain what surrogates are?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Lynn McGuire@lynnmcguire5@gmail.com to comp.lang.fortran on Mon Nov 25 14:39:37 2024

On 11/25/2024 5:35 AM, Wolfgang Agnes wrote:

> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>
>> On Sat, 23 Nov 2024 09:18:11 -0300, Wolfgang Agnes wrote:
>>
>>> How about UCS-2?
>>
>> “UCS-2” was the name of the encoding back when it was assumed that Unicode
>> was always going to be just 16 bits. After the coding was extended, those
>> “surrogate” ranges were introduced, to allow representation of the extra
>> characters within a 16-bit encoding, and so “UCS-2” was renamed to
>> “UTF-16”.
>>
>> In short, “UTF-16” is basically “UCS-2 with surrogates”.
>
> Nice to know! Thanks. So, UCS means ``Universal Character Set''. I
> thought it was a whole different character set. It's a bit difficult to
> understand ``surrogates''. So many definitions come up such as ``Basic
> Multilingual Plane''. Can you explain what surrogates are?

There is lots of information at
https://home.unicode.org/

And
https://stackoverflow.com/

Lynn

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.fortran on Mon Nov 25 23:35:34 2024

On Mon, 25 Nov 2024 08:35:48 -0300, Wolfgang Agnes wrote:

> It's a bit difficult to understand ``surrogates''.

The Unicode folks just decided that the ranges 0xD800-0xDBFF (1024 codes
of “high surrogates”) and 0xDC00-0xDFFF (1024 codes of “low surrogates”)
would be used in pairs to represent codes above 0xFFFF in UTF-16 encoding.
This gives an additional 1024×1024 = 1048576 different codes, 0x10000
through 0x10FFFF, which should be enough to cover the entire (current)
Unicode range: officially, it goes up to 0x10FFFF. At least, that’s what
they’re saying right now.
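To make the pairing concrete, here is a minimal Python sketch, assuming the standard UTF-16 arithmetic: subtract 0x10000 from the code point, put the top 10 bits of the resulting 20-bit value into a high surrogate and the bottom 10 bits into a low surrogate. The helper names `to_surrogates` and `from_surrogates` are hypothetical, not part of any library; the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above 0xFFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    offset = cp - 0x10000               # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)      # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600  # a character outside the Basic Multilingual Plane
    high, low = to_surrogates(cp)
    print(f"U+{cp:X} -> 0x{high:04X} 0x{low:04X}")  # U+1F600 -> 0xD83D 0xDE00

    # Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
    expected = chr(cp).encode("utf-16-be")
    assert expected == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert from_surrogates(high, low) == cp

    # The two 1024-code ranges span exactly 0x10000..0x10FFFF.
    assert to_surrogates(0x10000) == (0xD800, 0xDC00)
    assert to_surrogates(0x10FFFF) == (0xDBFF, 0xDFFF)
```

Because each surrogate range holds 1024 values, the pair carries exactly 20 bits, which is why the scheme tops out at 0x10FFFF.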

In the full UCS-4 encoding, those ranges are considered invalid.
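A small sketch of that rule, assuming that "the full UCS-4 encoding" is read as UTF-32, i.e. UCS-4 restricted to the Unicode range, where a valid value (a Unicode "scalar value") is any code point from 0 to 0x10FFFF that is not in the surrogate block 0xD800-0xDFFF. The helpers `is_scalar_value` and `encode_utf32_be` are hypothetical names used only for illustration; the valid-input output is cross-checked against Python's built-in `utf-32-be` codec.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp may appear in UTF-32: in range and not a surrogate."""
    in_range = 0 <= cp <= 0x10FFFF
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    return in_range and not is_surrogate


def encode_utf32_be(code_points: list[int]) -> bytes:
    """Encode code points as big-endian UTF-32, rejecting invalid values."""
    out = bytearray()
    for cp in code_points:
        if not is_scalar_value(cp):
            raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
        out += cp.to_bytes(4, "big")  # one fixed-width 32-bit unit per character
    return bytes(out)


if __name__ == "__main__":
    data = encode_utf32_be([0x41, 0x1F600])
    print(data.hex())  # 000000410001f600
    # Same bytes as Python's own UTF-32 encoder (big-endian, no BOM).
    assert data == "A\U0001F600".encode("utf-32-be")

    try:
        encode_utf32_be([0xD800])  # a lone high surrogate
    except ValueError as err:
        print(err)  # U+D800 is not a Unicode scalar value
```

In a 32-bit encoding every character already fits in one unit, so the surrogate codes have no job to do and are simply rejected.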