• u8"" c11 c23

    From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Mon Oct 20 15:35:00 2025
    From Newsgroup: comp.lang.c

    Speaking of signed versus unsigned:

    u8"a" in C11 had the type char [N]. Normally char is signed.

    In C23 it has the type char8_t [N], and char8_t is the same type as
    unsigned char.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    I generally cast "char*" to "unsigned char*" when handling something
    with UTF-8. I do not use u8""; I use plain "" with UTF-8 encoded source
    code and I just assume const char* is UTF-8.
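
The habit described above (treat plain char strings as UTF-8 and view the bytes as unsigned char when inspecting them) could look roughly like the sketch below. The helper names utf8_code_points and is_non_ascii are hypothetical, and the code assumes the input is already valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: count UTF-8 code points in a NUL-terminated string.
 * Assumes the input is valid UTF-8; no validation is attempted. */
static size_t utf8_code_points(const char *s) {
    /* View the bytes as unsigned so that values >= 0x80 are never negative. */
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    for (; *p != 0; ++p) {
        /* Continuation bytes look like 10xxxxxx; any other byte starts a code point. */
        if ((*p & 0xC0u) != 0x80u) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: nonzero if byte i of s is outside the ASCII range. */
static int is_non_ascii(const char *s, size_t i) {
    return ((const unsigned char *)s)[i] >= 0x80u;
}
```

With this approach the signedness of plain char stops mattering, because every comparison is done on unsigned char values.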


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Oct 20 15:19:11 2025
    From Newsgroup: comp.lang.c

    Thiago Adams <thiago.adams@gmail.com> writes:
    Speaking of signed versus unsigned:

    u8"a" in C11 had the type char [N]. Normally char is signed.

    I would have said "commonly" rather than "normally". Not an
    important point.

    In C23 it has the type char8_t [N], and char8_t is the same type as
    unsigned char.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    I generally cast "char*" to "unsigned char*" when handling something
    with UTF-8. I do not use u8""; I use plain "" with UTF-8 encoded source
    code and I just assume const char* is UTF-8.

    That raises another issue.

    The <uchar.h> header was introduced in C11. In C11 and C17, that
    header defines char16_t and char32_t. C23 introduces char8_t.

    There doesn't seem to be any way, other than checking the value of
    __STDC_VERSION__, to determine whether char8_t is defined or not. There
    are no *_MIN or *_MAX macros for these types, either in <uchar.h> or
    in <limits.h>. A test program I just wrote would have been a little
    simpler if I could have used `#ifdef CHAR8_MAX`.

    Here's the test program:

    #include <stdio.h>
    #include <uchar.h>

    #define TYPEOF(x) \
        (_Generic(x, \
            char: "char", \
            signed char: "signed char", \
            unsigned char: "unsigned char", \
            short: "short", \
            unsigned short: "unsigned short", \
            int: "int", \
            unsigned int: "unsigned int", \
            long: "long", \
            unsigned long: "unsigned long", \
            long long: "long long", \
            unsigned long long: "unsigned long long"))

    int main(void) {
        printf("__STDC_VERSION__ = %ldL\n", __STDC_VERSION__);
        printf("u8\"a\"[0] is of type %s\n",
               TYPEOF(u8"a"[0]));
    #if __STDC_VERSION__ >= 202311L
        printf("char8_t is %s\n", TYPEOF((char8_t)0));
    #endif
        printf("char16_t is %s\n", TYPEOF((char16_t)0));
        printf("char32_t is %s\n", TYPEOF((char32_t)0));
    }

    Its output with `gcc -std=c17`:

    __STDC_VERSION__ = 201710L
    u8"a"[0] is of type char
    char16_t is unsigned short
    char32_t is unsigned int

    Its output with `gcc -std=c23`:

    __STDC_VERSION__ = 202311L
    u8"a"[0] is of type unsigned char
    char8_t is unsigned char
    char16_t is unsigned short
    char32_t is unsigned int
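
Since __STDC_VERSION__ is the only switch available, code that must build as both C17 and C23 could hide the difference behind an alias, as in the sketch below. The names utf8_unit, greeting and utf8_units are hypothetical. The alias uses unsigned char rather than char8_t so that it does not depend on the C library's <uchar.h> already providing char8_t. It also assumes the compiler reports 202311L whenever it gives u8"" literals the C23 type, which may not hold for older compilers in a pre-release C2x mode.

```c
#include <string.h>

/* utf8_unit is a hypothetical alias for the element type of a u8"" literal. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
typedef unsigned char utf8_unit;  /* C23: char8_t is the same type as unsigned char */
#else
typedef char utf8_unit;           /* C11/C17: u8"" literals are arrays of char */
#endif

/* Initializing a pointer from a u8"" literal type-checks in both modes. */
static const utf8_unit *const greeting = u8"h\u00e9";  /* U+00E9 is C3 A9 in UTF-8 */

/* Length in code units; the cast only matters when utf8_unit is unsigned char. */
static size_t utf8_units(const utf8_unit *s) {
    return strlen((const char *)s);
}
```

The cast to const char * is confined to one helper, so the rest of the code can pass utf8_unit pointers around without caring which standard is in effect.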
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Tue Oct 21 10:35:45 2025
    From Newsgroup: comp.lang.c

    On 20.10.2025 at 20:35, Thiago Adams wrote:
    Speaking of signed versus unsigned:

    u8"a" in C11 had the type char [N]. Normally char is signed.

    In C23 it has the type char8_t [N], and char8_t is the same type as
    unsigned char.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    I generally cast "char*" to "unsigned char*" when handling something
    with UTF-8. I do not use u8""; I use plain "" with UTF-8 encoded source
    code and I just assume const char* is UTF-8.

    What is there to discuss? Just cast and that's it.
  • From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Oct 21 07:07:58 2025
    From Newsgroup: comp.lang.c

    On 21/10/2025 05:35, Bonita Montero wrote:
    On 20.10.2025 at 20:35, Thiago Adams wrote:
    Speaking of signed versus unsigned:

    u8"a" in C11 had the type char [N]. Normally char is signed.

    In C23 it has the type char8_t [N], and char8_t is the same type as
    unsigned char.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    I generally cast "char*" to "unsigned char*" when handling something
    with UTF-8. I do not use u8""; I use plain "" with UTF-8 encoded source
    code and I just assume const char* is UTF-8.

    What is there to discuss? Just cast and that's it.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    I think it is a big change, of the kind C does not normally make.



  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Tue Oct 21 12:09:11 2025
    From Newsgroup: comp.lang.c

    On 21.10.2025 at 12:07, Thiago Adams wrote:
    On 21/10/2025 05:35, Bonita Montero wrote:
    On 20.10.2025 at 20:35, Thiago Adams wrote:
    Speaking of signed versus unsigned:

    u8"a" in C11 had the type char [N]. Normally char is signed.

    In C23 it has the type char8_t [N], and char8_t is the same type as
    unsigned char.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    I generally cast "char*" to "unsigned char*" when handling something
    with UTF-8. I do not use u8""; I use plain "" with UTF-8 encoded source
    code and I just assume const char* is UTF-8.

    What is there to discuss? Just cast and that's it.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    No, because the null terminator doesn't become negative with that.
    ;-)

    I think it is a big change, of the kind C does not normally make.




  • From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Oct 21 07:57:21 2025
    From Newsgroup: comp.lang.c

    On 10/20/2025 7:19 PM, Keith Thompson wrote:
    Thiago Adams <thiago.adams@gmail.com> writes:
    Speaking of signed versus unsigned:

    u8"a" in C11 had the type char [N]. Normally char is signed.

    I would have said "commonly" rather than "normally". Not an
    important point.

    In C23 it has the type char8_t [N], and char8_t is the same type as
    unsigned char.

    When converting code from C11 to C23 we get an error here:
    const char* s = u8"";

    I generally cast "char*" to "unsigned char*" when handling something
    with UTF-8. I do not use u8""; I use plain "" with UTF-8 encoded source
    code and I just assume const char* is UTF-8.

    That raises another issue.

    The <uchar.h> header was introduced in C11. In C11 and C17, that
    header defines char16_t and char32_t. C23 introduces char8_t.

    I think all these typedefs related to language concepts, like
    size_t (related to sizeof), char8_t (related to u8""), char16_t
    (related to u""), char32_t (related to U""), etc., should be
    built-in typedefs.

    And even others that do not have an association with language
    features, like int16_t.




  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Oct 21 10:26:16 2025
    From Newsgroup: comp.lang.c

    Thiago Adams <thiago.adams@gmail.com> writes:
    On 10/20/2025 7:19 PM, Keith Thompson wrote:
    [...]
    That raises another issue.
    The <uchar.h> header was introduced in C11. In C11 and C17, that
    header defines char16_t and char32_t. C23 introduces char8_t.

    I think all these typedefs related to language concepts, like
    size_t (related to sizeof), char8_t (related to u8""), char16_t
    (related to u""), char32_t (related to U""), etc., should be
    built-in typedefs.

    And even others that do not have an association with language
    features, like int16_t.

    By "built-in typedefs", do you mean typedefs that are visible without
    a #include?

    That would be unprecedented, but I suppose it could work. But I'm not
    sure it would be all that advantageous. The type of the result of
    sizeof is some implementation-defined unsigned integer type. The
    <stddef.h> header merely provides a consistent name for that type.

    I can see that having language features depend (indirectly) on types
    defined in library headers is a bit messy, but I don't think it causes
    any real problems.
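
The relationship described here (the compiler picks the type, the header only names it) can be observed with _Generic. This is a minimal sketch with hypothetical function names: the second function uses sizeof without referring to any name that comes from a header, and only the first one needs <stddef.h>, because it spells out the name size_t.

```c
#include <stddef.h>  /* needed only for the name size_t */

/* Returns 1 if the type of a sizeof expression is the type named size_t. */
static int sizeof_type_is_size_t(void) {
    return _Generic(sizeof(int), size_t: 1, default: 0);
}

/* Uses sizeof without referring to any name declared in a header. */
static unsigned long long bytes_in_int_array(void) {
    int values[8];
    return (unsigned long long)sizeof values;
}
```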
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Oct 21 15:04:15 2025
    From Newsgroup: comp.lang.c

    On 10/21/2025 2:26 PM, Keith Thompson wrote:
    Thiago Adams <thiago.adams@gmail.com> writes:
    On 10/20/2025 7:19 PM, Keith Thompson wrote:
    [...]
    That raises another issue.
    The <uchar.h> header was introduced in C11. In C11 and C17, that
    header defines char16_t and char32_t. C23 introduces char8_t.

    I think all these typedefs related to language concepts, like
    size_t (related to sizeof), char8_t (related to u8""), char16_t
    (related to u""), char32_t (related to U""), etc., should be
    built-in typedefs.

    And even others that do not have an association with language
    features, like int16_t.

    By "built-in typedefs", do you mean typedefs that are visible without
    a #include?


    yes.

    That would be unprecedented, but I suppose it could work. But I'm not
    sure it would be all that advantageous. The type of the result of
    sizeof is some implementation-defined unsigned integer type. The
    <stddef.h> header merely provides a consistent name for that type.

    I can see that having language features depend (indirectly) on types
    defined in library headers is a bit messy, but I don't think it causes
    any real problems.



    It's not really a problem, but it depends on the includes, which in
    turn depend on the preprocessor.

    It seems like the language is partially configured through macros and
    typedefs in includes.


    Some types that have direct relation with the language:

    typedef typeof_unqual(sizeof(0)) size_t;
    typedef typeof_unqual(((char*)1)-((char*)0)) ptrdiff_t;
    typedef typeof_unqual(u8' ') char8_t;
    typedef typeof_unqual(u' ') char16_t;
    typedef typeof_unqual(U' ') char32_t;
    typedef typeof_unqual(L' ') wchar_t;
    typedef typeof_unqual(nullptr) nullptr_t;



    I think it does not make sense to have to include a file to describe
    size_t because we can use sizeof without having to include anything.
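
The typedef list above relies on typeof_unqual, which is C23 only. The same correspondences between literals and library typedefs can be checked in C11 with _Generic, as in this sketch. The function names are hypothetical, and u8' ' is left out because u8 character constants do not exist before C23.

```c
#include <stddef.h>  /* ptrdiff_t, wchar_t */
#include <uchar.h>   /* char16_t, char32_t */

/* Each helper returns 1 when the expression has exactly the named type. */
static int u_literal_is_char16(void) {
    return _Generic(u'x', char16_t: 1, default: 0);
}

static int U_literal_is_char32(void) {
    return _Generic(U'x', char32_t: 1, default: 0);
}

static int L_literal_is_wchar(void) {
    return _Generic(L'x', wchar_t: 1, default: 0);
}

/* The controlling expression of _Generic is not evaluated, only its type is used. */
static int pointer_difference_is_ptrdiff(void) {
    return _Generic((char *)0 - (char *)0, ptrdiff_t: 1, default: 0);
}
```

Every check needs a header only to obtain the name of the type; the literals and the pointer subtraction themselves compile without one.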



  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Oct 21 11:51:40 2025
    From Newsgroup: comp.lang.c

    Thiago Adams <thiago.adams@gmail.com> writes:
    On 10/21/2025 2:26 PM, Keith Thompson wrote:
    Thiago Adams <thiago.adams@gmail.com> writes:
    On 10/20/2025 7:19 PM, Keith Thompson wrote:
    [...]
    That raises another issue.
    The <uchar.h> header was introduced in C11. In C11 and C17, that
    header defines char16_t and char32_t. C23 introduces char8_t.

    I think all these typedefs related to language concepts, like
    size_t (related to sizeof), char8_t (related to u8""), char16_t
    (related to u""), char32_t (related to U""), etc., should be
    built-in typedefs.

    And even others that do not have an association with language
    features, like int16_t.

    By "built-in typedefs", do you mean typedefs that are visible without
    a #include?


    yes.

    That would be unprecedented, but I suppose it could work. But I'm not
    sure it would be all that advantageous. The type of the result of
    sizeof is some implementation-defined unsigned integer type. The
    <stddef.h> header merely provides a consistent name for that type.
    I can see that having language features depend (indirectly) on types
    defined in library headers is a bit messy, but I don't think it causes
    any real problems.



    It's not really a problem, but it depends on the includes, which in
    turn depend on the preprocessor.

    It seems like the language is partially configured through macros and
    typedefs in includes.

    The way I'd describe it is that the type of a sizeof expression is
    chosen by the compiler, and the definition of size_t in <stddef.h>
    documents that choice and makes it visible to programmers.

    Some types that have direct relation with the language:

    typedef typeof_unqual(sizeof(0)) size_t;
    typedef typeof_unqual(((char*)1)-((char*)0)) ptrdiff_t;
    typedef typeof_unqual(u8' ') char8_t;
    typedef typeof_unqual(u' ') char16_t;
    typedef typeof_unqual(U' ') char32_t;
    typedef typeof_unqual(L' ') wchar_t;
    typedef typeof_unqual(nullptr) nullptr_t;

    I think it does not make sense to have to include a file to describe
    size_t because we can use sizeof without having to include anything.

    I suppose if I were defining a new language from scratch, I probably
    wouldn't have those types defined in library headers. I might have
    made size_t a keyword, for example.

    One data point: C++ has wchar_t as a keyword, while C defines it as
    a typedef in <stddef.h>. C++'s wchar_t has the same representation
    as one of the other integral types, called its underlying type.
    That could have been a nice approach for C, but I'd say it's too
    late to fix it, and the benefits aren't worth the cost.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Oct 21 16:17:19 2025
    From Newsgroup: comp.lang.c

    On 10/21/2025 3:51 PM, Keith Thompson wrote:
    Thiago Adams <thiago.adams@gmail.com> writes:
    On 10/21/2025 2:26 PM, Keith Thompson wrote:
    Thiago Adams <thiago.adams@gmail.com> writes:
    On 10/20/2025 7:19 PM, Keith Thompson wrote:
    [...]
    That raises another issue.
    The <uchar.h> header was introduced in C11. In C11 and C17, that
    header defines char16_t and char32_t. C23 introduces char8_t.

    I think all these typedefs related to language concepts, like
    size_t (related to sizeof), char8_t (related to u8""), char16_t
    (related to u""), char32_t (related to U""), etc., should be
    built-in typedefs.

    And even others that do not have an association with language
    features, like int16_t.

    By "built-in typedefs", do you mean typedefs that are visible without
    a #include?


    yes.

    That would be unprecedented, but I suppose it could work. But I'm not
    sure it would be all that advantageous. The type of the result of
    sizeof is some implementation-defined unsigned integer type. The
    <stddef.h> header merely provides a consistent name for that type.
    I can see that having language features depend (indirectly) on types
    defined in library headers is a bit messy, but I don't think it causes
    any real problems.



    It's not really a problem, but it depends on the includes, which in
    turn depend on the preprocessor.

    It seems like the language is partially configured through macros and
    typedefs in includes.

    The way I'd describe it is that the type of a sizeof expression is
    chosen by the compiler, and the definition of size_t in <stddef.h>
    documents that choice and makes it visible to programmers.

    Some types that have direct relation with the language:

    typedef typeof_unqual(sizeof(0)) size_t;
    typedef typeof_unqual(((char*)1)-((char*)0)) ptrdiff_t;
    typedef typeof_unqual(u8' ') char8_t;
    typedef typeof_unqual(u' ') char16_t;
    typedef typeof_unqual(U' ') char32_t;
    typedef typeof_unqual(L' ') wchar_t;
    typedef typeof_unqual(nullptr) nullptr_t;

    I think it does not make sense to have to include a file to describe
    size_t because we can use sizeof without having to include anything.

    I suppose if I were defining a new language from scratch, I probably
    wouldn't have those types defined in library headers. I might have
    made size_t a keyword, for example.

    One data point: C++ has wchar_t as a keyword, while C defines it as
    a typedef in <stddef.h>. C++'s wchar_t has the same representation
    as one of the other integral types, called its underlying type.
    That could have been a nice approach for C, but I'd say it's too
    late to fix it, and the benefits aren't worth the cost.


    Yes, I think keywords make sense. In some ways, all C types are
    typedefs for the "real" types.


