• Can the new generic string functions accept void* arguments?

    From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Thu Jun 1 21:41:03 2023
    From Newsgroup: comp.std.c

    The latest draft of the upcoming C23 standard is:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
    It introduces several type-generic functions in <string.h>, replacing
    normal functions of the same names: memchr, strchr, strpbrk, strrchr,
    strstr.

    I'll use strchr() as an example; the same applies to the other str*()
    generic functions (but not to memchr()).

    The problem this solves is that calling strchr() with a const char*
    argument yields a non-const char* result that points into the array.
    For example:

    #include <stdio.h>
    #include <string.h>
    int main(void) {
        const char s[] = "hello";
        char *p = strchr(s, 'h');
        *p = 'J'; // Undefined behavior
        printf("%s\n", s); // Likely to print "Jello"
    }

    This makes it possible to obtain a non-const pointer to a const object
    without a pointer cast.

    The C23 strchr() generic function returns a char* if the first argument
    is a char*, or a const char* if the first argument is a const char*.

    The stateless search functions in this section (memchr, strchr,
    strpbrk, strrchr, strstr) are *generic functions*. These functions
    are generic in the qualification of the array to be searched and
    will return a result pointer to an element with the same
    qualification as the passed array. If the array to be searched is
    const-qualified, the result pointer will be to a const-qualified
    element. If the array to be searched is not const-qualified, the
    result pointer will be to an unqualified element.

    So far so good, and I definitely approve of this change. It does break
    code that calls strchr() with a const char* argument and assigns the
    result to a (non-const) char* object. That's IMHO a minor issue, and
    arguably breaking such code is part of the point of the change. (Making
    string literals const would be similar, but I suppose that's still a
    bridge too far.)

    But I've thought of a way in which this could break some existing valid
    code, namely code that passes a void* or const void* argument to
    strchr().

    Currently, since void* can be implicitly converted to char* and vice
    versa, such a call is valid. (I can't think of a *good* reason to write
    such a call, but my imagination is not unlimited.)

    Question: Is this a valid call in C23? (It's valid in C17.)

    char hello[] = "hello";
    void *p = strchr((void*)hello, 'h');

    An implementation of the generic strchr() will presumably use a generic
    selection in a macro definition. If the generic selection covers only
    types char* and const char*, the call will violate a constraint. If it
    also covers void* and const void*, the call will be valid.
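
    A minimal sketch of the two designs described above, not the wording or
    the implementation from N3096. The names my_strchr, strchr_mut,
    strchr_const and check_my_strchr are hypothetical and exist only for
    illustration. With all four associations present, a void* argument is
    accepted; deleting the two void associations gives the narrower design,
    in which the same call would violate a constraint.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helpers: the same search, differing only in constness. */
static char *strchr_mut(char *s, int c) { return strchr(s, c); }
static const char *strchr_const(const char *s, int c) { return strchr(s, c); }

/* Hypothetical generic wrapper. The void associations are the optional
   part: without them a void* or const void* argument matches nothing. */
#define my_strchr(s, c) _Generic((s), \
    char *:       strchr_mut,         \
    void *:       strchr_mut,         \
    const char *: strchr_const,       \
    const void *: strchr_const)((s), (c))

/* Checks the selected result type for each kind of argument. */
int check_my_strchr(void) {
    char hello[] = "hello";
    const char *chello = "hello";
    void *vp = hello;

    assert(_Generic(my_strchr(hello, 'l'), char *: 1, default: 0));
    assert(_Generic(my_strchr(chello, 'l'), const char *: 1, default: 0));
    assert(_Generic(my_strchr(vp, 'l'), char *: 1, default: 0));
    assert(my_strchr(vp, 'l') == hello + 2); /* first 'l' in "hello" */
    return 1;
}
```

    The helper functions still rely on the ordinary implicit conversion from
    void* to char* at the call; only the selection step needs the extra
    associations.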

    The current wording in N3096 suggests that only char* and const char*
    are covered, implying that a call with a void* or const void* argument
    is a constraint violation.

    I suggest that the C23 standard should specify whether void* arguments
    are valid or not. I have a slight preference for making them valid. If
    so, the simplest approach would be for strchr() to return a char* given
    a char* or void* argument, or a const char* given a const char* or const
    void* argument.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Will write code for food.
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Thu Jun 1 22:18:40 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The latest draft of the upcoming C23 standard is:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
    It introduces several type-generic functions in <string.h>, replacing
    normal functions of the same names: memchr, strchr, strpbrk, strrchr,
    strstr.

    I'll use strchr() as an example; the same applies to the other str*()
    generic functions (but not to memchr()).
    [...]

    Just after I posted the above, I thought of a potential issue with
    memchr() that just might affect real code.

    In C17 and earlier, memchr() has this declaration:

    void *memchr(const void *s, int c, size_t n);

    Given the implicit conversions between void* and other object pointer
    types, the first argument can be a pointer to any const object type.
    This is something that might plausibly be used in practice, unlike
    (I think) passing a void pointer to the str*() functions.

    It's probably impractical to fix this, since it would require
    the generic selection to cover all possible object pointer types.
    Any code that depends on the current behavior would have to add
    (void*) or (const void*) casts to ensure that the type actually
    matches.

    For example, this (contrived) program is valid in C17 and earlier:

    #include <stdio.h>
    #include <string.h>
    int main(void) {
        const unsigned u = 0x12345678;
        printf("u = 0x%x", u);
        unsigned char *p = memchr(&u, 0x34, sizeof u);
        if (p != NULL) printf(", p points to 0x%x", *p);
        putchar('\n');
    }

    The output is:

    u = 0x12345678, p points to 0x34

    (Conceivably p might be a null pointer if unsigned int has padding
    bits that cause 0x34 not to be stored in a single byte.)

    A call to memchr with a char* argument is, I suspect, more likely to
    appear in real code.

    The underlying issue is that the implicit conversions that happen with
    function arguments do not happen with operands of a generic selection.
    (The generic functions in <tgmath.h> are defined in a way that this
    isn't an issue, as far as I can tell.)
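
    A small sketch of that difference, assuming nothing beyond C11. PTR_KIND,
    takes_const_void and check_no_conversion are hypothetical names. The same
    pointer is accepted as a function argument but matches no association in
    a generic selection until it is cast.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classifier: reports which association an expression selects. */
#define PTR_KIND(p) _Generic((p),     \
    void *:       "void *",           \
    const void *: "const void *",     \
    default:      "no match")

/* An ordinary function parameter, for comparison. */
static void takes_const_void(const void *s) { (void)s; }

int check_no_conversion(void) {
    const unsigned u = 0x12345678;

    /* Function argument: const unsigned * converts to const void * implicitly. */
    takes_const_void(&u);

    /* Generic selection: the type must match exactly, so &u falls through. */
    assert(strcmp(PTR_KIND(&u), "no match") == 0);

    /* With an explicit cast the type matches. */
    assert(strcmp(PTR_KIND((const void *)&u), "const void *") == 0);
    return 1;
}
```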
  • From Jakob Bohm@jb-usenet@wisemo.com.invalid to comp.std.c on Fri Jun 2 15:03:00 2023
    From Newsgroup: comp.std.c

    On 2023-06-02 07:18, Keith Thompson wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The latest draft of the upcoming C23 standard is:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
    It introduces several type-generic functions in <string.h>, replacing
    normal functions of the same names: memchr, strchr, strpbrk, strrchr,
    strstr.

    I'll use strchr() as an example; the same applies to the other str*()
    generic functions (but not to memchr()).
    [...]

    Just after I posted the above, I thought of a potential issue with
    memchr() that just might affect real code.

    In C17 and earlier, memchr() has this declaration:

    void *memchr(const void *s, int c, size_t n);

    Given the implicit conversions between void* and other object pointer
    types, the first argument can be a pointer to any const object type.
    This is something that might plausibly be used in practice, unlike
    (I think) passing a void pointer to the str*() functions.

    It's probably impractical to fix this, since it would require
    the generic selection to cover all possible object pointer types.
    Any code that depends on the current behavior would have to add
    (void*) or (const void*) casts to ensure that the type actually
    matches.

    For example, this (contrived) program is valid in C17 and earlier:

    #include <stdio.h>
    #include <string.h>
    int main(void) {
        const unsigned u = 0x12345678;
        printf("u = 0x%x", u);
        unsigned char *p = memchr(&u, 0x34, sizeof u);
        if (p != NULL) printf(", p points to 0x%x", *p);
        putchar('\n');
    }

    The output is:

    u = 0x12345678, p points to 0x34

    (Conceivably p might be a null pointer if unsigned int has padding
    bits that cause 0x34 not to be stored in a single byte.)

    A call to memchr with a char* argument is, I suspect, more likely to
    appear in real code.

    The underlying issue is that the implicit conversions that happen with
    function arguments do not happen with operands of a generic selection.
    (The generic functions in <tgmath.h> are defined in a way that this
    isn't an issue, as far as I can tell.)


    Would the ability of the (new) generic mechanism to choose among a
    short prioritized list of types, combined with a rule that all the
    argument promotion rules continue to apply to the selection, solve the
    conundrum? This is what typically happens with C++ overloads done for
    the same purposes.

    So if the generic declaration gives the priority list [char*, const
    char*], then non-const pointers compatible with char* formal argument
    types will get selected first and return a non-const char*, while other
    pointers compatible with const char* formal arguments will be selected
    second and return a const char*. This would even work if the generic
    declaration also covered the wchar_t and related types, omitting
    whichever of UTF-16/UCS-4 is equivalent to the implementation-defined
    wchar_t*.

    Sorry for not having a copy of the new syntax handy.


    Enjoy

    Jakob
    --
    Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
    Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
    This public discussion message is non-binding and may contain errors.
    WiseMo - Remote Service Management for PCs, Phones and Embedded
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Fri Jun 2 11:52:38 2023
    From Newsgroup: comp.std.c

    Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
    On 2023-06-02 07:18, Keith Thompson wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The latest draft of the upcoming C23 standard is:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
    It introduces several type-generic functions in <string.h>, replacing
    normal functions of the same names: memchr, strchr, strpbrk, strrchr,
    strstr.

    I'll use strchr() as an example; the same applies to the other str*()
    generic functions (but not to memchr()).
    [...]
    Just after I posted the above, I thought of a potential issue with
    memchr() that just might affect real code.
    In C17 and earlier, memchr() has this declaration:
    void *memchr(const void *s, int c, size_t n);
    Given the implicit conversions between void* and other object pointer
    types, the first argument can be a pointer to any const object type.
    This is something that might plausibly be used in practice, unlike
    (I think) passing a void pointer to the str*() functions.
    It's probably impractical to fix this, since it would require
    the generic selection to cover all possible object pointer types.
    Any code that depends on the current behavior would have to add
    (void*) or (const void*) casts to ensure that the type actually
    matches.
    For example, this (contrived) program is valid in C17 and earlier:
    #include <stdio.h>
    #include <string.h>
    int main(void) {
        const unsigned u = 0x12345678;
        printf("u = 0x%x", u);
        unsigned char *p = memchr(&u, 0x34, sizeof u);
        if (p != NULL) printf(", p points to 0x%x", *p);
        putchar('\n');
    }
    The output is:
    u = 0x12345678, p points to 0x34
    (Conceivably p might be a null pointer if unsigned int has padding
    bits that cause 0x34 not to be stored in a single byte.)
    A call to memchr with a char* argument is, I suspect, more likely to
    appear in real code.
    The underlying issue is that the implicit conversions that happen with
    function arguments do not happen with operands of a generic selection.
    (The generic functions in <tgmath.h> are defined in a way that this
    isn't an issue, as far as I can tell.)


    Would the ability of the (new) generic mechanism to choose among a
    short prioritized list of types, combined with a rule that all the
    argument promotion rules continue to apply to the selection, solve the
    conundrum? This is what typically happens with C++ overloads done for
    the same purposes.

    I don't think so.

    Generic selections (_Generic) have been in the language since C11. The
    issue is that the argument promotion rules do *not* apply. C23 adds a
    new use for them for several functions declared in <string.h>. (And the
    corresponding functions in <wchar.h>; I had forgotten about those.)

    So if the generic declaration gives the priority list [char*, const
    char*], then non-const pointers compatible with char* formal argument
    types will get selected first and return a non-const char*, while other
    pointers compatible with const char* formal arguments will be selected
    second and return a const char*. This would even work if the generic
    declaration also covered the wchar_t and related types, omitting
    whichever of UTF-16/UCS-4 is equivalent to the implementation-defined
    wchar_t*.

    The str*() generic functions can be handled by accepting:
    char* // selects function returning char*
    const char* // selects function returning const char*
    void* // selects function returning char*
    const void* // selects function returning const char*

    For memchr(), which currently takes an argument of type const
    void*, I don't think there's any way for the new generic function
    to accept arguments of all pointer-to-[const]-object types (without
    adding a new language mechanism, which isn't practical this late
    in the process). It would be possible to accept pointers to char,
    signed char, and unsigned char as well as pointers to void, which
    might handle most of the existing cases, but it might be cleaner
    just to require a pointer to void (which could break some existing
    valid code).
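
    A rough sketch of the intermediate option mentioned above for memchr():
    accept pointers to void and to the three character types, in const and
    non-const forms. This is an illustration under assumed names (my_memchr,
    memchr_mut, memchr_const, check_my_memchr), not text from N3096. Any
    other pointer type, such as const unsigned *, would still match no
    association.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: the same search, differing only in constness. */
static void *memchr_mut(void *s, int c, size_t n) { return memchr(s, c, n); }
static const void *memchr_const(const void *s, int c, size_t n) { return memchr(s, c, n); }

/* Hypothetical generic wrapper covering void and the character types. */
#define my_memchr(s, c, n) _Generic((s),  \
    void *:                memchr_mut,    \
    char *:                memchr_mut,    \
    signed char *:         memchr_mut,    \
    unsigned char *:       memchr_mut,    \
    const void *:          memchr_const,  \
    const char *:          memchr_const,  \
    const signed char *:   memchr_const,  \
    const unsigned char *: memchr_const)((s), (c), (n))

int check_my_memchr(void) {
    unsigned char buf[] = {0x12, 0x34, 0x56};
    const char *text = "hello";

    /* The search result is the same as with plain memchr(). */
    assert(my_memchr(buf, 0x34, sizeof buf) == buf + 1);
    assert(my_memchr(text, 'l', 5) == text + 2);

    /* The result type follows the constness of the argument. */
    assert(_Generic(my_memchr(buf, 0x34, sizeof buf), void *: 1, default: 0));
    assert(_Generic(my_memchr(text, 'l', 5), const void *: 1, default: 0));
    return 1;
}
```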
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.std.c on Fri Jun 2 22:01:38 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The latest draft of the upcoming C23 standard is:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
    It introduces several type-generic functions in <string.h>, replacing
    normal functions of the same names: memchr, strchr, strpbrk, strrchr,
    strstr.

    I'll use strchr() as an example; the same applies to the other str*()
    generic functions (but not to memchr()).
    [...]

    Just after I posted the above, I thought of a potential issue with
    memchr() that just might affect real code.

    In C17 and earlier, memchr() has this declaration:

    void *memchr(const void *s, int c, size_t n);

    Given the implicit conversions between void* and other object pointer
    types, the first argument can be a pointer to any const object type.
    This is something that might plausibly be used in practice, unlike
    (I think) passing a void pointer to the str*() functions.

    It's probably impractical to fix this, since it would require
    the generic selection to cover all possible object pointer types.

    There may be a way round that... This trick converts any object pointer
    to a const void * or a void * depending on the qualifiers of the object
    pointer:

    #include <stdio.h>

    #ifndef T
    #define T const int
    #endif

    int main(void)
    {
        T i;
        puts(_Generic((1 ? &i : (void *)&(int){0}),
                      void *: "void *",
                      const void *: "const void *",
                      default: "other"));
    }

    (Compile with -DT=int for example to test the other case.)

    Taking the address of (int){0} is simply a way to get a void * that is
    not a null pointer constant. One could, in a macro taking a pointer, just
    use

    (1 ? (p) : (void *)(p))

    but some compilers will warn that the cast discards the const even
    though the overall effect of the expression is to keep it.
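
    One way the trick might be folded into a memchr()-style macro, sketched
    under assumed names (AS_QVOID, any_memchr, any_memchr_mut,
    any_memchr_const, check_any_memchr). It is not an implementation taken
    from any library. The conditional expression is used only for its type
    inside the generic selection, so the argument is evaluated once, in the
    final call. The byte check assumes unsigned has no padding bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: the same search, differing only in constness. */
static void *any_memchr_mut(void *s, int c, size_t n) { return memchr(s, c, n); }
static const void *any_memchr_const(const void *s, int c, size_t n) { return memchr(s, c, n); }

/* Has type void * or const void *, following the qualifiers of p.
   The second operand is a void * that is not a null pointer constant. */
#define AS_QVOID(p) (1 ? (p) : (void *)&(int){0})

/* Hypothetical generic wrapper: selects on the converted type only. */
#define any_memchr(s, c, n) _Generic(AS_QVOID(s), \
    void *:       any_memchr_mut,                 \
    const void *: any_memchr_const)((s), (c), (n))

int check_any_memchr(void) {
    const unsigned cu[2] = {7u, 42u};
    unsigned mu[2] = {7u, 42u};

    /* Pointers to const and non-const unsigned are both accepted. */
    const void *cp = any_memchr(cu, 42, sizeof cu);
    void *mp = any_memchr(mu, 42, sizeof mu);
    assert(cp != NULL && *(const unsigned char *)cp == 42);
    assert(mp != NULL && *(unsigned char *)mp == 42);

    /* The result type follows the constness of the argument. */
    assert(_Generic(any_memchr(cu, 42, sizeof cu), const void *: 1, default: 0));
    assert(_Generic(any_memchr(mu, 42, sizeof mu), void *: 1, default: 0));
    return 1;
}
```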
    --
    Ben.
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Fri Jun 2 15:11:30 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The latest draft of the upcoming C23 standard is:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
    It introduces several type-generic functions in <string.h>, replacing
    normal functions of the same names: memchr, strchr, strpbrk, strrchr,
    strstr.

    I'll use strchr() as an example; the same applies to the other str*()
    generic functions (but not to memchr()).
    [...]

    Just after I posted the above, I thought of a potential issue with
    memchr() that just might affect real code.
    [snip]

    And I think I've found an even more serious issue with bsearch().

    In C17 and earlier, bsearch() is declared as:

    void *bsearch(const void *key, const void *base,
                  size_t nmemb, size_t size,
                  int (*compar)(const void *, const void *));

    `base` points to the object being searched. The returned value is a
    pointer to non-const void pointing to an element of the searched object.

    C23 (as of N3096) has:

    QVoid *bsearch(const void *key, QVoid *base, size_t nmemb, size_t size,
                   int (*compar)(const void *, const void *));

    where QVoid is either void or const void, depending on the type of the
    base argument.

    The obvious implementation using _Generic will reject a base argument of
    a type other than `void*` or `const void*`.
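
    A sketch of what such an "obvious implementation" could look like,
    assuming hypothetical names (my_bsearch, bsearch_mut, bsearch_const,
    cmp_int, check_my_bsearch). It is not taken from N3096 or from any
    library. A typical call that passes an array of int as base compiles
    only once a cast is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers: the same search, differing only in constness. */
static void *bsearch_mut(const void *key, void *base, size_t nmemb, size_t size,
                         int (*compar)(const void *, const void *)) {
    return bsearch(key, base, nmemb, size, compar);
}
static const void *bsearch_const(const void *key, const void *base, size_t nmemb,
                                 size_t size,
                                 int (*compar)(const void *, const void *)) {
    return bsearch(key, base, nmemb, size, compar);
}

/* Hypothetical generic wrapper: selects on the exact type of base. */
#define my_bsearch(key, base, nmemb, size, compar) _Generic((base), \
    void *:       bsearch_mut,                                      \
    const void *: bsearch_const)((key), (base), (nmemb), (size), (compar))

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int check_my_bsearch(void) {
    const int table[] = {1, 3, 5, 7};
    int key = 5;

    /* my_bsearch(&key, table, 4, sizeof table[0], cmp_int) would be
       rejected: const int * matches neither association. */
    const void *hit = my_bsearch(&key, (const void *)table, 4,
                                 sizeof table[0], cmp_int);
    assert(hit == &table[2]);
    return 1;
}
```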

    (I see Ben posted a followup with a possible solution. I haven't
    studied it yet.)

    I've been discussing this by email with the editors (listed on the first
    page of N3096).

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Sun Jun 4 21:37:00 2023
    From Newsgroup: comp.std.c

    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The latest draft of the upcoming C23 standard is:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
    It introduces several type-generic functions in <string.h>, replacing
    normal functions of the same names: memchr, strchr, strpbrk, strrchr,
    strstr.

    I'll use strchr() as an example; the same applies to the other str*()
    generic functions (but not to memchr()).
    [...]

    Just after I posted the above, I thought of a potential issue with
    memchr() that just might affect real code.

    In C17 and earlier, memchr() has this declaration:

    void *memchr(const void *s, int c, size_t n);

    Given the implicit conversions between void* and other object pointer
    types, the first argument can be a pointer to any const object type.
    This is something that might plausibly be used in practice, unlike
    (I think) passing a void pointer to the str*() functions.

    It's probably impractical to fix this, since it would require
    the generic selection to cover all possible object pointer types.

    There may be a way round that... This trick converts any object pointer
    to a const void * or a void * depending on the qualifiers of the object
    pointer:

    #include <stdio.h>

    #ifndef T
    #define T const int
    #endif

    int main(void)
    {
        T i;
        puts(_Generic((1 ? &i : (void *)&(int){0}),
                      void *: "void *",
                      const void *: "const void *",
                      default: "other"));
    }

    (Compile with -DT=int for example to test the other case.)

    Taking the address of (int){0} is simply a way to get a void * that is
    not a null pointer constant. One could, in a macro taking a pointer, just
    use

    (1 ? (p) : (void *)(p))

    but some compilers will warn that the cast discards the const even
    though the overall effect of the expression is to keep it.

    I think you're right. I'll pass it on to the editors.
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Sun Jun 4 21:50:55 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
    [SNIP]
    I think you're right. I'll pass it on to the editors.

    Ben, I Cc'ed you on the email and got a bounce indicating that your
    mailbox is full. Everyone else, sorry about the off-topic noise.