• sizeof struct with flexible array: when did it change?

    From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Mon Oct 7 02:32:13 2024
    From Newsgroup: comp.lang.c

    C99 said that the size of a structure that ends in a flexible array
    member is the same as the offset of that flexible member in a
    similar structure in which the array has some unspecified size.

    The latest draft says that the size is calculated as if the flexible
    array member were omitted, except that there may be more padding than
    the omission would imply.

    I can't think of a reasonable interpretation of the original
    wording which would allow the size to be other than the offset
    of the array, when the array is of a character type.

    The current wording clearly does allow the size to go beyond the offset
    in that case.

    Don't get burned: don't rely on the size of a flexible array struct.
    Use offsetof on that flexible member instead.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Oct 6 20:43:52 2024
    From Newsgroup: comp.lang.c

    Kaz Kylheku <643-408-1753@kylheku.com> writes:

    C99 said that the size of a structure that ends in a flexible array
    member is the same as the offset of that flexible member in a
    similar structure in which the array has some unspecified size.

    The latest draft says that the size is calculated as if the flexible
    array member were omitted, except that there may be more padding than
    the omission would imply.

    The change was made in a TC to C99, sometime in the early
    2000s. (No, I don't know which TC specifically, but the
    wording change can be seen in N1256.)
  • From Nick Bowler@nbowler@draconx.ca to comp.lang.c on Mon Oct 7 18:32:33 2024
    From Newsgroup: comp.lang.c

    On Mon, 7 Oct 2024 02:32:13 -0000 (UTC), Kaz Kylheku wrote:
    I can't think of a reasonable interpretation of the original wording
    which would allow the size to be other than the offset of the array,
    when the array is of a character type.

    The current wording clearly does allow the size to go beyond the offset
    in that case.

    The original wording includes no requirement that the offset of the
    replacement array used for the size calculation has any relationship
    whatsoever with the offset of the flexible array member.

    For example, in

    struct foo { int a; char b; char c[]; };

    in many real-world implementations the offset of c is 5 but the size
    of the structure is 8. On these implementations, the size matches the
    offset of c in a similar structure where c is replaced by a length-1
    array of int, and also matches the size of a similar structure with c
    deleted, so this is consistent with old and new wordings.

    I don't think the updated wording alters any implementation requirement,
    but it does seem quite a bit less complicated to explain.

    Don't get burned: don't rely on the size of a flexible array struct.
    Use the offsetof that flexible member.

    An evil compiler could probably make the size less than the offset
    of the flexible array member and be conforming, with both old and
    new wordings. This would break some examples but an evil compiler
    obviously won't care about non-normative trivialities like examples.

    So you need to use offsetof when porting to the DeathStation 9000.

    Otherwise, avoid evil compilers; the handful of extra bytes added to
    some malloc calls probably makes no practical difference.
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Mon Oct 7 23:42:41 2024
    From Newsgroup: comp.lang.c

    On 2024-10-07, Nick Bowler <nbowler@draconx.ca> wrote:
    On Mon, 7 Oct 2024 02:32:13 -0000 (UTC), Kaz Kylheku wrote:
    I can't think of a reasonable interpretation of the original wording
    which would allow the size to be other than the offset of the array,
    when the array is of a character type.

    The current wording clearly does allow the size to go beyond the offset
    in that case.

    The original wording includes no requirement that the offset of the
    replacement array used for the size calculation has any relationship
    whatsoever with the offset of the flexible array member.

    But there is no reason why they would be different.


    For example, in

    struct foo { int a; char b; char c[]; };

    in many real-world implementations the offset of c is 5 but the size
    of the structure is 8.

    How that can be is clear under the new wording. The most strictly
    aligned member is the int a. Assuming sizeof(int) is 4, that calls
    for 3 byte padding after the b.

    On these implementations, the size matches the
    offset of c in a similar structure where c is replaced by a length-1
    array of int, and also matches the size of a similar structure with c
    deleted, so this is consistent with old and new wordings.

    array of int, what? The element type of the replacement array must be
    the same:

    "First, the size of the structure shall be equal to the offset of the
    last element of an otherwise identical structure that replaces the
    flexible array member with an array of unspecified length.(106)
    ---
    106. The length is unspecified to allow for the fact that
    implementations may give array members different
    alignments according to their lengths."

    The structure being "otherwise identical" means only the array size
    varies in an unspecified manner.

    The alignment changing for an array of char, due to variation in size,
    makes no sense.

    Those compilers you refer to do *not* vary the alignment of a char
    array based on its size. For any X from, say, 1 to 256,
    the offset of c will be 5 in the following:

    struct foo { int a; char b; char c[X]; };

    C99 requires them to give a size of 5 to this structure. And that
    of course causes an alignment problem if the structure is arrayed.

    The real solution to all this would have been to specify that
    structures which have a flexible array member, directly or
    recursively, are incomplete types.

    And thus:

    sizeof (struct foo) -> constraint violation

    { struct foo local_foo; } -> constraint violation

    typedef struct foo foo_array[42]; -> constraint violation

    Don't support taking the size, defining objects, or making arrays.

    The present wording does allow sane use of these structures as array
    elements.

    What GCC seems to be doing is simply nothing special. When determining
    the most strictly aligned member of the struct, it takes the flexible
    array into account (the alignment of its element type). It otherwise
    ignores it (or perhaps treats it as a size zero subobject). The
    structure is padded after that for the sake of the most strictly
    aligned member.

    If it were specified that way in ISO C, it would be an improvement:
    that the array's element type is used when determining what is the
    most strictly aligned member of the structure, but the array is
    otherwise considered deleted.

    I don't think the updated wording alters any implementation requirement,
    but it does seem quite a bit less complicated to explain.

    Don't get burned: don't rely on the size of a flexible array struct.
    Use the offsetof that flexible member.

    An evil compiler could probably make the size less than the offset
    of the flexible array member and be conforming, with both old and
    new wordings. This would break some examples but an evil compiler
    obviously won't care about non-normative trivialities like examples.

    If the size is anything other than what the program expects, whether
    it is larger or smaller, that breaks the program.

    For instance, if the wrong value is used when displacing a pointer to
    the flexible member to recover a pointer to the struct.

    This issue showed up in exactly one program of mine in which I
    experimented with using the flexible array member.

    It was reported by a user who ran into a crash.

    In all previous coding I have always used the [1] struct hack,
    or else something like this:

    #if __STDC_VERSION__ >= 199901L
    #define FLEX_ARRAY
    #elif __GNUC__
    #define FLEX_ARRAY 0
    #else
    #define FLEX_ARRAY 1
    #endif

    struct foo { ...; char s[FLEX_ARRAY]; };

    and then of course in such a program you wouldn't think of
    using anything other than offsetof(struct foo, s).

    But it has also shown up in another way in another program of mine.

    In the TXR Lisp FFI type compiler, the size of a flexible struct
    is not calculated the way GCC does it.

    TXR 296:

    1> (typedef test (struct test (i int) (s short) (a (array char))))
    #<ffi-type (struct test (i int) (s short) (a (array char)))>
    2> (sizeof test)
    6
    3> (offsetof test a)
    6

    Private repo with fix:

    1> (typedef test (struct test (i int) (s short) (a (array char))))
    #<ffi-type (struct test (i int) (s short) (a (array char)))>
    2> (sizeof test)
    8
    3> (offsetof test a)
    6
  • From Jeremy Brubaker@jbrubake.362@orionarts.invalid to comp.lang.c on Wed Oct 9 12:55:42 2024
    From Newsgroup: comp.lang.c

    On 2024-10-07, Kaz Kylheku wrote:
    On 2024-10-07, Nick Bowler <nbowler@draconx.ca> wrote:
    On Mon, 7 Oct 2024 02:32:13 -0000 (UTC), Kaz Kylheku wrote:
    What GCC seems to be doing is simply nothing special. When determining
    the most strictly aligned member of the struct, it takes the flexible
    array into account (the alignment of its element type). It otherwise
    ignores it (or perhaps treats it as a size zero subobject). The
    structure is padded after that for the sake of the most strictly
    aligned member.

    Don't get burned: don't rely on the size of a flexible array struct.
    Use the offsetof that flexible member.

    If the size is anything other than what the program expects, whether
    it is larger or smaller, that breaks the program.

    For instance, if the wrong value is used when displacing a pointer to
    the flexible member to recover a pointer to the struct.

    This issue showed up in exactly one program of mine in which I
    experimented with using the flexible array member.

    It was reported by a user who ran into a crash.


    As the user who had the pleasure of running into said crash, here is a
    brief demo of the sizes and addresses reported by my system (gcc 13.3.1)
    using both methods of determining the start of the struct:


    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    typedef struct dstr {
        int a;
        size_t b;
        int c;
        char str[];     /* flexible array member */
    } dstr;

    /* Same members as dstr, minus the flexible array member. */
    typedef struct ref {
        int a;
        size_t b;
        int c;
    } ref;

    /* old: wrongly assumes str begins sizeof (dstr) bytes into the struct */
    #define old_dstr_of(str) ((dstr *) ((str) - sizeof (dstr)))
    /* new: uses the actual offset of the flexible member */
    #define new_dstr_of(s) ((dstr *) ((s) - offsetof (struct dstr, str)))

    int main (void)
    {
        dstr *ds = malloc (sizeof (dstr));

        if (ds == NULL)
            return EXIT_FAILURE;

        printf ("sizeof(int) %zu\n", sizeof (int));
        printf ("sizeof(char) %zu\n", sizeof (char));
        printf ("sizeof(size_t) %zu\n", sizeof (size_t));
        printf ("sizeof(dstr) %zu\n", sizeof (dstr));
        printf ("sizeof(ref) %zu\n", sizeof (ref));
        puts ("");

        puts ("Addresses:");
        /* %p expects a void *, hence the casts */
        printf ("ds %p\n", (void *) ds);
        printf ("ds->str %p\n", (void *) ds->str);
        printf ("old dstr_of %p\n", (void *) old_dstr_of(ds->str));
        printf ("new dstr_of %p\n", (void *) new_dstr_of(ds->str));

        free (ds);
        return 0;
    }

    And the output on my machine:

    sizeof(int) 4
    sizeof(char) 1
    sizeof(size_t) 8
    sizeof(dstr) 24
    sizeof(ref) 24

    Addresses:
    ds 0x9d62a0
    str 0x9d62b4
    old dstr_of 0x9d629c
    new dstr_of 0x9d62a0
    --
    () www.asciiribbon.org | Jeremy Brubaker
    /\ - against html mail | јЬruЬаkе@оrіоnаrtѕ.іо / neonrex on IRC

    Success is something I will dress for when I get there, and not until.
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Wed Oct 9 15:06:23 2024
    From Newsgroup: comp.lang.c

    Jeremy Brubaker <jbrubake.362@orionarts.invalid> writes:
    On 2024-10-07, Kaz Kylheku wrote:
    On 2024-10-07, Nick Bowler <nbowler@draconx.ca> wrote:
    On Mon, 7 Oct 2024 02:32:13 -0000 (UTC), Kaz Kylheku wrote:
    What GCC seems to be doing is simply nothing special. When determining
    the most strictly aligned member of the struct, it takes the flexible
    array into account (the alignment of its element type). It otherwise
    ignores it (or perhaps treats it as a size zero subobject). The
    structure is padded after that for the sake of the most strictly
    aligned member.

    Don't get burned: don't rely on the size of a flexible array struct.
    Use the offsetof that flexible member.

    If the size is anything other than what the program expects, whether
    it is larger or smaller, that breaks the program.

    For instance, if the wrong value is used when displacing a pointer to
    the flexible member to recover a pointer to the struct.

    This issue showed up in exactly one program of mine in which I
    experimented with using the flexible array member.

    It was reported by a user who ran into a crash.


    As the user who had the pleasure of running into said crash, here is a
    brief demo of the sizes and addresses reported by my system (gcc 13.3.1)
    using both methods of determining the start of the struct:


    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    typedef struct dstr {
    int a;
    size_t b;
    int c;
    char str[];
    } dstr;

    typedef struct ref {
    int a;
    size_t b;
    int c;
    } ref;

    #define old_dstr_of(str) ((dstr *) ((str) - sizeof (dstr)))
    #define new_dstr_of(s) ((dstr *) ((s) - offsetof (struct dstr, str)))

    int main (int argc, char ** argv)
    {
    dstr *ds = malloc (sizeof (dstr));

    printf ("sizeof(int) %zu\n", sizeof (int));
    printf ("sizeof(char) %zu\n", sizeof (char));
    printf ("sizeof(size_t) %zu\n", sizeof (size_t));
    printf ("sizeof(dstr) %zu\n", sizeof (dstr));
    printf ("sizeof(ref) %zu\n", sizeof (ref));
    puts ("");

    puts ("Addresses:");
    printf ("ds %p\n", ds);
    printf ("ds->str %p\n", ds->str);
    printf ("old dstr_of %p\n", old_dstr_of(ds->str));
    printf ("new dstr_of %p\n", new_dstr_of(ds->str));

    }

    And the output on my machine:

    sizeof(int) 4
    sizeof(char) 1
    sizeof(size_t) 8
    sizeof(dstr) 24
    sizeof(ref) 24

    Addresses:
    ds 0x9d62a0
    str 0x9d62b4
    old dstr_of 0x9d629c
    new dstr_of 0x9d62a0

    On my system your program produces
    similar results.

    $ /tmp/aa
    sizeof(int) 4
    sizeof(char) 1
    sizeof(size_t) 8
    sizeof(dstr) 24
    sizeof(ref) 24

    Addresses:
    ds 0x2350010
    str 0x2350024
    old dstr_of 0x235000c
    new dstr_of 0x2350010

    However, after modifying the
    structure definitions to avoid internal padding:

    typedef struct dstr {
    int a;
    int c;
    size_t b;
    char str[];
    } dstr;

    typedef struct ref {
    int a;
    int c;
    size_t b;
    } ref;


    $ /tmp/aa
    sizeof(int) 4
    sizeof(char) 1
    sizeof(size_t) 8
    sizeof(dstr) 16
    sizeof(ref) 16

    Addresses:
    ds 0xbce010
    str 0xbce020
    old dstr_of 0xbce010
    new dstr_of 0xbce010
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Oct 13 21:55:32 2024
    From Newsgroup: comp.lang.c

    Nick Bowler <nbowler@draconx.ca> writes:

    On Mon, 7 Oct 2024 02:32:13 -0000 (UTC), Kaz Kylheku wrote:

    I can't think of a reasonable interpretation of the original
    wording which would allow the size to be other than the offset
    of the array, when the array is of a character type.

    The current wording clearly does allow the size to go beyond
    the offset in that case.

    The original wording includes no requirement that the offset of
    the replacement array used for the size calculation has any
    relationship whatsoever with the offset of the flexible array
    member.

    The original wording is moot because it was superseded by the TC.
    The purpose of a TC is not to change the language but to clarify
    what semantics are intended. The point of the revised wording in
    the TC is to say "this is what the earlier wording meant".