• Breaking a table of record rows into an array

    From Mr. Man-wai Chang@toylet.toylet@gmail.com to comp.lang.awk on Fri Mar 1 21:33:55 2024
    From Newsgroup: comp.lang.awk

    I am new to Awk programming.

    Given a text table with the following sample entry:

    [ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6] frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
    dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]

    How do you use Awk to quickly & easily break it into:

    bssid="04:9F:xx:xx:xx:xx";
    ssid[bssid]="[HOME]";
    channel[bssid]="6";
    frequency[bssid]="2437";
    ....
    rate[bssid]="450";
    enc[bssid]="Group-AES-CCMP CCMP PSK2";
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Mar 1 15:52:42 2024
    From Newsgroup: comp.lang.awk

    On 01.03.2024 14:33, Mr. Man-wai Chang wrote:
    I am new to Awk programming.

    Given a text table with the following sample entry:

    [ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6] frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
    dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]

    Is that all on one line? (If it's on multiple lines you should
    provide more context information about how multiple records are
    separated from each other.)


    How do you use Awk to quickly & easily break it into:

    The nasty thing is the nested '[...]'.

    One quick way is to choose an appropriate field separator. For
    example

    BEGIN { FS="] " }
    { for (i=1; i<=NF; i++)
        print $i
    }

    will produce on one data line like the above (it also works if
    the data is spread across three lines, but you still need to
    know the record separators then)...

    [ 8
    SSID[ [HOME]
    BSSID[04:9F:xx:xx:xx:xx
    channel[ 6]
    frequency[2437
    numsta[1
    rssi[-63
    noise[-75
    beacon[98
    cap[1411]
    dtim[0
    rate[450
    enc[Group-AES-CCMP CCMP PSK2

    If the basic splitting is okay you can do the formatting;
    using sub() or gsub() on $i to remove/replace parts of the
    text (e.g. to remove undesired spaces), use string
    concatenation (e.g. to add the "]" again which had been
    removed with the field splitting), etc., whatever needed.

    Janis


    bssid="04:9F:xx:xx:xx:xx";
    ssid[bssid]="[HOME]";
    channel[bssid]="6";
    frequency[bssid]="2437";
    ....
    rate[bssid]="450";
    enc[bssid]="Group-AES-CCMP CCMP PSK2";
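
The split-then-clean-up approach described above can be taken one step further. What follows is only a sketch, assuming each record sits on a single line and using a shortened copy of the sample entry. The names `line`, `parsed`, `names` and `values` are made up for illustration, and an SSID that itself contains "] " would defeat the field splitting.

```shell
line='[ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6] frequency[2437] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]'

parsed=$(printf '%s\n' "$line" | awk '
BEGIN { FS = "] " }
{
    sub(/\] *$/, "")                  # drop the bracket closing the final field
    n = 0
    bssid = ""
    for (i = 1; i <= NF; i++) {
        p = index($i, "[")            # first "[" separates name from value
        if (p < 2) continue           # skip the unnamed leading "[ 8" field
        name = tolower(substr($i, 1, p - 1))
        value = substr($i, p + 1)
        gsub(/^ +| +$/, "", value)    # trim the padding inside the brackets
        if (name == "bssid")
            bssid = value
        else {
            names[++n] = name
            values[n] = value
        }
    }
    # Emit the key first, then one assignment per remaining field.
    printf "bssid=\"%s\";\n", bssid
    for (j = 1; j <= n; j++)
        printf "%s[bssid]=\"%s\";\n", names[j], values[j]
}')
printf '%s\n' "$parsed"
```

For the shortened sample this prints `bssid="04:9F:xx:xx:xx:xx";` followed by one `name[bssid]="value";` line each for ssid, channel, frequency, rate and enc.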

  • From jeorge@jeorge@invalid.invalid to comp.lang.awk on Fri Mar 1 15:59:59 2024
    From Newsgroup: comp.lang.awk

    I am new to Awk programming.

    Given a text table with the following sample entry:

    [ 8] SSID[ [HOME]] BSSID[04:9F:xx:xx:xx:xx] channel[ 6] frequency[2437] numsta[1] rssi[-63] noise[-75] beacon[98] cap[1411]
    dtim[0] rate[450] enc[Group-AES-CCMP CCMP PSK2 ]

    How do you use Awk to quickly & easily break it into:

    bssid="04:9F:xx:xx:xx:xx";
    ssid[bssid]="[HOME]";
    channel[bssid]="6";
    frequency[bssid]="2437";
    ....
    rate[bssid]="450";
    enc[bssid]="Group-AES-CCMP CCMP PSK2";

    Found your issue interesting enough to attempt a solution:


    #../sandbox/test.awk
    BEGIN { FS="\\[[ []*" ; RS="]" }
    {
        sub("\n","")
        for (i=1; i<=NF; i+=2) {
            ($i ~ /^$/) ? $i = "Station" : sub(/^ */,"\t",$i)
            if ($(i+1) != "")
                printf "%s[bssid] = %s\n", $i,$(i+1)
        }
    }

    $ nawk -f test.awk test.data
    Station[bssid] = 8
    SSID[bssid] = HOME
    BSSID[bssid] = 04:9F:xx:xx:xx:xx
    channel[bssid] = 6
    frequency[bssid] = 2437
    numsta[bssid] = 1
    rssi[bssid] = -63
    noise[bssid] = -75
    beacon[bssid] = 98
    cap[bssid] = 1411
    dtim[bssid] = 0
    rate[bssid] = 450
    enc[bssid] = Group-AES-CCMP CCMP PSK2
  • From Mr. Man-wai Chang@toylet.toylet@gmail.com to comp.lang.awk on Sat Mar 2 00:23:22 2024
    From Newsgroup: comp.lang.awk

    On 1/3/2024 11:59 pm, jeorge@invalid.invalid wrote:
    I am new to Awk programming.

    Given a text table with the following sample entry:


    Being new to Awk programming, I am amazed to learn that Awk can
    automatically use a string as an array index. There is also automatic
    type conversion. Very much like Visual FoxPro and the other dBase
    dialects I am more fluent with! :)

    But no dBase dialect can directly use a string as an array index.
    You can work around it using macro substitution, but that's not a
    direct solution like an Awk array.
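
A tiny sketch of both features mentioned here, string subscripts and automatic string-to-number conversion. The values are borrowed from the sample record; the names `result` and `state` are illustrative only.

```shell
result=$(awk 'BEGIN {
    bssid = "04:9F:xx:xx:xx:xx"      # any string can serve as a subscript
    channel[bssid] = "6"
    rssi[bssid] = "-63"

    # Numeric-looking strings are converted when used in arithmetic.
    print channel[bssid] + 5
    print rssi[bssid] - 10

    # Membership test that does not create the element as a side effect.
    state = ("no-such-bssid" in channel) ? "known" : "unknown"
    print state
}')
printf '%s\n' "$result"
```

This prints 11, -73 and unknown, one per line.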


  • From Mr. Man-wai Chang@toylet.toylet@gmail.com to comp.lang.awk on Sat Mar 2 00:26:12 2024
    From Newsgroup: comp.lang.awk

    On 1/3/2024 10:52 pm, Janis Papanagnou wrote:

    The nasty thing is the nested '[...]'.

    One quick way is to choose an appropriate field separator. For
    example


    Even nastier is that a Wi-Fi SSID can use any kind of printable
    characters, INCLUDING Unicode! :)

    Some hardware manufacturers like Cisco do restrict the printable
    characters you can use in setting the SSID.

  • From Mr. Man-wai Chang@toylet.toylet@gmail.com to comp.lang.awk on Tue Mar 12 01:41:32 2024
    From Newsgroup: comp.lang.awk

    On 1/3/2024 10:52 pm, Janis Papanagnou wrote:

    BEGIN { FS="] " }
    { for (i=1; i<=NF; i++)
    print $i
    }

    Use of `NF` in awk command - Stack Overflow https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Mon Mar 11 11:46:41 2024
    From Newsgroup: comp.lang.awk

    "Mr. Man-wai Chang" <toylet.toylet@gmail.com> writes:
    On 1/3/2024 10:52 pm, Janis Papanagnou wrote:
    BEGIN { FS="] " }
    { for (i=1; i<=NF; i++)
    print $i
    }

    Use of `NF` in awk command - Stack Overflow https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command

    That's a question about code that overwrites the value of NF.
    How is it relevant?
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Tue Mar 12 00:08:17 2024
    From Newsgroup: comp.lang.awk

    On 11.03.2024 18:41, Mr. Man-wai Chang wrote:
    On 1/3/2024 10:52 pm, Janis Papanagnou wrote:

    BEGIN { FS="] " }
    { for (i=1; i<=NF; i++)
    print $i
    }

    Use of `NF` in awk command - Stack Overflow

    So what?

    You want a more cryptic way? - Here it is...

    BEGIN { FS="] " ; OFS="\n" }
    { NF=NF } 1

    or

    BEGIN { FS="] " ; OFS="\n" }
    { $1=$1 } 1


    Mind, though, that for a program skeleton to solve your task
    my original code is easier to adjust for your data processing.
    You are aware that it's just the first step and needs further
    processing, aren't you?

    Janis
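
The explicit loop and the two "cryptic" forms in this post can be compared side by side. A sketch only: the input string `sample` is made up, and the `NF=NF` variant assumes an awk in which assigning NF rebuilds the record, which is the point debated further down the thread.

```shell
sample='a[1] b[2] c[3]'

# Explicit loop: print every field on its own line.
loop=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = "] " } { for (i = 1; i <= NF; i++) print $i }')

# Assigning a field forces $0 to be rebuilt with OFS between the fields.
dollar1=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = "] "; OFS = "\n" } { $1 = $1 } 1')

# Same idea via NF; relies on NF assignment rebuilding the record.
nf=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = "] "; OFS = "\n" } { NF = NF } 1')

printf '%s\n' "$loop" "$dollar1" "$nf"
```

The loop and the `$1=$1` form both yield the three lines `a[1`, `b[2` and `c[3]`; the last field keeps its bracket because no "] " follows it.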

  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Tue Mar 12 17:21:09 2024
    From Newsgroup: comp.lang.awk

    On 3/11/2024 12:41 PM, Mr. Man-wai Chang wrote:
    On 1/3/2024 10:52 pm, Janis Papanagnou wrote:

       BEGIN { FS="] " }
       { for (i=1; i<=NF; i++)
           print $i
       }

    Use of `NF` in awk command - Stack Overflow https://stackoverflow.com/questions/47216786/use-of-nf-in-awk-command

    Why did you post that link to an apparently unrelated question which has
    all wrong answers (or incomplete at best - the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value)?

    Please always provide enough context in your posts for us to be able to understand why you're posting.

    Ed.
  • From arnold@arnold@freefriends.org (Aharon Robbins) to comp.lang.awk on Wed Mar 13 09:21:44 2024
    From Newsgroup: comp.lang.awk

    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.
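
The behavior described here is easy to observe. A small sketch, assuming an awk that implements it (gawk, mawk and BusyBox awk are named later in the thread); whether POSIX mandates it is what the following posts argue about. The names `shrunk` and `grown` are illustrative.

```shell
# Decreasing NF discards the trailing fields and rebuilds $0 with OFS.
shrunk=$(echo 'a b c d' | awk -v OFS=- '{ NF = 2; print }')

# Increasing NF appends empty fields; length($4) confirms the new field is empty.
grown=$(echo 'a b' | awk -v OFS=- '{ NF = 4; print; print length($4) }')

printf '%s\n' "$shrunk" "$grown"
```

On such an awk the first command gives `a-b`, and the second gives `a-b--` followed by `0`.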
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Wed Mar 13 09:22:35 2024
    From Newsgroup: comp.lang.awk

    arnold@freefriends.org (Aharon Robbins) writes:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    I don't see that in the POSIX specification.

    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
    """
    NF
    The number of fields in the current record. Inside a BEGIN action,
    the use of NF is undefined unless a getline function without a var
    argument is executed previously. Inside an END action, NF shall
    retain the value it had for the last record read, unless a
    subsequent, redirected, getline function without a var argument is
    performed prior to entering the END action.
    """

    I don't see an explicit statement that assigning to NF has undefined
    behavior. The last sentence seems to imply, if taken literally, that
    assigning to NF doesn't change its value, at least within an END
    section. Perhaps it's merely an oversight, or perhaps I've missed
    something.

    Do you see something in POSIX that defines the behavior of assigning to
    NF?
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Kaz Kylheku@433-929-6894@kylheku.com to comp.lang.awk on Wed Mar 13 18:24:37 2024
    From Newsgroup: comp.lang.awk

    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    arnold@freefriends.org (Aharon Robbins) writes:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    I don't see that in the POSIX specification.

    The key is this:

    References to nonexistent fields (that is, fields after $NF), shall
    evaluate to the uninitialized value.

    NF is assignable, and fields after $NF do not exist. Thus if we
    have four fields and set NF = 3, then $4 doesn't exist.

    That implies it must cease to exist; i.e. be destroyed. If setting
    NF = 4 were to restore $4 then that would mean it had continued to
    exist, but was only hidden.

    The behavior is present in GNU Awk, Mawk, BusyBox Awk and others.

    I reproduced the behavior carefully in the awk macro of TXR Lisp:

    $ echo '1 2 3 4' | txr -e '(awk (t (set nf 1) (set nf 3) (prn [f 1])))'

    $ echo '1 2 3 4' | txr -e '(awk (t (set nf 3) (prn [f 1])))'
    2
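
For comparison, roughly the same two experiments in awk itself. This is a sketch, assuming one of the implementations named above; the square brackets are added only to make an empty field visible, and the names `dropped` and `kept` are illustrative.

```shell
# Shrink to one field, then grow back to three: the old $2 is gone for good.
dropped=$(echo '1 2 3 4' | awk '{ NF = 1; NF = 3; print "[" $2 "]" }')

# Shrinking to three fields leaves $2 untouched.
kept=$(echo '1 2 3 4' | awk '{ NF = 3; print "[" $2 "]" }')

printf '%s\n' "$dropped" "$kept"
```

The first prints `[]` and the second `[2]`, matching the empty line and the `2` from the TXR commands.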

    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
    """
    NF
    The number of fields in the current record. Inside a BEGIN action,
    the use of NF is undefined unless a getline function without a var
    argument is executed previously. Inside an END action, NF shall
    retain the value it had for the last record read, unless a
    subsequent, redirected, getline function without a var argument is
    performed prior to entering the END action.

    This looks defective. The value of NF observed in END must obviously
    be the last stored one, however it was stored, whether by assignment
    or getline.

    Note that NF is also recalculated if $0 is assigned, which is
    explicitly required in the document; it is glaringly defective to
    be appearing to be making an exception for getline but not for
    assignment to $0.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Wed Mar 13 14:15:56 2024
    From Newsgroup: comp.lang.awk

    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    arnold@freefriends.org (Aharon Robbins) writes:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    I don't see that in the POSIX specification.

    The key is this:

    References to nonexistent fields (that is, fields after $NF), shall
    evaluate to the uninitialized value.

    NF is assignable, and fields after $NF do not exist. Thus if we
    have four fields and set NF = 3, then $4 doesn't exist.

    That describes what happens if NF is modified by assignment, but I don't
    see that it implies that such an assignment is allowed.

    That implies it must cease to exist; i.e. be destroyed. If setting NF = 4 were
    to restore $4 then that would mean it had continued to exist, but was only hidden.

    The behavior is present in GNU Awk, Mawk, BusyBox Awk and others.

    I accept that most, quite possibly all, implementations of Awk allow
    assignment to NF, with the semantics of dropping fields after $NF or
    adding new fields if the value decreases or increases, respectively.
    And on the basis of that, I accept that POSIX *should* specify the
    behavior of assigning to NF -- especially if the original AWK book
    defines it. The second edition briefly mentions modifying NF:
    "Conversely, if NF changes, $0 is recomputed when its value is needed."

    But I can imagine a hypothetical awk-like language in which assigning to
    NF has undefined behavior. My question is, how does the POSIX
    specification not describe that language?

    Looking more closely at
    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
    it can be argued that assigning to NF *is* well defined, but it could be
    much clearer. The syntax for a simple assignment is:
    lvalue '=' expr
    where an lvalue is one of:
    NAME
    NAME '[' expr_list ']'
    '$' expr
    and:
    The token NAME shall consist of a word that is not a keyword or a
    name of a built-in function and is not followed immediately (without
    any delimiters) by the '(' character.

    Which implies that, for example, `NF = 10` is valid.

    Also, NF is a "special variable", which weakly implies that it's
    assignable.

    On the other hand, it also implies that `foo = 42` is valid where `foo`
    is the name of a user-defined function (gawk disallows it). It should
    say that the name of a user-defined function is not an lvalue.

    The POSIX description reads to me as if the authors just didn't think
    about whether assigning to NF, or to user-defined function names, should
    be permitted. The behavior of adding or removing fields when NF is
    modified by assignment is, I suggest, something that should be stated
    explicitly.

    [...]

    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
    """
    NF
    The number of fields in the current record. Inside a BEGIN action,
    the use of NF is undefined unless a getline function without a var
    argument is executed previously. Inside an END action, NF shall
    retain the value it had for the last record read, unless a
    subsequent, redirected, getline function without a var argument is
    performed prior to entering the END action.

    This looks defective. The value of NF observed in END must obviously
    be the last stored one, however it was stored, whether by assignment
    or getline.

    Note that NF is also recalculated if $0 is assigned, which is
    explicitly required in the document; it is glaringly defective to
    be appearing to be making an exception for getline but not for
    assignment to $0.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Kaz Kylheku@433-929-6894@kylheku.com to comp.lang.awk on Wed Mar 13 21:49:26 2024
    From Newsgroup: comp.lang.awk

    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    arnold@freefriends.org (Aharon Robbins) writes:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    I don't see that in the POSIX specification.

    The key is this:

    References to nonexistent fields (that is, fields after $NF), shall
    evaluate to the uninitialized value.

    NF is assignable, and fields after $NF do not exist. Thus if we
    have four fields and set NF = 3, then $4 doesn't exist.

    That describes what happens if NF is modified by assignment, but I don't
    see that it implies that such an assignment is allowed.

    "The left-hand side of an assignment and the target of increment and
    decrement operators can be one of a variable, an array with index, or a
    field selector."

    NF is described as a variable. Some unique remarks are made about NF,
    but none deny that it's assignable like any other variable.

    But I can imagine a hypothetical awk-like language in which assigning to
    NF has undefined behavior. My question is, how does the POSIX
    specification not describe that language?

    That language is failing to support an instance of a variable
    being the left operand of an assignment, which a variable "can be".

    It looks like the violation of a requirement.

    On the other hand, it also implies that `foo = 42` is valid where `foo`
    is the name of a user-defined function (gawk disallows it).

    POSIX does say that "[t]he same name shall not be used as both a
    function parameter name and as the name of a function or a special awk
    variable." So foo = 42 isn't valid if foo is already a function.

    Also: "The same name shall not be used both as a variable name with
    global scope and as the name of a function. The same name shall not be
    used within the same scope both as a scalar variable and as an array."

    All that said, the business of the NF tail wagging the $1, $2, ...
    legs of the dog should be the target of at least one clarifying remark,
    and the other defects should also be corrected:

    - In a BEGIN clause NF should be undefined unless any action
    whatsoever is executed that sets its value: direct assignment,
    use of getline or assignment to $0.

    - At the start of the execution of an END clause, NF retains
    its current value (or undefined status, if it was never set);
    the END clause has no implicit effect on NF.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Wed Mar 13 18:27:30 2024
    From Newsgroup: comp.lang.awk

    On 3/13/2024 4:21 AM, Aharon Robbins wrote:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    Arnold - I don't know about the original awk book but POSIX (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html)
    only defines what happens if you populate $X, not what happens if you
    populate NF. If you set $X awk rebuilds the record and if X is some
    value higher than the current value of NF then awk adds the intervening
    fields with the null string as their values, but POSIX doesn't specify
    what happens if you set NF to any value.
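
The case that POSIX does spell out, assigning to a field beyond `$NF`, can be sketched as follows. The input and the name `extended` are illustrative.

```shell
# Assigning $5 on a two-field record creates empty $3 and $4,
# raises NF to 5 and rebuilds $0 using OFS.
extended=$(echo 'a b' | awk -v OFS=- '{ $5 = "e"; print; print NF }')
printf '%s\n' "$extended"
```

This prints `a-b---e` and then `5`.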

    If I'm wrong about that I'd love for you or anyone else to point me to
    the section that defines it as I've scoured the standard several times
    looking for it over the years.

    Ed.
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Wed Mar 13 18:45:41 2024
    From Newsgroup: comp.lang.awk

    On 3/13/2024 4:49 PM, Kaz Kylheku wrote:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    arnold@freefriends.org (Aharon Robbins) writes:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    I don't see that in the POSIX specification.

    The key is this:

    References to nonexistent fields (that is, fields after $NF), shall
    evaluate to the uninitialized value.

    NF is assignable, and fields after $NF do not exist. Thus if we
    have four fields and set NF = 3, then $4 doesn't exist.

    That's a bit like the argument from an old episode of the comedy TV show
    "Yes, Prime Minister" in the UK where his aide says (paraphrased) "Some
    country has done X, we must do something. War is something, therefore we
    must go to war".

    Being able to set NF to 3 does not mean you must delete $4. Why not
    delete $1 or $2 instead? You'd still end up with 3 fields to satisfy the
    value of NF. Lots of things you can do are undefined by POSIX despite
    how sensible some impacts may seem, assigning a value to NF is just 1
    more of them.

    You could say that "$0 holds the last record read, you can use $0 in the
    END section, therefore in the END section $0 must contain the value of
    the last record read". Except that's not true. From the gawk manual (https://www.gnu.org/software/gawk/manual/html_node/I_002fO-And-BEGIN_002fEND.html#I_002fO-And-BEGIN_002fEND):

    ----
    Most probably due to an oversight, the standard does not say that $0 is
    also preserved, although logically one would think that it should be. In
    fact, all of BWK awk, mawk, and gawk preserve the value of $0 for use in
    END rules. Be aware, however, that some other implementations and many
    older versions of Unix awk do not.
    ----
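
What a given awk does in END can be checked with a two-line input. A sketch; the input data and the name `end_state` are illustrative. POSIX requires NF to keep the value from the last record here, while the second output line depends on whether the implementation also preserves $0.

```shell
# Print NF and $0 as seen by the END rule after reading two records.
end_state=$(printf 'a b c\nd e\n' | awk 'END { print NF; print $0 }')
printf '%s\n' "$end_state"
```

The first line printed is `2`. On the implementations the manual lists as preserving $0 (BWK awk, mawk, gawk), the second line is `d e`.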


    That describes what happens if NF is modified by assignment, but I don't
    see that it implies that such an assignment is allowed.

    "The left-hand side of an assignment and the target of increment and decrement operators can be one of a variable, an array with index, or a
    field selector."

    NF is described as a variable. Some unique remarks are made about NF,
    but none deny that it's assignable like any other variable.

    But I can imagine a hypothetical awk-like language in which assigning to
    NF has undefined behavior. My question is, how does the POSIX
    specification not describe that language?

    That language is failing to support an instance of a variable
    being the left operand of an assignment, which a variable "can be".

    It looks like the violation of a requirement.

    On the other hand, it also implies that `foo = 42` is valid where `foo`
    is the name of a user-defined function (gawk disallows it).

    POSIX does say that "[t]he same name shall not be used as both a
    function parameter name and as the name of a function or a special awk
    variable." So foo = 42 isn't valid if foo is already a function.

    Also: "The same name shall not be used both as a variable name with
    global scope and as the name of a function. The same name shall not be
    used within the same scope both as a scalar variable and as an array."

    All that said, the business of the NF tail wagging the $1, $2, ...
    legs of the dog should be the target of at least one clarifying remark,
    and the other defects should also be corrected:

    - In a BEGIN clause NF should be undefined unless any action
    whatsoever is executed that sets its value: direct assignment,
    use of getline or assignment to $0.

    - At the start of the execution of an END clause, NF retains
    its current value (or undefined status, if it was never set);
    the END clause has no implicit effect on NF.


    All of the above claims that POSIX states you can assign a value to NF.
    That may or may not be correct, I expect it is but I don't care because
    nothing above nor in the POSIX spec states what the IMPACT is of
    assigning a value to NF. As far as I can see there is absolutely nothing
    in the POSIX spec that says anything like "if you set NF to a higher
    value fields will be created and if you set NF to a lower value fields
    will be removed" but I'd honestly love to be proven wrong and shown the
    section that does define the impact of assigning a higher or lower
    value to NF.

    Ed.
  • From Kaz Kylheku@433-929-6894@kylheku.com to comp.lang.awk on Thu Mar 14 00:17:48 2024
    From Newsgroup: comp.lang.awk

    On 2024-03-13, Ed Morton <mortonspam@gmail.com> wrote:
    On 3/13/2024 4:49 PM, Kaz Kylheku wrote:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    arnold@freefriends.org (Aharon Robbins) writes:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    I don't see that in the POSIX specification.

    The key is this:

    References to nonexistent fields (that is, fields after $NF), shall
    evaluate to the uninitialized value.

    NF is assignable, and fields after $NF do not exist. Thus if we
    have four fields and set NF = 3, then $4 doesn't exist.

    That's a bit like the argument from an old episode of the comedy TV show "Yes, Prime Minister"

    But that show is the reference model for how ISO and IEEE standardization
    works.

    in the UK where his aide says (paraphrased) "Some
    country has done X, we must do something. War is something, therefore we
    must go to war".

    Being able to set NF to 3 does not mean you must delete $4.

    The passage says that fields do not exist beyond $NF. So if NF
    is 3, $4 doesn't exist.

    Why not
    delete $1 or $2 instead?
    You'd still end up with 3 fields to satisfy the
    value of NF.

    Because those are less than 3, the value in NF. Those exist.
    $2 and $3 exist while NF is originally 4; and continue to
    exist if it is decremented to 3. Why would $2 be victimized,
    when at no point had NF been less than 2?
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Kaz Kylheku@433-929-6894@kylheku.com to comp.lang.awk on Thu Mar 14 00:22:56 2024
    From Newsgroup: comp.lang.awk

    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    That describes what happens if NF is modified by assignment, but I don't
    see that it implies that such an assignment is allowed.

    Here is a problem. In numerous implementations, when you set NF, not
    only does that set the number of fields, but $0 is recomputed.
    So instead of $1=$1 you can use NF=NF.

    $ echo '1 2 3 4' | awk -v OFS=: '{ NF=NF; print $0; }'
    1:2:3:4

    $ echo '1 2 3 4' | awk -v OFS=: '{ NF=2; print $0; }'
    1:2


    We can continue to infer that if setting NF causes certain fields to
    exist, and not others, then $0 must be reconstituted accordingly,
    just like when a field is assigned, according to the idea that Awk
    implements a kind of "reactive programming" paradigm whereby $0
    and the fields are kept in sync.

    But that's going a little uncomfortably far on the proverbial limb,
    without assurance from the text.
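
    For comparison, a sketch that puts the field assignment case next to the
    NF case. Only the field assignment rebuild is stated by POSIX; the NF=NF
    result is an assumption about the awk in use (gawk and mawk are believed
    to agree). The shell variable names a, b and c are illustrative.

```shell
# a: assigning a field rebuilds $0 with OFS (specified by POSIX).
a=$(echo '1 2 3 4' | awk -v OFS=: '{ $1 = $1; print $0 }')
# b: assigning NF is assumed to trigger the same rebuild.
b=$(echo '1 2 3 4' | awk -v OFS=: '{ NF = NF; print $0 }')
# c: changing one field also rejoins the whole record with OFS.
c=$(echo '1 2 3 4' | awk -v OFS=: '{ $3 = "X"; print $0 }')
printf '%s\n%s\n%s\n' "$a" "$b" "$c"
# Expected where NF=NF rebuilds the record:
# 1:2:3:4
# 1:2:3:4
# 1:2:X:4
```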
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Wed Mar 13 18:34:27 2024
    From Newsgroup: comp.lang.awk

    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-13, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    arnold@freefriends.org (Aharon Robbins) writes:
    In article <usqkgn$he7u$2@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    the effect of setting `NF` is
    undefined behavior per POSIX and so will do different things in
    different awk variants and even in 1 awk variant can behave differently
    depending on whether you're setting it to a higher or lower than
    original value

    This is not true. The effect of setting NF was well defined
    by the original awk book and also in POSIX.

    Decreasing NF throws away fields. Increasing NF adds the
    intervening fields with the null string as their values
    and rebuilds the record.

    I don't see that in the POSIX specification.

    The key is this:

    References to nonexistent fields (that is, fields after $NF), shall
    evaluate to the uninitialized value.

    NF is assignable, and fields after $NF do not exist. Thus if we
    have four fields and set NF = 3, then $4 doesn't exist.

    That describes what happens if NF is modified by assignment, but I don't
    see that it implies that such an assignment is allowed.

    "The left-hand side of an assignment and the target of increment and
    decrement operators can be one of a variable, an array with index, or a
    field selector."

    NF is described as a variable. Some unique remarks are made about NF,
    but none deny that it's assignable like any other variable.

    OK, I concede. It can be inferred from the POSIX specification that
    assigning to NF is allowed.

    And the specification is in serious need of a definition of what
    assigning to NF actually *does*, other than changing the value of NF.

    But I can imagine a hypothetical awk-like language in which assigning to
    NF has undefined behavior. My question is, how does the POSIX
    specification not describe that language?

    That language is failing to support an instance of a variable
    being the left operand of an assignment, which a variable "can be".

    It looks like the violation of a requirement.

    Agreed. I think.

    [...]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */
  • From arnold@arnold@freefriends.org (Aharon Robbins) to comp.lang.awk on Thu Mar 14 06:19:40 2024
    From Newsgroup: comp.lang.awk

    In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Do you see something in POSIX that defines the behavior of assigning to
    NF?

    In the section "Variables and Special Values"

    | References to nonexistent fields (that is, fields after $NF), shall
    | evaluate to the uninitialized value. Such references shall not create
    | new fields. However, assigning to a nonexistent field (for example,
    | $(NF+2)=5) shall increase the value of NF; create any intervening fields
    | with the uninitialized value; and cause the value of $0 to be
    | recomputed, with the fields being separated by the value of OFS. Each
    | field variable shall have a string value or an uninitialized value when
    | created.

    It doesn't say what happens when you do NF -= 2; nonetheless, all
    traditional awks throw away fields when you do something like that.
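
    A short sketch of both cases, for illustration. The first command
    exercises the $(NF+2)=5 assignment that the quoted POSIX text covers; the
    second relies on the field-dropping behavior, which is an assumption about
    the awk in use rather than something the text guarantees. The shell
    variable names grow and shrink are made up here.

```shell
# Specified by POSIX: assigning past $NF raises NF, creates the
# intervening empty field, and rebuilds $0 with OFS.
grow=$(echo 'a b c' | awk -v OFS=: '{ $(NF+2) = 5; print NF; print $0 }')
# Assumption: decrementing NF discards the trailing fields.
shrink=$(echo 'a b c d' | awk -v OFS=: '{ NF -= 2; print NF; print $0 }')
printf '%s\n%s\n' "$grow" "$shrink"
# Expected on an awk that drops fields when NF shrinks:
# 5
# a:b:c::5
# 2
# a:b
```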
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Wed Mar 13 23:43:25 2024
    From Newsgroup: comp.lang.awk

    arnold@freefriends.org (Aharon Robbins) writes:
    In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Do you see something in POSIX that defines the behavior of assigning to
    NF?

    In the section "Variables and Special Values"

    | References to nonexistent fields (that is, fields after $NF), shall
    | evaluate to the uninitialized value. Such references shall not create
    | new fields. However, assigning to a nonexistent field (for example,
    | $(NF+2)=5) shall increase the value of NF; create any intervening fields
    | with the uninitialized value; and cause the value of $0 to be
    | recomputed, with the fields being separated by the value of OFS. Each
    | field variable shall have a string value or an uninitialized value when
    | created.

    It doesn't say what happens when you do NF -= 2; nonetheless, all
    traditional awks throw away fields when you do something like that.

    Kaz already addressed this. It's not sufficiently explicit about this behavior, but:

    """ Kaz:
    The key is this:

    References to nonexistent fields (that is, fields after $NF), shall
    evaluate to the uninitialized value.

    NF is assignable, and fields after $NF do not exist. Thus if we
    have four fields and set NF = 3, then $4 doesn't exist.
    """

    (At the time I wasn't convinced that POSIX requires NF to be
    assignable.)
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Thu Mar 14 05:50:11 2024
    From Newsgroup: comp.lang.awk

    On 3/14/2024 1:19 AM, Aharon Robbins wrote:
    In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Do you see something in POSIX that defines the behavior of assigning to
    NF?

    In the section "Variables and Special Values"

    | References to nonexistent fields (that is, fields after $NF), shall
    | evaluate to the uninitialized value. Such references shall not create
    | new fields. However, assigning to a nonexistent field (for example,
    | $(NF+2)=5) shall increase the value of NF; create any intervening fields
    | with the uninitialized value; and cause the value of $0 to be
    | recomputed, with the fields being separated by the value of OFS. Each
    | field variable shall have a string value or an uninitialized value when
    | created.

    It doesn't say what happens when you do NF -= 2; nonetheless, all
    traditional awks throw away fields when you do something like that.

    It doesn't say what happens when you do NF += 2 either. All I'm saying
    is that changing the value of NF is undefined behavior per POSIX.
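
    For reference, a sketch of the NF += 2 case. The result shown is what
    gawk and mawk are believed to produce, which is an assumption about the
    implementation and not a statement of what POSIX requires. The input and
    the shell variable name "out" are illustrative.

```shell
# Assumption: raising NF appends empty fields and rebuilds $0 with OFS.
out=$(echo 'a b' | awk -v OFS=: '{ NF += 2; print NF; print $0; print length($4) }')
printf '%s\n' "$out"
# Expected on such an awk:
# 4
# a:b::
# 0
```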

    I'm not sure which awks would be considered "traditional" vs otherwise
    but AFAIK POSIX is descriptive, i.e. describes how X behaves rather than
    dictates the behavior of X, so if the appropriate set of awk variants
    all behave the same way for any behavior such as this that's currently
    undefined by POSIX (changing the value of NF, the value of $0 in the END
    section, and field splitting with a null FS being the 3 most commonly
    used cases IMO) then maybe the folks who write that spec could/should
    update it to describe that behavior but I don't know which awks all
    behave the same way for those cases, nor if that's enough of them for
    POSIX to make a definition.

    Ed.
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Thu Mar 14 07:09:59 2024
    From Newsgroup: comp.lang.awk

    On 3/14/2024 5:50 AM, Ed Morton wrote:
    On 3/14/2024 1:19 AM, Aharon Robbins wrote:
    In article <87y1am5cfo.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Do you see something in POSIX that defines the behavior of assigning to
    NF?

    In the section "Variables and Special Values"

    | References to nonexistent fields (that is, fields after $NF), shall
    | evaluate to the uninitialized value. Such references shall not create
    | new fields. However, assigning to a nonexistent field (for example,
    | $(NF+2)=5) shall increase the value of NF; create any intervening fields
    | with the uninitialized value; and cause the value of $0 to be
    | recomputed, with the fields being separated by the value of OFS. Each
    | field variable shall have a string value or an uninitialized value when
    | created.

    It doesn't say what happens when you do NF -= 2; nonetheless, all
    traditional awks throw away fields when you do something like that.

    It doesn't say what happens when you do NF += 2 either. All I'm saying
    is that changing the value of NF is undefined behavior per POSIX.

    I'm not sure which awks would be considered "traditional" vs otherwise
    but AFAIK POSIX is descriptive, i.e. describes how X behaves rather than
    dictates the behavior of X, so if the appropriate set of awk variants
    all behave the same way for any behavior such as this that's currently
    undefined by POSIX (changing the value of NF, the value of $0 in the END
    section, and field splitting with a null FS being the 3 most commonly
    used cases IMO) then maybe the folks who write that spec could/should
    update it to describe that behavior but I don't know which awks all
    behave the same way for those cases, nor if that's enough of them for
    POSIX to make a definition.

        Ed.

    I couldn't find any existing tickets so I just created tickets with the
    Austin Group to request that definitions for the 3 cases I listed above
    be added to the POSIX spec:

    1) Changing the value of NF = https://www.austingroupbugs.net/view.php?id=1820
    2) The value of $0, $1, etc. in an END section = https://www.austingroupbugs.net/view.php?id=1821
    3) Splitting using a null field separator = https://www.austingroupbugs.net/view.php?id=1822
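
    For cases 2 and 3, a quick sketch of the behavior in question. The
    expected results are what gawk and mawk are believed to print; they are
    assumptions about those implementations, and other awks may differ. The
    sample input and the shell variable names last and chars are made up for
    illustration.

```shell
# Case 2 (assumption): $0 in END still holds the last record read.
last=$(printf 'first\nsecond\n' | awk 'END { print $0 }')
# Case 3 (assumption): a null FS splits the record into single characters.
chars=$(echo 'abc' | awk -v FS= '{ print NF; print $2 }')
printf '%s\n%s\n' "$last" "$chars"
# Expected on an awk with both behaviors:
# second
# 3
# b
```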

    Obviously I've no idea if they'll be implemented or not but AFAIK it
    doesn't hurt to ask. I said "in most modern awks..." in each of them; if
    anyone knows which specific awks behave in the ways I described (or
    which don't) then feel free to comment on the issues if you can. I just
    don't have access to multiple awk variants at this time.

    Regards,

    Ed.