• Awk output redirection to expression - defined or not?

    From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Thu May 25 09:30:53 2023
    From Newsgroup: comp.lang.awk

    I'm certain I remember years ago reading a document that said
    (paraphrasing) "an unparenthesized expression on the right side of input
    or output redirection is undefined behavior" and I thought it was an
    older version of the POSIX spec. I now can't find that (or similar)
    statement in any of these:

    SUSV2 - https://pubs.opengroup.org/onlinepubs/7990989775/xcu/awk.html
    SUSV3 - https://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
    Current POSIX spec - https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

    or by googling.

    What I do see in the current POSIX spec is a related statement just
    about input redirection:

    Historical practice has been that:
    getline < "a" "b"

    is parsed as:
    ( getline < "a" ) "b"

    although many would argue that the intent was that the file ab should
    be read. However:
    getline < "x" + 1

    parses as:
    getline < ( "x" + 1 )

    ...
    Since in most cases such constructs are not (or at least should not)
    be used (because they have a natural ambiguity for which there is no conventional parsing), the meaning of these constructs has been made explicitly unspecified.

    and:

    The getline operator can form ambiguous constructs when there are
    unparenthesized binary operators (including concatenate) to the right of
    the '<' (up to the end of the expression containing the getline). The
    result of evaluating such a construct is unspecified
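    A minimal sketch, assuming a POSIX sh and any POSIX awk (the helper name read_ab and the sample file are hypothetical): the ambiguity goes away once the filename is built beforehand or parenthesized, so every awk should read the file ab.

```shell
#!/bin/sh
# Sketch: read the first line of a file whose name is built by concatenation,
# keeping the concatenation out of the unparenthesized redirection.
read_ab() {
    awk 'BEGIN {
        file = "a" "b"                     # concatenation happens here, not after "<"
        if ((getline line < file) > 0)    # inline equivalent: getline line < ("a" "b")
            print line
        close(file)
    }'
}

cd "$(mktemp -d)" || exit 1               # scratch directory for the sample file
printf 'hello from ab\n' > ab
read_ab
```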

    but nothing about output redirection. I know gawk doesn't require parens
    around the expression for output redirection but other awks do (e.g. see
    https://stackoverflow.com/q/21093626/1745001), and it's not obvious to me
    why `getline < "a" "b"` should be undefined behavior while
    `print > "a" "b"` wouldn't be; intuitively, if one of them is undefined
    then so should the other be.
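    A sketch of the corresponding output case, assuming a POSIX sh and awk (write_ab is a hypothetical helper): parenthesizing the redirection target is the form that should be accepted everywhere, whereas the unparenthesized `print > "a" "b"` is exactly the form some awks reject.

```shell
#!/bin/sh
# Sketch: redirect print output to a computed filename portably.
write_ab() {
    awk 'BEGIN {
        print "written" > ("a" "b")   # parenthesized target: the file named ab
        close("a" "b")
    }'
}

cd "$(mktemp -d)" || exit 1           # scratch directory for the output file
write_ab
cat ab
```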

    Does anyone else recall seeing a statement about output redirection to
    an expression requiring parens and, if so, do you recall where it existed?

    Ed.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Thu May 25 17:37:30 2023

    On 25.05.23 16:30, Ed Morton wrote:
    [...]

    What I recall is that a few times there were discussions about that,
    but there was (AFAIR) never a formal explanation.

    My thoughts about your question above are as follows...

    getline expressions might be subject to precedence rules, and since
    C-like languages (as opposed to, e.g., Algol68) associate precedence
    with the concrete symbol ('<', '>') rather than with the semantic
    context, 'less than' would bind more strongly than 'concat'.
    In cases where (as quoted above) "conventional parsing" deviates
    from that (whatever "conventional" or "non-conventional" means here)
    it might be different.

    Note also that I wrote "getline *expressions*" as opposed to, say,
    "print *statement*"; getline is part of an expression (it has a
    value), whereas print takes an expression argument. There is (I think)
    no expression that starts with '>' in awk, so '>' after 'print' should
    generally indicate a redirection.
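    To illustrate the statement/expression distinction, here is a sketch assuming a POSIX awk and a system with /dev/null (show_print_vs_getline is a hypothetical name): a parenthesized `>` after print is a comparison, and getline yields a value that can be assigned or tested, which print cannot.

```shell
#!/bin/sh
# Sketch: print is a statement, getline is part of an expression.
show_print_vs_getline() {
    awk 'BEGIN {
        a = 2; b = 1
        print (a > b)                      # parenthesized, so ">" compares: prints 1
        # unparenthesized, "print a > b" would write 2 to a file named "1"
        n = (getline line < "/dev/null")   # getline yields a value: 0 at end of file
        print n                            # prints 0
        # "if (print ...)" would be a syntax error: print has no value
    }'
}

show_print_vs_getline
```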

    Depending on the semantic context, an expression like
    if (getline < "a" + i) ...
    can make sense either way: read from "a" and add a constant to the
    return value, or read from "a1", "a42", etc.
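    Both readings can be spelled out with parentheses. This is a sketch assuming a POSIX sh and awk, with hypothetical sample files a and a1 and a hypothetical helper read_both_ways. Note that in awk `"a" + i` is numeric (the string "a" converts to 0), so a filename like "a1" actually requires concatenation, `"a" i`, which is what the second form uses.

```shell
#!/bin/sh
# Sketch: the two intents behind  getline < "a" + i , each written unambiguously.
read_both_ways() {
    awk -v i=1 'BEGIN {
        # Intent 1: read from "a", then add i to the return value of getline.
        r1 = (getline line < "a") + i
        close("a")
        # Intent 2: read from the file whose name is built from i ("a1" here).
        r2 = (getline line < ("a" i))
        close("a" i)
        print r1, r2, line
    }'
}

cd "$(mktemp -d)" || exit 1    # scratch directory for the sample files
printf 'first\n' > a
printf 'second\n' > a1
read_both_ways
```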

    So I can see why one is undefined but not the other. And my coding
    approach would be to make the intention visible with parentheses.

    Janis



  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun May 28 08:52:03 2023

    On 5/25/2023 10:37 AM, Janis Papanagnou wrote:
    On 25.05.23 16:30, Ed Morton wrote:
    [...]


    Good point about `if (getline < "foo")` being valid while `if (print >
    "foo")` is not, thanks.

    In different parts of the POSIX spec they refer to `getline` as a
    "function", an "operator" and a "keyword" (while "print" is referred
    to as a "statement" and a "keyword"), so it's a little hard to say
    exactly what `getline` is, but they do also say at one point "the
    expression containing getline", which does match your thought above
    about getline being part of an expression.

    Ed.

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun May 28 18:12:33 2023

    On 28.05.2023 15:52, Ed Morton wrote:
    On 5/25/2023 10:37 AM, Janis Papanagnou wrote:
    [...]

    Good point about `if (getline < "foo")` being valid while `if (print > "foo")` is not, thanks.

    In different parts of the POSIX spec they refer to `getline` as a
    "function" and an "operator" and a "keyword" (while "print" is referred
    to as a "statement" and a "keyword") so it's a little hard to say
    exactly what `getline` is but they do also say at one point "the
    expression containing getline" so that does match your thought above
    about getline being part of an expression.

    I haven't inspected the POSIX specs for that, but the points you
    quote here are (quite) consistent and coherent.
    Of course both (print/getline) can be [implemented as] "keywords"
    whether they are "statements" or "functions".
    I consider the qualification [of getline] as "function" versus
    "operator" a bit imprecise; I usually think of these as syntactically
    differing forms:

    minus(x) vs. -x
    minus(x,y) vs. x-y or even x minus y

    A function and an operator can of course both be part of an
    expression.

    Janis

