• printing words without newlines?

    From David Chmelik@dchmelik@gmail.com to alt.comp.lang.awk,comp.lang.awk on Sun May 12 04:57:16 2024
    From Newsgroup: comp.lang.awk

    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)... is this normal (and I made mistake?) or am I
    approaching it wrong? I recall BASIC prints new lines, but as I learned
    basic C and some derivatives, I'm used to newlines only being
    specified...
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "
       while(getline<file) arr[$1]=$0
       PROCINFO["sorted_in"]="@ind_num_asc"
       for(i in arr)
       {
         split(arr[i],arr2)
         # output all words or on one line with ORS
         print arr2[2]
         # output all words on one line without needing ORS
         #printf("%s ",arr2[2])
       }
    }
    ------------------------------------------------------------------------
    # sample data.txt
    2 your
    1 all
    3 base
    5 belong
    4 are
    7 us
    6 to
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bruce Horrocks@07.013@scorecrow.com to alt.comp.lang.awk,comp.lang.awk on Sun May 12 09:52:51 2024
    From Newsgroup: comp.lang.awk

    On 12/05/2024 05:57, David Chmelik wrote:
    <snip>

    You need to set ORS in the BEGIN { } section (or on the command line).

    See <https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html> for an example - just replace the "\n\n" in the example with " " to see
    the effect you are looking for.
    --
    Bruce Horrocks
    Surrey, England

  • From Bruce Horrocks@07.013@scorecrow.com to alt.comp.lang.awk,comp.lang.awk on Sun May 12 09:55:52 2024
    From Newsgroup: comp.lang.awk

    On 12/05/2024 09:52, Bruce Horrocks wrote:
    <snip>
    > You need to set ORS in the BEGIN { } section (or on the command line).


    Let me re-phrase that: it would be better to set ORS in the BEGIN {}
    section. I'm not sure why yours is not working but with some commented
    out code and some not, your example is unclear.

    If what I have suggested doesn't work for you then please re-post your
    exact code.
    --
    Bruce Horrocks
    Surrey, England

  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to alt.comp.lang.awk,comp.lang.awk on Sun May 12 12:11:27 2024
    From Newsgroup: comp.lang.awk

    In article <e0be0c38-e14e-45ba-ac87-5e2e4bd4f5cd@scorecrow.com>,
    Bruce Horrocks <07.013@scorecrow.com> wrote:
    ...
    > You need to set ORS in the BEGIN { } section (or on the command line).

    This is demonstrably false. You can set ORS whenever/wherever you want.
    Whatever value it has when a plain "print" statement is executed is what
    will be used. You are probably thinking of the various variables that
    affect input parsing. Those clearly must be set before the input is read,
    which usually means they need to be set in BEGIN (or via something like
    -F or -v on the command line).
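
    A minimal sketch of that point, assuming any POSIX awk and a made-up
    three-line input (the letters a, b, c are not from the thread): ORS is
    changed partway through the run and again in END, and each print uses
    whatever value ORS holds at that moment.

    ```shell
    # Hypothetical input: the letters a, b, c, one per line.
    out=$(printf 'a\nb\nc\n' | awk '
        NR == 2 { ORS = " " }              # switch separator before record 2 prints
                { print }                  # uses whatever ORS is right now
        END     { ORS = "\n"; print "" }   # restore the newline and end the line
    ')
    printf '%s\n' "$out"
    ```

    The first record is followed by a newline, the later ones by a space, so
    no BEGIN block is needed for the change to take effect.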

    One of my favorite idioms (and one that might actually be useful to OP) is:

    # Print every 3 input lines as a single output line
    # Yes, this single line is the whole program!
    ORS = NR % 3 ? " " : "\n"
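
    A small run of the idiom above, assuming a made-up input of the numbers
    1 to 7. The assignment is used as a pattern; its value is always a
    non-empty string, so it is always true and the default action (print)
    runs with the ORS that was just chosen.

    ```shell
    # Hypothetical input: the numbers 1..7, one per line.
    out=$(printf '%s\n' 1 2 3 4 5 6 7 | awk 'ORS = NR % 3 ? " " : "\n"')
    printf '%s\n' "$out"
    ```

    Note that when the input length is not a multiple of 3, the last group
    ends with a space rather than a newline.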

    > See <https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html>
    > for an example - just replace the "\n\n" in the example with " " to see
    > the effect you are looking for.

    Of course, the whole point of this thread is that none of us has any idea
    what OP is talking about or what his actual problem is. We can only guess...
    --
    "It does a lot of things half well and it's just a garbage heap of ideas that are
    mutually exclusive."

    - Ken Thompson, on C++ -
  • From jeojet@jeojet@addr.invalid to alt.comp.lang.awk,comp.lang.awk on Sun May 12 18:22:05 2024
    From Newsgroup: comp.lang.awk

    <snip>

    I think you forgot that arr2 is now an array => you have to iterate over
    it as well. There were also a few other coding errors, i.e. not closing
    the data.txt file and not declaring local vars in print_file_words:

    --
    $ cat test.awk
    BEGIN { print_file_words("data.txt") }

    function print_file_words(file, i,j) {
      ORS = " "
      PROCINFO["sorted_in"]="@ind_num_asc"
      while (getline <file >0)
        arr[$1] = $0
      close (file)

      for(i in arr) {
        split(arr[i],arr2)
        for (j in arr2)
          print arr2[j]
      }
      ORS = "\n"
      print ""
    }

    $ gawk -f test.awk
    all are base belong to us your
    --

    Probably this is not the best way of doing things but I think you're
    mainly just experimenting with sorting/printing so..
  • From David Chmelik@dchmelik@gmail.com to alt.comp.lang.awk,comp.lang.awk on Mon May 13 01:09:28 2024
    From Newsgroup: comp.lang.awk

    On Sun, 12 May 2024 18:22:05 -0000 (UTC), jeojet wrote:

    <snip>
    > I think you forgot that arr2 is now an array => you have to iterate over
    > it as well. There were also a few other coding errors, ie. not closing
    > the data.txt file; not declaring local vars in print_file_words:

    The split() sets arr2[2] to the current word (second column) of arr[i], so
    the for() already iterates to update arr2 (it only ever is a two-element
    array with a number (not printed) then word) and prints each word fine on
    new lines when not trying to print them on one line. The only problem is
    something went wrong with printf or ORS & print.
  • From David Chmelik@dchmelik@gmail.com to alt.comp.lang.awk,comp.lang.awk on Mon May 13 02:04:50 2024
    From Newsgroup: comp.lang.awk

    On Sun, 12 May 2024 12:11:27 -0000 (UTC), Kenny McCormack wrote:
    > Of course, the whole point of this thread is that none of us has any
    > idea what OP is talking about or what his actual problem is. We can
    > only guess...

    Not the point. I stated I'm trying AWK... problem is in subject line.
    Surprisingly, after rebooting PC, it all works now (un)commenting
    particular parts (ORS or commenting out print and uncommenting printf).

    On 12/05/2024 09:52, Bruce Horrocks wrote:
    > Let me re-phrase that: it would be better to set ORS in the BEGIN {}
    > section. I'm not sure why yours is not working but with some commented
    > out code and some not, your example is unclear.

    Okay. What I posted works to read file, sort, print lines; I commented
    out two versions that (initially) didn't work to print all on one line
    (ORS or commenting out print and uncommenting printf). After rebooting
    (maybe just needed to restart shell?) those worked as expected... with ORS
    in BEGIN but alternatively in function I wrote. I guess as Mr McCormack
    explained, one might have reasons to change ORS in different functions.
  • From David Chmelik@dchmelik@gmail.com to alt.comp.lang.awk,comp.lang.awk on Mon May 13 02:13:28 2024
    From Newsgroup: comp.lang.awk

    On Sun, 12 May 2024 18:22:05 -0000 (UTC), jeojet wrote:

    <snip>

    My original works after rebooting after discussion in main thread (without
    'Re') but thanks for instruction to close file, though I don't know why you
    need to pass in i--it's not used outside. It's odd that iterating over arr2
    still prints all words (wrong order) because the way I used arr2 it only
    ever had one number and one word--its point was to split out & get word,
    then for the next i, it's split again onto arr2 which is erased/updated.
  • From jeojet@jeojet@addr.invalid to alt.comp.lang.awk,comp.lang.awk on Mon May 13 04:50:39 2024
    From Newsgroup: comp.lang.awk

    <snip>
    > My original works after rebooting after discussion in main thread (without
    > 'Re') but thanks for instruction to close file, though I don't know you
    > need to pass in i--not used outside. It's odd iterating over arr2 even
    > still prints all words (wrong order) because the way I used arr2 it only
    > ever had one number and one word--its point was to split out & get word,
    > then for the next i, it's split again onto arr2 which is erased/updated.

    You're right that in your particular data case --one word per line--
    arr2 is always of length 1 => you could use arr2[1]. But creating
    the arr2 array via split() isn't even necessary since arr will print
    out in the order specified in PROCINFO["sorted_in"]:
    --
    $ cat test.awk
    BEGIN { print_file_words("data.txt") }

    function print_file_words(file, i) {
      ORS = " "
      PROCINFO["sorted_in"]="@ind_num_asc"
      while (getline <file >0)
        arr[$1] = $0
      close (file)

      for(i in arr)
        print arr[i]
      ORS = "\n"
      print ""
    }

    $ gawk -f test.awk data.txt
    all are base belong to us your
    --

    WRT close() you should do it whenever you're finished reading from a
    file OR command. WRT user-defined functions, variables intended to be
    local to the function should be declared, otherwise they become global
    variables; try removing the "i" from the function print_file_words()
    definition and tacking on the following to your code:

    END { print "i =", i }

    which will print "i = your" as the last line of output.
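
    The same effect can be seen in isolation with a short sketch. The
    function names leaky and tidy are hypothetical, chosen only for this
    illustration; the extra spaces before j are the usual convention for
    marking a parameter that exists only to act as a local variable.

    ```shell
    out=$(awk '
        function leaky()    { i = "set in leaky" }   # i is not declared: global
        function tidy(   j) { j = "set in tidy" }    # j is a parameter: local
        BEGIN {
            leaky(); tidy()
            printf "i=[%s] j=[%s]\n", i, j           # j is still unset out here
        }
    ')
    printf '%s\n' "$out"
    ```

    Only the undeclared variable survives the call; the declared one is
    empty again once the function returns.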

    Have fun,
    -j
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to alt.comp.lang.awk,comp.lang.awk on Mon May 13 06:56:50 2024
    From Newsgroup: comp.lang.awk

    In article <v1pi7c$2b87j$1@dont-email.me>,
    David Chmelik <dchmelik@gmail.com> wrote:
    ...

    I guess this is what you actually want:

    { A[$1] = $2 }
    END {
        len = length(A)
        for (i=1; i<=len; i++)
            printf("%s%s",A[i],i<len ? " " : "\n")
    }
    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Noam
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to alt.comp.lang.awk,comp.lang.awk on Mon May 13 10:18:40 2024
    From Newsgroup: comp.lang.awk

    On 12.05.2024 06:57, David Chmelik wrote:
    > I'm learning more AWK basics and wrote function to read file, sort,
    > print. I use GNU AWK (gawk) and its sort but printing is harder to get
    > working than anything... separate lines work, but when I use printf() or
    > set ORS then use print (for words one line) all awk outputs (on FreeBSD
    > UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    > before shell prompt)... is this normal (and I made mistake?) or am I
    > approaching it wrong? I recall BASIC prints new lines, but as I learned
    > basic C and some derivatives, I'm used to newlines only being specified...

    IIUC you meanwhile have your script running, and probably code similar
    to

    BEGIN { print_file_words("data.txt"); }

    function print_file_words(file) {
        while (getline <file >0)
            arr[$1] = $0
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (i in arr) {
            split (arr[i], arr2)
            printf "%s ", arr2[2]
        }
        printf "\n"
    }

    I suggest adding the '>0' test to your code, and also printing a final
    "\n" so that your command line prompt doesn't overwrite your output.
    Note also that printf (like print) is a statement, not a function. Adding
    local variable declarations is also sensible to avoid problems if
    you use your code in other source code contexts.

    Janis

  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to alt.comp.lang.awk,comp.lang.awk on Mon May 13 14:53:38 2024
    From Newsgroup: comp.lang.awk

    In article <v1sdji$tofu$2@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    ...
    > I guess this is what you actually want:
    >
    > { A[$1] = $2 }
    > END {
    >     len = length(A)
    >     for (i=1; i<=len; i++)
    >         printf("%s%s",A[i],i<len ? " " : "\n")
    > }

    Improved version:

    { A[$1] = $2 }
    END {
        for (i=1; i<=NR; i++)
            printf("%s%s",A[i],i<NR ? " " : "\n")
    }

    Note that the value of NR in END is sort of a gray area, but it works as
    expected in GAWK, which is really all we care about.
    --
    [Donald] Trump didn't have it all handed to him by his parents,
    like Hillary Clinton did.

    - Some dumb cluck in Ohio; featured in Michael Moore's "Trumpland" -
  • From Kaz Kylheku@643-408-1753@kylheku.com to alt.comp.lang.awk,comp.lang.awk on Mon May 13 16:49:59 2024
    From Newsgroup: comp.lang.awk

    On 2024-05-12, Kenny McCormack <gazelle@shell.xmission.com> wrote:
    <snip>
    > Of course, the whole point of this thread is that none of us has any idea
    > what OP is talking about or what his actual problem is. We can only guess...

    The problem seems to be that there is a file of words preceded by
    unique integer ranks which indicate the order. They are to be reproduced
    in rank order, on one line.

    This is the TXR Lisp interactive listener of TXR 294.
    Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
    Self-assembly keeps TXR costs low; but ask about our installation service!
    (flow "data.txt"
      file-get-lines
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      transpose
      (select (second @1) (first @1))
      (join-with " ")
      put-line)
    all your base are belong to us

    We can insert prints into the pipeline to see the transformations:

    (flow "data.txt"
      prinl
      file-get-lines
      prinl
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      prinl
      transpose
      prinl
      (select (second @1) (first @1))
      prinl
      (join-with " ")
      prinl
      put-line)
    "data.txt"
    ("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
    (#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
    #(5 "to"))
    #(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
    #("all" "your" "base" "are" "belong" "to" "us")
    "all your base are belong to us"
    all your base are belong to us
    t

    That is tedious; say, why not make a macro dflow (debug flow) which inserts those prinl's for us?

    (defmacro dflow (. args)
      ^(flow ,*(interpose 'prinl args)))
    dflow

    Sanity check: is it inserting prinls?

    (macroexpand-1 '(dflow a b c d))
    (flow a prinl
    b prinl c prinl
    d)

    Use dflow:

    (dflow "data.txt"
      file-get-lines
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      transpose
      (select (second @1) (first @1))
      (join-with " ")
      put-line)
    "data.txt"
    ("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
    (#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
    #(5 "to"))
    #(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
    #("all" "your" "base" "are" "belong" "to" "us")
    "all your base are belong to us"
    all your base are belong to us
    t

    After file-get-lines we have a list of strings like "2 your".

    We map those through an anonymous function which matches the
    string pattern `@a @b` to capture the space-separated text pieces.
    A is converted to integer and mapped to its predecessor
    (because we want to use it as an index, and indexing is zero based).
    We map each string to a two element vector consisting of the
    zero-based index as an integer type, and a string, so now we have:

    (#(1 "your") #(0 "all") ...)

    #(a b c) is a vector notation.

    Then we want to transpose rows to columns to get the integer
    column as a vector, and the values as a vector.

    #(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))

    Now we use the built-in function select which selects elements out
    of a sequence, based on indices supplied in another sequence.

    Now we have the vector of words in the right order; we just
    join with a space.
  • From Kaz Kylheku@643-408-1753@kylheku.com to alt.comp.lang.awk,comp.lang.awk on Mon May 13 17:17:05 2024
    From Newsgroup: comp.lang.awk

    On 2024-05-12, David Chmelik <dchmelik@gmail.com> wrote:
    > # sample data.txt
    > 2 your
    > 1 all
    > 3 base
    > 5 belong
    > 4 are
    > 7 us
    > 6 to

    $ awk '{
        if ($1 > max) max = $1;
        rank[$1] = $2
      }

      END {
        for (i = 1; i <= max; i++)
          if (i in rank) {
            printf("%s%s", sep, rank[i]);
            sep = " "
          }
        print ""
      }' data.txt
    all your base are belong to us

    We do not perform any sort, and so we don't require GNU extensions.
    Sorting is silly, because data is already sorted: we are given the
    positional rank of every word, which is a way of capturing order.
    All we have to do is visit the words in that order.

    We can do that by iterating an index i from 1 to the highest index
    we have seen. If there is a rank[i] entry, then we print it.
    (We do this "(i in rank)" check in case there are gaps in the rank
    sequence.)

    After we print one word, we start using the " " separator before all
    subsequent words.

    If we must sort, there is the sort utility:

    $ sort -n data.txt | awk '{ printf("%s%s", sep, $2); sep = " " }' && echo
    all your base are belong to us

    Also, if we can suffer a spurious trailing space:

    $ sort -n data.txt | awk '{ print $2 }' | tr '\n' ' ' && echo
    all your base are belong to us
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to alt.comp.lang.awk,comp.lang.awk on Mon May 13 17:26:56 2024
    From Newsgroup: comp.lang.awk

    In article <20240513100418.652@kylheku.com>,
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    ...
    (This version more complicated than it needs to be, but essentially the
    same as what I posted earlier)
    > $ awk '{
    >     if ($1 > max) max = $1;
    >     rank[$1] = $2
    >   }
    >
    >   END {
    >     for (i = 1; i <= max; i++)
    >       if (i in rank) {
    >         printf("%s%s", sep, rank[i]);
    >         sep = " "
    >       }
    >     print ""
    >   }' data.txt
    > all your base are belong to us
    >
    > We do not perform any sort, and so we don't require GNU extensions. Sorting is

    But GNU extensions are good - especially since OP specifically mentioned
    using GAWK. And much more on-topic than Lisp (et al).

    Final note: In fact, it has been established (on this newsgroup as well as empirically by me and others) that if the indices are small integers, you
    get sorting for free (in GAWK, which, as noted, is all we care about). So,
    you don't even really need to mess with PROCINFO[]...

    And, one more note about sorting. Some responders on this thread have
    gotten confused about what is to be sorted. They assumed that OP wanted
    the words sorted (alphabetically), when, in fact, he just wants them sorted (numerically) by the position number (the first field in the data line).
    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Mandela
  • From Kaz Kylheku@643-408-1753@kylheku.com to alt.comp.lang.awk,comp.lang.awk on Mon May 13 23:33:07 2024
    From Newsgroup: comp.lang.awk

    On 2024-05-13, Kenny McCormack <gazelle@shell.xmission.com> wrote:
    <snip>
    > But GNU extensions are good - especially since OP specifically mentioned
    > using GAWK. And much more on-topic than Lisp (et al).

    The above performs O(N) steps, whereas sorting is O(N log N),
    and sometimes worse due to degenerate cases in some algorithms.

    Why use an extension that only makes the program more verbose and brings
    in an unnecessary algorithm?

    > Final note: In fact, it has been established (on this newsgroup as well as
    > empirically by me and others) that if the indices are small integers, you
    > get sorting for free (in GAWK, which, as noted, is all we care about). So,
    > you don't even really need to mess with PROCINFO[]...

    Are you referring to the idea of just replacing the above for + if
    structure with:

    for (i in rank) {

    }

    and relying on the small integer indices being hashed in order?

    Where is that documented? The manual reiterates that this is not
    specified: "By default, the order in which a ‘for (indx in array)’ loop
    scans an array is not defined; it is generally based upon the internal
    implementation of arrays inside awk."
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Ed Morton@mortonspam@gmail.com to alt.comp.lang.awk,comp.lang.awk on Thu May 16 08:11:35 2024
    From Newsgroup: comp.lang.awk

    On 5/11/2024 11:57 PM, David Chmelik wrote:
    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)...

    Your input file probably has DOS line endings, see https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it
    for what that means and how to deal with them, but basically either run `dos2unix` on your file before calling awk or add `sub(/\r$/,"")` as I
    show below*.
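
A minimal sketch of the carriage-return fix described above, assuming a POSIX awk and input with DOS (CRLF) line endings. The function name `join_words` and the sample input are hypothetical, chosen only for illustration; they are not from the original post.

```shell
# Hypothetical helper: print the second column of each input line on a
# single output line, tolerating DOS (CRLF) line endings.
join_words() {
    awk '
        { sub(/\r$/, "") }                       # drop a trailing carriage return, if any
        { printf "%s%s", sep, $2; sep = " " }    # separator goes before every word but the first
        END { print "" }                         # finish the line with a newline
    '
}

# Sample input with CRLF line endings, as a DOS-format file would have.
printf '1 all\r\n2 your\r\n3 base\r\n' | join_words
```

Without the `sub()` call each word would still carry its `\r`, which moves the terminal cursor back to column 0 so that later output overwrites earlier output.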

    is this normal (and I made mistake?) or am I
    approaching it wrong? I recall BASIC prints new lines, but as I learned basic C and some derivatives, I'm used to newlines only being specified...
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "

    Move the above to a BEGIN section so it is executed once total instead
    of once per input line.

    while(getline<file) arr[$1]=$0

    The above would spin off into an infinite loop if getline failed since
    in that case it'd return a negative number which would still evaluate to "true" when tested as a condition. It needs to be:

    while ( (getline < file) > 0 ) arr[$1] = $0

    See http://awk.freeshell.org/AllAboutGetline for that and more info on
    using getline.
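
A small sketch of the getline return-value check discussed above, assuming a POSIX awk. The helper name `count_lines` and the file paths are hypothetical. getline returns 1 for each record read, 0 at end-of-file and -1 on error, so testing `> 0` ends the loop in both of the latter cases.

```shell
# Hypothetical helper: count the lines of a file using getline,
# stopping on end-of-file (0) as well as on error (-1).
count_lines() {
    awk -v file="$1" '
        BEGIN {
            n = 0
            # Without the "> 0" comparison, a return value of -1 would be
            # treated as true and the loop would never end.
            while ((getline line < file) > 0)
                n++
            close(file)
            print n
        }
    '
}

count_lines /nonexistent/data.txt   # prints 0 instead of looping forever
```

Reading into a variable (`line`) rather than `$0` is a design choice of this sketch; the form in the post, which sets `$0` and the fields, is what the original program needs.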

    *This is where you'd strip CRs from the end of input lines. Do either of these, the first uses a non-POSIX extension function gensub() (which
    gawk has), the second would work in any awk:

    a) while ( (getline < file) > 0 ) arr[$1] = gensub(/\r$/,"",1)

    b) while ( (getline < file) > 0 ) { sub(/\r$/,""); arr[$1] = $0 }


    PROCINFO["sorted_in"]="@ind_num_asc"
    for(i in arr)
    {
    split(arr[i],arr2)
    # output all words or on one line with ORS
    print arr2[2]
    # output all words on one line without needing ORS
    #printf("%s ",arr2[2])
    }

    Add `printf "\n"` after the loop if you had set ORS to a blank (a plain
    `print RS` would itself be followed by ORS, leaving a blank after the
    newline) so the output ends in a newline and therefore is a valid POSIX
    text file, otherwise YMMV with what subsequent text processing tools can do with it.
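
A minimal sketch of ending ORS-separated output with a newline, assuming a POSIX awk. The function name `words_on_one_line` and the sample input are hypothetical. Because every `print` appends ORS, this sketch emits the final newline with `printf`, which does not.

```shell
# Hypothetical helper: print the second column of every line separated by
# ORS=" ", then finish the output with exactly one newline.
words_on_one_line() {
    awk '
        BEGIN { ORS = " " }
        { print $2 }              # each print is followed by ORS (a space)
        END { printf "\n" }       # printf does not append ORS
    '
}

printf '1 all\n2 your\n3 base\n' | words_on_one_line
```

The output still has one space before the final newline; the separator-prefix idiom shown elsewhere in the thread (`printf("%s%s", sep, word); sep = " "`) avoids that.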

    Ed.

    }
    ------------------------------------------------------------------------
    # sample data.txt
    2 your
    1 all
    3 base
    5 belong
    4 are
    7 us
    6 to

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to alt.comp.lang.awk,comp.lang.awk on Thu May 16 15:55:35 2024
    From Newsgroup: comp.lang.awk

    On 16.05.2024 15:11, Ed Morton wrote:
    On 5/11/2024 11:57 PM, David Chmelik wrote:
    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)...

    [...]
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "

    Move the above to a BEGIN section so it is executed once total instead
    of once per input line.

    A function definition called once from the BEGIN section isn't
    called "once per input line".

    Janis


    [...]
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to alt.comp.lang.awk,comp.lang.awk on Thu May 16 14:15:59 2024
    From Newsgroup: comp.lang.awk

    In article <v2538p$1jmvm$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    ...
    A function definition called once from the BEGIN section isn't
    called "once per input line".

    Especially since it is commented out, so it executes exactly zero times.

    Actually setting ORS (or any other similar variable) inside a function definition is not such a bad idea, in terms of modularity.
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to alt.comp.lang.awk,comp.lang.awk on Thu May 16 15:17:42 2024
    From Newsgroup: comp.lang.awk

    In article <v254ev$125p2$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <v2538p$1jmvm$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    ...
    A function definition called once from the BEGIN section isn't
    called "once per input line".

    Especially since it is commented out, so it executes exactly zero times.

    Actually setting ORS (or any other similar variable) inside a function definition is not such a bad idea, in terms of modularity.

    In fact, I'd like to expand on that. It is commonly held that a
    well-written function that changes the values of "special variables" should save and restore them. I.e.:

    function foo(arg1, arg2, ...) {
    oldORS = ORS
    ORS = new value
    ...
    ORS = oldORS
    }
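
The save-and-restore pattern above could be sketched as follows, assuming a POSIX awk. The names `print_joined`, `old_ors` and `print_joined_demo` are hypothetical. The extra parameters after the wider gap in the parameter list are the usual awk convention for local variables, which keeps the saved value from leaking into the global namespace.

```shell
# Hypothetical demo: a function that changes ORS temporarily and restores it.
print_joined_demo() {
    awk '
        # Extra parameters (i, old_ors) serve as local variables.
        function print_joined(arr, n,    i, old_ors) {
            old_ors = ORS             # save the setting of the caller
            ORS = " "
            for (i = 1; i <= n; i++)
                print arr[i]
            printf "\n"               # end the line without appending ORS
            ORS = old_ors             # restore it before returning
        }
        BEGIN {
            n = split("all your base", words, " ")
            print_joined(words, n)
            print "done"              # uses the restored ORS (a newline)
        }
    '
}

print_joined_demo
```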

    But in fact, in practice, this can get tricky - due to vagaries of the AWK language. What would really be nice is if you could declare special
    variables in the parameter list - which would give them the "local
    variable" treatment. I.e.:

    function foo(arg1, arg2, ..., ORS) {
    ORS = new value
    ...
    }

    Now, ORS would be magically restored to its previous value w/o the function having to deal with it (**). Unfortunately, neither GAWK nor TAWK allows this. GAWK gives an error message saying you can't use special variables in arg lists. TAWK just silently ignores the attempt.

    What would be even better is if this happened magically w/o needing to do
    the above parameter trick. An argument can be made that changes to special variables should, by default, be local to functions. Now, as it happens,
    this would break one of my functions - which I call "setsort", which sets PROCINFO["sorted_in"] for me. Basically, I can never remember the special names of the internal sorting functions (e.g., @ind_whatever), so I wrote a function setsort() and can now just do: setsort(1) to get the most commonly used sorting functionality. I find it easier to remember the numbers than
    to remember the exact spelling of those names.

    This, in turn, could be fixed if there was a "global" statement that would
    make a selected variable global rather than local (*). This is, in part, inspired by Tcl syntax, where everything is local by default and you have
    to explicitly use "global var" to make "var" global. I've often thought
    that, if it could be done all over again, AWK might be better if it had followed the Tcl model for function variables. Of course, it can't be
    changed now.

    (*) So, my setsort() function, I would write: global PROCINFO
    and that would make changes to PROCINFO visible to the caller.

    (**) Or, you could even pass a value for ORS in as part of the function call.
  • From Ed Morton@mortonspam@gmail.com to alt.comp.lang.awk,comp.lang.awk on Thu May 16 19:40:18 2024
    From Newsgroup: comp.lang.awk

    On 5/16/2024 8:55 AM, Janis Papanagnou wrote:
    On 16.05.2024 15:11, Ed Morton wrote:
    On 5/11/2024 11:57 PM, David Chmelik wrote:
    I'm learning more AWK basics and wrote function to read file, sort,
    print. I use GNU AWK (gawk) and its sort but printing is harder to get
    working than anything... separate lines work, but when I use printf() or
    set ORS then use print (for words one line) all awk outputs (on FreeBSD
    UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
    before shell prompt)...

    [...]
    ------------------------------------------------------------------------
    # print_file_words.awk
    # pass filename to function
    BEGIN { print_file_words("data.txt"); }

    # read two-column array from file and sort lines and print
    function print_file_words(file) {
    # set record separator then use print
    # ORS=" "

    Move the above to a BEGIN section so it is executed once total instead
    of once per input line.

    A function definition called once from the BEGIN section isn't
    called "once per input line".

    I didn't notice the function keyword nestled in the preceding comments
    and didn't give it much thought, thanks for pointing that out.

    Ed.

  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to alt.comp.lang.awk,comp.lang.awk on Mon Jul 15 18:10:56 2024
    From Newsgroup: comp.lang.awk

    In article <v1t9hi$u4lh$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    ...
    Improved version:

    { A[$1] = $2 }
    END {
    for (i=1; i<=NR; i++)
    printf("%s%s",A[i],i<NR ? " " : "\n")
    }

    Note that the value of NR in END is sort of a gray area, but it works as expected in GAWK, which is really all we care about.

    Here's an even tighter version. Saves about 20 bytes of code.
    Yes, I know this code makes a lot of assumptions, but all the assumptions
    are valid in the instant case (and that's all that matters):

    { A[$1] = $2 }
    END {
    for (i=1; i<=NR; i++) $i = A[i]
    print
    }
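
A hedged variant of the field-assignment trick above, for illustration only. It keeps the same assumptions (the first column holds the integers 1..NR, each exactly once) and additionally clears `$0` before rebuilding it, so that fields left over from the last input record cannot leak into the output when NR is smaller than that record's field count. The function name `rebuild_record` is hypothetical.

```shell
# Hypothetical helper: rebuild $0 from the ranked words and print it.
rebuild_record() {
    awk '
        { A[$1] = $2 }
        END {
            $0 = ""                   # start from an empty record
            for (i = 1; i <= NR; i++)
                $i = A[i]             # assigning a field rebuilds $0, joined by OFS
            print
        }
    ' "$@"
}

printf '2 your\n1 all\n3 base\n' | rebuild_record
```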