• Nth (Ordinal Numeral Suffix)

    From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Fri Nov 3 19:30:50 2023
    From Newsgroup: comp.lang.awk

    # tags: nth, ordinal, suffix, digit, numbers, awk, code
    #
    # appends ordinal suffix to space delimited numerals
    # Michael Sanders 2023
    # https://busybox.neocities.org/notes/nth.txt
    #
    # usage example: echo 101 42 23 98 foo | awk -f nth.txt
    #
    # output (1 per line): 101st 42nd 23rd 98th foo
    #
    # further reading:
    # https://en.wikipedia.org/wiki/Ordinal_numeral

    function nth(day) {
        if (day ~ /^[0-9]+$/) {
            if (day ~ /^1[1-3]$/ || day > 20) {
                if (day % 10 == 1) return day "st"
                if (day % 10 == 2) return day "nd"
                if (day % 10 == 3) return day "rd"
            }
            return day "th"
        }
        return day
    }

    {
        delete v
        split($0, v)
        for (x in v) print nth(v[x])
    }

    # eof
    --
    :wq
    Mike Sanders

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Fri Nov 3 19:57:25 2023
    From Newsgroup: comp.lang.awk

    Mike Sanders <porkchop@invalid.foo> wrote:

    function nth(day) {
    if (day ~ /^[0-9]+$/) {
    if (day ~ /^1[1-3]$/ || day > 20) {
    if (day % 10 == 1) return day "st"
    if (day % 10 == 2) return day "nd"
    if (day % 10 == 3) return day "rd"
    }
    return day "th"
    }
    return day
    }

    On 2nd thought, I think this could be better rendered as:

    # tags: nth, ordinal, suffix, digit, numbers, awk, code
    #
    # appends ordinal suffix to space delimited numerals
    # Michael Sanders 2023
    # https://busybox.neocities.org/notes/nth.txt
    #
    # usage example: echo 101 42 23 98 foo | awk -f nth.txt
    #
    # output (1 per line): 101st 42nd 23rd 98th foo
    #
    # further reading:
    # https://en.wikipedia.org/wiki/Ordinal_numeral

    function nth(num) {
        if (num ~ /^[0-9]+$/) {
            if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
                if (num % 10 == 1) return num "st"
                if (num % 10 == 2) return num "nd"
                if (num % 10 == 3) return num "rd"
            }
            return num "th"
        }
        return num
    }

    {
        delete v
        split($0, v)
        for (x in v) print nth(v[x])
    }

    # eof
    --
    :wq
    Mike Sanders

  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Fri Nov 3 20:49:53 2023
    From Newsgroup: comp.lang.awk

    porkchop@invalid.foo (Mike Sanders) writes:

    Mike Sanders <porkchop@invalid.foo> wrote:

    function nth(day) {
    if (day ~ /^[0-9]+$/) {
    if (day ~ /^1[1-3]$/ || day > 20) {
    if (day % 10 == 1) return day "st"
    if (day % 10 == 2) return day "nd"
    if (day % 10 == 3) return day "rd"
    }
    return day "th"
    }
    return day
    }

    On 2nd thought, I think this could be better rendered as:

    That's not really what "better rendered" means. The two bits of code
    are functionally very different.

    # tags: nth, ordinal, suffix, digit, numbers, awk, code
    #
    # appends ordinal suffix to space delimited numerals
    # Michael Sanders 2023
    # https://busybox.neocities.org/notes/nth.txt
    #
    # usage example: echo 101 42 23 98 foo | awk -f nth.txt
    #
    # output (1 per line): 101st 42nd 23rd 98th foo
    #
    # further reading:
    # https://en.wikipedia.org/wiki/Ordinal_numeral

    function nth(num) {
    if (num ~ /^[0-9]+$/) {
    if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
    if (num % 10 == 1) return num "st"
    if (num % 10 == 2) return num "nd"
    if (num % 10 == 3) return num "rd"
    }
    return num "th"
    }
    return num
    }

    {
    delete v
    split($0, v)
    for (x in v) print nth(v[x])

    This is a little odd in that the output order will not necessarily match
    the input order. Whilst I understand that this is probably just driver
    code to test the function, it's going to make automatic testing harder.

    Especially as (as you probably know) you can scan the fields in a line,
    in order, like this

    for (i = 1; i <= NF; i++) print nth($i)

    }
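
    A minimal sketch of the driver loop described above, walking the fields
    by index so the output order matches the input order. The wrapper name
    `nth_fields` is made up for this example; the nth() body is the second
    version quoted above, unchanged.

    ```sh
    # nth_fields: hypothetical wrapper name, used only for this sketch
    nth_fields() {
        awk '
        function nth(num) {
            if (num ~ /^[0-9]+$/) {
                if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
                    if (num % 10 == 1) return num "st"
                    if (num % 10 == 2) return num "nd"
                    if (num % 10 == 3) return num "rd"
                }
                return num "th"
            }
            return num
        }
        # fields are visited as $1, $2, ... $NF, so input order is kept
        { for (i = 1; i <= NF; i++) print nth($i) }
        '
    }

    echo 101 42 23 98 foo | nth_fields
    ```

    With `for (x in v)` the traversal order is left to the implementation,
    so the same input may come back in a different order.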

    # eof
    --
    Ben.
  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Fri Nov 3 21:47:46 2023
    From Newsgroup: comp.lang.awk

    Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

    Hey Ben =)

    On 2nd thought, I think this could be better rendered as:

    That's not really what "better rendered" means. The two bits of code
    are functionally very different.

    Oh c'mon now you're being fussy on this point & besides for you or me?
    The distinction is important because you're speaking for yourself
    & using that same logic, since I wrote the snippet, I can define my
    own grammar no? Anyone can plainly read the 1st & 2nd versions of the
    script & discern the differences. But 'quibble not'.

    This is a little odd in that the output order will not necessarily match
    the input order. Whilst I understand that this is probably just driver
    code to test the function, it's going to make automatic testing harder.

    Nothing odd about it, I believe several awk implementations, when using:

    'for (x in array)...'

    say the output is not guaranteed to be in sequential order BUT...

    Aye - I'll concede this point kind sir & update the script accordingly as
    it is more inline with what the user would expect (& less code to boot).

    So script updated as per your suggestion:

    https://busybox.neocities.org/notes/nth.txt

    Good catch Ben & thank you.
    --
    :wq
    Mike Sanders

  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Fri Nov 3 22:06:54 2023
    From Newsgroup: comp.lang.awk

    porkchop@invalid.foo (Mike Sanders) writes:

    Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

    Hey Ben =)

    On 2nd thought, I think this could be better rendered as:

    That's not really what "better rendered" means. The two bits of code
    are functionally very different.

    Oh c'mon now you're being fussy on this point & besides for you or me?

    This is a very short function, so maybe a reader will see that the two
    do different things, but in general I would not necessarily take a new
    copy if someone posted a "better rendering" of some code. I would
    expect at most superficial, aesthetic changes.

    I don't want to assume you are a native speaker of English, so it's
    possible that you don't know how minor a change "a better rendering" of
    something is likely to be.

    And I don't know what you mean by "& besides for you or me?".

    The distinction is important because you're speaking for yourself
    & using that same logic, since I wrote the snippet, I can define my
    own grammar no? Anyone can plainly read the 1st & 2nd versions of the
    script & discern the differences. But 'quibble not'.

    I don't follow this.

    This is a little odd in that the output order will not necessarily match
    the input order. Whilst I understand that this is probably just driver
    code to test the function, it's going to make automatic testing harder.

    Nothing odd about it, I believe several awk implementations, when using:

    'for (x in array)...'

    say the output is not guaranteed to be in sequential order BUT...

    Aye - I'll concede this point kind sir & update the script accordingly as
    it is more inline with what the user would expect (& less code to boot).

    So script updated as per your suggestion:

    https://busybox.neocities.org/notes/nth.txt

    Good catch Ben & thank you.

    You're welcome.
    --
    Ben.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Nov 3 23:14:57 2023
    From Newsgroup: comp.lang.awk

    On 03.11.2023 20:57, Mike Sanders wrote:
    Mike Sanders <porkchop@invalid.foo> wrote:

    function nth(day) {
    if (day ~ /^[0-9]+$/) {
    if (day ~ /^1[1-3]$/ || day > 20) {
    if (day % 10 == 1) return day "st"
    if (day % 10 == 2) return day "nd"
    if (day % 10 == 3) return day "rd"
    }
    return day "th"
    }
    return day
    }

    On 2nd thought, I think this could be better rendered as:

    [...]

    function nth(num) {
    if (num ~ /^[0-9]+$/) {
    if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
    if (num % 10 == 1) return num "st"
    if (num % 10 == 2) return num "nd"
    if (num % 10 == 3) return num "rd"
    }
    return num "th"
    }
    return num
    }

    [...]

    Hi Mike, I like your second version better since it doesn't _mix_
    arithmetic with pattern comparisons. (Okay, there's still the
    initial pattern, but as an overall test pattern that's fine, IMO.)

    I had written such a function in shell and it was using patterns

    case ${num} in
        (*[!0-9]*) x="" ;;
        (*11|*12|*13) x=th ;;
        (*1) x=st ;;
        (*2) x=nd ;;
        (*3) x=rd ;;
        (*) x=th ;;
    esac

    I think (in shell) patterns are better legible. But also the Awk
    transcript with patterns has a good legibility and reflects the
    (literal) definition of the definition (e.g. Wikipedia)

    switch (num) {
        case /[^0-9]/: x="" ; break ;
        case /11$|12$|13$/: x="th" ; break ;
        case /1$/: x="st" ; break ;
        case /2$/: x="nd" ; break ;
        case /3$/: x="rd" ; break ;
        default: x="th" ; break ;
    }

    (I've used GNU Awk's switch, but it can also be written with 'if'.)

    Take care when using anchors; in your first version with /^1[1-3]$/
    you were matching only three numbers. Maybe /1[1-3]$/ was intended?
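
    To make the anchor point concrete, here is a small sketch that runs both
    patterns against a few sample values. The wrapper name `anchor_demo` and
    the sample numbers are chosen only for illustration.

    ```sh
    # anchor_demo: hypothetical wrapper name, used only for this sketch
    anchor_demo() {
        awk '{
            full = ($1 ~ /^1[1-3]$/) ? "yes" : "no"   # whole string must be 11, 12 or 13
            tail = ($1 ~ /1[1-3]$/)  ? "yes" : "no"   # string only has to end in 11, 12 or 13
            print $1, full, tail
        }'
    }

    printf '%s\n' 11 111 212 21 | anchor_demo
    ```

    Only 11 matches the fully anchored pattern, while 11, 111 and 212 all
    match the one anchored at the end only.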

    Janis

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Nov 3 23:24:24 2023
    From Newsgroup: comp.lang.awk

    On 03.11.2023 23:14, Janis Papanagnou wrote:
    [...]

    Hi Mike, I like your second version better since it doesn't _mix_
    arithmetic with pattern comparisons. (Okay, there's still the
    initial pattern, but as an overall test pattern that's fine, IMO.)

    Just one additional comment about why I like the pattern approach
    better; three levels of nested 'if' make legibility unnecessarily
    difficult, especially in comparison.

    [...]
    I think (in shell) patterns are better legible. But also the Awk
    transcript with patterns has a good legibility and reflects the
    (literal) definition of the definition (e.g. Wikipedia)

    "(literal) description (e.g. of the Wikipedia definition)."

    (Sorry for my sloppy writing.)

    Janis

  • From Bruce Horrocks@07.013@scorecrow.com to comp.lang.awk on Fri Nov 3 23:40:45 2023
    From Newsgroup: comp.lang.awk

    On 03/11/2023 19:57, Mike Sanders wrote:
    if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {

    You could trivially re-write this line as

    if (num % 100 < 11 || num % 100 > 13) {

    to save a comparison but the logic is slightly less clear.

    Even less clear is to re-write as

    if (num % 100 > 13 || num % 100 < 11) {

    to take better advantage of lazy evaluation.
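
    A rough sketch that checks the three forms of the condition agree for
    every non-negative integer from 0 to 999. The wrapper name `check_teens`
    and the variable names are invented for this example.

    ```sh
    # check_teens: hypothetical wrapper name, used only for this sketch
    check_teens() {
        awk 'BEGIN {
            bad = 0
            for (n = 0; n <= 999; n++) {
                r = n % 100
                a = (r != 11 && r != 12 && r != 13)   # three comparisons, as posted
                b = (r < 11 || r > 13)                # two comparisons
                c = (r > 13 || r < 11)                # same test, likelier branch first
                if (a != b || b != c) bad++
            }
            print bad    # number of disagreements found
        }'
    }

    check_teens
    ```

    Of the 100 possible values of num % 100, 86 are greater than 13, so if
    inputs are spread evenly (an assumption), testing `> 13` first lets `||`
    stop after one comparison most of the time.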
    --
    Bruce Horrocks
    Surrey, England

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Nov 4 07:35:30 2023
    From Newsgroup: comp.lang.awk

    Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

    I don't follow this.

    No biggie Ben (it was my lame attempt at being facetious).
    Ultimately the burden of clarity lies squarely on the
    shoulders of the poster, and in this case, that would be me.
    --
    :wq
    Mike Sanders

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Nov 4 07:38:00 2023
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    Yes, thinking the same here Janis & even still, the 1st version seemed
    a little off. And the 1st pattern? Prevents 'Footh' (chuckle sounds
    silly to even write much less speak aloud).

    I think (in shell) patterns are better legible. But also the Awk
    transcript with patterns has a good legibility and reflects the
    (literal) definition of the definition (e.g. Wikipedia)

    switch (num) {
    case /[^0-9]/: x="" ; break ;
    case /11$|12$|13$/: x="th" ; break ;
    case /1$/: x="st" ; break ;
    case /2$/: x="nd" ; break ;
    case /3$/: x="rd" ; break ;
    default: x="th" ; break ;
    }

    Sure enough, it is very legible & concise at least to my eyes.

    (I've used GNU Awk's switch, but it can also be written with 'if'.)

    I know, Arnold has done an outstanding job with Gawk, 'case' is very
    practical & function pointers too, those are so nifty!

    Take care when using anchors; in your first version with /^1[1-3]$/
    you were matching only three numbers. Maybe /1[1-3]$/ was intended?

    Yeah, the whole thing was sort of a mess (I'd forgotten I had that script).

    (Sorry for my sloppy writing.)

    Shoot, no worries Janis. My writing is hardly ever error three.

    No wait! I meant 'error free' =)

    You know, where I call home, here in the Prairies of North America,
    our dialect of English is very colloquial (meaning informal, or rustic).
    For instance, if I wanted to ask another if s/he agreed that a fence
    was constructed in a robust & strong way, I might ask:

    Q: She's hell built for stout, yeah?

    A: Sure enough, if ever there was, she is.

    ...so you can see it's relative. We at comp.lang.awk can work it out.

    Also, my earnest thanks to all for putting up with my flood of posts.
    Sometimes, when you have an itch, well you have to scratch, & that's
    where I'm at right now it seems.

    Well folks, I'm off for the weekend. My 5yr old granddaughter is en-route
    even as I write this & she's just beginning to learn to read. And I'll be
    front & center to witness her recite either 'Curious George' or
    'Cat In The Hat'. She's so excited she's beside herself & I want to
    honor her efforts at greater cognition. =)
    --
    :wq
    Mike Sanders

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Nov 4 07:39:32 2023
    From Newsgroup: comp.lang.awk

    Bruce Horrocks <07.013@scorecrow.com> wrote:

    You could trivially re-write this line as

    if (num % 100 < 11 || num % 100 > 13) {

    to save a comparison but the logic is slightly less clear.

    Even less clear is to re-write as

    if (num % 100 > 13 || num % 100 < 11) {

    to take better advantage of lazy evaluation.

    Though the latter edges out the former, I'll take your 1st
    construct Bruce just to keep a little clarity (Lord knows
    I need it, chuckle).

    Script updated & also added contributing author's names:

    https://busybox.neocities.org/notes/nth.txt
    --
    :wq
    Mike Sanders

  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 08:04:30 2023
    From Newsgroup: comp.lang.awk

    On 11/3/2023 2:57 PM, Mike Sanders wrote:
    Mike Sanders <porkchop@invalid.foo> wrote:

    function nth(day) {
    if (day ~ /^[0-9]+$/) {
    if (day ~ /^1[1-3]$/ || day > 20) {
    if (day % 10 == 1) return day "st"
    if (day % 10 == 2) return day "nd"
    if (day % 10 == 3) return day "rd"
    }
    return day "th"
    }
    return day
    }

    On 2nd thought, I think this could be better rendered as:

    # tags: nth, ordinal, suffix, digit, numbers, awk, code
    #
    # appends ordinal suffix to space delimited numerals
    # Michael Sanders 2023
    # https://busybox.neocities.org/notes/nth.txt
    #
    # usage example: echo 101 42 23 98 foo | awk -f nth.txt
    #
    # output (1 per line): 101st 42nd 23rd 98th foo
    #
    # further reading:
    # https://en.wikipedia.org/wiki/Ordinal_numeral

    function nth(num) {
    if (num ~ /^[0-9]+$/) {
    if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
    if (num % 10 == 1) return num "st"
    if (num % 10 == 2) return num "nd"
    if (num % 10 == 3) return num "rd"
    }
    return num "th"
    }
    return num
    }

    {
    delete v

    `split($0,v)` will delete v before repopulating it, so there's no need to
    do it explicitly before calling `split()`. Plus, that would make your code
    non-portable as `delete array` isn't defined by POSIX (yet).
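
    A quick way to see the first point, sketched with a made-up wrapper name
    `split_demo`: split() discards whatever the array held before it stores
    the new pieces, so an explicit delete beforehand adds nothing.

    ```sh
    # split_demo: hypothetical wrapper name, used only for this sketch
    split_demo() {
        awk 'BEGIN {
            split("a b c", v)      # v now has 3 elements
            split("x", v)          # old elements are discarded first
            n = 0
            for (k in v) n++       # count what is left
            print n, v[1]
        }'
    }

    split_demo
    ```

    The count is 1 and v[1] is "x"; nothing from the first split survives.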

    split($0, v)
    for (x in v) print nth(v[x])

    That would print the output in a "random" order, do `for (x=1; x in v;
    x++)` instead to get the same output order as the input order.

    You don't need split() and an array at all, though, all you need is `for
    (x=1; x<=NF; x++) print nth($x)`.

    }


    Consider doing this instead (untested) to address the above points and
    for improved efficiency:

    BEGIN {
        huns[11]; huns[12]; huns[13]
        split("st nd rd th th th th th th",tens)
        tens[0]="th"
    }

    function nth(num,       sfx) {
        if (num ~ /^[0-9]+$/) {
            if ( !((num % 100) in huns) ) {
                sfx = tens[num % 10]
            }
        }
        return num sfx
    }

    {
        for (x=1; x<=NF; x++) print nth($x)
    }

    Regards,

    Ed.


  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 08:13:46 2023
    From Newsgroup: comp.lang.awk

    On 11/5/2023 8:04 AM, Ed Morton wrote:
    On 11/3/2023 2:57 PM, Mike Sanders wrote:
    Mike Sanders <porkchop@invalid.foo> wrote:

    function nth(day) {
       if (day ~ /^[0-9]+$/) {
         if (day ~ /^1[1-3]$/ || day > 20) {
           if (day % 10 == 1) return day "st"
           if (day % 10 == 2) return day "nd"
           if (day % 10 == 3) return day "rd"
         }
           return day "th"
       }
       return day
    }

    On 2nd thought, I think this could be better rendered as:

    # tags: nth, ordinal, suffix, digit, numbers, awk, code
    #
    # appends ordinal suffix to space delimited numerals
    # Michael Sanders 2023
    # https://busybox.neocities.org/notes/nth.txt
    #
    # usage example: echo 101 42 23 98 foo | awk -f nth.txt
    #
    # output (1 per line): 101st 42nd 23rd 98th foo
    #
    # further reading:
    # https://en.wikipedia.org/wiki/Ordinal_numeral

    function nth(num) {
       if (num ~ /^[0-9]+$/) {
         if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
           if (num % 10 == 1) return num "st"
           if (num % 10 == 2) return num "nd"
           if (num % 10 == 3) return num "rd"
         }
         return num "th"
       }
       return num
    }

    {
       delete v

    `split($0,v)` will delete v before repopulating it, no need to do it explicitly before calling `split()` plus that would make your code non-portable as `delete array` isn't defined by POSIX (yet).

       split($0, v)
       for (x in v) print nth(v[x])

    That would print the output in a "random" order, do `for (x=1; x in v;
    x++)` instead to get the same output order as the input order.

    You don't need split() and an array at all, though, all you need is `for (x=1; x<=NF; x++) print nth($x)`.

    }


    Consider doing this instead (untested) to address the above points and
    for improved efficiency:

    BEGIN {
        huns[11]; huns[12]; huns[13]
        split("st nd rd th th th th th th",tens)
        tens[0]="th"
    }

    function nth(num,       sfx) {
       if (num ~ /^[0-9]+$/) {
          if ( !((num % 100) in huns) ) {
             sfx = tens[num % 10]
          }
       }
       return num sfx
    }

    {
       for (x=1; x<=NF; x++) print nth($x)
    }

    Regards,

        Ed.



    or if you don't want to use a BEGIN section for some reason then remove
    it and change `nth()` to this which is very, very slightly less
    efficient than the above:

    function nth(num,       sfx) {
        if (num ~ /^[0-9]+$/) {
            if ( !(1 in tens) ) {
                huns[11]; huns[12]; huns[13]
                split("st nd rd th th th th th th",tens)
                tens[0]="th"
            }
            if ( !((num % 100) in huns) ) {
                sfx = tens[num % 10]
            }
        }
        return num sfx
    }

    You may want to come up with some naming convention for huns[] and
    tens[] to make it clear they're global and avoid clashing with anything
    else of the same name anywhere else in the script, such as prefixing them
    with the name of the function that uses them, "Nth_huns", or some common
    indicator you use for all global variables, e.g. "G_huns" or whatever
    else makes sense to you.

    Regards,

    Ed.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 17:21:52 2023
    From Newsgroup: comp.lang.awk

    Hi Ed!

    On 05.11.2023 15:13, Ed Morton wrote:

    or if you don't want to use a BEGIN section for some reason then remove
    it and change `nth()` to this which is very, very slightly less
    efficient than the above:

    function nth(num, sfx) {
    if (num ~ /^[0-9]+$/) {
    if ( !(1 in tens) ) {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
    }
    if ( !((num % 100) in huns) ) {
    sfx = tens[num % 10]
    }
    }
    return num sfx
    }

    I don't see where the advantage here is. It is (IMO) unnecessarily complex
    (many 'if' control constructs, incomplete branches, undefined variables)
    for such a simple task and also harder to understand (or analyze in case
    of errors[*]).

    Simple pattern matches would be straightforward for such a primitive and
    certainly not time-critical[**] function like "nth()".

    Janis

    [...]

    [*] The code does not produce correct results as presented. If corrected
    it would probably get even (at least a bit) more complex, I suppose.

    [**] In case that would have been the reason for this implementation.

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 18:01:41 2023
    From Newsgroup: comp.lang.awk

    On 05.11.2023 17:21, Janis Papanagnou wrote:
    Hi Ed!

    On 05.11.2023 15:13, Ed Morton wrote:

    or if you don't want to use a BEGIN section for some reason then remove
    it and change `nth()` to this which is very, very slightly less
    efficient than the above:

    function nth(num, sfx) {
    if (num ~ /^[0-9]+$/) {
    if ( !(1 in tens) ) {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
    }
    if ( !((num % 100) in huns) ) {
    sfx = tens[num % 10]
    }
    }
    return num sfx
    }

    I don't see where the advantage here is. It is (IMO) unnecessary complex (many 'if' control constructs, incomplete branches, undefined variables)
    for such a simple task and also harder to understand (or analyze in case
    of errors[*]).

    Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".

    Being curious I've compared timing of above [not corrected] function
    with the simpler and clearer pattern matching based algorithm

    function nth (num)
    {
        if (num ~ /[^0-9]/) return num;
        else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
        else if (num ~ /1$/) return num "st";
        else if (num ~ /2$/) return num "nd";
        else if (num ~ /3$/) return num "rd";
        else return num "th";
    }

    For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
    (Tested with GNU Awk 4.2.0)


    Janis

    [...]

    [*] The code does not produce correct results as presented. If corrected
    it would probably get even (at least a bit) more complex, I suppose.

    [**] In case that would have been the reason for this implementation.


  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 12:14:53 2023
    From Newsgroup: comp.lang.awk

    On 11/5/2023 10:21 AM, Janis Papanagnou wrote:
    Hi Ed!

    On 05.11.2023 15:13, Ed Morton wrote:

    or if you don't want to use a BEGIN section for some reason then remove
    it and change `nth()` to this which is very, very slightly less
    efficient than the above:

    function nth(num, sfx) {
    if (num ~ /^[0-9]+$/) {
    if ( !(1 in tens) ) {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
    }
    if ( !((num % 100) in huns) ) {
    sfx = tens[num % 10]
    }
    }
    return num sfx
    }

    I don't see where the advantage here is. It is (IMO) unnecessary complex (many 'if' control constructs, incomplete branches, undefined variables)
    for such a simple task and also harder to understand (or analyze in case
    of errors[*]).

    Not sure where you're seeing any of those things. There are fewer "if"s
    than were in the OP's code; if by "incomplete branches" you mean "if"
    without an "else", there's nothing wrong with that and the OP's code had
    more of them; there are no undefined variables; and IMO it's much simpler
    than the original code. And that code above was just for "if you don't
    want to use a BEGIN section for some reason", while the version I'd use
    is what I originally posted:

    BEGIN {
        huns[11]; huns[12]; huns[13]
        split("st nd rd th th th th th th",tens)
        tens[0]="th"
    }

    function nth(num,       sfx) {
        if (num ~ /^[0-9]+$/) {
            if ( !((num % 100) in huns) ) {
                sfx = tens[num % 10]
            }
        }
        return num sfx
    }

    which is simpler and faster again.


    Simple pattern matches would be straightforward for such a primitive and
    certainly not time-critical[**] function like "nth()".

    If the OP has a large input file and wants to add "th" or "nd" to the
    end of numbers on each line then "nth()" is probably the only part of it
    that IS time-critical.


    Janis

    [...]

    [*] The code does not produce correct results as presented. If corrected
    it would probably get even (at least a bit) more complex, I suppose.

    All I was trying to do was show an alternative implementation of the OP's
    code, not solve the problem the OP was trying to solve, and all I did to
    test it was check it produced the same output as the OP's script for the
    sample input they provided, which it does:

    OP's code:

    $ echo 101 42 23 98 foo | awk -f nth.txt
    101st
    42nd
    23rd
    98th
    foo

    My code:

    $ echo 101 42 23 98 foo | awk -f nth.awk
    101st
    42nd
    23rd
    98th
    foo

    So, could you elaborate and provide an example where my code fails and
    the OP's succeeds?


    [**] In case that would have been the reason for this implementation.

    The reason for this implementation is it's faster, simpler, and doesn't contain duplicate code so it'll be easier to maintain.

    Ed.
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 12:17:26 2023
    From Newsgroup: comp.lang.awk

    On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
    <snip>
    Being curious I've compared timing of above [not corrected] function
    with the simpler and clearer pattern matching based algorithm

    function nth (num)
    {
    if (num ~ /[^0-9]/) return num;
    else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
    else if (num ~ /1$/) return num "st";
    else if (num ~ /2$/) return num "nd";
    else if (num ~ /3$/) return num "rd";
    else return num "th";
    }

    For _10 million_ function calls the difference is ~2s (~15s vs. ~17s). (Tested with GNU Awk 4.2.0)

    Did you also test it with the OP's code that I was showing an alternative
    implementation of, or just with the above code, which is yet another
    alternative implementation? If so, what was the result of that run?

    Ed.

  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 12:40:20 2023
    From Newsgroup: comp.lang.awk

    On 11/5/2023 12:14 PM, Ed Morton wrote:
    On 11/5/2023 10:21 AM, Janis Papanagnou wrote:
    Hi Ed!

    On 05.11.2023 15:13, Ed Morton wrote:

    or if you don't want to use a BEGIN section for some reason then remove
    it and change `nth()` to this which is very, very slightly less
    efficient than the above:

    function nth(num,       sfx) {
         if (num ~ /^[0-9]+$/) {
            if ( !(1 in tens) ) {
               huns[11]; huns[12]; huns[13]
               split("st nd rd th th th th th th",tens)
               tens[0]="th"
            }
            if ( !((num % 100) in huns) ) {
               sfx = tens[num % 10]
            }
         }
         return num sfx
    }

    I don't see where the advantage here is. It is (IMO) unnecessary complex (many 'if' control constructs, incomplete branches, undefined variables) for such a simple task and also harder to understand (or analyze in case of errors[*]).
    Not sure where you're seeing any of those things. There are fewer "if"s
    than were in the OPs code, if by "incomplete branches" you mean "if"
    without an "else" there's nothing wrong with that and the OPs code had
    more of them, no undefined variables and IMO it's much simpler than the original code. And that code above was just for "if you don't want to
    use a BEGIN section for some reason" while the version I'd use is what I originally posted:

    BEGIN {
        huns[11]; huns[12]; huns[13]
        split("st nd rd th th th th th th",tens)
        tens[0]="th"
    }

    function nth(num,       sfx) {
       if (num ~ /^[0-9]+$/) {
          if ( !((num % 100) in huns) ) {
             sfx = tens[num % 10]
          }
       }
       return num sfx
    }

    which is simpler and faster again.


    Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".
    If the OP has a large input file and wants to add "th" or "nd" to the
    end of numbers on each line then "nth()" is probably the only part of it that IS time-critical.


    Janis

    [...]

    [*] The code does not produce correct results as presented. If corrected it would probably get even (at least a bit) more complex, I suppose.
    All I was trying to do was show an alternative implementation of the OPs code, not solve the problem the OP was trying to solve, and all I did to test it was check it produced the same output as the OPs script for the sample input they provided, which it does:

    OPs code:

    $ echo 101 42 23 98 foo | awk -f nth.txt
    101st
    42nd
    23rd
    98th
    foo

    My code:

    $ echo 101 42 23 98 foo | awk -f nth.awk
    101st
    42nd
    23rd
    98th
    foo

    So, could you elaborate and provide an example where my code fails and
    the OPs succeeds?

    Never mind, I see it - I wasn't assigning sfx for some numbers, fixed by changing "nth()" to:

    function nth(num,       sfx) {
        if (num ~ /^[0-9]+$/) {
            sfx = ( (num % 100) in huns ? "th" : tens[num % 10] )
        }
        return num sfx
    }

    Thanks for the heads up.
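
    For reference, a sketch that puts the corrected nth() together with the
    BEGIN block from the earlier post so it can be run as a whole. The
    wrapper name `nth_lookup` and the sample input are made up for this
    example.

    ```sh
    # nth_lookup: hypothetical wrapper name, used only for this sketch
    nth_lookup() {
        awk '
        BEGIN {
            huns[11]; huns[12]; huns[13]    # referencing an element creates it
            split("st nd rd th th th th th th", tens)
            tens[0] = "th"
        }
        function nth(num,       sfx) {
            if (num ~ /^[0-9]+$/) {
                # 11, 12, 13 (mod 100) always take "th"; otherwise use last digit
                sfx = ((num % 100) in huns ? "th" : tens[num % 10])
            }
            return num sfx
        }
        { for (x = 1; x <= NF; x++) print nth($x) }
        '
    }

    echo 1 2 3 4 11 12 13 21 112 foo | nth_lookup
    ```

    With this change 11, 12, 13 and 112 get "th" instead of coming back
    without a suffix.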

    Ed.


    [**] In case that would have been the reason for this implementation.

    The reason for this implementation is it's faster, simpler, and doesn't contain duplicate code so it'll be easier to maintain.

        Ed.

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 19:48:47 2023
    From Newsgroup: comp.lang.awk

    On 05.11.2023 19:14, Ed Morton wrote:

    Simple pattern matches would be straightforward for such a primitive and
    certainly not time-critical[**] function like "nth()".
    If the OP has a large input file and wants to add "th" or "nd" to the
    end of numbers on each line then "nth()" is probably the only part of it
    that IS time-critical.

    Sorry, no. - The sample sizes I used are hilariously large.

    All I was trying to do was show an alternative implementation of the OP's code, not solve the problem the OP was trying to solve. All I did to
    test it was check that it produced the same output as the OP's script for the sample input they provided, which it does:

    OP's code:

    $ echo 101 42 23 98 foo | awk -f nth.txt
    101st
    42nd
    23rd
    98th
    foo

    My code:

    $ echo 101 42 23 98 foo | awk -f nth.awk
    101st
    42nd
    23rd
    98th
    foo

    So, could you elaborate and provide an example where my code fails and
    the OP's succeeds?

    I've just checked the output of your code (not the OP's), and got

    1st
    2nd
    3rd
    4th
    5th
    6th
    7th
    8th
    9th
    10th
    11
    12
    13
    14th
    15th
    16th
    17th
    18th
    19th
    20th
    ...

    My intention was *not* to understand where the coding problem was,
    neither in the original code nor in the (derived?) variant.



    [**] In case that would have been the reason for this implementation.

    The reason for this implementation is it's faster, simpler, and doesn't contain duplicate code so it'll be easier to maintain.

    Maybe. - Even though performance is actually no real issue, I've
    tested a couple of variants (with even larger data sets: 50 million).

    InitPre/LoopFunc: 0 11 (to not count invariants)

    if/else-if: 71
    if: 79
    switch: 76
    arithm/lookup: 68
    precomp/lookup: 66

    Taking the lookup approach even further with a precalculated array
    of the first 100 numbers, the code gets yet _simpler_ (and faster)

    function nth_pre (num)
    {
        if (num ~ /[^0-9]/) return num
        return num e[num%100]
    }

    Not that these variants would matter WRT performance; the difference
    (pattern: 71s, your variant: 68s, precalculated array of 100 significant
    numbers: 66s) is negligible. But code should be readable (if possible), IMO.

    Janis
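
Timings like the ones above could be gathered with a small harness along the following lines. This is a hypothetical sketch, not the harness actually used for the figures in this post: bench_nth, nth_pat and the loop shape are illustrative names and choices, and absolute times will vary with the awk implementation and the machine.

```shell
# Hypothetical harness; the function names and loop shape are illustrative only.
bench_nth() {   # usage: bench_nth pattern|lookup COUNT
    awk -v variant="$1" -v n="$2" '
    # pattern-based variant
    function nth_pat(num) {
        if (num ~ /[^0-9]/) return num
        else if (num ~ /1[1-3]$/) return num "th"
        else if (num ~ /1$/) return num "st"
        else if (num ~ /2$/) return num "nd"
        else if (num ~ /3$/) return num "rd"
        else return num "th"
    }
    # precalculated-table variant
    function nth_pre(num) {
        if (num ~ /[^0-9]/) return num
        return num e[num % 100]
    }
    function init_e(    i) {
        for (i = 0; i <= 99; i++) e[i] = "th"
        for (i = 1; i <= 91; i += 7) {
            e[i++] = "st"; e[i++] = "nd"; e[i++] = "rd"
        }
        e[11] = e[12] = e[13] = "th"
    }
    BEGIN {
        init_e()
        for (k = 1; k <= n; k++)
            r = (variant == "lookup") ? nth_pre(k) : nth_pat(k)
        print variant, n, r    # last result, so the loop has a visible effect
    }'
}

bench_nth pattern 100000
bench_nth lookup 100000
```

Running each call under the shell's timing facility with a much larger COUNT gives a rough comparison; an empty loop with the same COUNT can be timed separately to subtract the invariant overhead.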

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 20:04:56 2023
    From Newsgroup: comp.lang.awk

    On 05.11.2023 19:17, Ed Morton wrote:
    On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
    <snip>
    Being curious I've compared timing of above [not corrected] function
    with the simpler and clearer pattern matching based algorithm

    function nth (num)
    {
        if (num ~ /[^0-9]/) return num;
        else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
        else if (num ~ /1$/) return num "st";
        else if (num ~ /2$/) return num "nd";
        else if (num ~ /3$/) return num "rd";
        else return num "th";
    }

    For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
    (Tested with GNU Awk 4.2.0)

    Did you also test it with the OP's code that I was showing an alternative implementation of or just with the above code which is yet another alternative implementation? If so, what was the result of that run?

    Sorry, I was not interested in the OP's code. Some years ago I
    implemented a shell version that was very readable, as opposed to the
    OP's version (or your variant), and it could be implemented in a more
    legible (and less complex) form in Awk as well. So I abstained from
    testing others' code; that is something the authors should do.

    I obviously missed that your variant was just intended as an optimized
    version of the OP's approach, so don't take my criticism too seriously.

    Fast pre-calculated solutions can also be legible. Taking the idea of
    your variant further can simplify it even more, e.g.

    function nth_pre (num)
    {
        if (num ~ /[^0-9]/) return num
        return num e[num%100]
    }

    Building that array e[] should be explained, though; that can be
    easily done (IMO), e.g.

    function init_e ()
    {
        for (i=0; i<=99; i++)       # init with 'th' as the prevalent suffix
            e[i] = "th"
        for (i=1; i<=91; i+=7) {    # exceptions to that are low digits 1..3
            e[i++] = "st"
            e[i++] = "nd"
            e[i++] = "rd"
        }
        e[11] = e[12] = e[13] = "th"    # and exception to that are 11..13
    }

    (something like that).

    Janis
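
The stepping of the second loop in init_e() may not be obvious at first sight: three post-increments inside the body plus i+=7 in the loop header advance i by 10 per pass. The sketch below only traces which indices that loop would touch; trace_init_e is a hypothetical helper name, and "i += 3" stands in for the three e[i++] assignments.

```shell
# Trace of the index triples visited by the second loop of init_e (illustration only).
trace_init_e() {
    awk 'BEGIN {
        for (i = 1; i <= 91; i += 7) {    # same loop header as in init_e
            printf "%d %d %d\n", i, i + 1, i + 2
            i += 3                        # stands in for the three e[i++] = ... lines
        }
    }'
}

trace_init_e
```

The second triple printed is 11 12 13, which is why the final assignment in init_e() has to reset those three entries to "th".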


    Ed.


  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Mon Nov 6 06:54:21 2023
    From Newsgroup: comp.lang.awk

    On 11/5/2023 1:04 PM, Janis Papanagnou wrote:
    On 05.11.2023 19:17, Ed Morton wrote:
    On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
    <snip>
    Being curious I've compared timing of above [not corrected] function
    with the simpler and clearer pattern matching based algorithm

    function nth (num)
    {
        if (num ~ /[^0-9]/) return num;
        else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
        else if (num ~ /1$/) return num "st";
        else if (num ~ /2$/) return num "nd";
        else if (num ~ /3$/) return num "rd";
        else return num "th";
    }

    For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
    (Tested with GNU Awk 4.2.0)

    Did you also test it with the OP's code that I was showing an alternative
    implementation of or just with the above code which is yet another
    alternative implementation? If so, what was the result of that run?

    Sorry, I was not interested in the OP's code. Some years ago I
    implemented a shell version that was very readable, as opposed to the
    OP's version (or your variant), and it could be implemented in a more
    legible (and less complex) form in Awk as well. So I abstained from
    testing others' code; that is something the authors should do.

    I obviously missed that your variant was just intended as an optimized version of the OP's approach, so don't take my criticism too seriously.

    Fast pre-calculated solutions can also be legible.

    Apparently we just have different ideas of legible - to me a hash lookup
    is the clear and obvious way to implement this rather than a bunch of
    if/else regexp comparisons.

    Taking the idea of your variant further can simplify it even more, e.g.

    function nth_pre (num)
    {
        if (num ~ /[^0-9]/) return num
        return num e[num%100]
    }

    Building that array e[] should be explained, though; that can be
    easily done (IMO), e.g.

    function init_e ()
    {
        for (i=0; i<=99; i++)       # init with 'th' as the prevalent suffix
            e[i] = "th"
        for (i=1; i<=91; i+=7) {    # exceptions to that are low digits 1..3
            e[i++] = "st"
            e[i++] = "nd"
            e[i++] = "rd"
        }
        e[11] = e[12] = e[13] = "th"    # and exception to that are 11..13
    }

    (something like that).

    Janis


    That's a very good idea. I'd use this:

    function nth_pre (num)
    {
        return num (num ~ /[^0-9]/ ? "" : e[num%100])
    }

    to squeeze out the last bit of redundancy but that's nit-picking.

    Ed.
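
Put together, the precalculated table and the one-line nth_pre() above could form a complete filter roughly as sketched here. The wrapper name nth_filter and the main rule are assumptions made for this illustration; the main rule iterates by index instead of "for (x in v)" so that the output order is predictable, and init_e() keeps its loop counter local.

```shell
# Sketch of a complete filter built from init_e() and the one-line nth_pre().
nth_filter() {
    awk '
    function init_e(    i) {              # i is local here, unlike the version above
        for (i = 0; i <= 99; i++) e[i] = "th"
        for (i = 1; i <= 91; i += 7) {
            e[i++] = "st"; e[i++] = "nd"; e[i++] = "rd"
        }
        e[11] = e[12] = e[13] = "th"
    }
    function nth_pre(num) {
        return num (num ~ /[^0-9]/ ? "" : e[num % 100])
    }
    BEGIN { init_e() }
    {
        n = split($0, v)                  # split each line on whitespace
        for (x = 1; x <= n; x++) print nth_pre(v[x])
    }'
}

echo 101 42 23 98 foo 111 | nth_filter
```

With the sample input extended by 111, this prints 101st, 42nd, 23rd, 98th, foo and 111th, one per line.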
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Tue Nov 7 14:17:59 2023
    From Newsgroup: comp.lang.awk

    On 06.11.2023 13:54, Ed Morton wrote:
    On 11/5/2023 1:04 PM, Janis Papanagnou wrote:

    Fast pre-calculated solutions can also be legible.

    Apparently we just have different ideas of legible

    (This makes no sense, given what I said here and what you say below.)

    The advantage of the pattern approach is, though, that it matches
    exactly the specification/definition[*], as the cases are typically
    explained. - But I think it's boring to talk at that "ideas" level.

    - to me a hash lookup
    is the clear and obvious way to implement this rather than a bunch of
    if/else regexp comparisons.

    Taking the idea of your variant further can simplify it even, e.g.

    function nth_pre (num)
    {
        if (num ~ /[^0-9]/) return num
        return num e[num%100]
    }
    [...]

    That's a very good idea. [...]

    Yes, it's simple and legible. - No unnecessary 'if' cases and no hash
    arrays ("huns" and "tens") that introduce unnecessary complexity
    where you need only a single and clear mapping of the relevant digits.

    Janis

    [*] See for example https://en.wikipedia.org/wiki/Ordinal_suffix
