Forum: War Ensemble BBS

Nth (Ordinal Numeral Suffix)

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Fri Nov 3 19:30:50 2023

From Newsgroup: comp.lang.awk

# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral

function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}

{
delete v
split($0, v)
for (x in v) print nth(v[x])
}

# eof
--
:wq
Mike Sanders

--- Synchronet 3.20a-Linux NewsLink 1.114

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Fri Nov 3 19:57:25 2023

From Newsgroup: comp.lang.awk

Mike Sanders <porkchop@invalid.foo> wrote:

function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}

On 2nd thought, I think this could be better rendered as:

# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral

function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}

{
delete v
split($0, v)
for (x in v) print nth(v[x])
}

# eof
--
:wq
Mike Sanders


From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Fri Nov 3 20:49:53 2023

From Newsgroup: comp.lang.awk

porkchop@invalid.foo (Mike Sanders) writes:

Mike Sanders <porkchop@invalid.foo> wrote:

function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}

On 2nd thought, I think this could be better rendered as:

That's not really what "better rendered" means. The two bits of code
are functionally very different.

# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral

function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}

{
delete v
split($0, v)
for (x in v) print nth(v[x])

This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.

Especially as (as you probably know) you can scan the fields in a line,
in order, like this

for (i = 1; i <= NF; i++) print nth($i)

}

# eof
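To make the ordering point concrete, here is a minimal sketch that walks the fields by index, so the output follows the input order. The wrapper name `nth_fields` is made up for this illustration; the awk body reuses the second `nth()` quoted above.

```shell
# nth_fields: hypothetical wrapper name, used only for this illustration.
# Prints each whitespace-separated field with an ordinal suffix, in input
# order, by indexing $1..$NF instead of iterating with for (x in v).
nth_fields() {
  echo "$*" | awk '
    function nth(num) {
      if (num ~ /^[0-9]+$/) {
        if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
          if (num % 10 == 1) return num "st"
          if (num % 10 == 2) return num "nd"
          if (num % 10 == 3) return num "rd"
        }
        return num "th"
      }
      return num
    }
    { for (i = 1; i <= NF; i++) print nth($i) }
  '
}
```

Called as `nth_fields 101 42 23 98 foo`, it prints one result per line in the order given.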

--
Ben.

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Fri Nov 3 21:47:46 2023

From Newsgroup: comp.lang.awk

Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

Hey Ben =)

On 2nd thought, I think this could be better rendered as:

That's not really what "better rendered" means. The two bits of code
are functionally very different.

Oh c'mon now you're being fussy on this point & besides for you or me?
The distinction is important because you're speaking for yourself
& using that same logic, since I wrote the snippet, I can define my
own grammar no? Anyone can plainly read the 1st & 2nd versions of the
script & discern the differences. But 'quibble not'.

This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.

Nothing odd about it, I believe several implementations of awk using:

'for (x in array)...'

say the output is not guaranteed to be in sequential order BUT...

Aye - I'll concede this point kind sir & update the script accordingly as
it is more in line with what the user would expect (& less code to boot).

So script updated as per your suggestion:

https://busybox.neocities.org/notes/nth.txt

Good catch Ben & thank you.
--
:wq
Mike Sanders


From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Fri Nov 3 22:06:54 2023

From Newsgroup: comp.lang.awk

porkchop@invalid.foo (Mike Sanders) writes:

Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

Hey Ben =)

On 2nd thought, I think this could be better rendered as:

That's not really what "better rendered" means. The two bits of code
are functionally very different.

Oh c'mon now you're being fussy on this point & besides for you or me?

This is a very short function, so maybe a reader will see that the two
do different things, but in general I would not necessarily take a new
copy if someone posted a "better rendering" of some code. I would
expect at most superficial, aesthetic changes.

I don't want to assume you are a native speaker of English, so it's
possible that you don't know how minor a change "a better rendering" of
something is likely to be.

And I don't know what you mean by "& besides for you or me?".

The distinction is important because you're speaking for yourself
& using that same logic, since I wrote the snippet, I can define my
own grammar no? Anyone can plainly read the 1st & 2nd versions of the
script & discern the differences. But 'quibble not'.

I don't follow this.

This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.

Nothing odd about it, I believe several implementations of awk using:

'for (x in array)...'

say the output is not guaranteed to be in sequential order BUT...

Aye - I'll concede this point kind sir & update the script accordingly as
it is more in line with what the user would expect (& less code to boot).

So script updated as per your suggestion:

https://busybox.neocities.org/notes/nth.txt

Good catch Ben & thank you.

You're welcome.
--
Ben.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Nov 3 23:14:57 2023

From Newsgroup: comp.lang.awk

On 03.11.2023 20:57, Mike Sanders wrote:

Mike Sanders <porkchop@invalid.foo> wrote:

function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}

On 2nd thought, I think this could be better rendered as:

[...]

function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}

[...]

Hi Mike, I like your second version better since it doesn't _mix_
arithmetic with pattern comparisons. (Okay, there's still the
initial pattern, but as an overall test pattern that's fine, IMO.)

I had written such a function in shell and it was using patterns

case ${num} in
(*[!0-9]*) x="" ;;
(*11|*12|*13) x=th ;;
(*1) x=st ;;
(*2) x=nd ;;
(*3) x=rd ;;
(*) x=th ;;
esac
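For anyone wanting to try the pattern approach, a minimal sketch wrapping the `case` statement in a function follows. The function name `ordinal` is made up for this illustration, and the first pattern uses the POSIX bracket negation `[!0-9]` to detect any non-digit character.

```shell
# ordinal: hypothetical function name, for illustration only.
# Prints its argument followed by the English ordinal suffix; anything
# containing a non-digit character is printed unchanged.
ordinal() {
  num=$1
  case ${num} in
    (*[!0-9]*)     x=""  ;;   # not a plain number: no suffix
    (*11|*12|*13)  x=th  ;;   # the "teens" always take th
    (*1)           x=st  ;;
    (*2)           x=nd  ;;
    (*3)           x=rd  ;;
    (*)            x=th  ;;
  esac
  printf '%s%s\n' "$num" "$x"
}
```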

I think (in shell) patterns are better legible. But also the Awk
transcript with patterns has a good legibility and reflects the
(literal) definition of the definition (e.g. Wikipedia)

switch (num) {
case /[^0-9]/: x="" ; break ;
case /11$|12$|13$/: x="th" ; break ;
case /1$/: x="st" ; break ;
case /2$/: x="nd" ; break ;
case /3$/: x="rd" ; break ;
default: x="th" ; break ;
}

(I've used GNU Awk's switch, but it can also be written with 'if'.)

Take care when using anchors; in your first version with /^1[1-3]$/
you were matching only three numbers. Maybe /1[1-3]$/ was intended?
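The effect of the anchors can be checked directly. A small sketch (the helper name `teen_match` and its output labels are invented here) that compares the fully anchored pattern with the end-anchored one:

```shell
# teen_match: hypothetical helper, for illustration only.
# For each argument, reports whether it matches /^1[1-3]$/ (full) and
# whether it matches /1[1-3]$/ (end).
teen_match() {
  printf '%s\n' "$@" | awk '{
    full = ($0 ~ /^1[1-3]$/) ? "yes" : "no"
    ends = ($0 ~ /1[1-3]$/)  ? "yes" : "no"
    print $0, "full=" full, "end=" ends
  }'
}
```

With 11, 111 and 21 as arguments, only 11 matches the fully anchored pattern, while 111 matches the end-anchored one as well.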

Janis


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Nov 3 23:24:24 2023

From Newsgroup: comp.lang.awk

On 03.11.2023 23:14, Janis Papanagnou wrote:

[...]

Hi Mike, I like your second version better since it doesn't _mix_
arithmetic with pattern comparisons. (Okay, there's still the
initial pattern, but as an overall test pattern that's fine, IMO.)

Just one additional comment about why I like the pattern approach
better; three levels of nested 'if' make legibility unnecessarily
difficult, especially in comparison.

[...]
I think (in shell) patterns are better legible. But also the Awk
transcript with patterns has a good legibility and reflects the
(literal) definition of the definition (e.g. Wikipedia)

"(literal) description (e.g. of the Wikipedia definition)."

(Sorry for my sloppy writing.)

Janis


From Bruce Horrocks@07.013@scorecrow.com to comp.lang.awk on Fri Nov 3 23:40:45 2023

From Newsgroup: comp.lang.awk

On 03/11/2023 19:57, Mike Sanders wrote:

if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {

You could trivially re-write this line as

if (num % 100 < 11 || num % 100 > 13) {

to save a comparison but the logic is slightly less clear.

Even less clear is to re-write as

if (num % 100 > 13 || num % 100 < 11) {

to take better advantage of lazy evaluation.
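A brute-force check that the three forms of the test agree, as a sketch only (the helper name `cond_check` and the 0..999 range are arbitrary choices made for this illustration):

```shell
# cond_check: hypothetical helper, for illustration only.
# Evaluates the three forms of the 11..13 exclusion test for every
# n in 0..999 and counts how often they disagree.
cond_check() {
  awk 'BEGIN {
    bad = 0
    for (n = 0; n < 1000; n++) {
      r = n % 100
      a = (r != 11 && r != 12 && r != 13)   # original form
      b = (r < 11 || r > 13)                # range form
      c = (r > 13 || r < 11)                # range form, reordered
      if (a != b || b != c) bad++
    }
    print "mismatches:", bad
  }'
}
```

The equivalence relies on the remainder being an integer, which the /^[0-9]+$/ guard in nth() ensures; for a remainder such as 11.5 the original form and the range forms would differ.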
--
Bruce Horrocks
Surrey, England


From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Nov 4 07:35:30 2023

From Newsgroup: comp.lang.awk

Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

I don't follow this.

No biggie Ben (it was my lame attempt at being facetious).
Ultimately the burden of clarity lies squarely on the
shoulders of the poster, and in this case, that would be me.
--
:wq
Mike Sanders


From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Nov 4 07:38:00 2023

From Newsgroup: comp.lang.awk

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

Yes, thinking the same here Janis & even still, the 1st version seemed
a little off. And the 1st pattern? Prevents 'Footh' (chuckle sounds
silly to even write much less speak aloud).

I think (in shell) patterns are better legible. But also the Awk
transcript with patterns has a good legibility and reflects the
(literal) definition of the definition (e.g. Wikipedia)

switch (num) {
case /[^0-9]/: x="" ; break ;
case /11$|12$|13$/: x="th" ; break ;
case /1$/: x="st" ; break ;
case /2$/: x="nd" ; break ;
case /3$/: x="rd" ; break ;
default: x="th" ; break ;
}

Sure enough, it is very legible & concise at least to my eyes.

(I've used GNU Awk's switch, but it can also be written with 'if'.)

I know, Arnold has done an outstanding job with Gawk, 'case' is very
practical & function pointers too, those are so nifty!

Take care when using anchors; in your first version with /^1[1-3]$/
you were matching only three numbers. Maybe /1[1-3]$/ was intended?

Yeah, the whole thing was sort of a mess (I'd forgotten I had that script).

(Sorry for my sloppy writing.)

Shoot, no worries Janis. My writing is hardly ever error three.

No wait! I meant 'error free' =)

You know, where I call home, here in the Prairies of North America,
our dialect of English is very colloquial (meaning informal, or rustic).
For instance, if I wanted to ask another if s/he agreed that a fence
was constructed in a robust & strong way, I might ask:

Q: She's hell built for stout, yeah?

A: Sure enough, if ever there was, she is.

...so you can see it's relative. We at comp.lang.awk can work it out.

Also, my earnest thanks to all for putting up with my flood of posts.
Sometimes, when you have an itch, well you have to scratch, & that's
where I'm at right now it seems.

Well folks, I'm off for the weekend. My 5yr old granddaughter is en-route
even as I write this & she's just beginning to learn to read. And I'll be
front & center to witness her recite either 'Curious George' or
'Cat In The Hat'. She's so excited she's beside herself & I want to
honor her efforts at greater cognition. =)
--
:wq
Mike Sanders


From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Nov 4 07:39:32 2023

From Newsgroup: comp.lang.awk

Bruce Horrocks <07.013@scorecrow.com> wrote:

You could trivially re-write this line as

if (num % 100 < 11 || num % 100 > 13) {

to save a comparison but the logic is slightly less clear.

Even less clear is to re-write as

if (num % 100 > 13 || num % 100 < 11) {

to take better advantage of lazy evaluation.

Though the latter edges out the former, I'll take your 1st
construct Bruce just to keep a little clarity (Lord knows
I need it, chuckle).

Script updated & also added contributing author's names:

https://busybox.neocities.org/notes/nth.txt
--
:wq
Mike Sanders


From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 08:04:30 2023

From Newsgroup: comp.lang.awk

On 11/3/2023 2:57 PM, Mike Sanders wrote:

Mike Sanders <porkchop@invalid.foo> wrote:

function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}

On 2nd thought, I think this could be better rendered as:

# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral

function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}

{
delete v

`split($0,v)` will delete v before repopulating it, no need to do it explicitly before calling `split()` plus that would make your code non-portable as `delete array` isn't defined by POSIX (yet).

split($0, v)
for (x in v) print nth(v[x])

That would print the output in a "random" order, do `for (x=1; x in v;
x++)` instead to get the same output order as the input order.

You don't need split() and an array at all, though, all you need is `for
(x=1; x<=NF; x++) print nth($x)`.

}
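Both points about split() can be checked quickly. In the sketch below (the helper name `split_demo` is invented for illustration) the array is pre-filled with three stale elements, then reused as the target of split(), and finally walked with a counting loop:

```shell
# split_demo: hypothetical helper, for illustration only.
# Shows that split() discards the old elements of its target array and
# that a counting loop visits the new elements in input order.
split_demo() {
  awk 'BEGIN {
    v[1] = "old1"; v[2] = "old2"; v[3] = "old3"
    n = split("a b", v)          # v now holds only v[1] and v[2]
    printf "n=%d has3=%d\n", n, (3 in v)
    for (x = 1; x in v; x++) print x, v[x]
  }'
}
```

The stale v[3] is gone after the call, so no explicit `delete v` is needed beforehand.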

Consider doing this instead (untested) to address the above points and
for improved efficiency:

BEGIN {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

{
for (x=1; x<=NF; x++) print nth($x)
}

Regards,

Ed.


From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 08:13:46 2023

From Newsgroup: comp.lang.awk

On 11/5/2023 8:04 AM, Ed Morton wrote:

On 11/3/2023 2:57 PM, Mike Sanders wrote:

Mike Sanders <porkchop@invalid.foo> wrote:

function nth(day) {
   if (day ~ /^[0-9]+$/) {
     if (day ~ /^1[1-3]$/ || day > 20) {
       if (day % 10 == 1) return day "st"
       if (day % 10 == 2) return day "nd"
       if (day % 10 == 3) return day "rd"
     }
       return day "th"
   }
   return day
}

On 2nd thought, I think this could be better rendered as:

# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral

function nth(num) {
   if (num ~ /^[0-9]+$/) {
     if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
       if (num % 10 == 1) return num "st"
       if (num % 10 == 2) return num "nd"
       if (num % 10 == 3) return num "rd"
     }
     return num "th"
   }
   return num
}

{
   delete v

`split($0,v)` will delete v before repopulating it, no need to do it explicitly before calling `split()` plus that would make your code non-portable as `delete array` isn't defined by POSIX (yet).

   split($0, v)
   for (x in v) print nth(v[x])

That would print the output in a "random" order, do `for (x=1; x in v;
x++)` instead to get the same output order as the input order.

You don't need split() and an array at all, though, all you need is `for (x=1; x<=NF; x++) print nth($x)`.

}

Consider doing this instead (untested) to address the above points and
for improved efficiency:

BEGIN {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
}

function nth(num,       sfx) {
   if (num ~ /^[0-9]+$/) {
      if ( !((num % 100) in huns) ) {
         sfx = tens[num % 10]
      }
   }
   return num sfx
}

{
   for (x=1; x<=NF; x++) print nth($x)
}

Regards,

    Ed.

or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

You may want to come up with some naming convention for huns[] and
tens[] to make it clear they're global and avoid clashing with anything
else of the same name anywhere else in the script such as prefixing them
with the name of the function that uses them, "Nth_huns", or some common indicator you use for all global variables, e.g. "G_huns" or whatever
else makes sense to you.
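As an illustration of the prefix convention only, here is a sketch of the lazily initialised variant with the arrays renamed `Nth_huns` and `Nth_tens`. The wrapper name `nth_prefixed` is hypothetical, and this sketch assigns "th" explicitly for remainders 11 to 13, which is an assumption about the intended behaviour rather than the code as posted.

```shell
# nth_prefixed: hypothetical wrapper, for illustration only.
# Nth_huns and Nth_tens are global arrays named after the function that
# owns them, so they are unlikely to clash with other globals.
nth_prefixed() {
  printf '%s\n' "$@" | awk '
    function nth(num,    sfx) {
      if (num ~ /^[0-9]+$/) {
        if ( !(1 in Nth_tens) ) {          # one-time lazy initialisation
          Nth_huns[11]; Nth_huns[12]; Nth_huns[13]
          split("st nd rd th th th th th th", Nth_tens)
          Nth_tens[0] = "th"
        }
        # assumption: 11..13 get "th" explicitly
        sfx = ((num % 100) in Nth_huns) ? "th" : Nth_tens[num % 10]
      }
      return num sfx
    }
    { print nth($0) }
  '
}
```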

Regards,

Ed.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 17:21:52 2023

From Newsgroup: comp.lang.awk

Hi Ed!

On 05.11.2023 15:13, Ed Morton wrote:

or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

I don't see where the advantage here is. It is (IMO) unnecessarily complex
(many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).

Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".

Janis

[...]

[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.

[**] In case that would have been the reason for this implementation.


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 18:01:41 2023

From Newsgroup: comp.lang.awk

On 05.11.2023 17:21, Janis Papanagnou wrote:

Hi Ed!

On 05.11.2023 15:13, Ed Morton wrote:

or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

I don't see where the advantage here is. It is (IMO) unnecessarily complex (many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).

Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".

Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm

function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}

For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)
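A timing run of this kind can be sketched as follows. The wrapper name `nth_bench` and its default count of 1000 are assumptions made for illustration, and no particular timings are implied; the whole command would be timed from the shell, for example with `time nth_bench 10000000`.

```shell
# nth_bench: hypothetical helper, for illustration only.
# Calls the pattern-based nth() once for each integer 1..n (default 1000)
# and prints the call count and the last result.
nth_bench() {
  awk -v n="${1:-1000}" '
    function nth(num) {
      if (num ~ /[^0-9]/) return num
      else if (num ~ /1[1-3]$/) return num "th"
      else if (num ~ /1$/) return num "st"
      else if (num ~ /2$/) return num "nd"
      else if (num ~ /3$/) return num "rd"
      else return num "th"
    }
    BEGIN {
      for (i = 1; i <= n; i++) s = nth(i)
      print "calls:", n, "last:", s
    }
  '
}
```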

Janis

[...]

[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.

[**] In case that would have been the reason for this implementation.


From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 12:14:53 2023

From Newsgroup: comp.lang.awk

On 11/5/2023 10:21 AM, Janis Papanagnou wrote:

Hi Ed!

On 05.11.2023 15:13, Ed Morton wrote:

or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

I don't see where the advantage here is. It is (IMO) unnecessarily complex (many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).

Not sure where you're seeing any of those things. There are fewer "if"s
than were in the OPs code, if by "incomplete branches" you mean "if"
without an "else" there's nothing wrong with that and the OPs code had
more of them, no undefined variables and IMO it's much simpler than the
original code. And that code above was just for "if you don't want to
use a BEGIN section for some reason" while the version I'd use is what I
originally posted:

BEGIN {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}

which is simpler and faster again.

Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".

If the OP has a large input file and wants to add "th" or "nd" to the
end of numbers on each line then "nth()" is probably the only part of it
that IS time-critical.

Janis

[...]

[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.

All I was trying to do was show an alternative implementation of the OPs
code, not solve the problem the OP was trying to solve, and all I did to
test it was check it produced the same output as the OPs script for the
sample input they provided, which it does:

OPs code:

$ echo 101 42 23 98 foo | awk -f nth.txt
101st
42nd
23rd
98th
foo

My code:

$ echo 101 42 23 98 foo | awk -f nth.awk
101st
42nd
23rd
98th
foo

So, could you elaborate and provide an example where my code fails and
the OPs succeeds?

[**] In case that would have been the reason for this implementation.

The reason for this implementation is it's faster, simpler, and doesn't contain duplicate code so it'll be easier to maintain.

Ed.

From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 12:17:26 2023

From Newsgroup: comp.lang.awk

On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
<snip>

Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm

function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}

For _10 million_ function calls the difference is ~2s (~15s vs. ~17s). (Tested with GNU Awk 4.2.0)

Did you also test it with the OPs code that I was showing an alternative implementation of or just with the above code which is yet another
alternative implementation? If so, what was the result of that run?

Ed.


From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 12:40:20 2023

From Newsgroup: comp.lang.awk

On 11/5/2023 12:14 PM, Ed Morton wrote:

On 11/5/2023 10:21 AM, Janis Papanagnou wrote:

Hi Ed!

On 05.11.2023 15:13, Ed Morton wrote:

or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:

function nth(num,       sfx) {
     if (num ~ /^[0-9]+$/) {
        if ( !(1 in tens) ) {
           huns[11]; huns[12]; huns[13]
           split("st nd rd th th th th th th",tens)
           tens[0]="th"
        }
        if ( !((num % 100) in huns) ) {
           sfx = tens[num % 10]
        }
     }
     return num sfx
}

I don't see where the advantage here is. It is (IMO) unnecessarily complex (many 'if' control constructs, incomplete branches, undefined variables) for such a simple task and also harder to understand (or analyze in case of errors[*]).

Not sure where you're seeing any of those things. There are fewer "if"s
than were in the OPs code, if by "incomplete branches" you mean "if"
without an "else" there's nothing wrong with that and the OPs code had
more of them, no undefined variables and IMO it's much simpler than the
original code. And that code above was just for "if you don't want to
use a BEGIN section for some reason" while the version I'd use is what I
originally posted:

BEGIN {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
}

function nth(num,       sfx) {
   if (num ~ /^[0-9]+$/) {
      if ( !((num % 100) in huns) ) {
         sfx = tens[num % 10]
      }
   }
   return num sfx
}

which is simpler and faster again.

Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".

If the OP has a large input file and wants to add "th" or "nd" to the
end of numbers on each line then "nth()" is probably the only part of it that IS time-critical.

Janis

[...]

[*] The code does not produce correct results as presented. If corrected it would probably get even (at least a bit) more complex, I suppose.

All I was trying to do was show an alternative implementation of the OPs code, not solve the problem the OP was trying to solve, and all I did to test it was check it produced the same output as the OPs script for the sample input they provided, which it does:

OPs code:

$ echo 101 42 23 98 foo | awk -f nth.txt
101st
42nd
23rd
98th
foo

My code:

$ echo 101 42 23 98 foo | awk -f nth.awk
101st
42nd
23rd
98th
foo

So, could you elaborate and provide an example where my code fails and
the OPs succeeds?

Never mind, I see it - I wasn't assigning sfx for some numbers, fixed by changing "nth()" to:

function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
sfx = ( (num % 100) in huns ? "th" : tens[num % 10] )
}
return num sfx
}
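Put together with the BEGIN section and the field loop from earlier in the thread, the corrected function can be exercised like this. It is a sketch; the wrapper name `nth_lookup` is made up for illustration.

```shell
# nth_lookup: hypothetical wrapper, for illustration only.
# Combines the BEGIN lookup tables, the corrected nth() and the field loop.
nth_lookup() {
  echo "$*" | awk '
    BEGIN {
      huns[11]; huns[12]; huns[13]            # remainders that force "th"
      split("st nd rd th th th th th th", tens)
      tens[0] = "th"
    }
    function nth(num,    sfx) {
      if (num ~ /^[0-9]+$/) {
        sfx = ( (num % 100) in huns ? "th" : tens[num % 10] )
      }
      return num sfx
    }
    { for (x = 1; x <= NF; x++) print nth($x) }
  '
}
```

Unlike the earlier version, 11, 12 and 13 (and 112 and so on) now come out with "th" instead of no suffix.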

Thanks for the heads up.

Ed.

[**] In case that would have been the reason for this implementation.

The reason for this implementation is it's faster, simpler, and doesn't contain duplicate code so it'll be easier to maintain.

    Ed.


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 19:48:47 2023

From Newsgroup: comp.lang.awk

On 05.11.2023 19:14, Ed Morton wrote:

Simple pattern matches would be straightforward for such a primitive and
certainly not time-critical[**] function like "nth()".

If the OP has a large input file and wants to add "th" or "nd" to the
end of numbers on each line then "nth()" is probably the only part of it
that IS time-critical.

Sorry, no. - The sample sizes I used are hilariously large.

All I was trying to do was show an alternative implementation of the OPs code, not solve the problem the OP was trying to solve, and all I did to
test it was check it produced the same output as the OPs script for the sample input they provided, which it does:

OPs code:

$ echo 101 42 23 98 foo | awk -f nth.txt
101st
42nd
23rd
98th
foo

My code:

$ echo 101 42 23 98 foo | awk -f nth.awk
101st
42nd
23rd
98th
foo

So, could you elaborate and provide an example where my code fails and
the OPs succeeds?

I've just checked the output of your code (not the OP's), and got

1st
2nd
3rd
4th
5th
6th
7th
8th
9th
10th
11
12
13
14th
15th
16th
17th
18th
19th
20th
...

My intention was *not* to understand where the coding problem was,
in either the original code or the (derived?) variant.

[**] In case that would have been the reason for this implementation.

The reason for this implementation is it's faster, simpler, and doesn't contain duplicate code so it'll be easier to maintain.

Maybe. - Even though performance is actually no real issue, I've
tested a couple of variants (with even larger data sets: 50 million).

Times in seconds:

InitPre/LoopFunc:  0 11  (to not count invariants)

if/else-if:        71
if:                79
switch:            76
arithm/lookup:     68
precomp/lookup:    66

Taking the lookup approach even further with a precalculated array
of the first 100 numbers, the code gets yet _simpler_ (and faster)

function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}

Not that these variants would matter WRT performance: the difference
(pattern: 71s, your variant: 68s, precalculated array of 100 significant
numbers: 66s) is negligible. But code should be readable (if possible), IMO.

Janis


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Nov 5 20:04:56 2023

From Newsgroup: comp.lang.awk

On 05.11.2023 19:17, Ed Morton wrote:

On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
<snip>

Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm

function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}

For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)

Did you also test it with the OPs code that I was showing an alternative implementation of or just with the above code which is yet another alternative implementation? If so, what was the result of that run?

Sorry, I was not interested in the OP's code. Since I had implemented
a shell version some years ago that was very readable code as opposed
to the OP's version (or your variant), that could also be implemented
in a more legible (and less complex) form in Awk, I abstained from
testing others' code; this is something the authors should do.

I obviously missed that your variant was just intended as an optimized
version of the OP's approach, so don't take my criticism too seriously.

Fast pre-calculated solutions can also be legible. Taking the idea of
your variant further can simplify it even more, e.g.

function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}

How that array e[] is built should be explained, though; that can
easily be done (IMO), e.g.

function init_e ()
{
for (i=0; i<=99; i++) # init with 'th' as the prevalent suffix
e[i] = "th"
for (i=1; i<=91; i+=7) { # exceptions to that are low digits 1..3
e[i++] = "st"
e[i++] = "nd"
e[i++] = "rd"
}
e[11] = e[12] = e[13] = "th" # and exception to that are 11..13
}

(something like that).
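
As a hedged check of the table-building idea above, the sketch below puts init_e() and nth_pre() next to the pattern-matching nth() and counts disagreements for 0..9999. The wrapper name compare_nth and the test range are assumptions. The only change to the posted code is that i is declared as a local in init_e(), so the loop counter does not leak into the global namespace.

```shell
#!/bin/sh
# Sketch: compare the lookup variant with the pattern variant.
# compare_nth is a hypothetical wrapper name, not from the thread.
compare_nth() {
    awk '
    function init_e(    i) {            # i made local here (an addition)
        for (i = 0; i <= 99; i++)       # "th" is the prevalent suffix
            e[i] = "th"
        for (i = 1; i <= 91; i += 7) {  # three i++ plus 7 steps to the next decade
            e[i++] = "st"
            e[i++] = "nd"
            e[i++] = "rd"
        }
        e[11] = e[12] = e[13] = "th"    # 11..13 are the exception
    }
    function nth_pre(num) {
        if (num ~ /[^0-9]/) return num
        return num e[num % 100]
    }
    function nth(num) {
        if (num ~ /[^0-9]/) return num
        else if (num ~ /1[1-3]$/) return num "th"
        else if (num ~ /1$/) return num "st"
        else if (num ~ /2$/) return num "nd"
        else if (num ~ /3$/) return num "rd"
        else return num "th"
    }
    BEGIN {
        init_e()
        bad = 0
        for (n = 0; n <= 9999; n++)
            if (nth(n) != nth_pre(n)) bad++
        print "mismatches: " bad
        # Sample input from the original post.
        print nth_pre(101), nth_pre(42), nth_pre(23), nth_pre(98), nth_pre("foo")
    }'
}
compare_nth
```

The second loop in init_e() relies on the three i++ inside the body plus the i+=7 in the loop header adding up to 10, so each pass lands on 1, 11, 21, ... 91. Note that init_e() has to be called once (for example in a BEGIN block) before the first nth_pre() call.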

Janis

Ed.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Mon Nov 6 06:54:21 2023

From Newsgroup: comp.lang.awk

On 11/5/2023 1:04 PM, Janis Papanagnou wrote:

On 05.11.2023 19:17, Ed Morton wrote:

On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
<snip>

Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm

function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}

For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)

Did you also test it with the OP's code, for which I was showing an
alternative implementation, or just with the above code, which is yet
another alternative implementation? If so, what was the result of that run?

Sorry, I was not interested in the OP's code. Some years ago I had
implemented a shell version that was very readable, as opposed to
the OP's version (or your variant), and that could also be implemented
in a more legible (and less complex) form in Awk, so I abstained from
testing others' code; that is something the authors should do.

I obviously missed that your variant was just intended as an optimized version of the OP's approach, so don't take my criticism too seriously.

Fast pre-calculated solutions can also be legible.

Apparently we just have different ideas of legible - to me a hash lookup
is the clear and obvious way to implement this rather than a bunch of
if/else regexp comparisons.

Taking the idea of your variant further can simplify it even more, e.g.

function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}

How that array e[] is built should be explained, though; that can
easily be done (IMO), e.g.

function init_e ()
{
for (i=0; i<=99; i++) # init with 'th' as the prevalent suffix
e[i] = "th"
for (i=1; i<=91; i+=7) { # exceptions to that are low digits 1..3
e[i++] = "st"
e[i++] = "nd"
e[i++] = "rd"
}
e[11] = e[12] = e[13] = "th" # and exception to that are 11..13
}

(something like that).

Janis

That's a very good idea. I'd use this:

function nth_pre (num)
{
return num (num ~ /[^0-9]/ ? "" : e[num%100])
}

to squeeze out the last bit of redundancy but that's nit-picking.

Ed.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Tue Nov 7 14:17:59 2023

From Newsgroup: comp.lang.awk

On 06.11.2023 13:54, Ed Morton wrote:

On 11/5/2023 1:04 PM, Janis Papanagnou wrote:

Fast pre-calculated solutions can also be legible.

Apparently we just have different ideas of legible

(This makes no sense, given what I said here and what you say below.)

The advantage of the pattern approach, though, is that it matches
the specification/definition[*] exactly, in the way the cases are
typically explained. - But I think it's boring to talk at that "ideas" level.

- to me a hash lookup
is the clear and obvious way to implement this rather than a bunch of
if/else regexp comparisons.

Taking the idea of your variant further can simplify it even more, e.g.

function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}
[...]

That's a very good idea. [...]

Yes, it's simple and legible. - No unnecessary 'if' cases and no hash
arrays ("huns" and "tens") that introduce unnecessary complexity
where you need only a single and clear mapping of the relevant digits.

Janis

[*] See for example https://en.wikipedia.org/wiki/Ordinal_suffix

--- Synchronet 3.20a-Linux NewsLink 1.114
