function nth(day) {
    if (day ~ /^[0-9]+$/) {
        if (day ~ /^1[1-3]$/ || day > 20) {
            if (day % 10 == 1) return day "st"
            if (day % 10 == 2) return day "nd"
            if (day % 10 == 3) return day "rd"
        }
        return day "th"
    }
    return day
}
Mike Sanders <porkchop@invalid.foo> wrote:
function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}
On 2nd thought, I think this could be better rendered as:
# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral
function nth(num) {
    if (num ~ /^[0-9]+$/) {
        if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
            if (num % 10 == 1) return num "st"
            if (num % 10 == 2) return num "nd"
            if (num % 10 == 3) return num "rd"
        }
        return num "th"
    }
    return num
}
{
    delete v
    split($0, v)
    for (x in v) print nth(v[x])
}
# eof
On 2nd thought, I think this could be better rendered as:
That's not really what "better rendered" means. The two bits of code
are functionally very different.
This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.
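To illustrate the ordering point: split() returns the number of elements it stored, so a counting loop can visit them in input order. What follows is only a minimal sketch, assuming a POSIX awk; the wrapper name nth_ordered is made up for this example, and nth() is the second version quoted above.

```shell
# nth_ordered is a hypothetical wrapper name used only for this sketch.
nth_ordered() {
    awk '
    function nth(num) {
        if (num ~ /^[0-9]+$/) {
            if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
                if (num % 10 == 1) return num "st"
                if (num % 10 == 2) return num "nd"
                if (num % 10 == 3) return num "rd"
            }
            return num "th"
        }
        return num
    }
    {
        n = split($0, v)             # split() returns the element count
        for (i = 1; i <= n; i++)     # walk indices 1..n in input order
            print nth(v[i])
    }'
}

echo 101 42 23 98 foo | nth_ordered
```

With the loop driven by the index, the lines come out as 101st, 42nd, 23rd, 98th, foo, in that order, which makes the output easy to compare against an expected file.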
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
Hey Ben =)
On 2nd thought, I think this could be better rendered as:
That's not really what "better rendered" means. The two bits of code
are functionally very different.
Oh c'mon now you're being fussy on this point & besides for you or me?
The distinction is important because you're speaking for yourself
& using that same logic, since I wrote the snippet, I can define my
own grammar no? Anyone can plainly read the 1st & 2nd versions of the
script & discern the differences. But 'quibble not'.
This is a little odd in that the output order will not necessarily match
the input order. Whilst I understand that this is probably just driver
code to test the function, it's going to make automatic testing harder.
Nothing odd about it. I believe several implementations of awk using:
'for (x in array)...'
say the output is not guaranteed to be in sequential order BUT...
Aye - I'll concede this point kind sir & update the script accordingly as
it is more in line with what the user would expect (& less code to boot).
So script updated as per your suggestion:
https://busybox.neocities.org/notes/nth.txt
Good catch Ben & thank you.
Mike Sanders <porkchop@invalid.foo> wrote:
function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}
On 2nd thought, I think this could be better rendered as:
[...]
function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}
[...]
[...]
Hi Mike, I like your second version better since it doesn't _mix_
arithmetic with pattern comparisons. (Okay, there's still the
initial pattern, but as an overall test pattern that's fine, IMO.)
[...]
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
I don't follow this.
I think (in shell) patterns are better legible. But also the Awk
transcript with patterns has a good legibility and reflects the
(literal) definition of the definition (e.g. Wikipedia)
switch (num) {
case /[^0-9]/: x="" ; break ;
case /11$|12$|13$/: x="th" ; break ;
case /1$/: x="st" ; break ;
case /2$/: x="nd" ; break ;
case /3$/: x="rd" ; break ;
default: x="th" ; break ;
}
(I've used GNU Awk's switch, but it can also be written with 'if'.)
Take care when using anchors; in your first version with /^1[1-3]$/
you were matching only three numbers. Maybe /1[1-3]$/ was intended?
(Sorry for my sloppy writing.)
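The anchor remark can be checked directly. This is a small sketch, assuming a POSIX awk; anchor_demo and the variable names full and tail are invented for the illustration. It reports, for each input number, whether the fully anchored and the end-anchored patterns match.

```shell
# anchor_demo is a hypothetical helper name used only for this sketch.
anchor_demo() {
    awk '{
        full = ($1 ~ /^1[1-3]$/) ? "yes" : "no"   # whole string must be 11, 12 or 13
        tail = ($1 ~ /1[1-3]$/)  ? "yes" : "no"   # only the last two digits are checked
        print $1, "full=" full, "tail=" tail
    }'
}

printf '%s\n' 11 13 111 212 21 | anchor_demo
```

Here 11 and 13 match both patterns, 111 and 212 match only the end-anchored one, and 21 matches neither.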
You could trivially re-write this line as
if (num % 100 < 11 || num % 100 > 13) {
to save a comparison but the logic is slightly less clear.
Even less clear is to re-write as
if (num % 100 > 13 || num % 100 < 11) {
to take better advantage of lazy evaluation.
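A quick way to convince oneself that the range form is equivalent to the three-way test is to compare them over a run of integers. The following is a sketch only, assuming a POSIX awk; range_check is a made-up name. The equivalence relies on num being a non-negative integer, which the /^[0-9]+$/ guard in nth() already ensures.

```shell
# range_check is a hypothetical helper name: it counts the values in 0..999
# for which the three-way != test and the two-sided range test disagree.
range_check() {
    awk 'BEGIN {
        bad = 0
        for (num = 0; num <= 999; num++) {
            r = num % 100                          # compute the remainder once
            a = (r != 11 && r != 12 && r != 13)    # original form
            b = (r < 11 || r > 13)                 # range form
            if (a != b) bad++
        }
        print bad
    }'
}

range_check
```

Storing the remainder in a variable, as done here, also avoids recomputing num % 100 in each comparison.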
On 11/3/2023 2:57 PM, Mike Sanders wrote:
Mike Sanders <porkchop@invalid.foo> wrote:
function nth(day) {
if (day ~ /^[0-9]+$/) {
if (day ~ /^1[1-3]$/ || day > 20) {
if (day % 10 == 1) return day "st"
if (day % 10 == 2) return day "nd"
if (day % 10 == 3) return day "rd"
}
return day "th"
}
return day
}
On 2nd thought, I think this could be better rendered as:
# tags: nth, ordinal, suffix, digit, numbers, awk, code
#
# appends ordinal suffix to space delimited numerals
# Michael Sanders 2023
# https://busybox.neocities.org/notes/nth.txt
#
# usage example: echo 101 42 23 98 foo | awk -f nth.txt
#
# output (1 per line): 101st 42nd 23rd 98th foo
#
# further reading:
# https://en.wikipedia.org/wiki/Ordinal_numeral
function nth(num) {
if (num ~ /^[0-9]+$/) {
if (num % 100 != 11 && num % 100 != 12 && num % 100 != 13) {
if (num % 10 == 1) return num "st"
if (num % 10 == 2) return num "nd"
if (num % 10 == 3) return num "rd"
}
return num "th"
}
return num
}
{
delete v
`split($0,v)` deletes the contents of v before repopulating it, so there is no need to do that explicitly before calling `split()`. The explicit delete would also make your code non-portable, as `delete array` isn't defined by POSIX (yet).
split($0, v)
for (x in v) print nth(v[x])
That would print the output in a "random" order, do
`for (x=1; x in v; x++)` instead to get the same output order as the input order.
You don't need split() and an array at all, though, all you need is `for (x=1; x<=NF; x++) print nth($x)`.
}
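Both array points can be seen in a few lines. This is a sketch under the assumption of a POSIX awk; clear_demo is an invented name. It shows that split() leaves no stale elements from the previous record, and that `split("", v)` is a commonly used portable way to empty an array without `delete v`.

```shell
# clear_demo is a hypothetical helper name used only for this sketch.
clear_demo() {
    awk '{
        n = split($0, v)        # split() empties v before filling it
        printf "%d:", n
        for (i = 1; i <= n; i++) printf " %s", v[i]
        print ""
    }
    END {
        split("", v)            # portable way to empty an array
        n = 0
        for (k in v) n++        # nothing is left to visit
        print "left: " n
    }'
}

printf 'a b c\nd\n' | clear_demo
```

The second record yields a single element even though the first one had three, and the END block finds the array empty.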
Consider doing this instead (untested) to address the above points and
for improved efficiency:
BEGIN {
    huns[11]; huns[12]; huns[13]
    split("st nd rd th th th th th th",tens)
    tens[0]="th"
}
function nth(num, sfx) {
    if (num ~ /^[0-9]+$/) {
        if ( !((num % 100) in huns) ) {
            sfx = tens[num % 10]
        }
    }
    return num sfx
}
{
    for (x=1; x<=NF; x++) print nth($x)
}
Regards,
Ed.
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:
function nth(num, sfx) {
    if (num ~ /^[0-9]+$/) {
        if ( !(1 in tens) ) {
            huns[11]; huns[12]; huns[13]
            split("st nd rd th th th th th th",tens)
            tens[0]="th"
        }
        if ( !((num % 100) in huns) ) {
            sfx = tens[num % 10]
        }
    }
    return num sfx
}
[...]
Hi Ed!
On 05.11.2023 15:13, Ed Morton wrote:
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:
function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}
I don't see where the advantage here is. It is (IMO) unnecessarily complex (many 'if' control constructs, incomplete branches, undefined variables)
for such a simple task and also harder to understand (or analyze in case
of errors[*]).
Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".
Janis
[...]
[*] The code does not produce correct results as presented. If corrected
it would probably get even (at least a bit) more complex, I suppose.
[**] In case that would have been the reason for this implementation.
Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm
function nth (num)
{
    if (num ~ /[^0-9]/) return num;
    else if (num ~ /11$|12$|13$/) return num "th";   # or use: /1[1-3]$/
    else if (num ~ /1$/) return num "st";
    else if (num ~ /2$/) return num "nd";
    else if (num ~ /3$/) return num "rd";
    else return num "th";
}
For _10 million_ function calls the difference is ~2s (~15s vs. ~17s). (Tested with GNU Awk 4.2.0)
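For anyone wanting to repeat this kind of measurement, a harness could look roughly like the following. It is only a sketch and not necessarily how the figures above were obtained; bench_nth and its default call count are assumptions made for the example, and the function body is the pattern-based version shown above with the /1[1-3]$/ variant.

```shell
# bench_nth is a hypothetical helper; its argument is the number of calls
# (default 100000 so the sketch finishes quickly).
bench_nth() {
    awk -v n="${1:-100000}" '
    function nth(num) {
        if (num ~ /[^0-9]/) return num
        else if (num ~ /1[1-3]$/) return num "th"
        else if (num ~ /1$/) return num "st"
        else if (num ~ /2$/) return num "nd"
        else if (num ~ /3$/) return num "rd"
        else return num "th"
    }
    BEGIN {
        for (i = 1; i <= n; i++) s = nth(i)   # result is discarded on purpose
        print n " calls, last result " s
    }'
}

bench_nth 1000
```

Running it with a large count under the shell's timing facility, where one is available, and swapping in another nth() body gives a like-for-like comparison on the same machine and awk version.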
On 11/5/2023 10:21 AM, Janis Papanagnou wrote:
Hi Ed!
On 05.11.2023 15:13, Ed Morton wrote:
or if you don't want to use a BEGIN section for some reason then remove
it and change `nth()` to this which is very, very slightly less
efficient than the above:
function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !(1 in tens) ) {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}
I don't see where the advantage here is. It is (IMO) unnecessarily complex (many 'if' control constructs, incomplete branches, undefined variables) for such a simple task and also harder to understand (or analyze in case of errors[*]).
Not sure where you're seeing any of those things. There are fewer "if"s
than were in the OP's code. If by "incomplete branches" you mean "if"
without an "else", there's nothing wrong with that and the OP's code had
more of them. There are no undefined variables, and IMO it's much simpler than the original code. And that code above was just for "if you don't want to
use a BEGIN section for some reason", while the version I'd use is what I originally posted:
BEGIN {
huns[11]; huns[12]; huns[13]
split("st nd rd th th th th th th",tens)
tens[0]="th"
}
function nth(num, sfx) {
if (num ~ /^[0-9]+$/) {
if ( !((num % 100) in huns) ) {
sfx = tens[num % 10]
}
}
return num sfx
}
which is simpler and faster again.
Simple pattern matches would be straightforward for such a primitive and certainly not time-critical[**] function like "nth()".
If the OP has a large input file and wants to add "th" or "nd" to the
end of numbers on each line then "nth()" is probably the only part of it that IS time-critical.
Janis
[...]
[*] The code does not produce correct results as presented. If corrected it would probably get even (at least a bit) more complex, I suppose.
All I was trying to do was show an alternative implementation of the OP's code, not solve the problem the OP was trying to solve, and all I did to test it was check it produced the same output as the OP's script for the sample input they provided, which it does:
OP's code:
$ echo 101 42 23 98 foo | awk -f nth.txt
101st
42nd
23rd
98th
foo
My code:
$ echo 101 42 23 98 foo | awk -f nth.awk
101st
42nd
23rd
98th
foo
So, could you elaborate and provide an example where my code fails and
the OPs succeeds?
[**] In case that would have been the reason for this implementation.
The reason for this implementation is it's faster, simpler, and doesn't contain duplicate code so it'll be easier to maintain.
Ed.
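One case the sample input does not exercise is 11 to 13. In the table version, sfx is only assigned when num % 100 is not in huns, so for those inputs it appears to stay empty, whereas the OP's second version appends "th". Below is a sketch that runs the table version as posted, assuming a POSIX awk; the wrapper name nth_table is made up for the example.

```shell
# nth_table is a hypothetical wrapper name around the table-lookup version.
nth_table() {
    awk '
    BEGIN {
        huns[11]; huns[12]; huns[13]
        split("st nd rd th th th th th th", tens)
        tens[0] = "th"
    }
    function nth(num, sfx) {
        if (num ~ /^[0-9]+$/) {
            if ( !((num % 100) in huns) ) {
                sfx = tens[num % 10]     # skipped for 11, 12, 13: sfx stays ""
            }
        }
        return num sfx
    }
    { for (x = 1; x <= NF; x++) print nth($x) }'
}

echo 11 112 21 | nth_table
```

This prints 11 and 112 without a suffix and 21st for the last field. Assigning "th" in an else branch of the inner test would presumably bring the output back in line with the OP's version.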
On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
<snip>
Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm
function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}
For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)
Did you also test it with the OPs code that I was showing an alternative implementation of or just with the above code which is yet another alternative implementation? If so, what was the result of that run?
Ed.
On 05.11.2023 19:17, Ed Morton wrote:
On 11/5/2023 11:01 AM, Janis Papanagnou wrote:
<snip>
Being curious I've compared timing of above [not corrected] function
with the simpler and clearer pattern matching based algorithm
function nth (num)
{
if (num ~ /[^0-9]/) return num;
else if (num ~ /11$|12$|13$/) return num "th"; # or use: /1[1-3]$/
else if (num ~ /1$/) return num "st";
else if (num ~ /2$/) return num "nd";
else if (num ~ /3$/) return num "rd";
else return num "th";
}
For _10 million_ function calls the difference is ~2s (~15s vs. ~17s).
(Tested with GNU Awk 4.2.0)
Did you also test it with the OPs code that I was showing an alternative
implementation of or just with the above code which is yet another
alternative implementation? If so, what was the result of that run?
Sorry, I was not interested in the OP's code. Since I had implemented
a shell version some years ago that was very readable code as opposed
to the OP's version (or your variant), that could also be implemented
in a better legible (and less complex) form in Awk, I abstained from
testing other's codes; this is something the authors should do.
I obviously missed that your variant was just intended as an optimized version of the OP's approach, so don't take my criticism too seriously.
Fast pre-calculated solutions can also be legible.
Taking the idea of your variant further can even simplify it, e.g.
function nth_pre (num)
{
    if (num ~ /[^0-9]/) return num
    return num e[num%100]
}
How the array e[] is built should be explained, though that can be
easily done (IMO), e.g.:
function init_e ()
{
    for (i=0; i<=99; i++)       # init with 'th' as the prevalent suffix
        e[i] = "th"
    for (i=1; i<=91; i+=7) {    # exceptions to that are low digits 1..3
        e[i++] = "st"
        e[i++] = "nd"
        e[i++] = "rd"
    }
    e[11] = e[12] = e[13] = "th"    # and exception to that are 11..13
}
(something like that).
Janis
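Put together as a runnable script, the precomputed-table idea might look like this. It is a sketch only, assuming a POSIX awk; the wrapper name nth_lookup, the BEGIN call and the field loop are additions for the example, and i is made local to init_e() here so that it cannot clash with a loop variable elsewhere.

```shell
# nth_lookup is a hypothetical wrapper name used only for this sketch.
nth_lookup() {
    awk '
    function init_e(   i) {                     # i is a local variable
        for (i = 0; i <= 99; i++) e[i] = "th"   # default suffix
        for (i = 1; i <= 91; i += 7) {          # i visits 1, 11, 21, ... 91
            e[i++] = "st"; e[i++] = "nd"; e[i++] = "rd"
        }
        e[11] = e[12] = e[13] = "th"            # 11..13 take "th" after all
    }
    function nth_pre(num) {
        if (num ~ /[^0-9]/) return num          # leave non-numeric input alone
        return num e[num % 100]
    }
    BEGIN { init_e() }
    { for (x = 1; x <= NF; x++) print nth_pre($x) }'
}

echo 1 11 21 112 93 100 foo | nth_lookup
```

The inner i++ steps plus the i += 7 of the loop header advance i by ten per pass, which is why the table ends up with st/nd/rd at 1..3, 21..23 and so on up to 91..93.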
On 11/5/2023 1:04 PM, Janis Papanagnou wrote:
Fast pre-calculated solutions can also be legible.
Apparently we just have different ideas of legible - to me a hash lookup
is the clear and obvious way to implement this rather than a bunch of
if/else regexp comparisons.
Taking the idea of your variant further can simplify it even, e.g.
function nth_pre (num)
{
if (num ~ /[^0-9]/) return num
return num e[num%100]
}
[...]
That's a very good idea. [...]