Forum: War Ensemble BBS

Re: Unique Characters Only

From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 09:11:13 2023

From Newsgroup: comp.lang.awk

On 10/1/2023 4:38 AM, Mike Sanders wrote:

run as...

awk -f uniqueChars.awk

output...

Input string: Mary had a little lamb who's fleece was white as snow...
Unique chars: Mary hdlitembwo'sfcn.

script...

BEGIN {

a = "Mary had a little lamb who's fleece was white as snow..."
b = uniqueChars(a)

print "Input string: " a
print "Unique chars: " b

}

function uniqueChars(str, x, y, c, tmp, uniqueStr) {

y = length(str)
uniqueStr = ""
delete tmp # clear array for each new string

You don't need to do that `delete` - just having "tmp" listed in the
args list will re-init it every time the function is called. Removing
that statement will also make your script portable to awks that don't
support `delete array` (but most, possibly all, modern awks do support
that even though it's technically still undefined behavior).

while(++x <= y) {

Using a `while` instead of `for` loop for that makes your code a bit
less clear, a bit more fragile (what if `x` gets set above?), and a bit
harder to maintain (what if in future you need to increment x by 2 every
iteration?). It's not worth saving the few characters over the
traditional `for ( x=1; x<=y; x++ )`.

c = substr(str, x, 1)
if (!(c in tmp)) {

Idiomatically that'd be implemented as

if ( !tmp[c]++ ) {

and then you'd remove the `tmp[c]` below, but the array in that case is
almost always named `seen[]` rather than `tmp[]`.

uniqueStr = uniqueStr c
tmp[c]
}
}

return uniqueStr

}
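
Putting those three suggestions together (no `delete`, a `for` loop, and
`!seen[c]++`), the function might end up looking something like the
sketch below. The shell wrapper `unique_chars` and the `-v s=...`
variable are only there so it can be run from a command line; they are
assumptions for illustration, not part of the original script.

```sh
# Hypothetical wrapper: pass the string in as awk variable "s".
# Note that -v processes backslash escapes in the value.
unique_chars() {
    awk -v s="$1" '
    # x, y, c, seen and uniqueStr are locals (extra parameters).
    function uniqueChars(str,    x, y, c, seen, uniqueStr) {
        y = length(str)
        for ( x=1; x<=y; x++ ) {
            c = substr(str, x, 1)
            if ( !seen[c]++ ) {          # true only on first sight of c
                uniqueStr = uniqueStr c
            }
        }
        return uniqueStr
    }
    BEGIN { print uniqueChars(s) }
    '
}

unique_chars "Mary had a little lamb"    # prints: Mary hdlitemb
```

Called with the full sentence from the original post it should give the
same `Mary hdlitembwo'sfcn.` as shown at the top.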

Alternatively, if the order of the characters returned doesn't matter,
you could do:

function uniqueChars(str, x, y, c, tmp, uniqueStr) {

y = length(str)
uniqueStr = ""
for ( x=1; x<=y; x++ ) {
tmp[substr(str,x,1)]
}
for ( c in tmp ) {
uniqueStr = uniqueStr c
}

return uniqueStr

}

I don't expect that to be any faster or anything, it's just different,
but if you have GNU awk then it can be tweaked to:

function uniqueChars(str, x, y, c, tmp, uniqueStr) {

y = length(str)
uniqueStr = ""
for ( x=1; x<=y; x++ ) {
tmp[substr(str,x,1)]
}
PROCINFO["sorted_in"] = "@ind_str_asc"
for ( c in tmp ) {
uniqueStr = uniqueStr c
}

return uniqueStr

}

and then it'll return the unique characters sorted in alphabetic order
which may be useful.
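
One thing to keep in mind with that version: PROCINFO["sorted_in"] is
global, so the setting stays in effect for every later `for (c in ...)`
loop in the program unless it's saved and restored. For awks without
that feature, a rough portable equivalent is to keep the characters in
sorted position as they're first seen. This is only a sketch: the name
`uniqueCharsSorted`, the shell wrapper and `-v s` are assumptions, and
the ordering is whatever that awk's string comparison gives, which can
vary with locale.

```sh
# Hypothetical wrapper: pass the string in as awk variable "s".
sorted_unique_chars() {
    awk -v s="$1" '
    function uniqueCharsSorted(str,    x, y, c, n, i, seen, chars, uniqueStr) {
        y = length(str)
        n = 0
        for ( x=1; x<=y; x++ ) {
            c = substr(str, x, 1)
            if ( !seen[c]++ ) {
                # insertion sort: shift larger chars right, then place c
                for ( i=n; i>=1 && chars[i] > c; i-- ) {
                    chars[i+1] = chars[i]
                }
                chars[i+1] = c
                n++
            }
        }
        for ( i=1; i<=n; i++ ) {
            uniqueStr = uniqueStr chars[i]
        }
        return uniqueStr
    }
    BEGIN { print uniqueCharsSorted(s) }
    '
}

sorted_unique_chars "banana"    # prints: abn
```

The insertion sort is quadratic in the number of distinct characters,
which is small for this kind of input, so it shouldn't matter here.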

Regards,

Ed.

--- Synchronet 3.20a-Linux NewsLink 1.114

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Nov 6 03:18:19 2023

From Newsgroup: comp.lang.awk

Ed Morton <mortonspam@gmail.com> wrote:

You don't need to do that `delete` - just having "tmp" listed in the
args list will re-init it every time the function is called. Removing
that statement will also make your script portable to awks that don't
support `delete array` (but most, possibly all, modern awks do support
that even though it's technically still undefined behavior).

You know I wondered about that, thought I'd play it safe, but yeah,
noted: array always created anew, good to know.
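
A quick way to see that for yourself is to call a function twice and
check the second call isn't affected by the first. The sketch below is
just an illustration; `countDistinct` and the wrapper `local_array_demo`
are made-up names, not something from the thread.

```sh
local_array_demo() {
    awk '
    # "seen" is listed as an extra parameter, so it is local to the
    # function and starts out empty on every call; no delete required.
    function countDistinct(str,    x, y, seen, n) {
        y = length(str)
        for ( x=1; x<=y; x++ ) {
            if ( !seen[substr(str, x, 1)]++ ) {
                n++
            }
        }
        return n + 0
    }
    BEGIN {
        print countDistinct("aab")   # 2
        print countDistinct("abc")   # 3, so "a" and "b" were not remembered
    }
    '
}

local_array_demo
```

If `seen` were a global instead, the second call would only count "c".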

while(++x <= y) {

Using a `while` instead of `for` loop for that makes your code a bit
less clear, a bit more fragile (what if `x` gets set above?), and a bit
harder to maintain (what if in future you need to increment x by 2 every
iteration?).

Aye.
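
For the record, the "what if `x` gets set above?" case can be shown with
a small test. In the posted function `x` is a fresh local so it works
as-is; the stale `x = 2` below is a contrived assumption standing in for
some later edit, and `while_vs_for` is a hypothetical wrapper name.

```sh
while_vs_for() {
    awk '
    BEGIN {
        str = "abc"
        y = length(str)

        x = 2                           # stale value from earlier code
        while ( ++x <= y ) {            # starts at 3, skips "a" and "b"
            out1 = out1 substr(str, x, 1)
        }

        x = 2                           # same stale value
        for ( x=1; x<=y; x++ ) {        # the loop resets x itself
            out2 = out2 substr(str, x, 1)
        }

        print "while: " out1
        print "for: " out2
    }
    '
}

while_vs_for
```

The `while` loop only picks up "c", the `for` loop gets all of "abc".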

Must add these examples to my notes.
--
:wq
Mike Sanders
