• Re: Unique Characters Only

    From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 09:11:13 2023
    From Newsgroup: comp.lang.awk

    On 10/1/2023 4:38 AM, Mike Sanders wrote:
    run as...

    awk -f uniqueChars.awk

    output...

    Input string: Mary had a little lamb who's fleece was white as snow...
    Unique chars: Mary hdlitembwo'sfcn.

    script...

    BEGIN {

    a = "Mary had a little lamb who's fleece was white as snow..."
    b = uniqueChars(a)

    print "Input string: " a
    print "Unique chars: " b

    }

    function uniqueChars(str, x, y, c, tmp, uniqueStr) {

    y = length(str)
    uniqueStr = ""
    delete tmp # clear array for each new string
    You don't need to do that `delete` - just having "tmp" listed in the
    args list will re-init it every time the function is called. Removing
    that statement will also make your script portable to awks that don't
    support `delete array` (but most, possibly all, modern awks do support
    that even though it's technically still undefined behavior).
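
A minimal sketch of the point above, not taken from the thread: the helper countDistinct() and the shell wrapper count_distinct_twice are made-up names for illustration. It calls the same function twice without any `delete` and should report the same count both times, on the assumption that your awk follows the usual rule that extra names in the parameter list start out empty on every call.

```shell
# Hypothetical demo; countDistinct() is not part of the original script.
count_distinct_twice() {
    awk '
    # "seen" sits after the real parameter, so it is a local array that
    # starts out empty on every call; no delete statement is needed.
    function countDistinct(str,    i, c, n, seen) {
        for (i = 1; i <= length(str); i++) {
            c = substr(str, i, 1)
            if (!(c in seen)) {
                seen[c]
                n++
            }
        }
        return n + 0
    }
    BEGIN {
        # If "seen" leaked between calls, the second call would return 0.
        print countDistinct("aab"), countDistinct("aab")
    }'
}
count_distinct_twice
```

If the array were shared between calls, the second number printed would be 0 rather than 2.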


    while(++x <= y) {
    Using a `while` instead of `for` loop for that makes your code a bit
    less clear, a bit more fragile (what if `x` gets set above?), and a bit
    harder to maintain (what if in future you need to increment x by 2
    every iteration?). It's not worth saving the few characters over the
    traditional `for ( x=1; x<=y; x++ )`

    c = substr(str, x, 1)
    if (!(c in tmp)) {
    Idiomatically that'd be implemented as

    if ( !tmp[c]++ ) {

    and then you'd remove the `tmp[c]` below but the array in that case is
    almost always named `seen[]` rather than `tmp[]`.

    uniqueStr = uniqueStr c
    tmp[c]
    }
    }

    return uniqueStr

    }
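
Putting the suggestions together (no `delete`, a `for` loop, and the `!seen[c]++` test), the function might end up looking like the sketch below. The shell wrapper unique_chars and the variable `s` are assumptions added here so the example can be run from a prompt; only uniqueChars() comes from the thread.

```shell
# unique_chars is a hypothetical wrapper; the string to process is passed
# with -v so it never has to be quoted inside the awk program itself.
unique_chars() {
    awk -v s="$1" '
    function uniqueChars(str,    x, y, c, seen, uniqueStr) {
        y = length(str)
        for (x = 1; x <= y; x++) {
            c = substr(str, x, 1)
            # seen[c]++ evaluates to 0 (false) only the first time c is met
            if (!seen[c]++) {
                uniqueStr = uniqueStr c
            }
        }
        return uniqueStr
    }
    BEGIN { print uniqueChars(s) }'
}
unique_chars "Mary had a little lamb who's fleece was white as snow..."
```

Note that `-v` interprets backslash escapes in the assigned value, so input containing backslashes would need a different way in (for example reading it from stdin).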

    Alternatively, if the order of the characters returned doesn't matter,
    you could do:

    function uniqueChars(str, x, y, c, tmp, uniqueStr) {

        y = length(str)
        uniqueStr = ""
        for ( x=1; x<=y; x++ ) {
            tmp[substr(str,x,1)]
        }
        for ( c in tmp ) {
            uniqueStr = uniqueStr c
        }

        return uniqueStr

    }

    I don't expect that to be any faster or anything, it's just different,
    but if you have GNU awk then it can be tweaked to:

    function uniqueChars(str, x, y, c, tmp, uniqueStr) {

        y = length(str)
        uniqueStr = ""
        for ( x=1; x<=y; x++ ) {
            tmp[substr(str,x,1)]
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for ( c in tmp ) {
            uniqueStr = uniqueStr c
        }

        return uniqueStr

    }

    and then it'll return the unique characters sorted in alphabetic order
    which may be useful.
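
One thing to keep in mind with the GNU awk version: PROCINFO is global, so the sort order set inside the function stays in effect for every later `for (c in array)` loop in the program. A hedged sketch of one way to contain that, assuming gawk is installed as `gawk`; the names unique_chars_sorted, uniqueCharsSorted, hadOrder and oldOrder are made up for this example:

```shell
# Hypothetical wrapper around a gawk-only variant of the function.
unique_chars_sorted() {
    gawk -v s="$1" '
    function uniqueCharsSorted(str,    x, y, c, seen, out, hadOrder, oldOrder) {
        y = length(str)
        for (x = 1; x <= y; x++) {
            seen[substr(str, x, 1)]
        }
        # Remember whatever traversal order the caller had in effect.
        hadOrder = ("sorted_in" in PROCINFO)
        if (hadOrder) {
            oldOrder = PROCINFO["sorted_in"]
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (c in seen) {
            out = out c
        }
        # Put the previous setting back so other loops are unaffected.
        if (hadOrder) {
            PROCINFO["sorted_in"] = oldOrder
        } else {
            delete PROCINFO["sorted_in"]
        }
        return out
    }
    BEGIN { print uniqueCharsSorted(s) }'
}
```

Called as `unique_chars_sorted "banana"`, this should print the three distinct letters in ascending string order.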

    Regards,

    Ed.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Nov 6 03:18:19 2023
    From Newsgroup: comp.lang.awk

    Ed Morton <mortonspam@gmail.com> wrote:

    You don't need to do that `delete` - just having "tmp" listed in the
    args list will re-init it every time the function is called. Removing
    that statement will also make your script portable to awks that don't
    support `delete array` (but most, possibly all, modern awks do support
    that even though it's technically still undefined behavior).

    You know I wondered about that, thought I'd play it safe, but yeah,
    noted: array always created anew, good to know.

    while(++x <= y) {
    Using a `while` instead of `for` loop for that makes your code a bit
    less clear, a bit more fragile (what if `x` gets set above?), and a bit
    harder to maintain (what if in future you need to increment x by 2
    every iteration?).

    Aye.

    It's not worth saving the few characters over the
    traditional `for ( x=1; x<=y; x++ )`

    c = substr(str, x, 1)
    if (!(c in tmp)) {
    Idiomatically that'd be implemented as

    if ( !tmp[c]++ ) {

    and then you'd remove the `tmp[c]` below but the array in that case is
    almost always named `seen[]` rather than `tmp[]`.

    uniqueStr = uniqueStr c
    tmp[c]
    }
    }

    return uniqueStr

    }

    Alternatively, if the order of the characters returned doesn't matter,
    you could do:

    function uniqueChars(str, x, y, c, tmp, uniqueStr) {

        y = length(str)
        uniqueStr = ""
        for ( x=1; x<=y; x++ ) {
            tmp[substr(str,x,1)]
        }
        for ( c in tmp ) {
            uniqueStr = uniqueStr c
        }

        return uniqueStr

    }

    I don't expect that to be any faster or anything, it's just different,
    but if you have GNU awk then it can be tweaked to:

    function uniqueChars(str, x, y, c, tmp, uniqueStr) {

        y = length(str)
        uniqueStr = ""
        for ( x=1; x<=y; x++ ) {
            tmp[substr(str,x,1)]
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for ( c in tmp ) {
            uniqueStr = uniqueStr c
        }

        return uniqueStr

    }

    and then it'll return the unique characters sorted in alphabetic order
    which may be useful.

    Must add these examples to my notes.
    --
    :wq
    Mike Sanders
