• Unique Characters related: Isogram Coding Puzzle

    From yeti@yeti@tilde.institute to comp.lang.awk on Sun Oct 1 15:14:12 2023
    From Newsgroup: comp.lang.awk

    WEEKEND PROGRAMMING CHALLENGE ISSUE #4 https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

    That was a nice and fun one. \o/

    Try it.
    --
    R || 0 ... Resistance is futile.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Oct 1 22:36:58 2023

    On 01.10.2023 17:14, yeti wrote:
    > WEEKEND PROGRAMMING CHALLENGE ISSUE #4 https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

    Under the link you find:

    "Isogram words are these with all letters different (no letters
    duplicated). For instance “Hydropneumatics” is Isogram word.
    Your challenge this weekend is to make program which scans text
    and displays the longest Isogram word found in the scanned text."

    And an (obviously broken) data link to alice_in_wonderland.html


    > That was a nice and fun one. \o/
    >
    > Try it.

    The point is that such types of tasks can be simply solved by
    Unix commands. E.g. the following code

    grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
    awk '{print length($0),$0}' | sort -n | tail

    produces - sensible folks DO NOT READ FURTHER (strong language) !!!




    13 clergywoman's
    13 demographic's
    13 documentary's
    13 expurgation's
    13 motherfucking
    13 thunderclap's
    13 tragicomedy's
    13 valedictory's
    14 ambidextrously
    14 lexicography's

    and so I'd throw in "ambidextrously" as a possible good word.


    As homework do that in GNU Awk - I think it is not difficult. :-)

    Janis

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Oct 1 22:51:06 2023

    On 01.10.2023 22:36, Janis Papanagnou wrote:
    > On 01.10.2023 17:14, yeti wrote:
    >> WEEKEND PROGRAMMING CHALLENGE ISSUE #4
    >
    > grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
    > awk '{print length($0),$0}' | sort -n | tail

    grep -Ev '(.).*\1'

    is of course a sufficient grep pattern.
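
    To see what the pattern does: it matches any line in which some
    character, captured by (.), turns up again later via \1, and -v keeps
    only the lines that do not match. Below is a minimal sketch on a
    made-up word list. The function name isograms is hypothetical, and the
    pattern is written in basic-RE syntax because POSIX specifies
    back-references for BREs only (in -E patterns they are an extension,
    which GNU grep accepts).

```shell
# Hypothetical helper: pass through only the lines in which no character repeats.
# Same idea as grep -Ev '(.).*\1', written as a basic regular expression.
isograms() {
    grep -v '\(.\).*\1'
}

# Made-up sample words; "hello" (two l) and "letter" (t and e repeat) are dropped.
printf '%s\n' hello world awk isogram letter | isograms
```

    On this sample the sketch should print world, awk and isogram, one per
    line.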

    Janis

  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Sun Oct 1 22:40:43 2023

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    > On 01.10.2023 17:14, yeti wrote:
    >> WEEKEND PROGRAMMING CHALLENGE ISSUE #4
    >> https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/
    >
    > Under the link you find:
    >
    > "Isogram words are these with all letters different (no letters
    > duplicated). For instance “Hydropneumatics” is Isogram word.
    > Your challenge this weekend is to make program which scans text
    > and displays the longest Isogram word found in the scanned text."
    >
    > And an (obviously broken) data link to alice_in_wonderland.html
    >
    >> That was a nice and fun one. \o/
    >>
    >> Try it.
    >
    > The point is that such types of tasks can be simply solved by
    > Unix commands. E.g. the following code
    >
    > grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |

    That's a neat trick! The initial and final .* are, however, redundant
    and removing them makes the search noticeably faster (though it hardly
    matters).

    > awk '{print length($0),$0}' | sort -n | tail

    I generally use 'sort -rn | head' for this sort of thing, but that's
    just a preference for the output order.

    Comments on the exercise suggest that case should be ignored so maybe a
    'tr A-Z a-z' in the pipe is needed. Personally, I'd also exclude
    apostrophes:

    </usr/share/dict/american-english tr A-Z a-z | \
    grep -Ev "(.).*\1|'" | awk '{print length($0),$0}' | sort -rn | head

    > As homework do that in GNU Awk - I think it is not difficult. :-)

    GNU AWK does not permit numbered back references in REs so it's going to
    be more fiddly, though probably faster. Something like:

    function is_isogram(s, letters, unique, i) {
        split(tolower(s), letters, //)
        for (i in letters) unique[letters[i]] = 1
        return length(letters) == length(unique)
    }

    !/'/ && length($0) > max && is_isogram($0) {
        max = length($0)
        max_isogram = $0
    }

    END { print max_isogram }
    --
    Ben.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Oct 1 23:54:39 2023

    On 01.10.2023 23:40, Ben Bacarisse wrote:
    > Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    >
    >> The point is that such types of tasks can be simply solved by
    >> Unix commands. E.g. the following code
    >>
    >> grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
    >
    > That's a neat trick! The initial and final .* are, however, redundant
    > and removing them makes the search noticeably faster (though it hardly matters).

    Yes, I posted a follow-up where I already noted that.


    >> awk '{print length($0),$0}' | sort -n | tail
    >
    > I generally use 'sort -rn | head' for this sort of thing, but that's
    > just a preference for the output order.

    Yes.


    > Comments on the exercise suggest that case should be ignored so maybe a
    > 'tr A-Z a-z' in the pipe is needed.

    Partly solved simply by a 'grep -Evi', but only for the first part.
    So, yes, you're right.

    > Personally, I'd also exclude apostrophes:

    Indeed. (I've just taken a Linux standard dictionary as test data,
    since the proposed text was unavailable. For a more complex text
    there's certainly a lot more cleanup to be done beforehand.)
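
    One possible shape for that cleanup, as a rough sketch only: split the
    text at every non-letter, fold case, drop duplicates, and then apply
    the same repeated-character filter. The function name isograms_in_text
    is hypothetical. Splitting at non-letters means a word such as
    "sister's" is cut into "sister" and "s" instead of being excluded,
    which may or may not be what is wanted. The filter is written as a
    basic RE (equivalent to the -E pattern used in this thread).

```shell
# Hypothetical helper: read free text on stdin and print "length word"
# for every distinct isogram, longest first.
# Stages: split at non-letters, fold case, drop duplicates,
# drop words with a repeated character, prefix the length, sort.
isograms_in_text() {
    tr -cs 'A-Za-z' '\n' |
    tr 'A-Z' 'a-z' |
    sort -u |
    grep -v '\(.\).*\1' |
    awk 'length($0) { print length($0), $0 }' |
    sort -k1,1nr -k2
}

# Made-up sample sentence; "hello" is dropped because of its double l.
printf 'Hello, World! The dog.\n' | isograms_in_text
```

    A real text would be fed in the same way, for example by redirecting a
    downloaded file into the function.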

    [snip]

    Janis

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 00:03:08 2023

    On 01.10.2023 23:40, Ben Bacarisse wrote:
    > Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    >> As homework do that in GNU Awk - I think it is not difficult. :-)
    >
    > GNU AWK does not permit numbered back references in REs so it's going to
    > be more fiddly, though probably faster.

    Here I was not so much focused on the back-reference but on the
    code that had already been posted in that other thread and that
    could simply be used, e.g. like

    # already existing function

    function uniqueChars (t, s, n, i, c, o, seen)
    {
        delete seen
        n = split (t, s, "")
        for (i=1; i<=n; i++)
            if (!seen[c = s[i]]++)
                o = o c

        return o
    }

    # new code below

    $0 == uniqueChars($0) && length($0) > maxlen {
        maxlen = length($0)
        word = $0
    }

    END { print maxlen, word }


    Of course there are also other ways to implement the function,
    like yours...

    > Something like:
    >
    > function is_isogram(s, letters, unique, i) {
    >     split(tolower(s), letters, //)
    >     for (i in letters) unique[letters[i]] = 1
    >     return length(letters) == length(unique)
    > }
    >
    > !/'/ && length($0) > max && is_isogram($0) {
    >     max = length($0)
    >     max_isogram = $0
    > }
    >
    > END { print max_isogram }


    Janis

  • From yeti@yeti@tilde.institute to comp.lang.awk on Mon Oct 2 00:25:00 2023

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    > Indeed. (I've just taken a Linux standard dictionary as test data,
    > since the proposed text was unavailable. For a more complex text
    > there's certainly a lot more cleanup to be done beforehand.)

    <https://www.gutenberg.org/cache/epub/11/pg11.txt>
    --
    This stealth signature intentionally left blank.
  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Oct 2 06:01:59 2023

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    > # new code below
    >
    > $0 == uniqueChars($0) && length($0) > maxlen {
    >     maxlen = length($0)
    >     word = $0
    > }
    >
    > END { print maxlen, word }

    Now that's really nice. I like the thinking here.
    --
    :wq
    Mike Sanders

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 10:23:37 2023

    On 02.10.2023 02:25, yeti wrote:
    > Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    >
    >> Indeed. (I've just taken a Linux standard dictionary as test data,
    >> since the proposed text was unavailable. For a more complex text
    >> there's certainly a lot more cleanup to be done beforehand.)
    >
    > <https://www.gutenberg.org/cache/epub/11/pg11.txt>

    In this text I could only find seven isogram words of max.
    length 10 (complained, croqueting, curtseying, educations,
    flamingoes, flamingoes, scrambling). - Is that expected?
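
    The awk programs posted so far keep a single word, so ties like these
    stay hidden. A sketch of a variant that reports every distinct word of
    maximal length follows. It assumes one word per input line (already
    split and cleaned), and the names longest_isograms, done and list are
    made up for this illustration. It uses substr() instead of split()
    with an empty separator, so it does not rely on gawk extensions.

```shell
# Hypothetical helper: print the maximal isogram length followed by all
# distinct words of that length, in order of first appearance.
longest_isograms() {
    awk '
    # Return 1 if no character occurs twice in s, else 0.
    function is_isogram(s,    i, c, seen) {
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c in seen) return 0
            seen[c] = 1
        }
        return 1
    }
    {
        w = tolower($0)
        if (w == "" || (w in done)) next      # skip blanks and repeated words
        done[w] = 1
        if (length(w) < max || !is_isogram(w)) next
        if (length(w) > max) { max = length(w); list = w }   # new maximum: restart the list
        else list = list " " w                                # tie: append to the list
    }
    END { if (max) print max, list }
    '
}

# Made-up sample input, one word per line.
printf '%s\n' scrambling Curtseying hello curtseying flamingoes cat | longest_isograms
```

    On the sample input this should print "10 scrambling curtseying
    flamingoes".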

    Janis

  • From yeti@yeti@tilde.institute to comp.lang.awk on Mon Oct 2 13:20:00 2023

    Weekend Programming Challenge ISSUE #4 – Solutions <https://olimex.wordpress.com/2013/04/15/weekend-programming-challenge-issue-4-solutions/>

    ... confirms ‘curtseying’ as solution.
    --
    Fake signature.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 15:33:53 2023

    On 02.10.2023 15:20, yeti wrote:
    > Weekend Programming Challenge ISSUE #4 – Solutions <https://olimex.wordpress.com/2013/04/15/weekend-programming-challenge-issue-4-solutions/>

    ...says "This Weekend Programming Challenge have record submissions,
    either the problem was very easy [...]" - I suppose it was.

    ...and: "I count total 30 solutions, some of them very elegant, some of
    them very short [...]" - But where can we find the code to all these
    solutions contributed? (I can't see anything on that page.)

    ...specifically: "I still bang my head to understand what this one line
    AWK shell script solution does" - Certainly interesting for c.l.awk

    Janis

  • From yeti@yeti@tilde.institute to comp.lang.awk on Mon Oct 2 13:59:17 2023

    <https://github.com/OLIMEX/WPC/tree/master/ISSUE-4/SOLUTION-24>
    --
    Recursive signature
    |--
    |Recursive signature
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 18:33:24 2023

    On 02.10.2023 15:33, Janis Papanagnou wrote:

    > ...specifically: "I still bang my head to understand what this one line
    > AWK shell script solution does" - Certainly interesting for c.l.awk

    https://github.com/OLIMEX/WPC/tree/master/ISSUE-4/SOLUTION-24/readme.txt

    awk 'BEGIN { RS="[^A-Za-z]" } $0 { word=tolower($0) ; if(word in WordSeen) next ; WordSeen[word]=1 ; split(word,Letters,"") ; delete CharSeen ; for(char in Letters) if(++CharSeen[Letters[char]]>1) next ; len=length(word) ; if(len>maxlen) { maxword=word ; maxlen=len } } END { print maxword}'

    Not something I'd call a one-liner. (It's just a complete program in one
    line, just omitting newlines.)
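
    With the newlines put back and comments added, the program reads as
    follows. The awk code is unchanged apart from layout. The wrapper name
    wpc_solution_24 and the sample sentence are made up here. The sketch
    assumes an awk that accepts a regular expression as RS and splits into
    single characters when given "" as separator, as gawk does.

```shell
# Hypothetical wrapper around the SOLUTION-24 program, reformatted only.
wpc_solution_24() {
    awk '
    BEGIN { RS = "[^A-Za-z]" }          # any non-letter ends a record, so a record is one word
    $0 {                                # empty records (adjacent separators) are skipped
        word = tolower($0)
        if (word in WordSeen) next      # each distinct word is examined once
        WordSeen[word] = 1
        split(word, Letters, "")        # one array element per character
        delete CharSeen
        for (char in Letters)
            if (++CharSeen[Letters[char]] > 1) next   # repeated letter: not an isogram
        len = length(word)
        if (len > maxlen) { maxword = word; maxlen = len }   # strictly longer words only
    }
    END { print maxword }
    ' "$@"
}

# Made-up sample sentence containing two isograms of length 10.
printf 'Alice kept curtseying while the Queen went on croqueting.\n' | wpc_solution_24
```

    Because the comparison is a strict len>maxlen, the first word of
    maximal length wins, so the sample should print "curtseying" even
    though "croqueting" is equally long.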

    Janis
