Forum: War Ensemble BBS

Unique Characters related: Isogram Coding Puzzle

From yeti@yeti@tilde.institute to comp.lang.awk on Sun Oct 1 15:14:12 2023

From Newsgroup: comp.lang.awk

WEEKEND PROGRAMMING CHALLENGE ISSUE #4 https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

That was a nice and fun one. \o/

Try it.
--
R || 0 ... Resistance is futile.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Oct 1 22:36:58 2023

From Newsgroup: comp.lang.awk

On 01.10.2023 17:14, yeti wrote:

WEEKEND PROGRAMMING CHALLENGE ISSUE #4 https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

Under the link you find:

"Isogram words are these with all letters different (no letters
duplicated). For instance “Hydropneumatics” is Isogram word.
Your challenge this weekend is to make program which scans text
and displays the longest Isogram word found in the scanned text."

And an (obviously broken) data link to alice_in_wonderland.html

That was a nice and fun one. \o/

Try it.

The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
awk '{print length($0),$0}' | sort -n | tail

produces - sensible folks DO NOT READ FURTHER (strong language) !!!

13 clergywoman's
13 demographic's
13 documentary's
13 expurgation's
13 motherfucking
13 thunderclap's
13 tragicomedy's
13 valedictory's
14 ambidextrously
14 lexicography's

and so I'd throw in "ambidextrously" as a possible good word.

As homework do that in GNU Awk - I think it is not difficult. :-)

Janis


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Oct 1 22:51:06 2023

From Newsgroup: comp.lang.awk

On 01.10.2023 22:36, Janis Papanagnou wrote:

On 01.10.2023 17:14, yeti wrote:

WEEKEND PROGRAMMING CHALLENGE ISSUE #4

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
awk '{print length($0),$0}' | sort -n | tail

grep -Ev '(.).*\1'

is of course a sufficient grep pattern.
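A minimal sketch of why the shorter pattern is enough, on a few made-up sample words (the function names are only illustrative). grep matches anywhere in the line by default, so the leading and trailing .* add nothing. Note that back-references with -E are a GNU grep extension rather than POSIX ERE, so this assumes a grep that supports them.

```shell
# Keep only lines in which no character occurs twice.
# Assumption: grep accepts back-references with -E (GNU grep does).
isograms() {
    grep -Ev '(.).*\1'
}

# The original, longer pattern, kept here only for comparison.
isograms_verbose() {
    grep -Ev '.*(.).*\1.*'
}

# "rabbit" (b twice) and "queen" (e twice) are dropped.
printf '%s\n' alice rabbit curtseying queen | isograms
# prints:
# alice
# curtseying
```

Both functions should select the same lines; the short one just gives the regex engine less to do.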

Janis


From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Sun Oct 1 22:40:43 2023

From Newsgroup: comp.lang.awk

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 01.10.2023 17:14, yeti wrote:

WEEKEND PROGRAMMING CHALLENGE ISSUE #4
https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/

Under the link you find:

"Isogram words are these with all letters different (no letters
duplicated). For instance “Hydropneumatics” is Isogram word.
Your challenge this weekend is to make program which scans text
and displays the longest Isogram word found in the scanned text."

And an (obviously broken) data link to alice_in_wonderland.html

That was a nice and fun one. \o/

Try it.

The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |

That's a neat trick! The initial and final .* are, however, redundant
and removing them makes the search noticeably faster (though it hardly
matters).

awk '{print length($0),$0}' | sort -n | tail

I generally use 'sort -rn | head' for this sort of thing, but that's
just a preference for the output order.

Comments on the exercise suggest that case should be ignored so maybe a
'tr A-Z a-z' in the pipe is needed. Personally, I'd also exclude
apostrophes:

</usr/share/dict/american-english tr A-Z a-z | \
grep -Ev "(.).*\1|'" | awk '{print length($0),$0}' | sort -rn | head
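To illustrate why the lowercasing step matters, here is the same pipeline wrapped in a function and run on a few made-up words (the function name and its optional count argument are assumptions for the example). Without tr, "Bob" would slip through, because the back-reference compares "B" and "b" as different characters.

```shell
# Lowercase, drop words with a repeated character or an apostrophe,
# then list the longest survivors first.
# Assumption: grep accepts back-references with -E (GNU grep does).
longest_isograms() {
    tr 'A-Z' 'a-z' |
        grep -Ev "(.).*\1|'" |
        awk '{ print length($0), $0 }' |
        sort -rn |
        head -n "${1:-10}"
}

printf '%s\n' Bob "cat's" Curtseying dog | longest_isograms 2
# prints:
# 10 curtseying
# 3 dog
```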

As homework do that in GNU Awk - I think it is not difficult. :-)

GNU AWK does not permit numbered back references in REs so it's going to
be more fiddly, though probably faster. Something like:

function is_isogram(s, letters, unique, i) {
    split(tolower(s), letters, //)
    for (i in letters) unique[letters[i]] = 1
    return length(letters) == length(unique)
}

!/'/ && length($0) > max && is_isogram($0) {
    max = length($0)
    max_isogram = $0
}

END { print max_isogram }
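A hedged variant of the same idea that avoids both split() on an empty separator and length() on an array, since not every awk supports those. It is not the code above, just a sketch with a swapped-in test: a word is an isogram if no character reappears later in the word. The shell wrapper name is made up for the example.

```shell
# Print the longest apostrophe-free isogram, one candidate word per line.
longest_isogram() {
    awk '
    # True if no character of s occurs again further right (case ignored).
    function is_isogram(s,    i, n) {
        s = tolower(s)
        n = length(s)
        for (i = 1; i < n; i++)
            if (index(substr(s, i + 1), substr(s, i, 1)))
                return 0
        return 1
    }

    # "\047" is the apostrophe, written in octal to survive shell quoting.
    index($0, "\047") == 0 && length($0) > max && is_isogram($0) {
        max = length($0)
        max_isogram = $0
    }

    END { print max_isogram }
    '
}

printf '%s\n' Bob "cat's" Curtseying dog rabbit | longest_isogram
# prints: Curtseying
```

The inner loop is quadratic in the word length, which should hardly matter for dictionary words.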
--
Ben.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Oct 1 23:54:39 2023

From Newsgroup: comp.lang.awk

On 01.10.2023 23:40, Ben Bacarisse wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code

grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |

That's a neat trick! The initial and final .* are, however, redundant
and removing them makes the search noticeably faster (though it hardly matters).

Yes, I posted a follow-up where I already noted that.

awk '{print length($0),$0}' | sort -n | tail

I generally use 'sort -rn | head' for this sort of thing, but that's
just a preference for the output order.

Yes.

Comments on the exercise suggest that case should be ignored so maybe a
'tr A-Z a-z' in the pipe is needed.

Partly solved simply by a 'grep -Evi', but only for the first part.
So, yes, you're right.

Personally, I'd also exclude apostrophes:

Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)

[snip]

Janis


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 00:03:08 2023

From Newsgroup: comp.lang.awk

On 01.10.2023 23:40, Ben Bacarisse wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

As homework do that in GNU Awk - I think it is not difficult. :-)

GNU AWK does not permit numbered back references in REs so it's going to
be more fiddly, though probably faster.

Here I was not so much focused on the back-reference but on the
code that had already been posted in that other thread and that
could simply be used, e.g. like

# already existing function

function uniqueChars (t, s, n, i, c, o, seen)
{
    delete seen
    n = split (t, s, "")
    for (i=1; i<=n; i++)
        if (!seen[c = s[i]]++)
            o = o c

    return o
}

# new code below

$0 == uniqueChars($0) && length($0) > maxlen {
    maxlen = length($0)
    word = $0
}

END { print maxlen, word }
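To see why the comparison $0 == uniqueChars($0) works as an isogram test, it may help to look at what the function returns on its own. A small sketch, assuming an awk in which split() with "" as separator yields single characters (gawk, mawk and BusyBox awk do this; POSIX leaves it unspecified). The shell wrapper name is made up.

```shell
# Print each input line with repeated characters removed,
# keeping only the first occurrence of every character.
unique_chars() {
    awk '
    function uniqueChars(t,    s, n, i, c, o, seen) {
        n = split(t, s, "")
        for (i = 1; i <= n; i++)
            if (!seen[c = s[i]]++)
                o = o c
        return o
    }
    { print uniqueChars($0) }
    '
}

printf '%s\n' rabbit curtseying | unique_chars
# prints:
# rabit
# curtseying
```

A word with a repeated letter comes back shorter, so only isograms compare equal to their own result.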

Of course there are also other ways to implement the function,
like yours...

Something like:

function is_isogram(s, letters, unique, i) {
    split(tolower(s), letters, //)
    for (i in letters) unique[letters[i]] = 1
    return length(letters) == length(unique)
}

!/'/ && length($0) > max && is_isogram($0) {
    max = length($0)
    max_isogram = $0
}

END { print max_isogram }

Janis


From yeti@yeti@tilde.institute to comp.lang.awk on Mon Oct 2 00:25:00 2023

From Newsgroup: comp.lang.awk

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)

<https://www.gutenberg.org/cache/epub/11/pg11.txt>
--
This stealth signature intentionally left blank.

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Oct 2 06:01:59 2023

From Newsgroup: comp.lang.awk

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

# new code below

$0 == uniqueChars($0) && length($0) > maxlen {
    maxlen = length($0)
    word = $0
}

END { print maxlen, word }

Now that's really nice. I like the thinking here.
--
:wq
Mike Sanders


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 10:23:37 2023

From Newsgroup: comp.lang.awk

On 02.10.2023 02:25, yeti wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)

<https://www.gutenberg.org/cache/epub/11/pg11.txt>

In this text I could only find seven isogram words of max.
length 10 (complained, croqueting, curtseying, educations,
flamingoes, flamingoes, scrambling). - Is that expected?
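For running text rather than a word list, the input first has to be cut into words. One possible sketch, not necessarily how the list above was produced: turn every run of non-letters into a newline, lowercase, filter, and deduplicate. The function name and the sample sentence are made up; back-references with grep -E are assumed to be available (GNU grep).

```shell
# List distinct isogram words from free text, longest first,
# ties in alphabetical order. Optional argument: how many to show.
isogram_words() {
    tr -cs 'A-Za-z' '\n' |
        tr 'A-Z' 'a-z' |
        grep -Ev '(.).*\1' |
        awk 'length($0) > 0 { print length($0), $0 }' |
        sort -u |
        sort -k1,1nr -k2,2 |
        head -n "${1:-10}"
}

printf '%s\n' "Alice was curtseying; the Queen's flamingoes kept croqueting, croqueting!" |
    isogram_words 3
# prints:
# 10 croqueting
# 10 curtseying
# 10 flamingoes
```

Splitting on every non-letter also cuts "Queen's" into "queen" and "s"; whether that is wanted depends on how the puzzle defines a word.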

Janis


From yeti@yeti@tilde.institute to comp.lang.awk on Mon Oct 2 13:20:00 2023

From Newsgroup: comp.lang.awk

Weekend Programming Challenge ISSUE #4 – Solutions <https://olimex.wordpress.com/2013/04/15/weekend-programming-challenge-issue-4-solutions/>

... confirms ‘curtseying’ as solution.
--
Fake signature.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 15:33:53 2023

From Newsgroup: comp.lang.awk

On 02.10.2023 15:20, yeti wrote:

Weekend Programming Challenge ISSUE #4 – Solutions <https://olimex.wordpress.com/2013/04/15/weekend-programming-challenge-issue-4-solutions/>

...says "This Weekend Programming Challenge have record submissions,
either the problem was very easy [...]" - I suppose it was.

...and: "I count total 30 solutions, some of them very elegant, some of
them very short [...]" - But where can we find the code to all these
solutions contributed? (I can't see anything on that page.)

...specifically: "I still bang my head to understand what this one line
AWK shell script solution does" - Certainly interesting for c.l.awk

Janis


From yeti@yeti@tilde.institute to comp.lang.awk on Mon Oct 2 13:59:17 2023

From Newsgroup: comp.lang.awk

<https://github.com/OLIMEX/WPC/tree/master/ISSUE-4/SOLUTION-24>
--
Recursive signature
|--
|Recursive signature

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 2 18:33:24 2023

From Newsgroup: comp.lang.awk

On 02.10.2023 15:33, Janis Papanagnou wrote:

...specifically: "I still bang my head to understand what this one line
AWK shell script solution does" - Certainly interesting for c.l.awk

https://github.com/OLIMEX/WPC/tree/master/ISSUE-4/SOLUTION-24/readme.txt

awk 'BEGIN { RS="[^A-Za-z]" } $0 { word=tolower($0) ; if(word in WordSeen) next ; WordSeen[word]=1 ; split(word,Letters,"") ; delete CharSeen ; for(char in Letters) if(++CharSeen[Letters[char]]>1) next ; len=length(word) ; if(len>maxlen) { maxword=word ; maxlen=len } } END { print maxword}'

Not something I'd call a one-liner. (It's just a complete program in one
line, just omitting newlines.)
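Spread over several lines, the program is easier to follow. The sketch below is a lightly restructured reading of it, not the contributor's original layout: it walks the letters by index instead of "for (char in Letters)" and reuses the count returned by split(). It assumes an awk that accepts a regular expression as RS and splits into single characters on "" (gawk and mawk do; POSIX guarantees neither). The shell wrapper name is made up.

```shell
# Print the longest isogram word found in free text on stdin.
longest_isogram_in_text() {
    awk '
    BEGIN { RS = "[^A-Za-z]" }        # every non-letter ends a record
    $0 {                              # skip the empty records in between
        word = tolower($0)
        if (word in WordSeen) next    # each distinct word is checked once
        WordSeen[word] = 1
        n = split(word, Letters, "")
        delete CharSeen
        for (i = 1; i <= n; i++)
            if (++CharSeen[Letters[i]] > 1) next   # repeated letter
        if (n > maxlen) { maxword = word; maxlen = n }
    }
    END { print maxword }
    '
}

printf '%s\n' "Alice was curtseying, the Queen said." | longest_isogram_in_text
# prints: curtseying
```

Because only a strictly longer word replaces maxword, the first of several equally long isograms wins.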

Janis

