WEEKEND PROGRAMMING CHALLENGE ISSUE #4 https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/
That was a nice and fun one. \o/
Try it.
On 01.10.2023 17:14, yeti wrote:
WEEKEND PROGRAMMING CHALLENGE ISSUE #4
https://olimex.wordpress.com/2013/04/12/weekend-programming-challenge-issue-4/
Under the link you find:
"Isogram words are these with all letters different (no letters
duplicated). For instance “Hydropneumatics” is Isogram word.
Your challenge this weekend is to make program which scans text
and displays the longest Isogram word found in the scanned text."
And an (obviously broken) data link to alice_in_wonderland.html
That was a nice and fun one. \o/
Try it.
The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code:

    grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
    awk '{print length($0),$0}' | sort -n | tail

As homework do that in GNU Awk - I think it is not difficult. :-)
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
The point is that such types of tasks can be simply solved by
Unix commands. E.g. the following code
grep -Ev '.*(.).*\1.*' /usr/share/dict/american-english |
That's a neat trick! The initial and final .* are, however, redundant
and removing them makes the search noticeably faster (though it hardly matters).
awk '{print length($0),$0}' | sort -n | tail
I generally use 'sort -rn | head' for this sort of thing, but that's
just a preference for the output order.
Comments on the exercise suggest that case should be ignored so maybe a
'tr A-Z a-z' in the pipe is needed.
Personally, I'd also exclude apostrophes:
[snip]
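
A minimal sketch that pulls the suggestions above together (shorter
pattern, case folding, no apostrophes, longest first). The snipped
code is not reproduced here; this is only one way it could look. It
assumes GNU grep (back-references in -E patterns are an extension) and
one word per line on standard input. The function name
longest_isograms is made up for illustration.

```shell
# Hypothetical helper: read one word per line on stdin and print the ten
# longest isograms, longest first, each prefixed by its length.
longest_isograms() {
    tr 'A-Z' 'a-z' |                    # fold case so "Aa" counts as a repeat
    grep -v "'" |                       # drop words containing an apostrophe
    grep -Ev '(.).*\1' |                # drop words in which any character repeats
    awk '{ print length($0), $0 }' |    # prefix each word with its length
    sort -rn |                          # longest first
    head
}
```

It would be called as, for example:
longest_isograms < /usr/share/dict/american-english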
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
As homework do that in GNU Awk - I think it is not difficult. :-)
GNU AWK does not permit numbered back references in REs so it's going to
be more fiddly, though probably faster.
Something like:
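# Return true if no character occurs more than once in s (case-insensitive).
# letters, unique and i are local variables.
function is_isogram(s,    letters, unique, i) {
    split(tolower(s), letters, //)    # gawk extension: one element per character
    for (i in letters) unique[letters[i]] = 1
    return length(letters) == length(unique)
}
# Skip words with apostrophes; remember the longest isogram seen so far.
!/'/ && length($0) > max && is_isogram($0) {
    max = length($0)
    max_isogram = $0
}
END { print max_isogram }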
Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)
# new code below
$0 == uniqueChars($0) && length($0) > maxlen {
    maxlen = length($0)
    word = $0
}
END { print maxlen, word }
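
The definition of uniqueChars() is not shown in the post as archived
here. Purely as a guess at what such a helper could do, the sketch
below removes every repeated character from a string, so that a word
equals its own uniqueChars() result exactly when it is an isogram. The
function body and the shell wrapper name longest_isogram are
assumptions, not the poster's code. It sticks to portable awk
(substr and index only).

```shell
# Hypothetical wrapper: print the length and text of the longest isogram
# among the words (one per line) read from stdin.
longest_isogram() {
    awk '
    # Assumed implementation: return s with repeated characters removed,
    # keeping the first occurrence of each character.
    function uniqueChars(s,    i, c, out) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (index(out, c) == 0) out = out c
        }
        return out
    }
    $0 == uniqueChars($0) && length($0) > maxlen {
        maxlen = length($0)
        word = $0
    }
    END { print maxlen, word }
    '
}
```

Unlike the earlier function, this sketch does not fold case or skip
apostrophes, matching the pattern/action lines quoted above.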
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
Indeed. (I've just taken a Linux standard dictionary as test data,
since the proposed text was unavailable. For a more complex text
there's certainly a lot more cleanup to be done beforehand.)
<https://www.gutenberg.org/cache/epub/11/pg11.txt>
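
For running text like that file, the words first have to be split out
one per line before any of the filters in this thread apply. A rough
sketch of that cleanup, assuming ASCII letters only and treating every
other character as a separator (so apostrophes and hyphens also split
words). The function name words_per_line is a placeholder.

```shell
# Hypothetical helper: turn running text on stdin into a sorted list of
# distinct lower-case words, one per line.
words_per_line() {
    tr -cs 'A-Za-z' '\n' |    # replace each run of non-letters with one newline
    grep . |                  # drop the empty line left by leading punctuation
    tr 'A-Z' 'a-z' |          # fold case
    sort -u                   # keep each word once
}
```

With a local copy of the text saved as pg11.txt (file name assumed),
the result can be fed to the pipeline quoted earlier:
words_per_line < pg11.txt | grep -Ev '(.).*\1' |
awk '{print length($0),$0}' | sort -n | tail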
Weekend Programming Challenge ISSUE #4 - Solutions <https://olimex.wordpress.com/2013/04/15/weekend-programming-challenge-issue-4-solutions/>
...specifically: "I still bang my head to understand what this one line
AWK shell script solution does" - Certainly interesting for c.l.awk