# Michael Sanders 2023
# https://busybox.neocities.org/notes/isogram.txt
#
# awk script that displays isograms using inverse ANSI
# escapes (meaning fore & background colors are swapped)
# requires an ANSI capable terminal, rename this file
# and invoke script as:
#
# awk -f isogram.awk file
#
# isogram test block...
#
# aberration lucrative concurrent espouse obfuscate
# garrulous promenade epiphany requiem juxtapose
# languid ephemeral abscond extricate circumvent
# obstinate vivacious corroborate attenuate paragon
# penchant serendipity superfluous immutable mitigate
# aplomb concatenate ethereal diaphanous demagogue
# cogitate pervasive anathema juxtaposition memento
# disparate oscillate ennui perfunctory parabola
# mellifluous recumbent ephemeral sycophant timorous
# voracious quixotic serenade conundrum vicarious
# insipid ornate camaraderie cogent introspection
# sanguine deleterious impeccable extraneous loquacious
BEGIN { print "\nisograms...\n" }

function hilite(str) { return "\033[7m" str "\033[0m" }

function isogram(str, c, x, y) {
    y = length(str)
    for (x = 1; x <= y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c) > 0) return 0 # !isogram
    }
    return 1 # isogram
}

{
    word = ""
    line = ""
    for (x = 1; x <= length($0); x++) {
        c = substr($0, x, 1)
        if (c ~ /[[:space:]]/ || x == length($0)) {
            if (x == length($0) && c !~ /[[:space:]]/) word = word c
            line = (isogram(word) ? line hilite(word) : line word)
            if (c ~ /[[:space:]]/) line = line c
            word = ""
        } else {
            word = word c
        }
    }
    print line
}
# eof
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
A quick glimpse at your code gives the impression that you
are parsing the line character-wise to identify "words".
In Awk it is usually better to use the built-in field
splitting and operate on $1, $2, etc. Even in cases where
punctuation and other characters may get in your way, you
can just define the FS regular expression so that it fits
your needs. That should make your program much simpler and
also easier to understand and maintain.
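
As a rough illustration of the field-based approach described above, here is a minimal sketch. The shell wrapper name isogram_fields and the "[" "]" markers (stand-ins for the ANSI escapes) are assumptions made for this example, not part of the original script.

```sh
# Sketch only: "[" and "]" stand in for the ANSI inverse-video escapes.
isogram_fields() {
  awk '
    # 1 if no character of str occurs twice, else 0
    function isogram(str,    c, x, y) {
      y = length(str)
      for (x = 1; x <= y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c) > 0) return 0
      }
      return 1
    }
    {
      # awk has already split the record into $1..$NF
      for (i = 1; i <= NF; i++)
        if (isogram($i)) $i = "[" $i "]"
      print   # any field assignment rebuilds $0 with single spaces (OFS)
    }
  ' "$@"
}

# Three spaces and a tab go in; single spaces come out.
printf 'lucid   banana\tpoem\n' | isogram_fields
```

The loop is much shorter than a character-wise scan, but assigning to a field makes awk rebuild the record with OFS, so the original spacing is not kept.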
Hi Janis.
Sure enough, you're 100% correct on this in my thinking.
In fact, I'm working now on a variant that does use $1, $2,
etc... One issue I'm groping to understand is how *not* to
destroy the layout of a given file upon output. In other
words, I want the output equal to the input, with the only
difference being that isograms are in inverse color. The only
way I've worked through, so far at least, is to not assume
any file structure other than words...
It's an interesting problem to think about =)
Yes. A solution may also depend on the Awk version you are
allowed to use. With GNU Awk you can preserve the formatting
by using its newer features (array of separators!).
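
The "array of separators" most likely refers to the optional fourth argument of split() in GNU Awk, which stores the text found between the pieces. A hedged sketch of how that could preserve the layout follows; it needs gawk, and the wrapper name isogram_seps and the "[" "]" markers are assumptions made for this example.

```sh
# Requires gawk: the four-argument split() is a GNU extension.
isogram_seps() {
  gawk '
    # 1 if no character of str occurs twice, else 0
    function isogram(str,    c, x, y) {
      y = length(str)
      for (x = 1; x <= y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c) > 0) return 0
      }
      return 1
    }
    {
      n = split($0, word, " ", sep)   # sep[i] is the whitespace after word[i]
      out = sep[0]                    # leading whitespace, if any
      for (i = 1; i <= n; i++)
        out = out (isogram(word[i]) ? "[" word[i] "]" : word[i]) sep[i]
      print out
    }
  ' "$@"
}
```

For instance, `printf 'lucid   banana\tpoem\n' | isogram_seps` should keep the three spaces and the tab between the words. The code that follows in the thread avoids GNU extensions and relies only on match() with RSTART and RLENGTH.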
function predicate (s) { ...here's your isogram function... }
function escape (s) { return predicate(s) ? "E" s "E" : s }
# replace the two "E" by your ANSI escape code strings
{
    out = ""
    for (line=$0; match(line, /[[:alpha:]]+/); line=substr(line,RSTART+RLENGTH)) {
        out = out substr(line,1,RSTART-1) \
                  escape(substr(line,RSTART,RLENGTH))
    }
    out = out line
    print out
}
This code specifies the 'alpha' words as entities to consider;
change as desired. (I saw that your code also highlights a '#'
for example; not sure this is intended, though.)
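
To make "change as desired" concrete, here is a sketch that combines the loop above with the original isogram test and takes the word pattern as a parameter. The wrapper name isogram_match and the variable pat are hypothetical; the default pattern is the [[:alpha:]]+ from the code above.

```sh
# Usage: isogram_match [ERE]   (reads stdin; default pattern is [[:alpha:]]+)
isogram_match() {
  awk -v pat="${1:-[[:alpha:]]+}" '
    # 1 if no character of str occurs twice, else 0
    function isogram(str,    c, x, y) {
      y = length(str)
      for (x = 1; x <= y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c) > 0) return 0
      }
      return 1
    }
    # wrap isograms in ANSI inverse video, pass everything else through
    function escape(s) { return isogram(s) ? "\033[7m" s "\033[0m" : s }
    {
      out = ""
      # RLENGTH > 0 guards against a pattern that matches the empty string
      for (line = $0; match(line, pat) && RLENGTH > 0; line = substr(line, RSTART + RLENGTH))
        out = out substr(line, 1, RSTART - 1) escape(substr(line, RSTART, RLENGTH))
      print out line
    }
  '
}

# Default pattern: "#", the comma and "!" are copied through untouched.
printf '# lucid, banana!\n' | isogram_match
# A wider pattern treats a mixed string as one word.
printf 'abc-321 banana\n' | isogram_match '[[:alnum:]-]+'
```

Everything the pattern does not match, including runs of whitespace, is copied to the output unchanged, which is what keeps the layout intact.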
# requires an ANSI capable terminal...
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
function predicate (s) { ...here's your isogram function... }
Excellent name for a function.
[...]
This code specifies the 'alpha' words as entities to consider;
change as desired. (I saw that your code also highlights a '#'
for example; not sure this is intended, though.)
A single char... isogram or not? Probably not really, and then
there's the case of 'mixed' strings as your snippet deals with,
'abc-321'.
But back to the single character issue I'm going with:
function isogram(str, c, x, y) {
    y = length(str)
    if (y < 2) return 0 # !isogram <-- added this
[...]
That's why I also think a pattern based approach has advantages.
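
One way to read this: the single-character question can be settled in the pattern instead of in the predicate. The sketch below assumes a word means two or more letters; the wrapper name isogram_min2 and the "[" "]" markers (stand-ins for the ANSI escapes) are made up for this example.

```sh
# Sketch only: "[" and "]" stand in for the ANSI inverse-video escapes.
isogram_min2() {
  awk '
    # 1 if no character of str occurs twice, else 0 (no length check needed)
    function isogram(str,    c, x, y) {
      y = length(str)
      for (x = 1; x <= y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c) > 0) return 0
      }
      return 1
    }
    {
      out = ""
      # only runs of two or more letters are ever handed to isogram()
      for (line = $0; match(line, /[[:alpha:]][[:alpha:]]+/); line = substr(line, RSTART + RLENGTH)) {
        w = substr(line, RSTART, RLENGTH)
        out = out substr(line, 1, RSTART - 1) (isogram(w) ? "[" w "]" : w)
      }
      print out line
    }
  '
}

# The lone "a" and the "#" are never candidates.
printf 'a # lucid banana\n' | isogram_min2
```

With this pattern, isogram() needs no special case for short strings, since a lone letter or a "#" never reaches it.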