In a recent thread I posted an Awk code pattern that finds words
matching a pattern and conditionally transforms them; it relied only
on POSIX Awk features. Actually, though, it's a generally usable code
pattern. With standard Awk you can substitute the entity pattern and
the transformation function as necessary.
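As a minimal sketch of that POSIX-only form (not the code from the other thread), the loop below hard-codes both the pattern and the transformation. The shell wrapper name upcase_words is made up for illustration, and toupper() merely stands in for whatever transformation is needed.

```sh
# POSIX-only sketch: pattern and transformation are fixed inside the loop.
# upcase_words is a hypothetical name; toupper() stands in for any transform.
upcase_words() {
    awk '
    {
        out = ""
        # consume the line match by match, copying the gaps unchanged
        for (line = $0; match(line, /[[:alpha:]]+/);
             line = substr(line, RSTART + RLENGTH))
        {
            out = out substr(line, 1, RSTART - 1) \
                      toupper(substr(line, RSTART, RLENGTH))
        }
        print out line    # append the unmatched remainder
    }'
}

printf 'foo, bar  42 baz\n' | upcase_words
```

Spacing, punctuation, and digits pass through untouched; only the matched words change. To use another pattern or transformation, edit the regexp and the toupper() call in place.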
GNU Awk supports a couple of newer features that make this generalization
more explicit: first-class patterns (strongly typed regexp constants)
and indirect function calls.
# generic function to transform specified data entities
function trent (line, pattern, transform,    out)
{
    # copy the text before each match, then the transformed match
    for (; match(line, pattern);
           line=substr(line, RSTART+RLENGTH))
    {
        out = out substr(line, 1, RSTART-1) \
                  @transform(substr(line, RSTART, RLENGTH))
    }
    out = out line    # append the unmatched remainder
    return out
}
With a transformation function like
function highlight (str)
{
    return "\033[7m" str "\033[0m"
}
a sample usage can be
BEGIN { words = @/[[:alpha:]]+/ }
{
    print trent($0, words, "highlight")
}
Applied to the task from the other thread you can provide
function isogram_highlight (str)
{
    return (isogram(str) ? "\033[7m" str "\033[0m" : str)
}
using Mike's (only slightly changed by me) isogram() algorithm
function isogram(str,    c, x, y) {
    y = length(str)
    for (x = 1; x < y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c)) return 0
    }
    return 1
}
in a context like
BEGIN { words = @/[[:alpha:]]+/ }
{
    print trent($0, words, "highlight")
    print trent($0, words, "isogram_highlight")
}
Note again that this solution based on a generalized algorithm
uses GNU Awk specific features and is not conforming to POSIX!
Janis
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
[...]
BEGIN { words = @/[[:alpha:]]+/ }
{
print trent($0, words, "highlight")
print trent($0, words, "isogram_highlight")
}
Note again that this solution based on a generalized algorithm
uses GNU Awk specific features and is not conforming to POSIX!
Good stuff. Adding this to my notes, in fact. I really was hoping
others would see some value in using hilite(). It's handy on my end too.
I'm using ANSI escapes from time to time, and also just recently,
e.g. for coloring.
But my point here was more the generalization. The task to change
some entities on a line while preserving the spacing, delimiters,
and other information is quite common. I used it a couple of times
and always reprogrammed the two-line loop with a different pattern
for different transformations. That's why I think that GNU Awk's
features (too bad you cannot use them!) are valuable; they can
emulate quite nicely what other languages do with real function
arguments.
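For anyone limited to POSIX Awk, the same generalization can be approximated. The following is only a sketch under assumptions, not code from the thread: the pattern is passed as a regexp string (a dynamic regexp), and a hand-written dispatcher replaces the indirect call. The names trent_posix, apply, "upper" and "bracket" are all hypothetical.

```sh
# POSIX-only sketch of a parameterized trent(); every name here is made up.
trent_posix() {
    awk '
    # dispatcher standing in for an indirect call: maps a name to a function
    function apply(name, str) {
        if (name == "upper")   return toupper(str)
        if (name == "bracket") return "[" str "]"
        return str              # unknown name: leave the text unchanged
    }
    # pattern is a regexp string, transform is a name known to apply()
    function trent(line, pattern, transform,    out) {
        for (; match(line, pattern); line = substr(line, RSTART + RLENGTH))
            out = out substr(line, 1, RSTART - 1) \
                      apply(transform, substr(line, RSTART, RLENGTH))
        return out line
    }
    {
        print trent($0, "[[:alpha:]]+", "upper")
        print trent($0, "[[:digit:]]+", "bracket")
    }'
}

printf 'abc 12 de\n' | trent_posix
```

The cost compared with the GNU Awk version is that every new transformation needs another branch in apply(), and a regexp passed as a string needs its backslashes doubled.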
I expanded my test program[*] with some more simple applications
that lead to
BEGIN {
...
words = @/[[:alpha:]]+/
numbers = @/[[:digit:]]+/
names = @/([[:upper:]][.])*[[:upper:]][[:lower:]]*/
}
[*] Extended test program: volatile.gridbug.de/transform_words
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
[*] Extended test program: volatile.gridbug.de/transform_words
Will you have an index page of your projects/snippets
in the future Janis?
Unfortunately(?), no...
You only live once Janis, I hope someday you'll reconsider
for the benefit of others =)
Note again that this solution based on a generalized algorithm
uses GNU Awk specific features and is not conforming to POSIX!
Janis
Hmm... a heterogram is when the number of unique chars equals the
string length, but isogram technically just means that all chars
within it show up at the same frequency.
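To illustrate the broader sense of isogram mentioned in this thread (every character occurring equally often), here is a rough POSIX Awk sketch. The helper same_freq is hypothetical and not part of the posted code; the isogram() function above covers the stricter heterogram case, where every count is 1.

```sh
# Hypothetical helper: appends 1 to each input line if every character
# occurs equally often (the broader isogram sense), otherwise 0.
same_freq() {
    awk '
    function same_freq(str,    i, c, count, first) {
        # tally how often each character occurs
        for (i = 1; i <= length(str); i++)
            count[substr(str, i, 1)]++
        # compare every tally against the first one seen
        first = 0
        for (c in count) {
            if (!first) first = count[c]
            else if (count[c] != first) return 0
        }
        return 1
    }
    { print $0, same_freq($0) }'
}

printf 'deed\nword\nlevel\n' | same_freq
```

Here "deed" and "word" pass (counts of 2 and 1 respectively), while "level" fails because "v" occurs only once. Like isogram(), the comparison is case-sensitive.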