• Display Isograms Using Awk and Inverse ANSI

    From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Wed Oct 11 00:51:07 2023
    From Newsgroup: comp.lang.awk

    # Michael Sanders 2023
    # https://busybox.neocities.org/notes/isogram.txt
    #
    # awk script that displays isograms using inverse ANSI
    # escapes (meaning fore & background colors are swapped)
    # requires an ANSI capable terminal, rename this file
    # and invoke script as:
    #
    # awk -f isogram.awk file
    #
    # isogram test block...
    #
    # aberration lucrative concurrent espouse obfuscate
    # garrulous promenade epiphany requiem juxtapose
    # languid ephemeral abscond extricate circumvent
    # obstinate vivacious corroborate attenuate paragon
    # penchant serendipity superfluous immutable mitigate
    # aplomb concatenate ethereal diaphanous demagogue
    # cogitate pervasive anathema juxtaposition memento
    # disparate oscillate ennui perfunctory parabola
    # mellifluous recumbent ephemeral sycophant timorous
    # voracious quixotic serenade conundrum vicarious
    # insipid ornate camaraderie cogent introspection
    # sanguine deleterious impeccable extraneous loquacious

    BEGIN { print "\nisograms...\n" }

    function hilite(str) { return "\033[7m" str "\033[0m" }

    function isogram(str, c, x, y) {
        y = length(str)
        for (x = 1; x <= y; x++) {
            c = substr(str, x, 1)
            if (index(substr(str, x + 1), c) > 0) return 0 # !isogram
        }
        return 1 # isogram
    }

    {
        word = ""
        line = ""
        for (x = 1; x <= length($0); x++) {
            c = substr($0, x, 1)
            if (c ~ /[[:space:]]/ || x == length($0)) {
                if (x == length($0) && c !~ /[[:space:]]/) word = word c
                line = (isogram(word) ? line hilite(word) : line word)
                if (c ~ /[[:space:]]/) line = line c
                word = ""
            } else {
                word = word c
            }
        }
        print line
    }

    # eof
    --
    :wq
    Mike Sanders

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Wed Oct 11 10:15:37 2023

    A quick glimpse at your code gives the impression that you
    are parsing the line character-wise to identify "words".
    In Awk it is usually better to use the inherent splitting
    procedure and operate on $1, $2, etc. Even for cases where
    punctuation and other characters may get in your way, you
    can just define the FS regular expression so that it fits
    your needs. That should make your program much simpler and
    also easier to understand and maintain.
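
A minimal sketch of the field-based approach described above, assuming whitespace-separated words and an ANSI-capable terminal. The shell wrapper name `hl_fields` is hypothetical; `hilite` and `isogram` are copied from the original script. One caveat worth knowing: assigning to a field makes awk rebuild $0 using OFS, so runs of blanks collapse to a single space on any line where a word was highlighted.

```sh
# Sketch only: hl_fields is a hypothetical wrapper name.
hl_fields() {
    awk '
    function hilite(str) { return "\033[7m" str "\033[0m" }

    # Return 1 if no character of str occurs twice, else 0.
    function isogram(str,    c, x, y) {
        y = length(str)
        for (x = 1; x <= y; x++) {
            c = substr(str, x, 1)
            if (index(substr(str, x + 1), c) > 0) return 0
        }
        return 1
    }

    {
        # Default splitting: every whitespace-separated word is a field.
        for (i = 1; i <= NF; i++)
            if (isogram($i)) $i = hilite($i)
        # Assigning to a field rebuilt $0 with OFS, so the original
        # spacing is lost on lines where a field was changed.
        print
    }' "$@"
}
```

Usage: `hl_fields file`, or `printf 'abc  noon\n' | hl_fields`. In the second case the two spaces between the words come out as one.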

    Janis


    On 11.10.2023 02:51, Mike Sanders wrote:
    [...]


  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Wed Oct 11 10:12:13 2023

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    A quick glimpse at your code gives the impression that you
    are parsing the line character-wise to identify "words".
    In Awk it is usually better to use the inherent splitting
    procedure and operate on $1, $2, etc. Even for cases where
    punctuation and other characters may get in your way, you
    can just define the FS regular expression so that it fits
    your needs. That should make your program much simpler and
    also easier to understand and maintain.

    Hi Janis.

    Sure enough, you're 100% correct on this in my thinking.
    In fact, I'm working now on a variant that does use $1, $2,
    etc... One issue I'm groping to understand is how *not* to
    destroy the layout of a given file upon output. In other
    words, I want the output equal to the input, with the only
    difference being that isograms are in inverse color. The only
    way I've worked through, so far at least, is to not assume
    any file structure other than words...

    It's an interesting problem to think about =)
    --
    :wq
    Mike Sanders

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Wed Oct 11 18:41:51 2023

    On 11.10.2023 12:12, Mike Sanders wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    A quick glimpse at your code gives the impression that you
    are parsing the line character-wise to identify "words".
    In Awk it is usually better to use the inherent splitting
    procedure and operate on $1, $2, etc. Even for cases where
    punctuation and other characters may get in your way, you
    can just define the FS regular expression so that it fits
    your needs. That should make your program much simpler and
    also easier to understand and maintain.

    Hi Janis.

    Sure enough, you're 100% correct on this in my thinking.
    In fact, I'm working now on a variant that does use $1, $2,
    etc... One issue I'm groping to understand is how *not* to
    destroy the layout of a given file upon output. In other
    words, I want the output equal to the input, with the only
    difference being that isograms are in inverse color. The only
    way I've worked through, so far at least, is to not assume
    any file structure other than words...

    It's an interesting problem to think about =)

    Yes. A solution may also depend on the Awk version you are
    allowed to use. With GNU Awk you can preserve the formatting
    by using its newer features (array of separators!).
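
A rough sketch of the GNU Awk route mentioned above, assuming gawk 4.0 or later is installed. It relies on the optional fourth argument of gawk's `split()`, which fills an array with the separator text found between the pieces. The wrapper name `hl_gawk`, the helper `escape`, and the choice of `[^A-Za-z]+` as the separator pattern are assumptions made for illustration.

```sh
# Sketch only: requires gawk; hl_gawk is a hypothetical wrapper name.
hl_gawk() {
    gawk '
    # Return 1 if no character of str occurs twice, else 0.
    function isogram(str,    c, x, y) {
        y = length(str)
        for (x = 1; x <= y; x++) {
            c = substr(str, x, 1)
            if (index(substr(str, x + 1), c) > 0) return 0
        }
        return 1
    }

    # Wrap non-empty isograms in inverse video, pass the rest through.
    function escape(s) {
        return (s != "" && isogram(s)) ? "\033[7m" s "\033[0m" : s
    }

    {
        # words[i] holds the letters-only pieces; seps[i] holds the exact
        # text that stood between words[i] and words[i+1].
        n = split($0, words, /[^A-Za-z]+/, seps)
        out = ""
        for (i = 1; i <= n; i++)
            out = out escape(words[i]) (i < n ? seps[i] : "")
        print out
    }' "$@"
}
```

Because the separators are stored rather than discarded, gluing the pieces back together reproduces the input layout, including leading and trailing punctuation or blanks.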

    And with standard Awk you can work on patterns and preserve
    formatting e.g. with a frame like

    function predicate (s) { ...here's your isogram function... }

    function escape (s) { return predicate(s) ? "E" s "E" : s }
    # replace the two "E" by your ANSI escape code strings

    {
        out = ""
        for (line=$0; match(line, /[[:alpha:]]+/); line=substr(line,RSTART+RLENGTH)) {
            out = out substr(line,1,RSTART-1) escape(substr(line,RSTART,RLENGTH))
        }
        out = out line
        print out
    }

    This code specifies the 'alpha' words as entities to consider;
    change as desired. (I saw that your code also highlights a '#'
    for example; not sure this is intended, though.)
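
For readers who want to try the frame above directly, here is one possible filled-in version. It is a sketch, not the poster's final code: the predicate is the `isogram` function from the original script, the two "E" placeholders become the inverse-video and reset sequences, and the pattern is written as `[A-Za-z]+` instead of `[[:alpha:]]+` on the assumption that some older awks lack POSIX character classes. The wrapper name `hl_words` is hypothetical.

```sh
# Sketch only: hl_words is a hypothetical wrapper name.
hl_words() {
    awk '
    # Return 1 if no character of str occurs twice, else 0.
    function isogram(str,    c, x, y) {
        y = length(str)
        for (x = 1; x <= y; x++) {
            c = substr(str, x, 1)
            if (index(substr(str, x + 1), c) > 0) return 0
        }
        return 1
    }

    function escape(s) { return isogram(s) ? "\033[7m" s "\033[0m" : s }

    {
        out = ""
        # Consume the line one alphabetic word at a time. The text in
        # front of each match is copied unchanged, so spacing and
        # punctuation survive exactly as they were in the input.
        for (line = $0; match(line, /[A-Za-z]+/); line = substr(line, RSTART + RLENGTH))
            out = out substr(line, 1, RSTART - 1) escape(substr(line, RSTART, RLENGTH))
        print out line
    }' "$@"
}
```

Usage: `hl_words file`. With the input `ab  noon, xyz!` the words `ab` and `xyz` are shown in inverse video, while the double space, the comma and the exclamation mark stay where they were.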

    Janis

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Wed Oct 11 18:47:17 2023

    On 11.10.2023 18:41, Janis Papanagnou wrote:
    [...]

    {
    out = ""
    for (line=$0; match(line, /[[:alpha:]]+/); line=substr(line,RSTART+RLENGTH)) {
    out = out substr(line,1,RSTART-1)
    escape(substr(line,RSTART,RLENGTH))
    }
    out = out line
    print out
    }

    [...]

    (Sorry, my newsreader split two of the long lines.)
    Each of the two lines in the above code that starts at column 1
    should be joined to the line before it, giving:

    for (line=$0; match(...); line=substr(...)) {

    out = out substr(...) escape(substr(...))


    Janis

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Thu Oct 12 18:54:30 2023

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    Yes. A solution may also depend on the Awk version you are
    allowed to use. With GNU Awk you can preserve the formatting
    by using its newer features (array of separators!).

    Very much so for me; I can't always install the things I'd like to,
    but it's okay, I'll work it out =)

    function predicate (s) { ...here's your isogram function... }

    Excellent name for a function.

    function escape (s) { return predicate(s) ? "E" s "E" : s }
    # replace the two "E" by your ANSI escape code strings

    {
        out = ""
        for (line=$0; match(line, /[[:alpha:]]+/); line=substr(line,RSTART+RLENGTH)) {
            out = out substr(line,1,RSTART-1) escape(substr(line,RSTART,RLENGTH))
        }
        out = out line
        print out
    }

    This code specifies the 'alpha' words as entities to consider;
    change as desired. (I saw that your code also highlights a '#'
    for example; not sure this is intended, though.)

    A single char... isogram or not? Probably not really, and then
    there's the case of 'mixed' strings as your snippet deals with,
    'abc-321'.

    But back to the single character issue I'm going with:

    function isogram(str, c, x, y) {
        y = length(str)
        if (y < 2) return 0 # !isogram <-- added this

        for (x = 1; x <= y; x++) {
            c = substr(str, x, 1)
            if (index(substr(str, x + 1), c) > 0) return 0 # !isogram
        }
        return 1 # isogram
    }

    Thanks for your input Janis.
    --
    :wq
    Mike Sanders

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Thu Oct 12 22:33:25 2023

    Mike Sanders <porkchop@invalid.foo> wrote:

    # requires an ANSI capable terminal...

    for your notes:

    tags: ANSI, escapes, colors, code

    invert fore/background color: "\033[7m" str "\033[0m"

    clear screen: "\033[H\033[2J"

    hide cursor: "\033[?25l"

    show cursor: "\033[?25h"

    output to row & column: "\033[<ROW>;<COLUMN>H"

    set titlebar (for terminals that support it): "\033]0;Your Title Here\007"

    reset colors: "\033[0m"

    foreground colors...

    "\033[30mBlack"
    "\033[31mRed"
    "\033[32mGreen"
    "\033[33mYellow"
    "\033[34mBlue"
    "\033[35mMagenta"
    "\033[36mCyan"
    "\033[37mWhite"

    background colors...

    "\033[40mBlack"
    "\033[41mRed"
    "\033[42mGreen"
    "\033[43mYellow"
    "\033[44mBlue"
    "\033[45mMagenta"
    "\033[46mCyan"
    "\033[47mWhite"
    --
    :wq
    Mike Sanders
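
The sequences in the notes above can be wrapped in small helpers. This is only a sketch: `ansi_paint` and `ansi_at` are hypothetical names, and the arguments are assumed to be the numeric codes from the tables (30 to 37 for foreground, 40 to 47 for background). It also uses the fact that several SGR codes can share one sequence when separated by semicolons.

```sh
# Sketch only: ansi_paint and ansi_at are hypothetical helper names.

# usage: ansi_paint FG BG TEXT   (FG 30..37, BG 40..47)
ansi_paint() {
    awk -v fg="$1" -v bg="$2" -v s="$3" 'BEGIN {
        # One sequence sets both colors; "\033[0m" resets afterwards.
        printf "\033[%d;%dm%s\033[0m\n", fg, bg, s
    }'
}

# usage: ansi_at ROW COLUMN TEXT
ansi_at() {
    awk -v row="$1" -v col="$2" -v s="$3" 'BEGIN {
        # Move the cursor to ROW;COLUMN, then write the text there.
        printf "\033[%d;%dH%s", row, col, s
    }'
}
```

Usage: `ansi_paint 31 47 "red on white"` prints one colored line, and `ansi_at 5 10 "hello"` writes at row 5, column 10.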

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Oct 13 09:22:53 2023

    On 12.10.2023 20:54, Mike Sanders wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    function predicate (s) { ...here's your isogram function... }

    Excellent name for a function.

    It's meant as generic name for the code pattern I wanted to show.

    [...]

    This code specifies the 'alpha' words as entities to consider;
    change as desired. (I saw that your code also highlights a '#'
    for example; not sure this is intended, though.)

    A single char... isogram or not? Probably not really, and then
    there's the case of 'mixed' strings as your snippet deals with,
    'abc-321'.

    Oh, my question was more whether a non-alpha character shall be
    considered a possible isogram.


    But back to the single character issue I'm going with:

    function isogram(str, c, x, y) {
        y = length(str)
        if (y < 2) return 0 # !isogram <-- added this
    [...]

    That's why I also think a pattern based approach has advantages.

    Janis


  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Oct 14 00:34:34 2023

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    That's why I also think a pattern based approach has advantages.

    Yes, anything the human mind can conceive is valid.
    --
    :wq
    Mike Sanders

  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Oct 27 16:14:03 2023

    If you want an ultra-quick ANSI color chart:
    jot - 16 231 | mawk ' BEGIN { print (ORS = _)
                          _ *= __ = RS RS
    } $NF = sprintf("\33[38;5;%dm%3d%.*s%.*s",
                        $_, $_, NR % 6^2 == _, RS,
                        NR % 6^3 == _, __)'
    The result is a VERY wide table spanning 6 rows, but that's the only way I could get the colors to properly line up with each other without complicated math.
    — The 4Chan Teller