• Generic transformations of arbitrary data entities

    From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Thu Oct 12 14:04:56 2023
    From Newsgroup: comp.lang.awk

    In a recent thread I posted an Awk code pattern to find words that
    match a pattern and conditionally transform them; it relied only on
    POSIX Awk features. It is, however, a generally usable code pattern:
    with standard Awk you can substitute the entity pattern and the
    function that transforms the matched data entities as necessary.
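
As a minimal sketch of the plain POSIX variant described above, the pattern and the transformation can be written directly into the loop. The bracket "transformation" and the sample input are assumptions, chosen only so that the effect is visible without a terminal.

```shell
# POSIX awk: wrap every alphabetic word in brackets, keep all other text as is.
posix_out=$(printf 'foo, bar  42 baz\n' | awk '
{
    line = $0; out = ""
    # match() sets RSTART and RLENGTH; consume the line match by match
    while (match(line, /[[:alpha:]]+/)) {
        out = out substr(line, 1, RSTART-1) "[" substr(line, RSTART, RLENGTH) "]"
        line = substr(line, RSTART+RLENGTH)
    }
    print out line     # append the unmatched remainder
}')
printf '%s\n' "$posix_out"
```

To use a different entity or transformation in this form, the regexp constant and the expression around the matched substring have to be edited by hand.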

    GNU Awk supports a couple of newer features that make this
    generalization more explicit: first-class patterns (strongly typed
    regexp constants, @/.../) and indirect function calls (@name(...)).
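
The two GNU Awk features can be tried in isolation first. This is only a sketch: the helper shout() and the variable names are made up for illustration, and it assumes a gawk recent enough (4.2 or later) to know @/.../ and typeof().

```shell
# Needs GNU Awk 4.2 or later for @/.../ and typeof().
features_out=$(gawk '
function shout(s) { return toupper(s) "!" }    # hypothetical helper

BEGIN {
    re = @/[[:digit:]]+/        # regexp constant stored as a value
    fn = "shout"                # function selected by name at run time

    print typeof(re)            # reports the kind of value held in re
    print ("abc123" ~ re)       # the variable is used like a regexp
    result = @fn("done")        # indirect call of shout()
    print result
}')
printf '%s\n' "$features_out"
```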


    # generic function to transform specified data entities
    function trent (line, pattern, transform, out)
    {
        for (line=$0; match(line, pattern);
                line=substr(line, RSTART+RLENGTH))
        {
            out = out substr(line, 1, RSTART-1) \
                      @transform(substr(line, RSTART, RLENGTH))
        }
        out = out line
        return out
    }

    With a transformation function like

    function highlight (str)
    {
        return "\033[7m" str "\033[0m"
    }

    a sample usage can be

    BEGIN { words = @/[[:alpha:]]+/ }
    {
        print trent($0, words, "highlight")
    }


    Applied to the task from the other thread you can provide

    function isogram_highlight (str)
    {
        return (isogram(str) ? "\033[7m" str "\033[0m" : str)
    }

    using Mike's (only slightly changed by me) isogram() algorithm

    function isogram(str, c, x, y) {
        y = length(str)
        for (x = 1; x < y; x++) {
            c = substr(str, x, 1)
            if (index(substr(str, x + 1), c)) return 0
        }
        return 1
    }

    in a context like

    BEGIN { words = @/[[:alpha:]]+/ }
    {
        print trent($0, words, "highlight")
        print trent($0, words, "isogram_highlight")
    }
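
The isogram() test itself needs nothing beyond POSIX Awk and can be checked on its own. In this small sketch the sample words are assumptions. Note that the comparison is case-sensitive, so "Aa" counts as having no repeated character.

```shell
iso_out=$(printf 'isogram\nletter\nAa\n' | awk '
function isogram(str,    c, x, y) {
    y = length(str)
    for (x = 1; x < y; x++) {
        c = substr(str, x, 1)
        if (index(substr(str, x + 1), c)) return 0   # c occurs again later
    }
    return 1
}
{ print $0, isogram($0) }')
printf '%s\n' "$iso_out"
```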


    Note again that this solution based on a generalized algorithm
    uses GNU Awk specific features and is not conforming to POSIX!

    Janis
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Thu Oct 12 18:23:58 2023
    From Newsgroup: comp.lang.awk

    On 12.10.2023 14:04, Janis Papanagnou wrote:
    [...]


    # generic function to transform specified data entities
    function trent (line, pattern, transform, out)
    {
        for (line=$0; match(line, pattern);

    The line=$0 assignment was a leftover from an earlier version. Here
    you don't want it, since 'line' is passed as a function parameter.
    So make that just

        for ( ; match(line, pattern);

                line=substr(line, RSTART+RLENGTH))
        {
            out = out substr(line, 1, RSTART-1) \
                      @transform(substr(line, RSTART, RLENGTH))
        }
        out = out line
        return out
    }
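
The difference only shows once trent() is called on something other than $0. The following sketch runs both variants on the second field. The names trent_old and upcase are made up for this comparison, and the script assumes gawk 4.2 or later.

```shell
fix_out=$(printf 'alpha beta gamma\n' | gawk '
function upcase(str) { return toupper(str) }       # hypothetical transform

# as first posted: the argument is overwritten by $0
function trent_old (line, pattern, transform,    out)
{
    for (line=$0; match(line, pattern); line=substr(line, RSTART+RLENGTH))
        out = out substr(line, 1, RSTART-1) @transform(substr(line, RSTART, RLENGTH))
    return out line
}

# corrected: works on whatever string is passed in
function trent (line, pattern, transform,    out)
{
    for ( ; match(line, pattern); line=substr(line, RSTART+RLENGTH))
        out = out substr(line, 1, RSTART-1) @transform(substr(line, RSTART, RLENGTH))
    return out line
}

BEGIN { words = @/[[:alpha:]]+/ }
{
    print trent_old($2, words, "upcase")   # transforms the whole record
    print trent($2, words, "upcase")       # transforms only the second field
}')
printf '%s\n' "$fix_out"
```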

    [...]

    Janis


  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Thu Oct 12 19:00:13 2023
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    [...]

    Good stuff. Adding this to my notes in fact. I really was hoping
    others would see some value in using hilite(). It's handy on my end too.
    --
    :wq
    Mike Sanders

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Oct 13 09:33:05 2023
    From Newsgroup: comp.lang.awk

    On 12.10.2023 21:00, Mike Sanders wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    [...]

    BEGIN { words = @/[[:alpha:]]+/ }
    {
        print trent($0, words, "highlight")
        print trent($0, words, "isogram_highlight")
    }


    Note again that this solution based on a generalized algorithm
    uses GNU Awk specific features and is not conforming to POSIX!

    Good stuff. Adding this to my notes in fact. I really was hoping
    others would see some value in using hilite(). It's handy on my end too.

    I'm using ANSI escapes from time to time, and also just recently,
    e.g. for coloring.
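
For readers unfamiliar with these sequences, here is a small sketch of how coloring looks in POSIX Awk. The helper name sgr() is an assumption. The codes are the standard SGR values: 31 for red, 32 for green, 7 for reverse video and 0 for reset.

```shell
color_out=$(awk '
# wrap str in an SGR sequence; code 0 resets all attributes afterwards
function sgr(str, code) { return "\033[" code "m" str "\033[0m" }

BEGIN {
    print sgr("error", 31)      # red foreground
    print sgr("ok", 32)         # green foreground
    print sgr("note", 7)        # reverse video, as in highlight()
}')
printf '%s\n' "$color_out"
```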

    But my point here was more the generalization. The task of changing
    some entities on a line while preserving the spacing, delimiters,
    and other information is quite common. I have used it a couple of
    times and each time reprogrammed the two-line loop with a different
    pattern for a different transformation. That's why I think that GNU
    Awk's features - too bad you cannot use them! - are valuable; they
    can emulate quite nicely what other languages do with real function
    arguments.

    I expanded my test program[*] with some more simple applications
    that lead to

    BEGIN {
        ...
        words = @/[[:alpha:]]+/
        numbers = @/[[:digit:]]+/
        names = @/([[:upper:]][.])*[[:upper:]][[:lower:]]*/
    }
    {
        print trent($0, words, "highlight")
        print trent($0, words, "isogram_highlight")
        print trent($0, numbers, "black_out")
        print trent($0, names, "black_out")
        print trent($0, names, "anonymize")
    }

    This is just to demonstrate the point with possible combinations of
    patterns (which can of course easily be refined) and functions
    (identified by their names).
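
The bodies of black_out() and anonymize() are not shown in the thread, only in the linked test program. The following sketch therefore uses guessed, purely hypothetical definitions to show how patterns and function names combine. It uses the corrected trent() loop and assumes gawk 4.2 or later.

```shell
multi_out=$(printf 'call J.R.Smith at 5551234 or Anna at 42.\n' | gawk '
function trent (line, pattern, transform,    out)
{
    for ( ; match(line, pattern); line=substr(line, RSTART+RLENGTH))
        out = out substr(line, 1, RSTART-1) @transform(substr(line, RSTART, RLENGTH))
    return out line
}

# hypothetical: replace every character of the entity by "#"
function black_out(str) { gsub(/./, "#", str); return str }

# hypothetical: replace the whole entity by a fixed placeholder
function anonymize(str) { return "N.N." }

BEGIN {
    numbers = @/[[:digit:]]+/
    names   = @/([[:upper:]][.])*[[:upper:]][[:lower:]]*/
}
{
    print trent($0, numbers, "black_out")
    print trent($0, names, "anonymize")
}')
printf '%s\n' "$multi_out"
```

With the names pattern as given, any capitalized word matches, so a sentence starting with "Call" would have that word treated as a name as well. This is the kind of refinement the post alludes to.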

    Janis

    [*] Extended test program: volatile.gridbug.de/transform_words

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Sat Oct 14 00:39:14 2023
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    I'm using ANSI escapes from time to time, and also just recently,
    e.g. for coloring.

    Myself as well...

    <https://drive.google.com/file/d/1tf_X3U3TwJQz67z3gdFBSZo2oKW2vcao/view>

    But my point here was more the generalization. The task of changing
    some entities on a line while preserving the spacing, delimiters,
    and other information is quite common. I have used it a couple of
    times and each time reprogrammed the two-line loop with a different
    pattern for a different transformation. That's why I think that GNU
    Awk's features - too bad you cannot use them! - are valuable; they
    can emulate quite nicely what other languages do with real function
    arguments.

    I hope to soon =) For a while longer, though, I can't.

    I expanded my test program[*] with some more simple applications
    that lead to

    BEGIN {
        ...
        words = @/[[:alpha:]]+/
        numbers = @/[[:digit:]]+/
        names = @/([[:upper:]][.])*[[:upper:]][[:lower:]]*/
    }

    That is so cool!

    [*] Extended test program: volatile.gridbug.de/transform_words

    Will you have an index page of your projects/snippets
    in the future Janis?
    --
    :wq
    Mike Sanders

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sat Oct 14 13:17:42 2023
    From Newsgroup: comp.lang.awk

    On 14.10.2023 02:39, Mike Sanders wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    [*] Extended test program: volatile.gridbug.de/transform_words

    Will you have an index page of your projects/snippets
    in the future Janis?

    Unfortunately(?), no. - I've never[*] started to systematically publish
    any code (and I don't intend to do so). My approach was discussions in
    Usenet, sharing knowledge, and code only on demand or where it supports
    the shared and discussed topics. There's also too much stuff that has
    accumulated over the decades; it would require quite some effort to
    provide that in a form of sufficient quality. My view was that anything
    useful that I posted could eventually be retrieved using some search
    engine[**]. The ideas (those that are worth it) and insights can still
    spread (or become forgotten). For me it's "Open Ideas", something like
    Open Source for non-code contributions. Occasionally I drop some code
    on gridbug.de ('volatile' for stuff I might delete, 'random' for stuff
    that might stay available), but that's just a small fraction of the
    stuff I have on my disks. These two sub-domains thus have no index
    page[***]; what is there is bound to a post (or an email), but links
    previously posted in Usenet might still lead to the information.

    For the purpose of my previous post the code for the sample functions
    was unnecessary, but I wanted to provide it as an "amendment" for folks
    who want to see some complete and runnable code.

    Feel free to ask if you need something specific.

    Janis

    [*] "never" = only rarely, or only in specific cases.

    [**] Sadly whenever I now try to find some older stuff I often cannot
    find it any more (using Google).

    [***] Other sub-domains for specific topics do have an organized form
    with an index.

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Oct 16 18:49:23 2023
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    Unfortunately(?), no...

    No? I say 'yes'. Much to read/learn...

    Me? I think I will in fact. Index by the end of the week
    and lots of interesting (at least to me) items on the way.

    You only live once Janis, I hope someday you'll reconsider
    for the benefit of others =)
    --
    :wq
    Mike Sanders

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Oct 16 21:38:40 2023
    From Newsgroup: comp.lang.awk

    On 16.10.2023 20:49, Mike Sanders wrote:

    You only live once Janis, I hope someday you'll reconsider
    for the benefit of others =)

    For the benefit of others, spread the word... - with or without
    an index. :-)

    I promise I will reconsider it in my next life! ;-)

    Janis

  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Oct 27 12:37:56 2023
    From Newsgroup: comp.lang.awk

    On Thursday, October 12, 2023 at 8:05:01 AM UTC-4, Janis Papanagnou wrote:
    [...]

    using Mike's (only slightly changed by me) isogram() algorithm

    function isogram(str, c, x, y) {
        y = length(str)
        for (x = 1; x < y; x++) {
            c = substr(str, x, 1)
            if (index(substr(str, x + 1), c)) return 0
        }
        return 1
    }

    [...]

    Hmm... a heterogram is a string whose number of unique characters
    equals its length, but an isogram technically just means that all
    characters in it occur with the same frequency, i.e. "DODO" is an
    isogram, but the function above returns FALSE (0). The code below
    should rectify the differing test cases. The updated function adds
    two rapid exit criteria: (a) the input string is empty or only one
    character long, or (b) the total input length isn't an integer
    multiple of the number of copies of the leftmost character. From
    there on, the frequency count returned by each subsequent gsub(…)
    must match that of the leftmost character.

     #  input         orig  new
     1  FRR           0     0
     2  DODO          0     1  <-----
     3  ECBFADEDCFAB  0     1  <-----
     4  KWNAWKAN      0     1  <-----
     5  BAIDU         1     1
     6  BLACKHORSE    1     1
     7  DUBAI         1     1
     8  DUMBWAITER    1     1
     9  ISOGRAM       1     1
    10  PATHFINDER    1     1
    ======================================
    # __ = input string; _ = result flag (also used as index 1);
    # ___ = input length, then the count of the leftmost character
    function isogram_new(__, _, ___) {

        if ( ! ((_ = (___ = length(__)) <= !!___) ||
                ___ % (___ = gsub(substr(__, ++_, _--), "", __))))

            for (_++; __; )
                ___ == gsub(substr(__, _, _), "", __) || _ *= __ = ""
        return _
    }
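
For comparison, the equal-frequency test described above can also be written in a plainer style. This is a sketch in POSIX Awk, not a transcription of isogram_new(). The name equal_freq() and its counting approach are assumptions.

```shell
freq_out=$(printf 'FRR\nDODO\nKWNAWKAN\nBAIDU\n' | awk '
# return 1 if every distinct character occurs equally often, else 0
function equal_freq(str,    i, c, count, n) {
    for (i = 1; i <= length(str); i++)
        count[substr(str, i, 1)]++
    for (c in count) {
        if (n == 0) n = count[c]            # frequency of the first character visited
        else if (count[c] != n) return 0    # a differing frequency
    }
    return 1
}
{ print $0, equal_freq($0) }')
printf '%s\n' "$freq_out"
```

Unlike the gsub()-based version, this variant never uses a character of the input as a regular expression, so it also handles input containing regexp metacharacters.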
    — The 4Chan Teller