• regsub replacement question

    From aotto1968@aotto1968@t-online.de to comp.lang.tcl on Fri Mar 22 08:29:20 2024
    From Newsgroup: comp.lang.tcl

    Hi,

    # I have a question regarding *regsub* and how to accelerate replacement
    # let's assume the following code:

    set str "aaa123bbb123ccc123ddd123eee123fff123ggg"

    # My goal is to eliminate all "123" except the FIRST one with the restriction
    # that between the "123" is *not* a number other than 123
    puts [regsub -all {(\d+)([^\d]*)\1} $str {\1\2}]
    aaa123bbbccc123dddeee123fffggg

    # → my problem is that only every SECOND "123" is replaced, because the replacement itself is *not*
    # checked again.

    # my solution is a loop
    while {[regsub -all {(\d+)([^\d]*)\1} $str {\1\2} str]} ""

    # this works but the GOAL is to have ONE *regsub* to get this job done
    puts $str
    aaa123bbbcccdddeeefffggg


    mfg ao
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Andreas Leitgeb@avl@logic.at to comp.lang.tcl on Fri Mar 22 08:04:48 2024
    From Newsgroup: comp.lang.tcl

    aotto1968 <aotto1968@t-online.de> wrote:
    > # I have a question regarding *regsub* and how to accelerate replacement
    > # let's assume the following code:
    > set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    >
    > # My goal is to eliminate all "123" except the FIRST one with the restriction
    > # that between the "123" is *not* a number other than 123
    > puts [regsub -all {(\d+)([^\d]*)\1} $str {\1\2}]
    > aaa123bbbccc123dddeee123fffggg
    >
    > # → my problem is that only every SECOND "123" is replaced, because the replacement itself is *not*
    > # checked again.

    That is correct: after the first substitution of "123bbb123" with "123bbb",
    the scan resumes behind the match, so in the remainder the "ccc" is no longer
    wrapped in "123"s and the "123" that follows "ccc" cannot be eliminated.
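
    To make the non-overlapping behaviour visible, here is a minimal sketch that
    lists the spans the pattern matches in a single "-all" pass. The variable
    names (re, spans) are only illustrative and not part of the original post.

    ```tcl
    set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    set re {(\d+)([^\d]*)\1}

    # With -all -inline -indices every match contributes three {first last}
    # pairs: the whole match, group 1 and group 2.
    set spans {}
    foreach {whole g1 g2} [regexp -all -inline -indices $re $str] {
        lassign $whole first last
        lappend spans $whole
        puts "$first-$last: [string range $str $first $last]"
    }
    # prints:
    # 3-11: 123bbb123
    # 15-23: 123ddd123
    # 27-35: 123fff123
    ```

    The "123" at 9-11 is consumed by the first match, so it is not available as
    the leading "123" of a match around "ccc".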

    > # my solution is a loop
    > while {[regsub -all {(\d+)([^\d]*)\1} $str {\1\2} str]} ""

    I think this is the way to go, but you might experiment with
    removing the "-all" option... Maybe it improves speed, or
    maybe it spoils it, I can't predict.
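
    One way to run that experiment is the built-in [time] command. The sketch
    below is only an illustration: the proc names loopAll and loopOne are made
    up here, and the timings depend on the Tcl version and on the input, so no
    numbers are claimed.

    ```tcl
    set re   {(\d+)([^\d]*)\1}
    set orig "aaa123bbb123ccc123ddd123eee123fff123ggg"

    # Hypothetical helper: repeat "regsub -all" until nothing changes.
    proc loopAll {re s} {
        while {[regsub -all $re $s {\1\2} s]} {}
        return $s
    }

    # Hypothetical helper: same loop, one substitution per regsub call.
    proc loopOne {re s} {
        while {[regsub $re $s {\1\2} s]} {}
        return $s
    }

    # Each proc works on its own copy of the string, so repeated runs are fair.
    puts "with -all:    [time {loopAll $re $orig} 1000]"
    puts "without -all: [time {loopOne $re $orig} 1000]"
    ```

    Both variants should return the same string; only the number of regsub
    calls and the amount of rescanning differ.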

    > # this works but the GOAL is to have ONE *regsub* to get this job done
    > puts $str
    > aaa123bbbcccdddeeefffggg

    Another approach could be to extract the non-"123"s as a list
    with regexp (not regsub), and then just re-insert the number:

    set num [regexp -inline {\d+} $str];# get the separating number
    set list [regexp -inline -all {\D+} $str] ;# \D is like [^\d]
    puts [join [linsert $list 1 $num] ""]

    (unless you also need to deal with aaa123bbb123ccc456ddd456 where
    the first 456 also needs to stay...)
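
    If that case matters, a single pass over tokens could handle it. This is a
    rough sketch, not a drop-in replacement for the regsub loop: the proc name
    dropRepeatedNumbers is invented here, and it assumes that each maximal run
    of digits counts as one number (the regsub pattern can also match part of
    a digit run, which this sketch deliberately does not do).

    ```tcl
    # Hypothetical helper: drop a digit run when it repeats the previous digit run.
    proc dropRepeatedNumbers {s} {
        set out  ""
        set last ""
        # Tokens alternate between maximal digit runs and maximal non-digit runs.
        foreach tok [regexp -all -inline {\d+|\D+} $s] {
            if {[string is digit -strict $tok]} {
                if {$tok eq $last} continue   ;# same number as before: skip it
                set last $tok
            }
            append out $tok
        }
        return $out
    }

    puts [dropRepeatedNumbers "aaa123bbb123ccc123ddd123eee123fff123ggg"]
    # prints: aaa123bbbcccdddeeefffggg
    puts [dropRepeatedNumbers "aaa123bbb123ccc456ddd456"]
    # prints: aaa123bbbccc456ddd
    ```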
  • From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Fri Mar 22 12:05:35 2024
    From Newsgroup: comp.lang.tcl

    * aotto1968 <aotto1968@t-online.de>
    | set str "aaa123bbb123ccc123ddd123eee123fff123ggg"

    | # My goal is to eliminate all "123" except the FIRST one with the
    | # restriction that between the "123" is *not* a number other than 123

    If that *really* is the goal, I would simply search for the first "123"
    and then [string map] the rest of them to "":

    set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    set res [string range $str 0 [string first 123 $str]+2]
    append res [string map {123 ""} [string range $str [string first 123 $str ]+3 end]]

    I do not completely understand the second part of the restriction, though...

    R'
  • From aotto1968@aotto1968@t-online.de to comp.lang.tcl on Sat Mar 23 10:00:16 2024
    From Newsgroup: comp.lang.tcl

    On 22.03.24 12:05, Ralf Fassel wrote:
    > * aotto1968 <aotto1968@t-online.de>
    > | set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    >
    > | # My goal is to eliminate all "123" except the FIRST one with the
    > | # restriction that between the "123" is *not* a number other than 123
    >
    > If that *really* is the goal, I would simply search for the first "123"
    > and then [string map] the rest of them to "":
    >
    > set str "aaa123bbb123ccc123ddd123eee123fff123ggg"
    > set res [string range $str 0 [string first 123 $str]+2]
    > append res [string map {123 ""} [string range $str [string first 123 $str ]+3 end]]
    >
    > I do not completely understand the second part of the restriction, though...
    >
    > R'

    As always, the real problem is much more complicated than the simple example above.

    My aim is a solution that does not call *regsub* multiple times. To achieve this,
    *regsub* would have to re-scan the substituted text as part of the "-all" switch.
    With the '-start' switch, *regsub* already implements what is needed to get this done.

    The "123" is just an example, because this question is a follow-up to the problem described at:
    https://wiki.tcl-lang.org/page/BUG+%2D+%27string+length%27+count+also+NON+visible+chars

    In reality the "123" is the regular expression:

    \u001b\[[0-9;]*m

    and the

    regsub -all {(\d+)([^\d]*)\1} $str {\1\2}

    is in reality:

    # erase CTRL->CTRL doublets
    while {[regsub -all {(\u001b\[[0-9;]*m)([^\u001b]*)\1} $STR {\1\2} STR]} {
        #puts fire0
    }

    .

    The CORE problem of the

    while {[regsub -all ...]} ""

    loop is that on every iteration the entire STR is processed, and *not* just the part starting at the *last*
    substitution.
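
    A possible way to resume at the last substitution is sketched below. It is
    not a single *regsub*: it assumes that [regexp -indices -start] together
    with [string replace] is an acceptable substitute, and the proc name
    eraseDoublets is invented for this example. Each [string replace] still
    copies the string, so this only avoids the repeated regular-expression scan
    of the part that is already clean.

    ```tcl
    # Hypothetical helper: remove repeated escape sequences, re-checking only
    # from the start of the previous match instead of from the start of STR.
    proc eraseDoublets {s} {
        set re {(\u001b\[[0-9;]*m)([^\u001b]*)\1}
        set pos 0
        while {[regexp -indices -start $pos $re $s whole g1]} {
            lassign $whole first last
            lassign $g1 a b
            # The trailing copy of group 1 is the last ($b - $a + 1) characters
            # of the match; delete exactly that range.
            set s [string replace $s [expr {$last - ($b - $a)}] $last]
            # Resume at the same leading escape so the next doublet is found.
            set pos $first
        }
        return $s
    }

    set STR "\u001b\[1mfoo\u001b\[1mbar\u001b\[1mbaz\u001b\[0mqux\u001b\[0m"
    set STR [eraseDoublets $STR]
    # STR is now "\u001b\[1mfoobarbaz\u001b\[0mqux"
    ```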