• Re: push for memory safe languages -- impact on Forth

    From dxf@dxforth@gmail.com to comp.lang.forth on Mon Mar 11 16:26:11 2024
    From Newsgroup: comp.lang.forth

    On 11/03/2024 2:37 am, Hans Bezemer wrote:
    On 10-03-2024 10:56, Paul Rubin wrote:
    ...
    That is, C and other such languages have null pointers because they
    corresponded so conveniently to machine operations that the language
    designers couldn't resist including them.  Java-style wraparound
    arithmetic is more of the same.  A bug magnet, but irresistibly
    convenient for the implementers because of its isomorphism to machine
    arithmetic.

    That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, let's limit cells to 16 bits". "Yeah, let's LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, let's introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, let's set 1 CHARS to a single address unit", "Yeah, let's abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".

    It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.

    At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Mon Mar 11 11:15:56 2024
    From Newsgroup: comp.lang.forth

    In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    If implemented well, the slowdown is small in the common case (small
    integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very
    predictable.

    It might be worse for RISC V.

    It is. That's a failure of RISC-V.

    As far as I can tell it was a design choice for DEC Alpha and RISC-V. Apparently flags are detrimental to parallelism.

    You can't call that a failure because you don't like it.


    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat purring. - the Wise from Antrim -
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Mar 11 17:40:20 2024
    From Newsgroup: comp.lang.forth

    albert@spenarnc.xs4all.nl writes:
    In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    If implemented well, the slowdown is small in the common case (small
    integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very
    predictable.

    It might be worse for RISC V.

    It is. That's a failure of RISC-V.

    As far as I can tell it was a design choice for DEC Alpha and RISC-V.

    And MIPS.

    Apparently flags are detrimental to parallelism.

    Reality check: No MIPS, Alpha, or RISC-V CPU has ever had as much
    instruction-level parallelism as contemporaneous CPUs for
    architectures with flags, so flags are obviously not detrimental to
    instruction-level parallelism.

    Look at
    <http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps>: The
    dashed orange line near the bottom is U74, a RISC-V implementation.
    The other lines are all for CPU cores with flags.

    If you want to do several parallel multi-precision additions, say, if
    you want a multi-precision addition a+b+c+d, having one (ARM A64) or
    two (AMD64 with ADX) carry flags does indeed limit the parallelism,
    but the MIPS/Alpha/RISC-V answer is to replace one ADCX/ADOX
    instruction (one cycle latency) with five instructions with typically
    three cycles of latency.
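A minimal sketch of what one step of a flagless multi-precision addition could look like, modelled in Python with 64-bit limbs. The post only says "five instructions"; the add/compare/add/compare/or sequence and the names `add_with_carry` and `add_multiprecision` are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # model one 64-bit machine word

def add_with_carry(a, b, carry_in):
    """One limb step without a carry flag: add, compare, add, compare, or."""
    s = (a + b) & MASK64           # add the two limbs (wraps mod 2**64)
    c1 = 1 if s < a else 0         # unsigned compare: did a+b wrap?
    t = (s + carry_in) & MASK64    # add the incoming carry
    c2 = 1 if t < s else 0         # unsigned compare: did s+carry_in wrap?
    return t, c1 | c2              # at most one of c1 and c2 can be 1

def add_multiprecision(xs, ys):
    """Add two equal-length little-endian limb lists; returns (limbs, carry_out)."""
    carry = 0
    out = []
    for a, b in zip(xs, ys):
        limb, carry = add_with_carry(a, b, carry)
        out.append(limb)
    return out, carry
```

The dependency chain from `carry_in` to the returned carry runs through the second add, the second compare, and the or, which is consistent with the three cycles of latency mentioned above.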

    On AMD64 with ADX, a 6400-bit addition of a+b+c+d can be split into
    two chains: t=a+b+c and t+d; this has a total latency of about 200
    cycles (actually OoO execution can reduce this somewhat by overlapping
    the two chains to a certain extent), while the MIPS/Alpha/RISC-V
    approach takes 300 cycles of latency with no chance of additional
    overlap within that computation.

    You will need >6 parallel multi-precision additions before the two
    carry flags of AMD64 with ADX are theoretically more limiting than the
    MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
    RISC-V implementation needs to be extremely wide (>36 instructions per
    cycle) and the precision must be extremely high (to eliminate overlap
    between chains as an issue).

    You can't call that a failure because you don't like it.

    The correct English term is that it's the *fault* of RISC-V. They
    took a deliberate decision to need more instructions for implementing
    overflow checks than other architectures, so it's their
    responsibility, and for those who want to use big integers (or who
    want to trap on signed overflow), their fault.
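To make the cost concrete: without an overflow flag, a signed overflow check has to be derived from the signs of the operands and the result. The following Python model is a sketch of one common way to do that; the exact instruction sequence a RISC-V compiler would emit is not given in the post, and the names `to_signed` and `add_overflows` are hypothetical.

```python
BITS = 64
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def to_signed(x):
    """Interpret the low 64 bits of x as a two's-complement number."""
    x &= MASK
    return x - (1 << BITS) if x & SIGN else x

def add_overflows(a, b):
    """Return (wrapped_sum, overflowed) for a signed 64-bit addition."""
    s = (a + b) & MASK
    # Overflow happens exactly when both operands have the same sign
    # and the sign of the sum differs from it.
    overflowed = ((a ^ s) & (b ^ s) & SIGN) != 0
    return to_signed(s), overflowed
```

On an architecture with flags, the whole test collapses into the conditional branch that follows the add.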

    For an alternative to the RISC-V approach that is not as limiting as
    the ARM A64 and AMD64 approaches, read:

    http://www.complang.tuwien.ac.at/anton/tmp/carry.pdf

    (not published yet)

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Mar 11 18:50:36 2024
    From Newsgroup: comp.lang.forth

    No / not yet?
    "The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Mar 11 20:51:46 2024
    From Newsgroup: comp.lang.forth

    mhx@iae.nl (mhx) writes:
    No / not yet?
    "The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."

    Works for me:

    wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
    --2024-03-11 21:49:20-- http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
    Resolving www.complang.tuwien.ac.at (www.complang.tuwien.ac.at)... 128.130.173.64
    Connecting to www.complang.tuwien.ac.at (www.complang.tuwien.ac.at)|128.130.173.64|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 2255987 (2.2M) [application/postscript]
    Saving to: ‘opt-ipc-uarch.eps’

    opt-ipc-uarch.eps 100%[===================>] 2.15M 8.38MB/s in 0.3s

    - anton
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Mar 11 20:53:54 2024
    From Newsgroup: comp.lang.forth

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    You will need >6 parallel multi-precision additions before the two
    carry flags of AMD64 with ADX are theoretically more limiting than the
    MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
    RISC-V implementation needs to be extremely wide (>36 instructions per
    cycle) and the precision must be extremely high (to eliminate overlap
    between chains as an issue).

    Correction: For performing >6 parallel multi-precision additions at a
    rate of >6 steps every 3 cycles, >36 instructions are needed only
    every 3 cycles with the MIPS/Alpha/RISC-V approach, i.e. >12 instructions/cycle.

    - anton
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Mar 11 21:08:43 2024
    From Newsgroup: comp.lang.forth

    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    2+2=5 is also deterministic yet wrong.
    In Java 2+2 gives 4. What do you hope to gain by putting up straw men?

    2+2=5 is obviously wrong and Java doesn't go quite that far. Java
    instead insists that you can add two positive integers and get a
    negative one. That's wrong the same way that 2+2=5 is.

    Not at all. Modular arithmetic is not arithmetic in Z, but it's a
    commutative ring and has the nice properties of this algebraic
    structure.

    It just doesn't
    mess up actual programs as often, because the numbers involved are
    bigger.

    If you use members of that ring as if they were members of Z, you will sometimes get an unintended result; but even that works surprisingly
    well, so well that the RISC-V designers have not seen a need to
    include an efficient way to detect those cases where the result
    deviates from that in Z. Still, the nice algebraic properties of
    modular arithmetic can be of benefit even in such cases:

    9223372036854775807 1 + dup cr . 2 - cr .

    prints

    -9223372036854775808
    9223372036854775806 ok

    in Gforth on a 64-bit machine.
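The same computation can be modelled in Python, whose integers do not wrap, by reducing every intermediate result to its signed 64-bit representative. This is only a sketch; `wrap64` is a hypothetical helper, not part of Gforth or Python.

```python
def wrap64(n):
    """Reduce n to the signed 64-bit representative of its class mod 2**64."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

MAX64 = 9223372036854775807

step1 = wrap64(MAX64 + 1)   # wraps around to the most negative value
step2 = wrap64(step1 - 2)   # wraps back; equals MAX64 + 1 - 2 in Z
print(step1)
print(step2)
```

The intermediate result leaves the representable range, but because the ring operations are consistent, the final result agrees with the one computed in Z.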

    In what world can it be right for n to be a positive integer and n+1 to
    be a negative integer? That's not how integers work.

    It's how Java's int and long types work. And if you want something
    closer to Z, Java also has BigInteger.

    Tony Hoare in 2009 said about null pointers:

    And the relevance is?

    Java-style wraparound
    arithmetic is more of the same. A bug magnet,

    Unsupported claim. Interestingly, I remember only one case where I
    saw an unintended result due to modular arithmetic in a programming
    language. It happened when I computed with performance counter
    results in bash. bash still works that way:

    [~:147654] A=9223372036854775807
    [~:147655] echo $[A+1]
    -9223372036854775808

    I think I saw the unintended result on a 32-bit machine, because
    performance counter results typically do not exceed 2^48, definitely
    not 2^63-1.

    Java also has null pointers, another possible mistake. Ada doesn't have
    them,

    Ada certainly has null.

    C++ has them because of its C heritage and
    the need to support legacy code, but I believe that in "modern" C++
    style you're supposed to use references instead of pointers, so you
    can't have a null or uninitialized one.

    I don't know much about C++, but I would be surprised if they had
    given up on uninitialized data. And an uninitialized reference is
    certainly not better than a null reference.

    Null pointers are at least a little bit more on-topic in this thread
    than integer overflow. In Java one can write, say, a linked list or a
    tree in an object-oriented manner, with, e.g., a tree node being an
    abstract class that has two concrete subclasses: inner node, and empty
    node. No null pointers in sight, right? Wrong: When an inner node is
    created, the constructor of the node first sees a data structure where
    all bytes have been initialized to 0, in order to guarantee memory
    safety; for the references to the child nodes, this means that at that
    point they are null pointers. Only then can the Java code in the
    constructor overwrite them with whatever proper value they get. Is it
    a problem? Not if they only exist there.
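A sketch of the object-oriented structure described above, written in Python rather than Java: an abstract node with two concrete subclasses, so no null reference is needed to represent an empty subtree. The class names `Node`, `Empty` and `Inner` are chosen for illustration. Note that the sketch shows only the shape of the data structure; Python does not zero-fill objects before the constructor runs, so the construction-time nulls discussed above have no counterpart here.

```python
from abc import ABC, abstractmethod

class Node(ABC):
    """Abstract tree node; concrete nodes are either Empty or Inner."""
    @abstractmethod
    def size(self): ...
    @abstractmethod
    def insert(self, key): ...

class Empty(Node):
    def size(self):
        return 0
    def insert(self, key):
        # Inserting into an empty tree yields an inner node with empty children.
        return Inner(Empty(), key, Empty())

class Inner(Node):
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right
    def size(self):
        return self.left.size() + 1 + self.right.size()
    def insert(self, key):
        # Returns a new tree; duplicate keys leave the tree unchanged.
        if key < self.key:
            return Inner(self.left.insert(key), self.key, self.right)
        if key > self.key:
            return Inner(self.left, self.key, self.right.insert(key))
        return self
```

Every operation is dispatched on the node type, so there is no "is it null?" test anywhere in the client code.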

    The fact that the Java idiom is to implement trees and linked lists
    not in the object-oriented way I outlined above, but in an imperative
    way with null pointers instead of empty nodes could be more
    problematic, but is it a major problem? Not in my (limited)
    experience.

    - anton
  • From dxf@dxforth@gmail.com to comp.lang.forth on Tue Mar 12 10:44:01 2024
    From Newsgroup: comp.lang.forth

    On 11/03/2024 4:26 pm, dxf wrote:
    On 11/03/2024 2:37 am, Hans Bezemer wrote:
    On 10-03-2024 10:56, Paul Rubin wrote:
    ...
    That is, C and other such languages have null pointers because they
    corresponded so conveniently to machine operations that the language
    designers couldn't resist including them.  Java-style wraparound
    arithmetic is more of the same.  A bug magnet, but irresistibly
    convenient for the implementers because of its isomorphism to machine
    arithmetic.

    That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, let's limit cells to 16 bits". "Yeah, let's LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, let's introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, let's set 1 CHARS to a single address unit", "Yeah, let's abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".

    It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.

    At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)

    I examined my application for ?DO. Of the eleven instances found only
    one was justified. Similarly, where I had written - 0 MAX, the adjustment
    proved redundant.

    "The oldest and strongest emotion of mankind is fear, and the oldest
    and strongest kind of fear is fear of the unknown." - H.P. Lovecraft

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Mon Mar 11 19:20:08 2024
    From Newsgroup: comp.lang.forth

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps

    It worked for me too (in a browser). The Golden Cove figures are
    impressive. I believe there are some RISC-V implementations with OOO by
    now though.

    The article about carry bits is interesting, though besides bignums one
    should also consider the cost of (desirable) routine overflow trapping
    of integer arithmetic, which is currently not done much. Maybe
    benchmarking C programs compiled with and without -ftrapv would be a
    useful addition.
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Mon Mar 11 20:07:01 2024
    From Newsgroup: comp.lang.forth

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Not at all. Modular arithmetic is not arithmetic in Z, but it's a commutative ring and has the nice properties of this algebraic
    structure.

    Right, those modular values aren't integers, they are equivalence
    classes of integers. The ring Z/NZ might have some nice properties
    but they aren't the properties of integers.

    but even that works surprisingly well, so well that the RISC-V
    designers have not seen a need to include an efficient way to detect
    those cases where the result deviates from that in Z.

    Sure, C worked pretty well in the 1980s but we've seen how well that
    worked out. RISC-V perpetuates the bugs of the 1980s instead of taking
    the opportunity to fix them.

    Still, the nice algebraic properties of modular arithmetic can be of
    benefit even in such cases.... 64 bit machine

    Another thing, if I run the same integer calculation on two machines, at
    least programmed in a HLL, I should expect the same result on both. But
    if the word sizes are different then the results will be different. (If
    one or both crash due to implementation restrictions such as machine
    overflow, that's annoying, but it's better than getting wrong answers).

    In what world can it be right for n to be a positive integer and n+1 to
    be a negative integer? That's not how integers work.
    It's how Java's int and long types work.

    Yes, that's a mistake. I just don't see how it can be anything else.
    2+2=5 would be obviously wrong, but it's hypothetical, or as you say, a
    straw man. 20+20=50 or 2000+2000=5000 or 200000+200000=500000 would
    also be straw men, since they don't happen either. What about
    2000000000+2000000000=-294967296? Java actually does that, it can't be
    called a straw man, so instead I'm supposed to believe that it's a valid
    result. I just can't.

    And if you want something closer to Z, Java also has BigInteger.

    Those are boxed and expensive for the usual case where the results are
    expected to fit into the machine word. Of course that expectation may
    be wrong (say due to a program bug), but in that case I want the program
    to crash, like it would for an out-of-range subscript.
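The difference between the two behaviours can be sketched in Python by modelling a 32-bit int. `wrapping_add32` mimics what Java's int addition does, and `checked_add32` is the "crash instead" alternative (Java itself offers this as Math.addExact). Both function names are hypothetical and exist only for this illustration.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrapping_add32(a, b):
    """Add modulo 2**32 and return the signed 32-bit representative."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s > INT32_MAX else s

def checked_add32(a, b):
    """Add in Z and fail loudly if the result does not fit in 32 bits."""
    s = a + b
    if not INT32_MIN <= s <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in a 32-bit int")
    return s

print(wrapping_add32(2000000000, 2000000000))
```

With the checked variant, the same call raises OverflowError instead of returning a negative number, much like an out-of-range subscript raises an exception.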

    Maybe it is a mistake for Java to have an int type like that at all,
    i.e. BigInteger should be the default, like in Python. It was a design
    choice to make machine arithmetic more accessible to gain acceptance by
    some potential users. Guy Steele famously said "We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp."
    Java today seems awfully old-fashioned of course.

    .
    Tony Hoare in 2009 said about null pointers:
    And the relevance is?

    Both are instances where adding a "feature" for implementation
    convenience turned out to attract bugs and vulnerabilities.

    Java-style wraparound arithmetic is more of the same. A bug magnet,
    Unsupported claim.

    It's supported by that page linked a few days ago, about overflow bugs
    in real programs.

    I think I saw the unintended result on a 32-bit machine

    I agree that it's less likely to be a problem if the ints are 64 bits.
    And of course it was a frequent occurrence in the 16 bit era.

    Note that at least in gcc on x64, ints and longs by default are still 32
    bits. These days when I write C code I tend to use stdint.h and specify
    int sizes explicitly, e.g. int64_t or int32_t rather than int or long or whatever.

    I don't know much about C++, but I would be surprised if they had
    given up on uninitialized data. And an uninitialized reference is
    certainly not better than a null reference.

    I don't know a way to make an uninitialized reference in C++ but maybe
    it's possible. If you just say "int &y;" you get a compile time error.

    The fact that the Java idiom is to implement trees and linked lists
    not in the object-oriented way I outlined above

    The OO description is similar to using a sum type, and it's reasonable
    for the implementation under the covers to use a zero pointer to
    represent an empty list. Some Lisp implementations go even further and
    used "cdr coding", which means using a single bit to indicate that the
    next list node is at the next word in memory, so the "next" pointer
    (cdr) can be eliminated. You might allocate the list nodes
    non-consecutively when the list is created, but a compacting GC can
    later make the elements consecutive in memory and get rid of the pointer overhead.
  • From dxf@dxforth@gmail.com to comp.lang.forth on Tue Mar 12 16:25:37 2024
    From Newsgroup: comp.lang.forth

    On 12/03/2024 2:07 pm, Paul Rubin wrote:
    ...
    Another thing, if I run the same integer calculation on two machines, at least programmed in a HLL, I should expect the same result on both. But
    if the word sizes are different then the results will be different. (If
    one or both crash due to implementation restrictions such as machine overflow, that's annoying, but it's better than getting wrong answers).

    Not all customers have the same expectations. If they can find a HLL that suits them, well and good.

  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Tue Mar 12 09:48:18 2024
    From Newsgroup: comp.lang.forth

    In article <2024Mar11.220843@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    <SNIP>
    Java also has null pointers, another possible mistake. Ada doesn't have
    them,

    Ada certainly has null.

    C++ has them because of its C heritage and
    the need to support legacy code, but I believe that in "modern" C++
    style you're supposed to use references instead of pointers, so you
    can't have a null or uninitialized one.

    I don't know much about C++, but I would be surprised if they had
    given up on uninitialized data. And an uninitialized reference is
    certainly not better than a null reference.

    I can't see the problem with null pointers. Algol68 had an explicit
    `nil' that serves the same purpose. Any reference is initialized with
    `nil'. If you try to dereference it, i.e. fetch or otherwise use the
    referred object, you get a run-time error.
    That is probably the clean and expensive way.
    So nil + reference takes the same place as NULL + pointer in C.

    I try to emulate this in ciforth. Looking up a word in the dictionary
    results in an entry (struct with fields for properties) or a null pointer,
    i.e. zero. You are supposed to test for this case, but if you fail
    you get a "Segmentation fault".
    As far as Forth goes, that is pretty satisfactory security.

    <SNIP>

    - anton

    Groetjes Albert
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Tue Mar 12 10:13:19 2024
    From Newsgroup: comp.lang.forth

    In article <2024Mar2.090401@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    #include <stdio.h>
    #include <stdlib.h>

    void MaliciousCode() {
        printf("This code is malicious!\n");
        printf("It will not execute normally.\n");
        exit(0);
    }

    void GetInput() {
        char buffer[8];
        gets(buffer);
        // puts(buffer);
    }

    int main() {
        GetInput();
        return 0;
    }
    === end code ===

    It will be a useful exercise to work up a similar example in Forth, as a
    step to thinking about automatic hardening techniques (as opposed to
    input sanitization).

    Forth does not have an inherently unbounded input word like C's
    gets(). And even typical C environments warn you when you compile
    this code; e.g., when I compile it on Debian 11, I get:

    gcc xxx.c
    |xxx.c: In function ‘GetInput’:
    |xxx.c:12:10: warning: implicit declaration of function ‘gets’; did
    you mean ‘fgets’? [-Wimplicit-function-declaration]
    | 12 | gets(buffer);
    | | ^~~~
    | | fgets
    |/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
    |xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and
    should not be used.

    So, they removed gets() from stdio.h, and added a warning to the
    linker. "man gets" tells me:

    |_Never use this function_
    |[...]
    |ISO C11 removes the specification of gets() from the C language, and
    |since version 2.16, glibc header files don't expose the function
    |declaration if the _ISOC11_SOURCE feature test macro is defined.

    Ironically, in ciforth I implemented (ACCEPT). That has the
    functionality of gets(). However it returns (addr length) and
    identifies a part of the input buffer. So you can never
    overwrite anything, because it doesn't write anything.

    <SNIP>

    - anton

    Groetjes Albert
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Tue Mar 12 17:42:50 2024
    From Newsgroup: comp.lang.forth

    On 11-03-2024 06:26, dxf wrote:
    On 11/03/2024 2:37 am, Hans Bezemer wrote:
    On 10-03-2024 10:56, Paul Rubin wrote:
    ...
    That is, C and other such languages have null pointers because they
    corresponded so conveniently to machine operations that the language
    designers couldn't resist including them.  Java-style wraparound
    arithmetic is more of the same.  A bug magnet, but irresistibly
    convenient for the implementers because of its isomorphism to machine
    arithmetic.

    That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, let's limit cells to 16 bits". "Yeah, let's LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, let's introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, let's set 1 CHARS to a single address unit", "Yeah, let's abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".

    It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.

    At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)

    Yeeaaah - and NO! In order to make an informed decision you have to know
    in which direction the loop will be progressing. And in Forth, you don't
    know that. Worse, with a classical "DO" you don't do anything. You just
    put a few items on the return stack. The *real* decision is made by
    "+LOOP" (or "LOOP"). "?DO" introduces a *SECOND* word that makes a
    decision. If I had my way, "LOOP" would be dumb - and just jump back,
    leaving some component of "DO" to make the ultimate decision (because it
    can't be a single word).

    In a perfect world I'd have a word:
    - That puts *three* parameters on the stack: limit, start and step;
    - That evaluates these three parameters and leaves a flag
    - That takes this flag and skips the loop if zero.

    Let's call the word that initializes these actions "+DO". +DO equals (
    limit index step -- R: limit index step)

    "DO" would become : DO 1 postpone +DO ;

    It would function like a BASIC "FOR" and have just about the same
    behavior - as far as BASIC "FOR" has sane behavior. That's open for
    discussion ;-)

    Sure it'd overload the return stack even more and affect I, I' and J
    but:

    10 0 -1 +DO (..) LOOP

    Would not run. Neither would:

    -10 0 DO (..) LOOP

    Nor:

    0 0 DO (..) LOOP

    I'd consider that sane behavior.
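The behaviour proposed above can be sketched as a Python generator that checks limit, index and step before the first iteration. This is only a model of the idea, not of any existing Forth word: the name `plus_do`, the exclusive limit in both directions, and the choice to run zero times for a step of 0 are all assumptions.

```python
def plus_do(limit, index, step):
    """Yield loop indices; the range is tested before the first iteration."""
    if step > 0:
        while index < limit:      # counting up: stop before reaching limit
            yield index
            index += step
    elif step < 0:
        while index > limit:      # counting down: stop before reaching limit
            yield index
            index += step
    # step == 0: no iterations at all (assumption, see above)

print(list(plus_do(10, 0, -1)))   # 10 0 -1 +DO : wrong direction, no iterations
print(list(plus_do(-10, 0, 1)))   # -10 0 DO    : limit below start, no iterations
print(list(plus_do(0, 0, 1)))     # 0 0 DO      : empty range, no iterations
print(list(plus_do(2, -2, 1)))    # ordinary counted loop
```

Because the test sits at the top, the closing word only has to add the step and jump back, which is the "dumb LOOP" arrangement described earlier in the post.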

    Compare: https://rosettacode.org/wiki/Loops/Wrong_ranges#uBasic/4tH
    To the rather weak: https://rosettacode.org/wiki/Loops/Wrong_ranges#Forth

    Note that 4tH behaves differently here. It catches most of the exceptional
    situations:

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2
    start: -2 stop: 2 inc: -1 | -2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2
    start: 2 stop: 2 inc: 1 | 2
    start: 2 stop: 2 inc: -1 | 2
    start: 2 stop: 2 inc: 0 | 2
    start: 0 stop: 0 inc: 0 | 0

    Versus:

    Some of these loop infinitely, and some under/overflow, so for the sake
    of brevity long outputs will be truncated by ....

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
    start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
    start: 2 stop: 2 inc: 1 | 2 3 4 5 ... -2 -1 0 1
    start: 2 stop: 2 inc: -1 | 2
    start: 2 stop: 2 inc: 0 | 2 2 2 2 2 ...
    start: 0 stop: 0 inc: 0 | 0 0 0 0 0 ...

    I still don't think 4tH's performance is perfect, but it's a tradeoff
    between compatibility and intuitive behavior.

    Note that 4tH behaves differently when performing negative +LOOPs, but
    those are rare IRL.

    Hans Bezemer


  • From dxf@dxforth@gmail.com to comp.lang.forth on Wed Mar 13 13:53:43 2024
    From Newsgroup: comp.lang.forth

    On 13/03/2024 3:42 am, Hans Bezemer wrote:
    On 11-03-2024 06:26, dxf wrote:
    On 11/03/2024 2:37 am, Hans Bezemer wrote:
    On 10-03-2024 10:56, Paul Rubin wrote:
    ...
    That is, C and other such languages have null pointers because they
    corresponded so conveniently to machine operations that the language
    designers couldn't resist including them.  Java-style wraparound
    arithmetic is more of the same.  A bug magnet, but irresistibly
    convenient for the implementers because of its isomorphism to machine
    arithmetic.

    That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, let's limit cells to 16 bits". "Yeah, let's LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, let's introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, let's set 1 CHARS to a single address unit", "Yeah, let's abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".

    It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.

    At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)

    Yeeaaah - and NO! In order to make an informed decision you have to know in which direction the loop will be progressing. And in Forth, you don't know that. Worse, with a classical "DO" you don't do anything. You just put a few items on the return stack. The *real* decision is made by "+LOOP" (or "LOOP"). "?DO" introduces a *SECOND* word that makes a decision. If I had my way, "LOOP" would be dumb - and just jump back, leaving some component of "DO" to make the ultimate decision (because it can't be a single word).

    DO LOOP lies in the category of 'counted loop' (as opposed to 'indefinite
    loop', e.g. BEGIN). The premise behind LOOP having control is that counted
    loops run at least once. It was the same for Moore's FOR NEXT.
    Microprocessors, too, have decrement-and-loop instructions. So there's
    precedent for having counted loops test at the end.

    ?DO was a late addition to Forth, introduced by ANS and then only as an option. How often is it needed? Far less often than I had been using it. In a recent application review I was able to swap all but one ?DO for DO. Count now stands at 14 DO vs. 1 ?DO.

    In a perfect world I'd have a word:
    - That puts *three* parameters on the stack: limit, start and step;
    - That evaluates these three parameters and leaves a flag
    - That takes this flag and skips the loop if zero.

    Let's call the word that initializes these actions "+DO". +DO equals ( limit index step -- R: limit index step)

    "DO" would become : DO 1 postpone +DO ;

    It would function like a BASIC "FOR" and have just about the same behavior - as far as BASIC "FOR" has sane behavior. That's open for discussion ;-)

    Sure it'd overload the return stack even more and affect I, I' and J
    but:

      10 0 -1 +DO (..) LOOP

    Would not run. Neither would:

      -10 0 DO (..) LOOP

    Nor:

      0 0 DO (..) LOOP

    I'd consider that sane behavior.
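
    The "evaluate three parameters and leave a flag" step can be sketched in standard Forth as below. This is only an illustration under assumptions: ENTER? is a hypothetical name, and the rule it encodes (run only when the step points from start toward limit) is inferred from the three examples above, not from any existing implementation.

```forth
\ Hypothetical guard: true if a loop from start toward limit
\ with the given step would execute at least once.
: enter? ( limit start step -- flag )
  dup 0> if drop > exit then     \ step > 0: run only if limit > start
  0<     if      < exit then     \ step < 0: run only if limit < start
  2drop false ;                  \ step = 0: never run

10  0 -1 enter? .   \ prints 0   (step points away from limit)
-10 0  1 enter? .   \ prints 0   (limit below start)
0   0  1 enter? .   \ prints 0   (empty range)
10  0  1 enter? .   \ prints -1  (loop would run)
```

    A loop-entry word built on such a flag would branch past the loop body when the flag is zero, much as ?DO does for the index=limit case.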

    ISTM an unnecessary complication of counted loops. AFAIK languages such as
    C don't have counted loops. What they have is indefinite loops whose tests
    are coded according to the datatype (signed, unsigned) and it's on these 'indefinite loops' that folks are comparing Forth's DO LOOP. I see counted
    and indefinite loops as distinct and separate, each having its own uses and strength.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Wed Mar 13 07:20:34 2024
    From Newsgroup: comp.lang.forth

    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    In a perfect world I'd have a word:
    - That puts *three* parameters on the stack: limit, start and step;
    - That evaluates these three parameters and leaves a flag
    - That takes this flag and skips the loop if zero.

    Let's call the word that initializes these actions "+DO". +DO equals (
    limit index step -- R: limit index step)

    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )

    which is paired with LOOP. Both produce the same addresses (if ubytes
    is a multiple of +nstride), but MEM-DO in reverse order.

    One could add a BOUNDS+DO that works like your +DO, but I would first
    have to see if it is needed.

    Concerning the name +DO, this is taken in Gforth since at least
    Gforth-0.2 (1996) for entering a loop only if index<limit (signed
    comparison), without providing a stride.

    Compare: https://rosettacode.org/wiki/Loops/Wrong_ranges#uBasic/4tH
    To the rather weak: https://rosettacode.org/wiki/Loops/Wrong_ranges#Forth

    Note that 4tH behaves differently here. It catches most of the exceptional situations:

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2
    start: -2 stop: 2 inc: -1 | -2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2
    start: 2 stop: 2 inc: 1 | 2
    start: 2 stop: 2 inc: -1 | 2
    start: 2 stop: 2 inc: 0 | 2
    start: 0 stop: 0 inc: 0 | 0

    Versus:

    Some of these loop infinitely, and some under/overflow, so for the sake
    of brevity long outputs will be truncated by ....

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
    start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
    start: 2 stop: 2 inc: 1 | 2 3 4 5 ... -2 -1 0 1
    start: 2 stop: 2 inc: -1 | 2
    start: 2 stop: 2 inc: 0 | 2 2 2 2 2 ...
    start: 0 stop: 0 inc: 0 | 0 0 0 0 0 ...

    I still don't think 4tH's performance is perfect, but it's a tradeoff between compatibility and intuitive behavior.

    You showed the DO version in Forth, which is indeed rather weak for
    the practically occurring index=limit case. For that we have ?DO,
    which shows:

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
    start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
    start: 2 stop: 2 inc: 1 |
    start: 2 stop: 2 inc: -1 |
    start: 2 stop: 2 inc: 0 |
    start: 0 stop: 0 inc: 0 |

    The 0 +LOOP case (second line) does not occur in practice. I
    recommend not using

    ?DO ... -1 +LOOP

    because the behaviour of ?DO is not consistent with that of -1 +LOOP
    when index=limit. The rosettacode tests don't show this inconsistency
    clearly, though. Gforth has

    -DO ... 1 -LOOP

    for decrementing in each step by 1, but it seems to me that the
    rosettacode task is intended to use the same counted-loop construct
    for both cases. If you, say, write

    2 -2 +DO ... -1 +LOOP

    You will get the same result as in the third line, but you asked for
    it.

    For the fifth line, if you use

    -2 2 +DO ... 1 +LOOP

    the result is that the loop is not entered.

    Overall, for

    : test-seq ( start stop inc -- )
    cr rot dup ." start: " 2 .r
    rot dup ." stop: " 2 .r
    rot dup ." inc: " 2 .r ." | "
    -rot swap +do i . dup +loop drop ;
    -2 2 1 test-seq
    -2 2 0 test-seq
    -2 2 -1 test-seq
    -2 2 10 test-seq
    2 -2 1 test-seq
    2 2 1 test-seq
    2 2 -1 test-seq
    2 2 0 test-seq
    0 0 0 test-seq

    the output is:

    start: -2 stop: 2 inc: 1 | -2 -1 0 1 ok
    start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
    start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2 ok
    start: -2 stop: 2 inc: 10 | -2 ok
    start: 2 stop: -2 inc: 1 | ok
    start: 2 stop: 2 inc: 1 | ok
    start: 2 stop: 2 inc: -1 | ok
    start: 2 stop: 2 inc: 0 | ok
    start: 0 stop: 0 inc: 0 | ok

    The same as the ?DO variant except for the "start: 2 stop: -2 inc: 1"
    case.

    I don't consider performing one iteration if index=limit good
    behaviour.
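
    The ?DO ... -1 +LOOP inconsistency mentioned earlier in this post can be made visible by counting iterations. This is a sketch in standard Forth; the names N-DOWN-DO and N-DOWN-?DO are invented for the illustration. With a negative increment the limit itself is still visited, so a range whose start equals its limit has one element, yet ?DO skips it.

```forth
\ Count iterations of a downward loop from start to limit.
: n-down-do  ( limit start -- n ) 0 -rot do  1+ -1 +loop ;
: n-down-?do ( limit start -- n ) 0 -rot ?do 1+ -1 +loop ;

0 2 n-down-do  .    \ prints 3  (i = 2, 1, 0)
0 2 n-down-?do .    \ prints 3  (same range)
0 0 n-down-do  .    \ prints 1  (i = 0)
0 0 n-down-?do .    \ prints 0  (skipped, although 0 is in range)
```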

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Wed Mar 13 09:24:36 2024
    From Newsgroup: comp.lang.forth

    Anton Ertl wrote:
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )

    which is paired with LOOP. Both produce the same addresses (if ubytes
    is a multiple of +nstride), but MEM-DO in reverse order.

    A very handy addition when working with arrays. I use similar words

    .. NEXT and <FOR .. NEXT \ index N for 1-dim vectors

    .. NEXT and <<FOR .. NEXT \ indices X Y for 2-dim arrays.

    Recently I also added a "runtime control flow stack" to my system to hold
    loop indices. I just hated UNLOOP et al too much. ;-)
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Wed Mar 13 10:00:05 2024
    From Newsgroup: comp.lang.forth

    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP and does one point at the start of the area
    or at the address of the first item to process?

    -marcel
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Wed Mar 13 14:30:00 2024
    From Newsgroup: comp.lang.forth

    - DOxxx performs the loop
    - Indices are integers.
    - forms of DO
    one-bound {BODY} DO) \ 0 ... one-bound-1
    one-bound {BODY} DO] \ 1 ... one-bound
    b1 b2 {BODY} DO[] \ b1 .. b2
    b1 b2 stride {BODY} DO[..] \ b1 b1+stride b1+2*stride .. b2

    Maybe
    b1 b2 {BODY} DO[) \ b1 .. b2-1
    to accommodate
    array length OVER + {BODY} DO[)

    Note the stride is now constant obviously.
    If it is negative, the loop goes down.
    If you want to straddle from positive to negative (addresses?),
    program it explicitly and conspicuously.

    Note 1
    The [ ) convention comes from mathematics, example:
    [1,9] interval 1 2 3 4 5 6 7 8 9
    [1,9) interval 1 2 3 4 5 6 7 8
    (0,9) interval 1 2 3 4 5 6 7 8

    Note 2
    {BODY} leans heavily on [: ;] presence. (Or ciforth's { } )

    Note 3
    If you want to change the stride mid-program, you have to
    use BEGIN WHILE REPEAT, as you should have done in the first place.

    The four DO's replace the four don't's : ?DO DO LOOP +LOOP .
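
    One of these forms can be approximated in standard Forth by passing the body as an execution token. The sketch below is an assumption about how such a word might look, not ciforth's implementation: it covers only the half-open form, it hands the index to the body on the data stack, and the body word .N is hypothetical.

```forth
\ Sketch: half-open integer range b1 .. b2-1, body passed as an xt.
\ The body receives the index on the data stack: ( n -- ).
: do[) ( b1 b2 xt -- )
  -rot swap ?do        \ limit = b2, index = b1
    i over execute     \ call the body with the current index
  loop drop ;

: .n ( n -- ) . ;      \ hypothetical body
1 9 ' .n do[)          \ prints 1 2 3 4 5 6 7 8
```

    Because it is built on ?DO ... LOOP, this version still wraps around when b1 is greater than b2; a full design would need its own bounds test.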

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat purring. - the Wise from Antrim -
  • From dxf@dxforth@gmail.com to comp.lang.forth on Thu Mar 14 00:47:12 2024
    From Newsgroup: comp.lang.forth

    On 13/03/2024 9:00 pm, mhx wrote:
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?

    Make one using BEGIN WHILE REPEAT. That's what Forth is for.

  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Wed Mar 13 13:53:14 2024
    From Newsgroup: comp.lang.forth

    Also handy when you have list types:

    (( 1 2 3 5 7 11 13 17 )) DO-WITH ..

    or

    (( H2 O2 CO CO2 )) DO-WITH ..
    where H2 et al can be numbers/addresses or arrays/strings
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Wed Mar 13 14:15:49 2024
    From Newsgroup: comp.lang.forth

    dxf wrote:

    On 13/03/2024 9:00 pm, mhx wrote:
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?

    Make one using BEGIN WHILE REPEAT. That's what Forth is for.

    Scratch with the chickens, don't fly with the eagles! ;-)
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Wed Mar 13 16:41:37 2024
    From Newsgroup: comp.lang.forth

    mhx@iae.nl (mhx) writes:
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP

    I used the locals stack for the stride in the general case (when the
    stride is not a constant). If MEM+DO works correctly, that value is
    cleaned up automatically. Let's see if it works correctly:

    : foo pad swap dup mem+do unloop exit loop ;
    : bar 123 {: a :} cell foo a . ;
    bar

    This prints 123, so it works as intended. Let's see if LEAVE also
    works as it should:

    : foo 123 {: a :} pad swap dup mem+do leave loop a . ;
    cell foo

    This also prints 123 as it should.

    and does one point at the start of the area
    or at the address of the first item to process?

    For MEM+DO addr is the first item to process, for MEM-DO the last.
    I.e., you use exactly the same parameters whether you process the
    array forwards with MEM+DO or backwards with MEM-DO, as long as ubytes
    is a multiple of +nstride.
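
    For comparison, the same two traversals can be written with standard words only. This is a sketch, not how Gforth implements MEM+DO and MEM-DO; the names .CELLS-UP and .CELLS-DOWN and the array V are made up, and the stride is fixed at one cell.

```forth
create v  1 , 3 , 7 ,

: .cells-up ( addr ubytes -- )      \ first element first
  over + swap ?do i @ . 1 cells +loop ;

: .cells-down ( addr ubytes -- )    \ last element first
  begin dup 0> while
    1 cells -  2dup + @ .
  repeat 2drop ;

v 3 cells .cells-up      \ prints 1 3 7
v 3 cells .cells-down    \ prints 7 3 1
```

    Both words take the same ( addr ubytes ) parameters, which is the property described above; the backward version needs a hand-written loop here, which is the nuisance the question referred to.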

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Wed Mar 13 19:04:47 2024
    From Newsgroup: comp.lang.forth

    On 13-03-2024 08:20, Anton Ertl wrote:
    Concerning the name +DO, this is taken in Gforth since at least
    Gforth-0.2 (1996) for entering a loop only if index<limit (signed comparison), without providing a stride.

    Don't worry, Anton - I have no intention to implement this one. About a
    third of my loops are well behaved DO or ?DO .. LOOPs. I like BEGIN..WHILE..REPEAT a lot more. I very, very rarely use UNLOOP and
    LEAVE makes me feel uncomfortable as well. So I don't feel +DO adds a
    whole lot.

    I must say I'm quite in love with the FOR..NEXT I designed in uBasic/4tH:
    - DO..LOOP executes the very same code <shhhh!>;
    - All these are valid expressions:
    FOR x=1 TO 5
    FOR x=1 TO 5 STEP 2
    FOR x=1
    FOR x=1 STEP 2
    FOR (equals DO)
    FOR x=1 WHILE x<5
    FOR x=1 UNTIL x=5
    FOR x=1 TO 5 UNTIL y=3
    - It supports BREAK and CONTINUE, like:
    IF n=5 THEN BREAK
    IF n>5 THEN CONTINUE
    - You can place BREAK and CONTINUE everywhere - and they take effect immediately
    - "IF n=5 THEN BREAK" is equivalent to:
    UNTIL n=5
    WHILE n<>5
    - It features UNLOOP as well, so you can safely GOTO or RETURN out of a
    loop.

    Note you can reuse a lot of the components, like UNLOOP in BREAK, like
    BREAK in WHILE and UNTIL. I consider its design quite Forthy:

    : exec_unloop fpop fscrap ;
    : exec_break exec_unloop skip_next ;
    : exec_while get_exp 0= if exec_break then ;
    : exec_until get_exp if exec_break then ;

    "One loop to rule them all!!" ;-)

    Hans Bezemer



  • From dxf@dxforth@gmail.com to comp.lang.forth on Thu Mar 14 10:42:52 2024
    From Newsgroup: comp.lang.forth

    On 14/03/2024 1:15 am, minforth wrote:
    dxf wrote:

    On 13/03/2024 9:00 pm, mhx wrote:
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?

    Make one using BEGIN WHILE REPEAT.  That's what Forth is for.

    Scratch with the chickens, don't fly with the eagles!  ;-)

    A loop that needs more than one test and one branch is already
    inefficient so chickens it is :)

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Wed Mar 13 18:03:56 2024
    From Newsgroup: comp.lang.forth

    albert@spenarnc.xs4all.nl writes:
    So [Algol68] nil + reference takes the same place as NULL + pointer in c.

    I'm unfamiliar with Algol68 but if every reference in it can be set to
    nil, that sounds like the same error that Algol-W had. The alternative,
    using an option value, means: 1) if the reference is not wrapped by an
    option type, then it is guaranteed to not be null; 2) if it is wrapped
    by an option type, then the compiler can stop you (or at least warn you)
    if you try to dereference without first checking that it is non-null.

    You are supposed to test for this case, but if you fail you get a "Segmentation fault". As far as Forth goes, that is pretty
    satisfactory security.

    For sure, it is usually better to crash than to keep running and give
    nonsense answers. Of course that usually requires a hardware fault on dereferencing a null pointer, rather than giving whatever is at location
    0 in memory like on unprotected machines.

    Beyond not giving wrong answers, it's usually nice if your program
    doesn't crash too often, especially from program bugs. Getting help
    from the compiler for that is often useful.
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Thu Mar 14 09:16:55 2024
    From Newsgroup: comp.lang.forth

    In article <87bk7h1v5v.fsf@nightsong.com>,
    Paul Rubin <no.email@nospam.invalid> wrote:
    albert@spenarnc.xs4all.nl writes:
    So [Algol68] nil + reference takes the same place as NULL + pointer in c.

    I'm unfamiliar with Algol68 but if every reference in it can be set to
    nil, that sounds like the same error that Algol-W had. The alternative, using an option value, means: 1) if the reference is not wrapped by an
    option type, then it is guaranteed to not be null; 2) if it is wrapped
    by an option type, then the compiler can stop you (or at least warn you)
    if you try to dereference without first checking that it is non-null.

    You are supposed to test for this case, but if you fail you get a
    "Segmentation fault". As far as Forth goes, that is pretty
    satisfactory security.

    For sure, it is usually better to crash than to keep running and give nonsense answers. Of course that usually requires a hardware fault on dereferencing a null pointer, rather than giving whatever is at location
    0 in memory like on unprotected machines.

    Algol68 doesn't crash. It gives a run time error of the type
    dereferencing a <nil> (<ref> <ref> <my_struct> aap) on line .. of ...
    called from line .. of ..
    ..
    called from line .. of main

    Beyond not giving wrong answers, it's usually nice if your program
    doesn't crash too often, especially from program bugs. Getting help
    from the compiler for that is often useful.

    You can't get much help from the compiler for uninitialised references
    like this. Either it crashes in the first run or it is insidious.

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat purring. - the Wise from Antrim -
  • From dxf@dxforth@gmail.com to comp.lang.forth on Thu Mar 14 20:34:33 2024
    From Newsgroup: comp.lang.forth

    On 14/03/2024 10:42 am, dxf wrote:
    On 14/03/2024 1:15 am, minforth wrote:
    dxf wrote:

    On 13/03/2024 9:00 pm, mhx wrote:
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?

    Make one using BEGIN WHILE REPEAT.  That's what Forth is for.

    Scratch with the chickens, don't fly with the eagles!  ;-)

    A loop that needs more than one test and one branch is already
    inefficient so chickens it is :)

    Chicken feed...

    \ FOR..WHILE..STEP..NEXT loop

    : STEP ( ?comp) postpone 2r> ; immediate

    : IDROP postpone step postpone 2drop ; immediate

    : FOR postpone begin postpone 2dup postpone 2>r ; immediate

    : NEXT postpone repeat postpone idrop ; immediate

    : >= < 0= ;
    : <= > 0= ;

    : t1 9 0 for >= while r@ . step 1+ next ; \ 0..9
    : t2 0 9 for <= while r@ . step 1- next ; \ 9..0

    : t3 10 0 for > while r@ . step 1+ next ; \ 0..9
    : t4 0 0 for > while r@ . step 1+ next ; \ does nothing

    : t5 10 0 for > while r@ 5 <> while r@ . step 1+ next else idrop then ; \ 0..4
    : t6 10 0 for > while r@ 5 <> while r@ . step 1+ repeat then idrop ; \ 0..4


  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Thu Mar 14 14:52:43 2024
    From Newsgroup: comp.lang.forth

    albert@spenarnc.xs4all.nl writes:
    Algol68 doesn't crash. It gives a run time error of the type

    Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.

    You can't get much help from the compiler for uninitialised references
    like this. Either it crashes in the first run or it is insidious.

    No idea about Algol68 but in (at least some) other languages, the idea
    of having references instead of pointers is that it is impossible to
    create an uninitialised reference.

  • From dxf@dxforth@gmail.com to comp.lang.forth on Fri Mar 15 16:34:49 2024
    From Newsgroup: comp.lang.forth

    On 14/03/2024 12:30 am, albert@spenarnc.xs4all.nl wrote:
    - DOxxx performs the loop
    - Indices are integers.
    - forms of DO
    one-bound {BODY} DO) \ 0 ... one-bound-1
    one-bound {BODY} DO] \ 1 ... one-bound
    b1 b2 {BODY} DO[] \ b1 .. b2
    b1 b2 stride {BODY} DO[..] \ b1 b1+stride b1+2*stride .. b2

    Maybe
    b1 b2 {BODY} DO[) \ b1 .. b2-1
    to accommodate
    array length OVER + {BODY} DO[)

    Note the stride is now constant obviously.
    If it is negative, the loop goes down.
    If you want to straddle from positive to negative (addresses?),
    program it explicitly and conspicuously.

    Note 1
    The [ ) convention comes from mathematics, example:
    [1,9] interval 1 2 3 4 5 6 7 8 9
    [1,9) interval 1 2 3 4 5 6 7 8
    (0,9) interval 1 2 3 4 5 6 7 8

    Note 2
    {BODY} leans heavily on [: ;] presence. (Or ciforth's { } )

    Note 3
    If you want to change the stride mid-program, you have to
    use BEGIN WHILE REPEAT, as you should have done in the first place.

    The four DO's replace the four don't's : ?DO DO LOOP +LOOP .

    All these alternatives to DO LOOP that folks propose don't get off the
    ground because the 'lesser programmers' for whom they're intended don't eventuate.

  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Fri Mar 15 09:26:11 2024
    From Newsgroup: comp.lang.forth

    Paul Rubin wrote:

    albert@spenarnc.xs4all.nl writes:
    Algol68 doesn't crash. It gives a run time error of the type

    Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.

    You can't get much help from the compiler for uninitialised references
    like this. Either it crashes in the first run or it is insidious.

    No idea about Algol68 but in (at least some) other languages, the idea
    of having references instead of pointers is that it is impossible to
    create an uninitialised reference.

    In Forth parlance: unless you're doing system programming where you need it, don't use direct memory operations like @ ! MOVE, etc. This also prohibits
    the use of VARIABLE. VARIABLES are uninitialized and are accessed by @ !.

    So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
    in cleaner code and improves memory safety.
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Fri Mar 15 11:37:55 2024
    From Newsgroup: comp.lang.forth

    minforth@gmx.net (minforth) writes:
    In Forth parlance: unless you're doing system programming where you
    need it, don't use direct memory operations like @ ! MOVE, etc. This
    also prohibits the use of VARIABLE. VARIABLES are uninitialized and
    are accessed by @ !.

    That helps but I'm sure there are other hazards. What do you do about
    arrays? What about ALLOT or ALLOCATE?

    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
    in cleaner code and improves memory safety.

    Yes I should start doing that too. I only mess with Forth for fun
    though. I feel like it helps me stay sharp compared with safer
    languages, even including C. I'm not old enough to have written
    significant amounts of machine code.
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Fri Mar 15 19:55:07 2024
    From Newsgroup: comp.lang.forth

    Paul Rubin wrote:

    minforth@gmx.net (minforth) writes:
    In Forth parlance: unless you're doing system programming where you
    need it, don't use direct memory operations like @ ! MOVE, etc. This
    also prohibits the use of VARIABLE. VARIABLES are uninitialized and
    are accessed by @ !.

    That helps but I'm sure there are other hazards. What do you do about arrays?

    Arrays are heap-allocated dynamic objects with access methods. Direct memory access is virtually impossible (except with "carnal knowledge"). There
    is an array stack for more complex operations and a chain of array values
    for persistent storage. Stack and array values contain only pointers.

    F.ex.
    XZ14[ designates array (matrix or vector) value XZ14
    <index or indices> ] reads a vector/matrix element
    <index or indices> ]! writes to a vector/matrix element
    M"[ 2 1 ] from 3rd array on array stack read 1st element in 2nd row
    (M[ M'[ M"[ designate top, second and third matrix on array stack)
    XZ14[ ]' pushes transposed matrix XZ14 onto array stack
    XZ14 (or TO XZ14) writes top matrix to array value XZ14
    et cetera

    IOW there is a special word set for array operations. Operators check
    that there is no memory violation like index out of bounds, and do some housekeeping like (re)allocating memory.

    What about ALLOT or ALLOCATE?

    Above word set would be overkill for normal Forth applications.
    Nevertheless you could SEAL your search order and exclude or make
    safer versions of ALLOT et al for your application wordlist.
    I never understood why SEAL did not make it into ANS Forth's
    Search-Order word set, as it is just a simple SET-ORDER thing.

    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    Yes and no. It is easy to forget correct initialization when 0 is wrong.
    VALUEs explicitly require conscious initialization.
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Mar 16 11:40:09 2024
    From Newsgroup: comp.lang.forth

    On 16/03/2024 5:37 am, Paul Rubin wrote:
    minforth@gmx.net (minforth) writes:
    In Forth parlance: unless you're doing system programming where you
    need it, don't use direct memory operations like @ ! MOVE, etc. This
    also prohibits the use of VARIABLE. VARIABLES are uninitialized and
    are accessed by @ !.

    That helps but I'm sure there are other hazards. What do you do about arrays? What about ALLOT or ALLOCATE?

    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    So I regularly use either xVALUEs (x means different data types) or data
    objects (for compound or dynamic types) with access methods. This results
    in cleaner code and improves memory safety.

    Yes I should start doing that too. I only mess with Forth for fun
    though. I feel like it helps me stay sharp compared with safer
    languages, even including C. I'm not old enough to have written
    significant amounts of machine code.

    In early forths (microFORTH, figFORTH) one was required to supply an
    initial value:

    0 VARIABLE name

    Nowadays one can write:

    VARIABLE name 0 name !

    or as I do:

    \ Set application defaults
    : DEFAULTS ( -- )
    0 to outdev shaping off spacing off
    train on koch off
    plain-text punct off compress on ignore off
    7 send.s ! 15 char.s ! 3 cspace ! 7 wspace !
    700 tone ! 7 volume ! sqr1wave
    6 groupcols ! 4 grouprows ! 3 groupsize !
    lsignal off 0 to hide ;

    defaults

    With one word I initialize everything in the application that deserves it.
    That leaves VALUEs which are needlessly initialized twice. If it was
    deemed poor practice to initialize VARIABLEs at creation, the same applies
    to VALUEs.
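
    For readers who prefer the older style, supplying the initial value at definition time takes one line in standard Forth. This is a sketch; IVARIABLE and PITCH are hypothetical names, not words from any of the systems mentioned.

```forth
\ Hypothetical defining word: like VARIABLE, but takes an initial value.
: ivariable ( x "name" -- ) create , ;

700 ivariable pitch
pitch @ .        \ prints 700
5 pitch +!
pitch @ .        \ prints 705
```

    A word defined this way is still read and written with @ and !, so it behaves like an ordinary VARIABLE after creation.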

  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Mar 16 16:15:17 2024
    From Newsgroup: comp.lang.forth

    On 16/03/2024 5:37 am, Paul Rubin wrote:
    minforth@gmx.net (minforth) writes:
    In Forth parlance: unless you're doing system programming where you
    need it, don't use direct memory operations like @ ! MOVE, etc. This
    also prohibits the use of VARIABLE. VARIABLES are uninitialized and
    are accessed by @ !.

    That helps but I'm sure there are other hazards. What do you do about arrays? What about ALLOT or ALLOCATE?

    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    That's something I'd do for VALUEs should I move to omit the numeric
    prefix at creation. By automatically initializing VALUEs with 0, I can
    pretend - if only to myself - that VALUEs are different from VARIABLEs.

  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sat Mar 16 09:35:13 2024
    From Newsgroup: comp.lang.forth

    dxf wrote:
    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    That's something I'd do for VALUEs should I move to omit the numeric
    prefix at creation. By automatically initializing VALUEs with 0, I can pretend - if only to myself - that VALUEs are different from VARIABLEs.

    Indeed, if you only work with integers in cell size, VARIABLEs and some
    code discipline are sufficient.

    VALUEs are like variants in VBA. You can only change them with TO <NAME>,
    and TO (alias =>) is the same for all data types. The standard also writes locals and FVALUEs with TO. Non-standard $VALUEs (for dynamic strings) or DVALUEs/ZVALUEs can be very practical too. I also use range-limited VALUEs. None of this works with VARIABLEs.

    When you implement your type-specific TO variants with built-in
    appropriate checking, you are on the safer side.
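
    A range-limited VALUE can be approximated in standard Forth with a checked setter wrapped around TO. The sketch below is an assumption for illustration, not the implementation described in this post; VOLUME, VOLUME! and the range 0..10 are made up.

```forth
0 value volume                 \ hypothetical setting, valid range 0..10

: volume! ( n -- )             \ checked replacement for TO VOLUME
  dup 0 11 within 0= abort" volume out of range"
  to volume ;

7 volume!  volume .            \ prints 7
\ 99 volume!  would abort with the message "volume out of range"
```

    A type-specific TO would perform the same check inside TO itself, so that callers could not bypass it by writing TO VOLUME directly.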
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Mar 16 10:20:16 2024
    From Newsgroup: comp.lang.forth

    minforth@gmx.net (minforth) writes:
    Non-standard $VALUEs (for dynamic strings) or
    DVALUEs/ZVALUEs can be very practical too.

    2VALUE is standard.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sat Mar 16 11:13:07 2024
    From Newsgroup: comp.lang.forth

    Anton Ertl wrote:

    minforth@gmx.net (minforth) writes:
    Non-standard $VALUEs (for dynamic strings) or
    DVALUEs/ZVALUEs can be very practical too.

    2VALUE is standard.

    2VALUEs are for cell pairs. DVALUEs do not exist, because
    the standard assumes equivalency of double numbers and cell
    pairs (although mathematically they are not).
    ZVALUEs are for complex numbers.
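
    A short sketch of the point about 2VALUE, assuming a system that
    provides the Forth 2012 Double-Number extension words. The names PAIR
    and BIG are made up for the example. The same defining word serves both
    an arbitrary cell pair and a double number, which is why no separate
    DVALUE exists in the standard.

```forth
\ 2VALUE stores two cells; the standard does not distinguish
\ a generic cell pair from a double-cell number here.

1 2 2value pair          \ an arbitrary cell pair
123456789. 2value big    \ a double-number literal (trailing dot)

pair . .                 \ prints the two cells, top of stack first
big d.                   \ prints the pair interpreted as a double

42. to big               \ TO also works on names defined by 2VALUE
big d.
```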
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sun Mar 17 12:04:54 2024
    From Newsgroup: comp.lang.forth

    On 16/03/2024 8:35 pm, minforth wrote:
    dxf wrote:
    At least in gforth, VARIABLEs are initialized to 0.  That seems like a
    good thing for implementations to do in general.

    That's something I'd do for VALUEs should I move to omit the numeric
    prefix at creation.  By automatically initializing VALUEs with 0, I can
    pretend - if only to myself - that VALUEs are different from VARIABLEs.

    Indeed, if you only work with integers in cell size, VARIABLEs and some
    code discipline are sufficient.

    VALUEs are like variants in VBA. You can only change them with TO <NAME>,
    and TO (alias =>) is the same for all data types. The standard also writes
    locals and FVALUEs with TO. Non-standard $VALUEs (for dynamic strings) or
    DVALUEs/ZVALUEs can be very practical too. I also use range-limited VALUEs.
    None of this works with VARIABLEs.

    When you implement your type-specific TO variants with built-in
    appropriate checking, you are on the safer side.

    If safety is where one wants to be, there are surely better choices than
    Forth. For myself, I'd have to agree with Paul:

    "I only mess with Forth for fun though. I feel like it helps me stay
    sharp compared with safer languages, even including C."

    But VALUEs in Forth had little to do with safety. Their history, form,
    issues and attempted solutions are summarized here:

    https://pastebin.com/p5P5EVTm

    (ref: "svars.arc" Taygeta Forth archive)

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sun Mar 17 07:30:07 2024
    From Newsgroup: comp.lang.forth

    Interesting. I didn't know that the TO concept was coined by Moore (before Bartholdi).

    -marcel
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From dxf@dxforth@gmail.com to comp.lang.forth on Mon Mar 18 11:04:46 2024
    From Newsgroup: comp.lang.forth

    On 17/03/2024 6:30 pm, mhx wrote:
    Interesting. I didn't know that the TO concept was coined by Moore (before Bartholdi).

    -marcel

    A momentary lapse by Moore? TO was an abstraction. 'Under the hood' it was still
    addresses, @ and ! .

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.forth on Tue Mar 19 19:57:35 2024
    From Newsgroup: comp.lang.forth

    On 05/03/2024 14:03, minforth wrote:
    Tristan Wibberley wrote:


    Or special purpose computers that are not mass marketed, but I wasn't
    aware they'd fixed all the public market computers. Thanks for the info.

    You are still in for some nasty surprises with "public market" ARM CPUs. f.ex.
    https://developer.arm.com/documentation/den0013/d/Porting/Alignment

    And then we're not even trying to talk about what's in use and for sale
    today but rather what will be in use over the next 6 decades. Most of
    the historical peculiarities that are eliminated with more complex
    hardware instead of longer software can be expected to be present at
    some point during that period because more complex hardware is already a
    difficult problem for information security and I'd expect those
    peculiarities wouldn't have been present if there weren't some
    efficiency earned.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Tue Mar 19 21:39:41 2024
    From Newsgroup: comp.lang.forth

    Tristan Wibberley wrote:

    On 05/03/2024 14:03, minforth wrote:
    Tristan Wibberley wrote:


    Or special purpose computers that are not mass marketed, but I wasn't
    aware they'd fixed all the public market computers. Thanks for the info.

    You are still in for some nasty surprises with "public market" ARM CPUs.
    f.ex.
    https://developer.arm.com/documentation/den0013/d/Porting/Alignment

    And then we're not even trying to talk about what's in use and for sale
    today but rather what will be in use over the next 6 decades. Most of
    the historical peculiarities that are eliminated with more complex
    hardware instead of longer software can be expected to be present at
    some point during that period because more complex hardware is already a
    difficult problem for information security and I'd expect those
    peculiarities wouldn't have been present if there weren't some
    efficiency earned.

    Although repeatedly proclaimed dead, we can still observe Moore's Law.
    With the increasing 3-dimensional design of CPUs and the hunger for
    massive computing power through AI applications, the trend is likely to
    continue. Another driver is the need for lower energy consumption.

    This means that as the complexity of systems grows almost exponentially,
    the consequences of software errors will become dangerous to the same
    degree. Just as a professional electrician only works with insulated
    tools, a professional programmer should also choose tools, e.g.
    programming languages, that do not allow even simple errors to occur
    in the first place. They should also use operating systems and software
    containers equipped with protective functions.

    These means of protection that already exist today are not available in
    archaic programming languages such as C or Forth. Stoic language
    conservatism (a tenor in standard Forth) won't help.
    --- Synchronet 3.20a-Linux NewsLink 1.114