On 10-03-2024 10:56, Paul Rubin wrote:
...
That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.
That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, lets limit cells to 16 bits". "Yeah, lets LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, lets introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, lets set 1 CHARS to a single address unit", "Yeah, lets abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".
It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
As far as I can tell it was a design choice for DEC Alpha and RISC-V.
Apparently flags are detrimental to parallelism.
You can't call that a failure because you don't like it.
No / not yet?
"The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."
You will need >6 parallel multi-precision additions before the two
carry flags of AMD64 with ADX are theoretically more limiting than the
MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
RISC-V implementation needs to be extremely wide (>36 instructions per
cycle) and the precision must be extremely high (to eliminate overlap
between chains as an issue).
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
2+2=5 is also deterministic yet wrong.
In Java 2+2 gives 4. What do you hope to gain by putting up straw men?
2+2=5 is obviously wrong and Java doesn't go quite that far. Java
instead insists that you can add two positive integers and get a
negative one. That's wrong the same way that 2+2=5 is.
It just doesn't
mess up actual programs as often, because the numbers involved are
bigger.
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
Tony Hoare in 2009 said about null pointers:
Java-style wraparound
arithmetic is more of the same. A bug magnet,
Java also has null pointers, another possible mistake. Ada doesn't have them,
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
On 11/03/2024 2:37 am, Hans Bezemer wrote:
On 10-03-2024 10:56, Paul Rubin wrote:
...
That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.
That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, lets limit cells to 16 bits". "Yeah, lets LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, lets introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, lets set 1 CHARS to a single address unit", "Yeah, lets abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".
It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.
At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)
wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
Not at all. Modular arithmetic is not arithmetic in Z, but it's a commutative ring and has the nice properties of this algebraic
structure.
but even that works surprisingly well, so well that the RISC-V
designers have not seen a need to include an efficient way to detect
those cases where the result deviates from that in Z.
Still, the nice algebraic properties of modular arithmetic can be of
benefit even in such cases.
... 64 bit machine
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
It's how Java's int and long types work.
And if you want something closer to Z, Java also has BigInteger.
Tony Hoare in 2009 said about null pointers:
And the relevance is?
Java-style wraparound arithmetic is more of the same. A bug magnet,
Unsupported claim.
I think I saw the unintended result on a 32-bit machine
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
The fact that Java idiomatics is to implement trees and linked lists
not in the object-oriented way I outlined above
...
Another thing, if I run the same integer calculation on two machines, at least programmed in a HLL, I should expect the same result on both. But
if the word sizes are different then the results will be different. (If
one or both crash due to implementation restrictions such as machine overflow, that's annoying, but it's better than getting wrong answers).
Paul Rubin <no.email@nospam.invalid> writes:
<SNIP>
Java also has null pointers, another possible mistake. Ada doesn't have them,
Ada certainly has null.
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
- anton
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).
Forth does not have an inherently unbounded input word like C's
gets(). And even typical C environments warn you when you compile
this code; e.g., when I compile it on Debian 11, I get:
gcc xxx.c
|xxx.c: In function ‘GetInput’:
|xxx.c:12:10: warning: implicit declaration of function ‘gets’; did
you mean ‘fgets’? [-Wimplicit-function-declaration]
| 12 | gets(buffer);
| | ^~~~
| | fgets
|/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
|xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and
|should not be used.
So, they removed gets() from stdio.h, and added a warning to the
linker. "man gets" tells me:
|_Never use this function_
|[...]
|ISO C11 removes the specification of gets() from the C language, and
|since version 2.16, glibc header files don't expose the function
|declaration if the _ISOC11_SOURCE feature test macro is defined.
- anton
On 11-03-2024 06:26, dxf wrote:
On 11/03/2024 2:37 am, Hans Bezemer wrote:
On 10-03-2024 10:56, Paul Rubin wrote:
...
That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.
That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, lets limit cells to 16 bits". "Yeah, lets LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, lets introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, lets set 1 CHARS to a single address unit", "Yeah, lets abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".
It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.
At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)
Yeeaaah - and NO! In order to make an informed decision you have to know in which direction the loop will be progressing. And in Forth, you don't know that. Worse, with a classical "DO" you don't do anything. You just put a few items on the return stack. The *real* decision is made by "+LOOP" (or "LOOP"). "?DO" introduces a *SECOND* word that makes a decision. If I had my way, "LOOP" would be dumb - and just jump back, leaving some component of "DO" to make the ultimate decision (because it can't be a single word).
In a perfect world I'd have a word:
- That puts *three* parameters on the stack: limit, start and step;
- That evaluates these three parameters and leaves a flag
- That takes this flag and skips the loop if zero.
Let's call the word that initializes these actions "+DO". +DO equals ( limit index step -- R: limit index step)
"DO" would become : DO 1 postpone +DO ;
It would function like a BASIC "FOR" and have just about the same behavior - as far as BASIC's "FOR" has sane behavior. That's open for discussion ;-)
Sure it'd overload the return stack even more and affect I, I' and J
but:
10 0 -1 +DO (..) LOOP
Would not run. Neither would:
-10 0 DO (..) LOOP
Nor:
0 0 DO (..) LOOP
I'd consider that sane behavior.
In a perfect world I'd have a word:
- That puts *three* parameters on the stack: limit, start and step;
- That evaluates these three parameters and leaves a flag
- That takes this flag and skips the loop if zero.
Let's call the word that initializes these actions "+DO". +DO equals (
limit index step -- R: limit index step)
Compare: https://rosettacode.org/wiki/Loops/Wrong_ranges#uBasic/4tH
To the rather weak: https://rosettacode.org/wiki/Loops/Wrong_ranges#Forth
Note that 4tH behaves differently here. It catches most of the exceptional
situations:
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2
start: -2 stop: 2 inc: -1 | -2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2
start: 2 stop: 2 inc: 1 | 2
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2
start: 0 stop: 0 inc: 0 | 0
Versus:
Some of these loop infinitely, and some under/overflow, so for the sake
of brevity long outputs will be truncated by ....
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
start: 2 stop: 2 inc: 1 | 2 3 4 5 ... -2 -1 0 1
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2 2 2 2 2 ...
start: 0 stop: 0 inc: 0 | 0 0 0 0 0 ...
I still don't think 4tH's performance is perfect, but it's a tradeoff
between compatibility and intuitive behavior.
Recent additions to Gforth are MEM+DO and MEM-DO, with the run-time
stack effects
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
which is paired with LOOP. Both produce the same addresses (if ubytes
is a multiple of +nstride), but MEM-DO in reverse order.
.. NEXT and <FOR .. NEXT \ index N for 1-dim vectors
.. NEXT and <<FOR .. NEXT \ indices X Y for 2-dim arrays.
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP
and does one point at the start of the area
or at the address of the first item to process?
Concerning the name +DO, this is taken in Gforth since at least
Gforth-0.2 (1996) for entering a loop only if index<limit (signed comparison), without providing a stride.
dxf wrote:
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Scratch with the chickens, don't fly with the eagles! ;-)
So [Algol68] nil + reference takes the same place as NULL + pointer in C.
You are supposed to test for this case, but if you fail you get a "Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
albert@spenarnc.xs4all.nl writes:
So [Algol68] nil + reference takes the same place as NULL + pointer in c.
I'm unfamiliar with Algol68 but if every reference in it can be set to
nil, that sounds like the same error that Algol-W had. The alternative,
using an option value, means: 1) if the reference is not wrapped by an
option type, then it is guaranteed to not be null; 2) if it is wrapped
by an option type, then the compiler can stop you (or at least warn you)
if you try to dereference without first checking that it is non-null.
You are supposed to test for this case, but if you fail you get a
"Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
For sure, it is usually better to crash than to keep running and give
nonsense answers. Of course that usually requires a hardware fault on
dereferencing a null pointer, rather than giving whatever is at location
0 in memory like on unprotected machines.
Beyond not giving wrong answers, it's usually nice if your program
doesn't crash too often, especially from program bugs. Getting help
from the compiler for that is often useful.
On 14/03/2024 1:15 am, minforth wrote:
dxf wrote:
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Scratch with the chickens, don't fly with the eagles! ;-)
A loop that needs more than one test and one branch is already
inefficient so chickens it is :)
Algol68 doesn't crash. It gives a run time error of the type
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
- DOxxx performs the loop
- Indices are integers.
- forms of DO
one-bound {BODY} DO) \ 0 ... one-bound-1
one-bound {BODY} DO] \ 1 ... one-bound
b1 b2 {BODY} DO[] \ b1 .. b2
b1 b2 stride {BODY} DO[..] \ b1 b1+stride b1+2*stride .. b2
Maybe
b1 b2 {BODY} DO[) \ b1 .. b2-1
to accommodate
array length OVER + {BODY} DO[)
Note the stride is now constant obviously.
If it is negative, the loop goes down.
If you want to straddle from positive to negative (addresses?),
program it explicitly and conspicuously.
Note 1
The [ ) convention comes from mathematics, example:
[1,9] interval 1 2 3 4 5 6 7 8 9
[1,9) interval 1 2 3 4 5 6 7 8
(0,9) interval 1 2 3 4 5 6 7 8
Note 2
{BODY} leans heavily on [: ;] presence. (Or ciforth's { } )
Note 3
If you want to change the stride mid-program, you have to
use BEGIN WHILE REPEAT, as you should have done in the first place.
The four DO's replace the four don't's : ?DO DO LOOP +LOOP .
albert@spenarnc.xs4all.nl writes:
Algol68 doesn't crash. It gives a run time error of the type
Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
No idea about Algol68 but in (at least some) other languages, the idea
of having references instead of pointers is that it is impossible to
create an uninitialised reference.
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
in cleaner code and improves memory safety.
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays?
XZ14 (or TO XZ14) writes top matrix to array value XZ14, et cetera.
What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays? What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
So I regularly use either xVALUEs (x means different data types) or data
objects (for compound or dynamic types) with access methods. This results
in cleaner code and improves memory safety.
Yes I should start doing that too. I only mess with Forth for fun
though. I feel like it helps me stay sharp compared with safer
languages, even including C. I'm not old enough to have written
significant amounts of machine code.
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays? What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
That's something I'd do for VALUEs should I move to omit the numeric
prefix at creation. By automatically initializing VALUEs with 0, I can pretend - if only to myself - that VALUEs are different from VARIABLEs.
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
minforth@gmx.net (minforth) writes:
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
2VALUE is standard.
dxf wrote:
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
That's something I'd do for VALUEs should I move to omit the numeric
prefix at creation. By automatically initializing VALUEs with 0, I can
pretend - if only to myself - that VALUEs are different from VARIABLEs.
Indeed, if you only work with integers in cell size, VARIABLEs and some
code discipline are sufficient.
VALUEs are like variants in VBA. You can only change them with TO <NAME>,
and TO (alias =>) is the same for all data types. The standard also uses TO for locals and FVALUEs. Non-standard $VALUEs (for dynamic strings) or DVALUEs/ZVALUEs can be very practical too. I also use range-limited VALUEs. None of this works with VARIABLEs.
When you implement your type-specific TO variants with built-in
appropriate checking, you are on the safer side.
Interesting. I didn't know that the TO concept was coined by Moore (before Bartholdi).
-marcel
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs. f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
On 05/03/2024 14:03, minforth wrote:
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs.
f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
And then we're not even talking about what's in use and for sale today, but rather what will be in use over the next six decades. Most of the historical peculiarities that are eliminated with more complex hardware instead of longer software can be expected to be present at some point during that period, because more complex hardware is already a difficult problem for information security, and I'd expect those peculiarities wouldn't have been present if there weren't some efficiency to be gained.