On 10-03-2024 10:56, Paul Rubin wrote:
...
That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.
That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, lets limit cells to 16 bits". "Yeah, lets LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, lets introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, lets set 1 CHARS to a single address unit", "Yeah, lets abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".
It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
As far as I can tell it was a design choice for DEC Alpha and RISC-V.
Apparently flags are detrimental to parallelism.
You can't call that a failure because you don't like it.
No / not yet?
"The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."
You will need >6 parallel multi-precision additions before the two
carry flags of AMD64 with ADX are theoretically more limiting than the
MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
RISC-V implementation needs to be extremely wide (>36 instructions per
cycle) and the precision must be extremely high (to eliminate overlap
between chains as an issue).
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
2+2=5 is also deterministic yet wrong.
In Java 2+2 gives 4. What do you hope to gain by putting up straw men?
2+2=5 is obviously wrong and Java doesn't go quite that far. Java
instead insists that you can add two positive integers and get a
negative one. That's wrong the same way that 2+2=5 is.
It just doesn't
mess up actual programs as often, because the numbers involved are
bigger.
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
Tony Hoare in 2009 said about null pointers:
Java-style wraparound
arithmetic is more of the same. A bug magnet,
Java also has null pointers, another possible mistake. Ada doesn't have them,
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
On 11/03/2024 2:37 am, Hans Bezemer wrote:
On 10-03-2024 10:56, Paul Rubin wrote:
...
That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.
That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, lets limit cells to 16 bits". "Yeah, lets LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, lets introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, lets set 1 CHARS to a single address unit", "Yeah, lets abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".
It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.
At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)
wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
Not at all. Modular arithmetic is not arithmetic in Z, but it's a commutative ring and has the nice properties of this algebraic
structure.
but even that works surprisingly well, so well that the RISC-V
designers have not seen a need to include an efficient way to detect
those cases where the result deviates from that in Z.
Still, the nice algebraic properties of modular arithmetic can be of
benefit even in such cases.
... 64 bit machine
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
It's how Java's int and long types work.
And if you want something closer to Z, Java also has BigInteger.
Tony Hoare in 2009 said about null pointers:
And the relevance is?
Java-style wraparound arithmetic is more of the same. A bug magnet,
Unsupported claim.
I think I saw the unintended result on a 32-bit machine
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
The fact that Java idiomatics is to implement trees and linked lists
not in the object-oriented way I outlined above
...
Another thing, if I run the same integer calculation on two machines, at least programmed in a HLL, I should expect the same result on both. But
if the word sizes are different then the results will be different. (If
one or both crash due to implementation restrictions such as machine overflow, that's annoying, but it's better than getting wrong answers).
Paul Rubin <no.email@nospam.invalid> writes:
<SNIP>
Java also has null pointers, another possible mistake. Ada doesn't have them,
Ada certainly has null.
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
- anton
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).
Forth does not have an inherently unbounded input word like C's
gets(). And even typical C environments warn you when you compile
this code; e.g., when I compile it on Debian 11, I get:
gcc xxx.c
|xxx.c: In function ‘GetInput’:
|xxx.c:12:10: warning: implicit declaration of function ‘gets’; did
you mean ‘fgets’? [-Wimplicit-function-declaration]
| 12 | gets(buffer);
| | ^~~~
| | fgets
|/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
|xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and
|should not be used.
So, they removed gets() from stdio.h, and added a warning to the
linker. "man gets" tells me:
|_Never use this function_
|[...]
|ISO C11 removes the specification of gets() from the C language, and
|since version 2.16, glibc header files don't expose the function
|declaration if the _ISOC11_SOURCE feature test macro is defined.
- anton
On 11-03-2024 06:26, dxf wrote:
On 11/03/2024 2:37 am, Hans Bezemer wrote:
On 10-03-2024 10:56, Paul Rubin wrote:
...
That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.
That's exactly the attitude that some people have down here. Just squat the problem without properly thinking it through. "Yeah, lets limit cells to 16 bits". "Yeah, lets LOOP 'fall through' and examine every single integer possible before stopping", "Yeah, lets introduce ?DO. It's not gonna solve much, but it looks good", "Yeah, lets set 1 CHARS to a single address unit", "Yeah, lets abuse the weird behavior of MOVE when it overlaps and make it into a feature, because it's so neat".
It's the kind of design decision making that is sold as "pragmatic", but actually is lazy and sloppy.
At this point in time there's no way ?DO can be wrested away from forthers. They'll point to all the memory errors it has prevented :)
Yeeaaah - and NO! In order to make an informed decision you have to know in which direction the loop will be progressing. And in Forth, you don't know that. Worse, with a classical "DO" you don't do anything. You just put a few items on the return stack. The *real* decision is made by "+LOOP" (or "LOOP"). "?DO" introduces a *SECOND* word that makes a decision. If I had my way, "LOOP" would be dumb - and just jump back, leaving some component of "DO" to make the ultimate decision (because it can't be a single word).
In a perfect world I'd have a word:
- That puts *three* parameters on the stack: limit, start and step;
- That evaluates these three parameters and leaves a flag
- That takes this flag and skips the loop if zero.
Let's call the word that initializes these actions "+DO". +DO equals ( limit index step -- R: limit index step)
"DO" would become : DO 1 postpone +DO ;
It would function like a BASIC "FOR" and have just about the same behavior - as far as BASIC's "FOR" has sane behavior. That's open for discussion ;-)
Sure it'd overload the return stack even more and affect I, I' and J
but:
10 0 -1 +DO (..) LOOP
Would not run. Neither would:
-10 0 DO (..) LOOP
Nor:
0 0 DO (..) LOOP
I'd consider that sane behavior.
In a perfect world I'd have a word:
- That puts *three* parameters on the stack: limit, start and step;
- That evaluates these three parameters and leaves a flag
- That takes this flag and skips the loop if zero.
Let's call the word that initializes these actions "+DO". +DO equals (
limit index step -- R: limit index step)
Compare: https://rosettacode.org/wiki/Loops/Wrong_ranges#uBasic/4tH
To the rather weak: https://rosettacode.org/wiki/Loops/Wrong_ranges#Forth
Note that 4tH behaves differently here. It catches most of the exceptional
situations:
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2
start: -2 stop: 2 inc: -1 | -2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2
start: 2 stop: 2 inc: 1 | 2
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2
start: 0 stop: 0 inc: 0 | 0
Versus:
Some of these loop infinitely, and some under/overflow, so for the sake
of brevity long outputs will be truncated by ....
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
start: 2 stop: 2 inc: 1 | 2 3 4 5 ... -2 -1 0 1
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2 2 2 2 2 ...
start: 0 stop: 0 inc: 0 | 0 0 0 0 0 ...
I still don't think 4tH's performance is perfect, but it's a tradeoff
between compatibility and intuitive behavior.
Recent additions to Gforth are MEM+DO and MEM-DO, with the run-time
stack effects
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
which is paired with LOOP. Both produce the same addresses (if ubytes
is a multiple of +nstride), but MEM-DO in reverse order.
.. NEXT and <FOR .. NEXT \ index N for 1-dim vectors
.. NEXT and <<FOR .. NEXT \ indices X Y for 2-dim arrays.
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP
and does one point at the start of the area
or at the address of the first item to process?
Concerning the name +DO, this is taken in Gforth since at least
Gforth-0.2 (1996) for entering a loop only if index<limit (signed comparison), without providing a stride.
dxf wrote:
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Scratch with the chickens, don't fly with the eagles! ;-)
So [Algol68] nil + reference takes the same place as NULL + pointer in C.
You are supposed to test for this case, but if you fail you get a "Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
albert@spenarnc.xs4all.nl writes:
So [Algol68] nil + reference takes the same place as NULL + pointer in c.
I'm unfamiliar with Algol68 but if every reference in it can be set to
nil, that sounds like the same error that Algol-W had. The alternative,
using an option value, means: 1) if the reference is not wrapped by an
option type, then it is guaranteed to not be null; 2) if it is wrapped
by an option type, then the compiler can stop you (or at least warn you)
if you try to dereference without first checking that it is non-null.
You are supposed to test for this case, but if you fail you get a
"Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
For sure, it is usually better to crash than to keep running and give
nonsense answers. Of course that usually requires a hardware fault on
dereferencing a null pointer, rather than giving whatever is at location
0 in memory like on unprotected machines.
Beyond not giving wrong answers, it's usually nice if your program
doesn't crash too often, especially from program bugs. Getting help
from the compiler for that is often useful.
On 14/03/2024 1:15 am, minforth wrote:
dxf wrote:
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Scratch with the chickens, don't fly with the eagles! ;-)
A loop that needs more than one test and one branch is already
inefficient so chickens it is :)
Algol68 doesn't crash. It gives a run time error of the type
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
- DOxxx performs the loop
- Indices are integers.
- forms of DO
one-bound {BODY} DO) \ 0 ... one-bound-1
one-bound {BODY} DO] \ 1 ... one-bound
b1 b2 {BODY} DO[] \ b1 .. b2
b1 b2 stride {BODY} DO[..] \ b1 b1+stride b1+2*stride .. b2
Maybe
b1 b2 {BODY} DO[) \ b1 .. b2-1
to accommodate
array length OVER + {BODY} DO[)
Note the stride is now constant obviously.
If it is negative, the loop goes down.
If you want to straddle from positive to negative (addresses?),
program it explicitly and conspicuously.
Note 1
The [ ) convention comes from mathematics, example:
[1,9] interval 1 2 3 4 5 6 7 8 9
[1,9) interval 1 2 3 4 5 6 7 8
(0,9) interval 1 2 3 4 5 6 7 8
Note 2
{BODY} leans heavily on [: ;] presence. (Or ciforth's { } )
Note 3
If you want to change the stride mid-program, you have to
use BEGIN WHILE REPEAT, as you should have done in the first place.
The four DO's replace the four don't's : ?DO DO LOOP +LOOP .
albert@spenarnc.xs4all.nl writes:
Algol68 doesn't crash. It gives a run time error of the type
Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
No idea about Algol68 but in (at least some) other languages, the idea
of having references instead of pointers is that it is impossible to
create an uninitialised reference.
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
in cleaner code and improves memory safety.
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays?
XZ14 (or TO XZ14) writes top matrix to array value XZ14, et cetera.
What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays? What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
So I regularly use either xVALUEs (x means different data types) or data
objects (for compound or dynamic types) with access methods. This results
in cleaner code and improves memory safety.
Yes I should start doing that too. I only mess with Forth for fun
though. I feel like it helps me stay sharp compared with safer
languages, even including C. I'm not old enough to have written
significant amounts of machine code.
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays? What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
That's something I'd do for VALUEs should I move to omit the numeric
prefix at creation. By automatically initializing VALUEs with 0, I can pretend - if only to myself - that VALUEs are different from VARIABLEs.
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
minforth@gmx.net (minforth) writes:
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
2VALUE is standard.
dxf wrote:
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
That's something I'd do for VALUEs should I move to omit the numeric
prefix at creation. By automatically initializing VALUEs with 0, I can
pretend - if only to myself - that VALUEs are different from VARIABLEs.
Indeed, if you only work with integers in cell size, VARIABLEs and some
code discipline are sufficient.
VALUEs are like variants in VBA. You can only change them with TO <NAME>,
and TO (alias =>) is the same for all data types. The standard also uses TO for locals and FVALUEs. Non-standard $VALUEs (for dynamic strings) or DVALUEs/ZVALUEs can be very practical too. I also use range-limited VALUEs. None of this works with VARIABLEs.
When you implement your type-specific TO variants with built-in
appropriate checking, you are on the safer side.
Interesting. I didn't know that the TO concept was coined by Moore (before Bartholdi).
-marcel
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs. f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
On 05/03/2024 14:03, minforth wrote:
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs.
f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
And then we're not even talking about what's in use and for sale today, but rather what will be in use over the next six decades. Most of the historical peculiarities that are eliminated with more complex hardware instead of longer software can be expected to be present at some point during that period, because more complex hardware is already a difficult problem for information security, and I'd expect those peculiarities wouldn't have been present if there weren't some efficiency to be gained.