I wonder: wouldn't it be useful to have stackless basic
arithmetic operations? I mean, instead of fetching the values
first and putting them on the stack, then doing something,
and in the end storing the result somewhere, wouldn't it
be practical to use the variables directly?
But after I came up with this idea I realized someone
surely invented it before - it looks so obvious - yet
I haven't seen it anywhere.
Has any of you seen something
like this in any code?
If so, why hasn't such a
solution become widespread?
Looks good to me; math can be done completely in ML,
avoiding "Forth machine" engagement, therefore saving many
cycles.
With respect, the more important questions are:
For what type of machine?
Desktop or embedded?
Minimal kernel only or full standard compliant?
Hobby or professional support/service required?
But to mention another example:
https://mecrisp.sourceforge.net/#
Probably because the case where the two operands
of a + are in memory, and the result is needed
in memory is not that frequent.
I don't think that it would be faster or shorter to use
memory-to-memory operations here. That's also why the VAX died: RISCs
just outperformed it.
I wonder: wouldn't it be useful to have stackless basic
arithmetic operations? I mean, instead of fetching the values
first and putting them on the stack, then doing something,
and in the end storing the result somewhere, wouldn't it
be practical to use the variables directly? Like this:
: +> ( addr1 addr2 addr3 -- )
rot @ rot @ + swap ! ;
Of course the above is just an illustration; I mean coding
such a word directly in ML. It should be significantly
faster than going through the stack the usual way.
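For illustration, a possible use of such a word might look like this (a sketch; the variable names are made up):

variable a  variable b  variable c
3 a !  4 b !
a b c +>      \ fetches a and b, adds them, stores the sum into c
c @ .         \ prints 7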
A set of three addresses on the stack is messy even before
one does anything with them.
Probably because the case where the two operands
of a + are in memory, and the result is needed
in memory is not that frequent.
One example could be matrix multiplication.
It's a rather trivial but cumbersome operation,
where usually a few transitional variables are
used to maintain clarity of the code.
Probably "bigger" Forth compilers are indeed
already "too good" for the difference to be
(practically) noticeable — still maybe for
simpler Forths, I mean like the ones for DOS
or even for 8-bit machines it would make sense?
One example could be matrix multiplication.
It's a rather trivial but cumbersome operation,
where usually a few transitional variables are
used to maintain clarity of the code.
Earlier you wrote about performance, now you switch to clarity of the
code. What is the goal?
If we stick with performance, the fastest version in
[..]
Forth was designed for small machines and very simple implementations.
We have words like "1+" that are beneficial in that setting. We also
have "+!", which is the closest to what you have in mind. But even in
those times nobody went for a word like "+> ( addr1 addr2 addr3 -- )",
because it is not useful often enough.
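As a side note, standard +! already covers the most common memory-operand case, adding directly into a cell (a minimal sketch of its use):

variable total
5 total !       \ total = 5
3 total +!      \ add 3 directly into the cell, no separate @ ... ! pair
total @ .       \ prints 8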
Earlier you wrote about performance, now you switch to clarity of the
code. What is the goal?
Both - one isn't contrary to the other.
What I have in mind is: by performing an OOS operation
we don't have to employ the whole "Forth machine" to
do the usual things (I mean, of course, the usual
steps described by Brad Rodriguez in his "Moving
Forth" paper).
It comes at a cost: usual Forth words, which use
the stack, are versatile, while such OOS words
aren't that versatile anymore - yet (at least in
the case of ITC non-optimizing Forths) they should
be faster.
Clarity of the code comes as a "bonus" :) yes, we've
got VALUEs and I use them when needed, but their use
still means employing the "Forth machine".
Earlier you wrote about performance, now you switch to clarity of the
code. What is the goal?
Both - one isn't contrary to the other.
Sometimes the clearer code is slower and the faster code is less clear
(as in the FAXPY-NOSTRIDE example).
What I have in mind is: by performing an OOS operation
we don't have to employ the whole "Forth machine" to
do the usual things (I mean, of course, the usual
steps described by Brad Rodriguez in his "Moving
Forth" paper).
What does "OOS" stand for?
What do you mean by "the usual steps"? I
am not going to read the whole paper and guess which of the code shown
there you have in mind.
[..]
Clarity of the code comes as a "bonus" :) yes, we've
got VALUEs and I use them when needed, but their use
still means employing the "Forth machine".
What do you mean by 'the "Forth machine"', and how does "OOS"
(whatever that is) avoid it?
But, SQUARE is a high-level "colon" definition… [..]” etc.
( https://www.bradrodriguez.com/papers/moving1.htm )
Many of these steps in particular cases can be avoided
by the use of the proposed OOS words, making (at least sometimes)
the Forth program faster - and, as a kind of "bonus", the clarity
of the code increases.
But, SQUARE is a high-level "colon" definition… [..]” etc.
( https://www.bradrodriguez.com/papers/moving1.htm )
Many of these steps in particular cases can be avoided
by the use of the proposed OOS words, making (at least sometimes)
the Forth program faster - and, as a kind of "bonus", the clarity
of the code increases.
After having avoided premature optimisation, every 'decent'
Forth programmer will recode a few bottleneck words, e.g.
in assembler, where necessary. IOW microbenchmarking SQUARE,
which can be implemented in a handful of lines of machine code
or less, does not bring new insights.
I mean the description of how the "Forth machine" works:
"Assume SQUARE is encountered while executing some other Forth word.
Forth's Interpreter Pointer (IP) will be pointing to a cell in memory -- >contained within that "other" word -- which contains the address of the
word SQUARE. (To be precise, that cell contains the address of SQUARE's
Code Field.) The interpreter fetches that address, and then uses it to
fetch the contents of SQUARE's Code Field. These contents are yet
another address -- the address of a machine language subroutine which >performs the word SQUARE. In pseudo-code, this is:
(IP) -> W fetch memory pointed by IP into "W" register
...W now holds address of the Code Field
IP+2 -> IP advance IP, just like a program counter
(assuming 2-byte addresses in the thread)
(W) -> X fetch memory pointed by W into "X" register
...X now holds address of the machine code
JP (X) jump to the address in the X register
This illustrates an important but rarely-elucidated principle: the
address of the Forth word just entered is kept in W. CODE words don't
need this information, but all other kinds of Forth words do.
If SQUARE were written in machine code, this would be the end of the
story: that bit of machine code would be executed, and then jump back to
the Forth interpreter -- which, since IP was incremented, is pointing to
the next word to be executed. This is why the Forth interpreter is
usually called NEXT.
But, SQUARE is a high-level "colon" definition… [..]” etc.
( https://www.bradrodriguez.com/papers/moving1.htm )
Many of these steps in particular cases can be avoided
by the use of the proposed OOS words, making (at least sometimes)
the Forth program faster - and, as a kind of "bonus", the clarity
of the code increases.
Probably in the case of an "optimizing compiler" the gain
may not be too significant, from what I've already learned
here; still, in the case of simpler compilers - and maybe
especially the ones created for CPUs not
that suitable for Forth at all (lack of registers, like
the 8051, for example) - it may be advantageous.
By the "Forth machine" I mean that internal work of the
Forth compiler - see the above quote from Brad's paper
- and when we don't need to "fetch memory pointed by
IP into "W" register, advance IP, just like a program
counter" etc. etc. — replacing the whole process,
(which is repeated for each subsequent word again and
again) by a short string of ML instructions — we should
note significant gain in the processing speed.
: +> ( addr1 addr2 addr3 -- )
rot @ rot @ + swap ! ;
Of course the above is just an illustration; I mean coding
such a word directly in ML. It should be significantly
faster than going through the stack the usual way.
A set of three addresses on the stack is messy even before
one does anything with them.
Yep, but I meant the case of, for example:
var1 @ var2 @ + var3 !
The above isn't messy at all.
So IMHO by using such an OOS (out-of-stack) operation - coded
directly in ML - we can replace the above by:
var1 var2 var3 +>
...
In the case of slower ITC non-optimizing Forths - like
fig-Forth, as the most obvious example - the "boost"
may be noticeable.
I'll check that.
In the case of slower ITC non-optimizing Forths - like
fig-Forth, as the most obvious example - the "boost"
may be noticeable.
I'll check that.
code +> ( x y z -- )
   dx pop  cx pop  bx pop      \ dx = addr3, cx = addr2, bx = addr1
   0 [bx] ax mov  cx bx xchg   \ ax = [addr1]; bx <-> cx, so bx = addr2
   0 [bx] ax add  dx bx xchg   \ ax = ax + [addr2]; bx = addr3
   ax 0 [bx] mov  next         \ [addr3] = ax; fall into NEXT
end-code
Timing (adjusted for loop time):
var1 @ var2 @ + var3 !    8019 ms
var1 var2 var3 +>         5657 ms
What Rodriguez describes above is NEXT. As I mentioned in the earlier
posting, using a VM with VM registers reduces the number of NEXTs
executed, but if you go for dynamic superinstructions or native-code
compilation, the number of NEXTs is reduced even more. And this can
be done while still working with ordinary Forth code, no OOS needed.
And these kinds of compilers can be done with relatively little
effort.
Probably in the case of an "optimizing compiler" the gain
may not be too significant, from what I've already learned
here; still, in the case of simpler compilers - and maybe
especially the ones created for CPUs not
that suitable for Forth at all (lack of registers, like
the 8051, for example) - it may be advantageous.
I cannot speak about the 8051, but machine Forth is a simple
native-code system and it's stack-based.
By the "Forth machine" I mean that internal work of the
Forth compiler - see the above quote from Brad's paper
- and when we don't need to "fetch memory pointed by
IP into "W" register, advance IP, just like a program
counter" etc. etc. — replacing the whole process,
(which is repeated for each subsequent word again and
again) by a short string of ML instructions — we should
note significant gain in the processing speed.
Yes, dynamic superinstructions provide a good speedup for Gforth, and
native-code systems also show a good speedup compared to classic
threaded-code systems. But it's not necessary to eliminate the stack
for that. Actually dealing with the stack is orthogonal to
threaded code vs. native code.
I wonder: wouldn't it be useful to have stackless basic
arithmetic operations? I mean, instead of fetching the values
first and putting them on the stack, then doing something,
and in the end storing the result somewhere, wouldn't it
be practical to use the variables directly? Like this:
: +> ( addr1 addr2 addr3 -- )
rot @ rot @ + swap ! ;
Of course the above is just an illustration; I mean coding
such a word directly in ML. It should be significantly
faster than going through the stack the usual way.
But after I came up with this idea I realized someone
surely invented it before - it looks so obvious - yet
I haven't seen it anywhere. Has any of you seen something
like this in any code? If so, why hasn't such a
solution become widespread?
Looks good to me; math can be done completely in ML,
avoiding "Forth machine" engagement, therefore saving many
cycles.
----
In the case of slower ITC non-optimizing Forths - like
fig-Forth, as the most obvious example - the "boost"
may be noticeable.
I'll check that.
code +> ( x y z -- )
dx pop cx pop bx pop 0 [bx] ax mov cx bx xchg
0 [bx] ax add dx bx xchg ax 0 [bx] mov next
end-code
Timing (adjusted for loop time):
var1 @ var2 @ + var3 !    8019 ms
var1 var2 var3 +>         5657 ms
So even in the case of a fast DTC Forth, like DX Forth,
it's already something worthy of closer attention,
I believe.
I expect an even bigger gain in the case of the older
fig-Forth model.
--
I agree with you - still, it does take a decent Forth programmer.
Recall the ones described by Jeff Fox? Those Forth programmers
who refused to use Machine Forth just because "they were hired
to program in ANS Forth"?
I don't believe they would have been able to recode anything in
assembler - and note, that was about 30 years ago. Since then
assembler programming has become even less popular.
A bit off-topic: I have been in a similar situation when some of
our service engineers were very reluctant to modify inner
software parts of controllers. The guys were not dumb, but with
such modifications comes responsibility when something unexpected
happens, like a system crash. So it was more of a legal than a
technical issue.
I have done some work on optimisation on ciforth.
This work has stalled, but the infamous byte prime benchmark
was in the ballpark of swiftforth and mpeforth.
(Disingenuous, because this was the example I used.)
See https://home.hccnet.nl/a.w.m.van.der.horst/forthlecture5.html
This is about folding, a generalisation of constant folding.
This requires that you know the properties of the Forth Words,
i.e. that you can execute + at compile time, if the inputs
are constant.
[..]
Then I got stalled. I introduced complicated rules [..]
A bit off-topic: I have been in a similar situation when some of
our service engineers were very reluctant to modify inner
software parts of controllers. The guys were not dumb, but with
such modifications comes responsibility when something unexpected
happens, like a system crash. So it was more of a legal than a
technical issue.
Yes, I'm aware the reason may be different in a different
case; still, Jeff portrayed that situation in a rather clear way:
they didn't want to use Machine Forth just because "they
were paid for ANS Forth programming"; they had signed a kind of
agreement to that effect, and therefore they "weren't interested" in
any changes, etc.
Unfortunately we won't have any opportunity anymore to ask
Jeff for more details.
--
var1 @ var2 @ + var3 !
The above isn't messy at all.
I wonder: wouldn't it be useful to have stackless basic
arithmetic operations? I mean, instead of fetching the values
first and putting them on the stack, then doing something,
and in the end storing the result somewhere, wouldn't it
be practical to use the variables directly?
No. Because you should minimize the use of variables. So if you're
using THREE variables, you're definitely doing something VERY WRONG.
I know nothing about Machine Forth.
BTW: is it available for download anywhere (if not
commercial/restricted)?
Now I'm pondering the DO..LOOP construct; actually
it probably doesn't necessarily need to rely on the
return stack.
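One hedged sketch of that direction, using only standard words: a counted loop built from BEGIN ... WHILE ... REPEAT whose index lives in a variable instead of on the return stack (the word COUNT-UP is made up for illustration):

variable i
: count-up ( n -- )        \ print 0 ... n-1 without DO..LOOP
  0 i !
  begin  i @ over <  while
    i @ .  1 i +!
  repeat  drop ;

5 count-up   \ prints 0 1 2 3 4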
I wonder: wouldn't it be useful to have stackless basic
arithmetic operations? I mean, instead of fetching the values
first and putting them on the stack, then doing something,
and in the end storing the result somewhere, wouldn't it
be practical to use the variables directly?
No. Because you should minimize the use of variables. So if you're
using THREE variables, you're definitely doing something VERY WRONG.
If you wanna write Forth, write Forth. If you wanna write C, write C.
If you can't handle a stack, you're definitely a C programmer. It's very simple..
V1 @ V3 ! V2 @ V1 ! V3 @ V2 ! - 40s 150ms
V1 V2 :=: - 15s 260ms
So there is a noticeable difference indeed.
zbigniew2011@gmail.com (LIT) writes:
V1 @ V3 ! V2 @ V1 ! V3 @ V2 ! - 40s 150ms
Too much OOS thinking? Try
V1 @ V2 @ V1 ! V2 !
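Traced step by step (v1 and v2 denote the initial contents of the cells), this sequence swaps the two cells without any temporary:

V1 @   \ ( v1 )
V2 @   \ ( v1 v2 )
V1 !   \ ( v1 )  V1 now holds v2
V2 !   \ ( )     V2 now holds v1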
V1 V2 :=: - 15s 260ms
So there is a noticeable difference indeed.
The question is how often you use these new words in applications.
In the case of slower ITC non-optimizing Forths - like
fig-Forth, as the most obvious example - the "boost"
may be noticeable.
I'll check that.
code +> ( x y z -- )
dx pop cx pop bx pop 0 [bx] ax mov cx bx xchg
0 [bx] ax add dx bx xchg ax 0 [bx] mov next
end-code
Timing (adjusted for loop time):
var1 @ var2 @ + var3 !    8019 ms
var1 var2 var3 +>         5657 ms
So even in the case of a fast DTC Forth, like DX Forth,
it's already something worthy of closer attention,
I believe.
...
So I did some quite basic testing with x86
fig-Forth for DOS. I devised 4 OOS words:

:=: (exchange values between two variables)
pop BX
pop DI
mov AX,[BX]
xchg AX,[DI]
mov [BX],AX
jmp NEXT

++ (increment variable by one)
pop BX
inc WORD PTR [BX]
jmp NEXT

-- (similar to the above, just uses DEC - not tested, it'll give the same
result)

+> (add two variables, then store the result into a third one)
pop DI
pop BX
mov CX,[BX]
pop BX
mov AX,[BX]
add AX,CX
mov [DI],AX
jmp NEXT
How the simplistic tests were done:
7 VARIABLE V1
8 VARIABLE V2
9 VARIABLE V3
: TOOK ( t1 t2 -- )
DROP SPLIT TIME@ DROP SPLIT
ROT SWAP - CR ." It took " U. ." seconds and "
- 10 * U. ." milliseconds "
;
: TEST1
1000 0 DO 10000 0 DO
...expression...
LOOP LOOP
;
0 0 TIME! TIME@ TEST1 TOOK
The results are (for the following expressions):
V1 @ V2 @ + V3 ! - 25s 430ms
V1 V2 V3 +> - 17s 240ms
1 V1 +! - 14s 60ms
V1 ++ - 10s 820ms
V1 @ V3 ! V2 @ V1 ! V3 @ V2 ! - 40s 150ms
V1 V2 :=: - 15s 260ms
So there is a noticeable difference indeed.
I remain skeptical of such optimizations. Not even twice the
performance, and one has to hope it represents a bottleneck in
order to realize that gain.
A potential alternative is
a pair of operations, say PUSH and POP, and a Forth compiler
that replaces a pair like V1 @ by PUSH(V1). Note that here
the address of V1 is intended to be part of PUSH (so it will
take as much space as separate V1 and @, but it is only a
single primitive).
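A minimal high-level sketch of that pairing (the defining words PUSH: and POP: are hypothetical names): each defined accessor carries its variable's address in its body, so in a traditional ITC system a fetch or store costs one pass through NEXT instead of two:

: push: ( addr "name" -- ) create ,  does> ( -- x ) @ @ ;
: pop:  ( addr "name" -- ) create ,  does> ( x -- ) @ ! ;

variable v1
v1 push: v1@    v1 pop: v1!
7 v1!  v1@ .    \ prints 7

A real implementation would do the substitution in the compiler, and CODE-level versions would also avoid the DOES> overhead; this only illustrates the idea.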
More generally, a simple "optimizer" that replaces short
sequences of Forth primitives by a different, shorter sequence
of primitives is likely to give a similar gain. However, the
chance of a match decreases with the length of the sequence.
Above you bet on relatively long sequences (and on the programmer
writing the alternative sequence). Shorter sequences have more
chance of matching, so you need a smaller number of them
for a similar gain.
One can
do better than using the machine stack, namely keeping things in
registers, but that means generating machine code and doing
optimization.
Forgive me for being contrary, but IMHO use of locals
is much more C-ish than the use of "as many as" three
(OMG!) variables in a single program. ;)
In the case of slower ITC non-optimizing Forths - like
fig-Forth, as the most obvious example - the "boost"
may be noticeable.
I'll check that.
code +> ( x y z -- )
dx pop cx pop bx pop 0 [bx] ax mov cx bx xchg
0 [bx] ax add dx bx xchg ax 0 [bx] mov next
end-code
Timing (adjusted for loop time):
var1 @ var2 @ + var3 !    8019 ms
var1 var2 var3 +>         5657 ms
So even in the case of a fast DTC Forth, like DX Forth,
it's already something worthy of closer attention,
I believe.
I expect an even bigger gain in the case of the older
fig-Forth model.
--
zbigniew2011@gmail.com (LIT) writes:
V1 @ V3 ! V2 @ V1 ! V3 @ V2 ! - 40s 150ms
Too much OOS thinking? Try
V1 @ V2 @ V1 ! V2 !
V1 V2 :=: - 15s 260ms
So there is a noticeable difference indeed.
The question is how often you use these new words in applications.
- anton
I remain skeptical of such optimizations. Not even twice the
performance, and one has to hope it represents a bottleneck in
order to realize that gain.
I've got a feeling it would have more
significance in the 8088 era, say the IBM 5150
or XTs. A 486 is probably already "too good"
to show as much as a 50% gain.
I've got a working XT board - if I manage to
get at least an FDD interface for it (no,
not today... it'll take some time) I'll
do some more testing.
Save yourself the time: use an emulator, e.g. PCem, DOSBox(X) or QEMU.
If I apply that rule, I wrote an entire 1000+ line BASIC interpreter
using *three* variables (stack frame pointer, partition pointer and a
counter on the number of currently emitted characters on a line -
TAB() remember?).
These words might make sense connected to a sorting application. 1]
Define those words there and don't clobber the global name space.
: :=: ( a b -- ) \ exchange values among two variables
OVER @ >R DUP @ ROT ! R> SWAP ! ;
These words, as I already wrote, were just examples to illustrate
the approach, which isn't limited to operations commonly associated
with sorting kind of work.
I also created ROR/ROL words, which have nothing to do with any
sorting processes:
These words, as I already wrote, were just examples to illustrate the
approach, which isn't limited to operations commonly associated
with sorting kind of work.
I also created ROR/ROL words, which have nothing to do with any
sorting processes:
I mean, that's the whole thing with Forth - you *can* define any words
you like, based on your needs, extending the basic set of operations
into a whole domain-specific language suited to the problem you're
trying to solve. But that's not in itself a strong argument for adding
XYZ to the "standard" dictionary.*
mhx@iae.nl (mhx) writes:
: :=: ( a b -- ) \ exchange values among two variables
OVER @ >R DUP @ ROT ! R> SWAP ! ;
: :=: ( addr1 addr2 -- )
OVER @ >R DUP @ ROT ! R> SWAP ! ;
I didn't create BASIC interpreters, but I'm afraid
only rather trivial programs can "live" without a
handful of variables.
I won't call a BASIC interpreter trivial.
For example: how do you create
even a modest (screen-oriented) editor without adding
several variables that reflect its state - where the
cursor is at the moment, what the filename in use is,
what the values of the user's settings/preferences are -
etc., etc.?
For example: how do you create
even a modest (screen-oriented) editor without adding
several variables that reflect its state - where the
cursor is at the moment, what the filename in use is,
what the values of the user's settings/preferences are -
etc., etc.?
Like I said - arrays are a different thing.
These words, as I already wrote, were just examples to illustrate the
approach, which isn't limited to operations commonly associated
with sorting kind of work.
I also created ROR/ROL words, which have nothing to do with any
sorting processes:
I mean, that's the whole thing with Forth - you *can* define any words
you like, based on your needs, extending the basic set of operations
into a whole domain-specific language suited to the problem you're
trying to solve. But that's not in itself a strong argument for adding
XYZ to the "standard" dictionary.*
If you could, please remind me when and where I was
proposing to add these XYZs to the standard dictionary?
Thanks in advance!
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
: :=: ( addr1 addr2 -- )
OVER @ >R DUP @ ROT ! R> SWAP ! ;
: ex ( a1 a2 -- ) 2>r 2r@ @ swap @ r> ! r> ! ;
looks a little simpler.
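For readers following along, here is the same definition with a sketch of the stack at each step (v1 and v2 are the initial contents of the cells):

: ex ( a1 a2 -- )
  2>r       \ ( )          R: a1 a2
  2r@       \ ( a1 a2 )    copies of both addresses
  @ swap @  \ ( v2 v1 )    both values fetched
  r> !      \ ( v2 )       v1 stored at a2
  r> ! ;    \ ( )          v2 stored at a1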
These words, as I already wrote, were just examples to illustrate the
approach, which isn't limited to operations commonly associated
with sorting kind of work.
I also created ROR/ROL words, which have nothing to do with any
sorting processes:
I mean, that's the whole thing with Forth - you *can* define any words
you like, based on your needs, extending the basic set of operations
into a whole domain-specific language suited to the problem you're
trying to solve. But that's not in itself a strong argument for adding
XYZ to the "standard" dictionary.*
If you could, please remind me when and where I was
proposing to add these XYZs to the standard dictionary?
Thanks in advance!
Then you have an existing application that demonstrates the benefit
after having examined and ruled out other ways of optimizing the code?
Oh, so it's a simpler way than anyone could guess:
"just use a different term, avoid the word 'variable'".
Done. :)
You expected me to "have an existing application..." etc. etc.
immediately after I came up with this idea? You mean: within
a matter of hours, literally?
I'd like to create one - unfortunately, I'm busy with other
things.
A lot of the libraries I wrote "just for fun" remain unused for exactly
that reason - I obviously never really needed them to begin with.
Oh, so it's a simpler way than anyone could guess:
"just use a different term, avoid the word 'variable'".
Done. :)
Oh dear, I hope you don't have a formal education in CS. If so, I'd ask
for my money back. You know - it's not a different term - it's a
different concept, with quite different characteristics.
Oh, so it's a simpler way than anyone could guess:
"just use a different term, avoid the word 'variable'".
Done. :)
Oh dear, I hope you don't have a formal education in CS. If so, I'd ask
for my money back. You know - it's not a different term - it's a
different concept, with quite different characteristics.
„In computer science, array is a data type that represents
a collection of elements (values or variables), each
selected by one or more indices”
Now feel free to go and ask for your money back.
Oh, so it's a simpler way than anyone could guess:
"just use a different term, avoid the word 'variable'".
Done. :)
Oh dear, I hope you don't have a formal education in CS. If so, I'd ask
for my money back. You know - it's not a different term - it's a
different concept, with quite different characteristics.
„In computer science, array is a data type that represents
a collection of elements (values or variables), each
selected by one or more indices”
Now feel free to go and ask for your money back.
Interesting.. In your class they taught computer science by Wikipedia?
Didn't they have money for real books? Must have been a real poor city college..
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
: :=: ( addr1 addr2 -- )
OVER @ >R DUP @ ROT ! R> SWAP ! ;
: ex ( a1 a2 -- ) 2>r 2r@ @ swap @ r> ! r> ! ;
looks a little simpler.
This inspires another one:
: exchange2 ( addr1 addr2 -- )
dup >r @ over @ r> ! swap ! ;
With some other versions this results in the following benchmark
program:
[defined] !@ [if]
: exchange ( addr1 addr2 -- )
over @ swap !@ swap ! ;
[then]
\ Paul Rubin <875xkwo5io.fsf@nightsong.com>
: ex ( addr1 addr2 -- )
2>r 2r@ @ swap @ r> ! r> ! ;
: ex-locals {: x y -- :} x @ y @ x ! y ! ;
\ Anton Ertl
: exchange2 ( addr1 addr2 -- )
dup >r @ over @ r> ! swap ! ;
\ Marcel Hendrix
: :=: ( addr1 addr2 -- )
OVER @ >R DUP @ ROT ! R> SWAP ! ;
variable v1
variable v2
1 v1 !
2 v2 !
: bench ( "name" -- )
v1 v2
:noname ]] 100000000 0 do 2dup [[ parse-name evaluate ]] loop ; [[
execute ;
Results (on Zen4):
gforth-fast (development):
          :=:      exchange            ex     ex-locals     exchange2
  814_881_277   879_389_133   928_825_521   875_574_895   808_543_975 cyc.
3_908_874_164 3_708_891_336 4_508_966_770 4_209_778_557 3_708_865_505 inst.
vfx64 5.43:
          :=:            ex     ex-locals     exchange2
  335_298_202   432_614_804   928_542_678   336_134_513 cyc.
1_166_400_242 1_366_264_943 2_866_547_067 1_166_280_641 inst.
And here's the code produced by gforth-fast:

[four-column native-code listing for :=:, ex, ex-locals and exchange2
omitted: the columns were garbled in transmission]
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
Results (on Zen4):
gforth-fast (development): ...
It's interesting how little difference there is with gforth-fast. Could
you also do gforth-itc?
exchange2 is a big win with VFX, suggesting its
optimizer could do better with some of the other versions.
On 27/02/2025 07:29, Anton Ertl wrote:...
\ Anton Ertl
: exchange2 ( addr1 addr2 -- )
dup >r @ over @ r> ! swap ! ;
Results (on Zen4):
gforth-fast (development):
          :=:      exchange            ex     ex-locals     exchange2
  814_881_277   879_389_133   928_825_521   875_574_895   808_543_975 cyc.
3_908_874_164 3_708_891_336 4_508_966_770 4_209_778_557 3_708_865_505 inst.
...
How does a crude definition not involving the R stack compare:
: ex3 over @ over @ 3 pick ! over ! 2drop ;
Oh, so it's a simpler way than anyone could guess:
"just use a different term, avoid the word 'variable'".
Done. :)
Oh dear, I hope you don't have a formal education in CS. If so, I'd ask
for my money back. You know - it's not a different term - it's a
different concept, with quite different characteristics.
„In computer science, array is a data type that represents
a collection of elements (values or variables), each
selected by one or more indices”
Now feel free to go and ask for your money back.
Interesting.. In your class they taught computer science by Wikipedia?
Didn't they have money for real books? Must have been a real poor city
college..
At least in that college they didn't teach that
„Forth uses FIFO stack” -- as they taught you in
your really rich city college. :]
Anything wrong with the quoted definition?
Another variant:
: exchange ( addr1 addr2 -- )
dup @ rot !@ swap ! ;
This uses the primitive
'!@' ( u1 a-addr -- u2 ) gforth-experimental "store-fetch"
load U2 from A_ADDR, and store U1 there, as atomic operation
I worry that the atomic part will result in it being slower than the
versions that do not use !@.
!@ is now the nonatomic version.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
!@ is now the nonatomic version.
Is the nonatomic one useful often?
We've done without it all this time.
see-code exchange   see-code exchange4   see-code exchange2

[three-column native-code listing omitted: the columns were garbled in
transmission; what remains legible shows !@ compiling to a 3->2 stack
transition, compared to ROT's 2->3]
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
!@ is now the nonatomic version.
Is the nonatomic one useful often?
Some numbers of uses in the Gforth image:
11 !@
3 atomic!@
66 +!
We've done without it all this time.
Sure, you can replace it with DUP @ >R ! R>. Having a word for that
relieves the programmer of producing such a sequence (possibly with a
bug) and the reader of having to analyse what's going on here.
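For systems that lack the primitive, a portable high-level version along the lines of that sequence might be (a sketch; nonatomic, matching the gforth-experimental stack effect quoted earlier):

: !@ ( x addr -- x' )
  dup @ >r   \ save the cell's old contents
  !          \ store the new value
  r> ;       \ return the old contents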
11 !@
3 atomic!@
66 +!
We've done without it all this time.
Sure, you can replace it with DUP @ >R ! R>. Having a word for that
relieves the programmer of producing such a sequence (possibly with a
bug) and the reader of having to analyse what's going on here.
On 01-03-2025 12:47, Anton Ertl wrote:
11 !@
3 atomic!@
66 +!
We've done without it all this time.
Sure, you can replace it with DUP @ >R ! R>. Having a word for that
relieves the programmer of producing such a sequence (possibly with a
bug) and the reader of having to analyse what's going on here.
I found the sequence exactly twice in my code.
However, if it is that rare there is no point in adding it. Creating too
many superfluous abstractions may even get counterproductive in the
sense that predefined abstractions are ignored and reinvented.
Oh, so it's a simpler way than anyone could guess:
"just use a different term, avoid the word 'variable'".
Done. :)
Oh dear, I hope you don't have a formal education in CS. If so, I'd ask
for my money back. You know - it's not a different term - it's a
different concept, with quite different characteristics.
„In computer science, array is a data type that represents
a collection of elements (values or variables), each
selected by one or more indices”
Now feel free to go and ask for your money back.
Interesting.. In your class they taught computer science by Wikipedia?
Didn't they have money for real books? Must have been a real poor city
college..
At least in that college they didn't teach that
„Forth uses FIFO stack” -- as they taught you in
your really rich city college. :]
I don't think I ever did that in any publication, but even if I did -
people get confused when calling bit 0 "bit 1" because it represents
"1". They get confused choosing the wrong side when they talk about
"big endian". They get confused when classifying the 8088. They go left
when their instructor calls "right".
It's like a spelling error. Only petty people try to use that as a
counter argument. It's a different kind of error compared to proposing
"stackless operations" on a stack based language. It's like asking why
a Ferrari can't pour a concrete floor.
01:45 -- but you know: Forth's stack works on the rule „last in -
first out”, not „first in, first out”. Or am I wrong?
No, you're not. I pulled it and I'm uploading an updated version.
Anything wrong with the quoted definition?
Yes. You couldn't produce one. You had to look it up. Something as basic
a concept as "array".
I can't find `DUP @ >R ! R>` (+ variants with spacings)
in any of 1667 files.
However, `DUP @ >R` is found 12 times and `! R>` 29 times.
`DUP @ -ROT !` gets hit 0 times, `DUP >R @ SWAP R> !` once.
And you try to present yourself as an authority after something
like that?
Mr. FIFO, don't you be ridiculous again... :]
And you try to present yourself as an authority after something
like that?
"Mr. Twain - you made a spelling error. And you call yourself the
greatest American writer of the 19th century?"
I told you you were petty.. :)
No, it WASN'T a humble "spelling error"; YOU STATED
THAT OUT LOUD, in a complete sentence. :]
BTW: comparing yourself to Twain? It seems you're
not just the greatest "computer scientist", if not in
the world then at least in this newsgroup, sure :D
-- but also the most modest one... :)))