Forum: War Ensemble BBS

Re: Avoid treating the stack as an array [Re: "Back & Forth" is back!]

From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 3 11:23:20 2024

From Newsgroup: comp.lang.forth

On 3/09/2024 2:03 am, Buzz McCool wrote:

> On 8/30/24 13:32, minforth wrote:
>
>> use locals if you have too many parameters
>
> I like this quite a bit. Tell me if I like it too much.
>
> : CylVolLoop {: W: StartHeight W: FinalHeight F: Radius -- Tabular Output :}
>   cr ." Radius " Radius fe.
>   StartHeight
>   begin dup FinalHeight <=
>   while
>     dup s>f fdup
>     cr ." Height " fe.
>     Radius
>     VolOfCyl
>     ." Volume " fe.
>     1 +
>   repeat
>   drop
>   cr ;

Under VFX Forth:

see CylVolLoop
...
( 193 bytes, 39 instructions )

\ Without locals...

: CylVolLoop ( StartHeight FinalHeight Radius -- )
   cr ." Radius " fdup fe.
   swap ( FinalHeight Height)
   begin 2dup >= while
     dup s>f fdup cr ." Height " fe.
     fover ( Height Radius) VolOfCyl ." Volume " fe.
     1+
   repeat 2drop fdrop
   cr ;

see CylVolLoop
...
( 148 bytes, 27 instructions )
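dxf's post does not include VolOfCyl. To try the no-locals CylVolLoop on a Forth-2012 system with a separate floating-point stack (Gforth, for example), something like the hypothetical helpers below could be loaded first. They assume VolOfCyl has the stack effect ( F: height radius -- volume ), which is what the `fover ( Height Radius)` comment suggests, and they compute pi with FACOS because PI is not a standard word; the name FPI is made up here.

```forth
\ Hypothetical helpers, not from the thread.
\ FPI: pi computed as arccos(-1); FACOS is in the Floating-Point Extensions word set.
-1e facos fconstant fpi

\ VolOfCyl ( F: height radius -- volume )   volume = pi * radius^2 * height
: VolOfCyl  fdup f* f* fpi f* ;
```

With these loaded ahead of dxf's definition, `1 5 2e CylVolLoop` should print the radius and then one Height/Volume line for each height from 1 to 5.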

--- Synchronet 3.20a-Linux NewsLink 1.114

From Buzz McCool@buzz_mccool@yahoo.com to comp.lang.forth on Mon Sep 2 22:53:54 2024

On 9/2/24 18:23, dxf wrote:

> Under VFX Forth:
> ...
> \ Without locals...
> ...
> ( 148 bytes, 27 instructions )

Nice. I will study your technique.

From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 3 17:27:47 2024

On 3/09/2024 3:53 pm, Buzz McCool wrote:

> On 9/2/24 18:23, dxf wrote:
>> Under VFX Forth:
>> ...
>> \ Without locals...
>> ...
>> ( 148 bytes, 27 instructions )
>
> Nice. I will study your technique.

Efficient use of the stack is Moore's technique :)

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Thu Sep 5 17:18:07 2024

On 31-08-2024 07:59, BuzzMcCool wrote:
> On 8/30/24 18:05, dxf wrote:
>> On 31/08/2024 2:04 am, Buzz McCool wrote:
>>> ...
>>> Does anyone have suggestions on a better approach when you have
>>> several parameters and loop counts to deal with?
>>
>> I see little wrong with your example other than cosmetics - excess
>> comments that don't add value and missing stack parameter comment in
>> colon definitions.
>
> Thanks for the feedback. Yes I do need to work on my stack parameter comments.

Given that the area of the circle doesn't change - why recalculate that
every time? Ok, I changed VolOfCyl a bit, but it saves me both time and
complexity. Note this only works if there is a separate FP stack, which
is the standard nowadays.

Alternatives:
1. Change the order of parameters (float last);
2. Change the order of parameters (carnal knowledge of the size of a float);
3. Specify the radius as an integer.

: AreaOfCir ( F: radius -- area ) fdup pi f* f* ;
: VolOfCyl ( F: height area -- volume ) f* ;

: CylVolLoop ( start end -- ) ( F: radius -- )
  cr ." Radius " fdup fe.
  AreaOfCir 1+ swap ?do
    i s>f fdup cr ." Height " fe.
    fover VolOfCyl ." Volume " fe.
  loop fdrop
;

Hans Bezemer

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Thu Sep 5 17:37:03 2024

On 05-09-2024 17:18, Hans Bezemer wrote:

> ...
> Alternatives:
> 1. Change the order of parameters (float last);
> 2. Change the order of parameters (carnal knowledge of the size of a
> float);
> 3. Specify the radius as an integer.

This is the same routine with a shared stack. Note I used option 3. here
- it retains the same possibilities as the original. Note this is in
4tH. F% is followed by an FP number:

include lib/fp2.4th
include lib/zenconst.4th
include 4pp/lib/float.4pp

: AreaOfCir fdup pi f* f* ;
aka f* VolOfCyl ( 4tH alias)

: CylVolLoop ( radius start end --)
  >r >r cr ." Radius " fdup fe.
  AreaOfCir r> r> 1+ swap ?do
    i s>f fdup cr ." Height " fe.
    fover VolOfCyl ." Volume " fe.
  loop fdrop cr
;

f% 1.2 1 20 CylVolLoop

Radius 1.E0
Height 1.E0 Volume 3.141592653589793238E0
Height 2.E0 Volume 6.283185307179586476E0
Height 3.E0 Volume 9.42477796076937971E0
...
Height 19.E0 Volume 59.69026041820607152E0
Height 20.E0 Volume 62.83185307179586476E0

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Thu Sep 5 17:42:07 2024

On 05-09-2024 17:37, Hans Bezemer wrote:

> f% 1.2 1 20 CylVolLoop
>
> Radius 1.E0

Yeah, I copied the last test with the output of the first test. My bad.
Sorry ;-)

Should have been: f% 1 1 20 CylVolLoop

Hans Bezemer

From Buzz McCool@buzz_mccool@yahoo.com to comp.lang.forth on Fri Sep 6 14:03:38 2024

On 9/5/2024 8:18 AM, Hans Bezemer wrote:

> Given that the area of the circle doesn't change - why recalculate that every time?

Excellent observation.

Would you have any videos talking about Forth locals? You and dxf are
far more adept at stack manipulations than I. I'm thinking I can get a
word up and working with locals and then convert to manual stack
manipulations afterwards if necessary.

When is it necessary? dxf showed a word w/o locals to have ~30% fewer
instructions than a word with locals. Is that a common occurrence?

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Sat Sep 7 14:40:41 2024

On 06-09-2024 23:03, Buzz McCool wrote:

> On 9/5/2024 8:18 AM, Hans Bezemer wrote:
>> Given that the area of the circle doesn't change - why recalculate
>> that every time?
>
> Excellent observation.
>
> Would you have any videos talking about Forth locals? You and dxf are
> far more adept at stack manipulations than I. I'm thinking I can get a
> word up and working with locals and then convert to manual stack
> manipulations afterwards if necessary.

Oh, I talk a lot about locals: don't use them. The point is: you have
random access to locals. So I doubt very much it will help you to
uncover a smart way to do it without them. Basically any non-Forth
Algol-like language will do the job.

And that's in essence why I am opposed to them. It takes out what makes
Forth unique - and the unique way of thinking in Forth.

> When is it necessary? dxf showed a word w/o locals to have ~%30 fewer
> instructions than a word with locals. Is that a common occurrence?

I can't really tell. In 4tH (my own implementation) the use of locals
requires an external library - so it always consumes more instructions.
It also heavily depends on the style and the skill of the programmer. If you're a newbie doing a lot of stack acrobatics, I doubt it.

What bothers me most technologically is that parameters flow through the
stack undisturbed. You break that paradigm when using locals. With
locals you *HAVE TO* create some kind of stack frame that you have to
destroy when you exit.

Needless to say this copying, releasing and stuff takes time. Even when
you don't use locals. In all honesty I must state that this overhead is
not always translated to a diminished performance - at least not in the
tests I did.

****
TL;DR my objections are mostly based on pure architectural arguments,
rather than practicality. I also don't like Python, PHP and Perl for
those very same reasons - one because I think its paradigms are
fundamentally flawed, the second and third because of their "have we
thrown in the kitchen sink yet" mentality.

I don't think there will ever be a "Back&Forth" episode on locals -
frankly, because - apart from some demonstrations - there is only one
single, ported program that uses locals in my repository. How can you
teach if you never used them yourself?
****

Note that 4tH features R@, R'@ and R"@ which can serve very
conveniently as "local variables" - provided you leave the Return Stack
alone. I learned that trick from the programmer of the FIG editor.

See: https://sourceforge.net/p/forth-4th/code/HEAD/tree/trunk/4th.src/lib/gcircle.4th
for a nice example of that one.
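Standard Forth only offers R@, which copies the top of the return stack; R'@ and R"@ are not standard words. A minimal sketch of the same "return stack as local variable" idea with R@ alone: a parameter is parked with >R, read with R@ wherever it is needed, and dropped with R> before the word exits. SUM-MULTIPLES is a made-up example, and BEGIN ... WHILE ... REPEAT is used instead of DO ... LOOP because the standard does not let a program read, inside a DO ... LOOP, return-stack values placed before the loop was entered.

```forth
\ Hypothetical example: sum = step*1 + step*2 + ... + step*n
: sum-multiples ( n step -- sum )
  >r                       \ park STEP:  ( n )          R: ( step )
  0 swap                   \ ( sum n )
  begin dup 0> while
    dup r@ *               \ ( sum n n*step )  R@ copies the parked value
    rot + swap             \ ( sum' n )
    1-                     \ count n down towards 0
  repeat
  drop                     \ ( sum )
  r> drop ;                \ balance the return stack before exiting
```

For instance, `4 10 sum-multiples .` prints 100, that is 10 * (1 + 2 + 3 + 4).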

Hans Bezemer

From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Sun Sep 8 14:56:01 2024

On 6 Sep 2024 at 23:03:38 CEST, "Buzz McCool" <buzz_mccool@yahoo.com> wrote:

> Would you have any videos talking about Forth locals? You and dxf are
> far more adept at stack manipulations than I. I'm thinking I can get a
> word up and working with locals and then convert to manual stack
> manipulations afterwards if necessary.

Don't. You will only become dependent on locals. Use of locals should
be a considered decision.

> When is it necessary? dxf showed a word w/o locals to have ~%30 fewer
> instructions than a word with locals. Is that a common occurrence?

We (MPE) converted much of our TCP/IP stack not to use locals. This
was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
the period (say 15 years ago) were similar. Code density improved by
about 25% and performance by about 50%.

Stephen
--
Stephen Pelc, stephen@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com - MPE website
http://www.vfxforth.com/downloads/VfxCommunity/ - downloads

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sun Sep 8 16:09:32 2024

On Sun, 8 Sep 2024 14:56:01 +0000, Stephen Pelc wrote:

> ...
> We (MPE) converted much of our TCP/IP stack not to use locals. This
> was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
> the period (say 15 years ago) were similar. Code density improved by
> about 25% and performance by about 50%.

These are good examples of "it depends". And also that one should never
start optimising without profiling. I have had similar experiences in the
other direction (i.e. with locals) with vector maths.

Another observation is that many Forthers do not seem to put much emphasis
on programming time and code maintainability or readability, which is
easier to achieve by using locals. The code conversion for your TCP/IP
stack must have taken a lot of programming time, but it must have been
worth it because it paid off on another level.

But when to use or avoid locals is an old argument that has long since
been put to rest. It all depends...

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sun Sep 8 16:27:47 2024

Stephen Pelc <stephen@vfxforth.com> writes:

> Don't. You will only become dependent on locals. Use of locals should
> be a considered decision.
>
>> When is it necessary? dxf showed a word w/o locals to have ~%30 fewer
>> instructions than a word with locals. Is that a common occurrence?
>
> We (MPE) converted much of our TCP/IP stack not to use locals. This
> was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
> the period (say 15 years ago) were similar. Code density improved by
> about 25% and performance by about 50%.

So MPE (and Forth, Inc.) discourage the use of locals because they
implement locals inefficiently, and they implement locals
inefficiently because there are so few uses of locals around. A chicken-and-egg problem.

Concerning the conversion of the TCP/IP stack: Have you considered the alternative of spending MPE's time on making the locals implementation
more efficient?

See also:

@InProceedings{ertl22-locals,
author = {M. Anton Ertl},
title = {Are Locals Inevitably Slow?},
crossref = {euroforth22},
pages = {48--49},
url = {http://www.euroforth.org/ef22/papers/ertl-locals.pdf},
url-slides = {http://www.euroforth.org/ef22/papers/ertl-locals-slides.pdf},
video = {https://www.youtube.com/watch?v=tPjSKetEJn0},
OPTnote = {presentation slides},
abstract = {Code quality of locals on two code examples on
various systems}
}

An update on the table for the example:

: 3dup.3 {: a b c :} a b c a b c ;

instr.  bytes  system
    31    117  Gforth AMD64
    16     44  iforth 5.0.27 (plus 20 bytes entry and return code)
     7     19  lxf 1.6-982-823 32-bit
    32    127  SwiftForth 4.0.0-RC89 (calls LSPACE)
    26     92  VFX Forth 64 5.11 RC2
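For comparison, the same stack effect written without locals, using only CORE words (DUP, 2OVER, ROT). The name 3DUP.S is made up here, and how many instructions and bytes it compiles to on each of the systems in the table would need to be checked with SEE rather than assumed.

```forth
\ Stack-only equivalent of 3dup.3:
\   dup   -> a b c c
\   2over -> a b c c a b
\   rot   -> a b c a b c
: 3dup.s ( a b c -- a b c a b c )  dup 2over rot ;
```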

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Mon Sep 9 17:15:32 2024

On 08-09-2024 18:09, minforth wrote:

> Another observation is that many Forthers do not seem to put much
> emphasis on programming time and code maintainability or readability,
> which is easier to achieve by using locals.

I won't dispute that using the "locals" shortcut *may* save some
programming time - but to me, the moment you decide to put the whole
shebang in locals, you enter another mindset. Because at that moment you
cease to consider the algorithm itself, but start banging out code.

You no longer consider "do I need that, do I need that now, do I need
that here", you just start creating more local variables. Somehow that
kills my train of thought.

I do dispute that "no locals" Forth kills maintainability - or
readability. I'm always happy to see a whole bunch of one-liners.
Doesn't happen to me every day, but often enough. And then you can functionally comment your code. I usually comment it from column 40 on
and at the top of a word.

I've maintained non-trivial programs for *DECADES* without any trouble.
I've plugged in a garbage collection module in my uBasic/4tH interpreter
- and radically changed it later. My rule is: if you can't figure it
out, rewrite it until you do. It happens, but not frequently.

Hans Bezemer

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 9 17:34:03 2024

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> @InProceedings{ertl22-locals,
> ...
> }
>
> An update on the table for the example:
>
> : 3dup.3 {: a b c :} a b c a b c ;
>
> instr.  bytes  system
>     31    117  Gforth AMD64
>     16     44  iforth 5.0.27 (plus 20 bytes entry and return code)
>      7     19  lxf 1.6-982-823 32-bit
>     32    127  SwiftForth 4.0.0-RC89 (calls LSPACE)
>     26     92  VFX Forth 64 5.11 RC2

And here's another update. A recent change in Gforth resulted in more
code, and we now have reverted that change:

instr.  bytes  system
    28    103  Gforth AMD64
    16     44  iforth 5.0.27 (plus 20 bytes entry and return code)
     7     19  lxf 1.6-982-823 32-bit
    32    127  SwiftForth 4.0.0-RC89 (calls LSPACE)
    26     92  VFX Forth 64 5.11 RC2

- anton

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Mon Sep 9 21:16:49 2024

On Mon, 9 Sep 2024 15:15:32 +0000, Hans Bezemer wrote:

> I won't dispute that using the "locals" shortcut *may* save some
> programming time - but to me, the moment you decide to put the whole
> shebang in locals, you enter another mindset. Because at that moment you
> cease to consider the algorithm itself, but start banging out code.
>
> You no longer consider "do I need that, do I need that now, do I need
> that here", you just start creating more local variables. Somehow that
> kills my train of mind..

The thing is that your train of mind is focused on optimising the
parameter flow via the stack. You are doing stupid work that an
intelligent compiler does automatically today. It makes much more sense
to focus your brainware on the algorithms or automation tasks to be
solved.

Since such algorithms/tasks are mostly formulated mathematically or
logically, an almost 1:1 translation of such formulations by using locals
is straightforward and less error prone. Use descriptive names and the
code becomes quasi commented simultaneously.

From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 10 12:21:30 2024

On 10/09/2024 7:16 am, minforth wrote:

> ...
> Since such algorithms/tasks are mostly formulated mathematically or
> logically, an almost 1:1 translation of such formulations by using
> locals is straightforward and less error prone. Use descriptive names
> and the code becomes quasi commented simultaneously.

Mathematical formulations are typically expressed algebraically. Forth
is stack-based and uses RPN. It's a different world. To use the latter effectively requires a different mindset. Do you really formulate or
sketch out tasks algebraically? For me it ended when I stopped using
BASIC.

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Tue Sep 10 12:10:06 2024

In article <nnd$32690b01$49b74327@97bd85089db44cd3>,
Hans Bezemer <the.beez.speaks@gmail.com> wrote:

> On 08-09-2024 18:09, minforth wrote:
>> Another observation is that many Forthers do not seem to put much
>> emphasis on programming time and code maintainability or readability,
>> which is easier to achieve by using locals.
> ...
> I've maintained non-trivial programs for *DECADES* without any trouble.
> I've plugged in a garbage collection module in my uBasic/4tH interpreter
> - and radically changed it later. My rule is: if you can't figure it
> out, rewrite it until you do. It happens, but not frequently.

I'm cleaning up the editor that I use all the time. It sports dozens of
global variables and it is hard to see why it could dispense with them.

LOCALs are an expensive feature, because they are re-entrant.
Forthers may know where and why an expensive feature is used.

--
Temu exploits Christians: (Disclaimer, only 10 apostles)
Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
And Gifts For Friends Family And Colleagues.

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Tue Sep 10 04:26:51 2024

Hans Bezemer <the.beez.speaks@gmail.com> writes:

> What bothers me most technologically is that parameters flow through
> the stack undisturbed. You break that paradigm when using locals. With
> locals you *HAVE TO* create some kind of stack frame that you have to
> destroy when you exit.

Forth programs very frequently end up juggling parameters and other data
to and from the return stack, instead of using locals. Simple
implementations of locals put them in the return stack too.
"Destroying" the stack frame just means adjusting RP when the function
exits. Usually a single instruction.

> Needless to say this copying, releasing and stuff takes time.

Similar to DUP (copy) or DROP (release).

> In all honesty I must state that this overhead is not always
> translated to a diminished performance

Right, I don't think one can assert a performance hit without
measurements supporting the idea.

> TL;DR my objections are mostly based on pure architectural arguments,
> rather than practicality.

Sure, that's reasonable, it's a matter of what you prefer. That's
harder to take issue with than claims about performance.

> I also don't like Python, PHP and Perl for those very same reasons -

Those are at a totally different level than Forth, in terms of layers of
implementation and runtime libraries, overhead, etc. It's better to
compare to something like C, or a hypothetical cleaned up version of C,
or even to Forth with locals ;).

From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 10 23:19:29 2024

On 10/09/2024 9:26 pm, Paul Rubin wrote:

> Hans Bezemer <the.beez.speaks@gmail.com> writes:
>> What bothers me most technologically is that parameters flow through
>> the stack undisturbed. You break that paradigm when using locals. With
>> locals you *HAVE TO* create some kind of stack frame that you have to
>> destroy when you exit.
>
> Forth programs very frequently end up juggling parameters and other data
> to and from the return stack, instead of using locals. Simple
> implementations of locals put them in the return stack too.
> "Destroying" the stack frame just means adjusting RP when the function
> exits. Usually a single instruction.
> ...

In forth the programmer uses the return stack as a temporary holder. Not
so locals which spill all input to the return stack and then shuffle these to/from the parameter stack. The latter is akin to a novice programmer who uses too many variables.
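A sketch of the "temporary holder" style dxf describes, reusing the 3DUP stack effect from Anton's example: a single value goes to the return stack for a few words and comes back before the definition ends, so >R and R> balance within the one definition. 3DUP.R is an illustrative name, not from the thread.

```forth
: 3dup.r ( a b c -- a b c a b c )
  >r            \ ( a b )          R: ( c )
  2dup          \ ( a b a b )
  r@            \ ( a b a b c )    copy c back, leaving it parked
  rot rot       \ ( a b c a b )
  r> ;          \ ( a b c a b c )  R: ( )  return stack balanced
```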

From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 11 12:03:05 2024

On 10/09/2024 9:26 pm, Paul Rubin wrote:

> Hans Bezemer <the.beez.speaks@gmail.com> writes:
>> What bothers me most technologically is that parameters flow through
>> the stack undisturbed. You break that paradigm when using locals. With
>> locals you *HAVE TO* create some kind of stack frame that you have to
>> destroy when you exit.
>
> Forth programs very frequently end up juggling parameters and other data
> to and from the return stack, instead of using locals.

Looking at an application with 154 colon definitions, only 2 were found
to use the return stack for temporary storage. Even I was surprised :)

From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 11 14:32:36 2024

On 11/09/2024 12:03 pm, dxf wrote:

> On 10/09/2024 9:26 pm, Paul Rubin wrote:
>> Hans Bezemer <the.beez.speaks@gmail.com> writes:
>>> What bothers me most technologically is that parameters flow through
>>> the stack undisturbed. You break that paradigm when using locals. With
>>> locals you *HAVE TO* create some kind of stack frame that you have to
>>> destroy when you exit.
>>
>> Forth programs very frequently end up juggling parameters and other data
>> to and from the return stack, instead of using locals.
>
> Looking at an application with 154 colon definitions, only 2 were found
> to use the return stack for temporary storage. Even I was surprised :)

From the same app:

dup      54
drop     29
swap     22
over     16
2drop     9
rot       8
2dup      3
>r        2
r>        2
2swap     1
2nip      1
locals    0

The easiest stack operations (DUP DROP) account for most. SWAP averaged
1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a problem in forth?
It doesn't appear to be.

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Wed Sep 11 11:02:00 2024

On 10-09-2024 13:26, Paul Rubin wrote:

> Hans Bezemer <the.beez.speaks@gmail.com> writes:
>> What bothers me most technologically is that parameters flow through
>> the stack undisturbed. You break that paradigm when using locals. With
>> locals you *HAVE TO* create some kind of stack frame that you have to
>> destroy when you exit.
>
> Forth programs very frequently end up juggling parameters and other data
> to and from the return stack, instead of using locals. Simple
> implementations of locals put them in the return stack too.
> "Destroying" the stack frame just means adjusting RP when the function
> exits. Usually a single instruction.
>
>> Needless to say this copying, releasing and stuff takes time.
>
> Similar to DUP (copy) or DROP (release).
>
>> In all honesty I must state that this overhead is not always
>> translated to a diminished performance
>
> Right, I don't think one can assert a performance hit without
> measurements supporting the idea.
>
>> TL;DR my objections are mostly based on pure architectural arguments,
>> rather than practicality.
>
> Sure, that's reasonable, it's a matter of what you prefer. That's
> harder to take issue with than claims about performance.
>
>> I also don't like Python, PHP and Perl for those very same reasons -
>
> Those are at a totally different level than Forth, in terms of layers of
> implementation and runtime libraries, overhead, etc. It's better to
> compare to something like C, or a hypothetical cleaned up version of C,
> or even to Forth with locals ;).

A lot depends on how solid you want to make your implementation. I got
locals in uBasic/4tH.

: exec_local ( --)
  [: get_exp 0 max 27 frame dup @ - + min negate cells frame + dup local <
     if E.MANYLOC throw else frame @ over ! to frame then ;]
  exec_function                        \ execution semantics for LOCALS()
;

This one reserves room for locals. You may use up to 26 locals per
function since there are 26 letters in the alphabet (duh!).

: exec_param ( --)
  frame exec_local frame               \ allocate locals, save pointers
  begin over over > while cell+ (pop) over ! repeat drop drop
;

If the reserved room has to be initialized by the stack, it calls
EXEC_LOCAL and then copies the values there.

: exec_return ( --)
  get_token paren? putback if ['] get_push exec_function then
  gpop prog ! frame dup local #local 1- cells + >
  if E.NOSCOPE throw ;then @ to frame
;

This one looks whether RETURN returns a value - and if it does, it
pushes this value on the stack. Then it sets the return address. It
checks for the sanity of the stack frame and if okay THEN it finally
updates the stack pointer.

You conveniently left out the initialization of the stack frame. Agreed,
if ALL values are transferred to the return stack the overhead is
minimal. But how often does that happen?

> Those are at a totally different level than Forth, in terms of layers of
> implementation and runtime libraries, overhead, etc. It's better to
> compare to something like C, or a hypothetical cleaned up version of C,
> or even to Forth with locals ;).

True - but that's not the level of abstraction I'm considering. I think
a language should have a well designed core, surrounded by a
constellation of extensions. Like C with its standard library and Forth
with its word sets. For comparison - C has a few dozen keywords. For
binary extensions alone, PHP has at least two different mechanisms. A full
Python installation is scattered all over the filesystem, so you have a
hell of a job to extract a single, transferable application. Not to
mention the awkward syntax (although they fixed some of it in v3). In
Perl you always have to wonder which prefix is fashionable today.

Now, I won't say Forth doesn't have its issues. I think IN ESSENCE
recognizers are a beautiful idea. Extend it to strings and you could
eradicate "parsing words" and have something like:

"lib/mylib.4th" include

"Square" : "the square is:" print dup * cr ;

But okay, we'll make do with what we have ;-) And BTW, TURNKEY should be
standard. Clean up the dictionary, pump out an executable.

Hans Bezemer

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Wed Sep 11 11:20:14 2024

On 30-08-2024 22:32, minforth wrote:

> Two classic answers:
> use DO..LOOPs to hide away loop indices
> use locals if you have too many parameters
> (some technical/physical formulas are difficult
> or impossible to factorise into smaller words
> which would otherwise be the classic Forth mantra)

Tips:
- Use multiple Return Stack registers (R@, R'@, R"@);
- If parameters come in duplets or triplets, use corresponding stack
  operators (3DUP, 3OVER, 3DROP);
- Reorganize parameters at the *very start* of the program in a more
  palatable order. It saves stack juggling later on;
- Maybe a strange one, but codify stack patterns!
  E.g. SPIN ( a b c -- c b a)
       STOW ( a b -- a a b)
       RISE ( a b c -- b a c)

It helps you to THINK in these patterns and more easily recognize them.
It depends highly on your coding habits, so it helps to analyze your
legacy code to see if they often occur.
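Hans gives only the stack effects. One possible set of definitions in standard Forth, assuming exactly the effects listed (the names are his, the bodies are a sketch):

```forth
: spin ( a b c -- c b a )  swap rot ;   \ swap: a c b   rot: c b a
: stow ( a b -- a a b )    over swap ;  \ over: a b a   swap: a a b
: rise ( a b c -- b a c )  rot swap ;   \ rot:  b c a   swap: b a c
```

Once named, a recurring shuffle such as `swap rot` reads as one intention (SPIN) instead of two separate stack moves.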

Hans Bezemer

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Wed Sep 11 09:49:37 2024

On Wed, 11 Sep 2024 9:20:14 +0000, Hans Bezemer wrote:

> Tips:
> - Use multiple Return Stack registers (R@, R'@, R"@);
> - If parameters come in duplets or triplets, use corresponding stack
>   operators (3DUP, 3OVER, 3DROP);
> - Reorganize parameters at the *very start* of the program in a more
>   palatable order. It saves stack juggling later on;
> - Maybe a strange one, but codify stack patterns!
>   E.g. SPIN ( a b c -- c b a)
>        STOW ( a b -- a a b)
>        RISE ( a b c -- b a c)
>
> It helps you to THINK in these patterns and more easily recognize them.
> It depends highly on your coding habits, so it helps to analyze your
> legacy code to see if they often occur.

Good advice if you can access the return stack directly.

Otherwise, for non-trivial words, it is preferable to let the compiler recognise patterns and save your precious human time. If the compiled
code is too bad, profile and optimise it afterwards.

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Wed Sep 11 13:29:03 2024

In article <nnd$545e2daa$4f8af75c@548f76d6156a46d8>,
Hans Bezemer <the.beez.speaks@gmail.com> wrote:
<SNIP>

> Now, I won't say Forth doesn't have its issues. I think IN ESSENCE
> recognizers are a beautiful idea. Extend it to strings and you could
> eradicate "parsing words" and have something like:
>
> "lib/mylib.4th" include
>
> "Square" : "the square is:" print dup * cr ;

You have that backward, it must be:

{ "the square is:" print dup * cr } : Square

If there is one thing to preserve in Forth, it is the
convention that defining words parse new names for
the dictionary by scanning forward, without those names being treated as strings.
Here { introduces a denotation without being a PREFIX (a "recognizer"),
the way 0x is in 0xDEADBEEF. It works the same inside a definition, like
numbers and, nowadays, strings.

{ "the square is:" print dup * cr } CONSTANT orang_utan
orang_utan DUP : Square : quadrate

But okay, we'll make do with what we have ;-) And BTW, TURNKEY should be
standard. Clean up the dictionary, pump out an executable.

I have created a language on that principle; e.g. meta
accepts 2 xt's, a build one and a run one. meta is the mother of
all defining words:
{ , } { @ } meta CONSTANT
{ CELL ALLOT } { } meta VARIABLE
{ 2 CELLS ALLOT } { } meta 2VARIABLE
{ } { EXECUTE } meta :
{ } { } meta DATA \ My favorite.

CREATE DOES> is the right idea, an object with an allocation
part and a behavior, but the syntax is awkward beyond despair.
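For anyone who wants to try the idea on a standard system, here is a rough sketch of a meta-style word built from CREATE and DOES>. It rests on an assumption about the intended semantics: the build xt runs when the child is defined, and the run xt receives the child's data address. It uses ' and :NONAME where Albert's notation uses { }, and the names MYCONST and MYVAR are hypothetical.

```forth
\ Sketch only: a standard-Forth approximation of Albert's META.
\ META ( xt-build xt-run "name" -- ) defines a new defining word NAME.
\ NAME ( i*x "child" -- ) creates CHILD, stores xt-run in CHILD's first
\ cell, then runs xt-build to lay down CHILD's data.
\ CHILD ( -- j*x ) pushes its data address and runs xt-run on it.
: meta ( xt-build xt-run "name" -- )
  create , ,                      \ body: [xt-run] [xt-build]
  does> ( i*x "child" -- )
    create                        \ parse and create the child word
    dup @ ,                       \ child cell 0: copy of xt-run
    cell+ @ execute               \ run xt-build (e.g. , or ALLOT)
  does> ( -- j*x )
    dup cell+ swap @ execute ;    \ data address, then run xt-run

\ Usage, mirroring two of Albert's examples (hypothetical names):
' ,  ' @  meta myconst                          \ like CONSTANT
:noname 1 cells allot ;  :noname ;  meta myvar  \ like VARIABLE
```

Storing the run xt in each child's first cell lets a single DOES> clause serve every defining word META makes; the cost is one extra cell per child.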

I have a backlog, busy with preserving projects dating from the
80's, so don't expect a publication soon.

Hans Bezemer

--
Temu exploits Christians: (Disclaimer, only 10 apostles)
Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
And Gifts For Friends Family And Colleagues.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Wed Sep 11 14:41:35 2024

From Newsgroup: comp.lang.forth

On 11-09-2024 11:49, minforth wrote:

On Wed, 11 Sep 2024 9:20:14 +0000, Hans Bezemer wrote:

Tips:
- Use multiple Return Stack registers (R@, R'@, R"@);
- If parameters come in duplets or triplets, use corresponding stack
operators (3DUP, 3OVER, 3DROP);
- Reorganize parameters at the *very start* of the program in a more
palatable order. It saves stack juggling later on;
- Maybe a strange one, but codify stack patterns!
   E.g. SPIN ( a b c -- c b a)
        STOW ( a b -- a a b)
        RISE ( a b c -- b a c)

It helps you to THINK in these patterns and more easily recognize them.
It depends highly on your coding habits, so it helps to analyze your
legacy code to see if they often occur.

Good advice if you can access the return stack directly.

Otherwise, for non-trivial words, it is preferable to let the compiler recognise patterns and save your precious human time. If the compiled
code is too bad, profile and optimise it afterwards.

You know - in my experience these kinds of problems mostly manifest
themselves when making my library routines - the stuff you rarely touch afterwards (and even more rarely in a fundamental way).

Putting the application components to work doesn't affect the stack in
the same way. I think there is where the "10x savings" actually are.

Again - just a hunch of mine..

Hans Bezemer
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Thu Sep 12 14:01:10 2024

From Newsgroup: comp.lang.forth

On 11/09/2024 7:20 pm, Hans Bezemer wrote:

On 30-08-2024 22:32, minforth wrote:

Two classic answers:
use DO..LOOPs to hide away loop indices
use locals if you have too many parameters
(some technical/physical formulas are difficult
or impossible to factorise into smaller words
which would otherwise be the classic Forth mantra)

Tips:
- Use multiple Return Stack registers (R@, R'@, R"@);
- If parameters come in duplets or triplets, use corresponding stack operators (3DUP, 3OVER, 3DROP);
- Reorganize parameters at the *very start* of the program in a more palatable order. It saves stack juggling later on;
- Maybe a strange one, but codify stack patterns!
E.g. SPIN ( a b c -- c b a)
STOW ( a b -- a a b)
RISE ( a b c -- b a c)

It helps you to THINK in these patterns and more easily recognize them. It depends highly on your coding habits, so it helps to analyze your legacy code to see if they often occur.

swap rot 0
over swap 0
rot swap 1
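The three sequences above are the standard-word equivalents of Hans's SPIN, STOW and RISE, in that order. Naming a pattern therefore costs one short colon definition. Here is a minimal sketch; the names are Hans's, not standard words, and the definitions are illustrative:

```forth
\ Named stack patterns from Hans Bezemer's tip, defined with standard words.
: spin ( a b c -- c b a )  swap rot ;   \ a b c -> a c b -> c b a
: stow ( a b -- a a b )    over swap ;  \ a b -> a b a -> a a b
: rise ( a b c -- b a c )  rot swap ;   \ a b c -> b c a -> b a c
```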

--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Wed Sep 11 23:51:00 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

Looking at an application with 154 colon definitions...

From the same app:
The easiest stack operations (DUP DROP) account for most.

Is the code for this app available?

SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a problem in forth? It doesn't appear to be.

The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction
inversion (with a smart compiler, the data ends up in registers that
could be named by locals) or they are stack traffic whose cost has to be compared with the cost of indexed references to locals in the return
stack. I'd agree that they aren't necessarily "juggling", which evokes
permuting stuff in the stack outside the usual LIFO order. That does
happen a little bit though, with OVER, ROT, etc.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Thu Sep 12 00:10:03 2024

From Newsgroup: comp.lang.forth

Hans Bezemer <the.beez.speaks@gmail.com> writes:

You conveniently left out the initialization of the stack
frame. Agreed, if ALL values are transferred to the return stack the
overhead is minimal. But how often does that happen?

I don't understand this. {: a b c :} transfers 3 elements from the
parameter stack to the return stack. That has some cost, but it is
offset by avoiding some DUP and similar operations. Is it relevant at
all anyway? Old fashioned Forth interpreters are pretty fast, and if
you're worrying about avoiding a stack transfer here or there, you need
an optimizing compiler.

Adding safety checks has a cost, but once the program appears debugged,
I think Forth philosophy is to turn off the checks.

True - but that's not the level of abstraction I'm considering. I
think a language should have a well designed core, surrounded by a constellation of extensions. Like C with its standard library and
Forth with its word sets.

You might like Lua or Scheme for simple higher level languages with that
style of design. C has some warts but its complexity in terms of
keywords doesn't seem much worse than Forth's core words.
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Thu Sep 12 18:21:43 2024

From Newsgroup: comp.lang.forth

On 12/09/2024 4:51 pm, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

Looking at an application with 154 colon definitions...

From the same app:
The easiest stack operations (DUP DROP) account for most.

Is the code for this app available?

Previously posted. You may have seen it.

https://pastebin.com/2xcRSbQW

SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
problem in forth? It doesn't appear to be.

The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction inversion (with a smart compiler, the data ends up in registers that
could be named by locals) or they are stack traffic whose cost has to be compared with the cost of indexed references to locals in the return
stack. I'd agree that they aren't necessarily "juggling", which evokes permuting stuff in the stack outside the usual LIFO order. That does
happen a little bit though, with OVER, ROT, etc.

If a cost, it's one the programmer can keep to a minimum. With locals there's
an upfront cost that can't be avoided. Using registers is appealing until
one realizes a call to an external function necessitates placing it back on
the stack. Costs multiply in the face of many small functions. Moore touches on this in one of his speeches:

"I keep asking that question. What is Forth? Forth is highly factored code.
I don't know anything else to say except that Forth is definitions. If you
have a lot of small definitions you are writing Forth. In order to write a
lot of small definitions you have to have a stack. Stacks are not popular.
Its strange to me that they are not. There is a just lot of pressure from
vested interests that don't like stacks, they like registers. Stacks are not
a solve all problems concept but they are very very useful, especially for
information hiding and you have to have two of them." - Chuck Moore 1999

--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Thu Sep 12 09:08:20 2024

From Newsgroup: comp.lang.forth

On Thu, 12 Sep 2024 8:21:43 +0000, dxf wrote:

If a cost, it's one the programmer can keep to a minimum. With locals
there's an upfront cost that can't be avoided. Using registers is appealing
until one realizes a call to an external function necessitates placing it back
on the stack. Costs multiply in the face of many small functions.

This is history (or your archaic compiler). Modern compilers try to pass
most parameters through registers.

https://langdev.stackexchange.com/questions/2584/are-modern-compilers-passing-parameters-in-registers-instead-of-on-the-stack
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Thu Sep 12 10:11:36 2024

From Newsgroup: comp.lang.forth

On Thu, 12 Sep 2024 9:08:20 +0000, minforth wrote:

This is history (or your archaic compiler). Modern compilers try to pass
most parameters through registers.

The rules are very complicated, though. One has to account for there
being too many parameters, for different architectures with different
register assignments, for integer and floating-point type parameters, and
under some circumstances both the registers *and* the stack must be used,
where some extra 'working space' may, or may not, be needed.

I was very happy when it finally worked on all of our target OSes.

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Thu Sep 12 08:55:26 2024

From Newsgroup: comp.lang.forth

Paul Rubin <no.email@nospam.invalid> writes:

The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction inversion (with a smart compiler, the data ends up in registers that
could be named by locals)

I don't see an inversion here. The programmer-visible stack abstracts (ideally) the registers in one way, the programmer-visible locals
abstracts them in a different way.

And look at the VICHECK example from Nick Nelson's "Better Values" <http://www.euroforth.org/ef22/papers/nelson-values-slides.pdf>:
first the version with locals, then the version that eliminates the
locals:

: VICHECK {: pindex paddr -- pindex' paddr :} \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
pindex 0 paddr @ WITHIN IF \ Index is valid
pindex paddr
ELSE \ Index is invalid
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
THEN ;

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
Z" Invalid index " 2 PICK ZFORMAT Z+
Z" for " Z+ OVER CELL- @ Z+ \ Add NFA from extra cell
Z" length " Z+ OVER @ ZFORMAT Z+
ERROR
NIP 0 SWAP \ Use zeroth index
THEN ;

So by keeping the values on the stack you not only eliminate their
repeated mention, but also eliminate one branch of the IF. With a
more capable Forth system a synthesis of the two approaches is
possible:

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
{: pindex paddr :}
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
THEN ;

Or one could factor out the code between IF and THEN and stay within
the confines of VFX:

: VIERROR {: pindex paddr -- 0 paddr :}
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
;

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
VIERROR
THEN ;

The check can be simplified, which also simplifies the stack handling:

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
2dup @ u>= IF \ Index is invalid
VIERROR
THEN ;
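The simplification rests on how WITHIN is specified: it compares modulo the cell size, so for a non-negative size n, "i 0 n WITHIN" gives the same flag as "i n U<". A negative index, viewed as unsigned, is larger than any sensible n, so the single unsigned comparison also rejects it. Here is a small check of that equivalence; the two helper names are hypothetical, for illustration only:

```forth
\ Two ways to ask "is 0 <= i < n?" (hypothetical helper names).
: in-range-within ( i n -- flag )  0 swap within ;  \ i 0 n WITHIN
: in-range-u<     ( i n -- flag )  u< ;             \ one unsigned compare
\ A negative i, viewed as unsigned, exceeds any sensible n,
\ so the single U< also rejects negative indices.
```

The inverse test, true exactly when the index lies outside 0..size-1, is what "2dup @ u>=" computes in the simplified VICHECK.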

or they are stack traffic whose cost has to be
compared with the cost of indexed references to locals in the return
stack.

That check often results in the code without locals winning, but that
is, for a large part, due to suboptimal implementations of locals.
Ideally a perfect compiler will produce the same code for code using
locals and for equivalent code using stack manipulation words, because
the data flow is the same. This actually works out in the case of lxf processing various implementations of 3DUP, including a locals-based
one; see <2024Apr10.090038@mips.complang.tuwien.ac.at>. However, in
general Forth systems do not produce perfect results.
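To make "same data flow" concrete, here are two definitions that compute the same thing: one uses stack words, the other uses Forth-2012 {: :} locals. The names are hypothetical, chosen so they do not clash with a system's own 3DUP. Whether both compile to identical code depends entirely on the Forth system.

```forth
\ Same data flow, two notations (hypothetical names).
: 3dup-stack  ( a b c -- a b c a b c )  2 pick 2 pick 2 pick ;
: 3dup-locals ( a b c -- a b c a b c )  {: a b c :} a b c a b c ;
```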

I have now looked at what happens for the first two variants of
VICHECK; I have defined the non-standard words as follows to make it
possible to compile the code:

defer dummy
: z" [char] " parse 2drop postpone dummy ; immediate
defer zformat
defer z+
defer >name
defer error

I looked at 3 systems: Gforth (because I work on it); lxf (because it
produces the best results in the 3DUP case); VFX (because it's the
system Nick Nelson uses). The numbers below are the number of bytes
of native code:

locals  stack  system
   401    336  gforth-fast (AMD64)
   179    132  lxf 1.6-982-823 (IA-32)
   182    119  VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
   241    159  VFX Forth 64 5.43 (AMD64)

I'd agree that they aren't necessarily "juggling", which evokes
permuting stuff in the stack outside the usual LIFO order. That does
happen a little bit though, with OVER, ROT, etc.

In particular, in Starting Forth ROT is illustrated with a juggler
(you see the juggling balls right beside her), and the swap dragon
comments: "I hate jugglers".

https://www.forth.com/wp-content/uploads/2015/03/ch2-rot.gif

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Thu Sep 12 10:19:03 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

Using registers is appealing until
one realizes a call to an external function necessitates placing it back on the stack.

Not if the stack item does not live across the call. And even if it
lives across the call and cannot be placed in a callee-saved register,
the save before and restore after the call is amortized typically
across more than one register access on each side of the call.

Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Costs multiply in the face of many small functions.

Register allocation is also effective for small functions.

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Thu Sep 12 10:31:44 2024

From Newsgroup: comp.lang.forth

I can well imagine that. Some wheels are particularly difficult
to reinvent. For desktop systems, it can therefore make sense
to use an IR (e.g. LLVM or WASM, or simply C) and use the
optimisation functions of proven compilers for this IR.

Sometimes a much simpler solution: use code inlining.
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Fri Sep 13 09:37:29 2024

From Newsgroup: comp.lang.forth

On 12/09/2024 8:19 pm, Anton Ertl wrote:

dxf <dxforth@gmail.com> writes:

Using registers is appealing until
one realizes a call to an external function necessitates placing it back on the stack.

Not if the stack item does not live across the call. And even if it
lives across the call and cannot be placed in a callee-saved register,
the save before and restore after the call is amortized typically
across more than one register access on each side of the call.

Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Costs multiply in the face of many small functions.

Register allocation is also effective for small functions.

Moore talked about registers. It's worth repeating for those who may be new
to forth.

"But such registers raises the question of local variables. There is a lot of
discussion about local variables. That is another aspect of your application
where you can save 100% of the code. I remain adamant that local variables
are not only useless, they are harmful. If you are writing code that needs
them you are writing, non-optimal code" - Chuck Moore 1999

--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Fri Sep 13 07:56:37 2024

From Newsgroup: comp.lang.forth

On Thu, 12 Sep 2024 23:37:29 +0000, dxf wrote:

On 12/09/2024 8:19 pm, Anton Ertl wrote:

Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Costs multiply in the face of many small functions.

Register allocation is also effective for small functions.

Moore talked about registers. It's worth repeating for those who may be new
to forth.

"But such registers raises the question of local variables. There is a lot of
discussion about local variables. That is another aspect of your application
where you can save 100% of the code. I remain adamant that local variables
are not only useless, they are harmful. If you are writing code that needs
them you are writing, non-optimal code" - Chuck Moore 1999

The only thing that can be deduced from this is that back in 1999
this was Moore's opinion in the specific context of his work.

Besides, the world has changed a wee bit since then...
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Fri Sep 13 19:47:46 2024

From Newsgroup: comp.lang.forth

On 13/09/2024 5:56 pm, minforth wrote:

On Thu, 12 Sep 2024 23:37:29 +0000, dxf wrote:

On 12/09/2024 8:19 pm, Anton Ertl wrote:

Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Costs multiply in the face of many small functions.

Register allocation is also effective for small functions.

Moore talked about registers. It's worth repeating for those who may be new
to forth.

"But such registers raises the question of local variables. There is a lot of
discussion about local variables. That is another aspect of your application
where you can save 100% of the code. I remain adamant that local variables
are not only useless, they are harmful. If you are writing code that needs
them you are writing, non-optimal code" - Chuck Moore 1999

The only thing that can be deduced from this is that back in 1999
this was Moore's opinion in the specific context of his work.

Besides, the world has changed a wee bit since then...

Claims made in respect of locals in forth - ease of use, better performance through less 'stack juggling', better readability/maintainability - were all made in the 1980's. What has changed? Forthers today are more willing to believe, to accept the word of authority, lack the interest to discover the truth for themselves? If so, that would be a pity.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Fri Sep 13 03:38:51 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

"I remain adamant that local variables are not only useless, they
are harmful. If you are writing code that needs them you are
writing, non-optimal code" - Chuck Moore 1999 ...

Claims made in respect of locals in forth - ease of use, better
performance through less 'stack juggling', better
readability/maintainability - were all made in the 1980's. What has
changed? Forthers today are more willing to believe, to accept the
word of authority, lack the interest to discover the truth for
themselves?

Is avoiding locals because of the Chuck Moore quote not an example of
accepting the word of authority? And how often do even you care whether
your code is optimal? It's likely difficult to get any interpreted
Forth code to run at better than 1/5th the speed of assembly code. So
if optimization is your main concern, why use Forth to begin with?

I would say that the claim of better performance from locals depends on
the implementation and in any case has to be scrutinized if it matters,
but even if there's a performance loss, that might be an acceptable
trade if the programmer finds offsetting gains in the other areas.

My main programming language for random hacking is Python, which is
possibly 10x slower than interpreted Forth or 50x slower than compiled
Forth or C. Yet it usually doesn't matter unless I'm trying to do
something unusually compute intensive. Once the program is fast enough
to not be annoying to use, I don't need to optimize it more.
--- Synchronet 3.20a-Linux NewsLink 1.114

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Fri Sep 13 13:07:32 2024

From Newsgroup: comp.lang.forth

In article <66e40a42$1@news.ausics.net>, dxf <dxforth@gmail.com> wrote:

On 13/09/2024 5:56 pm, minforth wrote:

On Thu, 12 Sep 2024 23:37:29 +0000, dxf wrote:

On 12/09/2024 8:19 pm, Anton Ertl wrote:

Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Costs multiply in the face of many small functions.

Register allocation is also effective for small functions.

Moore talked about registers. It's worth repeating for those who may be new
to forth.

"But such registers raises the question of local variables. There is a lot of
discussion about local variables. That is another aspect of your application
where you can save 100% of the code. I remain adamant that local variables
are not only useless, they are harmful. If you are writing code that needs
them you are writing, non-optimal code" - Chuck Moore 1999

The only thing that can be deduced from this is that back in 1999
this was Moore's opinion in the specific context of his work.

Besides, the world has changed a wee bit since then...

Claims made in respect of locals in forth - ease of use, better performance through less 'stack juggling', better readability/maintainability - were all made in the 1980's. What has changed? Forthers today are more willing to believe, to accept the word of authority, lack the interest to discover the truth for themselves? If so, that would be a pity.

I object to locals because they introduce a superfluous extra concept.
It is foreign to a stack-oriented language.
Also, there are numerous conflicting notations, and giving a name to a
single cell isn't sufficient. You would also need local doubles, floats and structures.
Some people are fond of their information-hiding aspect; that can
easily be done with normal data plus an addition like marking
some words private.
The remaining argument is re-entrancy, an overrated argument.

I am also fond of Algol68/go. It sits at a different end of the spectrum,
but it shares a feature with Forth: consistency.
Local variables break that.

I don't take Moore's word for gospel, but I pay attention, because
he is an accomplished individual.

Groetjes Albert
--- Synchronet 3.20a-Linux NewsLink 1.114

From Jan Coombs@jan4comp.lang.forth@murray-microft.co.uk to comp.lang.forth on Fri Sep 13 13:07:32 2024

From Newsgroup: comp.lang.forth

On Fri, 13 Sep 2024 03:38:51 -0700
Paul Rubin <no.email@nospam.invalid> wrote:

I would say that the claim of better performance from locals depends
on the implementation[...]

Absolutely. As Chuck's prime target of interest (hardware) uses LIFO
registers for stacks, only the top R stack item, or so, could
be used for restricted local storage (which is also common practice).

I accept that locals are useful, and would like to see hardware stack
engine implementations that support this better while retaining the
performance advantage of a stack cache implemented as LIFO registers
rather than in RAM.

Jan Coombs
--

--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 01:12:13 2024

From Newsgroup: comp.lang.forth

On 13/09/2024 8:38 pm, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

"I remain adamant that local variables are not only useless, they
are harmful. If you are writing code that needs them you are
writing, non-optimal code" - Chuck Moore 1999 ...

Claims made in respect of locals in forth - ease of use, better
performance through less 'stack juggling', better
readability/maintainability - were all made in the 1980's. What has
changed? Forthers today are more willing to believe, to accept the
word of authority, lack the interest to discover the truth for
themselves?

Is avoiding locals because of the Chuck Moore quote not an example of accepting the word of authority?

Or I've yet to hear a convincing argument from the locals authorities :)

You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Fri Sep 13 17:59:27 2024

From Newsgroup: comp.lang.forth

Jan Coombs <jan4comp.lang.forth@murray-microft.co.uk> writes:

Absolutely. As Chuck's prime target of interest (hardware) uses LIFO registers for stacks, only the top R stack item, or so, could
be used for restricted local storage (which is also common practice).

I accept that locals are useful, and would like to see hardware stack
engine implementations that support this better while retaining the performance advantage of a stack cache implemented as LIFO registers
rather than in RAM.

AFAIK Chuck Moore implements the stack as SRAM indexed with his stack
pointer; maybe the stack pointer is a rotating shift register with
only one bit set, I don't remember.

He also uses an A register in addition to R and the data TOS last I
looked. So much for Chuck Moore denouncing registers. When he
introduced A, some people played with the idea to add A and possibly
more registers to Forth.

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Fri Sep 13 18:07:34 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

Claims made in respect of locals in forth - ease of use, better performance through less 'stack juggling', better readability/maintainability - were all made in the 1980's.

Where can I find claims about better performance? All I have read is
claims about worse performance.

What has changed? Forthers today are more willing to
believe, to accept the word of authority

Is that why you cite Chuck Moore on locals rather than arguing from
facts?

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 12:48:45 2024

From Newsgroup: comp.lang.forth

On 14/09/2024 4:07 am, Anton Ertl wrote:

dxf <dxforth@gmail.com> writes:

Claims made in respect of locals in forth - ease of use, better performance through less 'stack juggling', better readability/maintainability - were all made in the 1980's.

Where can I find claims about better performance? All I have read is
claims about worse performance.

'Eliminate stack juggling' sounds like an argument for better performance.
It's a catch cry that's become synonymous with locals. Identify something wrong with forth, then introduce a solution: that is the gameplay.

What has changed? Forthers today are more willing to
believe, to accept the word of authority

Is that why you cite Chuck Moore on locals rather than arguing from
facts?

The facts AFAICT is locals are an appeal to prejudice. If locals were a bona fide extension, it ought to be crystal clear when to apply them and when not. Vague statements about readability and maintainability don't cut it. The fact is locals challenge and contradict forth, which is why I'm vitally interested in getting
at the truth of it. The best way I knew of doing that was to see whether I needed locals in practice. When the result is that good forth coding can stand on its own, why shouldn't I quote Moore?

--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sat Sep 14 05:47:11 2024

From Newsgroup: comp.lang.forth

On Sat, 14 Sep 2024 2:48:45 +0000, dxf wrote:

The facts AFAICT is locals are an appeal to prejudice.

This is one of the best sentences ever uttered on this forum! :-)
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Sep 14 06:19:52 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

On 14/09/2024 4:07 am, Anton Ertl wrote:

Where can I find claims about better performance? All I have read is
claims about worse performance.

'Eliminate stack juggling' sounds like an argument for better performance.

Not to me. To me it sounds like a statement about the ease of writing
and reading the code.

The performance of locals vs. stack juggling depends on the
implementation. I know no implementation that performs register
allocation of locals or stack items (except the TOS) to registers
across basic block boundaries. This seems to hurt code with locals
more than code that keeps everything on the stacks. Here's the data
from an earlier posting <2024Sep12.105526@mips.complang.tuwien.ac.at>,
now including data from iForth:

locals  stack  system
   401    336  gforth-fast (AMD64)
   179    132  lxf 1.6-982-823 (IA-32)
   182    119  VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
   241    159  VFX Forth 64 5.43 (AMD64)
   163    175  iforth-5.1 mini (AMD64)

The data from iForth is the outlier here; let's look at the code:

Source code:
defer dummy
: z" [char] " parse 2drop postpone dummy ; immediate
defer zformat
defer z+
defer >name
defer error

: VICHECK1 {: pindex paddr -- pindex' paddr :} \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
pindex 0 paddr @ WITHIN IF \ Index is valid
pindex paddr
ELSE \ Index is invalid
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
THEN ;

: VICHECK2 ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
Z" Invalid index " 2 PICK ZFORMAT Z+
Z" for " Z+ OVER CELL- @ Z+ \ Add NFA from extra cell
Z" length " Z+ OVER @ ZFORMAT Z+
ERROR
NIP 0 SWAP \ Use zeroth index
THEN ;

One difference is that VICHECK2 does not just replace the locals with
stack stuff and eliminate the first branch of the IF, but also
replaces ">NAME 1+" with "CELL- @".

Disassembled code:
VICHECK1                         VICHECK2
pop rbx                          pop rbx
lea rsi, [rsi #-16 +] qword      mov rdi, [rsp] qword
mov [esi] dword, rbx             push rbx
pop rbx                          push rdi
lea rsi, [rsi #-16 +] qword      push 0 b#
mov [esi] dword, rbx             mov rbx, [rsp #16 +] qword
mov rbx, [rsi #16 +] qword       pop rdi
mov rbx, [rbx] qword             mov rax, rdi
mov rdi, [rsi] qword             sub rax, [rbx] qword
cmp rbx, rdi                     neg rax
jbe $10227337 offset NEAR        pop rbx
push [rsi] qword                 sub rbx, rdi
push [rsi #16 +] qword           cmp rax, rbx
jmp $10227395 offset NEAR        seta bl
call $10226600 qword-offset      movzx rbx, bl
push [rsi] qword                 neg rbx
call $10226E90 qword-offset      cmp rbx, 0 b#
call $10226EB0 qword-offset      jne $10227465 offset NEAR
call $10226600 qword-offset      call $10226600 qword-offset
call $10226EB0 qword-offset      mov rbx, [rsp #16 +] qword
push [rsi #16 +] qword           push rbx
call $10226ED0 qword-offset      call $10226E90 qword-offset
pop rbx                          call $10226EB0 qword-offset
lea rbx, [rbx 1 +] qword         call $10226600 qword-offset
push rbx                         call $10226EB0 qword-offset
call $10226EB0 qword-offset      pop rbx
call $10226600 qword-offset      mov rdi, [rsp] qword
call $10226EB0 qword-offset      push rbx
mov rbx, [rsi #16 +] qword       push [rdi -8 +] qword
push [rbx] qword                 call $10226EB0 qword-offset
call $10226E90 qword-offset      call $10226600 qword-offset
call $10226EB0 qword-offset      call $10226EB0 qword-offset
call $10226EF0 qword-offset      pop rbx
push 0 b#                        mov rdi, [rsp] qword
push [rsi #16 +] qword           push rbx
add rsi, #32 b#                  push [rdi] qword
;                                call $10226E90 qword-offset
                                 call $10226EB0 qword-offset
                                 call $10226EF0 qword-offset
                                 pop rbx
                                 pop rdi
                                 mov rdi, 0 d#
                                 mov rcx, rdi
                                 push rcx
                                 push rbx
                                 ;

iForth 5.1-mini does not even keep the TOS in a register on basic
block boundaries, which results in pops and pushes at all the
boundaries, especially for the stack-only code. However, in the
actual application (where Z", ZFORMAT etc. don't compile as deferred
words) it would probably inline many of these words which might result
in better code for the stack variant. It does not keep locals in
stack items, either, but accesses them in memory through a separate
stack pointer.

The code at the start of VICHECK2 does not suffer from basic block
boundaries, yet makes less use of registers than I expected. By
contrast, in VICHECK1 iforth discovers that "0 paddr @ within" is
equivalent to "paddr @ u<", while for "0 2 pick @ within" it fails to
make the equivalent discovery.
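The equivalence iForth found for VICHECK1 follows from the usual two's-complement definition of WITHIN, the one commonly quoted from the standard's rationale. Below is a minimal sketch, assuming a Forth-2012 system such as Gforth. MY-WITHIN restates that definition under a new name so that the system word is left alone, and VALID-WITHIN and VALID-U< are hypothetical names for the two forms of the index check. With a lower bound of 0, the subtraction of LOW disappears, so both forms reduce to the same unsigned comparison, which also rejects negative indices.

```forth
\ Usual definition of WITHIN, renamed so the system word is untouched
: my-within ( test low high -- flag )  over - >r  - r> u< ;

\ VICHECK1-style check: 0 <= index < size
: valid-within ( index size -- flag )  0 swap within ;

\ The same check as a single unsigned comparison
: valid-u<     ( index size -- flag )  u< ;
```

For example, `-1 10 valid-u<` is false, because -1 is treated as a very large unsigned number.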

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 18:40:53 2024

From Newsgroup: comp.lang.forth

On 14/09/2024 4:19 pm, Anton Ertl wrote:

dxf <dxforth@gmail.com> writes:

On 14/09/2024 4:07 am, Anton Ertl wrote:

Where can I find claims about better performance? All I have read is
claims about worse performance.

'Eliminate stack juggling' sounds like an argument for better performance.

Not to me. To me it sounds like a statement about the ease of writing
and reading the code.

The performance of locals vs. stack juggling depends on the
implementation.
...

Surely you mean locals vs. forth. The easiest way to achieve performance
in forth is making your stack operations efficient. 'Stack juggling' is
a visual cue that it's not. I'm sorry that you feel forth isn't readable.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sat Sep 14 01:56:20 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

: EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;

could be written:

: EMITS {: n char -- :} n 0 ?do char emit loop ;
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 21:56:41 2024

From Newsgroup: comp.lang.forth

On 14/09/2024 6:56 pm, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

: EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;

could be written:

: EMITS {: n char -- :} n 0 ?do char emit loop ;

Compiling under DX-Forth resulted in a code size of 23 and 26 bytes respectively. Under VFX ...

( 71 bytes, 18 instructions )

( 102 bytes, 28 instructions )

Not only were you able to read forth code, the result was more efficient. Perhaps locals in forth were meant to be clever? That would explain the interest; however, it's a high price to pay.

--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Sep 14 12:32:07 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

On 12/09/2024 4:51 pm, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

https://pastebin.com/2xcRSbQW

SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
problem in forth? It doesn't appear to be.

: ARG ( n -- adr len -1 | 0 )
>r 0 0 cmdtail r> 0 ?do
2nip
bl skip 2dup bl scan
rot over - -rot
loop 2drop
dup if -1 end and ;

The heavy use of global variables in this program also does not
support the idea that proper usage of the stacks makes locals
unnecessary.

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 14:52:59 2024

From Newsgroup: comp.lang.forth

Hi,
In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
as:

mf(x;a,b,c) = (x-a)/(b-a)   for a <= x < b,
              (c-x)/(c-b)   for b <= x < c,
              0e            elsewhere.

defining it with locals:

: tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e
;

But defining it without locals ????!!!!!

: tri_mf() ( f: x a b c -- mv) ....

How?

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Sep 14 15:08:36 2024

From Newsgroup: comp.lang.forth

melahi_ahmed@yahoo.fr (Ahmed) writes:

Hi,
In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
as:

mf(x;a,b,c) = (x-a)/(b-a)   for a <= x < b,
              (c-x)/(c-b)   for b <= x < c,
              0e            elsewhere.

defining it with locals:

: tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e
;

But defining it without locals ????!!!!!

: tri_mf() ( f: x a b c -- mv) ....

How?

I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
that tends to get passed around without changing it. In that case
defining it as a structure in memory and accessing its members there
might be a solution.
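Here is a minimal sketch of the structure approach. It assumes a system with the Forth-2012 structure words (BEGIN-STRUCTURE, FFIELD:, END-STRUCTURE), which Gforth provides. The names /TRI, TRI.A, TRI.B, TRI.C, TRI-RECORD, TRI-MF and ZERO-PARAMS are illustrative only. The parameters a, b and c live in memory, and only x travels on the float stack. F< 0= stands in for F>= so that the comparisons use only standard words.

```forth
\ Record layout for the parameters a, b, c
begin-structure /tri
  ffield: tri.a
  ffield: tri.b
  ffield: tri.c
end-structure

\ Hypothetical helper: lay down a float-aligned record, return its address
: tri-record ( f: a b c -- ) ( -- addr )
  falign here >r  /tri allot
  r@ tri.c f!  r@ tri.b f!  r@ tri.a f!  r> ;

\ Membership value of x for the triangle whose record is at ADDR
: tri-mf ( addr -- ) ( f: x -- mv )
  >r
  fdup r@ tri.a f@ f< 0=  fdup r@ tri.b f@ f<  and if    \ a <= x < b
    r@ tri.a f@ f-  r@ tri.b f@ r@ tri.a f@ f-  f/  r> drop exit then
  fdup r@ tri.b f@ f< 0=  fdup r@ tri.c f@ f<  and if    \ b <= x < c
    r@ tri.c f@ fswap f-  r@ tri.c f@ r@ tri.b f@ f-  f/  r> drop exit then
  fdrop 0e r> drop ;

-1e 0e 1e tri-record constant zero-params
```

Usage: `0.25e zero-params tri-mf f.` evaluates the "zero" triangle at x = 0.25. The same TRI-MF serves every record, so each new membership function costs only one TRI-RECORD.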

But OTOH, unless you see programming in Forth as a religious exercise,
why worry, as long as your solution works.

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sat Sep 14 09:10:58 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

Compiling under DX-Forth resulted in a code size of 23 and 26 bytes respectively. Under VFX ...

I can't help it if those compilers generate worse code for the locals
version. Can you conveniently try lxf?

Not only were you able to read forth code, the result was more
efficient.

Sometimes it isn't too hard to read, sometimes it takes head scratching,
and sometimes I can't make any sense of it. The function Anton posted
was an example that didn't make sense. I remember thinking I might sit
down and try to figure it out to rewrite it, but it doesn't seem worth
the effort.

Anyway, if efficiency was important for that example, I'd use CODE.
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 17:13:51 2024

From Newsgroup: comp.lang.forth

On Sat, 14 Sep 2024 15:08:36 +0000, Anton Ertl wrote:

I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
that tends to get passed around without changing it. In that case
defining it as a structure in memory and accessing its members there
might be a solution.

a, b and c are the parameters of the membership function.
Yes, we can use structures, arrays ...

But OTOH, unless you see programming in Forth as a religious exercise,
why worry, as long as your solution works.

I did it without locals as an exercise. Here it is:

Without locals:

: tri_mf: ( f: a b c )
create frot f, fswap f, f,
does> ( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
f@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over float+ ( ad_a -1|0 ad_b) ( f: x)
fdup f@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup f@ f- ( ad_a) ( f: x-a)
dup f@ ( ad_a) ( f: x-a a)
float+ ( ad_b) ( f: x-a a)
f@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
float+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
f@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over float+ ( ad_b -1|0 ad_c) ( f: x)
fdup f@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup float+ f@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup float+ ( ad_b ad_c) ( f: x-c)
swap f@ f@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;

-1e309 -1e 0e tri_mf: neg_big
-1e 0e 1e tri_mf: zero
0e 1e 1e309 tri_mf: pos_big

: fuzzify ( f: x)
fdup neg_big cr f.
fdup zero cr f.
pos_big cr f.
;

Examples: for x in {-10e, -1e, -0.8e, -0.5e, -0.3e, 0e, 0.2e, 0.5e,
0.7e, 1e, 20e}
-10e fuzzify and so on.

\ ---------------

With locals:
: tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e
;

: neg_big -1e309 -1e 0e tri_mf() ;
: zero -1e 0e 1e tri_mf() ;
: pos_big 0e 1e 1e309 tri_mf() ;

: fuzzify { f: x }
x neg_big cr f.
x zero cr f.
x pos_big cr f.
;

Examples: for x in {-10e, -1e, -0.8e, -0.5e, -0.3e, 0e, 0.2e, 0.5e,
0.7e, 1e, 20e}
-10e fuzzify and so on.

I notice a great difference in readability and simplicity when using
locals.

Using gforth under WSL (Windows Subsystem for Linux):

utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 17:43:52 2024

From Newsgroup: comp.lang.forth

Oops.
Please read microseconds (us) instead of milliseconds (ms).

Without locals: about 18 us
with locals: about 19 us

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 17:41:23 2024

From Newsgroup: comp.lang.forth

On Sat, 14 Sep 2024 17:13:51 +0000, Ahmed wrote:

utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms

Ahmed

Oops.

Please read microseconds (us) instead of milliseconds (ms).

with locals: about 19 us
without locals: about 18 us

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sat Sep 14 18:54:46 2024

From Newsgroup: comp.lang.forth

On Sat, 14 Sep 2024 17:41:23 +0000, Ahmed wrote:

On Sat, 14 Sep 2024 17:13:51 +0000, Ahmed wrote:

utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms

Ahmed

Oops.

Please read microseconds (us) instead of milliseconds (ms).

with locals: about 19 us
without locals: about 18 us

That can't be correct.

In iForth I used dfloats instead of floats
( 4.9ns instead of 7.3ns).
Using structs is not a great idea in this case.

anew -testlocals

: tri_mf: ( f: a b c )
create frot df, fswap df, df,
does> ( F: x -- y )
( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
df@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
fdup df@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup df@ f- ( ad_a) ( f: x-a)
dup df@ ( ad_a) ( f: x-a a)
dfloat+ ( ad_b) ( f: x-a a)
df@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
dfloat+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
df@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
fdup df@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup dfloat+ df@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup dfloat+ ( ad_b ad_c) ( f: x-c)
swap df@ df@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;

-1e309 -1e 0e tri_mf: nol_neg_big

: (tri_mf) ( f: x a b c -- mv)
FLOCALS| c b a x |
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e ;

: loc_neg_big -1e309 -1e 0e (tri_mf) ;
: .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

: tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e nol_neg_big FDROP LOOP .timing
CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e loc_neg_big FDROP LOOP .timing ;

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.3ns/call. ok

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 19:19:25 2024

From Newsgroup: comp.lang.forth

On Sat, 14 Sep 2024 18:54:46 +0000, mhx wrote:

That can't be correct.

You are right.
I find with gforth:

: go 0 do -0.1e neg_big fdrop loop ;

without locals:
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.06762074 us ok
for 1e8 times: (67.62 ns per call)

and with locals:
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09961387 us ok
for 1e8 times: (99.61 ns per call)

I misused the timing in the previous post.
Thanks for the correction.
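A reusable version of the timing one-liner is sketched below. It assumes Gforth's UTIME ( -- ud ), which returns microseconds, as used above. NS/CALL is a hypothetical name. The word under test must have the stack effect ( F: x -- y ), and the input -0.1e is fixed to match the measurement above. Timing many calls in a loop averages out the resolution of UTIME, but the loop overhead is included in the result.

```forth
\ Time N calls of XT, each fed -0.1e; result is nanoseconds per call
: ns/call ( xt n -- ) ( f: -- ns-per-call )
  dup >r                       \ keep N for the final division
  utime 2>r                    \ start time in microseconds (double)
  0 ?do  -0.1e dup execute fdrop  loop  drop
  utime 2r> d-                 \ elapsed microseconds
  d>f 1e3 f*  r> s>f f/ ;      \ convert to nanoseconds per call
```

Usage: `' neg_big 100000000 ns/call f.` should print a figure comparable to those above.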

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.3ns/call. ok

-marcel

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 15 15:17:20 2024

From Newsgroup: comp.lang.forth

On 15/09/2024 2:10 am, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

Compiling under DX-Forth resulted in a code size of 23 and 26 bytes
respectively. Under VFX ...

I can't help it if those compilers generate worse code for the locals version. Can you conveniently try lxf?

Windows NT/Forth (32 bit):

( 67 bytes, 19 instructions )
( 87 bytes, 24 instructions )

Not only were you able to read forth code, the result was more
efficient.

Sometimes it isn't too hard to read, sometimes it takes head scratching,
and sometimes I can't make any sense of it. The function Anton posted
was an example that didn't make sense. I remember thinking I might sit
down and try to figure it out to rewrite it, but it doesn't seem worth
the effort.

It would be no different were locals used. It would still require one to
sit down and figure out what the code did. The more experienced one is in
the language the easier it is.

Going back to the EMITS example:

- despite lack of comments you quickly deduced what it did
- stack operations were few and simple and still you didn't like it
- your ideal is that every stack operation should go, which is what
you did

If one takes from forth that which makes it efficient, then one takes away
its reason for existence. Unfortunately for forth, this is what locals
users are doing, whether they're aware of it or not.

Anyway, if efficiency was important for that example, I'd use CODE.

In other words forth is not important to you. I understand. You've stated Python is your language of preference. Forth is mine and I'll program it
the best way I know how.

--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 15 15:28:24 2024

From Newsgroup: comp.lang.forth

On 14/09/2024 10:32 pm, Anton Ertl wrote:

dxf <dxforth@gmail.com> writes:

On 12/09/2024 4:51 pm, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

https://pastebin.com/2xcRSbQW

SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a problem in forth? It doesn't appear to be.

: ARG ( n -- adr len -1 | 0 )
>r 0 0 cmdtail r> 0 ?do
2nip
bl skip 2dup bl scan
rot over - -rot
loop 2drop
dup if -1 end and ;

I believe it's well written and efficient.

: 2nip 2swap 2drop ;
: end postpone exit postpone then ; immediate
defer cmdtail ( -- adr len)

: ARG ( n -- adr len -1 | 0 )
>r 0 0 cmdtail r> 0 ?do
2nip
bl skip 2dup bl scan
rot over - -rot
loop 2drop
dup if -1 end and ;

VFX:

( 180 bytes, 44 instructions )
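To try ARG outside dxf's application, you need SKIP and SCAN, which are common but non-standard words, plus some command tail. The sketch below assumes their usual semantics: SKIP drops leading occurrences of a character, and SCAN advances to the first occurrence. On systems that already define them, this code redefines them. The sketch also points CMDTAIL at a fixed string through a hypothetical DEMO-TAIL. 2NIP, END and ARG are copied from the post above.

```forth
\ Portable stand-ins for the common SKIP and SCAN
: skip ( c-addr u char -- c-addr' u' )   \ drop leading CHARs
  >r begin dup while over c@ r@ = while 1 /string repeat then r> drop ;
: scan ( c-addr u char -- c-addr' u' )   \ advance to first CHAR
  >r begin dup while over c@ r@ <> while 1 /string repeat then r> drop ;

\ From dxf's post
: 2nip ( a b c d -- c d )  2swap 2drop ;
: end  postpone exit postpone then ; immediate
defer cmdtail ( -- adr len )

: ARG ( n -- adr len -1 | 0 )
  >r 0 0 cmdtail r> 0 ?do
    2nip
    bl skip 2dup bl scan
    rot over - -rot
  loop 2drop
  dup if -1 end and ;

\ Hypothetical command tail, for the demo only
: demo-tail ( -- adr len )  s"   alpha  beta gamma " ;
' demo-tail is cmdtail
```

With this tail, `2 arg` leaves the string "beta" and -1, and `4 arg` leaves 0.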

The heavy use of global variables in this program also does not
support the idea that proper usage of the stacks makes locals
unnecessary.

I see many small colon definitions and very few variables - global or
local:

integer #TERMS \ number of terminals in DTA file
integer TERM \ working terminal#
variable #DIGIT
variable LEN
integer MAXCHR

The first two are necessarily global and would exist regardless.
The remaining three are used by a group of functions with the view of
keeping them simple. The alternative would be to carry them around as parameters shuffling them from one function to another. That seems
worse to me.
--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sun Sep 15 06:17:18 2024

From Newsgroup: comp.lang.forth

On Sat, 14 Sep 2024 19:19:25 +0000, Ahmed wrote:

You are right.
I find with gforth:

: go 0 do -0.1e neg_big fdrop loop ;

without locals:
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.06762074 us ok
for 1e8 times: (67.62 ns per call)

and with locals:
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09961387 us ok
for 1e8 times: (99.61 ns per call)

I misused the timing in the previous post.
Thanks for the correction.

So with gforth it's about 30 nanosecs runtime disadvantage.
IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

While the locals version was easy to code, pretty straightforward and
probably bug-free out of the box, how long did it take to code and debug
the stack juggling version?

Say 10 minutes longer. The break-even point would then be around 2*10^10 runs,
under the dubious assumption that CPU time is as valuable as human time.
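The arithmetic behind these figures uses Ahmed's gforth measurements (99.61 ns/call with locals, 67.62 ns/call without) and assumes 10 extra minutes of programmer time:

```latex
\begin{aligned}
\Delta t &= 99.61\,\mathrm{ns} - 67.62\,\mathrm{ns} \approx 32\,\mathrm{ns} \approx 30\,\mathrm{ns}\ \text{per call},\\
N_{1\,\mathrm{s}} &= \frac{1\,\mathrm{s}}{30\times 10^{-9}\,\mathrm{s}} \approx 3.3\times 10^{7}\ \text{calls},\\
N_{\text{break-even}} &= \frac{10 \times 60\,\mathrm{s}}{30\times 10^{-9}\,\mathrm{s}} = 2\times 10^{10}\ \text{calls}.
\end{aligned}
```

So about 3*10^7 calls, not 3*10^8, add up to one second.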
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sun Sep 15 07:30:24 2024

From Newsgroup: comp.lang.forth

On Sun, 15 Sep 2024 6:17:18 +0000, minforth wrote:

So with gforth it's about 30 nanosecs runtime disadvantage.
IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

I think you mean: if you run the code 3*10^8 times it adds up to 1 sec disadvantage.

While the locals version was easy to code, pretty straightforward and probably bug-free out of the box, how long did it take to code and debug
the stack juggling version?

It took me several tries and corrections (and time).

Perhaps one can factor the code in the DOES> part.
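Here is one way to factor the DOES> part, offered as a sketch rather than a tuned replacement. Both slopes have the form (x-p)/(q-p), the second with p = c and q = b, and both tests have the form lo <= x < hi. IN-RANGE?, RAMP, TRI-MF: and ZERO-MF are hypothetical names. The sketch assumes F, is available; Gforth has it, and the code above already relies on it. Comparisons use only F<.

```forth
\ Leave x, plus a flag telling whether lo <= x < hi
: in-range? ( -- flag ) ( f: x lo hi -- x )
  frot fswap               ( f: lo x hi )
  fover fswap f<           ( flag1: x<hi )   ( f: lo x )
  fswap fover fswap f< 0=  ( flag2: x>=lo )  ( f: x )
  and ;

\ Linear ramp (x-p)/(q-p); covers both slopes of the triangle
: ramp ( f: x p q -- r )  fover f- frot frot f- fswap f/ ;

: tri-mf: ( f: a b c -- )
  create frot f, fswap f, f,                  \ body holds a, b, c in order
  does> ( addr -- ) ( f: x -- mv )
    >r
    r@ f@  r@ float+ f@  in-range? if              \ a <= x < b
      r@ f@  r@ float+ f@  ramp  r> drop exit then
    r@ float+ f@  r@ 2 floats + f@  in-range? if   \ b <= x < c
      r@ 2 floats + f@  r@ float+ f@  ramp  r> drop exit then
    fdrop 0e r> drop ;

-1e 0e 1e tri-mf: zero-mf
```

Each branch now reads as "test, then ramp", and the only stack juggling left is inside the two small factors, which can be checked in isolation.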

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sun Sep 15 07:35:14 2024

From Newsgroup: comp.lang.forth

On Sun, 15 Sep 2024 7:30:24 +0000, Ahmed wrote:

On Sun, 15 Sep 2024 6:17:18 +0000, minforth wrote:

So with gforth it's about 30 nanosecs runtime disadvantage.
IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

I think you mean: if you run the code 3*10^8 times it adds up to 1 sec disadvantage.

Oops!
You are right. 3*10^7 times running the code gives about 1 sec
disadvantage.

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 15 18:14:17 2024

From Newsgroup: comp.lang.forth

On 15/09/2024 3:13 am, Ahmed wrote:

On Sat, 14 Sep 2024 15:08:36 +0000, Anton Ertl wrote:

I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
that tends to get passed around without changing it. In that case
defining it as a structure in memory and accessing its members there
might be a solution.

a, b and c are the parameters of the membership function.
Yes, we can use structures, arrays ...

But OTOH, unless you see programming in Forth as a religious exercise,
why worry, as long as your solution works.

I did it without locals as an exercise. Here it is:

Without locals:

: tri_mf: ( f: a b c )
    create frot f, fswap f, f,
    does>             ( ad_a)           ( f: x)
      dup fdup        ( ad_a ad_a)      ( f: x x)
      f@              ( ad_a)           ( f: x x a)
      f>=             ( ad_a -1|0)      ( f: x)
      over float+     ( ad_a -1|0 ad_b) ( f: x)
      fdup f@         ( ad_a -1|0)      ( f: x x b)
      f< and if       ( ad_a)           ( f: x)
        dup f@ f-     ( ad_a)           ( f: x-a)
        dup f@        ( ad_a)           ( f: x-a a)
        float+        ( ad_b)           ( f: x-a a)
        f@ fswap f-                     ( f: x-a b-a)
        f/                              ( f: [x-a]/[b-a])
        exit
      then
      float+          ( ad_b)           ( f: x)
      dup fdup        ( ad_b ad_b)      ( f: x x)
      f@              ( ad_b)           ( f: x x b)
      f>=             ( ad_b -1|0)      ( f: x)
      over float+     ( ad_b -1|0 ad_c) ( f: x)
      fdup f@         ( ad_b -1|0)      ( f: x x c)
      f< and if       ( ad_b)           ( f: x)
        dup float+ f@ ( ad_b)           ( f: x c)
        f-            ( ad_b)           ( f: x-c)
        dup float+    ( ad_b ad_c)      ( f: x-c)
        swap f@ f@ f-                   ( f: x-c b-c)
        f/                              ( f: [x-c]/[b-c])
        exit
      then
      drop fdrop
      0e
;

That appears no better than FVALUEs ...

0e fvalue a
0e fvalue b
0e fvalue c
0e fvalue x

: tri_mf() ( f: x a b c -- mv)
to c to b to a to x
x a f>=
x b f< and if
x a f- b a f- f/ exit
then
x b f>=
x c f< and if
c x f- c b f- f/ exit
then
0e
;

--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sun Sep 15 08:58:20 2024

From Newsgroup: comp.lang.forth

On Sun, 15 Sep 2024 8:14:17 +0000, dxf wrote:

That appears no better than FVALUEs ...

0e fvalue a
0e fvalue b
0e fvalue c
0e fvalue x

: tri_mf() ( f: x a b c -- mv)
to c to b to a to x
x a f>=
x b f< and if
x a f- b a f- f/ exit
then
x b f>=
x c f< and if
c x f- c b f- f/ exit
then
0e
;

I knew about this solution, and also about using fvariables.
I wanted tri_mf() to be used in defining, for example,
neg_big, zero and pos_big like this:

: neg_big -1e309 -1e 0e tri_mf() ;
: zero -1e 0e 1e tri_mf() ;
: pos_big 0e 1e 1e309 tri_mf() ;

It is ok.

Here the fvalues a, b and c are shared between these words without
problem.

Using the same test to estimate the speed (gforth under wsl) gives about
88 ns/call.
: go 0 do -0.1e neg_big fdrop loop ; ok

utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08933806 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08499321 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08958042 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09034804 ok

And with fvariables, the timing gives about 86 ns/call

utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08831171 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08438598 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08442013 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08619858 ok

( with locals: 99 ns/call,
without locals and no fvalues nor fvariables: 67 ns/call) (see
previous posts)

So naming storage cells (locals, values, variables, ...) simplifies writing the solution (code) and avoids heavy stack juggling, but
at a cost in speed (not very much).

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:14:53 2024

From Newsgroup: comp.lang.forth

In article <87cyl6396z.fsf@nightsong.com>,
Paul Rubin <no.email@nospam.invalid> wrote:

dxf <dxforth@gmail.com> writes:

You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

: EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;

could be written:

: EMITS {: n char -- :} n 0 ?do char emit loop ;

I think TYPE should be the primitive, and EMIT should
handle a 1-char string.

: EMIT DSP@ 1 TYPE DROP ;
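Albert's one-liner relies on system specifics. DSP@ presumably yields the address of the top data-stack cell in his system, and on a little-endian machine the character is the first byte of that cell. Below is a more portable sketch of the same idea, EMIT expressed through TYPE, using a one-character buffer. EMIT-BUF, CHAR>STRING and EMIT-VIA-TYPE are hypothetical names. Note the trade-off: a single shared buffer is not safe if several tasks emit at once, a problem the stack-based original avoids. A multitasking system would need a per-task buffer instead.

```forth
\ One-character scratch buffer (shared, so not task-safe)
create emit-buf 1 chars allot

\ Turn a character into a 1-char string
: char>string ( char -- c-addr 1 )  emit-buf c!  emit-buf 1 ;

\ EMIT expressed through TYPE
: emit-via-type ( char -- )  char>string type ;
```

Usage: `[char] * emit-via-type` inside a definition prints one asterisk.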

Imagine that you have concurrent tasks and one will write
in red, the other in blue. You could lock up the terminal
with an undefined escape sequence.

Groetjes Albert
--
Temu exploits Christians: (Disclaimer, only 10 apostles)
Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
And Gifts For Friends Family And Colleagues.
--- Synchronet 3.20a-Linux NewsLink 1.114

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:20:17 2024

From Newsgroup: comp.lang.forth

In article <e29088cacf765cd0da6519e333fa78f1@www.novabbs.com>,
Ahmed <melahi_ahmed@yahoo.fr> wrote:

Hi,
In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
as:

mf(x;a,b,c) = (x-a)/(b-a)   for a <= x < b,
              (c-x)/(c-b)   for b <= x < c,
              0e            elsewhere.

defining it with locals:

: tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e
;

But defining it without locals ????!!!!!

: tri_mf() ( f: x a b c -- mv) ....

How?

Locals don't help here. Flocals maybe, but that
is the whole point: you are halfway down the rabbit hole
if you demand flocals, dlocals, ...

Ahmed

--- Synchronet 3.20a-Linux NewsLink 1.114

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:42:26 2024

From Newsgroup: comp.lang.forth

In article <90389fea385c08c72f39d4fdef04d076@www.novabbs.com>,
mhx <mhx@iae.nl> wrote:

On Sat, 14 Sep 2024 17:41:23 +0000, Ahmed wrote:

On Sat, 14 Sep 2024 17:13:51 +0000, Ahmed wrote:

utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms

Ahmed

Oops.

Please read microseconds (us) instead of milliseconds (ms).

with locals: about 19 us
without locals: about 18 us

That can't be correct.

In iForth I used dfloats instead of floats
( 4.9ns instead of 7.3ns).
Using structs is not a great idea in this case.

anew -testlocals

: tri_mf: ( f: a b c )
create frot df, fswap df, df,
does> ( F: x -- y )
( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
df@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
fdup df@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup df@ f- ( ad_a) ( f: x-a)
dup df@ ( ad_a) ( f: x-a a)
dfloat+ ( ad_b) ( f: x-a a)
df@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
dfloat+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
df@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
fdup df@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup dfloat+ df@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup dfloat+ ( ad_b ad_c) ( f: x-c)
swap df@ df@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;

-1e309 -1e 0e tri_mf: nol_neg_big

: (tri_mf) ( f: x a b c -- mv)
FLOCALS| c b a x |
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e ;

: loc_neg_big -1e309 -1e 0e (tri_mf) ;
: .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

: tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e nol_neg_big FDROP LOOP .timing
CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e loc_neg_big FDROP LOOP .timing ;

This does not capture the meaning of the problem well.
Anton Ertl is right that you have to bind a b c
into something that is more than its parts.

0E0 FDUP FDUP class triangle-function
M: a F@ M; F,
M: b F@ M; F,
M: c F@ M; F,

M: fx ( f1 -- f1 )
FDUP a f>= FDUP b f< and if a f- b a f- f/ exit then
FDUP b f>= FDUP c f< and if c FSWAP f- c b f- f/ exit then
FDROP 0e M;
endclass

5E0 3E0 1E0 triangle-function orang-utan

orang-utan
2E0 fx F.
4E0 fx F.

Note that I have not introduced anything special, only classes
that you need anyway. These classes are a straightforward
generalisation of the CREATE DOES> construct, minus the
awkward syntax.
Note that x is passed as it should be: on the stack, volatile, in Forth fashion.
Passing 4 parameters is C-style.

NOTE:
These are presentations of ideas; nothing is tested.

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.3ns/call. ok

-marcel

--- Synchronet 3.20a-Linux NewsLink 1.114

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:53:09 2024

From Newsgroup: comp.lang.forth

In article <66e67077$1@news.ausics.net>, dxf <dxforth@gmail.com> wrote:

On 14/09/2024 10:32 pm, Anton Ertl wrote:

The heavy use of global variables in this program also does not
support the idea that proper usage of the stacks makes locals
unnecessary.

I see many small colon definitions and very few variables - global or
local:

integer #TERMS \ number of terminals in DTA file
integer TERM \ working terminal#
variable #DIGIT
variable LEN
integer MAXCHR

The first two are necessarily global and would exist regardless.
The remaining three are used by a group of functions with the view of
keeping them simple. The alternative would be to carry them around as parameters shuffling them from one function to another. That seems
worse to me.

One anecdote. I had a project that consisted of squashing bugs.
I'm proud to say that I accurately predicted the time needed for each bug
separately, and was not even 5% off for the total.
One bug I refused to give a time estimate for.
This program was written in C by Lispers, and they didn't understand
that some variables are group-local, i.e. in fact global.
There was a variable ERROR; once it was set, the next time there
was an error it was inspected, and the program was supposed to give up.

The Lispers went about it recursively and kept defining new ERROR
variables that were initialised to false. In case of an error,
this program never stopped.

Groetjes Albert
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sun Sep 15 09:58:23 2024

From Newsgroup: comp.lang.forth

This unearthed a "bug": -1e309 does not fit in a dfloat,
it prints as -Inf.
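This can be checked directly on a system whose floats are IEEE doubles, as in Gforth. The largest finite double is about 1.797e308, so anything beyond it overflows to infinity. With a = -Inf, the left slope (x-a)/(b-a) becomes Inf/Inf, which is NaN. A large finite bound such as -1e308 gives the intended value of about 1. BIG-NEG and LEFT-SLOPE are hypothetical helper names. The overflow is produced by multiplication rather than by the literal -1e309, because how a system converts an out-of-range literal may vary.

```forth
\ About 1.797e308 is the largest finite IEEE double, so this overflows
: big-neg ( f: -- r )  1e308 10e f* fnegate ;   \ -Inf with IEEE doubles

\ Left slope of the triangle: (x-a)/(b-a)
: left-slope ( f: x a b -- r )  fover f- frot frot f- fswap f/ ;
```

For example, `-10e big-neg -1e left-slope f.` prints a NaN, while `-10e -1e308 -1e left-slope f.` prints 1.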

anew -testlocals

0e dfvalue a PRIVATE
0e dfvalue b PRIVATE
0e dfvalue c PRIVATE

( based on dxf's outline )
: gv_tri_mf ( f: x a b c -- mv )
to c to b to a
fdup a f>= fdup b f< and if a f- b a f- f/ exit endif
fdup b f>= fdup c f< and if c fswap f- c b f- f/ exit endif
0e ;

: gv_neg_big -1e308 ( ! ) -1e 0e gv_tri_mf ;

: tri_mf: ( f: a b c )
create frot df, fswap df, df,
does> ( F: x -- y )
( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
df@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
fdup df@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup df@ f- ( ad_a) ( f: x-a)
dup df@ ( ad_a) ( f: x-a a)
dfloat+ ( ad_b) ( f: x-a a)
df@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
dfloat+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
df@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
fdup df@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup dfloat+ df@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup dfloat+ ( ad_b ad_c) ( f: x-c)
swap df@ df@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;

-1e309 -1e 0e tri_mf: nol_neg_big

: (tri_mf) ( f: x a b c -- mv)
FLOCALS| c b a x |
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e ;

: loc_neg_big -1e309 -1e 0e (tri_mf) ;

: .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

: tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e nol_neg_big FDROP LOOP .timing
CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e loc_neg_big FDROP LOOP .timing
CR ." \ globals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e gv_neg_big FDROP LOOP .timing ;

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.4ns/call.
\ globals: 6.2ns/call. ok

Surprisingly, there is hardly a difference between no locals and
global variables. The stack juggling in tri_mf: is merely an
intellectual exercise (in this case).

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (ahmed) to comp.lang.forth on Sun Sep 15 12:06:52 2024

From Newsgroup: comp.lang.forth

On Sun, 15 Sep 2024 9:58:23 +0000, mhx wrote:

This unearthed a "bug": -1e309 does not fit in a dfloat,
it prints as -Inf.

In practice, the universe of discourse of x is bounded: [xmin, xmax].
I use a normalized universe of discourse, [-1, 1].
So to get neg_big I just use a parameter a that is large in absolute
value (for example -1e6):

-1e6 -1e 0e tri_mf: neg_big
-1e 0e 1e6 tri_mf: pos_big

With x between -2e and 2e, for example, this gives:
neg_big(x) equals approximately 1 for all x less than -1.
pos_big(x) equals approximately 1 for all x greater than 1.

So I don't use 1e309 or -1e309.
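Ahmed's "approximately 1" claim can be checked at the edges of the example range [-2, 2]. For neg_big (a = -10^6, b = -1) use the rising-edge formula (x-a)/(b-a). For pos_big (b = 0, c = 10^6) use the falling-edge formula (c-x)/(c-b):

```latex
\mu_{\mathrm{neg\_big}}(-2) = \frac{-2 + 10^{6}}{-1 + 10^{6}} = \frac{999998}{999999} \approx 1 - 1.0\times10^{-6}
\qquad
\mu_{\mathrm{pos\_big}}(2) = \frac{10^{6} - 2}{10^{6} - 0} = 0.999998
```

On the parts of [-2, 2] mentioned above, both stay within about 2e-6 of 1. A finite a or c far outside the universe of discourse therefore does the job without any infinities.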

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Sun Sep 15 15:04:11 2024

From Newsgroup: comp.lang.forth

On 14 Sep 2024 at 08:19:52 CEST, "Anton Ertl" <Anton Ertl> wrote:

locals  stack  system
   401    336  gforth-fast (AMD64)
   179    132  lxf 1.6-982-823 (IA-32)
   182    119  VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
   241    159  VFX Forth 64 5.43 (AMD64)
   163    175  iforth-5.1 mini (AMD64)

There are design decisions within locals that can impact optimisation.
The design of locals in VFX was influenced by Don Colburn's Forths
and by a desire to use locals to simplify source code when interfacing
to a host operating system. Many operating systems return data
to the caller by passing the address of a variable/buffer as an input parameter. Locals that can have an accessible address make such
code much easier to read and write. The example below comes from
early system access code in VFX (see kernel/386Lin/syspatch.fth).
The locals design dates from long before ANS.

$541B equ FIONREAD

: (OS_key?) { | nread[ cell ] -- flag }
?PrepTerm nread[ off
nread[ FIONREAD stdin @ dll_ioctl @ 3 nxcall -1 = if
0 \ Error return from ioctl
else
nread[ @ 0<>
then
;

: (OS_Key) \ -- key ; SFP003
{ | iobuff[ cell ] -- char }
?PrepTerm
1 iobuff[ stdin @ dll_ReadFile @ 3 nxcall drop
iobuff[ c@
;

Code such as this has been around for a very long time and the use
of addresses of locals, and of local buffers, has proven itself over
time. Yes, we could put in a great effort to improve the performance
of locals, but this is Forth and there are other optimisations that may
produce bigger changes to application performance. In the last
decade or so there has been very little customer demand for
faster code. However, higher level source code has been much
in demand. An example is Nick Nelson's value flavoured structures,
which are of particular merit when converting code from 32 bit to
64 bit host Forths.

Just because many of the Forth applications visible to the Forth
community now run on CPUs with 16 or 32 address registers
does not mean that all systems can implement the compiler
techniques required for high-performance locals.

I can buy a lot of CPU cycles for the cost of one day of programmer
time. I am reminded when looking at locals that a client's Forth
engine is currently at 4GHz on a 12nm process. The performance
was detuned to 4GHz because the machine was more than fast
enough.

Stephen
--
Stephen Pelc, stephen@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com
http://www.vfxforth.com/downloads/VfxCommunity/
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 09:52:24 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

Going back to the EMITS example:

- despite lack of comments you quickly deduced what it did
- stack operations were few and simple and still you didn't like it
- your ideal is that every stack operation should go, which is what
you did

It was the first word in the program that used any stack operations at
all. I saw that it was more concise and imho more readable without
them. Other words there were much harder to read.

If one takes from forth that which makes it efficient, then one takes away its reason for existence. Unfortunately for forth, this is what locals
users are doing, whether they're aware of it or not.

I'm not persuaded that the stack ops make Forth efficient. Certainly
not as much as advanced compilers do, and yet one of the big attractions
of Forth has been very simple interpreters.

On my x86-64 laptop, gcc -c -S -Os on

void emit(char);
void emits(char c, int n) {
while (n-- > 0) emit(c);
}

gives me 27 bytes, 15 instructions, beating all of the Forth examples.
Several of the 14 instructions seem related to passing parameters in
registers. Passing on the stack like in old fashioned systems would
save a few more, at the expense of some speed. So if I want efficiency,
I should use C.

Anyway, if efficiency was important for that example, I'd use CODE.

In other words forth is not important to you.

I would say efficiency is usually not very important to me, whether in
forth or any other language. It's the usual story of programs having
hot spots. Aim for efficiency in the hot spots and readability and ease
of implementation everywhere else.

Also, you define "forth" as using stack ops instead of locals. I don't
define it that way. Forth with locals is still Forth. They are in the standard after all.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 09:56:42 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

That appears no better than FVALUEs ...

Those are essentially global variables, with all of their issues.
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sun Sep 15 16:16:34 2024

From Newsgroup: comp.lang.forth

Stephen Pelc <stephen@vfxforth.com> writes:

On 14 Sep 2024 at 08:19:52 CEST, "Anton Ertl" <Anton Ertl> wrote:

locals  stack  system
   401    336  gforth-fast (AMD64)
   179    132  lxf 1.6-982-823 (IA-32)
   182    119  VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
   241    159  VFX Forth 64 5.43 (AMD64)
   163    175  iforth-5.1 mini (AMD64)

There are design decisions within locals that can impact optimisation.
The design of locals in VFX was influenced by Don Colburn's Forths
and by a desire to use locals to simplify source code when interfacing
to a host operating system. Many operating systems return data
to the caller by passing the address of a variable/buffer as an input parameter. Locals that can have an accessible address make such
code much easier to read and write.

Gforth has had variable-flavoured locals from the start, and
implemented VFX's local-buffer syntax some time ago without problems,
so Gforth's design decisions are obviously compatible with these
requirements.

Now Gforth's numbers above are the worst of all Forth systems, so why
would Gforth be relevant? The native code for locals by iForth seems
to be very much in the same spirit: A separate locals stack, and
locals are accessed relative to the locals-stack pointer; and iForth
has the best locals code size of all (but looking at the VFX code, my
guess is that this happens to be in the present case mainly because
iForth uses RSP for the data stack and some other stack for the return
stack). Actually, even with your approach of keeping the locals on
the return stack, and having a separate locals-frame pointer, I don't
see why the locals code should be worse. But looking at the start of
the VFX64 code for VICHECK1, there is a bit of superfluous work:

: VICHECK1 {: pindex paddr -- pindex' paddr :} \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
pindex 0 paddr @ WITHIN IF \ Index is valid

VICHECK1
( 0050A460 488BD4 ) MOV RDX, RSP
( 0050A463 48FF7500 ) PUSH QWORD [RBP]
( 0050A467 53 ) PUSH RBX
( 0050A468 52 ) PUSH RDX
( 0050A469 57 ) PUSH RDI
( 0050A46A 488BFC ) MOV RDI, RSP
( 0050A46D 4881EC00000000 ) SUB RSP, # 00000000
( 0050A474 488B5D08 ) MOV RBX, [RBP+08]
( 0050A478 488D6D10 ) LEA RBP, [RBP+10]
( 0050A47C 488B5710 ) MOV RDX, [RDI+10]
( 0050A480 488B12 ) MOV RDX, 0 [RDX]
( 0050A483 B900000000 ) MOV ECX, # 00000000
( 0050A488 482BD1 ) SUB RDX, RCX
( 0050A48B 488B4718 ) MOV RAX, [RDI+18]
( 0050A48F 482BC1 ) SUB RAX, RCX
( 0050A492 483BC2 ) CMP RAX, RDX
( 0050A495 0F8319000000 ) JNB/AE 0050A4B4

It's not clear to me why you push so much on the return stack at the
start, instead of just the two values pindex and paddr (which you do
in 0050A463 and 0050A467). Ok, you also push old locals-frame pointer
RDI in 0050A469, which is a result of having the locals on the return
stack instead of in a separate stack, but why push the old return
stack pointer? You know the size of your locals, just adjust RSP by
that much in the end.

The instruction at 0050A46D seems superfluous. My guess is that it's
there for the possible | part in the locals definition.

The next two instructions refill the TOS register RBX and adjust the
data stack pointer RBP. That completes the code for the locals
definition. From then on locals are loaded from memory, as
in iforth. Let's also inspect the end:

0 paddr \ Use zeroth index
THEN ;

( 0050A535 488D6DF0 ) LEA RBP, [RBP+-10]
( 0050A539 48C7450000000000 ) MOV QWord [RBP], # 00000000
( 0050A541 48895D08 ) MOV [RBP+08], RBX
( 0050A545 488B5F10 ) MOV RBX, [RDI+10]
( 0050A549 488B6708 ) MOV RSP, [RDI+08]
( 0050A54D 488B3F ) MOV RDI, 0 [RDI]
( 0050A550 C3 ) RET/NEXT

The THEN is right before 0050A549. The code before THEN pushes 0 and paddr
on the data stack, and stores the former TOS in memory before loading
the new TOS. The three instructions after the THEN restore the return
stack and locals-frame pointer and return.

So there is a little bit that can be done without much effort, but not
much.

I always thought that a separate locals stack is a thing I did in
Gforth out of laziness, and pay for it by having to maintain a
separate stack pointer, but it turns out that with locals on the
return stack, you still need an extra register for locals in memory,
and you spend additional overhead.

In the last
decade or so there has been very little customer demand for
faster code.

See below.

However, higher level source code has been much
in demand. An example is Nick Nelson's value flavoured structures,
which are of particular merit when converting code from 32 bit to
64 bit host Forths.

Gforth has worked on 64-bit hosts since early 1996, and I found that
Forth code tends to have fewer portability problems between 32-bit and
64-bit platforms than C code, and that's not just my code, the
applications in appbench and many others are also quite portable.

A major merit for value-flavoured structures is that you can change
the field size (e.g., from 1 byte to 2 bytes or vice versa) without
changing all the code accessing those fields. That's independent of
cell size.

Just because many of the Forth applications visible to the Forth
community now run on CPUs with 16 or 32 address registers
does not mean that all systems can implement the compiler
techniques required for high-performance locals.

It's obvious that hardly any Forth system implements register
allocation of locals, with the exception being lxf, which uses an
architecture with 8 general-purpose registers (address registers
recall bad memories from the 68000 days); and for lxf, register
allocation is limited to basic blocks or less.

I can buy a lot of CPU cycles for the cost of one day of programmer
time.

Some guy called Stephen Pelc (must be a different one) recently posted <vbkdu0$1v8lq$1@dont-email.me>:

|We (MPE) converted much of our TCP/IP stack not to use locals. This
|was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
|the period (say 15 years ago) were similar. Code density improved by
|about 25% and performance by about 50%.

How much time did that conversion cost? And this Stephen Pelc
suggested that Buzz McCool (and probably everyone else) should also
spend their time on avoiding and eliminating locals from their code.

I am with you here, not with the other Stephen Pelc: Programmers
should use locals liberally if it saves them time, even in the face of
slow locals implementations, because you can buy a lot of CPU cycles
for the additional programming cost of avoiding locals.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 12:39:28 2024

From Newsgroup: comp.lang.forth

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

So by keeping the values on the stack you not just eliminate their
repeated mention, but also eliminate one branch of the IF.

Is the repeated mention just a matter of DRY, assuming the compiler puts
the locals in registers so that the extra mention doesn't transfer them
between stacks a second time? I do prefer your version where you factor
out VIERROR.

I wonder whether Moore's 1999 aversion to locals had something to do
with his hardware designs of that era, where having more registers
(besides T and N) connected to the ALU would have cost silicon and
created timing bottlenecks. Today's mainstream processors have GPRs
anyway, but I wonder what the real problem was with stack caches like
the CRISP: https://thechipletter.substack.com/p/at-and-ts-crisp-hobbits

Commenters there say CRISP failed basically because its early
implementation was buggy, it lost an important design win because of the
bugs, and AT&T management then gave up on it.

I remember the SPARC had "register windows" but I don't know if that's
similar or what went wrong with them.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Sun Sep 15 21:35:00 2024

From Newsgroup: comp.lang.forth

On 15 Sep 2024 at 18:16:34 CEST, "Anton Ertl" <Anton Ertl> wrote:

I can buy a lot of CPU cycles for the cost of one day of programmer
time.

Some guy called Stephen Pelc (must be a different one) recently posted <vbkdu0$1v8lq$1@dont-email.me>:

|We (MPE) converted much of our TCP/IP stack not to use locals. This
|was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
|the period (say 15 years ago) were similar. Code density improved by
|about 25% and performance by about 50%.

How much time did that conversion cost? And this Stephen Pelc
suggested that Buzz McCool (and probably everyone else) should also
spend their time on avoiding and eliminating locals from their code.

I am with you here, not with the other Stephen Pelc: Programmers
should use locals liberally if it saves them time, even in the face of
slow locals implementations, because you can buy a lot of CPU cycles
for the additional programming cost of avoiding locals.

What you ignore is that the constraints of embedded systems with small
slow CPUs (by comparison with desktop CPUs) are very different from
those of desktop CPUs. Converting the TCP/IP stack was driven by the
client requirement to fit a TCP/IP app into 128k/256k Flash and 16k RAM.

I would not make that trade off today.

So there's only one Stephen Pelc but two application domains.

Stephen
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 14:45:22 2024

From Newsgroup: comp.lang.forth

Stephen Pelc <stephen@vfxforth.com> writes:

I would not make that trade off today.
So there's only one Stephen Pelc but two application domains.

I wonder how much effort de-localizing the TCP/IP stack took, compared
to hypothetically updating the compiler to optimize locals more. If the
TCP/IP stack code can compile with iForth or lxf, is there a way to
compare the code size with VFX's? I can understand wanting to use VFX
for actual delivery, of course.
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 12:46:43 2024

From Newsgroup: comp.lang.forth

On 16/09/2024 2:52 am, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

Going back to the EMITS example:

- despite lack of comments you quickly deduced what it did
- stack operations were few and simple and still you didn't like it
- your ideal is that every stack operation should go, which is what
you did

It was the first word in the program that used any stack operations at
all. I saw that it was more concise and imho more readable without
them. Other words there were much harder to read.

If one takes from forth that which makes it efficient, then one takes away its reason for existence. Unfortunately for forth, this is what locals
users are doing, whether they're aware of it or not.

I'm not persuaded that the stack ops make Forth efficient.

That's been the evidence thus far.

Certainly
not as much as advanced compilers do, and yet one of the big attractions
of Forth has been very simple interpreters.

On my x86-64 laptop, gcc -c -S -Os on

void emit(char);
void emits(char c, int n) {
while (n-- > 0) emit(c);
}

gives me 27 bytes, 15 instructions, beating all of the Forth examples. Several of the 14 instructions seem related to passing parameters in registers. Passing on the stack like in old fashioned systems would
save a few more, at the expense of some speed. So if I want efficiency,
I should use C.

Yes - if you want efficiency with locals use C since C is built upon a
locals paradigm. Also modern CPUs are optimized for the likes of C.

But just because C can beat forth on a benchmark is no reason to dismiss
either Forth or efficient programming. The weak links are the programmer
and the tools he's given. All I ever seem to hear about other languages
is how they make life easy for the programmer. And this is what some are trying to bring to forth. To hell with what they offer I say. The universe gave me a brain. I intend to use it.

Anyway, if efficiency was important for that example, I'd use CODE.

In other words forth is not important to you.

I would say efficiency is usually not very important to me, whether in
forth or any other language. It's the usual story of programs having
hot spots. Aim for efficiency in the hot spots and readability and ease
of implementation everywhere else.

Also, you define "forth" as using stack ops instead of locals. I don't define it that way. Forth with locals is still Forth. They are in the standard after all.

I don't believe in religion - the priests, the holy books, the promises.
I'll take what is and make the best of it.

--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 14:11:35 2024

From Newsgroup: comp.lang.forth

On 16/09/2024 2:56 am, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

That appears no better than FVALUEs ...

Those are essentially global variables, with all of their issues.

With apparently little issue for the case presented. The push is
to write idiot-proof code that can be used anywhere. Moore calls
that 'solving the general problem' - which he eschews.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 23:32:28 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

With apparently little issue for the case presented. The push is
to write idiot-proof code that can be used anywhere. Moore calls
that 'solving the general problem' - which he eschews.

Didn't one of the Chuck Moore quotes you posted say using the stacks was
better for information hiding than using globals? That includes the
return or locals stack, of course. Your computer hardware has the
capability of accessing inside the stack randomly, and Forth has words
like 2ROT which reach up to 6 levels deep in the parameter stack.
What's wrong with being able to give names to the cells? I don't
understand the obsession with refusing to use those capabilities of your hardware.

The central idea of Forth to me is its traditional implementation as a
threaded interpreter with its extremely simple one-pass compiler. That
made it possible to make a complete interactive development environment
on a 1970s minicomputer with a floppy disc. All the language features
like the stack oriented VM are just incidental affordances on the route
to that simple interpreter. To the extent that there is a cult of the
stack machine, I don't belong to it.

Moore calls that 'solving the general problem' - which he eschews.

The idea as I saw it was don't do extra work to solve the general
problem, if a simpler approach solves the immediate problem at hand.

If the general solution takes LESS work than the limited one, then doing
the extra work for the limited solution is just masochism.
--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Mon Sep 16 08:48:10 2024

From Newsgroup: comp.lang.forth

Twisting even simple problem solutions to fit the stack machine model
just to make code execution easier in the stack machine falls into
Knuth's famous "Premature Optimization is the Root of all Evil".

There are many parallels with some Forth coding styles: https://www.geeksforgeeks.org/premature-optimization/
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Mon Sep 16 09:13:24 2024

From Newsgroup: comp.lang.forth

Hi,
Here is another version (no locals (flocals), no fvalues, no
fvariables).
I tried to factor the code a little.
It gives about 81 ns/call (gforth under wsl).

: x_a_b ( f: x a b c -- x a b c x a b)
3 fpick 3 fpick 3 fpick
;

: x_b_c ( f: x a b c -- x a b c x b c)
3 fpick 2 fpick 2 fpick
;

: fwithin ( f: x r s --) ( -- -1|0)
frot ftuck
f>= f< and
;

: mv ( f: x r s -- mv)
fover f- ( f: x r s-r)
frot frot f- ( f: s-r x-r)
fswap f/
;

: 4fdrop fdrop fdrop fdrop fdrop ;

: tri_mf ( f: x a b c -- mv)
x_a_b fwithin if fdrop mv exit then
x_b_c fwithin if frot fdrop fswap mv exit then
4fdrop 0e
;

: neg_big -1e308 -1e 0e tri_mf ;
: zero -1e 0e 1e tri_mf ;
: pos_big 0e 1e 1e308 tri_mf ;

: fuzzify ( f: x)
fdup neg_big cr f.
fdup zero cr f.
pos_big cr f.
;

: go 0 do -0.1e neg_big fdrop loop ;

utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08081444 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.0806888 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08064737 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08140588 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08233884 ok

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 20:01:32 2024

From Newsgroup: comp.lang.forth

On 16/09/2024 4:32 pm, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

With apparently little issue for the case presented. The push is
to write idiot-proof code that can be used anywhere. Moore calls
that 'solving the general problem' - which he eschews.

Didn't one of the Chuck Moore quotes you posted say using the stacks was better for information hiding than using globals?

He didn't elaborate what he meant by 'information hiding'. OTOH he did
say "It is necessary to have variables".

That includes the
return or locals stack, of course. Your computer hardware has the
capability of accessing inside the stack randomly, and Forth has words
like 2ROT which reach up to 6 levels deep in the parameter stack.
What's wrong with being able to give names to the cells? I don't
understand the obsession with refusing to use those capabilities of your hardware.

2ROT assumes '3 pairs' of cells on the stack. But even then, how often is
it used? I can't imagine juggling 6 items - though I can imagine a locals
user doing it.

The central idea of Forth to me is its traditional implementation as a threaded interpreter with its extremely simple one-pass compiler. That
made it possible to make a complete interactive development environment
on a 1970s minicomputer with a floppy disc. All the language features
like the stack oriented VM are just incidental affordances on the route
to that simple interpreter. To the extent that there is a cult of the
stack machine, I don't belong to it.

So you are free of all external influences?

Moore calls that 'solving the general problem' - which he eschews.

The idea as I saw it was don't do extra work to solve the general
problem, if a simpler approach solves the immediate problem at hand.

If the general solution takes LESS work than the limited one, then doing
the extra work for the limited solution is just masochism.

When is a general solution less work? There may be a supposition it
will result in less work in the future but that's far from guaranteed.

--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 10:13:19 2024

From Newsgroup: comp.lang.forth

[..]
FORTH> tnb
\ no locals: 5ns/call.
\ locals: 18.2ns/call.
\ globals: 6ns/call.
\ no locals2: 21.9ns/call. ok

This appears not to be a good idea.
The root cause is piling up too many
items on the F-stack (exceeding the
hardware FPU stack limits).

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Mon Sep 16 10:36:38 2024

From Newsgroup: comp.lang.forth

Thanks for the information.
So the best is clear.

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Mon Sep 16 12:19:25 2024

From Newsgroup: comp.lang.forth

On 15 Sep 2024 at 23:45:22 CEST, "Paul Rubin" <no.email@nospam.invalid> wrote:

Stephen Pelc <stephen@vfxforth.com> writes:

I would not make that trade off today.
So there's only one Stephen Pelc but two application domains.

I wonder how much effort de-localizing the TCP/IP stack took, compared
to hypothetically updating the compiler to optimize locals more. If the TCP/IP stack code can compile with iForth or lxf, is there a way to
compare the code size with VFX's? I can understand wanting to use VFX
for actual delivery, of course.

On modern desktop CPUs, I would probably spend the effort on
optimising locals more. However, the ability to provide the address
of a local is essential in our world. I have not inspected our code
base to see how many uses of a local declaration of a buffer
: bah {: ... | FOO[ cell ] ... -- :}
there are compared to the use of the ADDR (address) operator
applied to a normally defined local
: bah {: ... | FOO ... -- :}
...
addr FOO

Local buffers are remarkably useful.
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 22:47:10 2024

From Newsgroup: comp.lang.forth

On 16/09/2024 8:13 pm, mhx wrote:

[..]
FORTH> tnb
\ no locals: 5ns/call.
\ locals: 18.2ns/call.
\ globals: 6ns/call.
\ no locals2: 21.9ns/call. ok

This appears not to be a good idea.
The root cause is piling up too many
items on the F-stack (exceeding the
hardware FPU stack limits).

FVALUEs may be the way to go for hardware stack.
Is this any better?

: tri_mf ( f: x a b c -- mv)
3 fpick ( x) 3 fpick ( x a) f>=
3 fpick ( x) 2 fpick ( x b) f< and if
fdrop \ x a b
frot 2 fpick f- \ a b x-a
fswap frot f- \ x-a b-a
f/ exit
then
3 fpick ( x) 2 fpick ( x b) f>=
3 fpick ( x) 1 fpick ( x c) f< and if
frot fdrop \ x b c
frot fover fswap f- \ b c c-x
fswap frot f- \ c-x c-b
f/ exit
then
fdrop fdrop fdrop fdrop 0e
;

--- Synchronet 3.20a-Linux NewsLink 1.114

From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Mon Sep 16 13:21:19 2024

From Newsgroup: comp.lang.forth

On Mon, 16 Sep 2024 12:47:10 +0000, dxf wrote:

On 16/09/2024 8:13 pm, mhx wrote:

[..]
FORTH> tnb
\ no locals: 5ns/call.
\ locals: 18.2ns/call.
\ globals: 6ns/call.
\ no locals2: 21.9ns/call. ok

This appears not to be a good idea.
The root cause is piling up too many
items on the F-stack (exceeding the
hardware FPU stack limits).

FVALUEs may be the way to go for hardware stack.
Is this any better?

: tri_mf ( f: x a b c -- mv)
3 fpick ( x) 3 fpick ( x a) f>=
3 fpick ( x) 2 fpick ( x b) f< and if
fdrop \ x a b
frot 2 fpick f- \ a b x-a
fswap frot f- \ x-a b-a
f/ exit
then
3 fpick ( x) 2 fpick ( x b) f>=
3 fpick ( x) 1 fpick ( x c) f< and if
frot fdrop \ x b c
frot fover fswap f- \ b c c-x
fswap frot f- \ c-x c-b
f/ exit
then
fdrop fdrop fdrop fdrop 0e
;

Your solution gives the best speed compared to the others. With gforth under
WSL, I get 59 ns/call.

Here is the code, with your tri_mf definition (quoted above) loaded first:

: neg_big -1e308 -1e 0e tri_mf ;
: zero -1e 0e 1e tri_mf ;
: pos_big 0e 1e 1e308 tri_mf ;

: fuzzify ( f: x)
fdup neg_big cr f.
fdup zero cr f.
pos_big cr f.
;

: go 0 do -0.1e neg_big fdrop loop ;

utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05871598 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05926772 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05896149 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05899284 ok

Ahmed
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 13:33:53 2024

From Newsgroup: comp.lang.forth

On Mon, 16 Sep 2024 12:47:10 +0000, dxf wrote:

[..]

FVALUEs may be the way to go for hardware stack.
Is this any better?

: tri_mf ( f: x a b c -- mv)
3 fpick ( x) 3 fpick ( x a) f>=
3 fpick ( x) 2 fpick ( x b) f< and if
fdrop \ x a b
frot 2 fpick f- \ a b x-a
fswap frot f- \ x-a b-a
f/ exit
then
3 fpick ( x) 2 fpick ( x b) f>=
3 fpick ( x) 1 fpick ( x c) f< and if
frot fdrop \ x b c
frot fover fswap f- \ b c c-x
fswap frot f- \ c-x c-b
f/ exit
then
fdrop fdrop fdrop fdrop 0e
;

No, it (no locals3) is worse. FPICK is a
problem for iForth because in principle
there can be many values on the FPU stack.
The easy way out was to flush to memory
(assuming real Forthers would balk at
PICK and ROLL anyway).

The title of this thread is quite
appropriate: don't pile on the stack,
don't try to grow it, sparingly re-arrange
and then consume items with operators
that do real work.

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 18.3ns/call.
\ globals: 6ns/call.
\ no locals2: 21.9ns/call.
\ no locals3: 23.5ns/call. ok

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Mon Sep 16 14:37:50 2024

From Newsgroup: comp.lang.forth

On Mon, 16 Sep 2024 12:19:25 +0000, Stephen Pelc wrote:

Local buffers are remarkably useful.

True. In addition, to pass the address of normal locals
to other words or to external library functions
(pass-by-reference instead of pass-by-value)
I borrowed the address operator & from C, like in:

: FUNC { f: a b -- badr f: aval }
... a \ push value of a to fp-stack
... &b \ push address of b to stack
... ;
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 16:26:51 2024

From Newsgroup: comp.lang.forth

Paul Rubin <no.email@nospam.invalid> writes:

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

So by keeping the values on the stack you not just eliminate their
repeated mention, but also eliminate one branch of the IF.

Is the repeated mention just a matter of DRY, assuming the compiler puts
the locals in registers so that the extra mention doesn't transfer them
between stacks a second time?

That, too, but the elimination of the ELSE has more weight with me.

In the VICHECK ( pindex paddr -- pindex' paddr ) case this favours the
locals-less code. For a word that is similar in having an IF where
only one side has to do something other than make sure that the
stack effect is satisfied, but with the stack effect ( x1 x2 -- ), the
advantage is with the locals code:

: WORD1 {: x1 x2 -- :}
... ( f ) if ( )
... x1 ... x2 ...
then ;

: WORD2 ( x1 x2 -- )
... ( f ) if ( x1 x2 )
...
else
2drop
then ;
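A concrete instance of the two shapes above: add x1 and x2 to an accumulator only when x1 < x2. The locals version simply skips the work when the flag is false, while the stack version needs the ELSE 2DROP branch to honour ( x1 x2 -- ). The names total, add-if-less and add-if-less-l are illustrative only, and {: ... :} is the Forth-2012 locals syntax.

```forth
variable total   \ hypothetical accumulator

\ Locals version: nothing to clean up on the false path.
: add-if-less-l {: x1 x2 -- :}
  x1 x2 < if  x1 x2 + total +!  then ;

\ Stack version: the false path must still consume x1 and x2.
: add-if-less ( x1 x2 -- )
  2dup < if  + total +!  else  2drop  then ;
```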

Forth has a special word ?DUP for one specific variant of this
situation, but it helps only in specific cases.

I wonder whether Moore's 1999 aversion to locals had something to do
with his hardware designs of that era, where having more registers
(besides T and N) connected to the ALU would have cost silicon and
created timing bottlenecks.

I think he had the aversion long before he did such hardware designs.
He has been quoted as thinking that humans should do all they can to
make the computer's work easier (or something like that). While his
sayings, like any religious text, are sufficiently fuzzy to be
interpretable in many ways, his denouncing of locals over the years
makes it clear that he thinks that humans should invest time to write
code with stack manipulation words and globals, so that the compiler
does not need to be bloated by the code for dealing with locals.

Today's mainstream processors have GPRs
anyway, but I wonder what the real problem was with stack caches like
the CRISP: https://thechipletter.substack.com/p/at-and-ts-crisp-hobbits

I don't think that the CRISP lived long enough for the real problems
to become big: In contrast to GPRs or the stacks of Chuck Moore's
chips, the stack accesses in CRISP alias with potentially all memory
accesses, so every load of a C variable on a stack may potentially
have to produce the result of a preceding store (and it often actually
is the result of the previous instruction). In the last four decades,
CPU designers have invented a number of techniques for predicting when
loads don't alias earlier stores, and for fast store-to-load
forwarding when they do, but these techniques are not cheap. Even
today, a CPU can do maybe three loads and two stores per cycle, while
it can deal with a dozen or so input operands and maybe six output
operands in registers. The CRISP's successors would have been
uncompetitive soon after introduction, and I doubt that they would
ever have reached competitive performance.

I remember the SPARC had "register windows" but I don't know if that's similar or what went wrong with them.

Not at all similar. Register windows were a window into a larger
register file, no aliasing with memory at all; that was treated as a
stack of register windows.

In a similar vein (all heritage of Berkeley RISC) were the AMD 29K's
and the IA-64's register stack. It's interesting that Forthers were
never excited about that; the register stack allows pushing or popping
individual registers instead of register windows. I think the pushing
and popping is not a cheap operation, so you would want to use it only
at the call, but you could have used it for one of the Forth stacks,
and avoided some memory accesses that way.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 17:29:20 2024

From Newsgroup: comp.lang.forth

Stephen Pelc <stephen@vfxforth.com> writes:

On 15 Sep 2024 at 18:16:34 CEST, "Anton Ertl" <Anton Ertl> wrote:

I can buy a lot of CPU cycles for the cost of one day of programmer
time.

Some guy called Stephen Pelc (must be a different one) recently posted
<vbkdu0$1v8lq$1@dont-email.me>:

|We (MPE) converted much of our TCP/IP stack not to use locals. This
|was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
|the period (say 15 years ago) were similar. Code density improved by
|about 25% and performance by about 50%.

How much time did that conversion cost? And this Stephen Pelc
suggested that Buzz McCool (and probably everyone else) should also
spend their time on avoiding and eliminating locals from their code.

I am with you here, not with the other Stephen Pelc: Programmers
should use locals liberally if it saves them time, even in the face of
slow locals implementations, because you can buy a lot of CPU cycles
for the additional programming cost of avoiding locals.

What you ignore is that the constraints of embedded systems with small,
slow CPUs (by comparison with desktop CPUs) are very different from
those of desktop CPUs. Converting the TCP/IP stack was driven by the
client requirement to fit a TCP/IP app into 128k/256k Flash and 16k RAM.

I would not make that trade off today.

Interesting. So why mention it in <vbkdu0$1v8lq$1@dont-email.me>
without adding that? And why do you write "What you ignore is [...]"
if the situation has vanished?

In any case, if such a situation still exists or reappears, and/or
customers who want more performance or smaller code appear, it seems
to me that the better (more general, i.e., the ultimate evil in the
eyes of some) solution is a native-code compiler that tries to keep
all values in registers, whether from the data, return, or FP stack or
in locals, and tries to do that throughout the definition, not just in
a basic block.

I should have found the time to do that long ago, maybe some day I
will.

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 17:37:19 2024

From Newsgroup: comp.lang.forth

Stephen Pelc <stephen@vfxforth.com> writes:

On 15 Sep 2024 at 23:45:22 CEST, "Paul Rubin" <no.email@nospam.invalid> wrote:

Stephen Pelc <stephen@vfxforth.com> writes:

I would not make that trade off today.
So there's only one Stephen Pelc but two application domains.

I wonder how much effort de-localizing the TCP/IP stack took, compared
to hypothetically updating the compiler to optimize locals more. If the
TCP/IP stack code can compile with iForth or lxf, is there a way to
compare the code size with VFX's? I can understand wanting to use VFX
for actual delivery, of course.

On modern desktop CPUs, I would probably spend the effort on
optimising locals more. However, the ability to provide the address
of a local is essential in our world. I have not inspected our code
base to see how many uses of a local declaration of a buffer
: bah {: ... | FOO[ cell ] ... -- :}
there are compared to the use of the ADDR (address) operator
applied to a normally defined local
: bah {: ... | FOO ... -- :}
...
addr FOO

Yes, that's why Gforth does not support ADDR for locals by default:

: bah {: ... | FOO ... -- :}
...
addr foo
*the terminal*:3:8: error: Unsupported operation
addr >>>foo<<<

If you want that, there are two options: Either declare explicitly with
WA: which locals should support ADDR:

: bah {: ... | wa: FOO ... -- :}
...
addr foo
;

compiles without error. Alternatively, you can force slow mode on all
locals with DEFAULT-WA:. So

default-wa:

: bah {: ... | FOO ... -- :}
...
addr foo
;

compiles without error.

One intermediate option is to warn about ADDR applied to locals
defined without WA: FA: DA: CA:. Once the program compiles without
any of these warnings, you can set

DEFAULT-W:

to gain the full speed for all the other locals.
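A small complete version of the WA: case, as a sketch that assumes a recent Gforth in which WA: and ADDR behave as described above. The name bump is hypothetical. Because n is declared with WA:, ADDR n is accepted, and +! updates the local in place.

```forth
\ Assumes a recent Gforth (WA: and ADDR as described above).
: bump {: wa: n -- n' :}
  1 addr n +!    \ increment the local through its address
  n ;            \ push the updated value
```

Declaring n with plain W: (or with no prefix under DEFAULT-W:) should instead produce the "Unsupported operation" error shown above.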

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 18:58:22 2024

From Newsgroup: comp.lang.forth

On Mon, 16 Sep 2024 17:37:19 +0000, Anton Ertl wrote:

[..]

Yes, that's why Gforth does not support ADDR for locals by default:

iForth supports getting the address of any type local with " 'OF a ".
This indeed has a negative effect on execution time.

The experimental PARAMS| a | construct does not support 'OF and tries
to keep integer locals in a register. It is not successful when there
are too many locals. Maybe I'll repair that with the next major
revision.

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Mon Sep 16 12:16:09 2024

From Newsgroup: comp.lang.forth

mhx@iae.nl (mhx) writes:

This appears not to be a good idea. The root cause is piling up too
many items on the F-stack (exceeding the hardware FPU stack limits).

I wonder if any Forth compilers use SSE instead of the x86 FPU stack.
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 19:26:01 2024

From Newsgroup: comp.lang.forth

mhx@iae.nl (mhx) writes:

The experimental PARAMS| a | construct does not support 'OF and tries
to keep integer locals in a register.

Great. And using the same order as {: ... :} is also great. Now if
only (LOCAL) (which is used by the reference implementation of {:
... :}) used the same mechanism.

I just tried VICHECK1 with PARAMS| ... | instead of {: ... :}

401 336 gforth-fast (AMD64)
179 132 lxf 1.6-982-823 (IA-32)
182 119 VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
241 159 VFX Forth 64 5.43 (AMD64)
163 175 iforth-5.1 mini (AMD64)
182 iforth-5.1 mini using PARAMS|

Looking at the code, you store these registered locals on the locals
stack before the IF, and then load them into registers again after the
IF, and then reload them after every call (so apparently the registers
you use for them are caller-saved in iforth). And the problem in this
code is that every local is used at most once between calls, so storing
it in a caller-saved register results in no better code than storing
it in memory.

Let's see how the 3DUP.3 example fares:

instr. bytes system
28 103 Gforth AMD64
16 44 iforth 5.0.27 (plus 20 bytes entry and return code)
8 11 iforth 5.0.27 PARAMS| (plus 20 bytes entry and return code)
7 19 lxf 1.6-982-823 32-bit
32 127 SwiftForth 4.0.0-RC89 (calls LSPACE)
26 92 VFX Forth 64 5.11 RC2

Yes, in the right setting PARAMS| is very nice, too bad it's not used
for (LOCAL) (or directly for {:).

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 19:55:29 2024

From Newsgroup: comp.lang.forth

Paul Rubin <no.email@nospam.invalid> writes:

mhx@iae.nl (mhx) writes:

This appears not to be a good idea. The root cause is piling up too
many items on the F-stack (exceeding the hardware FPU stack limits).

I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

Gforth 0.7.9_20240821
[...]
see f+
Code f+
55AF6580BDC1: add rbx,$08
55AF6580BDC5: mov rax,r12
55AF6580BDC8: lea r12,$08[r12]
55AF6580BDCD: addsd xmm15,$08[rax]
55AF6580BDD3: mov rax,[rbx]
55AF6580BDD6: jmp eax

VFX Forth 64 5.11 RC2 [build 0112] 2021-05-02 for Linux x64
[...]
see f+
F+
( 004C4100 F2450F584500 ) ADDSD XMM8, [R13]
( 004C4106 4983C508 ) ADD R13, # 08
( 004C410A C3 ) RET/NEXT
( 11 bytes, 3 instructions )

But:

VFX Forth 64 5.43 [build 0199] 2023-11-09 for Linux x64
[...]
see f+
F+
( 00505620 DEC1 ) FADDP ST(1), ST
( 00505622 C3 ) RET/NEXT
( 3 bytes, 2 instructions )

The customers of VFX preferred the 80-bit floats.

- anton
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 21:43:13 2024

From Newsgroup: comp.lang.forth

On Mon, 16 Sep 2024 19:16:09 +0000, Paul Rubin wrote:

mhx@iae.nl (mhx) writes:

This appears not to be a good idea. The root cause is piling up too
many items on the F-stack (exceeding the hardware FPU stack limits).

I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

iForth would, if my tests had shown any positive effect.
(The effect has to be substantial to outweigh the advantage of 80-bit
floats whenever accuracy counts.)

I wrote routines to process 4 floats. For unfathomable reasons, they
are not nearly as good as pre-packaged library code. There is only
limited potential for standard FP code to benefit from SSE. If
parallelism can't be exploited, SSE does not seem to bring
anything over the old FPU. But maybe my hardware was not
good enough a few years back.

With SSE I need a substantial library for special functions,
which then become relatively slow DLL calls.

The only thing wrong with the FPU is that the special stack
overflow interrupts don't work.

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Tue Sep 17 06:53:43 2024

From Newsgroup: comp.lang.forth

On Mon, 16 Sep 2024 19:26:01 +0000, Anton Ertl wrote:

mhx@iae.nl (mhx) writes:

The experimental PARAMS| a | construct does not support 'OF and tries
to keep integer locals in a register.

[..]

Yes, in the right setting PARAMS| is very nice, too bad it's not used
for (LOCAL) (or directly for {:).

I thought at the time it needed a multi-pass compiler, and the
implications of that looked dark with respect to my goals (my
spare time was limited).

With multiple passes and on-the-fly compilation I think I can do
better (pre-tests in iSPICE).

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Tue Sep 17 08:43:53 2024

From Newsgroup: comp.lang.forth

On 16 Sep 2024 at 21:16:09 CEST, "Paul Rubin" <no.email@nospam.invalid> wrote:

mhx@iae.nl (mhx) writes:

This appears not to be a good idea. The root cause is piling up too
many items on the F-stack (exceeding the hardware FPU stack limits).

I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

The current VFX 64 bit systems for x64 allow you to select float packs for
80x87 8 item internal stack
hfp87 80x87 external stack
SSE external stack

The external function interface adapts automagically to the pack in use.

Stephen
--
Stephen Pelc, stephen@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com
MPE website
http://www.vfxforth.com/downloads/VfxCommunity/
downloads
--- Synchronet 3.20a-Linux NewsLink 1.114

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Tue Sep 17 10:47:40 2024

From Newsgroup: comp.lang.forth

In article <930672243542a2e04c6cd13d83108af9@www.novabbs.com>,
mhx <mhx@iae.nl> wrote:

On Mon, 16 Sep 2024 19:16:09 +0000, Paul Rubin wrote:

mhx@iae.nl (mhx) writes:

This appears not to be a good idea. The root cause is piling up too
many items on the F-stack (exceeding the hardware FPU stack limits).

I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

iForth would, if my tests had shown any positive effect.
(The effect has to be substantial to outweigh the advantage of 80-bit
floats whenever accuracy counts.)

I wrote routines to process 4 floats. For unfathomable reasons, they
are not nearly as good as pre-packaged library code. There is only
limited potential for standard FP code to benefit from SSE. If
parallelism can't be exploited, SSE does not seem to bring
anything over the old FPU. But maybe my hardware was not
good enough a few years back.

With SSE I need a substantial library for special functions,
which then become relatively slow DLL calls.

The only thing wrong with the FPU is that the special stack
overflow interrupts don't work.

In ciforth:
I added floating point support using the FPU with relatively
little work, especially because the transcendentals are easy.
I suspect that it might not be standard. E.g. F+ exhibits
more precision in 80 bits, and we are supposed to use
either IEEE 32 or 64 bits. Apparently I'm in good company
(iforth and vfxforth).
What do the language lawyers say?

-marcel

--
Temu exploits Christians: (Disclaimer, only 10 apostles)
Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
And Gifts For Friends Family And Colleagues.
--- Synchronet 3.20a-Linux NewsLink 1.114

From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Tue Sep 17 09:12:40 2024

From Newsgroup: comp.lang.forth

On Tue, 17 Sep 2024 8:47:40 +0000, albert@spenarnc.xs4all.nl wrote:

In ciforth:
I added floating point support using the FPU with relatively
little work, especially because the transcendentals are easy.
I suspect that it might not be standard. E.g. F+ exhibits
more precision in 80 bits, and we are supposed to use
either IEEE 32 or 64 bits. Apparently I'm in good company
(iforth and vfxforth).
What do the language lawyers say?

You are in luck: the internal representation of fp numbers is implementation-defined.

However fp-alignment restrictions must be observed on affected
systems, which in itself is a rather superfluous requirement
since most such systems would crash anyway.
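The following quick probe shows what a given system actually does. It is a sketch, not a standard test. 1e-17 is far below half an ulp of 1.0 in IEEE double (about 1.1e-16) but well above the ulp of 1.0 in 80-bit extended (about 1.1e-19), so the small term survives only if F+ keeps more than double precision. The name extended? is hypothetical. The result may also depend on the x87 precision-control setting a system uses, or on an optimizer folding the constants.

```forth
\ True if F+ / F- on this system carry more than IEEE double precision.
: extended? ( -- flag )
  1e 1e-17 f+      \ rounds to exactly 1e0 in IEEE double
  1e f-            \ the 1e-17 survives only with a wider significand
  f0= 0= ;         \ nonzero remainder => extended precision
```

On a system with SSE-based 64-bit floats, such as the Gforth F+ shown above, this should return false.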
--- Synchronet 3.20a-Linux NewsLink 1.114

From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Tue Sep 17 12:18:38 2024

From Newsgroup: comp.lang.forth

On 16-09-2024 18:26, Anton Ertl wrote:

That, too, but the elimination of the ELSE has more weight with me.

: WORD1 {: x1 x2 -- :}
... ( f ) if ( )
... x1 ... x2 ...
then ;

: WORD2 ( x1 x2 -- )
... ( f ) if ( x1 x2 )
...
else
2drop
then ;

You mean - like this?

: WORD2 ( x1 x2 -- )
... ( f ) if ( x1 x2 )
...
exit
then 2drop ;

Forth has a special word ?DUP for one specific variant of this
situation, but it helps only in specific cases.

That's one of the reasons I don't like it - and don't support it
natively. The horror of returning two different stack diagrams..

I loved it when I introduced ;THEN. When doing short words, it allowed
the 4tH optimizer to kick in and make not one, but *TWO* tail call jumps.

Hans Bezemer

--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Tue Sep 17 07:24:12 2024

From Newsgroup: comp.lang.forth

Stephen Pelc <stephen@vfxforth.com> writes:

The current VFX 64 bit systems for x64 allow you to select float packs for
80x87 8 item internal stack
hfp87 80x87 external stack
SSE external stack

I guess the next thing is to run that same benchmark with the SSE pack.

With SSE if you want to do transcendentals, is it usual to use a
software library that does the numerics? It seems easier for the
library to use the x87 FPU when it is available.

Maybe some processors will start supporting IEEE 128 bit floating point someday. I know that the RISC-V architecture contains instructions for
it, but I don't know of any processor hardware that does it.
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 18 13:08:39 2024

From Newsgroup: comp.lang.forth

On 17/09/2024 2:26 am, Anton Ertl wrote:

Paul Rubin <no.email@nospam.invalid> writes:
...

I wonder whether Moore's 1999 aversion to locals had something to do
with his hardware designs of that era, where having more registers
(besides T and N) connected to the ALU would have cost silicon and
created timing bottlenecks.

I think he had the aversion long before he did such hardware designs.
He has been quoted as thinking that humans should do all they can to
make the computer's work easier (or something like that). While his
sayings, like any religious text, are sufficiently fuzzy to be
interpretable in many ways, his denouncing of locals over the years
makes it clear that he thinks that humans should invest time to write
code with stack manipulation words and globals, so that the compiler
does not need to be bloated by the code for dealing with locals.

When has Moore required humans to do anything? Did he stand up saying
'Follow me. I'll make you a better programmer, more productive. I'll
provide you with compilers and a Standard.'? No. That was others doing.
When the latter had attracted enough of a following they were self-
sufficient - didn't need Moore, other than perhaps his presence. What differentiates Moore from the group promoting Forth (their version of it)
is that Moore has never changed his position, switched his tune, introduced
locals and mega-compilers - as the latter do today in an attempt to
maintain the interest, maintain a following. Of what use are leaders
without followers?

"Let me use a tool which I appreciate and if everyone can't use this
tool well, sorry, but that is not my goal." - C.M.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Tue Sep 17 22:39:51 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

...Moore has never changed his position, switched his tune, introduced
locals and mega-compilers - as the latter do today in an attempt to
maintain the interest, maintain a following.

Weren't we just quibbling about small (few percent) efficiency
differences between using locals and using stack words? You get far
greater efficiency gains by using optimizing compilers. If you feel
like the optimizing compiler is needless bloat and want to use an
interpreter instead, that's fine, it just means that (like most of us),
you've found that code speed isn't that important for whatever you're
doing. Stephen Pelc posted a few days ago that in the past decade, his customers have stopped asking for faster code.

I think that is a better takeaway than "well we can give up the
optimizing compiler, because using stack words instead of locals
recovers a few percent of the lost speed".
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 18 17:30:30 2024

From Newsgroup: comp.lang.forth

On 18/09/2024 3:39 pm, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

...Moore has never changed his position, switched his tune, introduced
locals and mega-compilers - as the latter do today in an attempt to
maintain the interest, maintain a following.

Weren't we just quibbling about small (few percent) efficiency
differences between using locals and using stack words? You get far
greater efficiency gains by using optimizing compilers. If you feel
like the optimizing compiler is needless bloat and want to use an
interpreter instead, that's fine, it just means that (like most of us), you've found that code speed isn't that important for whatever you're
doing. Stephen Pelc posted a few days ago that in the past decade, his customers have stopped asking for faster code.

I think that is a better takeaway than "well we can give up the
optimizing compiler, because using stack words instead of locals
recovers a few percent of the lost speed".

I think we're retreading old ground. Orders of 30% reduction in code
size were in respect of optimizing compilers (VFX). It's consistently
the case. There may be less to gain for floating point but even there
locals don't have much over globals. Can't speak for MPE customers
but neither can I ignore what I see. I can assure you I don't find
using stack operators a burden. Indeed I find them reassuring as it
puts me in control. Forth is a niche language. If there's success to
be had, it will be on its own merits and not ideas imported from other languages.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Wed Sep 18 13:10:31 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

I think we're retreading old ground. Orders of 30% reduction in code
size were in respect of optimizing compilers (VFX).

That 30% difference was because VFX doesn't attempt to optimize locals.
If two pieces of code are obviously equivalent (the locals and no-locals version of EMITS) then a fancier optimizing compiler is likely to
generate the same code for both.

What I was getting at though is that VFX even using locals will still
beat the pants off any interpreter, even without locals. So if you have interpreted Forth code using locals and want it to be faster, you get
far more gain compiling it with VFX than you would get by undoing the
locals. If you're already using VFX then yes, you can squeeze out a bit
more performance by not using locals, but that just tells me that the
VFX optimizer is still a work in progress (which is fine).

I can assure you I don't find using stack operators a burden. Indeed
I find them reassuring as it puts me in control.

It's hard for me to understand that. If you're using VFX, the stack
operations are transformed by compiler gyrations to register ops so
SWAP, ROT, etc. generate no code at all, but this is completely out of
sight and you have no control over it. Locals on the other hand (in an interpreter) are equivalent to RPICK at specific offsets in obvious
ways, so there is no loss of control. That also happens with locals in
VFX but it's only because VFX (for now) hasn't pursued optimizing them.

That example using FVALUE just seems to be a loss: the storage cells are constantly tied up even when not active. If that function can be used
in a multitasking environment, you might even need a separate copy for
each task. Significant efficiency loss.

Forth is a niche language. If there's success to be had, it will be
on its own merits and not ideas imported from other languages.

That seems to support looking at any particular feature on its merits.
Adding to that a dislike of standardization, it would seem to be up to
the programmer, with most choices being legitimate for any particular programmer.
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Thu Sep 19 14:14:48 2024

From Newsgroup: comp.lang.forth

On 19/09/2024 6:10 am, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

I think we're retreading old ground. Orders of 30% reduction in code
size were in respect of optimizing compilers (VFX).

That 30% difference was because VFX doesn't attempt to optimize locals.
If two pieces of code are obviously equivalent (the locals and no-locals version of EMITS) then a fancier optimizing compiler is likely to
generate the same code for both.

What's the evidence? My observation is compilers do not generate native
code independently of the language. Parameter passing strategies differ between C and Forth and this necessarily affects the code compilers lay
down.

...

Forth is a niche language. If there's success to be had, it will be
on its own merits and not ideas imported from other languages.

That seems to support looking at any particular feature on its merits.
Adding to that a dislike of standardization, it would seem to be up to
the programmer, with most choices being legitimate for any particular programmer.

For me it comes down to why I have chosen to use Forth. The philosophy of
it appeals to me in a way other languages don't. There's the question of
which forth - because forth has essentially split down two paths with
rather incompatible motivations.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sat Sep 28 13:49:46 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

That 30% difference was because VFX doesn't attempt to optimize locals.

What's the evidence? My observation is compilers do not generate
native code independently of the language. Parameter passing
strategies differ between C and Forth and this necessarily affects the
code compilers lay down.

1) comparisons between VFX and other compilers like iForth, 2) the
observation that there is any difference at all between the generated
code for the two versions of EMITS under VFX.

This isn't a question of C vs Forth. It's two equivalent pieces of
Forth code being compiled by the same optimizing Forth compiler, one
version resulting in worse code instead of identical code.

For me it comes down to why I have chosen to use Forth. The philosophy
of it appeals to me in a way other languages don't. There's the
question of which forth - because forth has essentially split down two
paths with rather incompatible motivations.

I gather that one path is industrial users who want there to be a
standard with well-supported commercial implementations, and who want to
run development projects with large teams of programmers (the Saudi
airport being the classic example).

I guess the other path is something like solo practitioners who don't
really care about standardization, perhaps because they just want the
most direct way to an end result. Philosophical appeal is another such motivation. That's fine too, but partly a matter of personal taste.

What I'm unclear about is what the philosophical purist path has to say
about optimizing compilers. I think anyone wanting to reject locals for reasons of code efficiency, probably should be using a VFX-style
compiler. My own idea of purity says to use a simple interpreter and
accept the speed penalty, using CODE when needed.

FWIW, most of the code I write these days doesn't spend much time on computation. It might spend 100ms retrieving something over the
network, and then 1ms computing. So if the computing part somehow sped
up by 1000x, I wouldn't notice or care about the difference.

FWIW 2, I suspect most computing operations in the real world right now
are spent in GPU kernels or large parallel batch jobs, rather than in
ordinary single-CPU programs.
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sat Sep 28 22:36:12 2024

From Newsgroup: comp.lang.forth

On Sat, 28 Sep 2024 20:49:46 +0000, Paul Rubin wrote:
[..]

FWIW 2, I suspect most computing operations in the real world right
now are spent in GPU kernels or large parallel batch jobs, rather
than in ordinary single-CPU programs.

Analog (IC) simulation can't be sped up with parallel tricks
like that, unless the simulator is written from scratch.

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 29 14:22:26 2024

From Newsgroup: comp.lang.forth

On 29/09/2024 6:49 am, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

That 30% difference was because VFX doesn't attempt to optimize locals.

What's the evidence? My observation is compilers do not generate
native code independently of the language. Parameter passing
strategies differ between C and Forth and this necessarily affects the
code compilers lay down.

1) comparisons between VFX and other compilers like iForth, 2) the observation that there is any difference at all between the generated
code for the two versions of EMITS under VFX.

This isn't a question of C vs Forth.

Perhaps I misunderstood. So we agree Forth locals are unlikely to ever
match C locals for performance?

It's two equivalent pieces of
Forth code being compiled by the same optimizing Forth compiler, one
version resulting in worse code instead of identical code.

I don't know whether it's possible to make forth code using locals as
efficient as forth code using stack operations. What I do question is
the necessity for it and the wisdom of it.

For me it comes down to why I have chosen to use Forth. The philosophy
of it appeals to me in a way other languages don't. Then there's the
question of which Forth, because Forth has essentially split down two
paths with rather incompatible motivations.

I gather that one path is industrial users who want there to be a
standard with well-supported commercial implementations, and who want to
run development projects with large teams of programmers (the Saudi
airport being the classic example).

According to Elizabeth polyFORTH was used for that project. When c.l.f.
was aflame with 200x standards discussions, I recall asking how it was
no commercial programmers seemed to be participating. She replied to
the effect that they were busy programming. Certainly Forth Inc's early successes didn't rely on the existence of a standard.

I guess the other path is something like solo practitioners who don't
really care about standardization, perhaps because they just want the
most direct way to an end result. Philosophical appeal is another such motivation. That's fine too, but partly a matter of personal taste.

FWIW here's Jeff Fox' take on the topic:

https://www.ultratechnology.com/antiansi.htm
--- Synchronet 3.20a-Linux NewsLink 1.114

From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 29 14:40:58 2024

From Newsgroup: comp.lang.forth

In article <87h69zcxlh.fsf@nightsong.com>,
Paul Rubin <no.email@nospam.invalid> wrote:
<SNIP>

What I'm unclear about is what the philosophical purist path has to say
about optimizing compilers. I think anyone wanting to reject locals for reasons of code efficiency probably should be using a VFX-style
compiler. My own idea of purity says to use a simple interpreter and
accept the speed penalty, using CODE when needed.

Maybe I'm a purist. Indirect threaded code is a clear expression
of programmers intent. That is the ideal foundation on which to
build optimisers. The only requirement for an optimiser is that
the results are the same. The program can be shorter or faster.
Locals are a hindrance.

Groetjes Albert
--
Temu exploits Christians: (Disclaimer, only 10 apostles)
Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
And Gifts For Friends Family And Colleagues.
--- Synchronet 3.20a-Linux NewsLink 1.114

From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sun Sep 29 16:53:11 2024

From Newsgroup: comp.lang.forth

On Sun, 29 Sep 2024 12:40:58 +0000, albert@spenarnc.xs4all.nl wrote:

[..]

Maybe I'm a purist. Indirect threaded code is a clear expression
of programmers intent. That is the ideal foundation on which to
build optimisers.

Do you mean that we can debate endlessly what Forth is exactly,
should be, or should become, while that is a non-issue when
we simply pronounce that a given and frozen ITC implementation
exactly defines Forth?

Once you have such ITC implementation it is possible to
compile to anything else, which then can be made
indistinguishable within a testable margin of error. But
if that is agreed, surely there is no need to start
from ITC.

I personally don't think one can say that ITC expresses (or
is able to express) Forth *better* than DTC, TTC, STC or
native code. Additionally, surely STC is a lot *clearer to
understand* than ITC?

-marcel
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 29 11:33:10 2024

From Newsgroup: comp.lang.forth

albert@spenarnc.xs4all.nl writes:

My own idea of purity says to use a simple interpreter and accept the
speed penalty, using CODE when needed.

Indirect threaded code is a clear expression of programmers intent.
The only requirement for an optimiser is that the results are the
same. The program can be shorter or faster. Locals are a hindrance.

Well, the philosophical idea I'm coming from is that Forth is a difficult language that makes unusual demands on the programmer. That is a cost of
using it. In exchange it gives extreme simplicity and clarity of implementation, and the ability to host itself on very limited machines.
Those are benefits.

If you're going to implement an optimizing compiler, you've got the
machine resources to host it and the willingness to deal with its
complexity. That is, you're not really in need of Forth's benefits. So
maybe you can also bypass some of its costs.

Thus, I think of a "pure" Forth as a simple interpreter (maybe not
ITC). Once I have a bigger machine etc., I start thinking about Lisp.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 29 11:44:31 2024

From Newsgroup: comp.lang.forth

dxf <dxforth@gmail.com> writes:

Perhaps I misunderstood. So we agree Forth locals are unlikely to
ever match C locals for performance?

This I don't know. If the issue is parameter passing in registers,
maybe a fancy enough Forth compiler could do that.

I don't know whether it's possible to make forth code using locals as efficient as forth code using stack operations. What I do question is
the necessity for it and the wisdom of it.

I think in case of an interpreter, locals might be more efficient, since
as the thread title says, they treat the stack as an array. The
hardware is built to do that, so why not use it? With an optimizing
compiler, I think they should usually be equivalent in principle.
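To make the "stack as an array" remark concrete, here is a minimal sketch. The word names `area-locals` and `area-pick` are hypothetical, and it assumes a Forth-2012 system that accepts the `{: ... :}` locals syntax used earlier in this thread (Gforth, for example). Both words compute the same product: the locals version names fixed slots, while the stack-only version reaches the same items by depth with PICK, and that depth shifts every time something is pushed.

```forth
\ Hypothetical example words; assumes Forth-2012 {: ... :} locals.
\ Locals: each name refers to a fixed slot, regardless of what
\ is currently on the data stack.
: area-locals {: w h -- n :}  w h * ;

\ Stack only: PICK indexes the data stack by depth
\ (0 PICK = DUP, 1 PICK = OVER). After the first 1 PICK the
\ stack is w h w, so the second 1 PICK fetches h, not w.
: area-pick ( w h -- n )  1 pick 1 pick *  nip nip ;
```

`3 4 area-locals .` and `3 4 area-pick .` should both print 12. How each version compiles, and which is faster, depends entirely on the system.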

Certainly Forth Inc's early successes didn't rely on the existence of
a standard.

In those days there was only one significant implementation ;).

https://www.ultratechnology.com/antiansi.htm

I remember that from a while back and will look at it again. The context
though was a Forth chip with stack hardware, being compared against a
software interpreter.

I miss Jeff but must also remember that he was sometimes prone to
hyperbole.

Do you still use blocks instead of files nowadays?
--- Synchronet 3.20a-Linux NewsLink 1.114

From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 30 16:13:43 2024

From Newsgroup: comp.lang.forth

On 30/09/2024 4:44 am, Paul Rubin wrote:

dxf <dxforth@gmail.com> writes:

Perhaps I misunderstood. So we agree Forth locals are unlikely to
ever match C locals for performance?

This I don't know. If the issue is parameter passing in registers,
maybe a fancy enough Forth compiler could do that.

IMO no, because C doesn't have the complication of a permanent parameter
stack. C typically pushes parameters onto the cpu stack which are the
locals, and which the calling function eventually discards. In Forth,
locals amount to a 2-step process: pushing parameters onto the data stack, then pulling them off as locals and potentially storing them back. Contrary
to what one may imagine, this is more costly than 'stack juggling', which
has become a pejorative. Forth has a data stack. It's left to the user
to optimize it, or to abuse it, as he sees fit.
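A minimal sketch of the two styles being contrasted, with hypothetical word names and the same assumption of Forth-2012 `{: ... :}` locals. In the locals version the arguments first arrive on the data stack and `{: ... :}` then moves them into local storage (where that storage lives is implementation-dependent); the second version never takes them off the data stack and uses DUP and SWAP instead. Which one yields better code is up to the particular compiler.

```forth
\ Hypothetical example words; assumes Forth-2012 {: ... :} locals.
\ Step 1: the caller pushes a and b onto the data stack.
\ Step 2: {: a b :} pops them into local storage; each later use
\ of a or b fetches from that storage.
: sumsq-locals {: a b -- n :}  a a *  b b *  + ;

\ Stack juggling: the arguments never leave the data stack.
: sumsq-stack ( a b -- n )  dup *  swap dup *  + ;
```

`3 4 sumsq-locals .` and `3 4 sumsq-stack .` should both print 25.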

I don't know whether it's possible to make forth code using locals as
efficient as forth code using stack operations. What I do question is
the necessity for it and the wisdom of it.

I think in case of an interpreter, locals might be more efficient, since
as the thread title says, they treat the stack as an array. The
hardware is built to do that, so why not use it? With an optimizing compiler, I think they should usually be equivalent in principle.

I don't understand the reference to 'interpreter'. Having an interactive environment with an incremental compiler is very convenient, but mostly I'm
coding for a target, the same as any C programmer.

...
Do you still use blocks instead of files nowadays?

For applications I've always used files as that's the norm for CP/M
and MS-DOS. ANS-style file functions suit this very well. For forth
source I use files organized as 'screens'. DX-Forth comes with TED -
a regular text editor that can be used within forth - but personally
I prefer screens.

--- Synchronet 3.20a-Linux NewsLink 1.114
