• Re: Avoid treating the stack as an array [Re: "Back & Forth" is back!]

    From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 3 11:23:20 2024
    From Newsgroup: comp.lang.forth

    On 3/09/2024 2:03 am, Buzz McCool wrote:
    On 8/30/24 13:32, minforth wrote:

    use locals if you have too many parameters

    I like this quite a bit. Tell me if I like it too much.

    : CylVolLoop {: W: StartHeight W: FinalHeight F: Radius -- Tabular Output :}
       cr ." Radius " Radius fe.
       StartHeight
       begin dup FinalHeight <=
       while
         dup s>f
         fdup
         cr ." Height " fe.
         Radius
         VolOfCyl
         ." Volume " fe.
         1 +
       repeat
       drop
       cr ;


    Under VFX Forth:

    see CylVolLoop
    ...
    ( 193 bytes, 39 instructions )

    \ Without locals...

    : CylVolLoop ( StartHeight FinalHeight Radius -- )
       cr ." Radius " fdup fe.
       swap ( FinalHeight Height)
       begin 2dup >= while
         dup s>f  fdup cr ." Height " fe.
         fover ( Height Radius)  VolOfCyl ." Volume " fe.
         1+
       repeat 2drop fdrop
       cr ;

    see CylVolLoop
    ...
    ( 148 bytes, 27 instructions )

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Buzz McCool@buzz_mccool@yahoo.com to comp.lang.forth on Mon Sep 2 22:53:54 2024
    From Newsgroup: comp.lang.forth

    On 9/2/24 18:23, dxf wrote:
    Under VFX Forth:
    ...
    \ Without locals...

    : CylVolLoop ( StartHeight FinalHeight Radius -- )
       cr ." Radius " fdup fe.
       swap ( FinalHeight Height)
       begin 2dup >= while
         dup s>f  fdup cr ." Height " fe.
         fover ( Height Radius)  VolOfCyl ." Volume " fe.
         1+
       repeat 2drop fdrop
       cr ;

    see CylVolLoop
    ...
    ( 148 bytes, 27 instructions )


    Nice. I will study your technique.

  • From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 3 17:27:47 2024
    From Newsgroup: comp.lang.forth

    On 3/09/2024 3:53 pm, Buzz McCool wrote:
    On 9/2/24 18:23, dxf wrote:
    Under VFX Forth:
    ...
    \ Without locals...

    : CylVolLoop ( StartHeight FinalHeight Radius -- )
       cr ." Radius " fdup fe.
       swap ( FinalHeight Height)
       begin 2dup >= while
         dup s>f  fdup cr ." Height " fe.
         fover ( Height Radius)  VolOfCyl ." Volume " fe.
         1+
       repeat 2drop fdrop
       cr ;

    see CylVolLoop
    ...
    ( 148 bytes, 27 instructions )


    Nice. I will study your technique.

    Efficient use of the stack is Moore's technique :)

  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Thu Sep 5 17:18:07 2024
    From Newsgroup: comp.lang.forth

    On 31-08-2024 07:59, BuzzMcCool wrote:
    On 8/30/24 18:05, dxf wrote:
    On 31/08/2024 2:04 am, Buzz McCool wrote:
    ...
    Does anyone have suggestions on a better approach when you have
    several parameters and loop counts to deal with?

    I see little wrong with your example other than cosmetics - excess
    comments
    that don't add value and missing stack parameter comment in colon
    definitions.


    Thanks for the feedback. Yes I do need to work on my stack parameter comments.

    Given that the area of the circle doesn't change - why recalculate that
    every time? Ok, I changed VolOfCyl a bit, but it saves me both time and
    complexity. Note this only works if there is a separate FP stack, which
    is the standard nowadays.

    Alternatives:
    1. Change the order of parameters (float last);
    2. Change the order of parameters (carnal knowledge of the size of a float);
    3. Specify the radius as an integer.

    : AreaOfCir ( F: radius -- area )  fdup pi f* f* ;
    : VolOfCyl  ( F: height area -- volume )  f* ;

    : CylVolLoop ( start end -- ) ( F: radius -- )
       cr ." Radius " fdup fe.
       AreaOfCir 1+ swap ?do
         i s>f fdup cr ." Height " fe.
         fover VolOfCyl ." Volume " fe.
       loop fdrop
    ;

    Hans Bezemer
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Thu Sep 5 17:37:03 2024
    From Newsgroup: comp.lang.forth

    On 05-09-2024 17:18, Hans Bezemer wrote:
    On 31-08-2024 07:59, BuzzMcCool wrote:
    On 8/30/24 18:05, dxf wrote:
    On 31/08/2024 2:04 am, Buzz McCool wrote:
    ...
    Does anyone have suggestions on a better approach when you have
    several parameters and loop counts to deal with?

    I see little wrong with your example other than cosmetics - excess
    comments
    that don't add value and missing stack parameter comment in colon
    definitions.


    Thanks for the feedback. Yes I do need to work on my stack parameter
    comments.

    Given that the area of the circle doesn't change - why recalculate that
    every time? Ok, I changed VolOfCyl a bit, but it saves me both time and
    complexity. Note this only works if there is a separate FP stack, which
    is the standard nowadays.

    Alternatives:
    1. Change the order of parameters (float last);
    2. Change the order of parameters (carnal knowledge of the size of a
    float);
    3. Specify the radius as an integer.

    This is the same routine with a shared stack. Note I used option 3. here
    - it retains the same possibilities as the original. Note this is in
    4tH. F% is followed by an FP number:

    include lib/fp2.4th
    include lib/zenconst.4th
    include 4pp/lib/float.4pp

    : AreaOfCir fdup pi f* f* ;
    aka f* VolOfCyl ( 4tH alias)

    : CylVolLoop ( radius start end --)
       >r >r cr ." Radius " fdup fe.
       AreaOfCir r> r> 1+ swap ?do
         i s>f fdup cr ." Height " fe.
         fover VolOfCyl ." Volume " fe.
       loop fdrop cr
    ;

    f% 1.2 1 20 CylVolLoop

    Radius 1.E0
    Height 1.E0 Volume 3.141592653589793238E0
    Height 2.E0 Volume 6.283185307179586476E0
    Height 3.E0 Volume 9.42477796076937971E0
    ...
    Height 19.E0 Volume 59.69026041820607152E0
    Height 20.E0 Volume 62.83185307179586476E0

  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Thu Sep 5 17:42:07 2024
    From Newsgroup: comp.lang.forth

    On 05-09-2024 17:37, Hans Bezemer wrote:
    f% 1.2 1 20 CylVolLoop

    Radius 1.E0

    Yeah, I copied the last test with the output of the first test. My bad..
    Sorry ;-)

    Should have been: f% 1 1 20 CylVolLoop

    Hans Bezemer

  • From Buzz McCool@buzz_mccool@yahoo.com to comp.lang.forth on Fri Sep 6 14:03:38 2024
    From Newsgroup: comp.lang.forth

    On 9/5/2024 8:18 AM, Hans Bezemer wrote:

    Given that the area of the circle doesn't change - why recalculate that every time?

    Excellent observation.

    Would you have any videos talking about Forth locals? You and dxf are
    far more adept at stack manipulations than I. I'm thinking I can get a
    word up and working with locals and then convert to manual stack
    manipulations afterwards if necessary.

    When is it necessary? dxf showed a word w/o locals to have ~30% fewer instructions than a word with locals. Is that a common occurrence?


  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Sat Sep 7 14:40:41 2024
    From Newsgroup: comp.lang.forth

    On 06-09-2024 23:03, Buzz McCool wrote:
    On 9/5/2024 8:18 AM, Hans Bezemer wrote:

    Given that the area of the circle doesn't change - why recalculate
    that every time?

    Excellent observation.

    Would you have any videos talking about Forth locals? You and dxf are
    far more adept at stack manipulations than I. I'm thinking I can get a
    word up and working with locals and then convert to manual stack manipulations afterwards if necessary.

    Oh, I talk a lot about locals: don't use them. The point is: you have
    random access to locals. So I doubt very much it will help you to
    uncover a smart way to do it without them. Basically any non-Forth
    Algol-like language will do the job.

    And that's in essence why I am opposed to them. They take away what makes
    Forth unique - and what makes the Forth way of thinking unique.

    When is it necessary? dxf showed a word w/o locals to have ~30% fewer instructions than a word with locals. Is that a common occurrence?

    I can't really tell. In 4tH (my own implementation) the use of locals
    requires an external library - so it always consumes more instructions.
    It also heavily depends on the style and the skill of the programmer. If you're a newbie doing a lot of stack acrobatics, I doubt it.

    What bothers me most technologically is that parameters flow through the
    stack undisturbed. You break that paradigm when using locals. With
    locals you *HAVE TO* create some kind of stack frame that you have to
    destroy when you exit.

    Needless to say this copying, releasing and stuff takes time. Even when
    you don't use locals. In all honesty I must state that this overhead is
    not always translated to a diminished performance - at least not in the
    tests I did.

    ****
    TL;DR my objections are mostly based on pure architectural arguments,
    rather than practicality. I also don't like Python, PHP and Perl for
    those very same reasons - one because I think its paradigms are
    fundamentally flawed, the second and third because of their "have we
    thrown in the kitchen sink yet" mentality.

    I don't think there will ever be a "Back&Forth" episode on locals -
    frankly, because - apart from some demonstrations - there is only one
    single, ported program that uses locals in my repository. How can you
    teach if you never used them yourself?
    ****

    Note that 4tH features R@, R'@ and R"@ which can serve very
    conveniently as "local variables" - provided you leave the Return Stack
    alone. I learned that trick from the programmer of the FIG editor.

    See: https://sourceforge.net/p/forth-4th/code/HEAD/tree/trunk/4th.src/lib/gcircle.4th
    for a nice example of that one.
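
    R'@ and R"@ are 4tH words; standard Forth only guarantees R@ (plus 2R@
    in CORE EXT). A minimal sketch of the same idea using R@ alone follows.
    The word name scale3 is made up for illustration and is not taken from
    gcircle.4th:

    ```forth
    \ Hypothetical example: keep k on the return stack and read it with R@
    \ like a read-only local, leaving the data stack free for a b c.
    : scale3 ( a b c k -- a*k b*k c*k )
       >r            \ R: k
       rot r@ *      \ b c a*k
       rot r@ *      \ c a*k b*k
       rot r> * ;    \ a*k b*k c*k   (r> also releases k)
    ```

    For example, 1 2 3 10 scale3 leaves 10 20 30 on the stack.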

    Hans Bezemer

  • From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Sun Sep 8 14:56:01 2024
    From Newsgroup: comp.lang.forth

    On 6 Sep 2024 at 23:03:38 CEST, "Buzz McCool" <buzz_mccool@yahoo.com> wrote:

    Would you have any videos talking about Forth locals? You and dxf are
    far more adept at stack manipulations than I. I'm thinking I can get a
    word up and working with locals and then convert to manual stack manipulations afterwards if necessary.

    Don't. You will only become dependent on locals. Use of locals should
    be a considered decision.

    When is it necessary? dxf showed a word w/o locals to have ~30% fewer instructions than a word with locals. Is that a common occurrence?

    We (MPE) converted much of our TCP/IP stack not to use locals. This
    was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
    the period (say 15 years ago) were similar. Code density improved by
    about 25% and performance by about 50%.

    Stephen
    --
    Stephen Pelc, stephen@vfxforth.com
    MicroProcessor Engineering, Ltd. - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)78 0390 3612, +34 649 662 974
    http://www.mpeforth.com - MPE website
    http://www.vfxforth.com/downloads/VfxCommunity/ - downloads
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sun Sep 8 16:09:32 2024
    From Newsgroup: comp.lang.forth

    On Sun, 8 Sep 2024 14:56:01 +0000, Stephen Pelc wrote:

    On 6 Sep 2024 at 23:03:38 CEST, "Buzz McCool" <buzz_mccool@yahoo.com>
    wrote:

    Would you have any videos talking about Forth locals? You and dxf are
    far more adept at stack manipulations than I. I'm thinking I can get a
    word up and working with locals and then convert to manual stack
    manipulations afterwards if necessary.

    Don't. You will only become dependent on locals. Use of locals should
    be a considered decision.

    When is it necessary? dxf showed a word w/o locals to have ~30% fewer
    instructions than a word with locals. Is that a common occurrence?

    We (MPE) converted much of our TCP/IP stack not to use locals. This
    was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
    the period (say 15 years ago) were similar. Code density improved by
    about 25% and performance by about 50%.

    These are good examples of "it depends". And also that one should never
    start optimising without profiling. I have had similar experiences in
    the other direction (i.e. with locals) with vector maths.

    Another observation is that many Forthers do not seem to put much
    emphasis on programming time and code maintainability or readability,
    which is easier to achieve by using locals. The code conversion for your
    TCP/IP stack must have taken a lot of programming time, but it must have
    been worth it because it paid off on another level.

    But when to use or avoid locals is an old argument that has long since
    been put to rest. It all depends...
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sun Sep 8 16:27:47 2024
    From Newsgroup: comp.lang.forth

    Stephen Pelc <stephen@vfxforth.com> writes:
    Don't. You will only become dependent on locals. Use of locals should
    be a considered decision.

    When is it necessary? dxf showed a word w/o locals to have ~30% fewer
    instructions than a word with locals. Is that a common occurrence?

    We (MPE) converted much of our TCP/IP stack not to use locals. This
    was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
    the period (say 15 years ago) were similar. Code density improved by
    about 25% and performance by about 50%.

    So MPE (and Forth, Inc.) discourage the use of locals because they
    implement locals inefficiently, and they implement locals
    inefficiently because there are so few uses of locals around. A chicken-and-egg problem.

    Concerning the conversion of the TCP/IP stack: Have you considered the alternative of spending MPE's time on making the locals implementation
    more efficient?

    See also:

    @InProceedings{ertl22-locals,
    author = {M. Anton Ertl},
    title = {Are Locals Inevitably Slow?},
    crossref = {euroforth22},
    pages = {48--49},
    url = {http://www.euroforth.org/ef22/papers/ertl-locals.pdf},
    url-slides = {http://www.euroforth.org/ef22/papers/ertl-locals-slides.pdf},
    video = {https://www.youtube.com/watch?v=tPjSKetEJn0},
    OPTnote = {presentation slides},
    abstract = {Code quality of locals on two code examples on
    various systems}
    }

    An update on the table for the example:

    : 3dup.3 {: a b c :} a b c a b c ;

    instr. bytes system
        31   117 Gforth AMD64
        16    44 iforth 5.0.27 (plus 20 bytes entry and return code)
         7    19 lxf 1.6-982-823 32-bit
        32   127 SwiftForth 4.0.0-RC89 (calls LSPACE)
        26    92 VFX Forth 64 5.11 RC2
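
    For comparison, a stack-only word with the same stack effect as 3dup.3
    can be sketched as below. The name 3dup.stack is made up here, and no
    instruction or byte counts are claimed for it; those would have to be
    measured on each system.

    ```forth
    \ Hypothetical stack-only counterpart of 3dup.3, standard CORE words only.
    \ a b c  dup -> a b c c   2over -> a b c c a b   rot -> a b c a b c
    : 3dup.stack ( a b c -- a b c a b c )
       dup 2over rot ;
    ```

    For example, 1 2 3 3dup.stack leaves 1 2 3 1 2 3 on the stack.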

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Mon Sep 9 17:15:32 2024
    From Newsgroup: comp.lang.forth

    On 08-09-2024 18:09, minforth wrote:
    Another observation is that many Forthers do not seem to put much
    emphasis on programming time and code maintainability or readability,
    which is easier to achieve by using locals.

    I won't dispute that using the "locals" shortcut *may* save some
    programming time - but to me, the moment you decide to put the whole
    shebang in locals, you enter another mindset. Because at that moment you
    cease to consider the algorithm itself, but start banging out code.

    You no longer consider "do I need that, do I need that now, do I need
    that here", you just start creating more local variables. Somehow that
    kills my train of mind..

    I do dispute that "no locals" Forth kills maintainability - or
    readability. I'm always happy to see a whole bunch of one-liners.
    Doesn't happen to me every day, but often enough. And then you can functionally comment your code. I usually comment it from column 40 on
    and at the top of a word.

    I've maintained non-trivial programs for *DECADES* without any trouble.
    I've plugged in a garbage collection module in my uBasic/4tH interpreter
    - and radically changed it later. My rule is: if you can't figure it
    out, rewrite it until you do. It happens, but not frequently.

    Hans Bezemer

  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 9 17:34:03 2024
    From Newsgroup: comp.lang.forth

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    @InProceedings{ertl22-locals,
    author = {M. Anton Ertl},
    title = {Are Locals Inevitably Slow?},
    crossref = {euroforth22},
    pages = {48--49},
    url = {http://www.euroforth.org/ef22/papers/ertl-locals.pdf},
    url-slides = {http://www.euroforth.org/ef22/papers/ertl-locals-slides.pdf},
    video = {https://www.youtube.com/watch?v=tPjSKetEJn0},
    OPTnote = {presentation slides},
    abstract = {Code quality of locals on two code examples on
    various systems}
    }

    An update on the table for the example:

    : 3dup.3 {: a b c :} a b c a b c ;

    instr. bytes system
        31   117 Gforth AMD64
        16    44 iforth 5.0.27 (plus 20 bytes entry and return code)
         7    19 lxf 1.6-982-823 32-bit
        32   127 SwiftForth 4.0.0-RC89 (calls LSPACE)
        26    92 VFX Forth 64 5.11 RC2

    And here's another update. A recent change in Gforth resulted in more
    code, and we now have reverted that change:

    instr. bytes system
        28   103 Gforth AMD64
        16    44 iforth 5.0.27 (plus 20 bytes entry and return code)
         7    19 lxf 1.6-982-823 32-bit
        32   127 SwiftForth 4.0.0-RC89 (calls LSPACE)
        26    92 VFX Forth 64 5.11 RC2

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Mon Sep 9 21:16:49 2024
    From Newsgroup: comp.lang.forth

    On Mon, 9 Sep 2024 15:15:32 +0000, Hans Bezemer wrote:
    I won't dispute that using the "locals" shortcut *may* save some
    programming time - but to me, the moment you decide to put the whole
    shebang in locals, you enter another mindset. Because at that moment you cease to consider the algorithm itself, but start banging out code.

    You no longer consider "do I need that, do I need that now, do I need
    that here", you just start creating more local variables. Somehow that
    kills my train of mind..

    The thing is that your train of mind is focused on optimising the
    parameter flow via the stack. You are doing stupid work that an
    intelligent compiler does automatically today. It makes much more sense
    to focus your brainware on the algorithms or automation tasks to be
    solved.

    Since such algorithms/tasks are mostly formulated mathematically or
    logically, an almost 1:1 translation of such formulations by using
    locals is straightforward and less error prone. Use descriptive names
    and the code becomes quasi commented simultaneously.
  • From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 10 12:21:30 2024
    From Newsgroup: comp.lang.forth

    On 10/09/2024 7:16 am, minforth wrote:
    ...
    Since such algorithms/tasks are mostly formulated mathematically or
    logically, an almost 1:1 translation of such formulations by using
    locals is straightforward and less error prone. Use descriptive names
    and the code becomes quasi commented simultaneously.

    Mathematical formulations are typically expressed algebraically. Forth
    is stack-based and uses RPN. It's a different world. To use the latter effectively requires a different mindset. Do you really formulate or
    sketch out tasks algebraically? For me it ended when I stopped using
    BASIC.

  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Tue Sep 10 12:10:06 2024
    From Newsgroup: comp.lang.forth

    In article <nnd$32690b01$49b74327@97bd85089db44cd3>,
    Hans Bezemer <the.beez.speaks@gmail.com> wrote:
    On 08-09-2024 18:09, minforth wrote:
    Another observation is that many Forthers do not seem to put much
    emphasis on programming time and code maintainability or readability,
    which is easier to achieve by using locals.

    I won't dispute that using the "locals" shortcut *may* save some
    programming time - but to me, the moment you decide to put the whole
    shebang in locals, you enter another mindset. Because at that moment you cease to consider the algorithm itself, but start banging out code.

    You no longer consider "do I need that, do I need that now, do I need
    that here", you just start creating more local variables. Somehow that
    kills my train of mind..

    I do dispute that "no locals" Forth kills maintainability - or
    readability. I'm always happy to see a whole bunch of one-liners.
    Doesn't happen to me every day, but often enough. And then you can functionally comment your code. I usually comment it from column 40 on
    and at the top of a word.

    I've maintained non-trivial programs for *DECADES* without any trouble.
    I've plugged in a garbage collection module in my uBasic/4tH interpreter
    - and radically changed it later. My rule is: if you can't figure it
    out, rewrite it until you do. It happens, but not frequently.

    I'm cleaning up the editor that I use all the time. It sports dozens of
    global variables and it is hard to see why it could dispense with them.

    Locals are an expensive feature, because they are re-entrant.
    Forthers may know where and why an expensive feature is used.

    Hans Bezemer

    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Tue Sep 10 04:26:51 2024
    From Newsgroup: comp.lang.forth

    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    What bothers me most technologically is that parameters flow through
    the stack undisturbed. You break that paradigm when using locals. With locals you *HAVE TO* create some kind of stack frame that you have to
    destroy when you exit.

    Forth programs very frequently end up juggling parameters and other data
    to and from the return stack, instead of using locals. Simple
    implementations of locals put them in the return stack too.
    "Destroying" the stack frame just means adjusting RP when the function
    exits. Usually a single instruction.
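
    As a hand-written sketch of what such a simple implementation amounts
    to (an illustration only, not the code any particular system generates;
    the word name 2dup.frame is made up), a two-local word can be expanded
    onto the return stack like this:

    ```forth
    \ Hypothetical hand expansion of  : 2dup.loc {: a b :} a b a b ;
    \ using the return stack as the locals frame (CORE EXT words).
    : 2dup.frame ( a b -- a b a b )
       2>r      \ build the frame: both inputs move to the return stack
       2r@      \ first use of a b: copy them back to the data stack
       2r> ;    \ last use of a b, which also releases the frame
    ```

    For example, 1 2 2dup.frame leaves 1 2 1 2 on the stack.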

    Needless to say this copying, releasing and stuff takes time.

    Similar to DUP (copy) or DROP (release).

    In all honesty I must state that this overhead is not always
    translated to a diminished performance

    Right, I don't think one can assert a performance hit without
    measurements supporting the idea.

    TL;DR my objections are mostly based on pure architectural arguments,
    rather than practicality.

    Sure, that's reasonable, it's a matter of what you prefer. That's
    harder to take issue with than claims about performance.

    I also don't like Python, PHP and Perl for those very same reasons -

    Those are at a totally different level than Forth, in terms of layers of implementation and runtime libraries, overhead, etc. It's better to
    compare to something like C, or a hypothetical cleaned up version of C,
    or even to Forth with locals ;).
  • From dxf@dxforth@gmail.com to comp.lang.forth on Tue Sep 10 23:19:29 2024
    From Newsgroup: comp.lang.forth

    On 10/09/2024 9:26 pm, Paul Rubin wrote:
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    What bothers me most technologically is that parameters flow through
    the stack undisturbed. You break that paradigm when using locals. With
    locals you *HAVE TO* create some kind of stack frame that you have to
    destroy when you exit.

    Forth programs very frequently end up juggling parameters and other data
    to and from the return stack, instead of using locals. Simple implementations of locals put them in the return stack too.
    "Destroying" the stack frame just means adjusting RP when the function
    exits. Usually a single instruction.
    ...

    In forth the programmer uses the return stack as a temporary holder. Not
    so locals which spill all input to the return stack and then shuffle these to/from the parameter stack. The latter is akin to a novice programmer who uses too many variables.

  • From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 11 12:03:05 2024
    From Newsgroup: comp.lang.forth

    On 10/09/2024 9:26 pm, Paul Rubin wrote:
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    What bothers me most technologically is that parameters flow through
    the stack undisturbed. You break that paradigm when using locals. With
    locals you *HAVE TO* create some kind of stack frame that you have to
    destroy when you exit.

    Forth programs very frequently end up juggling parameters and other data
    to and from the return stack, instead of using locals.

    Looking at an application with 154 colon definitions, only 2 were found
    to use the return stack for temporary storage. Even I was surprised :)


  • From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 11 14:32:36 2024
    From Newsgroup: comp.lang.forth

    On 11/09/2024 12:03 pm, dxf wrote:
    On 10/09/2024 9:26 pm, Paul Rubin wrote:
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    What bothers me most technologically is that parameters flow through
    the stack undisturbed. You break that paradigm when using locals. With
    locals you *HAVE TO* create some kind of stack frame that you have to
    destroy when you exit.

    Forth programs very frequently end up juggling parameters and other data
    to and from the return stack, instead of using locals.

    Looking at an application with 154 colon definitions, only 2 were found
    to use the return stack for temporary storage. Even I was surprised :)

    From the same app:

    dup     54
    drop    29
    swap    22
    over    16
    2drop    9
    rot      8
    2dup     3
    >r       2
    r>       2
    2swap    1
    2nip     1
    locals   0

    The easiest stack operations (DUP DROP) account for most. SWAP averaged
    1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a problem in forth?
    It doesn't appear to be.

  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Wed Sep 11 11:02:00 2024
    From Newsgroup: comp.lang.forth

    On 10-09-2024 13:26, Paul Rubin wrote:
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    What bothers me most technologically is that parameters flow through
    the stack undisturbed. You break that paradigm when using locals. With
    locals you *HAVE TO* create some kind of stack frame that you have to
    destroy when you exit.

    Forth programs very frequently end up juggling parameters and other data
    to and from the return stack, instead of using locals. Simple implementations of locals put them in the return stack too.
    "Destroying" the stack frame just means adjusting RP when the function exits. Usually a single instruction.

    Needless to say this copying, releasing and stuff takes time.

    Similar to DUP (copy) or DROP (release).

    In all honesty I must state that this overhead is not always
    translated to a diminished performance

    Right, I don't think one can assert a performance hit without
    measurements supporting the idea.

    TL;DR my objections are mostly based on pure architectural arguments,
    rather than practicality.

    Sure, that's reasonable, it's a matter of what you prefer. That's
    harder to take issue with than claims about performance.

    I also don't like Python, PHP and Perl for those very same reasons -

    Those are at a totally different level than Forth, in terms of layers of implementation and runtime libraries, overhead, etc. It's better to
    compare to something like C, or a hypothetical cleaned up version of C,
    or even to Forth with locals ;).

    A lot depends on how solid you want to make your implementation. I got
    locals in uBasic/4tH.

    : exec_local ( --)
    [: get_exp 0 max 27 frame dup @ - + min negate cells frame + dup local <
    if E.MANYLOC throw else frame @ over ! to frame then ;]
    exec_function \ execution semantics for LOCALS()
    ;

    This one reserves room for locals. You may use up to 26 locals per
    function since there are 26 letters in the alphabet (duh!).

    : exec_param ( --)
    frame exec_local frame \ allocate locals, save pointers
    begin over over > while cell+ (pop) over ! repeat drop drop
    ;

    If the reserved room has to be initialized by the stack, it calls
    EXEC_LOCAL and then copies the values there.

    : exec_return ( --)
    get_token paren? putback if ['] get_push exec_function then
    gpop prog ! frame dup local #local 1- cells + >
    if E.NOSCOPE throw ;then @ to frame
    ;

    This one looks whether RETURN returns a value - and if it does, it
    pushes this value on the stack. Then it sets the return address. It
    checks for the sanity of the stack frame and if okay THEN it finally
    updates the stack pointer.

    You conveniently left out the initialization of the stack frame. Agreed,
    if ALL values are transferred to the return stack the overhead is
    minimal. But how often does that happen?

    Those are at a totally different level than Forth, in terms of layers of implementation and runtime libraries, overhead, etc. It's better to
    compare to something like C, or a hypothetical cleaned up version of C,
    or even to Forth with locals ;).

    True - but that's not the level of abstraction I'm considering. I think
    a language should have a well designed core, surrounded by a
    constellation of extensions. Like C with its standard library and Forth
    with its word sets. For comparison - C got a few dozen keywords. PHP got
    at least two different ways to extend binary extensions alone. A full
    Python installation is scattered all over the filesystem, so you got a
    hell of a job to extract a single, transferable application. Not to
    mention the awkward syntax (although they fixed some of it in v3). In
    Perl you always have to wonder which prefix is fashionable today.

    Now, I won't say Forth doesn't have its issues. I think IN ESSENCE
    recognizers are a beautiful idea. Extend it to strings and you could
    eradicate "parsing words" and have something like:

    "lib/mylib.4th" include

    "Square" : "the square is:" print dup * cr ;

    But okay, we'll do with what we have ;-) And BTW, TURNKEY should be
    standard. Clean up the dictionary, pump out an executable.

    Hans Bezemer


  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Wed Sep 11 11:20:14 2024
    From Newsgroup: comp.lang.forth

    On 30-08-2024 22:32, minforth wrote:
    Two classic answers:
    use DO..LOOPs to hide away loop indices
    use locals if you have too many parameters
    (some technical/physical formulas are difficult
    or impossible to factorise into smaller words
    which would otherwise be the classic Forth mantra)

    Tips:
    - Use multiple Return Stack registers (R@, R'@, R"@);
    - If parameters come in duplets or triplets, use corresponding stack
    operators (3DUP, 3OVER, 3DROP);
    - Reorganize parameters at the *very start* of the program in a more
    palatable order. It saves stack juggling later on;
    - Maybe a strange one, but codify stack patterns!
    E.g. SPIN ( a b c -- c b a)
    STOW ( a b -- a a b)
    RISE ( a b c -- b a c)

    It helps you to THINK in these patterns and more easily recognize them.
    It depends highly on your coding habits, so it helps to analyze your
    legacy code to see if they often occur.

    Hans Bezemer

  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Wed Sep 11 09:49:37 2024
    From Newsgroup: comp.lang.forth

    On Wed, 11 Sep 2024 9:20:14 +0000, Hans Bezemer wrote:
    Tips:
    - Use multiple Return Stack registers (R@, R'@, R"@);
    - If parameters come in duplets or triplets, use corresponding stack operators (3DUP, 3OVER, 3DROP);
    - Reorganize parameters at the *very start* of the program in a more palatable order. It saves stack juggling later on;
    - Maybe a strange one, but codify stack patterns!
    E.g. SPIN ( a b c -- c b a)
    STOW ( a b -- a a b)
    RISE ( a b c -- b a c)

    It helps you to THINK in these patterns and more easily recognize them.
    It depends highly on your coding habits, so it helps to analyze your
    legacy code to see if they often occur.

    Good advice if you can access the return stack directly.

    Otherwise, for non-trivial words, it is preferable to let the compiler recognise patterns and save your precious human time. If the compiled
    code is too bad, profile and optimise it afterwards.
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Wed Sep 11 13:29:03 2024
    From Newsgroup: comp.lang.forth

    In article <nnd$545e2daa$4f8af75c@548f76d6156a46d8>,
    Hans Bezemer <the.beez.speaks@gmail.com> wrote:
    <SNIP>

    Now, I won't say Forth doesn't have its issues. I think IN ESSENCE
    recognizers are a beautiful idea. Extend it to strings and you could
    eradicate "parsing words" and have something like:

    "lib/mylib.4th" include

    "Square" : "the square is:" print dup * cr ;

    You have that backward, it must be:

    { "the square is:" print dup * cr } : Square

    If there is one thing to preserve in Forth, it is the
    convention that defining words parse new names for
    the dictionary by scanning forward, without those names being treated
    as strings. Here { introduces a denotation without being a PREFIX
    ("recognizer"), as 0x is in 0xDEADBEEF. It behaves the same inside a
    definition, like numbers and, nowadays, strings.

    { "the square is:" print dup * cr } CONSTANT orang_utan
    orang_utan DUP : Square : quadrate


    But okay, we'll do with what we have ;-) And BTW, TURNKEY should be
    standard. Clean up the dictionary, pump out an executable.

    I have created a language on that principle. For example, meta
    accepts two xts, a build one and a run one. meta is the mother of
    all defining words:
    { , } { @ } meta CONSTANT
    { CELL ALLOT } { } meta VARIABLE
    { 2 CELLS ALLOT } { } meta 2VARIABLE
    { } { EXECUTE } meta :
    { } { } meta DATA \ My favorite.

    CREATE DOES> is the right idea, an object with an allocation
    part and a behavior, but the syntax is awkward beyond despair.

    I have a backlog, busy with preserving projects dating from the
    80's, so don't expect a publication soon.


    Hans Bezemer


    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Wed Sep 11 14:41:35 2024
    From Newsgroup: comp.lang.forth

    On 11-09-2024 11:49, minforth wrote:
    On Wed, 11 Sep 2024 9:20:14 +0000, Hans Bezemer wrote:
    Tips:
    - Use multiple Return Stack registers (R@, R'@, R"@);
    - If parameters come in duplets or triplets, use corresponding stack
    operators (3DUP, 3OVER, 3DROP);
    - Reorganize parameters at the *very start* of the program in a more
    palatable order. It saves stack juggling later on;
    - Maybe a strange one, but codify stack patterns!
       E.g. SPIN ( a b c -- c b a)
            STOW ( a b -- a a b)
            RISE ( a b c -- b a c)

    It helps you to THINK in these patterns and more easily recognize them.
    It depends highly on your coding habits, so it helps to analyze your
    legacy code to see if they often occur.

    Good advice if you can access the return stack directly.

    Otherwise, for non-trivial words, it is preferable to let the compiler recognise patterns and save your precious human time. If the compiled
    code is too bad, profile and optimise it afterwards.

    You know - in my experience these kinds of problems mostly manifest
    themselves when making my library routines - the stuff you rarely touch afterwards (and even more rarely in a fundamental way).

    Putting the application components to work doesn't affect the stack in
    the same way. I think that is where the "10x savings" actually are.

    Again - just a hunch of mine..

    Hans Bezemer
  • From dxf@dxforth@gmail.com to comp.lang.forth on Thu Sep 12 14:01:10 2024
    From Newsgroup: comp.lang.forth

    On 11/09/2024 7:20 pm, Hans Bezemer wrote:
    On 30-08-2024 22:32, minforth wrote:
    Two classic answers:
    use DO..LOOPs to hide away loop indices
    use locals if you have too many parameters
    (some technical/physical formulas are difficult
    or impossible to factorise into smaller words
    which would otherwise be the classic Forth mantra)

    Tips:
    - Use multiple Return Stack registers (R@, R'@, R"@);
    - If parameters come in duplets or triplets, use corresponding stack operators (3DUP, 3OVER, 3DROP);
    - Reorganize parameters at the *very start* of the program in a more palatable order. It saves stack juggling later on;
    - Maybe a strange one, but codify stack patterns!
      E.g. SPIN ( a b c -- c b a)
           STOW ( a b -- a a b)
           RISE ( a b c -- b a c)

    It helps you to THINK in these patterns and more easily recognize them. It depends highly on your coding habits, so it helps to analyze your legacy code to see if they often occur.

    swap rot 0
    over swap 0
    rot swap 1
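
    Read as colon definitions, the three lines above give one possible
    spelling of SPIN, STOW and RISE in standard words. This is only a
    sketch; it assumes the trailing numbers in the listing are usage
    counts and not part of the definitions:

    ```forth
    \ Each body is taken from the listing above; stack comments follow the Tips.
    : SPIN ( a b c -- c b a )  swap rot ;
    : STOW ( a b -- a a b )    over swap ;
    : RISE ( a b c -- b a c )  rot swap ;
    ```

    For example, 1 2 3 SPIN leaves 3 2 1 on the stack.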


  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Wed Sep 11 23:51:00 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    Looking at an application with 154 colon definitions...
    From the same app:
    The easiest stack operations (DUP DROP) account for most.

    Is the code for this app available?

    SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a problem in forth? It doesn't appear to be.

    The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction
    inversion (with a smart compiler, the data ends up in registers that
    could be named by locals) or they are stack traffic whose cost has to be
    compared with the cost of indexed references to locals in the return
    stack. I'd agree that they aren't necessarily "juggling", which evokes
    permuting stuff in the stack outside the usual LIFO order. That does
    happen a little bit though, with OVER, ROT, etc.
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Thu Sep 12 00:10:03 2024
    From Newsgroup: comp.lang.forth

    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    You conveniently left out the initialization of the stack
    frame. Agreed, if ALL values are transferred to the return stack the
    overhead is minimal. But how often does that happen?

    I don't understand this. {: a b c :} transfers 3 elements from the
    parameter stack to the return stack. That has some cost, but it is
    offset by avoiding some DUP and similar operations. Is it relevant at
    all anyway? Old fashioned Forth interpreters are pretty fast, and if
    you're worrying about avoiding a stack transfer here or there, you need
    an optimizing compiler.
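
    To make the trade concrete, here is a small sketch of the same word
    written both ways. It assumes a Forth 2012 system that provides
    {: ... :}; the names SUMSQ-L and SUMSQ-S are made up for illustration:

    ```forth
    \ With locals: three cells move to local storage, then each is fetched twice.
    : SUMSQ-L {: a b c -- n :}  a a *  b b * +  c c * + ;

    \ Stack only: each value is squared in place with DUP, nothing is transferred.
    : SUMSQ-S ( a b c -- n )  dup *  swap dup * +  swap dup * + ;
    ```

    Both leave 14 for the input 1 2 3. Which one is cheaper depends on how
    the system implements locals.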

    Adding safety checks has a cost, but once the program appears debugged,
    I think Forth philosophy is to turn off the checks.

    True - but that's not the level of abstraction I'm considering. I
    think a language should have a well designed core, surrounded by a
    constellation of extensions. Like C with its standard library and
    Forth with its word sets.

    You might like Lua or Scheme for simple higher level languages with that
    style of design. C has some warts but its complexity in terms of
    keywords doesn't seem much worse than Forth's core words.
  • From dxf@dxforth@gmail.com to comp.lang.forth on Thu Sep 12 18:21:43 2024
    From Newsgroup: comp.lang.forth

    On 12/09/2024 4:51 pm, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    Looking at an application with 154 colon definitions...
    From the same app:
    The easiest stack operations (DUP DROP) account for most.

    Is the code for this app available?

    Previously posted. You may have seen it.

    https://pastebin.com/2xcRSbQW

    SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
    problem in forth? It doesn't appear to be.

    The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction
    inversion (with a smart compiler, the data ends up in registers that
    could be named by locals) or they are stack traffic whose cost has to be
    compared with the cost of indexed references to locals in the return
    stack. I'd agree that they aren't necessarily "juggling", which evokes
    permuting stuff in the stack outside the usual LIFO order. That does
    happen a little bit though, with OVER, ROT, etc.

    If a cost, it's one the programmer can keep to minimum. With locals there's
    an upfront cost that can't be avoided. Using registers is appealing until
    one realizes a call to an external function necessitates placing it back on
    the stack. Costs multiply in the face of many small functions. Moore
    touches on this in one of his speeches:

    "I keep asking that question. What is Forth? Forth is highly factored code.
    I don't know anything else to say except that Forth is definitions. If you
    have a lot of small definitions you are writing Forth. In order to write a
    lot of small definitions you have to have a stack. Stacks are not popular.
    It's strange to me that they are not. There is just a lot of pressure from
    vested interests that don't like stacks, they like registers. Stacks are not
    a solve all problems concept but they are very very useful, especially for
    information hiding and you have to have two of them." - Chuck Moore 1999

  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Thu Sep 12 09:08:20 2024
    From Newsgroup: comp.lang.forth

    On Thu, 12 Sep 2024 8:21:43 +0000, dxf wrote:
    If a cost, it's one the programmer can keep to minimum. With locals
    there's an upfront cost that can't be avoided. Using registers is
    appealing until one realizes a call to an external function
    necessitates placing it back on the stack. Costs multiply in the face
    of many small functions.

    This is history (or your archaic compiler). Modern compilers try to pass
    most parameters through registers.

    https://langdev.stackexchange.com/questions/2584/are-modern-compilers-passing-parameters-in-registers-instead-of-on-the-stack
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Thu Sep 12 10:11:36 2024
    From Newsgroup: comp.lang.forth

    On Thu, 12 Sep 2024 9:08:20 +0000, minforth wrote:

    This is history (or your archaic compiler). Modern compilers try to pass
    most parameters through registers.

    The rules are very complicated, though. One has to account for there
    being too many parameters, for different architectures with different
    register assignments, for integer and floating-point type parameters,
    and under some circumstances both the registers *and* the stack must
    be used, where some extra 'working space' may, or may not, be needed.

    I was very happy when it finally worked on all of our target OSes.

    -marcel
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Thu Sep 12 08:55:26 2024
    From Newsgroup: comp.lang.forth

    Paul Rubin <no.email@nospam.invalid> writes:
    The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction
    inversion (with a smart compiler, the data ends up in registers that
    could be named by locals)

    I don't see an inversion here. The programmer-visible stack abstracts
    (ideally) the registers in one way, the programmer-visible locals
    abstract them in a different way.

    Consider the VICHECK example from Nick Nelson's Better Values
    <http://www.euroforth.org/ef22/papers/nelson-values-slides.pdf>. Here is
    the version with locals, followed by the version that eliminates the
    locals:

    : VICHECK {: pindex paddr -- pindex' paddr :} \ Checks for valid index
      \ paddr is the address of the data, the first cell of which contains
      \ the array size
      pindex 0 paddr @ WITHIN IF \ Index is valid
        pindex paddr
      ELSE \ Index is invalid
        Z" Invalid index " pindex ZFORMAT Z+
        Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
        Z" length " Z+ paddr @ ZFORMAT Z+
        ERROR
        0 paddr \ Use zeroth index
      THEN ;

    : VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
      \ paddr is the address of the data, the first cell of which contains
      \ the array size
      over 0 2 pick @ WITHIN 0= IF \ Index is invalid
        Z" Invalid index " 2 PICK ZFORMAT Z+
        Z" for " Z+ OVER CELL- @ Z+ \ Add NFA from extra cell
        Z" length " Z+ OVER @ ZFORMAT Z+
        ERROR
        NIP 0 SWAP \ Use zeroth index
      THEN ;

    So by keeping the values on the stack you not just eliminate their
    repeated mention, but also eliminate one branch of the IF. With a
    more capable Forth system a synthesis of the two approaches is
    possible:

    : VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
      \ paddr is the address of the data, the first cell of which contains
      \ the array size
      over 0 2 pick @ WITHIN 0= IF \ Index is invalid
        {: pindex paddr :}
        Z" Invalid index " pindex ZFORMAT Z+
        Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
        Z" length " Z+ paddr @ ZFORMAT Z+
        ERROR
        0 paddr \ Use zeroth index
      THEN ;

    Or one could factor out the code between IF and THEN and stay within
    the confines of VFX:

    : VIERROR {: pindex paddr -- 0 paddr :}
      Z" Invalid index " pindex ZFORMAT Z+
      Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
      Z" length " Z+ paddr @ ZFORMAT Z+
      ERROR
      0 paddr \ Use zeroth index
    ;

    : VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
      \ paddr is the address of the data, the first cell of which contains
      \ the array size
      over 0 2 pick @ WITHIN 0= IF \ Index is invalid
        VIERROR
      THEN ;

    The check can be simplified, which also simplifies the stack handling:

    : VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
      \ paddr is the address of the data, the first cell of which contains
      \ the array size
      2dup @ u>= IF \ Index is invalid
        VIERROR
      THEN ;
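
    The simplification relies on an unsigned comparison treating a negative
    index as a very large number, so a single U< covers both bounds of
    0 n WITHIN as long as n itself is not negative. A minimal sketch in
    standard words follows. U>= is not in the standard, so U< 0= stands in
    for it here, and the word names are invented for the example:

    ```forth
    : VALID-W? ( index n -- flag )  0 swap within ;  \ true if 0 <= index < n
    : VALID-U? ( index n -- flag )  u< ;             \ same test, one comparison
    : INVALID? ( index n -- flag )  u< 0= ;          \ standard spelling of U>=
    ```

    For an array size of 10, the indices 0 and 9 pass both tests, while -1
    and 10 fail both.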

    or they are stack traffic whose cost has to be
    compared with the cost of indexed references to locals in the return
    stack.

    That check often results in the code without locals winning, but that
    is, for a large part, due to suboptimal implementations of locals.
    Ideally a perfect compiler will produce the same code for code using
    locals and for equivalent code using stack manipulation words, because
    the data flow is the same. This actually works out in the case of lxf
    processing various implementations of 3DUP, including a locals-based
    one; see <2024Apr10.090038@mips.complang.tuwien.ac.at>. However, in
    general Forth systems do not produce perfect results.

    I have now looked at what happens for the first two variants of
    VICHECK; I have defined the non-standard words as follows to make it
    possible to compile the code:

    defer dummy
    : z" [char] " parse 2drop postpone dummy ; immediate
    defer zformat
    defer z+
    defer >name
    defer error

    I looked at 3 systems: Gforth (because I work on it); lxf (because it
    produces the best results in the 3DUP case); VFX (because it's the
    system Nick Nelson uses). The numbers below are the number of bytes
    of native code:

    locals  stack  system
       401    336  gforth-fast (AMD64)
       179    132  lxf 1.6-982-823 (IA-32)
       182    119  VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
       241    159  VFX Forth 64 5.43 (AMD64)

    I'd agree that they aren't necessarily "juggling", which evokes
    permuting stuff in the stack outside the usual LIFO order. That does
    happen a little bit though, with OVER, ROT, etc.

    In particular, in Starting Forth ROT is illustrated with a juggler
    (you see the juggling balls right beside her), and the swap dragon
    comments: "I hate jugglers".

    https://www.forth.com/wp-content/uploads/2015/03/ch2-rot.gif

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Thu Sep 12 10:19:03 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    Using registers is appealing until
    one realizes a call to an external function necessitates placing it back on
    the stack.

    Not if the stack item does not live across the call. And even if it
    lives across the call and cannot be placed in a callee-saved register,
    the save before and restore after the call is amortized typically
    across more than one register access on each side of the call.

    Register allocation is one of the most effective optimizations in
    compilers. That's also true of Forth.

    Costs multiply in the face of many small functions.

    Register allocation is also effective for small functions.

    - anton
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Thu Sep 12 10:31:44 2024
    From Newsgroup: comp.lang.forth

    I can well imagine that. Some wheels are particularly difficult
    to reinvent. For desktop systems, it can therefore make sense
    to use an IR (e.g. LLVM or WASM, or simply C) and use the
    optimisation functions of proven compilers for this IR.

    Sometimes a much simpler solution: use code inlining.
  • From dxf@dxforth@gmail.com to comp.lang.forth on Fri Sep 13 09:37:29 2024
    From Newsgroup: comp.lang.forth

    On 12/09/2024 8:19 pm, Anton Ertl wrote:
    dxf <dxforth@gmail.com> writes:
    Using registers is appealing until
    one realizes a call to an external function necessitates placing it back on
    the stack.

    Not if the stack item does not live across the call. And even if it
    lives across the call and cannot be placed in a callee-saved register,
    the save before and restore after the call is amortized typically
    across more than one register access on each side of the call.

    Register allocation is one of the most effective optimizations in
    compilers. That's also true of Forth.

    Costs multiply in the face of many small functions.

    Register allocation is also effective for small functions.

    Moore talked about registers. It's worth repeating for those who may be new
    to forth.

    "But such registers raises the question of local variables. There is a lot of
    discussion about local variables. That is another aspect of your application
    where you can save 100% of the code. I remain adamant that local variables
    are not only useless, they are harmful. If you are writing code that needs
    them you are writing, non-optimal code" - Chuck Moore 1999

  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Fri Sep 13 07:56:37 2024
    From Newsgroup: comp.lang.forth

    On Thu, 12 Sep 2024 23:37:29 +0000, dxf wrote:

    On 12/09/2024 8:19 pm, Anton Ertl wrote:
    Register allocation is one of the most effective optimizations in
    compilers. That's also true of Forth.

    Costs multiply in the face of many small functions.

    Register allocation is also effective for small functions.

    Moore talked about registers. It's worth repeating for those who may be
    new to forth.

    "But such registers raises the question of local variables. There is a
    lot of discussion about local variables. That is another aspect of your
    application where you can save 100% of the code. I remain adamant that
    local variables are not only useless, they are harmful. If you are
    writing code that needs them you are writing, non-optimal code" - Chuck
    Moore 1999

    The only thing that can be deduced from this is that back in 1999
    this was Moore's opinion in the specific context of his work.

    Besides, the world has changed a wee bit since then...
  • From dxf@dxforth@gmail.com to comp.lang.forth on Fri Sep 13 19:47:46 2024
    From Newsgroup: comp.lang.forth

    On 13/09/2024 5:56 pm, minforth wrote:
    On Thu, 12 Sep 2024 23:37:29 +0000, dxf wrote:

    On 12/09/2024 8:19 pm, Anton Ertl wrote:
    Register allocation is one of the most effective optimizations in
    compilers.  That's also true of Forth.

    Costs multiply in the face of many small functions.

    Register allocation is also effective for small functions.

    Moore talked about registers. It's worth repeating for those who may be
    new to forth.

    "But such registers raises the question of local variables. There is a
    lot of discussion about local variables. That is another aspect of your
    application where you can save 100% of the code. I remain adamant that
    local variables are not only useless, they are harmful. If you are
    writing code that needs them you are writing, non-optimal code" - Chuck
    Moore 1999

    The only thing that can be deduced from this is that back in 1999
    this was Moore's opinion in the specific context of his work.

    Besides, the world has changed a wee bit since then...

    Claims made in respect of locals in forth - ease of use, better performance
    through less 'stack juggling', better readability/maintainability - were all
    made in the 1980's. What has changed? Forthers today are more willing to
    believe, to accept the word of authority, lack the interest to discover the
    truth for themselves? If so, that would be a pity.
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Fri Sep 13 03:38:51 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    "I remain adamant that local variables  are not only useless, they
    are harmful. If you are writing code that needs  them you are
    writing, non-optimal code" - Chuck Moore 1999 ...

    Claims made in respect of locals in forth - ease of use, better
    performance through less 'stack juggling', better
    readability/maintainability - were all made in the 1980's. What has
    changed? Forthers today are more willing to believe, to accept the
    word of authority, lack the interest to discover the truth for
    themselves?

    Is avoiding locals because of the Chuck Moore quote not an example of
    accepting the word of authority? And how often do even you care whether
    your code is optimal? It's likely difficult to get any interpreted
    Forth code to run at better than 1/5th the speed of assembly code. So
    if optimization is your main concern, why use Forth to begin with?

    I would say that the claim of better performance from locals depends on
    the implementation and in any case has to be scrutinized if it matters,
    but even if there's a performance loss, that might be an acceptable
    trade if the programmer finds offsetting gains in the other areas.

    My main programming language for random hacking is Python, which is
    possibly 10x slower than interpreted Forth or 50x slower than compiled
    Forth or C. Yet it usually doesn't matter unless I'm trying to do
    something unusually compute intensive. Once the program is fast enough
    to not be annoying to use, I don't need to optimize it more.
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Fri Sep 13 13:07:32 2024
    From Newsgroup: comp.lang.forth

    In article <66e40a42$1@news.ausics.net>, dxf <dxforth@gmail.com> wrote:
    On 13/09/2024 5:56 pm, minforth wrote:
    On Thu, 12 Sep 2024 23:37:29 +0000, dxf wrote:

    On 12/09/2024 8:19 pm, Anton Ertl wrote:
    Register allocation is one of the most effective optimizations in
    compilers.  That's also true of Forth.

    Costs multiply in the face of many small functions.

    Register allocation is also effective for small functions.

    Moore talked about registers. It's worth repeating for those who may be
    new to forth.

    "But such registers raises the question of local variables. There is a
    lot of discussion about local variables. That is another aspect of your
    application where you can save 100% of the code. I remain adamant that
    local variables are not only useless, they are harmful. If you are
    writing code that needs them you are writing, non-optimal code" - Chuck
    Moore 1999

    The only thing that can be deduced from this is that back in 1999
    this was Moore's opinion in the specific context of his work.

    Besides, the world has changed a wee bit since then...

    Claims made in respect of locals in forth - ease of use, better performance
    through less 'stack juggling', better readability/maintainability - were all
    made in the 1980's. What has changed? Forthers today are more willing to
    believe, to accept the word of authority, lack the interest to discover the
    truth for themselves? If so, that would be a pity.

    I object to locals because they introduce a superfluous extra concept
    that is foreign to a stack-oriented language.
    Also, there are numerous conflicting notations, and giving a name to a
    single cell isn't sufficient: you then also need local doubles, floats
    and structures.
    Some people are fond of their information-hiding aspect, but that can
    easily be done with normal data and an addition like marking
    some words private.
    The remaining argument is re-entrancy, an overrated argument.

    I am also fond of Algol68/go. A different end of the spectrum,
    but it has a common feature that Forth has: consistency.
    Local variables break that.

    I don't take Moore's word for gospel, but I pay attention, because
    he is an accomplished individual.

    Groetjes Albert
  • From Jan Coombs@jan4comp.lang.forth@murray-microft.co.uk to comp.lang.forth on Fri Sep 13 13:07:32 2024
    From Newsgroup: comp.lang.forth

    On Fri, 13 Sep 2024 03:38:51 -0700
    Paul Rubin <no.email@nospam.invalid> wrote:

    I would say that the claim of better performance from locals depends
    on the implementation[...]

    Absolutely. As Chuck's prime target of interest (hardware) uses LIFO
    registers for stacks, only the top one, or so, R stack items could
    be used for restricted local storage (which is also common practice).
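
    The common practice mentioned here, parking one value on the return
    stack for the duration of a word, can be sketched as follows. The word
    name SCALE-SUM is invented for the example, and the parked value has to
    be removed again before the word exits:

    ```forth
    \ k is kept on the return stack and read with R@ instead of being juggled.
    : SCALE-SUM ( a b k -- a*k+b*k )  >r  r@ *  swap r> *  + ;
    ```

    For example, 2 3 10 SCALE-SUM leaves 50.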

    I accept that locals are useful, and would like to see hardware stack
    engine implementations that support this better while retaining the
    performance advantage of a stack cache implemented as LIFO registers
    rather than in RAM.

    Jan Coombs
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 01:12:13 2024
    From Newsgroup: comp.lang.forth

    On 13/09/2024 8:38 pm, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    "I remain adamant that local variables  are not only useless, they
    are harmful. If you are writing code that needs  them you are
    writing, non-optimal code" - Chuck Moore 1999 ...

    Claims made in respect of locals in forth - ease of use, better
    performance through less 'stack juggling', better
    readability/maintainability - were all made in the 1980's. What has
    changed? Forthers today are more willing to believe, to accept the
    word of authority, lack the interest to discover the truth for
    themselves?

    Is avoiding locals because of the Chuck Moore quote not an example of accepting the word of authority?

    Or I've yet to hear a convincing argument from the locals authorities :)

    You have the source to my app. Perhaps you can nominate where locals
    could have been used to better effect.

  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Fri Sep 13 17:59:27 2024
    From Newsgroup: comp.lang.forth

    Jan Coombs <jan4comp.lang.forth@murray-microft.co.uk> writes:
    Absolutely. As Chuck's prime target of interest (hardware) uses LIFO
    registers for stacks, only the top one, or so, R stack items could
    be used for restricted local storage (which is also common practice).

    I accept that locals are useful, and would like to see hardware stack
    engine implementations that support this better while retaining the
    performance advantage of a stack cache implemented as LIFO registers
    rather than in RAM.

    AFAIK Chuck Moore implements the stack as SRAM indexed with his stack
    pointer; maybe the stack pointer is a rotating shift register with
    only one bit set, don't remember.

    He also uses an A register in addition to R and the data TOS last I
    looked. So much for Chuck Moore denouncing registers. When he
    introduced A, some people played with the idea to add A and possibly
    more registers to Forth.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Fri Sep 13 18:07:34 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    Claims made in respect of locals in forth - ease of use, better
    performance through less 'stack juggling', better
    readability/maintainability - were all made in the 1980's.

    Where can I find claims about better performance? All I have read is
    claims about worse performance.

    What has changed? Forthers today are more willing to
    believe, to accept the word of authority

    Is that why you cite Chuck Moore on locals rather than arguing from
    facts?

    - anton
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 12:48:45 2024
    From Newsgroup: comp.lang.forth

    On 14/09/2024 4:07 am, Anton Ertl wrote:
    dxf <dxforth@gmail.com> writes:
    Claims made in respect of locals in forth - ease of use, better
    performance through less 'stack juggling', better
    readability/maintainability - were all made in the 1980's.

    Where can I find claims about better performance? All I have read is
    claims about worse performance.

    'Eliminate stack juggling' sounds like an argument for better performance.
    It's a catch cry that's become synonymous with locals. Identify something wrong with forth and introduce a solution is the gameplay.

    What has changed? Forthers today are more willing to
    believe, to accept the word of authority

    Is that why you cite Chuck Moore on locals rather than arguing from
    facts?

    The facts AFAICT is locals are an appeal to prejudice. If locals were a
    bona fide extension it ought to be crystal clear when to apply them and
    when not. Vague statements about readability and maintainability don't
    cut it. The fact is locals challenge and contradict forth - which is why
    I'm vitally interested in getting at the truth of it. The best way I knew
    of doing that is to see whether I needed locals in practice. When the
    result is that good forth coding can stand on its own, why shouldn't I
    quote Moore?

  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sat Sep 14 05:47:11 2024
    From Newsgroup: comp.lang.forth

    On Sat, 14 Sep 2024 2:48:45 +0000, dxf wrote:
    The facts AFAICT is locals are an appeal to prejudice.

    This is one of the best sentences ever uttered on this forum! :-)
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Sep 14 06:19:52 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    On 14/09/2024 4:07 am, Anton Ertl wrote:
    Where can I find claims about better performance? All I have read is
    claims about worse performance.

    'Eliminate stack juggling' sounds like an argument for better performance.

    Not to me. To me it sounds like a statement about the ease of writing
    and reading the code.

    The performance of locals vs. stack juggling depends on the
    implementation. I know no implementation that performs register
    allocation of locals or stack items (except the TOS) to registers
    across basic block boundaries. This seems to hurt code with locals
    more than code that keeps everything on the stacks. Here's the data
    from an earlier posting <2024Sep12.105526@mips.complang.tuwien.ac.at>,
    now including data from iForth:

    locals  stack  system
       401    336  gforth-fast (AMD64)
       179    132  lxf 1.6-982-823 (IA-32)
       182    119  VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
       241    159  VFX Forth 64 5.43 (AMD64)
       163    175  iforth-5.1 mini (AMD64)

    The data from iForth is the outlier here, let's look at the code:

    Source code:
    defer dummy
    : z" [char] " parse 2drop postpone dummy ; immediate
    defer zformat
    defer z+
    defer >name
    defer error

    : VICHECK1 {: pindex paddr -- pindex' paddr :} \ Checks for valid index
      \ paddr is the address of the data, the first cell of which contains
      \ the array size
      pindex 0 paddr @ WITHIN IF \ Index is valid
        pindex paddr
      ELSE \ Index is invalid
        Z" Invalid index " pindex ZFORMAT Z+
        Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
        Z" length " Z+ paddr @ ZFORMAT Z+
        ERROR
        0 paddr \ Use zeroth index
      THEN ;

    : VICHECK2 ( pindex paddr -- pindex' paddr ) \ Checks for valid index
      \ paddr is the address of the data, the first cell of which contains
      \ the array size
      over 0 2 pick @ WITHIN 0= IF \ Index is invalid
        Z" Invalid index " 2 PICK ZFORMAT Z+
        Z" for " Z+ OVER CELL- @ Z+ \ Add NFA from extra cell
        Z" length " Z+ OVER @ ZFORMAT Z+
        ERROR
        NIP 0 SWAP \ Use zeroth index
      THEN ;

    One difference is that VICHECK2 does not just replace the locals with
    stack stuff and eliminate the first branch of the IF, but also
    replaces ">NAME 1+" with "CELL- @".

    Disassembled code:
    VICHECK1                      VICHECK2
    pop rbx                       pop rbx
    lea rsi, [rsi #-16 +] qword   mov rdi, [rsp] qword
    mov [esi] dword, rbx          push rbx
    pop rbx                       push rdi
    lea rsi, [rsi #-16 +] qword   push 0 b#
    mov [esi] dword, rbx          mov rbx, [rsp #16 +] qword
    mov rbx, [rsi #16 +] qword    pop rdi
    mov rbx, [rbx] qword          mov rax, rdi
    mov rdi, [rsi] qword          sub rax, [rbx] qword
    cmp rbx, rdi                  neg rax
    jbe $10227337 offset NEAR     pop rbx
    push [rsi] qword              sub rbx, rdi
    push [rsi #16 +] qword        cmp rax, rbx
    jmp $10227395 offset NEAR     seta bl
    call $10226600 qword-offset   movzx rbx, bl
    push [rsi] qword              neg rbx
    call $10226E90 qword-offset   cmp rbx, 0 b#
    call $10226EB0 qword-offset   jne $10227465 offset NEAR
    call $10226600 qword-offset   call $10226600 qword-offset
    call $10226EB0 qword-offset   mov rbx, [rsp #16 +] qword
    push [rsi #16 +] qword        push rbx
    call $10226ED0 qword-offset   call $10226E90 qword-offset
    pop rbx                       call $10226EB0 qword-offset
    lea rbx, [rbx 1 +] qword      call $10226600 qword-offset
    push rbx                      call $10226EB0 qword-offset
    call $10226EB0 qword-offset   pop rbx
    call $10226600 qword-offset   mov rdi, [rsp] qword
    call $10226EB0 qword-offset   push rbx
    mov rbx, [rsi #16 +] qword    push [rdi -8 +] qword
    push [rbx] qword              call $10226EB0 qword-offset
    call $10226E90 qword-offset   call $10226600 qword-offset
    call $10226EB0 qword-offset   call $10226EB0 qword-offset
    call $10226EF0 qword-offset   pop rbx
    push 0 b#                     mov rdi, [rsp] qword
    push [rsi #16 +] qword        push rbx
    add rsi, #32 b#               push [rdi] qword
    ;                             call $10226E90 qword-offset
                                  call $10226EB0 qword-offset
                                  call $10226EF0 qword-offset
                                  pop rbx
                                  pop rdi
                                  mov rdi, 0 d#
                                  mov rcx, rdi
                                  push rcx
                                  push rbx
                                  ;

    iForth 5.1-mini does not even keep the TOS in a register on basic
    block boundaries, which results in pops and pushes at all the
    boundaries, especially for the stack-only code. However, in the
    actual application (where Z", ZFORMAT etc. don't compile as deferred
    words) it would probably inline many of these words which might result
    in better code for the stack variant. It does not keep locals in
    stack items, either, but accesses them in memory through a separate
    stack pointer.

    The code at the start of VICHECK2 does not suffer from basic block
    boundaries, yet makes less use of registers than I expected. By
    contrast, in VICHECK1 iforth discovers that "0 paddr @ within" is
    equivalent to "paddr @ u<", while for "0 2 pick @ within" it fails to
    make the equivalent discovery.
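
    The equivalence itself is easy to check by hand. A minimal sketch,
    assuming a standard system that provides WITHIN (CORE EXT); the word
    names below are hypothetical and do not come from the posted code:

    ```forth
    \ Hypothetical helpers: both test  0 <= index < size.
    : valid-within? ( index size -- flag )  0 swap within ;
    : valid-u<?     ( index size -- flag )  u< ;

    \ Compare the two tests for one index/size pair.
    : same-answer? ( index size -- flag )
       2dup valid-within? >r  valid-u<? r> = ;
    ```

    A negative index reads as a very large unsigned number, so the single
    unsigned comparison covers both bounds at once.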

    - anton
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 18:40:53 2024
    From Newsgroup: comp.lang.forth

    On 14/09/2024 4:19 pm, Anton Ertl wrote:
    dxf <dxforth@gmail.com> writes:
    On 14/09/2024 4:07 am, Anton Ertl wrote:
    Where can I find claims about better performance? All I have read is
    claims about worse performance.

    'Eliminate stack juggling' sounds like an argument for better performance.

    Not to me. To me it sounds like a statement about the ease of writing
    and reading the code.

    The performance of locals vs. stack juggling depends on the
    implementation.
    ...

    Surely you mean locals vs. forth. The easiest way to achieve performance
    in forth is making your stack operations efficient. 'Stack juggling' is
    a visual cue that it's not. I'm sorry that you feel forth isn't readable.

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sat Sep 14 01:56:20 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    You have the source to my app. Perhaps you can nominate where locals
    could have been used to better effect.

    : EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;

    could be written:

    : EMITS {: n char -- :} n 0 ?do char emit loop ;
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Sep 14 21:56:41 2024
    From Newsgroup: comp.lang.forth

    On 14/09/2024 6:56 pm, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    You have the source to my app. Perhaps you can nominate where locals
    could have been used to better effect.

    : EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;

    could be written:

    : EMITS {: n char -- :} n 0 ?do char emit loop ;

    Compiling under DX-Forth resulted in a code size of 23 and 26 bytes respectively. Under VFX ...

    ( 71 bytes, 18 instructions )

    ( 102 bytes, 28 instructions )

    Not only were you able to read forth code, the result was more efficient.
    Perhaps locals in forth were meant to be clever? That would explain the
    interest; however, it's a high price to pay.

  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Sep 14 12:32:07 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    On 12/09/2024 4:51 pm, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    https://pastebin.com/2xcRSbQW

    SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
    problem in forth? It doesn't appear to be.

    : ARG ( n -- adr len -1 | 0 )
    >r 0 0 cmdtail r> 0 ?do
    2nip
    bl skip 2dup bl scan
    rot over - -rot
    loop 2drop
    dup if -1 end and ;

    The heavy use of global variables in this program also does not
    support the idea that proper usage of the stacks makes locals
    unnecessary.

    - anton
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 14:52:59 2024
    From Newsgroup: comp.lang.forth

    Hi,
    In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
    as:

    mf(x;a,b,c) = (x-a)/(b-a)   for a <= x < b,
                  (c-x)/(c-b)   for b <= x < c,
                  0e            elsewhere.

    defining it with locals:

    : tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
    x a f>= x b f< and if x a f- b a f- f/ exit then
    x b f>= x c f< and if c x f- c b f- f/ exit then
    0e
    ;

    But defining it without locals ????!!!!!

    : tri_mf() ( f: x a b c -- mv) ....

    How?

    Ahmed
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Sep 14 15:08:36 2024
    From Newsgroup: comp.lang.forth

    melahi_ahmed@yahoo.fr (Ahmed) writes:
    Hi,
    In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
    as:

    mf(x;a,b,c) = (x-a)/(b-a)   for a <= x < b,
                  (c-x)/(c-b)   for b <= x < c,
                  0e            elsewhere.

    defining it with locals:

    : tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
    x a f>= x b f< and if x a f- b a f- f/ exit then
    x b f>= x c f< and if c x f- c b f- f/ exit then
    0e
    ;

    But defining it without locals ????!!!!!

    : tri_mf() ( f: x a b c -- mv) ....

    How?

    I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
    that tends to get passed around without changing it. In that case
    defining it as a structure in memory and accessing its members there
    might be a solution.

    But OTOH, unless you see programming in Forth as a religious exercise,
    why worry, as long as your solution works.

    - anton
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sat Sep 14 09:10:58 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    Compiling under DX-Forth resulted in a code size of 23 and 26 bytes respectively. Under VFX ...

    I can't help it if those compilers generate worse code for the locals
    version. Can you conveniently try lxf?

    Not only were you able to read forth code, the result was more
    efficient.

    Sometimes it isn't too hard to read, sometimes it takes head scratching,
    and sometimes I can't make any sense of it. The function Anton posted
    was an example that didn't make sense. I remember thinking I might sit
    down and try to figure it out to rewrite it, but it doesn't seem worth
    the effort.

    Anyway, if efficiency was important for that example, I'd use CODE.
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 17:13:51 2024
    From Newsgroup: comp.lang.forth

    On Sat, 14 Sep 2024 15:08:36 +0000, Anton Ertl wrote:

    I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
    that tends to get passed around without changing it. In that case
    defining it as a structure in memory and accessing its members there
    might be a solution.

    a, b and c are the parameters of the membership function.
    Yes, we can use structures, arrays ...



    But OTOH, unless you see programming in Forth as a religious exercise,
    why worry, as long as your solution works.

    I did it without locals as an exercise. Here it is:


    Without locals:

    : tri_mf: ( f: a b c )
        create frot f, fswap f, f,
        does>             ( ad_a)           ( f: x)
          dup fdup        ( ad_a ad_a)      ( f: x x)
          f@              ( ad_a)           ( f: x x a)
          f>=             ( ad_a -1|0)      ( f: x)
          over float+     ( ad_a -1|0 ad_b) ( f: x)
          fdup f@         ( ad_a -1|0)      ( f: x x b)
          f< and if       ( ad_a)           ( f: x)
            dup f@ f-     ( ad_a)           ( f: x-a)
            dup f@        ( ad_a)           ( f: x-a a)
            float+        ( ad_b)           ( f: x-a a)
            f@ fswap f-                     ( f: x-a b-a)
            f/                              ( f: [x-a]/[b-a])
            exit
          then
          float+          ( ad_b)           ( f: x)
          dup fdup        ( ad_b ad_b)      ( f: x x)
          f@              ( ad_b)           ( f: x x b)
          f>=             ( ad_b -1|0)      ( f: x)
          over float+     ( ad_b -1|0 ad_c) ( f: x)
          fdup f@         ( ad_b -1|0)      ( f: x x c)
          f< and if       ( ad_b)           ( f: x)
            dup float+ f@ ( ad_b)           ( f: x c)
            f-            ( ad_b)           ( f: x-c)
            dup float+    ( ad_b ad_c)      ( f: x-c)
            swap f@ f@ f-                   ( f: x-c b-c)
            f/                              ( f: [x-c]/[b-c])
            exit
          then
          drop fdrop
          0e
    ;


    -1e309 -1e 0e tri_mf: neg_big
    -1e 0e 1e tri_mf: zero
    0e 1e 1e309 tri_mf: pos_big

    : fuzzify ( f: x)
    fdup neg_big cr f.
    fdup zero cr f.
    pos_big cr f.
    ;

    Examples: for x in {-10e, -1e, -0.8e, -0.5e, -0.3e, 0e, 0.2e, 0.5e,
    0.7e, 1e, 20e}
    -10e fuzzify and so on.

    \ ---------------

    With locals:
    : tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
    x a f>= x b f< and if x a f- b a f- f/ exit then
    x b f>= x c f< and if c x f- c b f- f/ exit then
    0e
    ;

    : neg_big -1e309 -1e 0e tri_mf() ;
    : zero -1e 0e 1e tri_mf() ;
    : pos_big 0e 1e 1e309 tri_mf() ;


    : fuzzify { f: x }
    x neg_big cr f.
    x zero cr f.
    x pos_big cr f.
    ;


    Examples: for x in {-10e, -1e, -0.8e, -0.5e, -0.3e, 0e, 0.2e, 0.5e,
    0.7e, 1e, 20e}
    -10e fuzzify and so on.

    I notice a great difference in readability and simplicity when using
    locals.

    Using gforth under WSL (Windows Subsystem for Linux):

    utime 0.1e neg_big utime d- dnegate d.
    with locals: about 19 ms
    without locals: about 18 ms

    Ahmed
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 17:43:52 2024
    From Newsgroup: comp.lang.forth

    Oops.
    Please read microseconds (us) instead of milliseconds (ms).

    Without locals: about 18 us
    with locals: about 19 us

    Ahmed
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 17:41:23 2024
    From Newsgroup: comp.lang.forth

    On Sat, 14 Sep 2024 17:13:51 +0000, Ahmed wrote:

    utime 0.1e neg_big utime d- dnegate d.
    with locals: about 19 ms
    without locals: about 18 ms

    Ahmed
    Oops.

    Please read microseconds (us) instead of milliseconds (ms).

    with locals: about 19 us
    without locals: about 18 us

    Ahmed
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sat Sep 14 18:54:46 2024
    From Newsgroup: comp.lang.forth

    On Sat, 14 Sep 2024 17:41:23 +0000, Ahmed wrote:

    On Sat, 14 Sep 2024 17:13:51 +0000, Ahmed wrote:

    utime 0.1e neg_big utime d- dnegate d.
    with locals: about 19 ms
    without locals: about 18 ms

    Ahmed
    Oops.

    Please read microseconds (us) instead of milliseconds (ms).

    with locals: about 19 us
    without locals: about 18 us

    That can't be correct.

    In iForth I used dfloats instead of floats
    ( 4.9ns instead of 7.3ns).
    Using structs is not a great idea in this case.

    anew -testlocals

    : tri_mf: ( f: a b c )
        create frot df, fswap df, df,
        does> ( F: x -- y )
                              ( ad_a)           ( f: x)
          dup fdup            ( ad_a ad_a)      ( f: x x)
          df@                 ( ad_a)           ( f: x x a)
          f>=                 ( ad_a -1|0)      ( f: x)
          over dfloat+        ( ad_a -1|0 ad_b) ( f: x)
          fdup df@            ( ad_a -1|0)      ( f: x x b)
          f< and if           ( ad_a)           ( f: x)
            dup df@ f-        ( ad_a)           ( f: x-a)
            dup df@           ( ad_a)           ( f: x-a a)
            dfloat+           ( ad_b)           ( f: x-a a)
            f@ fswap f-                         ( f: x-a b-a)
            f/                                  ( f: [x-a]/[b-a])
            exit
          then
          dfloat+             ( ad_b)           ( f: x)
          dup fdup            ( ad_b ad_b)      ( f: x x)
          df@                 ( ad_b)           ( f: x x b)
          f>=                 ( ad_b -1|0)      ( f: x)
          over dfloat+        ( ad_b -1|0 ad_c) ( f: x)
          fdup df@            ( ad_b -1|0)      ( f: x x c)
          f< and if           ( ad_b)           ( f: x)
            dup dfloat+ df@   ( ad_b)           ( f: x c)
            f-                ( ad_b)           ( f: x-c)
            dup dfloat+       ( ad_b ad_c)      ( f: x-c)
            swap df@ df@ f-                     ( f: x-c b-c)
            f/                                  ( f: [x-c]/[b-c])
            exit
          then
          drop fdrop
          0e
    ;

    -1e309 -1e 0e tri_mf: nol_neg_big

    : (tri_mf) ( f: x a b c -- mv)
    FLOCALS| c b a x |
    x a f>= x b f< and if x a f- b a f- f/ exit then
    x b f>= x c f< and if c x f- c b f- f/ exit then
    0e ;

    : loc_neg_big -1e309 -1e 0e (tri_mf) ;
    : .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

    : tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
    0 DO -10e nol_neg_big FDROP LOOP .timing
    CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
    0 DO -10e loc_neg_big FDROP LOOP .timing ;

    FORTH> tnb
    \ no locals: 4.9ns/call.
    \ locals: 21.3ns/call. ok

    -marcel
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sat Sep 14 19:19:25 2024
    From Newsgroup: comp.lang.forth

    On Sat, 14 Sep 2024 18:54:46 +0000, mhx wrote:
    That can't be correct.
    You are right.
    I find with gforth:

    : go 0 do -0.1e neg_big fdrop loop ;

    without locals:
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.06762074 us ok for 1e8
    times: (67.62 ns)

    and with locals:
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09961387 us ok for
    1e8 times: (99.61 ns)

    I misused the timing in the previous post.
    Thanks for the correction.
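
    For anyone repeating the measurement: a per-call figure comes from
    timing a long loop and dividing by the iteration count, not from
    timing a single call. A rough sketch for Gforth, assuming UTIME
    returns a double-cell microsecond count; NS/CALL and PROBE are made-up
    names, and PROBE is only a stand-in for something like
    "-0.1e neg_big fdrop":

    ```forth
    \ Hypothetical helper, Gforth-specific because of UTIME.
    \ xt must leave both stacks as it found them; u must be > 0.
    : ns/call ( xt u -- ) ( F: -- ns )
       dup >r                        \ save the iteration count
       utime 2>r                     \ save the start time
       0 ?do dup execute loop drop   \ run xt u times
       utime 2r> d-                  \ elapsed microseconds (double)
       d>f 1e3 f*                    \ convert to nanoseconds
       r> s>f f/ ;                   \ divide by the iteration count

    : probe ( -- )  1e 2e f+ fdrop ; \ stand-in workload

    ' probe 1000000 ns/call f.
    ```

    The figure includes the loop and EXECUTE overhead, so it is better
    suited to comparing two variants than to reading as an absolute cost.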


    FORTH> tnb
    \ no locals: 4.9ns/call.
    \ locals: 21.3ns/call. ok

    -marcel

    Ahmed
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 15 15:17:20 2024
    From Newsgroup: comp.lang.forth

    On 15/09/2024 2:10 am, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    Compiling under DX-Forth resulted in a code size of 23 and 26 bytes
    respectively. Under VFX ...

    I can't help it if those compilers generate worse code for the locals version. Can you conveniently try lxf?

    Windows NT/Forth (32 bit):

    ( 67 bytes, 19 instructions )
    ( 87 bytes, 24 instructions )

    Not only were you able to read forth code, the result was more
    efficient.

    Sometimes it isn't too hard to read, sometimes it takes head scratching,
    and sometimes I can't make any sense of it. The function Anton posted
    was an example that didn't make sense. I remember thinking I might sit
    down and try to figure it out to rewrite it, but it doesn't seem worth
    the effort.

    It would be no different were locals used. It would still require one to
    sit down and figure out what the code did. The more experienced one is in
    the language the easier it is.

    Going back to the EMITS example:

    - despite lack of comments you quickly deduced what it did
    - stack operations were few and simple and still you didn't like it
    - your ideal is that every stack operation should go, which is what
    you did

    If one takes from forth that which makes it efficient, then one takes away
    its reason for existence. Unfortunately for forth, this is what locals
    users are doing, whether they're aware of it or not.

    Anyway, if efficiency was important for that example, I'd use CODE.

    In other words forth is not important to you. I understand. You've stated Python is your language of preference. Forth is mine and I'll program it
    the best way I know how.

  • From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 15 15:28:24 2024
    From Newsgroup: comp.lang.forth

    On 14/09/2024 10:32 pm, Anton Ertl wrote:
    dxf <dxforth@gmail.com> writes:
    On 12/09/2024 4:51 pm, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    https://pastebin.com/2xcRSbQW

    SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
    problem in forth? It doesn't appear to be.

    : ARG ( n -- adr len -1 | 0 )
    >r 0 0 cmdtail r> 0 ?do
    2nip
    bl skip 2dup bl scan
    rot over - -rot
    loop 2drop
    dup if -1 end and ;

    I believe it's well written and efficient.

    : 2nip 2swap 2drop ;
    : end postpone exit postpone then ; immediate
    defer cmdtail ( -- adr len)

    : ARG ( n -- adr len -1 | 0 )
    >r 0 0 cmdtail r> 0 ?do
    2nip
    bl skip 2dup bl scan
    rot over - -rot
    loop 2drop
    dup if -1 end and ;

    VFX:

    ( 180 bytes, 44 instructions )

    The heavy use of global variables in this program also does not
    support the idea that proper usage of the stacks makes locals
    unnecessary.

    I see many small colon definitions and very few variables - global or
    local:

    integer #TERMS \ number of terminals in DTA file
    integer TERM \ working terminal#
    variable #DIGIT
    variable LEN
    integer MAXCHR

    The first two are necessarily global and would exist regardless.
    The remaining three are used by a group of functions with the view of
    keeping them simple. The alternative would be to carry them around as parameters shuffling them from one function to another. That seems
    worse to me.
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Sun Sep 15 06:17:18 2024
    From Newsgroup: comp.lang.forth

    On Sat, 14 Sep 2024 19:19:25 +0000, Ahmed wrote:
    You are right.
    I find with gforth:

    : go 0 do -0.1e neg_big fdrop loop ;

    without locals:
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.06762074 us ok for 1e8 times: (67.62 ns)

    and with locals:
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09961387 us ok for
    1e8 times: (99.61 ns)

    I misused the timing in the previous post.
    Thanks for the correction.

    So with gforth it's about 30 nanosecs runtime disadvantage.
    IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

    While the locals version was easy to code, pretty straightforward and
    probably bug-free out of the box, how long did it take to code and debug
    the stack juggling version?

    Say 10 minutes longer. Break-even point would be around 2*10^10 runs,
    and the dubious assumption that CPU time is as valuable as human time.
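
    The arithmetic can be replayed at the Forth prompt. A small sketch,
    where BREAK-EVEN is a hypothetical helper and the 30 ns and 10 minute
    figures are the rough estimates used above:

    ```forth
    \ Hypothetical helper: number of calls after which a per-call cost
    \ (in ns) adds up to a given amount of time (in seconds).
    : break-even ( F: seconds ns-per-call -- calls )  1e-9 f* f/ ;

    600e 30e break-even f.   \ 10 minutes against 30 ns per call
    1e   30e break-even f.   \ 1 second against 30 ns per call
    ```

    The first line gives 2*10^10 calls; the second gives about 3.3*10^7,
    which matches the "3*10^7 runs for about 1 sec" estimate.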
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sun Sep 15 07:30:24 2024
    From Newsgroup: comp.lang.forth

    On Sun, 15 Sep 2024 6:17:18 +0000, minforth wrote:

    So with gforth it's about 30 nanosecs runtime disadvantage.
    IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

    I think you mean: if you run the code 3*10^8 times it adds up to 1 sec disadvantage.


    While the locals version was easy to code, pretty straightforward and probably bug-free out of the box, how long did it take to code and debug
    the stack juggling version?

    It took me several tries and corrections (and time).

    Perhaps, one can factor the code in the does> part.
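
    One possible factoring, as a sketch only. It is not the code posted
    earlier: it swaps the two range tests for the equivalent form
    max(0, min((x-a)/(b-a), (c-x)/(c-b))), which assumes finite a < b < c
    and a separate floating-point stack. RAMP, RISING, FALLING, TRI_MF2:
    and ZERO2 are hypothetical names:

    ```forth
    : ramp ( F: x p q -- r )              \ r = (x-p)/(q-p)
       fover f-                           \ F: x p q-p
       frot frot f-                       \ F: q-p x-p
       fswap f/ ;

    : rising  ( addr -- ) ( F: x -- r )   \ (x-a)/(b-a)
       dup f@  float+ f@  ramp ;

    : falling ( addr -- ) ( F: x -- r )   \ (x-c)/(b-c) = (c-x)/(c-b)
       dup 2 floats + f@  float+ f@  ramp ;

    : tri_mf2: ( F: a b c -- )
       create frot f, fswap f, f,         \ store a, b, c in that order
       does> ( addr -- ) ( F: x -- mv )
          fdup dup rising                 \ F: x r1
          fswap falling                   \ F: r1 r2
          fmin  0e fmax ;                 \ clamp at zero outside [a,c]

    -1e 0e 1e tri_mf2: zero2
    ```

    With this, "0e zero2" leaves 1e, "0.5e zero2" leaves 0.5e and
    "2e zero2" leaves 0e. The infinite end points used for neg_big and
    pos_big would need separate handling.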


    Ahmed
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sun Sep 15 07:35:14 2024
    From Newsgroup: comp.lang.forth

    On Sun, 15 Sep 2024 7:30:24 +0000, Ahmed wrote:

    On Sun, 15 Sep 2024 6:17:18 +0000, minforth wrote:

    So with gforth it's about 30 nanosecs runtime disadvantage.
    IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

    I think you mean: if you run the code 3*10^8 times it adds up to 1 sec disadvantage.



    Oops!
    You are right. 3*10^7 times running the code gives about 1 sec
    disadvantage.

    Ahmed
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 15 18:14:17 2024
    From Newsgroup: comp.lang.forth

    On 15/09/2024 3:13 am, Ahmed wrote:
    On Sat, 14 Sep 2024 15:08:36 +0000, Anton Ertl wrote:

    I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
    that tends to get passed around without changing it. In that case
    defining it as a structure in memory and accessing its members there
    might be a solution.

    a, b and c are the parameters of the membership function.
    Yes, we can use structures, arrays ...



    But OTOH, unless you see programming in Forth as a religious exercise,
    why worry, as long as your solution works.

    I did it without locals as an exercise. Here it is:


    Without locals:

    : tri_mf: ( f: a b c )
        create frot f, fswap f, f,
        does>             ( ad_a)           ( f: x)
          dup fdup        ( ad_a ad_a)      ( f: x x)
          f@              ( ad_a)           ( f: x x a)
          f>=             ( ad_a -1|0)      ( f: x)
          over float+     ( ad_a -1|0 ad_b) ( f: x)
          fdup f@         ( ad_a -1|0)      ( f: x x b)
          f< and if       ( ad_a)           ( f: x)
            dup f@ f-     ( ad_a)           ( f: x-a)
            dup f@        ( ad_a)           ( f: x-a a)
            float+        ( ad_b)           ( f: x-a a)
            f@ fswap f-                     ( f: x-a b-a)
            f/                              ( f: [x-a]/[b-a])
            exit
          then
          float+          ( ad_b)           ( f: x)
          dup fdup        ( ad_b ad_b)      ( f: x x)
          f@              ( ad_b)           ( f: x x b)
          f>=             ( ad_b -1|0)      ( f: x)
          over float+     ( ad_b -1|0 ad_c) ( f: x)
          fdup f@         ( ad_b -1|0)      ( f: x x c)
          f< and if       ( ad_b)           ( f: x)
            dup float+ f@ ( ad_b)           ( f: x c)
            f-            ( ad_b)           ( f: x-c)
            dup float+    ( ad_b ad_c)      ( f: x-c)
            swap f@ f@ f-                   ( f: x-c b-c)
            f/                              ( f: [x-c]/[b-c])
            exit
          then
          drop fdrop
          0e
    ;

    That appears no better than FVALUEs ...

    0e fvalue a
    0e fvalue b
    0e fvalue c
    0e fvalue x

    : tri_mf() ( f: x a b c -- mv)
    to c to b to a to x
    x a f>=
    x b f< and if
    x a f- b a f- f/ exit
    then
    x b f>=
    x c f< and if
    c x f- c b f- f/ exit
    then
    0e
    ;


  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Sun Sep 15 08:58:20 2024
    From Newsgroup: comp.lang.forth

    On Sun, 15 Sep 2024 8:14:17 +0000, dxf wrote:

    That appears no better than FVALUEs ...

    0e fvalue a
    0e fvalue b
    0e fvalue c
    0e fvalue x

    : tri_mf() ( f: x a b c -- mv)
    to c to b to a to x
    x a f>=
    x b f< and if
    x a f- b a f- f/ exit
    then
    x b f>=
    x c f< and if
    c x f- c b f- f/ exit
    then
    0e
    ;


    I knew about this solution and also the use of fvariables,
    I wanted tri_mf() to be used in defining for example:
    neg_big, zero and pos_big like this:

    : neg_big -1e309 -1e 0e tri_mf() ;
    : zero -1e 0e 1e tri_mf() ;
    : pos_big 0e 1e 1e309 tri_mf() ;

    It is ok.

    Here the fvalues a, b and c are shared between these words without
    problem.

    Using the same test to estimate the speed (gforth under wsl) gives about
    88 ns/call.
    : go 0 do -0.1e neg_big fdrop loop ; ok

    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08933806 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08499321 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08958042 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09034804 ok

    And with fvariables, the timing gives about 86 ns/call

    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08831171 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08438598 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08442013 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08619858 ok



    ( with locals: 99 ns/call,
    without locals and no fvalues nor fvariables: 67 ns/call) (see
    previous posts)



    So naming storage (locals, values, variables, ...) simplifies working out
    the solution and avoids heavy stack juggling, at the cost of a modest
    loss in speed.

    Ahmed
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:14:53 2024
    From Newsgroup: comp.lang.forth

    In article <87cyl6396z.fsf@nightsong.com>,
    Paul Rubin <no.email@nospam.invalid> wrote:
    dxf <dxforth@gmail.com> writes:
    You have the source to my app. Perhaps you can nominate where locals
    could have been used to better effect.

    : EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;

    could be written:

    : EMITS {: n char -- :} n 0 ?do char emit loop ;

    I think TYPE should be the primitive and EMIT should
    just handle a 1-char string.

    : EMIT DSP@ 1 TYPE DROP ;

    Imagine that you have concurrent tasks and one will write
    in red, the other in blue. You could lock up the terminal
    with an undefined escape sequence.
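
    A minimal sketch of the same idea in standard Forth, without the
    system-specific DSP@: the character goes through a one-character
    buffer and is then handed to TYPE. The names MY-EMIT and EMIT-BUF are
    made up for illustration. Note that a single static buffer is shared
    by all tasks, so a multitasking system would need a per-task buffer;
    the DSP@ version above sidesteps that because each task has its own
    data stack.

    ```forth
    \ Sketch only: EMIT-BUF and MY-EMIT are hypothetical names.
    create emit-buf 1 chars allot      \ one-character scratch buffer

    : my-emit ( char -- )
      emit-buf c!                      \ store the character in the buffer
      emit-buf 1 type ;                \ pass it to TYPE as a 1-char string

    char A my-emit                     \ should display A
    ```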

    Groetjes Albert
    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:20:17 2024
    From Newsgroup: comp.lang.forth

    In article <e29088cacf765cd0da6519e333fa78f1@www.novabbs.com>,
    Ahmed <melahi_ahmed@yahoo.fr> wrote:
    Hi,
    In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
    as:

    mf(x;a,b,c) = (x-a)/(b-a) for a <= x < b,
    (c-x)/(c-b) for b <= x < c,
    0e elsewhere.

    defining it with locals:

    : tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
    x a f>= x b f< and if x a f- b a f- f/ exit then
    x b f>= x c f< and if c x f- c b f- f/ exit then
    0e
    ;

    But defining it without locals ????!!!!!

    : tri_mf() ( f: x a b c -- mv) ....

    How?

    Locals don't help here. Flocals maybe, but that
    is the whole point. You are halfway down the rabbit hole
    if you demand flocals, dlocals ...


    Ahmed
    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:42:26 2024
    From Newsgroup: comp.lang.forth

    In article <90389fea385c08c72f39d4fdef04d076@www.novabbs.com>,
    mhx <mhx@iae.nl> wrote:
    On Sat, 14 Sep 2024 17:41:23 +0000, Ahmed wrote:

    On Sat, 14 Sep 2024 17:13:51 +0000, Ahmed wrote:

    utime 0.1e neg_big utime d- dnegate d.
    with locals: about 19 ms
    without locals: about 18 ms

    Ahmed
    Oops.

    Please read micro seconds (us) instead of milli seconds (ms).

    with locals: about 19 us
    without locals: about 18 us

    That can't be correct.

    In iForth I used dfloats instead of floats
    ( 4.9ns instead of 7.3ns).
    Using structs is not a great idea in this case.

    anew -testlocals

    : tri_mf: ( f: a b c )
    create frot df, fswap df, df,
    does> ( F: x -- y )
    ( ad_a) ( f: x)
    dup fdup ( ad_a ad_a) ( f: x x)
    df@ ( ad_a) ( f: x x a)
    f>= ( ad_a -1|0) ( f: x)
    over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
    fdup df@ ( ad_a -1|0) ( f: x x b)
    f< and if ( ad_a) ( f: x)
    dup df@ f- ( ad_a) ( f: x-a)
    dup df@ ( ad_a) ( f: x-a a)
    dfloat+ ( ad_b) ( f: x-a a)
    f@ fswap f- ( f: x-a b-a)
    f/ ( f: [x-a]/[b-a])
    exit
    then
    dfloat+ ( ad_b) ( f: x)
    dup fdup ( ad_b ad_b) ( f: x x)
    df@ ( ad_b) ( f: x x b)
    f>= ( ad_b -1|0) ( f: x)
    over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
    fdup df@ ( ad_b -1|0) ( f: x x c)
    f< and if ( ad_b) ( f: x)
    dup dfloat+ df@ ( ad_b) ( f: x c)
    f- ( ad_b) ( f: x-c)
    dup dfloat+ ( ad_b ad_c) ( f: x-c)
    swap df@ df@ f- ( f: x-c b-c)
    f/ ( f: [x-c]/[b-c])
    exit
    then
    drop fdrop
    0e
    ;

    -1e309 -1e 0e tri_mf: nol_neg_big

    : (tri_mf) ( f: x a b c -- mv)
    FLOCALS| c b a x |
    x a f>= x b f< and if x a f- b a f- f/ exit then
    x b f>= x c f< and if c x f- c b f- f/ exit then
    0e ;

    : loc_neg_big -1e309 -1e 0e (tri_mf) ;
    : .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

    : tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
    0 DO -10e nol_neg_big FDROP LOOP .timing
    CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
    0 DO -10e loc_neg_big FDROP LOOP .timing ;

    This does not capture the meaning of the problem well.
    Anton Ertl is right that you have to bind a, b and c
    into something that is more than its parts.

    0E0 FDUP FDUP class triangle-function
    M: a F@ M; F,
    M: b F@ M; F,
    M: c F@ M; F,

    M: fx ( f1 -- f1 )
    FDUP a f>= FDUP b f< and if a f- b a f- f/ exit then
    FDUP b f>= FDUP c f< and if c FSWAP f- c b f- f/ exit then
    0e M;
    endclass

    5E0 3E0 1E0 triangle-function orang-utan

    orang-utan
    2E0 fx F.
    4E0 fx F.

    Note that I have not introduced anything special, only classes
    that you need anyway. These classes are a straightforward
    generalisation of the CREATE DOES> construct, minus the
    awkward syntax.
    Note that x is passed as it should, volatile in Forth fashion.
    Passing 4 parameters is c-style.

    NOTE:
    This is a presentation of ideas; nothing is tested.


    FORTH> tnb
    \ no locals: 4.9ns/call.
    \ locals: 21.3ns/call. ok

    -marcel
    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 15 11:53:09 2024
    From Newsgroup: comp.lang.forth

    In article <66e67077$1@news.ausics.net>, dxf <dxforth@gmail.com> wrote:
    On 14/09/2024 10:32 pm, Anton Ertl wrote:
    The heavy use of global variables in this program also does not
    support the idea that proper usage of the stacks makes locals
    unnecessary.

    I see many small colon definitions and very few variables - global or
    local:

    integer #TERMS \ number of terminals in DTA file
    integer TERM \ working terminal#
    variable #DIGIT
    variable LEN
    integer MAXCHR

    The first two are necessarily global and would exist regardless.
    The remaining three are used by a group of functions with the view of
    keeping them simple. The alternative would be to carry them around as
    parameters shuffling them from one function to another. That seems
    worse to me.

    One anecdote. I had a project that consisted of squashing bugs.
    Proud to say that I accurately predicted the timing of each bug
    separately and I was not 5 % off for the total.
    One bug I refused to get a timing estimate on.
    This program was written in C by lispers, and they didn't understand
    that some variables are group-local, i.e. in fact global.
    There was a variable ERROR; once it had been set, it was inspected the
    second time there was an error, and the program was supposed to give up.

    The lispers went recursively about it and kept defining new ERROR
    variables that were initialised to false. In case of an error,
    this program never stopped.

    Groetjes Albert
    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sun Sep 15 09:58:23 2024
    From Newsgroup: comp.lang.forth

    This unearthed a "bug": -1e309 does not fit in a dfloat,
    it prints as -Inf.

    anew -testlocals

    0e dfvalue a PRIVATE
    0e dfvalue b PRIVATE
    0e dfvalue c PRIVATE

    ( based on dxf's outline )
    : gv_tri_mf ( f: x a b c -- mv )
    to c to b to a
    fdup a f>= fdup b f< and if a f- b a f- f/ exit endif
    fdup b f>= fdup c f< and if c fswap f- c b f- f/ exit endif
    0e ;

    : gv_neg_big -1e308 ( ! ) -1e 0e gv_tri_mf ;

    : tri_mf: ( f: a b c )
    create frot df, fswap df, df,
    does> ( F: x -- y )
    ( ad_a) ( f: x)
    dup fdup ( ad_a ad_a) ( f: x x)
    df@ ( ad_a) ( f: x x a)
    f>= ( ad_a -1|0) ( f: x)
    over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
    fdup df@ ( ad_a -1|0) ( f: x x b)
    f< and if ( ad_a) ( f: x)
    dup df@ f- ( ad_a) ( f: x-a)
    dup df@ ( ad_a) ( f: x-a a)
    dfloat+ ( ad_b) ( f: x-a a)
    f@ fswap f- ( f: x-a b-a)
    f/ ( f: [x-a]/[b-a])
    exit
    then
    dfloat+ ( ad_b) ( f: x)
    dup fdup ( ad_b ad_b) ( f: x x)
    df@ ( ad_b) ( f: x x b)
    f>= ( ad_b -1|0) ( f: x)
    over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
    fdup df@ ( ad_b -1|0) ( f: x x c)
    f< and if ( ad_b) ( f: x)
    dup dfloat+ df@ ( ad_b) ( f: x c)
    f- ( ad_b) ( f: x-c)
    dup dfloat+ ( ad_b ad_c) ( f: x-c)
    swap df@ df@ f- ( f: x-c b-c)
    f/ ( f: [x-c]/[b-c])
    exit
    then
    drop fdrop
    0e
    ;

    -1e309 -1e 0e tri_mf: nol_neg_big

    : (tri_mf) ( f: x a b c -- mv)
    FLOCALS| c b a x |
    x a f>= x b f< and if x a f- b a f- f/ exit then
    x b f>= x c f< and if c x f- c b f- f/ exit then
    0e ;

    : loc_neg_big -1e309 -1e 0e (tri_mf) ;

    : .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

    : tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
    0 DO -10e nol_neg_big FDROP LOOP .timing
    CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
    0 DO -10e loc_neg_big FDROP LOOP .timing
    CR ." \ globals: " TIMER-RESET #10000000 ( 1e7 times )
    0 DO -10e gv_neg_big FDROP LOOP .timing ;

    FORTH> tnb
    \ no locals: 4.9ns/call.
    \ locals: 21.4ns/call.
    \ globals: 6.2ns/call. ok

    Surprisingly, there is hardly a difference between no locals and
    global variables. The stack juggling in tri_mf: is merely an
    intellectual exercise (in this case).

    -marcel
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (ahmed) to comp.lang.forth on Sun Sep 15 12:06:52 2024
    From Newsgroup: comp.lang.forth

    On Sun, 15 Sep 2024 9:58:23 +0000, mhx wrote:

    This unearthed a "bug": -1e309 does not fit in a dfloat,
    it prints as -Inf.

    In practice, the universe of discourse of x is bounded: [xmin, xmax].
    I use the normalized universe of discourse [-1, 1].
    So to get neg_big I just use a big value (absolute value) for the
    parameter a (for example: -1e6)

    -1e6 -1e 0e tri_mf: neg_big
    -1e 0e 1e6 tri_mf: pos_big

    and this gives, for x between -2e and 2e for example:
    neg_big(x) equals approximately 1 for all x less than -1.
    pos_big(x) equals approximately 1 for all x greater than 1.

    So I don't use 1e309 or -1e309.
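
    As a worked check of that approximation, assuming the (x-a)/(b-a)
    branch of the membership function quoted earlier in the thread, take
    x = -1.5 with a = -1e6 and b = -1. The arithmetic can be typed
    directly at the Forth prompt; this is only an illustration, not part
    of the code being discussed.

    ```forth
    \ Evaluate (x-a)/(b-a) for x = -1.5, a = -1e6, b = -1.
    -1.5e -1e6 f-        \ x-a = 999998.5
    -1e   -1e6 f-        \ b-a = 999999
    f/ f.                \ displays a value just below 1, roughly 0.9999995
    ```

    The result differs from 1 by about 5e-7, which is why a large finite
    value for a is enough in practice.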

    Ahmed
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Sun Sep 15 15:04:11 2024
    From Newsgroup: comp.lang.forth

    On 14 Sep 2024 at 08:19:52 CEST, "Anton Ertl" <Anton Ertl> wrote:

    locals stack
    401 336 gforth-fast (AMD64)
    179 132 lxf 1.6-982-823 (IA-32)
    182 119 VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
    241 159 VFX Forth 64 5.43 (AMD64)
    163 175 iforth-5.1 mini (AMD64)

    There are design decisions within locals that can impact optimisation.
    The design of locals in VFX was influenced by Don Colburn's Forths
    and by a desire to use locals to simplify source code when interfacing
    to a host operating system. Many operating systems return data
    to the caller by passing the address of a variable/buffer as an input
    parameter. Locals that can have an accessible address make such
    code much easier to read and write. The example below comes from
    early system access code in VFX (see kernel/386Lin/syspatch.fth).
    The locals design dates from long before ANS.

    $541B equ FIONREAD

    : (OS_key?) { | nread[ cell ] -- flag }
    ?PrepTerm nread[ off
    nread[ FIONREAD stdin @ dll_ioctl @ 3 nxcall -1 = if
    0 \ Error return from ioctl
    else
    nread[ @ 0<>
    then
    ;

    : (OS_Key) \ -- key ; SFP003
    { | iobuff[ cell ] -- char }
    ?PrepTerm
    1 iobuff[ stdin @ dll_ReadFile @ 3 nxcall drop
    iobuff[ c@
    ;

    Code such as this has been around for a very long time and the use
    of addresses of locals, and of local buffers, has proven itself over
    time. Yes, we could put in a great effort to improve the performance
    of locals, but this is Forth and there are other optimisations that may
    produce bigger changes to application performance. In the last
    decade or so there has been very little customer demand for
    faster code. However, higher level source code has been much
    in demand. An example is Nick Nelson's value flavoured structures,
    which are of particular merit when converting code from 32 bit to
    64 bit host Forths.

    Just because many of the Forth applications visible to the Forth
    community now run on CPUs with 16 or 32 address registers
    does not mean that all systems can implement the compiler
    techniques required for high-performance locals.

    I can buy a lot of CPU cycles for the cost of one day of programmer
    time. I am reminded when looking at locals that a client's Forth
    engine is currently at 4GHz on a 12nm process. The performance
    was detuned to 4GHz because the machine was more than fast
    enough.

    Stephen
    --
    Stephen Pelc, stephen@vfxforth.com
    MicroProcessor Engineering, Ltd. - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)78 0390 3612, +34 649 662 974
    http://www.mpeforth.com
    MPE website
    http://www.vfxforth.com/downloads/VfxCommunity/
    downloads
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 09:52:24 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    Going back to the EMITS example:

    - despite lack of comments you quickly deduced what it did
    - stack operations were few and simple and still you didn't like it
    - your ideal is that every stack operation should go, which is what
    you did

    It was the first word in the program that used any stack operations at
    all. I saw that it was more concise and imho more readable without
    them. Other words there were much harder to read.

    If one takes from forth that which makes it efficient, then one takes
    away its reason for existence. Unfortunately for forth, this is what
    locals users are doing, whether they're aware of it or not.

    I'm not persuaded that the stack ops make Forth efficient. Certainly
    not as much as advanced compilers do, and yet one of the big attractions
    of Forth has been very simple interpreters.

    On my x86-64 laptop, gcc -c -S -Os on

    void emit(char);
    void emits(char c, int n) {
    while (n-- > 0) emit(c);
    }

    gives me 27 bytes, 15 instructions, beating all of the Forth examples.
    Several of the 14 instructions seem related to passing parameters in
    registers. Passing on the stack like in old fashioned systems would
    save a few more, at the expense of some speed. So if I want efficiency,
    I should use C.

    Anyway, if efficiency was important for that example, I'd use CODE.
    In other words forth is not important to you.

    I would say efficiency is usually not very important to me, whether in
    forth or any other language. It's the usual story of programs having
    hot spots. Aim for efficiency in the hot spots and readability and ease
    of implementation everywhere else.

    Also, you define "forth" as using stack ops instead of locals. I don't
    define it that way. Forth with locals is still Forth. They are in the
    standard after all.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 09:56:42 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    That appears no better than FVALUEs ...

    Those are essentially global variables, with all of their issues.
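
    One such issue can be sketched as follows, assuming a system that
    provides FVALUE: when two words park an argument in the same global,
    a nested call silently overwrites the caller's copy. The words SCALED
    and TWICE-SCALED and the fvalue SCALE are hypothetical and are not
    taken from the code under discussion.

    ```forth
    0e fvalue scale                     \ hypothetical shared global

    : scaled ( f: x s -- x*s )
      to scale  scale f* ;              \ parks s in the global

    : twice-scaled ( f: x s -- y )
      to scale                          \ parks s in the same global
      2e 3e scaled fdrop                \ nested call overwrites SCALE with 3
      scale f* ;                        \ multiplies by 3, not by the caller's s

    1e 10e twice-scaled f.              \ displays 3. rather than 10.
    ```

    With locals each call would get its own copy of s, so the nested call
    could not disturb the outer one.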
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sun Sep 15 16:16:34 2024
    From Newsgroup: comp.lang.forth

    Stephen Pelc <stephen@vfxforth.com> writes:
    On 14 Sep 2024 at 08:19:52 CEST, "Anton Ertl" <Anton Ertl> wrote:

    locals stack
    401 336 gforth-fast (AMD64)
    179 132 lxf 1.6-982-823 (IA-32)
    182 119 VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
    241 159 VFX Forth 64 5.43 (AMD64)
    163 175 iforth-5.1 mini (AMD64)

    There are design decisions within locals that can impact optimisation.
    The design of locals in VFX was influenced by Don Colburn's Forths
    and by a desire to use locals to simplify source code when interfacing
    to a host operating system. Many operating systems return data
    to the caller by passing the address of a variable/buffer as an input
    parameter. Locals that can have an accessible address make such
    code much easier to read and write.

    Gforth has had variable-flavoured locals from the start, and
    implemented VFX's local-buffer syntax some time ago without problems,
    so Gforth's design decisions are obviously compatible with these
    requirements.

    Now Gforth's numbers above are the worst of all Forth systems, so why
    would Gforth be relevant? The native code for locals by iForth seems
    to be very much in the same spirit: A separate locals stack, and
    locals are accessed relative to the locals-stack pointer; and iForth
    has the best locals code size of all (but looking at the VFX code, my
    guess is that this happens to be in the present case mainly because
    iForth uses RSP for the data stack and some other stack for the return
    stack). Actually, even with your approach of keeping the locals on
    the return stack, and having a separate locals-frame pointer, I don't
    see why the locals code should be worse. But looking at the start of
    the VFX64 code for VICHECK1, there is a bit of superfluous work:

    : VICHECK1 {: pindex paddr -- pindex' paddr :} \ Checks for valid index
    \ paddr is the address of the data, the first cell of which contains
    \ the array size
    pindex 0 paddr @ WITHIN IF \ Index is valid

    VICHECK1
    ( 0050A460 488BD4 ) MOV RDX, RSP
    ( 0050A463 48FF7500 ) PUSH QWORD [RBP]
    ( 0050A467 53 ) PUSH RBX
    ( 0050A468 52 ) PUSH RDX
    ( 0050A469 57 ) PUSH RDI
    ( 0050A46A 488BFC ) MOV RDI, RSP
    ( 0050A46D 4881EC00000000 ) SUB RSP, # 00000000
    ( 0050A474 488B5D08 ) MOV RBX, [RBP+08]
    ( 0050A478 488D6D10 ) LEA RBP, [RBP+10]
    ( 0050A47C 488B5710 ) MOV RDX, [RDI+10]
    ( 0050A480 488B12 ) MOV RDX, 0 [RDX]
    ( 0050A483 B900000000 ) MOV ECX, # 00000000
    ( 0050A488 482BD1 ) SUB RDX, RCX
    ( 0050A48B 488B4718 ) MOV RAX, [RDI+18]
    ( 0050A48F 482BC1 ) SUB RAX, RCX
    ( 0050A492 483BC2 ) CMP RAX, RDX
    ( 0050A495 0F8319000000 ) JNB/AE 0050A4B4

    It's not clear to me why you push so much on the return stack at the
    start, instead of just the two values pindex and paddr (which you do
    in 0050A463 and 0050A467). Ok, you also push old locals-frame pointer
    RDI in 0050A469, which is a result of having the locals on the return
    stack instead of in a separate stack, but why push the old return
    stack pointer? You know the size of your locals, just adjust RSP by
    that much in the end.

    The instruction at 0050A46D seems superfluous. My guess is that it's
    there for the possible | part in the locals definition.

    The next two instructions refill the TOS register RBX and adjust the
    data stack pointer RBP. That completes the code for the locals
    definition. From then on locals are loaded from memory, as
    in iforth. Let's also inspect the end:

    0 paddr \ Use zeroth index
    THEN ;

    ( 0050A535 488D6DF0 ) LEA RBP, [RBP+-10]
    ( 0050A539 48C7450000000000 ) MOV QWord [RBP], # 00000000
    ( 0050A541 48895D08 ) MOV [RBP+08], RBX
    ( 0050A545 488B5F10 ) MOV RBX, [RDI+10]
    ( 0050A549 488B6708 ) MOV RSP, [RDI+08]
    ( 0050A54D 488B3F ) MOV RDI, 0 [RDI]
    ( 0050A550 C3 ) RET/NEXT

    The THEN is right before 0050A549. The code before THEN pushes 0 and paddr
    on the data stack, and stores the former TOS in memory before loading
    the new TOS. The three instructions after the THEN restore the return
    stack and locals-frame pointer and return.

    So there is a little bit that can be done without much effort, but not
    much.

    I always thought that a separate locals stack is a thing I did in
    Gforth out of laziness, and pay for it by having to maintain a
    separate stack pointer, but it turns out that with locals on the
    return stack, you still need an extra register for locals in memory,
    and you spend additional overhead.

    In the last
    decade or so there has been very little customer demand for
    faster code.

    See below.

    However, higher level source code has been much
    in demand. An example is Nick Nelson's value flavoured structures,
    which are of particular merit when converting code from 32 bit to
    64 bit host Forths.

    Gforth has worked on 64-bit hosts since early 1996, and I found that
    Forth code tends to have fewer portability problems between 32-bit and
    64-bit platforms than C code, and that's not just my code, the
    applications in appbench and many others are also quite portable.

    A major merit for value-flavoured structures is that you can change
    the field size (e.g, from 1 byte to 2 bytes or vice versa) without
    changing all the code accessing those fields. That's independent of
    cell size.
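
    The effect can be approximated in plain standard Forth by hiding the
    field width behind a getter and a setter. This is only a sketch of the
    idea, not Nick Nelson's value-flavoured structures or any system's
    actual implementation, and all names below are hypothetical.

    ```forth
    create rec 2 cells allot            \ room for either field layout

    \ The field width is decided only in these two words.
    : rec-count@ ( rec -- n ) c@ ;      \ 1-byte field; use @ for a cell
    : rec-count! ( n rec -- ) c! ;      \ 1-byte field; use ! for a cell

    \ Client code never mentions the field width.
    : bump ( rec -- ) dup rec-count@ 1+ swap rec-count! ;

    0 rec rec-count!  rec bump  rec rec-count@ .   \ displays 1
    ```

    Widening the field then means editing the two accessor words, while
    BUMP and any other client code stay unchanged.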

    Just because many of the Forth applications visible to the Forth
    community now run on CPUs with 16 or 32 address registers
    does not mean that all systems can implement the compiler
    techniques required for high-performance locals.

    It's obvious that hardly any Forth system implements register
    allocation of locals, with the exception being lxf, which uses an
    architecture with 8 general-purpose registers (address registers
    recall bad memories from the 68000 days); and for lxf, register
    allocation is limited to basic blocks or less.

    I can buy a lot of CPU cycles for the cost of one day of programmer
    time.

    Some guy called Stephen Pelc (must be a different one) recently posted
    <vbkdu0$1v8lq$1@dont-email.me>:

    |We (MPE) converted much of our TCP/IP stack not to use locals. This
    |was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
    |the period (say 15 years ago) were similar. Code density improved by
    |about 25% and performance by about 50%.

    How much time did that conversion cost? And this Stephen Pelc
    suggested that Buzz McCool (and probably everyone else) should also
    spend their time on avoiding and eliminating locals from their code.

    I am with you here, not with the other Stephen Pelc: Programmers
    should use locals liberally if it saves them time, even in the face of
    slow locals implementations, because you can buy a lot of CPU cycles
    for the additional programming cost of avoiding locals.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 12:39:28 2024
    From Newsgroup: comp.lang.forth

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    So by keeping the values on the stack you not just eliminate their
    repeated mention, but also eliminate one branch of the IF.

    Is the repeated mention just a matter of DRY, assuming the compiler puts
    the locals in registers so that the extra mention doesn't transfer them
    between stacks a second time? I do prefer your version where you factor
    out VIERROR.

    I wonder whether Moore's 1999 aversion to locals had something to do
    with his hardware designs of that era, where having more registers
    (besides T and N) connected to the ALU would have cost silicon and
    created timing bottlenecks. Today's mainstream processors have GPR's
    anyway, but I wonder what the real problem was with stack caches like
    the CRISP: https://thechipletter.substack.com/p/at-and-ts-crisp-hobbits

    Commenters there say CRISP failed basically because its early
    implementation was buggy, it lost an important design win because of the
    bugs, and AT&T management then gave up on it.

    I remember the SPARC had "register windows" but I don't know if that's
    similar or what went wrong with them.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Sun Sep 15 21:35:00 2024
    From Newsgroup: comp.lang.forth

    On 15 Sep 2024 at 18:16:34 CEST, "Anton Ertl" <Anton Ertl> wrote:

    I can buy a lot of CPU cycles for the cost of one day of programmer
    time.

    Some guy called Stephen Pelc (must be a different one) recently posted
    <vbkdu0$1v8lq$1@dont-email.me>:

    |We (MPE) converted much of our TCP/IP stack not to use locals. This
    |was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
    |the period (say 15 years ago) were similar. Code density improved by
    |about 25% and performance by about 50%.

    How much time did that conversion cost? And this Stephen Pelc
    suggested that Buzz McCool (and probably everyone else) should also
    spend their time on avoiding and eliminating locals from their code.

    I am with you here, not with the other Stephen Pelc: Programmers
    should use locals liberally if it saves them time, even in the face of
    slow locals implementations, because you can buy a lot of CPU cycles
    for the additional programming cost of avoiding locals.

    What you ignore is that the constraints of embedded systems with small,
    slow CPUs (by comparison with desktop CPUs) are very different from
    those of desktop CPUs. Converting the TCP/IP stack was driven by the
    client requirement to fit a TCP/IP app into 128k/256k Flash and 16k RAM.

    I would not make that trade off today.

    So there's only one Stephen Pelc but two application domains.

    Stephen
    --
    Stephen Pelc, stephen@vfxforth.com
    MicroProcessor Engineering, Ltd. - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)78 0390 3612, +34 649 662 974
    http://www.mpeforth.com
    MPE website
    http://www.vfxforth.com/downloads/VfxCommunity/
    downloads
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 14:45:22 2024
    From Newsgroup: comp.lang.forth

    Stephen Pelc <stephen@vfxforth.com> writes:
    I would not make that trade off today.
    So there's only one Stephen Pelc but two application domains.

    I wonder how much effort de-localizing the TCP/IP stack took, compared
    to hypothetically updating the compiler to optimize locals more. If the
    TCP/IP stack code can compile with iForth or lxf, is there a way to
    compare the code size with VFX's? I can understand wanting to use VFX
    for actual delivery, of course.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 12:46:43 2024
    From Newsgroup: comp.lang.forth

    On 16/09/2024 2:52 am, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    Going back to the EMITS example:

    - despite lack of comments you quickly deduced what it did
    - stack operations were few and simple and still you didn't like it
    - your ideal is that every stack operation should go, which is what
    you did

    It was the first word in the program that used any stack operations at
    all. I saw that it was more concise and imho more readable without
    them. Other words there were much harder to read.

    If one takes from forth that which makes it efficient, then one takes
    away its reason for existence. Unfortunately for forth, this is what
    locals users are doing, whether they're aware of it or not.

    I'm not persuaded that the stack ops make Forth efficient.

    That's been the evidence thus far.

    Certainly
    not as much as advanced compilers do, and yet one of the big attractions
    of Forth has been very simple interpreters.

    On my x86-64 laptop, gcc -c -S -Os on

    void emit(char);
    void emits(char c, int n) {
    while (n-- > 0) emit(c);
    }

    gives me 27 bytes, 15 instructions, beating all of the Forth examples.
    Several of the 14 instructions seem related to passing parameters in
    registers. Passing on the stack like in old fashioned systems would
    save a few more, at the expense of some speed. So if I want efficiency,
    I should use C.

    Yes - if you want efficiency with locals use C since C is built upon a
    locals paradigm. Also modern CPUs are optimized for the likes of C.

    But just because C can beat forth on a benchmark is no reason to dismiss
    either Forth or efficient programming. The weak links are the programmer
    and the tools he's given. All I ever seem to hear about other languages
    is how they make life easy for the programmer. And this is what some are
    trying to bring to forth. To hell with what they offer I say. The
    universe gave me a brain. I intend to use it.


    Anyway, if efficiency was important for that example, I'd use CODE.
    In other words forth is not important to you.

    I would say efficiency is usually not very important to me, whether in
    forth or any other language. It's the usual story of programs having
    hot spots. Aim for efficiency in the hot spots and readability and ease
    of implementation everywhere else.

    Also, you define "forth" as using stack ops instead of locals. I don't
    define it that way. Forth with locals is still Forth. They are in the
    standard after all.

    I don't believe in religion - the priests, the holy books, the promises.
    I'll take what is and make the best of it.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 14:11:35 2024
    From Newsgroup: comp.lang.forth

    On 16/09/2024 2:56 am, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    That appears no better than FVALUEs ...

    Those are essentially global variables, with all of their issues.

    With apparently little issue for the case presented. The push is
    to write idiot-proof code that can be used anywhere. Moore calls
    that 'solving the general problem' - which he eschews.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 15 23:32:28 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    With apparently little issue for the case presented. The push is
    to write idiot-proof code that can be used anywhere. Moore calls
    that 'solving the general problem' - which he eschews.

    Didn't one of the Chuck Moore quotes you posted say using the stacks was
    better for information hiding than using globals? That includes the
    return or locals stack, of course. Your computer hardware has the
    capability of accessing inside the stack randomly, and Forth has words
    like 2ROT which reach up to 6 levels deep in the parameter stack.
    What's wrong with being able to give names to the cells? I don't
    understand the obsession with refusing to use those capabilities of your
    hardware.

    The central idea of Forth to me is its traditional implementation as a
    threaded interpreter with its extremely simple one-pass compiler. That
    made it possible to make a complete interactive development environment
    on a 1970s minicomputer with a floppy disc. All the language features
    like the stack oriented VM are just incidental affordances on the route
    to that simple interpreter. To the extent that there is a cult of the
    stack machine, I don't belong to it.

    Moore calls that 'solving the general problem' - which he eschews.

    The idea as I saw it was don't do extra work to solve the general
    problem, if a simpler approach solves the immediate problem at hand.

    If the general solution takes LESS work than the limited one, then doing
    the extra work for the limited solution is just masochism.
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Mon Sep 16 08:48:10 2024
    From Newsgroup: comp.lang.forth

    Twisting even simple problem solutions to fit the stack machine model
    just to make code execution easier in the stack machine falls into
    Knuth's famous "Premature Optimization is the Root of all Evil".

    There are many parallels with some Forth coding styles: https://www.geeksforgeeks.org/premature-optimization/
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Mon Sep 16 09:13:24 2024
    From Newsgroup: comp.lang.forth

    Hi,
    Here is another version (no locals (flocals), no fvalues, no
    fvariables).
    I tried to factor the code a little bit.
    It gives about 81 ns/call (gforth under wsl).

    : x_a_b ( f: x a b c -- x a b c x a b)
    3 fpick 3 fpick 3 fpick
    ;

    : x_b_c ( f: x a b c -- x a b c x b c)
    3 fpick 2 fpick 2 fpick
    ;


    : fwithin ( f: x r s --) ( -- -1|0)
    frot ftuck
    f>= f< and
    ;

    : mv ( f: x r s -- mv)
    fover f- ( f: x r s-r)
    frot frot f- ( f: s-r x-r)
    fswap f/
    ;

    : 4fdrop fdrop fdrop fdrop fdrop ;

    : tri_mf ( f: x a b c -- mv)
    x_a_b fwithin if fdrop mv exit then
    x_b_c fwithin if frot fdrop fswap mv exit then
    4fdrop 0e
    ;

    : neg_big -1e308 -1e 0e tri_mf ;
    : zero -1e 0e 1e tri_mf ;
    : pos_big 0e 1e 1e308 tri_mf ;

    : fuzzify ( f: x)
    fdup neg_big cr f.
    fdup zero cr f.
    pos_big cr f.
    ;


    : go 0 do -0.1e neg_big fdrop loop ;

    utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08081444 ok
    utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.0806888 ok
    utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08064737 ok
    utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08140588 ok
    utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08233884 ok
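
    The one-line timings above can be wrapped in a word that reports
    nanoseconds per call directly. This is only a sketch, assuming Gforth's
    UTIME ( -- dtime ), which returns microseconds as a double; the names
    WORK and NS/CALL are made up for illustration, and WORK is a placeholder
    for whatever is being measured (e.g. -0.1e neg_big fdrop).

    ```forth
    \ Assumption: Gforth, where UTIME ( -- dtime ) returns microseconds.
    : work ( -- )  1e 2e f+ fdrop ;        \ placeholder workload
    : go ( n -- )  0 ?do work loop ;

    : ns/call ( n -- ) ( f: -- ns )
      dup >r                 \ keep the iteration count for later
      utime 2>r              \ start time
      go
      utime 2r> d-           \ elapsed microseconds as a double
      d>f  r> s>f f/         \ microseconds per call
      1e3 f* ;               \ nanoseconds per call

    100000000 ns/call f.
    ```

    Subtracting the two double-cell times before converting to float avoids
    the two D>F conversions of the interactive version; whether that matters
    for accuracy at these magnitudes is doubtful, it just reads more easily.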


    Ahmed
  • From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 20:01:32 2024
    From Newsgroup: comp.lang.forth

    On 16/09/2024 4:32 pm, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    With apparently little issue for the case presented. The push is
    to write idiot-proof code that can be used anywhere. Moore calls
    that 'solving the general problem' - which he eschews.

    Didn't one of the Chuck Moore quotes you posted say using the stacks was better for information hiding than using globals?

    He didn't elaborate what he meant by 'information hiding'. OTOH he did
    say "It is necessary to have variables".

    That includes the
    return or locals stack, of course. Your computer hardware has the
    capability of accessing inside the stack randomly, and Forth has words
    like 2ROT which reach up to 6 levels deep in the parameter stack.
    What's wrong with being able to give names to the cells? I don't
    understand the obsession with refusing to use those capabilities of your hardware.

    2ROT assumes '3 pairs' of cells on the stack. But even then, how often is
    it used? I can't imagine juggling 6 items - though I can imagine a locals
    user doing it.

    The central idea of Forth to me is its traditional implementation as a threaded interpreter with its extremely simple one-pass compiler. That
    made it possible to make a complete interactive development environment
    on a 1970s minicomputer with a floppy disc. All the language features
    like the stack oriented VM are just incidental affordances on the route
    to that simple interpreter. To the extent that there is a cult of the
    stack machine, I don't belong to it.

    So you are free of all external influences?

    Moore calls that 'solving the general problem' - which he eschews.

    The idea as I saw it was don't do extra work to solve the general
    problem, if a simpler approach solves the immediate problem at hand.

    If the general solution takes LESS work than the limited one, then doing
    the extra work for the limited solution is just masochism.

    When is a general solution less work? There may be a supposition it
    will result in less work in the future but that's far from guaranteed.

  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 10:13:19 2024
    From Newsgroup: comp.lang.forth

    [..]
    FORTH> tnb
    \ no locals: 5ns/call.
    \ locals: 18.2ns/call.
    \ globals: 6ns/call.
    \ no locals2: 21.9ns/call. ok

    This appears not to be a good idea.
    The root cause is piling up too many
    items on the F-stack (exceeding the
    hardware FPU stack limits).
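
    How deep the F-stack may get is system-dependent, and a system can
    report it through the standard FLOATING-STACK environment query. A
    minimal sketch, assuming the system answers that query at all (it is
    allowed not to); the name .FSTACK-LIMIT is made up for illustration.

    ```forth
    : .fstack-limit ( -- )
      s" FLOATING-STACK" environment? if
        ." max F-stack depth: " u.
      else
        ." FLOATING-STACK query not answered"
      then cr ;

    .fstack-limit

    \ FDEPTH shows how many items a piece of code has piled up so far.
    1e 2e 3e fdepth .        \ prints 3 when the F-stack started out empty
    fdrop fdrop fdrop
    ```

    Note that the reported limit describes the Forth system's F-stack, which
    need not be the same thing as the eight x87 hardware registers.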

    -marcel
  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Mon Sep 16 10:36:38 2024
    From Newsgroup: comp.lang.forth

    Thanks for the information.
    So the best is clear.

    Ahmed
  • From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Mon Sep 16 12:19:25 2024
    From Newsgroup: comp.lang.forth

    On 15 Sep 2024 at 23:45:22 CEST, "Paul Rubin" <no.email@nospam.invalid> wrote:

    Stephen Pelc <stephen@vfxforth.com> writes:
    I would not make that trade off today.
    So there's only one Stephen Pelc but two application domains.

    I wonder how much effort de-localizing the TCP/IP stack took, compared
    to hypothetically updating the compiler to optimize locals more. If the TCP/IP stack code can compile with iForth or lxf, is there a way to
    compare the code size with VFX's? I can understand wanting to use VFX
    for actual delivery, of course.

    On modern desktop CPUs, I would probably spend the effort on
    optimising locals more. However, the ability to provide the address
    of a local is essential in our world. I have not inspected our code
    base to see how many uses of a local declaration of a buffer
    : bah {: ... | FOO[ cell ] ... -- :}
    there are compared to the use of the ADDR (address) operator
    applied to a normally defined local
    : bah {: ... | FOO ... -- :}
    ...
    addr FOO

    Local buffers are remarkably useful.
    --
    Stephen Pelc, stephen@vfxforth.com
    MicroProcessor Engineering, Ltd. - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)78 0390 3612, +34 649 662 974
    http://www.mpeforth.com
    MPE website
    http://www.vfxforth.com/downloads/VfxCommunity/
    downloads
  • From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 16 22:47:10 2024
    From Newsgroup: comp.lang.forth

    On 16/09/2024 8:13 pm, mhx wrote:
    [..]
    FORTH> tnb
    \ no locals:  5ns/call.
    \    locals:  18.2ns/call.
    \    globals: 6ns/call.
    \ no locals2: 21.9ns/call. ok

    This appears not to be a good idea.
    The root cause is piling up too many
    items on the F-stack (exceeding the
    hardware FPU stack limits).

    FVALUEs may be the way to go for hardware stack.
    Is this any better?

    : tri_mf ( f: x a b c -- mv)
    3 fpick ( x) 3 fpick ( x a) f>=
    3 fpick ( x) 2 fpick ( x b) f< and if
    fdrop \ x a b
    frot 2 fpick f- \ a b x-a
    fswap frot f- \ x-a b-a
    f/ exit
    then
    3 fpick ( x) 2 fpick ( x b) f>=
    3 fpick ( x) 1 fpick ( x c) f< and if
    frot fdrop \ x b c
    frot fover fswap f- \ b c c-x
    fswap frot f- \ c-x c-b
    f/ exit
    then
    fdrop fdrop fdrop fdrop 0e
    ;



  • From melahi_ahmed@melahi_ahmed@yahoo.fr (Ahmed) to comp.lang.forth on Mon Sep 16 13:21:19 2024
    From Newsgroup: comp.lang.forth

    On Mon, 16 Sep 2024 12:47:10 +0000, dxf wrote:

    On 16/09/2024 8:13 pm, mhx wrote:
    [..]
    FORTH> tnb
    \ no locals:  5ns/call.
    \    locals:  18.2ns/call.
    \    globals: 6ns/call.
    \ no locals2: 21.9ns/call. ok

    This appears not to be a good idea.
    The root cause is piling up too many
    items on the F-stack (exceeding the
    hardware FPU stack limits).

    FVALUEs may be the way to go for hardware stack.
    Is this any better?

    : tri_mf ( f: x a b c -- mv)
    3 fpick ( x) 3 fpick ( x a) f>=
    3 fpick ( x) 2 fpick ( x b) f< and if
    fdrop \ x a b
    frot 2 fpick f- \ a b x-a
    fswap frot f- \ x-a b-a
    f/ exit
    then
    3 fpick ( x) 2 fpick ( x b) f>=
    3 fpick ( x) 1 fpick ( x c) f< and if
    frot fdrop \ x b c
    frot fover fswap f- \ b c c-x
    fswap frot f- \ c-x c-b
    f/ exit
    then
    fdrop fdrop fdrop fdrop 0e
    ;


    Your solution gives the best speed compared to others. With gforth under
    wsl, I find 59ns/call


    Here is the code:
    \ here is your definition

    : tri_mf ( f: x a b c -- mv)
    3 fpick ( x) 3 fpick ( x a) f>=
    3 fpick ( x) 2 fpick ( x b) f< and if
    fdrop \ x a b
    frot 2 fpick f- \ a b x-a
    fswap frot f- \ x-a b-a
    f/ exit
    then
    3 fpick ( x) 2 fpick ( x b) f>=
    3 fpick ( x) 1 fpick ( x c) f< and if
    frot fdrop \ x b c
    frot fover fswap f- \ b c c-x
    fswap frot f- \ c-x c-b
    f/ exit
    then
    fdrop fdrop fdrop fdrop 0e
    ;

    \ and then the code
    : neg_big -1e308 -1e 0e tri_mf ;
    : zero -1e 0e 1e tri_mf ;
    : pos_big 0e 1e 1e308 tri_mf ;

    : fuzzify ( f: x)
    fdup neg_big cr f.
    fdup zero cr f.
    pos_big cr f.
    ;

    : go 0 do -0.1e neg_big fdrop loop ;

    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05871598 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05926772 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05896149 ok
    utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05899284 ok


    Ahmed
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 13:33:53 2024
    From Newsgroup: comp.lang.forth

    On Mon, 16 Sep 2024 12:47:10 +0000, dxf wrote:

    [..]

    FVALUEs may be the way to go for hardware stack.
    Is this any better?

    : tri_mf ( f: x a b c -- mv)
    3 fpick ( x) 3 fpick ( x a) f>=
    3 fpick ( x) 2 fpick ( x b) f< and if
    fdrop \ x a b
    frot 2 fpick f- \ a b x-a
    fswap frot f- \ x-a b-a
    f/ exit
    then
    3 fpick ( x) 2 fpick ( x b) f>=
    3 fpick ( x) 1 fpick ( x c) f< and if
    frot fdrop \ x b c
    frot fover fswap f- \ b c c-x
    fswap frot f- \ c-x c-b
    f/ exit
    then
    fdrop fdrop fdrop fdrop 0e
    ;

    No, it (no locals3) is worse. FPICK is a
    problem for iForth because in principle
    there can be many values on the FPU stack.
    The easy way out was to flush to memory
    (assuming real Forthers would balk at
    PICK and ROLL anyway).

    The title of this thread is quite
    appropriate: don't pile on the stack,
    don't try to grow it, sparingly re-arrange
    and then consume items with operators
    that do real work.

    FORTH> tnb
    \ no locals: 4.9ns/call.
    \ locals: 18.3ns/call.
    \ globals: 6ns/call.
    \ no locals2: 21.9ns/call.
    \ no locals3: 23.5ns/call. ok
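
    For comparison, here is roughly what a "globals" variant could look
    like. This is a sketch only and not the code that was timed above: the
    FVALUE names mf-x mf-a mf-b mf-c and the word name tri_mf_g are invented
    for illustration. It uses the same interval tests as the FPICK version
    (a <= x < b, then b <= x < c) and never holds more than two items on
    the F-stack once the arguments are stored.

    ```forth
    0e fvalue mf-x   0e fvalue mf-a   0e fvalue mf-b   0e fvalue mf-c

    : tri_mf_g ( f: x a b c -- mv )
      to mf-c  to mf-b  to mf-a  to mf-x
      mf-x mf-a f< 0=  mf-x mf-b f<  and if     \ a <= x < b
        mf-x mf-a f-  mf-b mf-a f-  f/  exit     \ (x-a)/(b-a)
      then
      mf-x mf-b f< 0=  mf-x mf-c f<  and if     \ b <= x < c
        mf-c mf-x f-  mf-c mf-b f-  f/  exit     \ (c-x)/(c-b)
      then
      0e ;

    0.5e 0e 1e 2e tri_mf_g f.    \ rising edge:  (0.5-0)/(1-0) = 0.5
    1.5e 0e 1e 2e tri_mf_g f.    \ falling edge: (2-1.5)/(2-1) = 0.5
    ```

    The price is the usual one for globals: the word is not reentrant.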

    -marcel
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Mon Sep 16 14:37:50 2024
    From Newsgroup: comp.lang.forth

    On Mon, 16 Sep 2024 12:19:25 +0000, Stephen Pelc wrote:
    Local buffers are remarkably useful.

    True. In addition, to pass the address of normal locals
    to other words or to external library functions
    (pass-by-reference instead of pass-by-value)
    I borrowed the address operator & from C, like in:

    : FUNC { f: a b -- badr f: aval }
    ... a \ push value of a to fp-stack
    ... &b \ push address of b to stack
    ... ;
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 16:26:51 2024
    From Newsgroup: comp.lang.forth

    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    So by keeping the values on the stack you not just eliminate their
    repeated mention, but also eliminate one branch of the IF.

    Is the repeated mention just a matter of DRY, assuming the compiler puts
    the locals in registers so that the extra mention doesn't transfer them between stacks a second time?

    That, too, but the elimination of the ELSE has more weight with me.

    In the VICHECK ( pindex paddr -- pindex' paddr ) case this favours the locals-less code. For a word that is similar in having an IF where
    only one side has to do something other than to make sure that the
    stack effect is satisfied, but with the stack effect ( x1 x2 -- ), the advantage is with locals code:

    : WORD1 {: x1 x2 -- :}
    ... ( f ) if ( )
    ... x1 ... x2 ...
    then ;

    : WORD2 ( x1 x2 -- )
    ... ( f ) if ( x1 x2 )
    ...
    else
    2drop
    then ;

    Forth has a special word ?DUP for one specific variant of this
    situation, but it helps only in specific cases.
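
    A concrete instance of the two patterns above, as a sketch: the words
    STORE-NZ, STORE-NZ2 and .NZ are made up for illustration and store x at
    dst only when dst is non-zero.

    ```forth
    \ Locals version: nothing to clean up when the test fails.
    : store-nz {: x dst -- :}
      dst if  x dst !  then ;

    \ Stack version: the ELSE branch exists only to discard x and dst.
    : store-nz2 ( x dst -- )
      dup if  !  else  2drop  then ;

    \ ?DUP covers the one-item case without an ELSE.
    : .nz ( n -- )  ?dup if . then ;

    variable v
    7 v store-nz   v @ .    \ v now holds 7
    8 v store-nz2  v @ .    \ v now holds 8
    9 0 store-nz2           \ nothing stored, stack left empty
    0 .nz  5 .nz            \ prints only the 5
    ```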

    I wonder whether Moore's 1999 aversion to locals had something to do
    with his hardware designs of that era, where having more registers
    (besides T and N) connected to the ALU would have cost silicon and
    created timing bottlenecks.

    I think he had the aversion long before he did such hardware designs.
    He has been quoted as thinking that humans should do all they can to
    make the computer's work easier (or something like that). While his
    sayings, like any religious text, are sufficiently fuzzy to be
    interpretable in many ways, his denouncing of locals over the years
    makes it clear that he thinks that humans should invest time to write
    code with stack manipulation words and globals, so that the compiler
    does not need to be bloated by the code for dealing with locals.

    Today's mainstream processors have GPR's
    anyway, but I wonder what the real problem was with stack caches like
    the CRISP: https://thechipletter.substack.com/p/at-and-ts-crisp-hobbits

    I don't think that the CRISP lived long enough for the real problems
    to become big: In contrast to GPRs or the stacks of Chuck Moore's
    chips, the stack accesses in CRISP alias with potentially all memory
    accesses, so every load of a C variable on a stack may potentially
    have to produce the result of a preceding store (and it often actually
    is the result of the previous instruction). In the last four decades,
    CPU designers have invented a number of techniques for predicting when
    loads don't alias earlier stores, and for fast store-to-load
    forwarding when they do, but these techniques are not cheap. Even
    today, a CPU can do maybe 3 loads and two stores, while they can deal
    with a dozen or so input operands in registers, and maybe 6 output
    operands in registers. The CRISP's successors would have been
    uncompetitive soon after introduction, and I doubt that they would
    ever have reached competitive performance.

    I remember the SPARC had "register windows" but I don't know if that's similar or what went wrong with them.

    Not at all similar. Register windows were a window into a larger
    register file, no aliasing with memory at all; that was treated as a
    stack of register windows.

    In a similar vein (all heritage of Berkeley RISC) were the AMD 29K's
    and the IA-64's register stack. It's interesting that Forthers were
    never excited about that; the register stack allows pushing or popping
    individual registers instead of register windows. I think the pushing
    and popping is not a cheap operation, so you would want to use it only
    at the call, but you could have used it for one of the Forth stacks,
    and avoided some memory accesses that way.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 17:29:20 2024
    From Newsgroup: comp.lang.forth

    Stephen Pelc <stephen@vfxforth.com> writes:
    On 15 Sep 2024 at 18:16:34 CEST, "Anton Ertl" <Anton Ertl> wrote:

    I can buy a lot of CPU cycles for the cost of one day of programmer
    time.

    Some guy called Stephen Pelc (must be a different one) recently posted
    <vbkdu0$1v8lq$1@dont-email.me>:

    |We (MPE) converted much of our TCP/IP stack not to use locals. This
    |was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
    |the period (say 15 years ago) were similar. Code density improved by
    |about 25% and performance by about 50%.

    How much time did that conversion cost? And this Stephen Pelc
    suggested that Buzz McCool (and probably everyone else) should also
    spend their time on avoiding and eliminating locals from their code.

    I am with you here, not with the other Stephen Pelc: Programmers
    should use locals liberally if it saves them time, even in the face of
    slow locals implementations, because you can buy a lot of CPU cycles
    for the additional programming cost of avoiding locals.

    What you ignore is that the constraints of embedded systems with small
    slow CPUs (by comparison with desktop CPUs) are very different from
    those of desktop CPUs. Converting the TCP/IP stack was driven by the
    client requirement to fit a TCP/IP app into 128k/256k Flash and 16k RAM.

    I would not make that trade off today.

    Interesting. So why mention it in <vbkdu0$1v8lq$1@dont-email.me>
    without adding that? And why do you write "What you ignore is [...]"
    if the situation has vanished?

    In any case, if such a situation still exists or reappears, and/or
    customers who want more performance or smaller code appear, it seems
    to me that the better (more general, i.e., the ultimate evil in the
    eyes of some) solution is a native-code compiler that tries to keep
    all values in registers, whether from the data, return, or FP stack or
    in locals, and tries to do that throughout the definition, not just in
    a basic block.

    I should have found the time to do that long ago, maybe some day I
    will.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 17:37:19 2024
    From Newsgroup: comp.lang.forth

    Stephen Pelc <stephen@vfxforth.com> writes:
    On 15 Sep 2024 at 23:45:22 CEST, "Paul Rubin" <no.email@nospam.invalid> wrote:

    Stephen Pelc <stephen@vfxforth.com> writes:
    I would not make that trade off today.
    So there's only one Stephen Pelc but two application domains.

    I wonder how much effort de-localizing the TCP/IP stack took, compared
    to hypothetically updating the compiler to optimize locals more. If the
    TCP/IP stack code can compile with iForth or lxf, is there a way to
    compare the code size with VFX's? I can understand wanting to use VFX
    for actual delivery, of course.

    On modern desktop CPUs, I would probably spend the effort on
    optimising locals more. However, the ability to provide the address
    of a local is essential in our world. I have not inspected our code
    base to see how many uses of a local declaration of a buffer
    : bah {: ... | FOO[ cell ] ... -- :}
    there are compared to the use of the ADDR (address) operator
    applied to a normally defined local
    : bah {: ... | FOO ... -- :}
    ...
    addr FOO

    Yes, that's why Gforth does not support ADDR for locals by default:

    : bah {: ... | FOO ... -- :}
    ...
    addr foo
    *the terminal*:3:8: error: Unsupported operation
    addr >>>foo<<<

    If you want that, there are two options: Either make it explicit with
    WA: which local should support ADDR:

    : bah {: ... | wa: FOO ... -- :}
    ...
    addr foo
    ;

    compiles without error. Alternatively, you can force slow mode on all
    locals with DEFAULT-WA:. So

    default-wa:

    : bah {: ... | FOO ... -- :}
    ...
    addr foo
    ;

    compiles without error.

    One intermediate option is to warn about ADDR applied to locals
    defined without WA: FA: DA: CA:. Once the program compiles without
    any of these warnings, you can set

    DEFAULT-W:

    to gain the full speed for all the other locals.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 18:58:22 2024
    From Newsgroup: comp.lang.forth

    On Mon, 16 Sep 2024 17:37:19 +0000, Anton Ertl wrote:

    [..]
    Yes, that's why Gforth does not support ADDR for locals by default:

    iForth supports getting the address of any type local with " 'OF a ".
    This indeed has a negative effect on execution time.

    The experimental PARAMS| a | construct does not support 'OF and tries
    to keep integer locals in a register. It is not successful when there
    are too many locals. Maybe I'll repair that with the next major
    revision.

    -marcel
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Mon Sep 16 12:16:09 2024
    From Newsgroup: comp.lang.forth

    mhx@iae.nl (mhx) writes:
    This appears not to be a good idea. The root cause is piling up too
    many items on the F-stack (exceeding the hardware FPU stack limits).

    I wonder if any Forth compilers use SSE instead of the x86 FPU stack.
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 19:26:01 2024
    From Newsgroup: comp.lang.forth

    mhx@iae.nl (mhx) writes:
    The experimental PARAMS| a | construct does not support 'OF and tries
    to keep integer locals in a register.

    Great. And using the same order as {: ... :} is also great. Now if
    only (LOCAL) (which is used by the reference implementation of {:
    ... :}) used the same mechanism.

    I just tried VICHECK1 with PARAMS| ... | instead of {: ... :}

    401 336 gforth-fast (AMD64)
    179 132 lxf 1.6-982-823 (IA-32)
    182 119 VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
    241 159 VFX Forth 64 5.43 (AMD64)
    163 175 iforth-5.1 mini (AMD64)
    182 iforth-5.1 mini using PARAMS|

    Looking at the code, you store these registered locals on the locals
    stack before the IF, and then load them into registers again after the
    IF, and then reload them after every call (so apparently the registers
    you use for them are caller-saved in iforth). And the problem in this
    code is that every local is used at most once between calls, so storing
    it in a caller-saved register results in no better code than storing
    it in memory.

    Let's see how the 3DUP.3 example fares:

    instr. bytes system
        28   103 Gforth AMD64
        16    44 iforth 5.0.27 (plus 20 bytes entry and return code)
         8    11 iforth 5.0.27 PARAMS| (plus 20 bytes entry and return code)
         7    19 lxf 1.6-982-823 32-bit
        32   127 SwiftForth 4.0.0-RC89 (calls LSPACE)
        26    92 VFX Forth 64 5.11 RC2

    Yes, in the right setting PARAMS| is very nice, too bad it's not used
    for (LOCAL) (or directly for {:).

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Mon Sep 16 19:55:29 2024
    From Newsgroup: comp.lang.forth

    Paul Rubin <no.email@nospam.invalid> writes:
    mhx@iae.nl (mhx) writes:
    This appears not to be a good idea. The root cause is piling up too
    many items on the F-stack (exceeding the hardware FPU stack limits).

    I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

    Gforth 0.7.9_20240821
    [...]
    see f+
    Code f+
    55AF6580BDC1: add rbx,$08
    55AF6580BDC5: mov rax,r12
    55AF6580BDC8: lea r12,$08[r12]
    55AF6580BDCD: addsd xmm15,$08[rax]
    55AF6580BDD3: mov rax,[rbx]
    55AF6580BDD6: jmp eax

    VFX Forth 64 5.11 RC2 [build 0112] 2021-05-02 for Linux x64
    [...]
    see f+
    F+
    ( 004C4100 F2450F584500 ) ADDSD XMM8, [R13]
    ( 004C4106 4983C508 ) ADD R13, # 08
    ( 004C410A C3 ) RET/NEXT
    ( 11 bytes, 3 instructions )

    But:

    VFX Forth 64 5.43 [build 0199] 2023-11-09 for Linux x64
    [...]
    see f+
    F+
    ( 00505620 DEC1 ) FADDP ST(1), ST
    ( 00505622 C3 ) RET/NEXT
    ( 3 bytes, 2 instructions )

    The customers of VFX preferred the 80-bit floats.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Mon Sep 16 21:43:13 2024
    From Newsgroup: comp.lang.forth

    On Mon, 16 Sep 2024 19:16:09 +0000, Paul Rubin wrote:

    mhx@iae.nl (mhx) writes:
    This appears not to be a good idea. The root cause is piling up too
    many items on the F-stack (exceeding the hardware FPU stack limits).

    I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

    iForth would, if my tests had shown any positive effect.
    (The effect has to be substantial to outweigh the advantage of 80-bit
    floats whenever accuracy counts.)

    I wrote routines to process 4 floats. For unfathomable reasons, they
    are not nearly as good as pre-packaged library code. There is only
    limited potential for standard FP code to benefit from SSE. If
    parallelism can't be exploited, SSE does not seem to bring
    anything over the old FPU. But maybe my hardware was not
    good enough a few years back.

    With SSE I need a substantial library for special functions,
    which then become relatively slow DLL calls.

    The only thing wrong with the FPU is that the special stack
    overflow interrupts don't work.

    -marcel
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Tue Sep 17 06:53:43 2024
    From Newsgroup: comp.lang.forth

    On Mon, 16 Sep 2024 19:26:01 +0000, Anton Ertl wrote:

    mhx@iae.nl (mhx) writes:
    The experimental PARAMS| a | construct does not support 'OF and tries
    to keep integer locals in a register.
    [..]
    Yes, in the right setting PARAMS| is very nice, too bad it's not used
    for (LOCAL) (or directly for {:).

    I thought at the time it needed a multi-pass compiler, and the
    implications of that looked dark with respect to my goals (my
    spare time was limited).

    With multiple passes and on-the-fly compilation I think I can do
    better (pre-tests in iSPICE).

    -marcel
  • From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Tue Sep 17 08:43:53 2024
    From Newsgroup: comp.lang.forth

    On 16 Sep 2024 at 21:16:09 CEST, "Paul Rubin" <no.email@nospam.invalid> wrote:

    mhx@iae.nl (mhx) writes:
    This appears not to be a good idea. The root cause is piling up too
    many items on the F-stack (exceeding the hardware FPU stack limits).

    I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

    The current VFX 64 bit systems for x64 allow you to select float packs for
    80x87 8 item internal stack
    hfp87 80x87 external stack
    SSE external stack

    The external function interface adapts automagically to the pack in use.

    Stephen
    --
    Stephen Pelc, stephen@vfxforth.com
    MicroProcessor Engineering, Ltd. - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)78 0390 3612, +34 649 662 974
    http://www.mpeforth.com
    MPE website
    http://www.vfxforth.com/downloads/VfxCommunity/
    downloads
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Tue Sep 17 10:47:40 2024
    From Newsgroup: comp.lang.forth

    In article <930672243542a2e04c6cd13d83108af9@www.novabbs.com>,
    mhx <mhx@iae.nl> wrote:
    On Mon, 16 Sep 2024 19:16:09 +0000, Paul Rubin wrote:

    mhx@iae.nl (mhx) writes:
    This appears not to be a good idea. The root cause is piling up too
    many items on the F-stack (exceeding the hardware FPU stack limits).

    I wonder if any Forth compilers use SSE instead of the x86 FPU stack.

    iForth would, if my tests had shown any positive effect.
    (The effect has to be substantial to outweigh the advantage of 80-bit
    floats whenever accuracy counts.)

    I wrote routines to process 4 floats. For unfathomable reasons, they
    are not nearly as good as pre-packaged library code. There is only
    limited potential for standard FP code to benefit from SSE. If
    parallelism can't be exploited, SSE does not seem to bring
    anything over the old FPU. But maybe my hardware was not
    good enough a few years back.

    With SSE I need a substantial library for special functions,
    which then become relatively slow DLL calls.

    The only thing wrong with the FPU is that the special stack
    overflow interrupts don't work.

    In ciforth:
    I added floating point support using the FPU with relatively
    little work, especially because the transcendentals are easy.
    I suspect that it might be not standard. E.g. F+ exhibits
    more precision in 80 bits, and we are supposed to use
    either IEEE 32 or 64 bits. Apparently I'm in good company
    (iforth and vfxforth).
    What do the language lawyers say?


    -marcel
    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Tue Sep 17 09:12:40 2024
    From Newsgroup: comp.lang.forth

    On Tue, 17 Sep 2024 8:47:40 +0000, albert@spenarnc.xs4all.nl wrote:
    In ciforth:
    I added floating point support using the FPU with relatively
    little work, especially because the transcendentals are easy.
    I suspect that it might be not standard. E.g. F+ exhibits
    more precision in 80 bits, and we are supposed to use
    either IEEE 32 or 64 bits. Apparently I'm in good company
    (iforth and vfxforth).
    What do the language lawyers say?

    You are in luck: the internal representation of fp numbers is
    implementation-defined.

    However fp-alignment restrictions must be observed on affected
    systems, which in itself is a rather superfluous requirement
    since most such systems would crash anyway.
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Tue Sep 17 12:18:38 2024
    From Newsgroup: comp.lang.forth

    On 16-09-2024 18:26, Anton Ertl wrote:
    That, too, but the elimination of the ELSE has more weight with me.

    : WORD1 {: x1 x2 -- :}
    ... ( f ) if ( )
    ... x1 ... x2 ...
    then ;

    : WORD2 ( x1 x2 -- )
    ... ( f ) if ( x1 x2 )
    ...
    else
    2drop
    then ;


    You mean - like this?

    : WORD2 ( x1 x2 -- )
    ... ( f ) if ( x1 x2 )
    ...
    exit
    then 2drop ;

    Forth has a special word ?DUP for one specific variant of this
    situation, but it helps only in specific cases.
    That's one of the reasons I don't like it - and don't support it
    natively. The horror of returning two different stack diagrams..

    I loved it when I introduced ;THEN. When doing short words, it allowed
    the 4tH optimizer to kick in and make not one, but *TWO* tail call jumps.

    Hans Bezemer

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Tue Sep 17 07:24:12 2024
    From Newsgroup: comp.lang.forth

    Stephen Pelc <stephen@vfxforth.com> writes:
    The current VFX 64 bit systems for x64 allow you to select float packs for
    80x87 8 item internal stack
    hfp87 80x87 external stack
    SSE external stack

    I guess the next thing is to run that same benchmark with the SSE pack.

    With SSE if you want to do transcendentals, is it usual to use a
    software library that does the numerics? It seems easier for the
    library to use the x87 FPU when it is available.

    Maybe some processors will start supporting IEEE 128 bit floating point
    someday. I know that the RISC-V architecture contains instructions for
    it, but I don't know of any processor hardware that does it.
  • From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 18 13:08:39 2024
    From Newsgroup: comp.lang.forth

    On 17/09/2024 2:26 am, Anton Ertl wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    ...
    I wonder whether Moore's 1999 aversion to locals had something to do
    with his hardware designs of that era, where having more registers
    (besides T and N) connected to the ALU would have cost silicon and
    created timing bottlenecks.

    I think he had the aversion long before he did such hardware designs.
    He has been quoted as thinking that humans should do all they can to
    make the computer's work easier (or something like that). While his
    sayings, like any religious text, are sufficiently fuzzy to be
    interpretable in many ways, his denouncing of locals over the years
    makes it clear that he thinks that humans should invest time to write
    code with stack manipulation words and globals, so that the compiler
    does not need to be bloated by the code for dealing with locals.

    When has Moore required humans to do anything? Did he stand up saying
    'Follow me. I'll make you a better programmer, more productive. I'll
    provide you with compilers and a Standard.'? No. That was others' doing.
    When the latter had attracted enough of a following they were
    self-sufficient - didn't need Moore, other than perhaps his presence.
    What differentiates Moore and the group promoting Forth (their version
    of it), is Moore has never changed his position, switched his tune,
    introduced locals and mega-compilers - as the latter do today in an
    attempt to maintain the interest, maintain a following. Of what use are
    leaders without followers?

    "Let me use a tool which I appreciate and if everyone can't use this
    tool well, sorry, but that is not my goal." - C.M.

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Tue Sep 17 22:39:51 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    ...Moore has never changed his position, switched his tune, introduced
    locals and mega-compilers - as the latter do today in an attempt to
    maintain the interest, maintain a following.

    Weren't we just quibbling about small (few percent) efficiency
    differences between using locals and using stack words? You get far
    greater efficiency gains by using optimizing compilers. If you feel
    like the optimizing compiler is needless bloat and want to use an
    interpreter instead, that's fine, it just means that (like most of us),
    you've found that code speed isn't that important for whatever you're
    doing. Stephen Pelc posted a few days ago that in the past decade, his
    customers have stopped asking for faster code.

    I think that is a better takeaway than "well we can give up the
    optimizing compiler, because using stack words instead of locals
    recovers a few percent of the lost speed".
  • From dxf@dxforth@gmail.com to comp.lang.forth on Wed Sep 18 17:30:30 2024
    From Newsgroup: comp.lang.forth

    On 18/09/2024 3:39 pm, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    ...Moore has never changed his position, switched his tune, introduced
    locals and mega-compilers - as the latter do today in an attempt to
    maintain the interest, maintain a following.

    Weren't we just quibbling about small (few percent) efficiency
    differences between using locals and using stack words? You get far
    greater efficiency gains by using optimizing compilers. If you feel
    like the optimizing compiler is needless bloat and want to use an
    interpreter instead, that's fine, it just means that (like most of us),
    you've found that code speed isn't that important for whatever you're
    doing. Stephen Pelc posted a few days ago that in the past decade, his
    customers have stopped asking for faster code.

    I think that is a better takeaway than "well we can give up the
    optimizing compiler, because using stack words instead of locals
    recovers a few percent of the lost speed".

    I think we're retreading old ground. Orders of 30% reduction in code
    size were in respect of optimizing compilers (VFX). It's consistently
    the case. There may be less to gain for floating point but even there
    locals don't have much over globals. Can't speak for MPE customers
    but neither can I ignore what I see. I can assure you I don't find
    using stack operators a burden. Indeed I find them reassuring as it
    puts me in control. Forth is a niche language. If there's success to
    be had, it will be on its own merits and not ideas imported from other
    languages.

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Wed Sep 18 13:10:31 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    I think we're retreading old ground. Orders of 30% reduction in code
    size were in respect of optimizing compilers (VFX).

    That 30% difference was because VFX doesn't attempt to optimize locals.
    If two pieces of code are obviously equivalent (the locals and no-locals
    version of EMITS) then a fancier optimizing compiler is likely to
    generate the same code for both.

    What I was getting at though is that VFX even using locals will still
    beat the pants off any interpreter, even without locals. So if you have
    interpreted Forth code using locals and want it to be faster, you get
    far more gain compiling it with VFX than you would get by undoing the
    locals. If you're already using VFX then yes, you can squeeze out a bit
    more performance by not using locals, but that just tells me that the
    VFX optimizer is still a work in progress (which is fine).

    I can assure you I don't find using stack operators a burden. Indeed
    I find them reassuring as it puts me in control.

    It's hard for me to understand that. If you're using VFX, the stack
    operations are transformed by compiler gyrations to register ops so
    SWAP, ROT, etc. generate no code at all, but this is completely out of
    sight and you have no control over it. Locals on the other hand (in an
    interpreter) are equivalent to RPICK at specific offsets in obvious
    ways, so there is no loss of control. That also happens with locals in
    VFX but it's only because VFX (for now) hasn't pursued optimizing them.

    That example using FVALUE just seems to be a loss: the storage cells are
    constantly tied up even when not active. If that function can be used
    in a multitasking environment, you might even need a separate copy for
    each task. Significant efficiency loss.

    Forth is a niche language. If there's success to be had, it will be
    on its own merits and not ideas imported from other languages.

    That seems to support looking at any particular feature on its merits.
    Adding to that a dislike of standardization, it would seem to be up to
    the programmer, with most choices being legitimate for any particular
    programmer.
  • From dxf@dxforth@gmail.com to comp.lang.forth on Thu Sep 19 14:14:48 2024
    From Newsgroup: comp.lang.forth

    On 19/09/2024 6:10 am, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    I think we're retreading old ground. Orders of 30% reduction in code
    size were in respect of optimizing compilers (VFX).

    That 30% difference was because VFX doesn't attempt to optimize locals.
    If two pieces of code are obviously equivalent (the locals and no-locals
    version of EMITS) then a fancier optimizing compiler is likely to
    generate the same code for both.

    What's the evidence? My observation is compilers do not generate native
    code independently of the language. Parameter passing strategies differ
    between C and Forth and this necessarily affects the code compilers lay
    down.

    ...
    Forth is a niche language. If there's success to be had, it will be
    on its own merits and not ideas imported from other languages.

    That seems to support looking at any particular feature on its merits.
    Adding to that a dislike of standardization, it would seem to be up to
    the programmer, with most choices being legitimate for any particular
    programmer.

    For me it comes down to why I have chosen to use Forth. The philosophy of
    it appeals to me in a way other languages don't. There's the question
    which forth - because forth has essentially split down two paths with
    rather incompatible motivations.

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sat Sep 28 13:49:46 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    That 30% difference was because VFX doesn't attempt to optimize locals.
    What's the evidence? My observation is compilers do not generate
    native code independently of the language. Parameter passing
    strategies differ between C and Forth and this necessarily affects the
    code compilers lay down.

    1) comparisons between VFX and other compilers like iForth, 2) the
    observation that there is any difference at all between the generated
    code for the two versions of EMITS under VFX.

    This isn't a question of C vs Forth. It's two equivalent pieces of
    Forth code being compiled by the same optimizing Forth compiler, one
    version resulting in worse code instead of identical code.

    For me it comes down to why I have chosen to use Forth. The philosophy
    of it appeals to me in a way other languages don't. There's the
    question which forth - because forth has essentially split down two
    paths with rather incompatible motivations.

    I gather that one path is industrial users who want there to be a
    standard with well-supported commercial implementations, and who want to
    run development projects with large teams of programmers (the Saudi
    airport being the classic example).

    I guess the other path is something like solo practitioners who don't
    really care about standardization, perhaps because they just want the
    most direct way to an end result. Philosophical appeal is another such
    motivation. That's fine too, but partly a matter of personal taste.

    What I'm unclear about is what the philosophical purist path has to say
    about optimizing compilers. I think anyone wanting to reject locals for
    reasons of code efficiency, probably should be using a VFX-style
    compiler. My own idea of purity says to use a simple interpreter and
    accept the speed penalty, using CODE when needed.

    FWIW, most of the code I write these days doesn't spend much time on
    computation. It might spend 100ms retrieving something over the
    network, and then 1ms computing. So if the computing part somehow sped
    up by 1000x, I wouldn't notice or care about the difference.

    FWIW 2, I suspect most computing operations in the real world right now
    are spent in GPU kernels or large parallel batch jobs, rather than in
    ordinary single-CPU programs.
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sat Sep 28 22:36:12 2024
    From Newsgroup: comp.lang.forth

    On Sat, 28 Sep 2024 20:49:46 +0000, Paul Rubin wrote:
    [..]
    FWIW 2, I suspect most computing operations in the real world right
    now are spent in GPU kernels or large parallel batch jobs, rather
    than in ordinary single-CPU programs.

    Analog ( IC ) simulation can't be sped up with parallel tricks
    like that. Unless the simulator is written from scratch.

    -marcel
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sun Sep 29 14:22:26 2024
    From Newsgroup: comp.lang.forth

    On 29/09/2024 6:49 am, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    That 30% difference was because VFX doesn't attempt to optimize locals.
    What's the evidence? My observation is compilers do not generate
    native code independently of the language. Parameter passing
    strategies differ between C and Forth and this necessarily affects the
    code compilers lay down.

    1) comparisons between VFX and other compilers like iForth, 2) the
    observation that there is any difference at all between the generated
    code for the two versions of EMITS under VFX.

    This isn't a question of C vs Forth.

    Perhaps I misunderstood. So we agree Forth locals are unlikely to ever
    match C locals for performance?

    It's two equivalent pieces of
    Forth code being compiled by the same optimizing Forth compiler, one
    version resulting in worse code instead of identical code.

    I don't know whether it's possible to make forth code using locals as
    efficient as forth code using stack operations. What I do question is
    the necessity for it and the wisdom of it.

    For me it comes down to why I have chosen to use Forth. The philosophy
    of it appeals to me in a way other languages don't. There's the
    question which forth - because forth has essentially split down two
    paths with rather incompatible motivations.

    I gather that one path is industrial users who want there to be a
    standard with well-supported commercial implementations, and who want to
    run development projects with large teams of programmers (the Saudi
    airport being the classic example).

    According to Elizabeth polyFORTH was used for that project. When c.l.f.
    was aflame with 200x standards discussions, I recall asking how it was
    no commercial programmers seemed to be participating. She replied words
    to the effect they were busy programming. Certainly Forth Inc's early
    successes didn't rely on the existence of a standard.

    I guess the other path is something like solo practitioners who don't
    really care about standardization, perhaps because they just want the
    most direct way to an end result. Philosophical appeal is another such
    motivation. That's fine too, but partly a matter of personal taste.

    FWIW here's Jeff Fox' take on the topic:

    https://www.ultratechnology.com/antiansi.htm
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sun Sep 29 14:40:58 2024
    From Newsgroup: comp.lang.forth

    In article <87h69zcxlh.fsf@nightsong.com>,
    Paul Rubin <no.email@nospam.invalid> wrote:
    <SNIP>
    What I'm unclear about is what the philosophical purist path has to say
    about optimizing compilers. I think anyone wanting to reject locals for
    reasons of code efficiency, probably should be using a VFX-style
    compiler. My own idea of purity says to use a simple interpreter and
    accept the speed penalty, using CODE when needed.

    Maybe I'm a purist. Indirect threaded code is a clear expression
    of programmer's intent. That is the ideal foundation on which to
    build optimisers. The only requirement for an optimiser is that
    the results are the same. The program can be shorter or faster.
    Locals are a hindrance.

    Groetjes Albert
  • From mhx@mhx@iae.nl (mhx) to comp.lang.forth on Sun Sep 29 16:53:11 2024
    From Newsgroup: comp.lang.forth

    On Sun, 29 Sep 2024 12:40:58 +0000, albert@spenarnc.xs4all.nl wrote:

    [..]
    Maybe I'm a purist. Indirect threaded code is a clear expression
    of programmer's intent. That is the ideal foundation on which to
    build optimisers.

    Do you mean that we can debate endlessly what Forth is exactly,
    should be, or should become, while that is a non-issue when
    we simply pronounce that a given and frozen ITC implementation
    exactly defines Forth?

    Once you have such an ITC implementation it is possible to
    compile to anything else, which then can be made
    indistinguishable within a testable margin of error. But
    if that is agreed, surely there is no need to start
    from ITC.

    I personally don't think one can say that ITC expresses (or
    is able to express) Forth *better* than DTC, TTC, STC or
    native code. Additionally, surely STC is a lot *clearer to
    understand* than ITC?

    -marcel
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 29 11:33:10 2024
    From Newsgroup: comp.lang.forth

    albert@spenarnc.xs4all.nl writes:
    My own idea of purity says to use a simple interpreter and accept the
    speed penalty, using CODE when needed.

    Indirect threaded code is a clear expression of programmer's intent.
    The only requirement for an optimiser is that the results are the
    same. The program can be shorter or faster. Locals are a hindrance.

    Well, the philosophical idea I'm coming from is that Forth is a difficult
    language that makes unusual demands on the programmer. That is a cost of
    using it. In exchange it gives extreme simplicity and clarity of
    implementation, and the ability to host itself on very limited machines.
    Those are benefits.

    If you're going to implement an optimizing compiler, you've got the
    machine resources to host it and the willingness to deal with its
    complexity. That is, you're not really in need of Forth's benefits. So
    maybe you can also bypass some of its costs.

    Thus, I think of a "pure" Forth as a simple interpreter (maybe not
    ITC). Once I have a bigger machine etc., I start thinking about Lisp.
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Sun Sep 29 11:44:31 2024
    From Newsgroup: comp.lang.forth

    dxf <dxforth@gmail.com> writes:
    Perhaps I misunderstood. So we agree Forth locals are unlikely to
    ever match C locals for performance?

    This I don't know. If the issue is parameter passing in registers,
    maybe a fancy enough Forth compiler could do that.

    I don't know whether it's possible to make forth code using locals as
    efficient as forth code using stack operations. What I do question is
    the necessity for it and the wisdom of it.

    I think in case of an interpreter, locals might be more efficient, since
    as the thread title says, they treat the stack as an array. The
    hardware is built to do that, so why not use it? With an optimizing
    compiler, I think they should usually be equivalent in principle.

    Certainly Forth Inc's early successes didn't rely on the existence of
    a standard.

    In those days there was only one significant implementation ;).

    https://www.ultratechnology.com/antiansi.htm

    I remember that from a while back and will look at it again. The context
    though was a Forth chip with stack hardware, being compared against a
    software interpreter.

    I miss Jeff but must also remember that he was sometimes prone to
    hyperbole.

    Do you still use blocks instead of files nowadays?
  • From dxf@dxforth@gmail.com to comp.lang.forth on Mon Sep 30 16:13:43 2024
    From Newsgroup: comp.lang.forth

    On 30/09/2024 4:44 am, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    Perhaps I misunderstood. So we agree Forth locals are unlikely to
    ever match C locals for performance?

    This I don't know. If the issue is parameter passing in registers,
    maybe a fancy enough Forth compiler could do that.

    IMO no because C doesn't have the complication of a permanent parameter
    stack. C typically pushes parameters onto the cpu stack which are the
    locals, and which the calling function eventually discards. In forth
    locals amount to a 2-step process - pushing parameters onto the data
    stack, pulling them off as locals and potentially storing them back.
    Contrary to what one may imagine this is more costly than 'stack
    juggling' which has become a pejorative. Forth has a data stack. It's
    left to the user to optimize it, or to abuse it, as he sees fit.
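
    As a minimal sketch of the two styles being compared (not taken from
    the thread; the word names SUMSQ-LOCALS and SUMSQ-STACK are hypothetical,
    and a system providing the Forth 2012 {: ... :} locals syntax is
    assumed), the same computation might be written either way:

```forth
\ Illustrative sketch only; both word names are hypothetical.

\ Locals version: the caller pushes a and b on the data stack, then
\ {: ... :} takes them off the stack into locals (b from the top, a from
\ below it) before the body fetches them again.
: SUMSQ-LOCALS {: a b -- n :}
  a a *  b b *  + ;

\ Stack version: the arguments are used where the caller left them.
: SUMSQ-STACK ( a b -- n )
  dup *  swap dup *  + ;

3 4 SUMSQ-LOCALS .   \ prints 25
3 4 SUMSQ-STACK .    \ prints 25
```

    How much the extra move into locals costs in practice depends on the
    compiler; the sketch only shows where that step occurs.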

    I don't know whether it's possible to make forth code using locals as
    efficient as forth code using stack operations. What I do question is
    the necessity for it and the wisdom of it.

    I think in case of an interpreter, locals might be more efficient, since
    as the thread title says, they treat the stack as an array. The
    hardware is built to do that, so why not use it? With an optimizing
    compiler, I think they should usually be equivalent in principle.

    I don't understand the reference to 'interpreter'. Having an interactive
    environment with incremental compiler is very convenient but mostly I'm
    coding for a target, the same as any C programmer.

    ...
    Do you still use blocks instead of files nowadays?

    For applications I've always used files as that's the norm for CP/M
    and MS-DOS. ANS-style file functions suit this very well. For forth
    source I use files organized as 'screens'. DX-Forth comes with TED -
    a regular text editor that can be used within forth - but personally
    I prefer screens.
