On 08/10/2024 09:28, Anton Ertl wrote:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never". Pretty much the
only example people ever give of needing such comparisons is to
implement memmove() efficiently - but you don't need to implement
memmove(), because it is already in the standard library. (Standard
library implementations don't need to be portable, and can rely on
extensions or other compiler-specific features.)
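[Illustrative sketch, not from the original post: the step a fully portable
memmove() cannot express is the direction test below. The relational
comparison is undefined behaviour in standard C whenever the two pointers
refer to different objects - which is the normal case for memmove()'s
arguments.]

#include <stddef.h>

void *naive_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    if (d < s) {                       /* undefined for pointers into different objects */
        for (size_t i = 0; i < n; i++) /* copy forward */
            d[i] = s[i];
    } else {
        for (size_t i = n; i-- > 0; )  /* copy backward */
            d[i] = s[i];
    }
    return dest;
}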
On Wed, 9 Oct 2024 8:24:34 +0000, David Brown wrote:
On 08/10/2024 09:28, Anton Ertl wrote:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never". Pretty much the
only example people ever give of needing such comparisons is to
implement memmove() efficiently - but you don't need to implement
memmove(), because it is already in the standard library. (Standard
library implementations don't need to be portable, and can rely on
extensions or other compiler-specific features.)
Somebody has to write memmove() and they want to use C to do it.
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never".
On Wed, 9 Oct 2024 8:24:34 +0000, David Brown wrote:
On 08/10/2024 09:28, Anton Ertl wrote:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never". Pretty much the
only example people ever give of needing such comparisons is to
implement memmove() efficiently - but you don't need to implement
memmove(), because it is already in the standard library. (Standard
library implementations don't need to be portable, and can rely on
extensions or other compiler-specific features.)
Somebody has to write memmove() and they want to use C to do it.
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
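[Illustrative sketch, not from the post: within standard C the "special
pointer constant" can simply be the address of a reserved dummy object,
checked with ordinary equality; a compiler that is free to rely on its own
implementation details might instead use an out-of-range bit pattern such
as (struct expr *)-1. The type and function names below are hypothetical.]

#include <stddef.h>

struct expr { int kind; /* ... parse-tree node ... */ };

static struct expr parse_error_sentinel;          /* reserved marker object */
#define PARSE_ERROR (&parse_error_sentinel)

struct expr *parse_subexpression(const char **p); /* hypothetical parser */

/* A caller can then distinguish three outcomes using only equality tests,
   which are always defined:
       struct expr *e = parse_subexpression(&cursor);
       if (e == NULL)             ... no subexpression at this point ...
       else if (e == PARSE_ERROR) ... error already reported ...
       else                       ... use the node ...                      */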
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard library memmove() function!).
On 09/10/2024 18:28, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 8:24:34 +0000, David Brown wrote:
On 08/10/2024 09:28, Anton Ertl wrote:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never". Pretty much the
only example people ever give of needing such comparisons is to
implement memmove() efficiently - but you don't need to implement
memmove(), because it is already in the standard library. (Standard
library implementations don't need to be portable, and can rely on
extensions or other compiler-specific features.)
Somebody has to write memmove() and they want to use C to do it.
They don't have to write it in standard, portable C. Standard libraries will, sometimes, use "magic" - they can be in assembly, or use compiler extensions, or target-specific features, or "-fsecret-flag-for-std-lib" compiler flags, or implementation-dependent features, or whatever they
want.
You will find that most implementations of memmove() are done by
converting the pointers to an unsigned integer type and comparing those values. The type chosen may be implementation-dependent, or it may be "uintptr_t" (even if you are using C90 for your code, the library
writers can use C99 for theirs).
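[Illustrative sketch, not from the post, of the integer-comparison trick.
It relies on the implementation's pointer-to-integer mapping preserving
address order and wrapping like ordinary unsigned arithmetic - nothing the
standard guarantees, but exactly the liberty a library writer can take.]

#include <stdint.h>
#include <stddef.h>

void *lib_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* If dest is below src, or the regions do not overlap, the unsigned
       difference is >= n and a forward copy is safe; otherwise dest sits
       inside the source region and we must copy backward.               */
    if ((uintptr_t)d - (uintptr_t)s >= (uintptr_t)n) {
        for (size_t i = 0; i < n; i++)     /* copy forward  */
            d[i] = s[i];
    } else {
        for (size_t i = n; i-- > 0; )      /* copy backward */
            d[i] = s[i];
    }
    return dest;
}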
Such implementations will not be portable to all systems. They won't
work on a target that has some kind of "fat" pointers or segmented
pointers that can't be translated properly to integers.
That's okay, of course. For targets that have such complications, that standard library function will be written in a different way.
The avrlibc library used by gcc for the AVR has its memmove()
implemented in assembly for speed, as does musl for some architectures.
There are lots of parts of the standard C library that cannot be written completely in portable standard C. (How would you write a function that handles files? You need non-portable OS calls.) That's why these
things are in the standard library in the first place.
On 10/9/2024 1:20 PM, David Brown wrote:
There are lots of parts of the standard C library that cannot be written
completely in portable standard C. (How would you write a function that
handles files?
You need non-portable OS calls.) That's why these
things are in the standard library in the first place.
On 10/9/2024 1:20 PM, David Brown wrote:
There are lots of parts of the standard C library that cannot be
written completely in portable standard C. (How would you write a
function that handles files? You need non-portable OS calls.) That's
why these things are in the standard library in the first place.
I agree with everything you say up until the last sentence. There are several languages, mostly older ones like Fortran and COBOL, where the
file handling/I/O are defined portably within the language proper, not
in a separate library. It just moves the non-portable stuff from the library writer (as in C) to the compiler writer (as in Fortran, COBOL,
etc.)
On Wed, 9 Oct 2024 21:52:39 +0000, Stephen Fuld wrote:
On 10/9/2024 1:20 PM, David Brown wrote:
There are lots of parts of the standard C library that cannot be written completely in portable standard C. (How would you write a function that handles files?
Do you mean things other than open(), close(), read(), write(), lseek()
??
You need non-portable OS calls.) That's why these
things are in the standard library in the first place.
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard library
memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects? For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard library
memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects? For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard library
memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you write
an efficient memmove() in standard C. That's why I said there was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly or
using inline assembly, rather than in non-portable C (which is the
common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up that
is proportionally more costly for small transfers. Often that can be eliminated when the compiler optimises the functions inline - when the compiler knows the size of the move/copy, it can optimise directly.
The use of wider register sizes can help to some extent, but not once
you have reached the width of the internal buses or cache bandwidth.
In general, there will be many aspects of a C compiler's code generator,
its run-time support library, and C standard libraries that can work
better if they are optimised for each new generation of processor.
Sometimes you just need to re-compile the library with a newer compiler
and appropriate flags, other times you need to modify the library source code. None of this is specific to memmove().
But it is true that you get an easier and more future-proof memmove()
and memcopy() if you have an ISA that supports scalable vector
processing of some kind, such as ARM and RISC-V have, rather than
explicitly sized SIMD registers.
David Brown <david.brown@hesbynett.no> writes:
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you
write an efficient memmove() in standard C. That's why I said there
was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly
or using inline assembly, rather than in non-portable C (which is
the common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up
that is proportionally more costly for small transfers. Often that
can be eliminated when the compiler optimises the functions inline -
when the compiler knows the size of the move/copy, it can optimise directly.
The use of wider register sizes can help to some extent, but not
once you have reached the width of the internal buses or cache
bandwidth.
In general, there will be many aspects of a C compiler's code
generator, its run-time support library, and C standard libraries
that can work better if they are optimised for each new generation
of processor. Sometimes you just need to re-compile the library with
a newer compiler and appropriate flags, other times you need to
modify the library source code. None of this is specific to
memmove().
But it is true that you get an easier and more future-proof
memmove() and memcopy() if you have an ISA that supports scalable
vector processing of some kind, such as ARM and RISC-V have, rather
than explicitly sized SIMD registers.
Note that ARMv8 (via FEAT_MOPS) does offer instructions that handle
memcpy and memset.
They're three-instruction sets; prolog/body/epilog. There are
separate sets for forward vs. forward-or-backward copies.
The prolog instruction preconditions the copy and copies
an IMPDEF portion.
The body instruction performs an IMPDEF portion and
the epilog instruction finalizes the copy.
The three instructions are issued consecutively.
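[Illustrative sketch, not from the post: with a FEAT_MOPS-capable core and
toolchain (e.g. compiling for armv8.8-a with the mops extension enabled),
the forward-copy triple can be emitted from C via inline assembly roughly
as below. The mnemonics CPYFP/CPYFM/CPYFE and the operand syntax are taken
from the Arm documentation; treat the details as an assumption to verify.]

#include <stddef.h>

void *mops_memcpy(void *dst, const void *src, size_t n)
{
    void *d = dst;
    __asm__ volatile(
        "cpyfp [%0]!, [%1]!, %2!\n\t"  /* prologue: preconditions, copies an IMPDEF chunk */
        "cpyfm [%0]!, [%1]!, %2!\n\t"  /* body: copies an IMPDEF portion                  */
        "cpyfe [%0]!, [%1]!, %2!"      /* epilogue: finalizes the copy                    */
        : "+r"(d), "+r"(src), "+r"(n)
        :
        : "memory", "cc");
    return dst;
}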
On Thu, 10 Oct 2024 20:00:29 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
David Brown <david.brown@hesbynett.no> writes:
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you
write an efficient memmove() in standard C. That's why I said there
was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly
or using inline assembly, rather than in non-portable C (which is
the common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up
that is proportionally more costly for small transfers. Often that
can be eliminated when the compiler optimises the functions inline -
when the compiler knows the size of the move/copy, it can optimise
directly.
The use of wider register sizes can help to some extent, but not
once you have reached the width of the internal buses or cache
bandwidth.
In general, there will be many aspects of a C compiler's code
generator, its run-time support library, and C standard libraries
that can work better if they are optimised for each new generation
of processor. Sometimes you just need to re-compile the library with
a newer compiler and appropriate flags, other times you need to
modify the library source code. None of this is specific to
memmove().
But it is true that you get an easier and more future-proof
memmove() and memcopy() if you have an ISA that supports scalable
vector processing of some kind, such as ARM and RISC-V have, rather
than explicitly sized SIMD registers.
Note that ARMv8 (via FEAT_MOPS) does offer instructions that handle
memcpy and memset.
They're three-instruction sets; prolog/body/epilog. There are
separate sets for forward vs. forward-or-backward copies.
The prolog instruction preconditions the copy and copies
an IMPDEF portion.
The body instruction performs an IMPDEF portion and
the epilog instruction finalizes the copy.
The three instructions are issued consecutively.
People who have more clue about Arm Inc's schedule than I do
expect Arm Cortex cores that implement these instructions to be
announced next May and to appear in actual [expensive] phones in 2026.
Which probably means 2027 at best for Neoverse cores.
It's hard to make an educated guess about the schedules of other Arm core designers.
The existence of a dedicated assembly instruction does not let you write an efficient memmove() in standard C. That's why I said there was no connection
between the two concepts.
For some targets, it can be helpful to write memmove() in assembly or using inline assembly, rather than in non-portable C (which is the common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and memcpy() on large transfers, and the overhead in setting things up that is proportionally
more costly for small transfers. Often that can be eliminated when the compiler optimises the functions inline - when the compiler knows the size of
the move/copy, it can optimise directly.
The use of wider register sizes can help to some extent, but not once you have
reached the width of the internal buses or cache bandwidth.
In general, there will be many aspects of a C compiler's code generator, its run-time support library, and C standard libraries that can work better if they
are optimised for each new generation of processor. Sometimes you just need to
re-compile the library with a newer compiler and appropriate flags, other times
you need to modify the library source code. None of this is specific to memmove().
But it is true that you get an easier and more future-proof memmove() and memcopy() if you have an ISA that supports scalable vector processing of some
kind, such as ARM and RISC-V have, rather than explicitly sized SIMD registers.
On 10/10/2024 20:38, MitchAlsup1 wrote:
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you write
an efficient memmove() in standard C.
That's why I said there was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly or
using inline assembly, rather than in non-portable C (which is the
common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up that
is proportionally more costly for small transfers.
Often that can be eliminated when the compiler optimises the functions inline - when the compiler knows the size of the move/copy, it can optimise directly.
On 10/10/24 2:21 PM, David Brown wrote:
[ SNIP]
The existence of a dedicated assembly instruction does not let you
write an efficient memmove() in standard C. That's why I said there
was no connection between the two concepts.
If the compiler generates the memmove instruction, then one doesn't
have to write memmove() in C - it is never called/used.
For some targets, it can be helpful to write memmove() in assembly or
using inline assembly, rather than in non-portable C (which is the
common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up
that is proportionally more costly for small transfers. Often that
can be eliminated when the compiler optimises the functions inline -
when the compiler knows the size of the move/copy, it can optimise
directly.
The use of wider register sizes can help to some extent, but not once
you have reached the width of the internal buses or cache bandwidth.
In general, there will be many aspects of a C compiler's code
generator, its run-time support library, and C standard libraries that
can work better if they are optimised for each new generation of
processor. Sometimes you just need to re-compile the library with a
newer compiler and appropriate flags, other times you need to modify
the library source code. None of this is specific to memmove().
But it is true that you get an easier and more future-proof memmove()
and memcopy() if you have an ISA that supports scalable vector
processing of some kind, such as ARM and RISC-V have, rather than
explicitly sized SIMD registers.
Not applicable.
On Thu, 10 Oct 2024 19:21:20 +0000, David Brown wrote:
On 10/10/2024 20:38, MitchAlsup1 wrote:
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you write
an efficient memmove() in standard C.
{
memmove( p, q, size );
}
Where the compiler produces the MM instruction itself. Looks damn
close to standard C to me !!
OR
for( int i = 0; i < size; i++ )
p[i] = q[i];
Which gets compiled to memcpy()--also looks to be standard C.
OR
p_struct = q_struct;
gets compiled to::
memmove( &p_struct, &q_struct, sizeof( q_struct ) );
also looks to be std C.
On 10/10/2024 23:19, Brian G. Lucas wrote:
Not applicable.
I don't understand what you mean by that. /What/ is not applicable
to /what/ ?
On Fri, 11 Oct 2024 13:37:03 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 10/10/2024 23:19, Brian G. Lucas wrote:
Not applicable.
I don't understand what you mean by that. /What/ is not applicable
to /what/ ?
Brian probably meant to say that it is not applicable to his my66k
LLVM back end.
But I am pretty sure that what you suggest is applicable, but a bad idea
for a memcpy/memmove routine that targets Arm+SVE.
Dynamic dispatch based on concrete core features/identification, i.e.
exactly the same mechanism that is done on "non-scalable"
architectures, would provide better performance. And memcpy/memmove is certainly sufficiently important to justify an additional development
effort.
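[Illustrative sketch, not from the post - the names are hypothetical, not an
existing API: the usual shape of such dispatch is a one-time probe of the
core's features feeding a function pointer that all later calls go through.]

#include <stddef.h>

extern int cpu_has_wide_copy(void);  /* hypothetical feature probe (CPUID, HWCAP, ...) */
extern void *memcpy_wide(void *, const void *, size_t);     /* tuned variant    */
extern void *memcpy_generic(void *, const void *, size_t);  /* fallback variant */

static void *(*memcpy_impl)(void *, const void *, size_t);

void *dispatched_memcpy(void *dst, const void *src, size_t n)
{
    if (!memcpy_impl)   /* first call: pick the implementation for this core */
        memcpy_impl = cpu_has_wide_copy() ? memcpy_wide : memcpy_generic;
    return memcpy_impl(dst, src, n);
}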
On 10/9/2024 1:20 PM, David Brown wrote:
There are lots of parts of the standard C library that cannot be
written completely in portable standard C. (How would you write
a function that handles files? You need non-portable OS calls.)
That's why these things are in the standard library in the first
place.
I agree with everything you say up until the last sentence. There
are several languages, mostly older ones like Fortran and COBOL,
where the file handling/I/O are defined portably within the
language proper, not in a separate library. It just moves the
non-portable stuff from the library writer (as in C) to the
compiler writer (as in Fortran, COBOL, etc.)
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
On 11/10/2024 20:55, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
You are either totally clueless, or you are trolling. And I know you
are not clueless.
This discussion has become pointless.
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different
objects? For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they
can implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard
library memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
On Fri, 11 Oct 2024 22:02:32 +0000, David Brown wrote:
On 11/10/2024 20:55, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
You are either totally clueless, or you are trolling. And I know you
are not clueless.
This discussion has become pointless.
The point is that there are a few things that may be hard to do
with {decode, pipeline, calculations, specifications...}; but
because they are so universally needed; these, too, should
"get into ISA".
One good reason to put them in ISA is to preserve the programmers
efforts over decades, so they don't have to re-write libc every-
time a new set of instructions come out.
Moving an arbitrary amount of memory from point a to point b
happens to fall into that universal need. Setting an arbitrary
amount of memory to a value also falls into that universal
need.
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
Can R3 be a const? That causes issues for restartability, but branch prediction is easier and the code is shorter.
Though I guess forwarding a const is probably a thing today to improve
branch prediction, which is normally HORRIBLE for short branch counts.
On 10/12/24 12:06 AM, Brett wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
Can R3 be a const? That causes issues for restartability, but branch
prediction is easier and the code is shorter.
Yes.
#include <string.h>
void memmoverr(char to[], char fm[], size_t cnt)
{
memmove(to, fm, cnt);
}
void memmoverd(char to[], char fm[])
{
memmove(to, fm, 0x100000000);
}
Yields:
memmoverr: ; @memmoverr
mm r1,r2,r3
ret
memmoverd: ; @memmoverd
mm r1,r2,#4294967296
ret
Though I guess forwarding a const is probably a thing today to improve
branch prediction, which is normally HORRIBLE for short branch counts.
On 12/10/2024 01:32, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 22:02:32 +0000, David Brown wrote:
On 11/10/2024 20:55, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
You are either totally clueless, or you are trolling. And I know you
are not clueless.
This discussion has become pointless.
The point is that there are a few things that may be hard to do
with {decode, pipeline, calculations, specifications...}; but
because they are so universally needed; these, too, should
"get into ISA".
One good reason to put them in ISA is to preserve the programmers
efforts over decades, so they don't have to re-write libc every-
time a new set of instructions come out.
Moving an arbitrary amount of memory from point a to point b
happens to fall into that universal need. Setting an arbitrary
amount of memory to a value also falls into that universal
need.
Again, I have to ask - do you bother to read the posts you reply to?
Are you interested in replying, and engaging in the discussion? Or are
you just looking for a chance to promote your own architecture, no
matter how tenuous the connection might be to other posts?
Again, let me say that I agree with what you are saying - I agree that
an ISA should have instructions that are efficient for what people
actually want to do. I agree that it is a good thing to have
instructions that let performance scale with advances in hardware
ideally without needing changes in compiled binaries, and at least
without needing changes in source code.
I believe there is an interesting discussion to be had here, and I would enjoy hearing about comparisons of different ways that functions like memcpy() and memset() can be implemented in different architectures and optimised for different sizes, or how scalable vector instructions can
work in comparison to fixed-size SIMD instructions.
But at the moment, this potential is lost because you are posting total shite about implementing memmove() in standard C. It is disappointing
that someone with your extensive knowledge and experience cannot see
this. I am finding it all very frustrating.
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
Can R3 be a const? That causes issues for restartability, but branch prediction is easier and the code is shorter.
Though I guess forwarding a const is probably a thing today to improve
branch prediction, which is normally HORRIBLE for short branch counts.
Brian G. Lucas <bagel99@gmail.com> wrote:
On 10/12/24 12:06 AM, Brett wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
Can R3 be a const? That causes issues for restartability, but branch
prediction is easier and the code is shorter.
Yes.
#include <string.h>
void memmoverr(char to[], char fm[], size_t cnt)
{
memmove(to, fm, cnt);
}
void memmoverd(char to[], char fm[])
{
memmove(to, fm, 0x100000000);
}
Yields:
memmoverr: ; @memmoverr
mm r1,r2,r3
ret
memmoverd: ; @memmoverd
mm r1,r2,#4294967296
ret
Excellent!
Though I guess forwarding a const is probably a thing today to improve
branch prediction, which is normally HORRIBLE for short branch counts.
What is the default virtual loop count if the register count is not available?
Worst case the source and dest are in cache, and the count is 150 cycles
away in memory. So hundreds of chars could be copied until the value is loaded and that count value could be say 5.
Lots of work and time
discarded, so you play the odds, perhaps to the low side and over
prefetch to cover being wrong.
On Sat, 12 Oct 2024 18:17:18 +0000, Brett wrote:
Brian G. Lucas <bagel99@gmail.com> wrote:
On 10/12/24 12:06 AM, Brett wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
Can R3 be a const? That causes issues for restartability, but branch
prediction is easier and the code is shorter.
Yes.
#include <string.h>
void memmoverr(char to[], char fm[], size_t cnt)
{
memmove(to, fm, cnt);
}
void memmoverd(char to[], char fm[])
{
memmove(to, fm, 0x100000000);
}
Yields:
memmoverr: ; @memmoverr
mm r1,r2,r3
ret
memmoverd: ; @memmoverd
mm r1,r2,#4294967296
ret
Excellent!
Though I guess forwarding a const is probably a thing today to improve branch prediction, which is normally HORRIBLE for short branch counts.
What is the default virtual loop count if the register count is not
available?
There is always a count available; it can come from a register or an immediate.
Worst case the source and dest are in cache, and the count is 150 cycles
away in memory. So hundreds of chars could be copied until the value is
loaded and that count value could be say 5.
The instruction cannot start until the count is known. You don't start
an FMAC until all 3 operands are ready, either.
Lots of work and time
discarded, so you play the odds, perhaps to the low side and over
prefetch to cover being wrong.
On Sat, 12 Oct 2024 18:17:18 +0000, Brett wrote:
[snip]
Worst case the source and dest are in cache, and the count is
150 cycles
away in memory. So hundreds of chars could be copied until the
value is
loaded and that count value could be say 5.
The instruction cannot start until the count is known. You don't start
an FMAC until all 3 operands are ready, either.
David Brown <david.brown@hesbynett.no> wrote:
On 12/10/2024 01:32, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 22:02:32 +0000, David Brown wrote:
On 11/10/2024 20:55, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
You are either totally clueless, or you are trolling. And I know you are not clueless.
This discussion has become pointless.
The point is that there are a few things that may be hard to do
with {decode, pipeline, calculations, specifications...}; but
because they are so universally needed; these, too, should
"get into ISA".
One good reason to put them in ISA is to preserve the programmers
efforts over decades, so they don't have to re-write libc every-
time a new set of instructions come out.
Moving an arbitrary amount of memory from point a to point b
happens to fall into that universal need. Setting an arbitrary
amount of memory to a value also falls into that universal
need.
Again, I have to ask - do you bother to read the posts you reply to?
Are you interested in replying, and engaging in the discussion? Or are
you just looking for a chance to promote your own architecture, no
matter how tenuous the connection might be to other posts?
Again, let me say that I agree with what you are saying - I agree that
an ISA should have instructions that are efficient for what people
actually want to do. I agree that it is a good thing to have
instructions that let performance scale with advances in hardware
ideally without needing changes in compiled binaries, and at least
without needing changes in source code.
I believe there is an interesting discussion to be had here, and I would
enjoy hearing about comparisons of different ways things functions like
memcpy() and memset() can be implemented in different architectures and
optimised for different sizes, or how scalable vector instructions can
work in comparison to fixed-size SIMD instructions.
But at the moment, this potential is lost because you are posting total
shite about implementing memmove() in standard C. It is disappointing
that someone with your extensive knowledge and experience cannot see
this. I am finding it all very frustrating.
In short your complaints are wrong headed in not understanding what
hardware memcpy can do.
On Sat, 12 Oct 2024 5:06:05 +0000, Brett wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
Can R3 be a const? That causes issues for restartability, but branch prediction is easier and the code is shorter.
The 3rd Operand can, indeed, be a constant.
That causes no restartability problem when you have a place to
store the current count==index, so that when control returns
and you re-execute MM, it sees that x amount has already been
done, and C-X is left.
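[A rough software analogy, not how My 66000 actually implements it: the
operation's progress lives outside the code that performs it, so after an
interruption the same call simply resumes from the saved index instead of
starting over.]

#include <stddef.h>

struct mm_progress { size_t done; };   /* analogue of the saved count/index */

/* Copy at most 'budget' bytes per call; call again until it returns 1. */
static int mm_resume(struct mm_progress *p, unsigned char *dst,
                     const unsigned char *src, size_t total, size_t budget)
{
    size_t left = total - p->done;
    size_t now = left < budget ? left : budget;
    for (size_t i = 0; i < now; i++)
        dst[p->done + i] = src[p->done + i];
    p->done += now;
    return p->done == total;
}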
On 11/10/2024 14:13, Michael S wrote:
On Fri, 11 Oct 2024 13:37:03 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 10/10/2024 23:19, Brian G. Lucas wrote:
Not applicable.
I don't understand what you mean by that. /What/ is not applicable
to /what/ ?
Brian probably meant to say that it is not applicable to his
my66k LLVM back end.
But I am pretty sure that what you suggest is applicable, but a bad
idea for a memcpy/memmove routine that targets Arm+SVE.
Dynamic dispatch based on concrete core features/identification,
i.e. exactly the same mechanism that is done on "non-scalable" architectures, would provide better performance. And memcpy/memmove
is certainly sufficiently important to justify an additional
development effort.
That explanation helps a little, but only a little. I wasn't
suggesting anything - or if I was, it was several posts ago and the
context has long since been snipped.
Can you be more explicit about
what you think I was suggesting, and why it might not be a good idea
for targeting a "my66k" ISA? (That is not a processor I have heard
of, so you'll have to give a brief summary of any particular features
that are relevant here.)
On 2024-10-12 21:33, Brett wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 12/10/2024 01:32, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 22:02:32 +0000, David Brown wrote:
On 11/10/2024 20:55, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
You are either totally clueless, or you are trolling. And I
know you are not clueless.
This discussion has become pointless.
The point is that there are a few things that may be hard to do
with {decode, pipeline, calculations, specifications...}; but
because they are so universally needed; these, too, should
"get into ISA".
One good reason to put them in ISA is to preserve the programmers
efforts over decades, so they don't have to re-write libc every-
time a new set of instructions come out.
Moving an arbitrary amount of memory from point a to point b
happens to fall into that universal need. Setting an arbitrary
amount of memory to a value also falls into that universal
need.
Again, I have to ask - do you bother to read the posts you reply
to? Are you interested in replying, and engaging in the
discussion? Or are you just looking for a chance to promote your
own architecture, no matter how tenuous the connection might be to
other posts?
Again, let me say that I agree with what you are saying - I agree
that an ISA should have instructions that are efficient for what
people actually want to do. I agree that it is a good thing to
have instructions that let performance scale with advances in
hardware ideally without needing changes in compiled binaries, and
at least without needing changes in source code.
I believe there is an interesting discussion to be had here, and I
would enjoy hearing about comparisons of different ways things
functions like memcpy() and memset() can be implemented in
different architectures and optimised for different sizes, or how
scalable vector instructions can work in comparison to fixed-size
SIMD instructions.
But at the moment, this potential is lost because you are posting
total shite about implementing memmove() in standard C. It is
disappointing that someone with your extensive knowledge and
experience cannot see this. I am finding it all very frustrating.
[ snip discussion of HW ]
In short your complaints are wrong headed in not understanding what hardware memcpy can do.
I think your reply proves David's complaint: you did not read, or did
not understand, what David is frustrated about. The true fact that
David is defending is that memmove() cannot be implemented
"efficiently" in /standard/ C source code, on /any/ HW, because it
would require comparing /C pointers/ that point to potentially
different /C objects/, which is not defined behavior in standard C,
whether compiled to machine code, or executed by an interpreter of C
code, or executed by a human programmer performing what was called
"desk testing" in the 1960s.
Obviously memmove() can be implemented efficently in non-standard C
where such pointers can be compared, or by sequences of ordinary ALU instructions, or by dedicated instructions such as Mitch's MM, and
David is not disputing that. But Mitch seems not to understand or not
to see the issue about standard C vs memmove().
On Sun, 13 Oct 2024 10:31:49 +0300
Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:
On 2024-10-12 21:33, Brett wrote:
David Brown <david.brown@hesbynett.no> wrote:
But at the moment, this potential is lost because you are posting
total shite about implementing memmove() in standard C. It is
disappointing that someone with your extensive knowledge and
experience cannot see this. I am finding it all very frustrating.
[ snip discussion of HW ]
In short your complaints are wrong headed in not understanding what
hardware memcpy can do.
I think your reply proves David's complaint: you did not read, or did
not understand, what David is frustrated about. The true fact that
David is defending is that memmove() cannot be implemented
"efficiently" in /standard/ C source code, on /any/ HW, because it
would require comparing /C pointers/ that point to potentially
different /C objects/, which is not defined behavior in standard C,
whether compiled to machine code, or executed by an interpreter of C
code, or executed by a human programmer performing what was called
"desk testing" in the 1960s.
Obviously memmove() can be implemented efficently in non-standard C
where such pointers can be compared, or by sequences of ordinary ALU
instructions, or by dedicated instructions such as Mitch's MM, and
David is not disputing that. But Mitch seems not to understand or not
to see the issue about standard C vs memmove().
A sufficiently advanced compiler can recognize patterns and replace them
with built-in sequences.
In the case of memmove(), the most easily recognizable pattern in 100%
standard C99 appears to be:
void *memmove( void *dest, const void *src, size_t count)
{
    if (count > 0) {
        char tmp[count];
        memcpy(tmp, src, count);
        memcpy(dest, tmp, count);
    }
    return dest;
}
I don't suggest that the real implementation in Brian's compiler is like
that. Much more likely his implementation uses non-standard C and looks approximately like:
void *memmove(void *dest, const void *src, size_t count) {
    return __builtin_memmove(dest, src, count);
}
However, implementing the first variant efficiently is well within
the abilities of a good compiler.
On 12.10.24 17:16, David Brown wrote:
[snip rant]
You are aware that this is c.arch, not c.lang.c?
David Brown <david.brown@hesbynett.no> wrote:
On 12/10/2024 01:32, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 22:02:32 +0000, David Brown wrote:
On 11/10/2024 20:55, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
You are either totally clueless, or you are trolling. And I know you are not clueless.
This discussion has become pointless.
The point is that there are a few things that may be hard to do
with {decode, pipeline, calculations, specifications...}; but
because they are so universally needed; these, too, should
"get into ISA".
One good reason to put them in ISA is to preserve the programmers
efforts over decades, so they don't have to re-write libc every-
time a new set of instructions come out.
Moving an arbitrary amount of memory from point a to point b
happens to fall into that universal need. Setting an arbitrary
amount of memory to a value also falls into that universal
need.
Again, I have to ask - do you bother to read the posts you reply to?
Are you interested in replying, and engaging in the discussion? Or are
you just looking for a chance to promote your own architecture, no
matter how tenuous the connection might be to other posts?
Again, let me say that I agree with what you are saying - I agree that
an ISA should have instructions that are efficient for what people
actually want to do. I agree that it is a good thing to have
instructions that let performance scale with advances in hardware
ideally without needing changes in compiled binaries, and at least
without needing changes in source code.
I believe there is an interesting discussion to be had here, and I would
enjoy hearing about comparisons of different ways things functions like
memcpy() and memset() can be implemented in different architectures and
optimised for different sizes, or how scalable vector instructions can
work in comparison to fixed-size SIMD instructions.
But at the moment, this potential is lost because you are posting total
shite about implementing memmove() in standard C. It is disappointing
that someone with your extensive knowledge and experience cannot see
this. I am finding it all very frustrating.
There are only two decisions to make in memcpy: are the copies less than
copy-size aligned, and do the pointers overlap within a copy size.
For hardware this simplifies down to perhaps two types of copies, easy and hard.
If you make hard fast, and you will, then two versions is all you need, not the dozens of choices with 1k of code you need in C.
Often you know which of the two you want at compile time from the pointer type.
In short your complaints are wrong headed in not understanding what
hardware memcpy can do.
On Fri, 11 Oct 2024 16:54:13 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 11/10/2024 14:13, Michael S wrote:
On Fri, 11 Oct 2024 13:37:03 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 10/10/2024 23:19, Brian G. Lucas wrote:
Not applicable.
I don't understand what you mean by that. /What/ is not applicable
to /what/ ?
Brian probably meant to say that it is not applicable to his
my66k LLVM back end.
But I am pretty sure that what you suggest is applicable, but a bad
idea for a memcpy/memmove routine that targets Arm+SVE.
Dynamic dispatch based on concrete core features/identification,
i.e. exactly the same mechanism that is done on "non-scalable"
architectures, would provide better performance. And memcpy/memmove
is certainly sufficiently important to justify an additional
development effort.
That explanation helps a little, but only a little. I wasn't
suggesting anything - or if I was, it was several posts ago and the
context has long since been snipped.
You suggested that "scalable" vector extensions are preferable for memcpy/memmove implementation over "non-scalable" SIMD.
Can you be more explicit about
what you think I was suggesting, and why it might not be a good idea
for targeting a "my66k" ISA? (That is not a processor I have heard
of, so you'll have to give a brief summary of any particular features
that are relevant here.)
The proper spelling appears to be My 66000.
For starters, My 66000 has no SIMD. It does not even have a dedicated FP register file. Both FP and Int share a common 32x64-bit register space.
More importantly, it has a dedicated instruction with exactly the same
semantics as memmove(). Pretty much the same as ARM64. In both cases the instruction is defined, but not yet implemented in production silicon.
The difference is that in the case of ARM64 we can be reasonably sure that eventually it will be implemented in production silicon. Which means
that in at least several out of a multitude of implementations it will
suck.
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never". Pretty much the
only example people ever give of needing such comparisons is to
implement memmove() efficiently - but you don't need to implement
memmove(), because it is already in the standard library.
An interesting case is the Forth standard. It specifies "contiguous
regions", which correspond to objects in C, but in Forth each address
is a cell and can be added, subtracted, compared, etc. irrespective of
where it came from. So Forth really has a flat-memory model. It has
had that since its origins in the 1970s. Some of the 8086
implementations had some extensions for dealing with more than 64KB,
but they were never standardized and are now mostly forgotten.
Forth does not require a flat memory model in the hardware, as far as I
am aware, any more than C does. (I appreciate that your knowledge of
Forth is /vastly/ greater than mine.) A Forth implementation could interpret part of the address value as the segment or other memory block identifier and part of it as an index into that block, just as a C implementation can.
On Sat, 12 Oct 2024 18:32:48 +0000, mitchalsup@aol.com (MitchAlsup1) wrote:
[snip memory copy instruction]
The 3rd Operand can, indeed, be a constant.
That causes no restartability problem when you have a place to
store the current count==index, so that when control returns
and you re-execute MM, it sees that x amount has already been
done, and C-X is left.
I don't understand this paragraph.
Does a constant as the 3rd operand cause a restartability problem?
Or does it not?
If it does not, then how?
Do you have a private field in thread state? Saved on stack by
interrupt uCode?
OS people would not like it. They prefer to have full control even when
they don't use it 99.999% of the time.
On 10/10/2024 20:38, MitchAlsup1 wrote:
What you are missing here, David, is the fact that Mitch's MM is a single instruction which does the entire memmove() operation, and has the
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects? For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard library memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you write
an efficient memmove() in standard C. That's why I said there was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly or
using inline assembly, rather than in non-portable C (which is the
common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up that
is proportionally more costly for small transfers. Often that can be eliminated when the compiler optimises the functions inline - when the compiler knows the size of the move/copy, it can optimise directly.
On 12/10/2024 19:26, Bernd Linsel wrote:
On 12.10.24 17:16, David Brown wrote:
[snip rant]
You are aware that this is c.arch, not c.lang.c?
Absolutely, yes.
But in a thread branch discussing C, details of C are relevant.
I don't expect any random regular here to know "language lawyer" details
of the C standards. I don't expect people here to care about them.
People in comp.lang.c care about them - for people here, the main
interest in C is for programs to run on the computer architectures that
are the real interest.
But if someone engages in a conversation about C, I /do/ expect them to understand some basics, and I /do/ expect them to read and think about
what other posters write. The point under discussion was that you
cannot implement an efficient "memmove()" function in fully portable standard C. That's a fact - it is a well-established fact. Another
clear and inarguable fact is that particular ISAs or implementations are completely irrelevant to fully portable standard C - that is both the advantage and the disadvantage of writing code in portable standard C.
All I am asking Mitch to do is to understand this, and to stop saying
silly things (such as implementing memmove() by calling memmove(), or
that the /reason/ you can't implement memmove() efficiently in portable standard C is weaknesses in the x86 ISA), so that we can clear up his misunderstandings and move on to the more interesting computer
architecture discussions.
David Brown <david.brown@hesbynett.no> wrote:
<snip>
All I am asking Mitch to do is to understand this, and to stop saying
silly things (such as implementing memmove() by calling memmove(), or
that the /reason/ you can't implement memmove() efficiently in portable
standard C is weaknesses in the x86 ISA), so that we can clear up his
misunderstandings and move on to the more interesting computer
architecture discussions.
My 66000 only has one MM instruction because when you throw enough hardware at the problem, one instruction is all you need.
And it also covers MemCopy, and yes there is a backwards copy version.
I detailed the hardware to do this several years ago on Real World Tech.
Brett <ggtgp@yahoo.com> writes:
David Brown <david.brown@hesbynett.no> wrote:
<snip>
All I am asking Mitch to do is to understand this, and to stop
saying silly things (such as implementing memmove() by calling
memmove(), or that the /reason/ you can't implement memmove()
efficiently in portable standard C is weaknesses in the x86 ISA),
so that we can clear up his misunderstandings and move on to the
more interesting computer architecture discussions.
My 66000 only has one MM instruction because when you throw enough
hardware at the problem, one instruction is all you need.
And it also covers MemCopy, and yes there is a backwards copy
version.
I detailed the hardware to do this several years ago on Real World
Tech.
Such hardware (memcpy/memmove/memfill) was available in 1965 on the
Burroughs medium systems mainframes. In the 80s, support was added
for hashing strings as well.
It's not a new concept. In fact, there were some tricks that could
be used with overlapping source and destination buffers that would
replicate chunks of data.
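[Illustrative sketch of that overlap trick, not from the post: copying
forward, byte by byte, from an address that trails the destination by the
pattern length replicates the pattern across the buffer. Plain memcpy()
cannot be used for the overlapped part, since overlapping arguments are
undefined, and memmove() deliberately avoids the replication effect.]

#include <stddef.h>
#include <string.h>

static void fill_pattern(unsigned char *buf, size_t buf_len,
                         const unsigned char *pat, size_t pat_len)
{
    if (pat_len == 0 || buf_len < pat_len)
        return;
    memcpy(buf, pat, pat_len);                  /* seed the first copy            */
    for (size_t i = pat_len; i < buf_len; i++)  /* forward overlapped byte copy:  */
        buf[i] = buf[i - pat_len];              /* each byte re-reads the pattern */
}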
On Sun, 13 Oct 2024 10:31:49 +0300
Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:
On 2024-10-12 21:33, Brett wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 12/10/2024 01:32, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 22:02:32 +0000, David Brown wrote:
On 11/10/2024 20:55, MitchAlsup1 wrote:
On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:
Do you think you can just write this :
void * memmove(void * s1, const void * s2, size_t n)
{
return memmove(s1, s2, n);
}
in your library's source?
.global memmove
memmove:
MM R2,R1,R3
RET
sure !
You are either totally clueless, or you are trolling. And I
know you are not clueless.
This discussion has become pointless.
The point is that there are a few things that may be hard to do
with {decode, pipeline, calculations, specifications...}; but
because they are so universally needed; these, too, should
"get into ISA".
One good reason to put them in ISA is to preserve the programmers'
efforts over decades, so they don't have to re-write libc every
time a new set of instructions comes out.
Moving an arbitrary amount of memory from point a to point b
happens to fall into that universal need. Setting an arbitrary
amount of memory to a value also falls into that universal
need.
Again, I have to ask - do you bother to read the posts you reply
to? Are you interested in replying, and engaging in the
discussion? Or are you just looking for a chance to promote your
own architecture, no matter how tenuous the connection might be to
other posts?
Again, let me say that I agree with what you are saying - I agree
that an ISA should have instructions that are efficient for what
people actually want to do. I agree that it is a good thing to
have instructions that let performance scale with advances in
hardware ideally without needing changes in compiled binaries, and
at least without needing changes in source code.
I believe there is an interesting discussion to be had here, and I
would enjoy hearing about comparisons of the different ways that
functions like memcpy() and memset() can be implemented in
different architectures and optimised for different sizes, or how
scalable vector instructions can work in comparison to fixed-size
SIMD instructions.
But at the moment, this potential is lost because you are posting
total shite about implementing memmove() in standard C. It is
disappointing that someone with your extensive knowledge and
experience cannot see this. I am finding it all very frustrating.
[ snip discussion of HW ]
In short your complaints are wrong headed in not understanding what
hardware memcpy can do.
I think your reply proves David's complaint: you did not read, or did
not understand, what David is frustrated about. The true fact that
David is defending is that memmove() cannot be implemented
"efficiently" in /standard/ C source code, on /any/ HW, because it
would require comparing /C pointers/ that point to potentially
different /C objects/, which is not defined behavior in standard C,
whether compiled to machine code, or executed by an interpreter of C
code, or executed by a human programmer performing what was called
"desk testing" in the 1960s.
Obviously memmove() can be implemented efficiently in non-standard C
where such pointers can be compared, or by sequences of ordinary ALU
instructions, or by dedicated instructions such as Mitch's MM, and
David is not disputing that. But Mitch seems not to understand or not
to see the issue about standard C vs memmove().
A sufficiently advanced compiler can recognize patterns and replace them
with built-in sequences.
In the case of memmove(), the most easily recognizable pattern in 100%
standard C99 appears to be:
void *memmove(void *dest, const void *src, size_t count)
{
    if (count > 0) {
        char tmp[count];
        memcpy(tmp, src, count);
        memcpy(dest, tmp, count);
    }
    return dest;
}
I don't suggest that the real implementation in Brian's compiler is like
that. Much more likely his implementation uses non-standard C and looks approximately like:
void *memmove(void *dest, const void *src, size_t count)
{
    return __builtin_memmove(dest, src, count);
}
However, implementing the first variant efficiently is well within
the abilities of a good compiler.
David Brown wrote:
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different
objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard library
memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you
write an efficient memmove() in standard C. That's why I said there
was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly or
using inline assembly, rather than in non-portable C (which is the
common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up
that is proportionally more costly for small transfers. Often that
can be eliminated when the compiler optimises the functions inline -
when the compiler knows the size of the move/copy, it can optimise
directly.
What you are missing here David is the fact that Mitch's MM is a single instruction which does the entire memmove() operation, and has the
inside knowledge about cache (residency at level x? width in
bytes)/memory ranges/access rights/etc needed to do so in a very close
to optimal manner, for both short and long transfers.
I.e. totally removing the need for compiler tricks or wide register operations.
Also apropos the compiler library issue:
You start by teaching the compiler about the MM instruction, and to recognize common patterns (just as most compilers already do today), and then the memmove() calls will usually be inlined.
On 13/10/2024 21:21, Terje Mathisen wrote:
David Brown wrote:
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard library
memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you
write an efficient memmove() in standard C. That's why I said there was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly or
using inline assembly, rather than in non-portable C (which is the
common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up
that is proportionally more costly for small transfers. Often that
can be eliminated when the compiler optimises the functions inline - when the compiler knows the size of the move/copy, it can optimise
directly.
What you are missing here David is the fact that Mitch's MM is a
single instruction which does the entire memmove() operation, and has the inside knowledge about cache (residency at level x? width in
bytes)/memory ranges/access rights/etc needed to do so in a very close
to optimal manner, for both short and long transfers.
I am not missing that at all. And I agree that an advanced hardware MM instruction could be a very efficient way to implement both memcpy and memmove. (For my own kind of work, I'd worry about such looping
instructions causing an unbounded increase in interrupt latency, but
that too is solvable given enough hardware effort.)
And I agree that once you have an "MM" (or similar) instruction, you
don't need to re-write the implementation for your memmove() and
memcpy() library functions for every new generation of processors of a given target family.
What I /don't/ agree with is the claim that you /do/ need to keep
re-writing your implementations all the time. You will /sometimes/ get benefits from doing so, but it is not as simple as Mitch made out.
I.e. totally removing the need for compiler tricks or wide register
operations.
Also apropos the compiler library issue:
You start by teaching the compiler about the MM instruction, and to
recognize common patterns (just as most compilers already do today),
and then the memmove() calls will usually be inlined.
The original compiler library issue was that it is impossible to write an efficient memmove() implementation using pure portable standard C. That
is independent of any ISA, any specialist instructions for memory moves,
and any compiler optimisations. And it is independent of the fact that some good compilers can inline at least some calls to memcpy() and
memmove() today, using whatever instructions are most efficient for the target.
David Brown <david.brown@hesbynett.no> writes:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never". Pretty much the
only example people ever give of needing such comparisons is to
implement memmove() efficiently - but you don't need to implement
memmove(), because it is already in the standard library.
When you implement something like, say
vsum(double *a, double *b, double *c, size_t n);
where a, b, and c may point to arrays in different objects, or may
point to overlapping parts of the same object, and the result vector c
in the overlap case should be the same as in the no-overlap case
(similar to memmove()), being able to compare pointers to possibly
different objects also comes in handy.
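As a purely illustrative sketch (not from any post here, and leaning on uintptr_t comparisons that the C standard does not guarantee to be meaningful across different objects), such a vsum() might choose its loop direction like this:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: c[i] = a[i] + b[i] with memmove-like overlap
   handling.  The uintptr_t comparisons are not guaranteed by the C
   standard for pointers into different objects, but behave as expected
   on flat-memory targets. */
void vsum(double *a, double *b, double *c, size_t n)
{
    uintptr_t cu = (uintptr_t) c;
    /* Does the destination start inside one of the source vectors? */
    int dest_inside_src =
        ((uintptr_t) a < cu && cu < (uintptr_t) (a + n)) ||
        ((uintptr_t) b < cu && cu < (uintptr_t) (b + n));

    if (dest_inside_src) {
        for (size_t i = n; i-- > 0; )   /* read each source element     */
            c[i] = a[i] + b[i];         /* before it can be overwritten */
    } else {
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
}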
Another example is when the programmer uses the address as a key in,
e.g., a binary search tree. And, as you write, casting to intptr_t is
not guaranteed to work by the C standard, either.
An example that probably compares pointers to the same object as far
as the C standard is concerned, but feels like different objects to the programmer, is logic variables (in, e.g., a Prolog implementation).
When you have two free variables, and you unify them, in the
implementation one variable points to the other one. Now which should
point to which? The younger variable should point to the older one,
because it will die sooner. How do you know which variable is
younger? You compare the addresses; the variables reside on a stack,
so the younger one is closer to the top.
If that stack is one object as far as the C standard is concerned,
there is no problem with that solution. If the stack is implemented
as several objects (to make it easier growable; I don't know if there
is a Prolog implementation that does that), you first have to check in
which piece it is (maybe with a binary search), and then possibly
compare within the stack piece at hand.
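For the single-stack case, a tiny hypothetical sketch (not from any actual Prolog system; both cells live in one C array, so the comparison is well defined, and the names are made up):

#include <stddef.h>

/* A logic variable is a cell that either points at another cell or is
   unbound (NULL).  All variables live on one stack, here a single C
   array, so comparing the addresses of two cells is defined behaviour. */
typedef struct cell { struct cell *ref; } cell;

static cell var_stack[1024];   /* higher address = nearer the top = younger */

static void unify_free(cell *a, cell *b)   /* a and b point into var_stack */
{
    if (a > b)
        a->ref = b;    /* the younger variable points to the older one */
    else
        b->ref = a;
}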
An interesting case is the Forth standard. It specifies "contiguous
regions", which correspond to objects in C, but in Forth each address
is a cell and can be added, subtracted, compared, etc. irrespective of
where it came from. So Forth really has a flat-memory model. It has
had that since its origins in the 1970s. Some of the 8086
implementations had some extensions for dealing with more than 64KB,
but they were never standardized and are now mostly forgotten.
Forth does not require a flat memory model in the hardware, as far as I
am aware, any more than C does. (I appreciate that your knowledge of
Forth is /vastly/ greater than mine.) A Forth implementation could
interpret part of the address value as the segment or other memory block
identifier and part of it as an index into that block, just as a C
implementation can.
I.e., what you are saying is that one can simulate a flat-memory model
on a segmented memory model.
Certainly. In the case of the 8086 (and
even more so on the 286) the costs of that are so high that no
widely-used Forth system went there.
One can also simulate segmented memory (a natural fit for many
programming languages) on flat memory. In this case the cost is much smaller, plus it gives the maximum flexibility about segment/object
sizes and numbers. That is why flat memory has won.
David Brown wrote:
On 13/10/2024 21:21, Terje Mathisen wrote:
David Brown wrote:
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to different objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in pointers,
rather than having only a valid pointer or NULL. A compiler,
for example, might want to store the fact that an error occurred
while parsing a subexpression as a special pointer constant.
Compilers often have the unfair advantage, though, that they can
rely on what application programmers cannot, their implementation
details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard C
programmer cannot (other than by simply calling the standard
library
memmove() function!).
This is more a symptom of bad ISA design/evolution than of libc
writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let you
write an efficient memmove() in standard C. That's why I said
there was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in assembly
or using inline assembly, rather than in non-portable C (which is
the common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things up
that is proportionally more costly for small transfers. Often that >>>> can be eliminated when the compiler optimises the functions inline -
when the compiler knows the size of the move/copy, it can optimise
directly.
What you are missing here David is the fact that Mitch's MM is a
single instruction which does the entire memmove() operation, and has
the inside knowledge about cache (residency at level x? width in
bytes)/memory ranges/access rights/etc needed to do so in a very
close to optimal manner, for both short and long transfers.
I am not missing that at all. And I agree that an advanced hardware
MM instruction could be a very efficient way to implement both memcpy
and memmove. (For my own kind of work, I'd worry about such looping
instructions causing an unbounded increased in interrupt latency, but
that too is solvable given enough hardware effort.)
And I agree that once you have an "MM" (or similar) instruction, you
don't need to re-write the implementation for your memmove() and
memcpy() library functions for every new generation of processors of a
given target family.
What I /don't/ agree with is the claim that you /do/ need to keep
re-writing your implementations all the time. You will /sometimes/
get benefits from doing so, but it is not as simple as Mitch made out.
I.e. totally removing the need for compiler tricks or wide register
operations.
Also apropos the compiler library issue:
You start by teaching the compiler about the MM instruction, and to
recognize common patterns (just as most compilers already do today),
and then the memmove() calls will usually be inlined.
The original compile library issue was that it is impossible to write
an efficient memmove() implementation using pure portable standard C.
That is independent of any ISA, any specialist instructions for memory
moves, and any compiler optimisations. And it is independent of the
fact that some good compilers can inline at least some calls to
memcpy() and memmove() today, using whatever instructions are most
efficient for the target.
David, you and Mitch are among my most cherished writers here on c.arch,
I really don't think any of us really disagree, it is just that we have
been discussing two (mostly) orthogonal issues.
a) memmove/memcpy are so important that people have been spending a lot
of time & effort trying to make it faster, with the complication that in general it cannot be implemented in pure C (which disallows direct comparison of arbitrary pointers).
b) Mitch has, like Andy ("Crazy") Glew many years before, realized that
if a cpu architecture actually has an instruction designed to do this particular job, it behooves cpu architects to make sure that it is in
fact so fast that it obviates any need for tricky coding to replace it.
Ideally, it should be able to copy a single object, up to a cache line
in size, in the same or less time needed to do so manually with a SIMD 512-bit load followed by a 512-bit store (both ops masked to not touch anything it shouldn't)
REP MOVSB on x86 does the canonical memcpy() operation, originally by
moving single bytes, and this was so slow that we also had REP MOVSW
(moving 16-bit entities) and then REP MOVSD on the 386 and REP MOVSQ on 64-bit cpus.
With a suitable chunk of logic, the basic MOVSB operation could in fact handle any kinds of alignments and sizes, while doing the actual
transfer at maximum bus speeds, i.e. at least one cache line/cycle for things already in $L1.
On 14/10/2024 16:40, Terje Mathisen wrote:
David Brown wrote:
On 13/10/2024 21:21, Terje Mathisen wrote:
David Brown wrote:
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to
different objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in
pointers, rather than having only a valid pointer or
NULL. A compiler, for example, might want to store the
fact that an error occurred while parsing a subexpression
as a special pointer constant.
Compilers often have the unfair advantage, though, that
they can rely on what application programmers cannot, their
implementation details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard
C programmer cannot (other than by simply calling the
standard library memmove() function!).
This is more a symptom of bad ISA design/evolution than of
libc writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let
you write an efficient memmove() in standard C. That's why I
said there was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in
assembly or using inline assembly, rather than in non-portable C
(which is the common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things
up that is proportionally more costly for small transfers.
Often that can be eliminated when the compiler optimises the
functions inline - when the compiler knows the size of the
move/copy, it can optimise directly.
What you are missing here David is the fact that Mitch's MM is a
single instruction which does the entire memmove() operation, and
has the inside knowledge about cache (residency at level x? width
in bytes)/memory ranges/access rights/etc needed to do so in a
very close to optimal manner, for both short and long transfers.
I am not missing that at all. And I agree that an advanced
hardware MM instruction could be a very efficient way to implement
both memcpy and memmove. (For my own kind of work, I'd worry
about such looping instructions causing an unbounded increased in
interrupt latency, but that too is solvable given enough hardware
effort.)
And I agree that once you have an "MM" (or similar) instruction,
you don't need to re-write the implementation for your memmove()
and memcpy() library functions for every new generation of
processors of a given target family.
What I /don't/ agree with is the claim that you /do/ need to keep
re-writing your implementations all the time. You will
/sometimes/ get benefits from doing so, but it is not as simple as
Mitch made out.
I.e. totally removing the need for compiler tricks or wide
register operations.
Also apropos the compiler library issue:
You start by teaching the compiler about the MM instruction, and
to recognize common patterns (just as most compilers already do
today), and then the memmove() calls will usually be inlined.
The original compile library issue was that it is impossible to
write an efficient memmove() implementation using pure portable
standard C. That is independent of any ISA, any specialist
instructions for memory moves, and any compiler optimisations.
And it is independent of the fact that some good compilers can
inline at least some calls to memcpy() and memmove() today, using
whatever instructions are most efficient for the target.
David, you and Mitch are among my most cherished writers here on
c.arch, I really don't think any of us really disagree, it is just
that we have been discussing two (mostly) orthogonal issues.
I agree. It's a "god dag mann, økseskaft" situation.
I have a huge respect for Mitch, his knowledge and experience, and
his willingness to share that freely with others. That's why I have
found this very frustrating.
a) memmove/memcpy are so important that people have been spending a
lot of time & effort trying to make it faster, with the
complication that in general it cannot be implemented in pure C
(which disallows direct comparison of arbitrary pointers).
Yes.
(Unlike memmove(), memcpy() can be implemented in standard C as a
simple byte-copy loop, without needing to compare pointers. But an implementation that copies in larger blocks than a byte requires implementation dependent behaviour to determine alignments, or it
must rely on unaligned accesses being allowed by the implementation.)
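(For concreteness, a minimal sketch of that byte-copy version, with an invented name so it does not clash with the library function:)

#include <stddef.h>

/* Fully standard C: a plain byte copy.  No pointer comparison is needed,
   because the behaviour is only defined when the buffers do not overlap. */
void *my_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}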
b) Mitch have, like Andy ("Crazy") Glew many years before, realized
that if a cpu architecture actually has an instruction designed to
do this particular job, it behooves cpu architects to make sure
that it is in fact so fast that it obviates any need for tricky
coding to replace it.
Yes.
Ideally, it should be able to copy a single object, up to a cache
line in size, in the same or less time needed to do so manually
with a SIMD 512-bit load followed by a 512-bit store (both ops
masked to not touch anything it shouldn't)
Yes.
REP MOVSB on x86 does the canonical memcpy() operation, originally
by moving single bytes, and this was so slow that we also had REP
MOVSW (moving 16-bit entities) and then REP MOVSD on the 386 and
REP MOVSQ on 64-bit cpus.
With a suitable chunk of logic, the basic MOVSB operation could in
fact handle any kinds of alignments and sizes, while doing the
actual transfer at maximum bus speeds, i.e. at least one cache
line/cycle for things already in $L1.
I agree on all of that.
I am quite happy with the argument that suitable hardware can do
these basic operations faster than a software loop or the x86 "rep" instructions.
And I fully agree that these would be useful features
in general-purpose processors.
My only point of contention is that the existence or lack of such instructions does not make any difference to whether or not you can
write a good implementation of memcpy() or memmove() in portable
standard C.
They would make it easier to write efficient
implementations of these standard library functions for targets that
had such instructions - but that would be implementation-specific
code. And that is one of the reasons that C standard library
implementations are tied to the specific compiler and target, and the
writers of these libraries have "superpowers" and are not limited to
standard C.
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard way to compare independent pointers (other than just for equality). Rarely
needing something does not mean /never/ needing it.
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for equality).
Rarely needing something does not mean /never/ needing it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare two segmented
pointers ??
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard way to
compare independent pointers (other than just for equality). Rarely
needing something does not mean /never/ needing it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
The Algol family of block structure gave the illusion that flat was
less necessary and it could all be done with lexical addressing and
block scoping rules.
Then malloc() and mmap() came along.
On Mon, 14 Oct 2024 19:02:51 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for equality).
Rarely needing something does not mean /never/ needing it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
That's their problem. The rest of the C world shouldn't suffer because
of odd birds.
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard way to
compare independent pointers (other than just for equality). Rarely
needing something does not mean /never/ needing it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
Depends. On the Burroughs mainframe there could be eight
active segments and the segment number was part of the pointer.
Pointers were 32-bits (actually 8 BCD digits)
According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
If you look at the 8086 manuals, that's clearly what they had in mind.
What I don't get is that the 286's segment stuff was so slow.
It had to load the whole segment descriptor from RAM and possibly
perform some additional setup.
Right, and they appeared not to care or realize it was a performance
problem.
On Mon, 14 Oct 2024 19:20:42 +0000, Michael S wrote:
On Mon, 14 Oct 2024 19:02:51 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for
equality). Rarely needing something does not mean /never/ needing
it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
That's their problem. The rest of the C world shouldn't suffer
because of odd birds.
So, you are saying that the 286 in its heyday was/is odd ?!?
On Mon, 14 Oct 2024 17:19:40 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 14/10/2024 16:40, Terje Mathisen wrote:
David Brown wrote:
On 13/10/2024 21:21, Terje Mathisen wrote:
David Brown wrote:
On 10/10/2024 20:38, MitchAlsup1 wrote:
On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
On 09/10/2024 23:37, MitchAlsup1 wrote:
On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
On 09/10/2024 20:10, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
When would you ever /need/ to compare pointers to
different objects?
For almost all C programmers, the answer is "never".
Sometimes, it is handy to encode certain conditions in
pointers, rather than having only a valid pointer or
NULL. A compiler, for example, might want to store the
fact that an error occurred while parsing a subexpression
as a special pointer constant.
Compilers often have the unfair advantage, though, that
they can rely on what application programmers cannot, their
implementation details. (Some do not, such as f2c).
Standard library authors have the same superpowers, so that they can
implement an efficient memmove() even though a pure standard
C programmer cannot (other than by simply calling the
standard library memmove() function!).
This is more a symptom of bad ISA design/evolution than of
libc writers needing superpowers.
No, it is not. It has absolutely /nothing/ to do with the ISA.
For example, if ISA contains an MM instruction which is the
embodiment of memmove() then absolutely no heroics are needed
or desired in the libc call.
The existence of a dedicated assembly instruction does not let
you write an efficient memmove() in standard C. That's why I
said there was no connection between the two concepts.
For some targets, it can be helpful to write memmove() in
assembly or using inline assembly, rather than in non-portable C
(which is the common case).
Thus, it IS a symptom of ISA evolution that one has to rewrite
memmove() every time wider SIMD registers are available.
It is not that simple.
There can often be trade-offs between the speed of memmove() and
memcpy() on large transfers, and the overhead in setting things
up that is proportionally more costly for small transfers.
Often that can be eliminated when the compiler optimises the
functions inline - when the compiler knows the size of the
move/copy, it can optimise directly.
What you are missing here David is the fact that Mitch's MM is a
single instruction which does the entire memmove() operation, and
has the inside knowledge about cache (residency at level x? width
in bytes)/memory ranges/access rights/etc needed to do so in a
very close to optimal manner, for both short and long transfers.
I am not missing that at all. And I agree that an advanced
hardware MM instruction could be a very efficient way to implement
both memcpy and memmove. (For my own kind of work, I'd worry
about such looping instructions causing an unbounded increased in
interrupt latency, but that too is solvable given enough hardware
effort.)
And I agree that once you have an "MM" (or similar) instruction,
you don't need to re-write the implementation for your memmove()
and memcpy() library functions for every new generation of
processors of a given target family.
What I /don't/ agree with is the claim that you /do/ need to keep
re-writing your implementations all the time. You will
/sometimes/ get benefits from doing so, but it is not as simple as
Mitch made out.
I.e. totally removing the need for compiler tricks or wide
register operations.
Also apropos the compiler library issue:
You start by teaching the compiler about the MM instruction, and
to recognize common patterns (just as most compilers already do
today), and then the memmove() calls will usually be inlined.
The original compile library issue was that it is impossible to
write an efficient memmove() implementation using pure portable
standard C. That is independent of any ISA, any specialist
instructions for memory moves, and any compiler optimisations.
And it is independent of the fact that some good compilers can
inline at least some calls to memcpy() and memmove() today, using
whatever instructions are most efficient for the target.
David, you and Mitch are among my most cherished writers here on
c.arch, I really don't think any of us really disagree, it is just
that we have been discussing two (mostly) orthogonal issues.
I agree. It's a "god dag mann, økseskaft" situation.
I have a huge respect for Mitch, his knowledge and experience, and
his willingness to share that freely with others. That's why I have
found this very frustrating.
a) memmove/memcpy are so important that people have been spending a
lot of time & effort trying to make it faster, with the
complication that in general it cannot be implemented in pure C
(which disallows direct comparison of arbitrary pointers).
Yes.
(Unlike memmove(), memcpy() can be implemented in standard C as a
simple byte-copy loop, without needing to compare pointers. But an
implementation that copies in larger blocks than a byte requires
implementation dependent behaviour to determine alignments, or it
must rely on unaligned accesses being allowed by the implementation.)
b) Mitch have, like Andy ("Crazy") Glew many years before, realized
that if a cpu architecture actually has an instruction designed to
do this particular job, it behooves cpu architects to make sure
that it is in fact so fast that it obviates any need for tricky
coding to replace it.
Yes.
Ideally, it should be able to copy a single object, up to a cache
line in size, in the same or less time needed to do so manually
with a SIMD 512-bit load followed by a 512-bit store (both ops
masked to not touch anything it shouldn't)
Yes.
REP MOVSB on x86 does the canonical memcpy() operation, originally
by moving single bytes, and this was so slow that we also had REP
MOVSW (moving 16-bit entities) and then REP MOVSD on the 386 and
REP MOVSQ on 64-bit cpus.
With a suitable chunk of logic, the basic MOVSB operation could in
fact handle any kinds of alignments and sizes, while doing the
actual transfer at maximum bus speeds, i.e. at least one cache
line/cycle for things already in $L1.
I agree on all of that.
I am quite happy with the argument that suitable hardware can do
these basic operations faster than a software loop or the x86 "rep"
instructions.
No, that's not true. And according to my understanding, that's not what
Terje wrote.
REP MOVSB _is_ an almost ideal instruction for memcpy (modulo minor
details - fixed registers for src, dest, len and Direction flag in PSW instead of being part of the opcode).
REP MOVSW/D/Q were introduced because back then processors were small
and stupid. When your processor is big and smart you don't need them
any longer. REP MOVSB is sufficient.
New Arm64 instruction that are hopefully coming next year are akin to
REP MOVSB rather than to MOVSW/D/Q.
Instructions for memmove, also defined by Arm and by Mitch, are the next logical step. IMHO, the main gain here is not a measurable improvement in performance, but a saving of code size when inlined.
Now, is all that a good idea?
I am not 100% convinced.
One can argue that streaming alignment hardware that is necessary for 1st-class implementation of these instructions is useful not only for
memory copy.
So, maybe, it makes sense to expose this hardware in more generic ways.
Maybe via Load Multiple Register? It was present in Arm's A32/T32,
but didn't make it into ARM64. Or, maybe, there are even better ways
that I was not thinking about.
And I fully agree that these would be useful features
in general-purpose processors.
My only point of contention is that the existence or lack of such
instructions does not make any difference to whether or not you can
write a good implementation of memcpy() or memmove() in portable
standard C.
You are moving a goalpost.
One does not need "good implementation" in a sense you have in mind.
All one needs is an implementation that pattern matching logic of
compiler unmistakably recognizes as memove/memcpy. That is very easily
done in standard C. For memmove, I had shown how to do it in one of the
posts below. For memcpy its very obvious, so no need to show.
They would make it easier to write efficient
implementations of these standard library functions for targets that
had such instructions - but that would be implementation-specific
code. And that is one of the reasons that C standard library
implementations are tied to the specific compiler and target, and the
writers of these libraries have "superpowers" and are not limited to
standard C.
According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
If you look at the 8086 manuals, that's clearly what they had in
mind.
What I don't get is that the 286's segment stuff was so slow.
It had to load the whole segment descriptor from RAM and possibly
perform some additional setup.
Right, and they appeared not to care or realize it was a performance
problem.
They didn't even do obvious things like see if you're reloading the
same value into the segment register and skip the rest of the setup.
Sure, you could put checks in your code and skip the segment load but
that would make your code a lot bigger and uglier.
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard way to
compare independent pointers (other than just for equality). Rarely
needing something does not mean /never/ needing it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
On 14/10/2024 21:02, MitchAlsup1 wrote:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for
equality). Rarely needing something does not mean /never/ needing
it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
void * p = ...
void * q = ...
uintptr_t pu = (uintptr_t) p;
uintptr_t qu = (uintptr_t) q;
if (pu > qu) {
...
} else if (pu < qu) {
...
} else {
...
}
If your comparison needs to actually match up with the real virtual addresses, then this will not work. But does that actually matter?
Think about using this comparison for memmove().
Consider where these pointers come from. Maybe they are pointers to statically allocated data. Then you would expect the segment to be
the same in each case, and the uintptr_t comparison will be fine for memmove(). Maybe they come from malloc() and are in different
segments. Then the comparison here might not give the same result as
a full virtual address comparison - but that does not matter. If the pointers came from different mallocs, they could not overlap and
memmove() can run either direction.
The same applies to other uses, such as indexing in a binary search
tree or a hash map - the comparison above will be correct when it
matters.
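Putting that argument into code, a hedged sketch (invented name; the uintptr_t ordering is not guaranteed by the standard, but in the cases where it could disagree with the "real" address order the buffers cannot overlap anyway):

#include <stddef.h>
#include <stdint.h>

void *my_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t) d < (uintptr_t) s) {
        for (size_t i = 0; i < n; i++)      /* copy forwards  */
            d[i] = s[i];
    } else {
        for (size_t i = n; i-- > 0; )       /* copy backwards */
            d[i] = s[i];
    }
    return dest;
}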
On Tue, 15 Oct 2024 12:38:40 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 14/10/2024 21:02, MitchAlsup1 wrote:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for
equality). Rarely needing something does not mean /never/ needing
it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
void * p = ...
void * q = ...
uintptr_t pu = (uintptr_t) p;
uintptr_t qu = (uintptr_t) q;
if (pu > qu) {
...
} else if (pu < qu) {
...
} else {
...
}
If your comparison needs to actually match up with the real virtual
addresses, then this will not work. But does that actually matter?
Think about using this comparison for memmove().
Consider where these pointers come from. Maybe they are pointers to
statically allocated data. Then you would expect the segment to be
the same in each case, and the uintptr_t comparison will be fine for
memmove(). Maybe they come from malloc() and are in different
segments. Then the comparison here might not give the same result as
a full virtual address comparison - but that does not matter. If the
pointers came from different mallocs, they could not overlap and
memmove() can run either direction.
The same applies to other uses, such as indexing in a binary search
tree or a hash map - the comparison above will be correct when it
matters.
It's all fine as long as there are no objects bigger than 64KB.
But with 16MB of virtual memory and with several* MB of physical memory
one does want objects that are bigger than 64KB!
On 15/10/2024 13:22, Michael S wrote:
On Tue, 15 Oct 2024 12:38:40 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 14/10/2024 21:02, MitchAlsup1 wrote:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for
equality). Rarely needing something does not mean /never/ needing
it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare to segmented
pointers ??
void * p = ...
void * q = ...
uintptr_t pu = (uintptr_t) p;
uintptr_t qu = (uintptr_t) q;
if (pu > qu) {
...
} else if (pu < qu) {
...
} else {
...
}
If your comparison needs to actually match up with the real virtual
addresses, then this will not work. But does that actually matter?
Think about using this comparison for memmove().
Consider where these pointers come from. Maybe they are pointers to
statically allocated data. Then you would expect the segment to be
the same in each case, and the uintptr_t comparison will be fine for
memmove(). Maybe they come from malloc() and are in different
segments. Then the comparison here might not give the same result as
a full virtual address comparison - but that does not matter. If the
pointers came from different mallocs, they could not overlap and
memmove() can run either direction.
The same applies to other uses, such as indexing in a binary search
tree or a hash map - the comparison above will be correct when it
matters.
It's all fine for as long as there are no objects bigger than 64KB.
But with 16MB of virtual memory and with several* MB of physical memory
one does want objects that are bigger than 64KB!
I don't know how such objects would be allocated and addressed in such a system. (I didn't do much DOS/Win16 programming, and on the few
occasions when I needed structures bigger than 64KB in total, they were structured in multiple levels.)
But I would expect that in almost any practical system where you can use "p++" to step through big arrays, you can also convert the pointer to a uintptr_t and compare as shown above.
The exceptions would be systems where pointers hold more than just addresses, such as access control information or bounds that mean they
are larger than the largest integer type on the target.
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
In an ideal world, it would be better if we could define `malloc` and `memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
Stefan
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
One of the key ways C got into the minds of programmers was that
one could write stuff like printf() in C and NOT need to have it
entirely built-into the language.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped it from the std ??
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
One of the key ways C got into the minds of programmers was that
one could write stuff like printf() in C and NOT need to have it
entirely built-into the language.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped it from the std ??
It still is part of the ISO C standard.
https://pubs.opengroup.org/onlinepubs/9799919799/functions/malloc.html
POSIX adds some extensions (marked 'CX').
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
In an ideal world, it would be better if we could define `malloc` and `memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
One of the key ways C got into the minds of programmers was that
one could write stuff like printf() in C and NOT need to have it
entirely built-into the language.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped it from the std ??
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
I don't see an advantage in being able to implement them in standard C.
The reason why you might want your own special memmove, or your own special malloc, is that you are doing niche and specialised software.
On Tue, 15 Oct 2024 22:05:56 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
One of the key ways C got into the minds of programmers was that
one could write stuff like printf() in C and NOT need to have it entirely built-into the language.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped it from the std ??
It still is part of the ISO C standard.
The paragraph with 3 >'s indicates malloc() cannot be written
in std. C. It used to be written in std. K&R C.
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
I don't see an advantage in being able to implement them in standard C.
It means you can likely also implement a related yet different API
without having your code "demoted" to non-standard.
E.g. say your application wants to use region/pool/zone-based
memory management.
The fact that malloc can't be implemented in standard C is evidence
that standard C may not be general-purpose enough to accommodate an application that wants to use a custom-designed allocator.
I don't disagree with you, from a practical perspective:
- in practice, C serves us well for Emacs's GC, even though that can't
be written in standard C.
- it's not like there are lots of other languages out there that offer
you portability together with the ability to define your own `malloc`.
But it's still a weakness, just a fairly minor one.
The reason why you might want your own special memmove, or your own special malloc, is that you are doing niche and specialised software.
Region/pool/zone-based memory management is common enough that I would
not call it "niche", FWIW, and it's also used in applications that do want portability (GCC and Apache come to mind).
Can't think of a practical reason to implement my own `memmove`, OTOH.
Stefan
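For what it is worth, the region/arena case is also one that can be written in portable C, because every pointer stays inside one char array; a minimal C11 sketch (all names invented):

#include <stddef.h>

typedef struct {
    unsigned char *base;    /* start of the region's backing storage */
    size_t         size;    /* total size of the region              */
    size_t         used;    /* bytes handed out so far               */
} arena;

static void *arena_alloc(arena *a, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t p = (a->used + align - 1) / align * align;   /* round up */
    if (p > a->size || n > a->size - p)
        return NULL;                                     /* region exhausted */
    a->used = p + n;
    return a->base + p;
}

static void arena_reset(arena *a)   /* "frees" everything in one go */
{
    a->used = 0;
}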
The paragraph with 3 >'s indicates malloc() cannot be written
in std. C. It used to be written in std. K&R C. I am not asking
if it is still in the std libraries, I am asking what happened
to make it impossible to write malloc() in std. C ?!?
MitchAlsup1 <mitchalsup@aol.com> schrieb:
The paragraaph with 3 >'s indicates malloc() cannot be written
in std. C. It used to be written in std. K&R C. I am not asking
if it is still in the std libraries, I am asking what happened
to make it impossible to write malloc() in std. C ?!?
You need to reserve memory in some way from the operating system,
which is, by necessity, outside of the scope of C (via brk(),
GETMAIN, mmap() or whatever).
But more problematic is the implementation of free() without
knowing how to compare pointers.
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 22:05:56 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
One of the key ways C got into the minds of programmers was that
one could write stuff like printf() in C and NOT need to have it
entirely built-into the language.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped it from the std ??
It still is part of the ISO C standard.
The paragraph with 3 >'s indicates malloc() cannot be written
in std. C. It used to be written in std. K&R C.
K&R may have been 'de facto' standard C, but not 'de jure'.
Unix V6 malloc used the 'brk' system call to allocate space
for the heap. Later versions used 'sbrk'.
Those are both kernel system calls.
It's a very good philosophy in programming language design that the core language should only contain what it has to contain - if a desired
feature can be put in a library and be equally efficient and convenient
to use, then it should be in the standard library, not the core
language. It is much easier to develop, implement, enhance, adapt, and otherwise change things in libraries than the core language.
And it is also fine, IMHO, that some things in the standard library need non-standard C - the standard library is part of the implementation.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped if from the std ??
The function has always been available in C since the language was standardised, and AFAIK it was in K&R C. But no one (in authority) ever claimed it could be implemented purely in standard C. What do you think
has changed?
MitchAlsup1 <mitchalsup@aol.com> schrieb:
The paragraaph with 3 >'s indicates malloc() cannot be written
in std. C. It used to be written in std. K&R C. I am not asking
if it is still in the std libraries, I am asking what happened
to make it impossible to write malloc() in std. C ?!?
You need to reserve memory by some way from the operating system,
which is, by necessity, outside of the scope of C (via brk(),
GETMAIN, mmap() or whatever).
But more problematic is the implementation of free() without knowing
how to compare pointers.
On Wed, 16 Oct 2024 20:00:27 +0000, Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
The paragraph with 3 >'s indicates malloc() cannot be written in
standard C. It used to be written in standard K&R C. I am not
asking if it is still in the std libraries, I am asking what
happened to make it impossible to write malloc() in standard C ?!?
You need to reserve memory by some way from the operating system,
which is, by necessity, outside of the scope of C (via brk(),
GETMAIN, mmap() or whatever).
Agreed, but once you HAVE a way of getting memory (by whatever name)
you can write malloc in standard C.
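A minimal sketch of that claim, with hypothetical names: once one contiguous
block has been obtained (here faked by a static array standing in for
whatever brk(), mmap() or a linker symbol provided), subdividing it stays
within standard C because all the pointer arithmetic is confined to a single
array. No free(), and alignment is only rounded to max_align_t:

#include <stddef.h>

#define HEAP_SIZE (64u * 1024u)
static unsigned char heap[HEAP_SIZE];   /* stand-in for the block the OS gave us */
static size_t heap_used;

static void *bump_alloc(size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t pos = (heap_used + align - 1) / align * align;  /* round up */
    if (pos > HEAP_SIZE || n > HEAP_SIZE - pos)
        return NULL;                                       /* arena exhausted */
    heap_used = pos + n;
    return &heap[pos];
}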
On Tue, 15 Oct 2024 22:05:56 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any
existing functionality that cannot be written using the language
is a sign of a weakness because it shows that despite being
"general purpose" it fails to cover this specific "purpose".
One of the key ways C got into the minds of programmers was that
one could write stuff like printf() in C and NOT need to have it
entirely built-into the language.
In an ideal world, it would be better if we could define `malloc`
and `memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be standard K&R C--what dropped it from the
standard??
It still is part of the ISO C standard.
The paragraph with 3 >'s indicates malloc() cannot be written in
standard C. It used to be written in standard K&R C.
I am not
asking if it is still in the standard libraries, I am asking what
happened to make it impossible to write malloc() in standard C ?!?
On Wed, 16 Oct 2024 15:38:47 GMT, scott@slp53.sl.home (Scott Lurndal)
wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 22:05:56 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
malloc() used to be standard K&R C--what dropped it from the
standard ??
It still is part of the ISO C standard.
The paragraph with 3 >'s indicates malloc() cannot be written in
standard C. It used to be written in standard K&R C.
K&R may have been 'de facto' standard C, but not 'de jure'.
Unix V6 malloc used the 'brk' system call to allocate space
for the heap. Later versions used 'sbrk'.
Those are both kernel system calls.
Yes, but malloc() subdivides an already provided space.
Because that space can be treated as a single array of char, and comparing
pointers to elements of the same array is legal, the only thing I can see
that prevents writing malloc() in standard C would be the need to somehow
define the array from the /language's/ POV (not the compiler's) prior
to using it.
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
On Wed, 16 Oct 2024 15:38:47 GMT, scott@slp53.sl.home (Scott Lurndal)
wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 22:05:56 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 15 Oct 2024 21:26:29 +0000, Stefan Monnier wrote:
There is an advantage to the C approach of separating out some
facilities and supplying them only in the standard library.
It goes a bit further: for a general purpose language, any existing
functionality that cannot be written using the language is a sign of
a weakness because it shows that despite being "general purpose" it
fails to cover this specific "purpose".
One of the key ways C got into the minds of programmers was that
one could write stuff like printf() in C and NOT need to have it
entirely built into the language.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped it from the std ??
It still is part of the ISO C standard.
The paragraph with 3 >'s indicates malloc() cannot be written
in std. C. It used to be written in std. K&R C.
K&R may have been 'de facto' standard C, but not 'de jure'.
Unix V6 malloc used the 'brk' system call to allocate space
for the heap. Later versions used 'sbrk'.
Those are both kernel system calls.
Yes, but malloc() subdivides an already provided space. Because that
space can be treated as a single array of char, and comparing pointers
to elements of the same array is legal, the only thing I can see that prevents writing malloc() in standard C would be the need to somehow
define the array from the /language's/ POV (not the compiler's) prior
to using it.
On Wed, 16 Oct 2024 09:38:20 +0200, David Brown
<david.brown@hesbynett.no> wrote:
It's a very good philosophy in programming language design that the core
language should only contain what it has to contain - if a desired
feature can be put in a library and be equally efficient and convenient
to use, then it should be in the standard library, not the core
language. It is much easier to develop, implement, enhance, adapt, and
otherwise change things in libraries than the core language.
And it is also fine, IMHO, that some things in the standard library need
non-standard C - the standard library is part of the implementation.
But it is a problem if the library has to be written using a different compiler. [For this purpose I would consider specifying different
compiler flags to be using a different compiler.]
Why? Because once these things are discovered, many programmers will
see their advantages and lack the discipline to avoid using them for
more general application work.
In an ideal world, it would be better if we could define `malloc` and
`memmove` efficiently in standard C, but at least they can be
implemented in non-standard C.
malloc() used to be std. K&R C--what dropped it from the std ??
The function has always been available in C since the language was
standardised, and AFAIK it was in K&R C. But no one (in authority) ever
claimed it could be implemented purely in standard C. What do you think
has changed?
On Mon, 14 Oct 2024 17:19:40 +0200
David Brown <david.brown@hesbynett.no> wrote:
My only point of contention is that the existence or lack of such
instructions does not make any difference to whether or not you can
write a good implementation of memcpy() or memmove() in portable
standard C.
You are moving a goalpost.
One does not need "good implementation" in a sense you have in mind.
All one needs is an implementation that the pattern-matching logic of the
compiler unmistakably recognizes as memmove/memcpy. That is very easily
done in standard C. For memmove, I had shown how to do it in one of the
posts below. For memcpy it's very obvious, so no need to show.
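For illustration only - this is not the posted version being referred to - a
sketch of a memmove whose copy loops are the kind of pattern compilers
routinely recognise, and whose overlap test uses nothing but pointer
equality, which is defined even for pointers into unrelated objects:

#include <stddef.h>

static void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char       *d = dst;
    const unsigned char *s = src;
    size_t k;
    int dst_inside_src = 0;

    /* Does dst point at one of src[0..n-1]?  Only equality tests are
       used, so this is defined even when the regions are unrelated. */
    for (k = 0; k < n; k++)
        if (s + k == d) { dst_inside_src = 1; break; }

    if (dst_inside_src)
        for (k = n; k-- > 0; ) d[k] = s[k];   /* copy backward */
    else
        for (k = 0; k < n; k++) d[k] = s[k];  /* copy forward  */
    return dst;
}

The equality scan makes the overlap test O(n), so this is an existence
proof rather than a fast implementation.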
On Mon, 14 Oct 2024 19:39:41 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for
equality). Rarely needing something does not mean /never/ needing
it.
OK, take a segmented memory model with 16-bit pointers and a 24-bit
virtual address space. How do you actually compare two segmented
pointers ??
Depends. On the Burroughs mainframe there could be eight
active segments and the segment number was part of the pointer.
Pointers were 32-bits (actually 8 BCD digits)
S s OOOOOO
Where 'S' was a sign digit (C or D), 's' was the
segment number (0-7) and OOOOOO was the six digit
offset within the segment (500kB/1000kD each).
A particular task (process) could have up to
one million "environments", each environment
could have up to 100 "memory areas" (up to 1000kD)
of which the first eight were loaded into the
processor base/limit registers. Index registers
were 8 digits and were loaded with a pointer as
described above. Operands could optionally select
one of the index registers and the operand address
was treated as an offset to the index register;
there were 7 index registers.
Access to memory areas 8-99 use string instructions
where the pointer was 16 BCD digits:
EEEEEEMM SsOOOOOO
Where EEEEEE was the environment number (0-999999);
environments starting with D00000 were reserved for
the MCP (Operating System). MM was the memory area
number and the remaining eight digits described the
data within the memory area. A subroutine call could
call within a memory area or switch to a new environment.
Memory area 1 was the code region for the segment,
Memory area 0 held the stack and some global variables
and was typically shared by all environments.
Memory areas 2-7 were application dependent and could
be configured to be shared between environments at
link time.
What was the size of the physical address space ?
I would suppose, more than 1,000,000 words?
Michael S <already5chosen@yahoo.com> writes:
On Mon, 14 Oct 2024 19:39:41 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for
equality). Rarely needing something does not mean /never/
needing it.
OK, take a segmented memory model with 16-bit pointers and a
24-bit virtual address space. How do you actually compare two
segmented pointers ??
Depends. On the Burroughs mainframe there could be eight
active segments and the segment number was part of the pointer.
Pointers were 32-bits (actually 8 BCD digits)
S s OOOOOO
Where 'S' was a sign digit (C or D), 's' was the
segment number (0-7) and OOOOOO was the six digit
offset within the segment (500kB/1000kD each).
A particular task (process) could have up to
one million "environments", each environment
could have up to 100 "memory areas" (up to 1000kD)
of which the first eight were loaded into the
processor base/limit registers. Index registers
were 8 digits and were loaded with a pointer as
described above. Operands could optionally select
one of the index registers and the operand address
was treated as an offset to the index register;
there were 7 index registers.
Access to memory areas 8-99 use string instructions
where the pointer was 16 BCD digits:
EEEEEEMM SsOOOOOO
Where EEEEEE was the environment number (0-999999);
environments starting with D00000 were reserved for
the MCP (Operating System). MM was the memory area
number and the remaining eight digits described the
data within the memory area. A subroutine call could
call within a memory area or switch to a new environment.
Memory area 1 was the code region for the segment,
Memory area 0 held the stack and some global variables
and was typically shared by all environments.
Memory areas 2-7 were application dependent and could
be configured to be shared between environments at
link time.
What was the size of the physical address space ?
I would suppose, more than 1,000,000 words?
It varied based on the generation. In the
1960s, a half megabyte (10^6 digits)
was the limit.
In the 1970s, the architecture supported
10^8 digits, the largest B4800 systems
were shipped with 2 million digits (1MB).
In 1979, the B4900 was introduced supporting
up to 10MB (20 MD), later increased to
20MB/40MD.
In the 1980s, the largest systems (V500)
supported up to 10^9 digits. It
was that generation of machine where the
environment scheme was introduced.
Binaries compiled in 1966 ran on all
generations without recompilation.
There was room in the segmentation structures
for up to 10^18 digit physical addresses
(where the segments were aligned on 10^3
digit boundaries).
Unisys discontinued that line of systems in 1992.
On Fri, 18 Oct 2024 14:06:17 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 14 Oct 2024 19:39:41 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully standard
way to compare independent pointers (other than just for
equality). Rarely needing something does not mean /never/
needing it.
OK, take a segmented memory model with 16-bit pointers and a
24-bit virtual address space. How do you actually compare two
segmented pointers ??
Depends. On the Burroughs mainframe there could be eight
active segments and the segment number was part of the pointer.
Pointers were 32-bits (actually 8 BCD digits)
S s OOOOOO
Where 'S' was a sign digit (C or D), 's' was the
segment number (0-7) and OOOOOO was the six digit
offset within the segment (500kB/1000kD each).
A particular task (process) could have up to
one million "environments", each environment
could have up to 100 "memory areas" (up to 1000kD)
of which the first eight were loaded into the
processor base/limit registers. Index registers
were 8 digits and were loaded with a pointer as
described above. Operands could optionally select
one of the index registers and the operand address
was treated as an offset to the index register;
there were 7 index registers.
Access to memory areas 8-99 use string instructions
where the pointer was 16 BCD digits:
EEEEEEMM SsOOOOOO
Where EEEEEE was the environment number (0-999999);
environments starting with D00000 were reserved for
the MCP (Operating System). MM was the memory area
number and the remaining eight digits described the
data within the memory area. A subroutine call could
call within a memory area or switch to a new environment.
Memory area 1 was the code region for the segment,
Memory area 0 held the stack and some global variables
and was typically shared by all environments.
Memory areas 2-7 were application dependent and could
be configured to be shared between environments at
link time.
What was the size of the physical address space ?
I would suppose, more than 1,000,000 words?
It varied based on the generation. In the
1960s, a half megabyte (10^6 digits)
was the limit.
In the 1970s, the architecture supported
10^8 digits, the largest B4800 systems
were shipped with 2 million digits (1MB).
In 1979, the B4900 was introduced supporting
up to 10MB (20 MD), later increased to
20MB/40MD.
In the 1980s, the largest systems (V500)
supported up to 10^9 digits. It
was that generation of machine where the
environment scheme was introduced.
Binaries compiled in 1966 ran on all
generations without recompilation.
There was room in the segmentation structures
for up to 10^18 digit physical addresses
(where the segments were aligned on 10^3
digit boundaries).
So, can it be said that at least some of the B6500-compatible models
suffered from the same problem as 80286 - the segment of maximal size
didn't cover all linear (or physical) address space?
Or their index register width was increased to accommodate 1e9 digits in
the single segment?
Unisys discontinued that line of systems in 1992.
I thought it lasted longer. My impression was that there were still
hardware implementations (alongside emulation on Xeons) sold up
until 15 years ago.
I don't see an advantage in being able to implement them in standard C.
I /do/ see an advantage in being able to do so well in non-standard, implementation-specific C.
The reason why you might want your own special memmove, or your own
special malloc, is that you are doing niche and specialised software.
For example, you might be making real-time software and require specific time constraints on these functions. In such cases, you are not
interested in writing fully portable software - it will already contain
many implementation-specific features or use compiler extensions.
On 16/10/2024 08:21, David Brown wrote:
I have a vague feeling that once upon a time I wrote a malloc for an embedded system. Having only one process it had access to the entire
I don't see an advantage in being able to implement them in standard
C. I /do/ see an advantage in being able to do so well in
non-standard, implementation-specific C.
The reason why you might want your own special memmove, or your own
special malloc, is that you are doing niche and specialised software.
For example, you might be making real-time software and require
specific time constraints on these functions. In such cases, you are
not interested in writing fully portable software - it will already
contain many implementation-specific features or use compiler extensions.
memory range, and didn't need to talk to the OS. Entirely C is quite feasible there.
But memmove? On an 80286 it will be using rep movsw, rather than a
software loop, to copy the memory contents to the new location.
_That_ does require assembler, or compiler extensions, not standard C.
Michael S <already5chosen@yahoo.com> writes:
On Fri, 18 Oct 2024 14:06:17 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 14 Oct 2024 19:39:41 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:
On 13/10/2024 17:45, Anton Ertl wrote:
I do think it would be convenient if there were a fully
standard way to compare independent pointers (other than
just for equality). Rarely needing something does not mean
/never/ needing it.
OK, take a segmented memory model with 16-bit pointers and a
24-bit virtual address space. How do you actually compare two
segmented pointers ??
Depends. On the Burroughs mainframe there could be eight
active segments and the segment number was part of the pointer.
Pointers were 32-bits (actually 8 BCD digits)
S s OOOOOO
Where 'S' was a sign digit (C or D), 's' was the
segment number (0-7) and OOOOOO was the six digit
offset within the segment (500kB/1000kD each).
A particular task (process) could have up to
one million "environments", each environment
could have up to 100 "memory areas" (up to 1000kD)
of which the first eight were loaded into the
processor base/limit registers. Index registers
were 8 digits and were loaded with a pointer as
described above. Operands could optionally select
one of the index registers and the operand address
was treated as an offset to the index register;
there were 7 index registers.
Access to memory areas 8-99 use string instructions
where the pointer was 16 BCD digits:
EEEEEEMM SsOOOOOO
Where EEEEEE was the environment number (0-999999);
environments starting with D00000 were reserved for
the MCP (Operating System). MM was the memory area
number and the remaining eight digits described the
data within the memory area. A subroutine call could
call within a memory area or switch to a new environment.
Memory area 1 was the code region for the segment,
Memory area 0 held the stack and some global variables
and was typically shared by all environments.
Memory areas 2-7 were application dependent and could
be configured to be shared between environments at
link time.
What was the size of the physical address space ?
I would suppose, more than 1,000,000 words?
It varied based on the generation. In the
1960s, a half megabyte (10^6 digits)
was the limit.
In the 1970s, the architecture supported
10^8 digits, the largest B4800 systems
were shipped with 2 million digits (1MB).
In 1979, the B4900 was introduced supporting
up to 10MB (20 MD), later increased to
20MB/40MD.
In the 1980s, the largest systems (V500)
supported up to 10^9 digits. It
was that generation of machine where the
environment scheme was introduced.
Binaries compiled in 1966 ran on all
generations without recompilation.
There was room in the segmentation structures
for up to 10^18 digit physical addresses
(where the segments were aligned on 10^3
digit boundaries).
So, can it be said that at least some of the B6500-compatible models
No. The systems I described above are from the medium
systems family (B2000/B3000/B4000).
The B5000/B6000/B7000
(large) family systems were a completely different stack based
architecture with a 48-bit word size. The Small systems (B1000)
supported task-specific dynamic microcode loading (different
microcode for a cobol app vs. a fortran app).
Medium systems evolved from the Electrodata Datatron and 220 (1954)
through the Burroughs B300 to the Burroughs B3500 by 1965. The B5000
was also developed at the old Electrodata plant in Pasadena
(where I worked in the 80s) - eventually large systems moved
out - the more capable large systems (B7XXX) were designed in
Tredyffrin Pa, the less capable large systems (B5XXX) were designed
in Mission Viejo, Ca.
suffered from the same problem as 80286 - the segment of maximal size
didn't cover all linear (or physical) address space?
Or their index register width was increased to accommodate 1e9 digits
in the single segment?
Unisys discontinued that line of systems in 1992.
I thought it lasted longer. My impression was that there were still
hardware implementations (alongside emulation on Xeons) sold up
until 15 years ago.
Large systems still exist today in emulation[*], as do the
former Univac (Sperry 2200) systems. The last medium system
(V380) was retired by the City of Santa Ana in 2010 (almost two
decades after Unisys cancelled the product line) and was moved
to the Living Computer Museum.
City of Santa Ana replaced the single 1980 vintage V380 with
29 windows servers.
After the merger of Burroughs and Sperry in '86 there were six
different mainframe architectures - by 1990, all but
two (2200 and large systems) had been terminated.
[*] Clearpath Libra https://www.unisys.com/client-education/clearpath-forward-libra-servers/
On 18/10/2024 18:38, Vir Campestris wrote:
On 16/10/2024 08:21, David Brown wrote:
I have a vague feeling that once upon a time I wrote a malloc for an
I don't see an advantage in being able to implement them in standard
C. I /do/ see an advantage in being able to do so well in non-
standard, implementation-specific C.
The reason why you might want your own special memmove, or your own
special malloc, is that you are doing niche and specialised software.
For example, you might be making real-time software and require
specific time constraints on these functions. In such cases, you are
not interested in writing fully portable software - it will already
contain many implementation-specific features or use compiler
extensions.
embedded system. Having only one process it had access to the entire
memory range, and didn't need to talk to the OS. Entirely C is quite
feasible there.
Sure - but you are not writing portable standard C. You are relying on implementation details, or writing code that is only suitable for a particular implementation (or set of implementations). It is normal to write this kind of thing in C, but it is non-portable C. (Or at least,
not fully portable C.)
But memmove? On an 80286 it will be using rep movsw, rather than a
software loop, to copy the memory contents to the new location.
_That_ does require assembler, or compiler extensions, not standard C.
It would normally be written in C, and the compiler will generate the
"rep" assembly. The bit you can't write in fully portable standard C is the comparison of the pointers so you know which direction to do the copying.
On 18/10/2024 20:45, David Brown wrote:
On 18/10/2024 18:38, Vir Campestris wrote:
Ah, I see your point. Because some implementations will require communication with the OS there cannot be a truly portable malloc.
On 16/10/2024 08:21, David Brown wrote:
I have a vague feeling that once upon a time I wrote a malloc for an
I don't see an advantage in being able to implement them in standard
C. I /do/ see an advantage in being able to do so well in non-
standard, implementation-specific C.
The reason why you might want your own special memmove, or your own
special malloc, is that you are doing niche and specialised
software. For example, you might be making real-time software and
require specific time constraints on these functions. In such
cases, you are not interested in writing fully portable software -
it will already contain many implementation-specific features or use
compiler extensions.
embedded system. Having only one process it had access to the entire
memory range, and didn't need to talk to the OS. Entirely C is quite
feasible there.
Sure - but you are not writing portable standard C. You are relying
on implementation details, or writing code that is only suitable for a
particular implementation (or set of implementations). It is normal
to write this kind of thing in C, but it is non-portable C. (Or at
least, not fully portable C.)
It's a long time since I had to mistrust a compiler so much that I was pulling the assembler apart. It sounds as though they have got smarter
But memmove? On an 80286 it will be using rep movsw, rather than a
software loop, to copy the memory contents to the new location.
_That_ does require assembler, or compiler extensions, not standard C.
It would normally be written in C, and the compiler will generate the
"rep" assembly. The bit you can't write in fully portable standard C
is the comparison of the pointers so you know which direction to do
the copying.
in the meantime.
I just checked BTW, and you are correct.
On 20/10/2024 22:51, Vir Campestris wrote:
On 18/10/2024 20:45, David Brown wrote:
On 18/10/2024 18:38, Vir Campestris wrote:
Ah, I see your point. Because some implementations will require
On 16/10/2024 08:21, David Brown wrote:
I have a vague feeling that once upon a time I wrote a malloc for an
I don't see an advantage in being able to implement them in
standard C. I /do/ see an advantage in being able to do so well in
non-standard, implementation-specific C.
The reason why you might want your own special memmove, or your own
special malloc, is that you are doing niche and specialised
software. For example, you might be making real-time software and
require specific time constraints on these functions. In such
cases, you are not interested in writing fully portable software -
it will already contain many implementation-specific features or
use compiler extensions.
embedded system. Having only one process it had access to the entire
memory range, and didn't need to talk to the OS. Entirely C is quite
feasible there.
Sure - but you are not writing portable standard C. You are relying
on implementation details, or writing code that is only suitable for
a particular implementation (or set of implementations). It is
normal to write this kind of thing in C, but it is non-portable C.
(Or at least, not fully portable C.)
communication with the OS there cannot be a truly portable malloc.
Yes.
I think /every/ implementation will require communication with the OS,
if there is an OS - otherwise it will need support from other parts of
the toolchain (such as symbols created in a linker script to define the
heap area - that's the typical implementation in small embedded systems).
The nearest you could get to a portable implementation would be using a local unsigned char array as the heap, but I don't believe that would be fully correct according to the effective type rules (or the "strict aliasing" or type-based aliasing rules, if you prefer those terms). It would also not be good enough for the needs of many programs.
Of course, a fair amount of the code for malloc/free can be written in
fully portable C - and almost all of it can be written in a somewhat
vaguely defined "widely portable C" where you can mask pointer bits to > handle alignment, and other such conveniences.
It's a long time since I had to mistrust a compiler so much that I was
But memmove? On an 80286 it will be using rep movsw, rather than a
software loop, to copy the memory contents to the new location.
_That_ does require assembler, or compiler extensions, not standard C.
It would normally be written in C, and the compiler will generate the
"rep" assembly. The bit you can't write in fully portable standard
C is the comparison of the pointers so you know which direction to do
the copying.
pulling the assembler apart. It sounds as though they have got smarter
in the meantime.
I just checked BTW, and you are correct.
Looking at the generated assembly is usually not a matter of mistrusting
the compiler. One of the reasons I do so is to check that the compiler
can generate efficient object code from my source code, in cases where I need maximal efficiency. I'd rather not write assembly unless I really have to!
I don't see an advantage in being able to implement them in standard C.
It means you can likely also implement a related yet different API
without having your code "demoted" to non-standard.
That makes no sense to me. We are talking about implementing standard library functions. If you want to implement other functions, go ahead.
Because some implementations will require
communication with the OS there cannot be a truly portable malloc.
On Sun, 20 Oct 2024 21:51:30 +0100, Vir Campestris wrote:
Because some implementations will require
communication with the OS there cannot be a truly portable malloc.
There can if you have a portable OS API. The only serious candidate for
that is POSIX.
On Mon, 21 Oct 2024 23:17:10 +0000, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 21:51:30 +0100, Vir Campestris wrote:
Because some implementations will require communication with the OS
there cannot be a truly portable malloc.
There can if you have a portable OS API. The only serious candidate for
that is POSIX.
POSIX is an environment not an OS.
For near-light-speed code I used to write it first in C, optimize
that, then I would translate it into (inline) asm and re-optimize
based on having the full cpu architecture available, before in the
final stage I would use the asm experience to tweak the C just
enough to let the compiler generate machine code quite close
(90+%) to my best asm, while still being portable to any cpu with
more or less the same capabilities.
One example: When I won an international competition to write the
fastest Pentomino solver (capable of finding all 2339/1010/368/2
solutions of the 6x10/5x12/4x15/3x20 layouts), I also included the
portable C version.
My asm submission was twice as fast as anyone else, while the C
version was still fast enough that a couple of years later I got a
prize in the mail: Someone in France had submitted my C code,
with my name & address, to a similar competition there and it was
still faster than anyone else. :-)
Terje Mathisen <terje.mathisen@tmsw.no> writes:
[C vs assembly]
For near-light-speed code I used to write it first in C, optimize
that, then I would translate it into (inline) asm and re-optimize
based on having the full cpu architecture available, before in the
final stage I would use the asm experience to tweak the C just
enough to let the compiler generate machine code quite close
(90+%) to my best asm, while still being portable to any cpu with
more or less the same capabilities.
One example: When I won an international competition to write the
fastest Pentomino solver (capable of finding all 2339/1010/368/2
solutions of the 6x10/5x12/4x15/3x20 layouts), I also included the
portable C version.
My asm submission was twice as fast as anyone else, while the C
version was still fast enough that a couple of years later I got a
prize in the mail: Someone in France had submitted my C code,
with my name & address, to a similar competition there and it was
still faster than anyone else. :-)
I hope you will consider writing a book, "Writing Fast Code" (or
something along those lines). The core of the book could be, oh,
let's say between 8 and 12 case studies, starting with a problem
statement and tracing through the process that you followed, or
would follow, with stops along the way showing the code at each
of the different stages.
If you do write such a book I guarantee I will want to buy one.
On Mon, 21 Oct 2024 23:52:59 +0000, MitchAlsup1 wrote:
On Mon, 21 Oct 2024 23:17:10 +0000, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 21:51:30 +0100, Vir Campestris wrote:
Because some implementations will require communication with the OS
there cannot be a truly portable malloc.
There can if you have a portable OS API. The only serious candidate for
that is POSIX.
POSIX is an environment not an OS.
Guess what the “OS” part of “POSIX” stands for.
Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
[C vs assembly]
For near-light-speed code I used to write it first in C, optimize
that, then I would translate it into (inline) asm and re-optimize
based on having the full cpu architecture available, before in the
final stage I would use the asm experience to tweak the C just
enough to let the compiler generate machine code quite close
(90+%) to my best asm, while still being portable to any cpu with
more or less the same capabilities.
One example: When I won an international competition to write the
fastest Pentomino solver (capable of finding all 2339/1010/368/2
solutions of the 6x10/5x12/4x15/3x20 layouts), I also included the
portable C version.
My asm submission was twice as fast as anyone else, while the C
version was still fast enough that a couple of years later I got a
prize in the mail: Someone in France had submitted my C code,
with my name & address, to a similar competition there and it was
still faster than anyone else. :-)
I hope you will consider writing a book, "Writing Fast Code" (or
something along those lines). The core of the book could be, oh,
let's say between 8 and 12 case studies, starting with a problem
statement and tracing through the process that you followed, or
would follow, with stops along the way showing the code at each
of the different stages.
If you do write such a book I guarantee I will want to buy one.
Thank you Tim!
Probably not a book but I would consider writing a series of blog
posts similar to that, now that I am about to retire:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
On Wed, 23 Oct 2024 14:25:42 +0000, Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
Just remember, retirement does not mean you "stop working"
it means you "stop working for HIM".
On Wed, 23 Oct 2024 14:25:42 +0000, Tim Rentsch wrote:
Exactly!
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
Just remember, retirement does not mean you "stop working"
it means you "stop working for HIM".
mitchalsup@aol.com (MitchAlsup1) writes:
On Wed, 23 Oct 2024 14:25:42 +0000, Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
Just remember, retirement does not mean you "stop working"
it means you "stop working for HIM".
And start working for "HER". (Honeydew list).
Terje Mathisen <terje.mathisen@tmsw.no> writes:
Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
[C vs assembly]
For near-light-speed code I used to write it first in C, optimize
that, then I would translate it into (inline) asm and re-optimize
based on having the full cpu architecture available, before in the
final stage I would use the asm experience to tweak the C just
enough to let the compiler generate machine code quite close
(90+%) to my best asm, while still being portable to any cpu with
more or less the same capabilities.
One example: When I won an international competition to write the
fastest Pentomino solver (capable of finding all 2339/1010/368/2
solutions of the 6x10/5x12/4x15/3x20 layouts), I also included the
portable C version.
My asm submission was twice as fast as anyone else, while the C
version was still fast enough that a couple of years later I got a
prize in the mail: Someone in France had submitted my C code,
with my name & address, to a similar competition there and it was
still faster than anyone else. :-)
I hope you will consider writing a book, "Writing Fast Code" (or
something along those lines). The core of the book could be, oh,
let's say between 8 and 12 case studies, starting with a problem
statement and tracing through the process that you followed, or
would follow, with stops along the way showing the code at each
of the different stages.
If you do write such a book I guarantee I will want to buy one.
Thank you Tim!
I know from past experience you are good at this. I would love
to hear what you have to say.
Probably not a book but I would consider writing a series of blog
posts similar to that, now that I am about to retire:
You could try writing one blog post a month on the subject. By
this time next year you will have plenty of material and be well
on your way to putting a book together. (First drafts are always
the hardest part...)
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
P.S. Is the email address in your message a good way to reach you?
MitchAlsup1 wrote:
On Wed, 23 Oct 2024 14:25:42 +0000, Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
Just remember, retirement does not mean you "stop working"
it means you "stop working for HIM".
Exactly!
I have unlimited amounts of potential/available mapping work, and I do
want to get back to NTP Hackers.
We recently started (officially) on the 754-2029 revision.
I'm still connected to Mill Computing as well.
Terje
On Wed, 23 Oct 2024 19:11:59 +0000, Terje Mathisen wrote:
MitchAlsup1 wrote:
On Wed, 23 Oct 2024 14:25:42 +0000, Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
Just remember, retirement does not mean you "stop working"
it means you "stop working for HIM".
Exactly!
I have unlimited amounts of potential/available mapping work, and I do
want to get back to NTP Hackers.
We recently started (officially) on the 754-2029 revision.
Are you going to put in something equivalent to quires ??
Terje Mathisen <terje.mathisen@tmsw.no> writes:
Probably not a book but I would consider writing a series of blog
posts similar to that, now that I am about to retire:
You could try writing one blog post a month on the subject. By
this time next year you will have plenty of material and be well
on your way to putting a book together. (First drafts are always
the hardest part...)
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
Probably not a book but I would consider writing a series of blog
posts similar to that, now that I am about to retire:
You could try writing one blog post a month on the subject. By
this time next year you will have plenty of material and be well
on your way to putting a book together. (First drafts are always
the hardest part...)
One thing I have thought of is a wiki of optimization techniques that contains descriptions of the techniques and case studies, but I have
not yet implemented this idea.
On 24/10/2024 08:55, Anton Ertl wrote:
One thing I have thought of is a wiki of optimization techniques that
contains descriptions of the techniques and case studies, but I have
not yet implemented this idea.
Would it make sense to start something under Wikibooks on Wikipedia?
MitchAlsup1 wrote:
On Wed, 23 Oct 2024 19:11:59 +0000, Terje Mathisen wrote:
MitchAlsup1 wrote:
On Wed, 23 Oct 2024 14:25:42 +0000, Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
Just remember, retirement does not mean you "stop working"
it means you "stop working for HIM".
Exactly!
I have unlimited amounts of potential/available mapping work, and I do
want to get back to NTP Hackers.
We recently started (officially) on the 754-2029 revision.
Are you going to put in something equivalent to quires ??
I don't know that usage, I thought quires was a typesetting/printing
measure?
Terje
On Sun, 20 Oct 2024 21:51:30 +0100, Vir Campestris wrote:
Because some implementations will require
communication with the OS there cannot be a truly portable malloc.
There can if you have a portable OS API. The only serious candidate for
that is POSIX.
My wife does have a small list of things that we (i.e. I) could do when we retire...
On 22/10/2024 00:17, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 21:51:30 +0100, Vir Campestris wrote:
Because some implementations will require
communication with the OS there cannot be a truly portable malloc.
There can if you have a portable OS API. The only serious candidate for
that is POSIX.
One of the other groups I'm following just for the hell of it is comp.os.cpm. I'm pretty sure you don't get POSIX in your 64kb (max).
On Thu, 24 Oct 2024 5:39:52 +0000, Terje Mathisen wrote:
OK, I have seen and used "Super-accumulator" as the term for those, I
MitchAlsup1 wrote:
On Wed, 23 Oct 2024 19:11:59 +0000, Terje Mathisen wrote:
MitchAlsup1 wrote:
On Wed, 23 Oct 2024 14:25:42 +0000, Tim Rentsch wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
My wife and I will both go on "permanent vacation" starting a week
before Christmas. :-)
I'm guessing that permanent vacation will be some mixture of actual
vacation and self-chosen "work". In any case I hope you both enjoy
the time.
Just remember, retirement does not mean you "stop working"
it means you "stop working for HIM".
Exactly!
I have unlimited amounts of potential/available mapping work, and I do
want to get back to NTP Hackers.
We recently started (officially) on the 754-2029 revision.
Are you going to put in something equivalent to quires ??
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
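A rough sketch of that idea for IEEE double, with hypothetical names and
several simplifications (non-negative finite inputs only, no conversion of
the result back to double): bit 0 of the accumulator stands for 2^-1074 and
2112 bits leave headroom above 2^1023 for carries. unsigned __int128 is a
common compiler extension, used here only to keep the carry propagation
short:

#include <math.h>
#include <stdint.h>
#include <string.h>

#define QUIRE_WORDS 33                       /* 33 * 64 = 2112 bits */

typedef struct { uint64_t w[QUIRE_WORDS]; } quire;

static void quire_clear(quire *q) { memset(q->w, 0, sizeof q->w); }

/* Add one non-negative, finite double exactly into the accumulator. */
static void quire_add(quire *q, double x)
{
    int e;
    double f = frexp(x, &e);                 /* x = f * 2^e, f in [0.5, 1) */
    uint64_t mant = (uint64_t)ldexp(f, 53);  /* 53-bit integer mantissa    */
    int shift = (e - 53) + 1074;             /* bit position of mant bit 0 */
    if (shift < 0) { mant >>= -shift; shift = 0; }   /* subnormal inputs   */

    unsigned __int128 acc = (unsigned __int128)mant << (shift % 64);
    for (int i = shift / 64; i < QUIRE_WORDS; i++) {
        acc += q->w[i];
        q->w[i] = (uint64_t)acc;
        acc >>= 64;
        if (acc == 0) break;                 /* carries have settled */
    }
}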
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
MitchAlsup1 <mitchalsup@aol.com> schrieb:
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
On 10/28/2024 9:30 AM, Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented
https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option
https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
Another newer alternative. This came up on my news feed. I haven't
looked at the details at all, so I can't comment on it.
https://arxiv.org/abs/2410.03692
MitchAlsup1 <mitchalsup@aol.com> schrieb:
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented
https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option
https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
These would be very large registers. You'd need some way to store and load these for register spills, fills and task switches, as well as move
and manage them.
Karlsruhe above has a link to http://www.bitsavers.org/pdf/ibm/370/princOps/SA22-7093-0_High_Accuracy_Arithmetic_Jan84.pdf
which describes their large accumulators as residing in memory, which
avoids the spill/fill/switch issue but with an obvious performance hit:
"A floating-point accumulator occupies a 168-byte storage area that is aligned on a 256-byte boundary. An accumulator consists of a four-byte
status area on the left, followed by a 164-byte numeric area."
The operands are specified by virtual address of their in-memory accumulator.
Of course, once you have 168-byte registers people are going to
think of new uses for them.
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented
https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option
https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
These would be very large registers. You'd need some way to store and load these for register spills, fills and task switches, as well as move
and manage them.
Karlsruhe above has a link to
http://www.bitsavers.org/pdf/ibm/370/princOps/SA22-7093-0_High_Accuracy_Arithmetic_Jan84.pdf
which describes their large accumulators as residing in memory, which
avoids the spill/fill/switch issue but with an obvious performance hit:
"A floating-point accumulator occupies a 168-byte storage area that is
aligned on a 256-byte boundary. An accumulator consists of a four-byte
status area on the left, followed by a 164-byte numeric area."
The operands are specified by virtual address of their in-memory accumulator.
Makes sense, given the time this was implemented. This was also a
mid-range machine, not a number cruncher. I do not find the
number of cycles that the instructions took.
But this was also for hex floating point. A similar scheme for IEEE
double would need a bit more than 2048 bits, so five AVX-512 registers.
SIMD from hell? Pretend that a CPU is a graphics card? :-)
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
These would be very large registers. You'd need some way to store and load these for register spills, fills and task switches, as well as move
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented
https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option
https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
and manage them.
Karlsruhe above has a link to
http://www.bitsavers.org/pdf/ibm/370/princOps/SA22-7093-0_High_Accuracy_Arithmetic_Jan84.pdf
which describes their large accumulators as residing in memory, which
avoids the spill/fill/switch issue but with an obvious performance hit:
"A floating-point accumulator occupies a 168-byte storage area that is
aligned on a 256-byte boundary. An accumulator consists of a four-byte
status area on the left, followed by a 164-byte numeric area."
The operands are specified by virtual address of their in-memory accumulator.
Makes sense, given the time this was implemented. This was also a
mid-range machine, not a number cruncher. I do not find the
number of cycles that the instructions took.
But this was also for hex floating point. A similar scheme for IEEE
double would need a bit more than 2048 bits, so five AVX-512 registers.
Thomas Koenig wrote:
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
These would be very large registers. You'd need some way to store and load these for register spills, fills and task switches, as well as move
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented
https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option
https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
and manage them.
Karlsruhe above has a link to
http://www.bitsavers.org/pdf/ibm/370/princOps/SA22-7093-0_High_Accuracy_Arithmetic_Jan84.pdf
which describes their large accumulators as residing in memory, which
avoids the spill/fill/switch issue but with an obvious performance hit:
"A floating-point accumulator occupies a 168-byte storage area that is
aligned on a 256-byte boundary. An accumulator consists of a four-byte
status area on the left, followed by a 164-byte numeric area."
The operands are specified by virtual address of their in-memory accumulator.
Makes sense, given the time this was implemented. This was also a
mid-range machine, not a number cruncher. I do not find the
number of cycles that the instructions took.
At the time, memory was just a few clock cycles away from the CPU, so
not really that problematic. Today, such a super-accumulator would stay
in $L1 most of the time, or at least the central, in-use cache line of
it, would do so.
But this was also for hex floating point. A similar scheme for IEEE
double would need a bit more than 2048 bits, so five AVX-512 registers.
With 1312 bits of storage, their fp inputs (hex fp?) must have had a
smaller exponent range than ieee double.
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
Thomas Koenig wrote:
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
These would be very large registers. You'd need some way to store and
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented
https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option
https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
load
these for register spills, fills and task switches, as well as move
and manage them.
Karlsruhe above has a link to
http://www.bitsavers.org/pdf/ibm/370/princOps/SA22-7093-0_High_Accuracy_Arithmetic_Jan84.pdf
which describes their large accumulators as residing in memory, which
avoids the spill/fill/switch issue but with an obvious performance hit:
"A floating-point accumulator occupies a 168-byte storage area that is >>>> aligned on a 256-byte boundary. An accumulator consists of a four-byte >>>> status area on the left, followed by a 164-byte numeric area."
The operands are specified by virtual address of their in-memory
accumulator.
Makes sense, given the time this was implemented. This was also a
mid-range machine, not a number cruncher. I do not find the
number of cycles that the instructions took.
At the time, memory was just a few clock cycles away from the CPU, so
not really that problematic. Today, such a super-accumulator would stay
in $L1 most of the time, or at least the central, in-use cache line of
it, would do so.
But this was also for hex floating point. A similar scheme for IEEE
double would need a bit more than 2048 bits, so five AVX-512 registers.
With 1312 bits of storage, their fp inputs (hex fp?) must have had a
smaller exponent range than ieee double.
IBM format had one sign bit, seven exponent bits and six or fourteen hexadecimal digits for single and double precision, respectively.
(Insert fear and loathing for hex float here).
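For concreteness, a hypothetical decoder for the single-precision form just
described - value = (-1)^s * f * 16^(e-64), with f the 24-bit fraction read
as a value below 1 and no hidden bit:

#include <math.h>
#include <stdint.h>

/* Decode an IBM System/360-style 32-bit hex float:
   1 sign bit, 7-bit excess-64 exponent, 24-bit fraction, no hidden bit. */
static double ibm32_to_double(uint32_t w)
{
    int      sign = (int)(w >> 31) & 1;
    int      exp  = (int)(w >> 24) & 0x7f;      /* excess-64 exponent     */
    uint32_t frac = w & 0x00ffffffu;            /* 24-bit fraction        */
    double   f    = frac / 16777216.0;          /* frac / 2^24, in [0, 1) */
    double   v    = ldexp(f, 4 * (exp - 64));   /* times 16^(exp - 64)    */
    return sign ? -v : v;
}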
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
Thomas Koenig wrote:
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
These would be very large registers. You'd need some way to store and load these for register spills, fills and task switches, as well as move
In posits, a quire is an accumulator with as many binary digits
as to cover max-exponent to min-exponent; so one can accumulate
an essentially unbounded number of sums without loss of precision
--to obtain a sum with a single rounding.
Not restricted to posits, I believe (but the term may differ).
At university, I had my programming
courses on a Pascal compiler which implemented
https://en.wikipedia.org/wiki/Karlsruhe_Accurate_Arithmetic ,
a hardware implementation was on the 4361 as an option
https://en.wikipedia.org/wiki/IBM_4300#High-Accuracy_Arithmetic_Facility
and manage them.
Karlsruhe above has a link to
http://www.bitsavers.org/pdf/ibm/370/princOps/SA22-7093-0_High_Accuracy_Arithmetic_Jan84.pdf
which describes their large accumulators as residing in memory, which
avoids the spill/fill/switch issue but with an obvious performance hit:
"A floating-point accumulator occupies a 168-byte storage area that is >>>> aligned on a 256-byte boundary. An accumulator consists of a four-byte >>>> status area on the left, followed by a 164-byte numeric area."
The operands are specified by virtual address of their in-memory accumulator.
Makes sense, given the time this was implemented. This was also a
mid-range machine, not a number cruncher. I do not find the
number of cycles that the instructions took.
At the time, memory was just a few clock cycles away from the CPU, so
not really that problematic. Today, such a super-accumulator would stay
in $L1 most of the time, or at least the central, in-use cache line of
it, would do so.
But this was also for hex floating point. A similar scheme for IEEE
double would need a bit more than 2048 bits, so five AVX-512 registers.
With 1312 bits of storage, their fp inputs (hex fp?) must have had a
smaller exponent range than ieee double.
IBM format had one sign bit, seven exponent bits and six or fourteen
hexadecimal digits for single and double precision, respectively.
(Insert fear and loathing for hex float here).
(Insert fear and loathing for hex float here).
Heck, watching Kahan's notes on FP problems leaves one in fear of
binary floating point representations.