float takes less space, and when you keep arrays of floats, float is for
sure better (less space and less memory bandwidth used, so I guess floats
can be up to twice as fast in some respects) - but when you do
calculations on local variables (not arrays), is double slower?
On 11/3/2024 11:53 PM, fir wrote:
Ask the GPU.
Chris M. Thomasson wrote:
Ask the GPU.
Why? As to the CPU, I'm not sure: on older CPUs the calculations were
done in double-precision hardware anyway (?, I'm not so sure), and even
if you passed a float to a function, I'm not sure that at the assembly
level you weren't passing a double. Then after SSE, afair, you got scalar
code for both floats and doubles. But simply, I realized I don't know
whether double calculations on local variables (not arrays) are in fact
noticeably slower.
Chris M. Thomasson wrote:
Ask the GPU.

I'm writing some CPU-intensive experiment (something like alpha-blending
images on the CPU, mostly), and interestingly I just turned float into
double in that routine and it sped up. As far as I can see (I don't have
time for many tests), changing to double turned 35 ms per frame into
34 ms per frame.
On 04/11/2024 08:53, fir wrote:
float takes less space, and when you keep arrays of floats, float is for
sure better (less space and less memory bandwidth used, so I guess floats
can be up to twice as fast in some respects)
Certainly if you have a lot of them, then the memory bandwidth and cache
hit rate can make floats faster than doubles.
but when you do calculations on local variables (not arrays), is double
slower?
I assume that for the calculations in question, the accuracy and range
of float is enough - otherwise the answer is obviously to use doubles.
This is going to depend on the cpu, the type of instructions, the source
code in question, the compiler and the options. So there is no single
easy answer.
You can, as Bonita suggested, look up instruction timing information at agner.org for the cpu you are using (assuming it's an x86 device) to get
some idea of any fundamental differences in timings. Usually for modern "big" processors, basic operations such as addition and multiplication
are single cycle or faster (i.e., multiple instructions can be done in parallel) for float and double. But division, square root, and other
more complex operations can take a lot longer with doubles.
Next, consider if you can be using vector or SIMD operations. On some devices, you can do that with floats but not doubles - and even if you
can use doubles, you can usually run floats at twice the rate.
In the source code, remember it is very easy to accidentally promote to double when writing in C. If you want to stick to floats, make sure you don't use double-precision constants - a missing "f" suffix can change a whole expression into double calculations. Remember that it takes time
to convert between float and double.
Then look at your compiler flags - these can make a big difference to
the speed of floating point code. I'm giving gcc flags, because those
are the ones I know - if you are using another compiler, look at the
details of its flags.
Obviously you want optimisation enabled if speed is relevant - -O2 is a
good start. Make sure you are optimising for the cpu(s) you are using - "-march=native" is good for local programs, but you will want something
more specific if the binary needs to run on a variety of machines. The closer you are to the exact cpu model, the better the code scheduling
and instruction choice can be.
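Putting those flags together, a compile line for this kind of code might look like this (a sketch; "blend.c" is a made-up file name):

```shell
# Optimised build tuned for the local CPU, with float-related warnings on.
# Drop -march=native if the binary must run on other machines.
gcc -O2 -march=native \
    -Wdouble-promotion -Wfloat-conversion \
    blend.c -o blend -lm
```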
Look closely at "-ffast-math" in the gcc manual. If that is suitable
for your code (and it often is), it can make a huge difference to
floating point intensive code. If it is unsuitable because you have infinities, or need deterministic control of things like associativity,
it will make your results wrong.
"-Wdouble-promotion" can be helpful to spot accidental use of doubles in
what you think is a float expression. "-Wfloat-equal" is a good idea, especially if you are mixing floats and doubles. "-Wfloat-conversion"
will warn about implicit conversions from doubles to floats (or to
integers).
David Brown wrote:
The code that seems to have sped up a bit when turning float into double
is:
union Color
{
    unsigned u;
    struct { unsigned char b,g,r,a; };
};

inline float distance2d_(float x1, float y1, float x2, float y2)
{
    return sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
}

inline unsigned GetPixelUnsafe_(int x, int y)
{
    return frame_bitmap[y*frame_size_x+x];
}

inline void SetPixelUnsafe_(int x, int y, unsigned color)
{
    frame_bitmap[y*frame_size_x+x]=color;
}

void DrawPoint(int i)
{
    // if(!point[i].enabled) return;

    int xq = point[i].x;
    int yq = point[i].y;

    Color c;
    Color bc;

    if(d_toggler)
    {
        // DrawCircle(xq,yq,point[i].radius,0xffffff);
        FillCircle(xq,yq,point[i].radius,point[i].c.u);
        return;
    }

    float R = point[i].radius*5;

    int y_start = max(0, yq-R);
    int y_end   = min(frame_size_y, yq+R);
    int x_start = max(0, xq-R);
    int x_end   = min(frame_size_x, xq+R);

    for(int y = y_start; y<y_end; y++)
    {
        for(int x = x_start; x<x_end; x++)
        {
            // here below this was float ->
            double p = (R - distance2d_(x,y,point[i].x,point[i].y));

            if(!i_toggler)
            {
                if(p<0.4*R) continue;
            }
            else
                if(p<0) continue;

            p/=R;

            bc.u = GetPixelUnsafe_(x,y);

            int r = bc.r + (point[i].c.r)* p*p*p;
            int g = bc.g + (point[i].c.g)* p*p*p;
            int b = bc.b + (point[i].c.b)* p*p*p;

            if(!r_toggler)
            {
                if(r>255) r = 255;
                if(g>255) g = 255;
                if(b>255) b = 255;
            }

            c.r = r;
            c.g = g;
            c.b = b;

            SetPixelUnsafe_(x,y,c.u);
        }
    }
}
This just draws something like a little light that darkens as 1/(r*r*r),
and is able to add n lights in place to mix colors and eventually
"overlight" (so this is kind of a blending). It's very time-consuming;
like, drawing 100 of them (when r is 9) was taking 35 ms on an old
machine, afair.
On 05/11/2024 10:49, fir wrote:
I've tried to snip this down to the bits that are important here.
inline float distance2d_(float x1, float y1, float x2, float y2)
{
    return sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
}
What happens here depends on which #include files you use. If you have
#include <math.h>, then "sqrt" is defined with doubles. So the
sum-of-squares expression is calculated using floats, then that sum is
converted to a double (taking an extra instruction or two) before
calling double-precision sqrt, and then the result is converted back to
float to return it.
If you have "#include <tgmath.h>", then "sqrt" here will be done as
float sqrtf, rather than double. But the library version of sqrtf()
might actually call sqrt (double). If you want to be sure, be explicit
with sqrtf().
And on many platforms, sqrt (float or double) uses a library function
for full IEEE compatibility. With "-ffast-math", you are telling the compiler you promise that the operand for "sqrt" will be "nice", and it
can use a single hardware sqrt instruction. This will likely be a lot faster, especially if the float version is used. (Disclaimer - I
haven't looked at this on modern x86 targets. Check yourself - I
recommend putting your code into godbolt.org and examining the assembly.)
In the code that uses this function, you are starting with integer types
that need to be converted to float to pass to the distance function, and
the result of the call is used in a float expression before being
converted to double.
In short, it is a complete mess of conversions. And unless you are
using something like gcc's "-ffast-math" to say "don't worry about the
minor details of IEEE, optimise akin to integer arithmetic", then the compiler has to generate all these back-and-forth conversions.
Being consistent in your types is going to improve things, whether you
use floats or doubles. You might even be better off using integer
arithmetic in some places.
// here below this was float ->
double p = (R - distance2d_(x,y,point[i].x,point[i].y));
David Brown wrote:

Well, that's interesting.. especially, I was unaware of this sqrtf; I
will check it a bit later.
As to -ffast-math, I didn't notice a difference, though I wasn't testing
it beyond a simple look. I used it back years ago, but later I disabled
it, as I got some bug in one code which was, afair, caused by it (I'm
not sure though; today I rarely code at all, so I'm not too fresh on
various tests).
In fact I could optimise it harder just by building a table with that
fading circle of size 45x45 and doing a lookup there (back then I was
doing a big dose of this level of optimisation, but after all, I know
it's something to do at the final stage of an app, as it generally makes
it harder to work on the app live and test various changes; but as a
final stage it's generally worth it if something runs 30-50% faster).
Some can test it, BTW:
https://drive.google.com/file/d/1-Obb6F19h5yfCbCETP4-VFoV3XYGpRsN/view?usp=sharing
It's for Windows but works under Wine, afair, and in a Linux virtual
machine on Windows also (afair; I don't know, as I've only got Windows).
fir wrote:
You may also see it on YouTube if you're afraid to run the app (though
the app is much better):
https://www.youtube.com/watch?v=7_Fodb7ivZY