I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic, after loading
the address of the reader lock object into a register, is now two
instructions: a load followed by a store. The unlock is the same
as before, just a store.
It's way faster now.
It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on the c version.
No idea if it's a new algorithm. I suspect that since I use
the term epoch, it will be claimed that it's ebr (epoch-based
reclamation) and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.
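Not the actual smrproxy code, but a minimal sketch of what a "load followed by a store" reader lock could look like, assuming hypothetical names (a global epoch counter and a per-thread slot the reader publishes it to; the unlock is a single store of a sentinel):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not the real smrproxy internals: the reader
// loads the current global epoch and publishes it in its own slot.
// Lock is one load + one store; unlock is a single store of 0.
struct reader_lock {
    std::atomic<uint64_t> epoch{0};   // 0 == not in a critical section
};

std::atomic<uint64_t> global_epoch{1};

inline void rl_lock(reader_lock& rl) {
    // load the current epoch ...
    uint64_t e = global_epoch.load(std::memory_order_acquire);
    // ... then store: publish the observed epoch in the reader's slot
    rl.epoch.store(e, std::memory_order_relaxed);
}

inline void rl_unlock(reader_lock& rl) {
    // unlock is a single store, as in the description above
    rl.epoch.store(0, std::memory_order_release);
}
```

The ordering the relaxed publish doesn't provide is what the reclaimer's asymmetric barrier (discussed later in the thread) would have to supply.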
On 10/17/2024 5:10 AM, jseigh wrote:
I have to take a look at it! Been really busy lately. Shit happens.
On 10/17/24 16:10, Chris M. Thomasson wrote:
There's a quick and dirty explanation at
http://threadnought.wordpress.com/
The repo is at https://github.com/jseigh/smrproxy
I'll need to create some memory access diagrams that
visualize how it works at some point.
Anyway, if it's new, it's another algorithm to use without
attribution.
On 10/17/2024 2:08 PM, jseigh wrote:
Interesting. From a quick view, it kind of reminds me of a distributed seqlock for some reason. Are you using an asymmetric membar in here, in smr_poll?
On 10/17/2024 4:40 PM, Chris M. Thomasson wrote:
I remember a long time ago I was messing around with a scheme where
each thread had two version counters:

pseudo code:

    per_thread
    {
        word m_version[2];

        word acquire()
        {
            word ver = load(global_version);
            m_version[ver % 2] = ver;
            return ver;
        }

        void release(word ver)
        {
            m_version[ver % 2] = 0;
        }
    }
The global_version would only be incremented by the polling thread. This
was WAY back. I think I might have posted about it on cpt.
So, when a node was made unreachable, it would be included in the
polling logic. The polling thread would increment the version counter,
then wait for all the threads' prior m_versions to be zero, and collect
the current generation of objects in a defer list. Then on the next
cycle it would increment the version counter, wait until all threads'
prior versions were zero, delete the defer list, and transfer the
current generation to it.
It went something like that.
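The two-slot scheme described above can be sketched roughly like this (names and the fixed thread count are made up for illustration; the polling step is written non-blocking, where a real poller would retry and then rotate its defer lists):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch of the two-version-counter scheme: each reader publishes the
// observed version into slot (ver % 2); the poller bumps the version
// and checks that every thread's slot for the *previous* parity has
// drained to zero before reclaiming the older defer list.
constexpr int kThreads = 4;

std::atomic<uint64_t> global_version{1};
std::atomic<uint64_t> slots[kThreads][2];   // zero-initialized

uint64_t acquire(int tid) {
    uint64_t ver = global_version.load(std::memory_order_acquire);
    slots[tid][ver % 2].store(ver, std::memory_order_relaxed);
    return ver;
}

void release(int tid, uint64_t ver) {
    slots[tid][ver % 2].store(0, std::memory_order_release);
}

// One polling step: advance the version, then report whether all
// readers of the prior generation have drained.
bool poll_step() {
    uint64_t prev = global_version.fetch_add(1, std::memory_order_acq_rel);
    for (int t = 0; t < kThreads; ++t)
        if (slots[t][prev % 2].load(std::memory_order_acquire) != 0)
            return false;   // a reader from the prior generation is live
    return true;            // safe to free the older defer list
}
```

With a live reader the step reports not-drained; once the reader releases, the next cycle comes back clear, matching the two-cycle rotate-and-delete described above.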
On 10/17/24 19:40, Chris M. Thomasson wrote:
Yes, linux membarrier() in smr_poll.
Not a seqlock, not least because exiting the critical region would
be 3 instructions, unless you use atomics, which are expensive and
usually have memory barriers.
A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar, so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.
At one point I was going to have smrproxy use hazard pointer
logic or qsbr logic as a config option, but the extra code complexity,
and the fact that qsbr required 2 grace periods, kind of made that
infeasible. The qsbr logic was mostly ripped out but there were still
some pieces left.
Anyway, I'm working on a c++ version, which involves a lot of extra
work besides just rewriting smrproxy. There's coming up with an api
for proxies, and test cases, which tend to be more work than the code
they are testing.
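For reference, the asymmetric-barrier step a reclaimer like smr_poll could take might look like the following sketch (this is not the actual smrproxy code; `heavy_barrier` is a made-up name, and the fence fallback stands in for the "explicit memory barrier" flavour mentioned later in the thread). membarrier(MEMBARRIER_CMD_GLOBAL, ...) forces a memory barrier on every running thread, so the readers themselves need only plain loads and stores:

```cpp
#include <atomic>
#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#ifndef MEMBARRIER_CMD_GLOBAL
#define MEMBARRIER_CMD_GLOBAL (1 << 0)   // value from linux/membarrier.h
#endif
#endif

// Hypothetical sketch: the reclaimer pays for ordering on the readers'
// behalf. On Linux, membarrier() issues a barrier on all running
// threads; elsewhere (or on old kernels) fall back to a local
// seq_cst fence, which is the slow flavour.
bool heavy_barrier() {
#if defined(__linux__) && defined(SYS_membarrier)
    if (syscall(SYS_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0) == 0)
        return true;   // every running thread has now executed a barrier
#endif
    std::atomic_thread_fence(std::memory_order_seq_cst);   // fallback
    return true;
}
```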
On 10/18/2024 5:07 AM, jseigh wrote:
Damn! I almost missed this post! Fucking Thunderbird... Will get back to you. Working on something else right now Joe, thanks.
https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/
On 10/25/24 18:00, Chris M. Thomasson wrote:
No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on rust because I thought it was
more complicated than it needed to be.
On 10/25/2024 3:56 PM, jseigh wrote:
Never even tried Rust, shit, I am behind the times. ;^)
Humm... I don't think we can get 100% C++ because of the damn asymmetric membar for these rather "specialized" algorithms?
Is C++ thinking about creating a standard way to gain an asymmetric membar?
On 10/27/24 15:33, Chris M. Thomasson wrote:
I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers), which involves IPI calls
to all the processors, and those have memory barriers. This is
old: 1973, patent 3,947,823, cited by the patent I did.
Anyway, I version the code so there's an asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.
On 10/27/2024 3:29 PM, jseigh wrote:
Ahh, nice! acquire/release, no seq_cst, right? ;^)
On 10/27/24 18:32, Chris M. Thomasson wrote:
The membar version? That's a store/load membar, so it is expensive.
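A sketch of what that reader-side store/load membar can look like in the explicit-barrier flavour (hypothetical names, hazard-pointer-style validation; the real smrproxy code may validate differently). The seq_cst fence between the publishing store and the re-check is exactly the #StoreLoad that makes this version slow:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

std::atomic<uint64_t> g_epoch{1};

struct reader_slot {
    std::atomic<uint64_t> epoch{0};
};

// Explicit-membar flavour: publish the observed epoch, then a full
// seq_cst fence supplies store->load ordering before re-reading the
// global epoch to confirm the published value is still current.
uint64_t lock_explicit(reader_slot& rs) {
    for (;;) {
        uint64_t e = g_epoch.load(std::memory_order_relaxed);
        rs.epoch.store(e, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // #StoreLoad
        if (g_epoch.load(std::memory_order_relaxed) == e)
            return e;   // published epoch was current at the re-check
    }
}
```

The asymmetric version drops the fence from this path and lets the reclaimer's membarrier() call provide the equivalent ordering.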
On 10/27/24 15:33, Chris M. Thomasson wrote:
On 10/25/2024 3:56 PM, jseigh wrote:
On 10/25/24 18:00, Chris M. Thomasson wrote:
On 10/18/2024 5:07 AM, jseigh wrote:
On 10/17/24 19:40, Chris M. Thomasson wrote:
On 10/17/2024 2:08 PM, jseigh wrote:
On 10/17/24 16:10, Chris M. Thomasson wrote:
On 10/17/2024 5:10 AM, jseigh wrote:
I replaced the hazard pointer logic in smrproxy. It's now >>>>>>>>> wait- free
instead of mostly wait-free. The reader lock logic after loading >>>>>>>>> the address of the reader lock object into a register is now 2 >>>>>>>>> instructions a load followed by a store. The unlock is same >>>>>>>>> as before, just a store.
It's way faster now.
It's on the feature/003 branch as a POC. I'm working on porting >>>>>>>>> it to c++ and don't want to waste any more time on c version. >>>>>>>>>
No idea of it's a new algorithm. I suspect that since I use >>>>>>>>> the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent. >>>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>>> the quiescent states are.
I have to take a look at it! Been really busy lately. Shit happens. >>>>>>>>
There's a quick and dirty explanation at
http://threadnought.wordpress.com/
repo at https://github.com/jseigh/smrproxy
I'll need to create some memory access diagrams that
visualize how it works at some point.
Anyway if it's new, another algorithm to use without
attribution.
Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric
membar in here? in smr_poll ?
Yes, linux membarrier() in smr_poll.
Not seqlock, not least for the reason that exiting the critical region
is 3 instructions unless you use atomics, which are expensive and
usually have memory barriers.
A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar, so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.
Earlier at one point I was going to have smrproxy use hazard pointer
logic or qsbr logic as a config option, but the extra code complexity
and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were still
some pieces there.
Anyway I'm working on a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There's coming up with an api for
proxies, and testcases, which tend to be more work than the code that
they are testing.
Damn! I almost missed this post! Fucking Thunderbird... Will get
back to you. Working on something else right now Joe, thanks.
https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/
No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on rust because I thought it was
more complicated than it needed to be.
Never even tried Rust, shit, I am behind the times. ;^)
Humm... I don't think we can get 100% C++ because of the damn
asymmetric membar for these rather "specialized" algorithms?
Is C++ thinking about creating a standard way to gain an asymmetric
membar?
I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors, and those have memory barriers. This is
old; a 1973 patent, 3,947,823, is cited by the patent I did.
Anyway, I version the code so there's an asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.
On 10/27/2024 3:29 PM, jseigh wrote:
[...]
I should get back into one of my older proxy algorithms. Things are
mostly wait free, if XADD can be wait free itself. No CAS in sight. I
just found an older version I posted. Almost forgot I made this post:
https://groups.google.com/g/comp.lang.c++/c/FBqOMvqWpR4/m/bDZZLUmAAgAJ
https://pastebin.com/raw/nPVYXbWM
(raw text, no ad bullshit)
On 10/27/2024 5:35 PM, jseigh wrote:
On 10/27/24 18:32, Chris M. Thomasson wrote:
The membar version? That's a store/load membar so it is expensive.
I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.
I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?
On 10/28/24 00:02, Chris M. Thomasson wrote:
On 10/27/2024 5:35 PM, jseigh wrote:
On 10/27/24 18:32, Chris M. Thomasson wrote:
The membar version? That's a store/load membar so it is expensive.
I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.
I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?
While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to ensure those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.
For cmpxchg, it has full seq_cst. For other rmw atomics I don't
know. I have to ask on c.a. I think some data dependency and/or
control dependency might factor in.
Joe Seigh
On 10/28/2024 4:45 AM, jseigh wrote:
On 10/28/24 00:02, Chris M. Thomasson wrote:
On 10/27/2024 5:35 PM, jseigh wrote:
On 10/27/24 18:32, Chris M. Thomasson wrote:
The membar version? That's a store/load membar so it is expensive.
I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.
I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?
While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to ensure those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.
Wrt acquiring a lock, the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Peterson's, and
some others along those lines, so to speak. This is for the store and
load version. Now, RMW on x86 basically implies a StoreLoad wrt the
LOCK prefix, XCHG aside, for it has an implied LOCK prefix. For
instance, the original SMR algo requires a storeload as-is on x86/x64:
MFENCE or LOCK prefix.
Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in RMO
mode. On x86, the LOCK prefix handles that wrt the RMWs themselves.
This is a lot different than using stores and loads. The original SMR
and Peterson's algo need that "store followed by a load to a different
location" action to hold true, aka storeload...
Now, I don't think that a data-dependent load can act like a storeload.
I thought that they act sort of like an acquire, aka #LoadStore |
#LoadLoad wrt SPARC. SPARC in RMO mode honors data dependencies. Now,
the DEC Alpha is a different story... ;^)
For cmpxchg, it has full seq_cst. For other rmw atomics I don't
know. I have to ask on c.a. I think some data dependency and/or
control dependency might factor in.
Joe Seigh
On 10/28/24 17:57, Chris M. Thomasson wrote:
[...]
fwiw, here's the lock and unlock logic from smrproxy rewrite
inline void lock()
{
    epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
    _ref_epoch.store(_epoch, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_acquire);
}
inline void unlock()
{
    _ref_epoch.store(0, std::memory_order_release);
}
epoch_t is interesting. It's uint64_t but handles wrapped
compares, i.e. for an epoch_t x1 and uint64_t n,
    x1 < (x1 + n)
for any value of x1 and any value of n from 1 to 2**63;
e.g.
    0xfffffffffffffff0 < 0x0000000000000001
The rewrite is almost complete except for some thread_local
stuff. I think I might break off there. Most of the
additional work is writing the test code. I'm considering
rewriting it in Rust.
Joe Seigh
On 10/28/2024 6:17 PM, jseigh wrote:
On 10/28/24 17:57, Chris M. Thomasson wrote:
On 10/28/2024 4:45 AM, jseigh wrote:
fwiw, here's the lock and unlock logic from smrproxy rewrite
inline void lock()
{
epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
_ref_epoch.store(_epoch, std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acquire);
^^^^^^^^^^^^^^^^^^^^^^
}
Still don't know how your pure C++ write-up can handle this without an
std::atomic_thread_fence(std::memory_order_acquire).
inline void unlock()
{
_ref_epoch.store(0, std::memory_order_release);
}
On 10/28/2024 6:17 PM, jseigh wrote:
On 10/28/24 17:57, Chris M. Thomasson wrote:
On 10/28/2024 4:45 AM, jseigh wrote:
fwiw, here's the lock and unlock logic from smrproxy rewrite
inline void lock()
{
epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
_ref_epoch.store(_epoch, std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acquire);
}
inline void unlock()
{
_ref_epoch.store(0, std::memory_order_release);
}
epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n
Only your single polling thread can mutate the shadow_epoch, right?
On 10/29/24 00:35, Chris M. Thomasson wrote:
On 10/28/2024 6:17 PM, jseigh wrote:
On 10/28/24 17:57, Chris M. Thomasson wrote:
On 10/28/2024 4:45 AM, jseigh wrote:
fwiw, here's the lock and unlock logic from smrproxy rewrite
inline void lock()
{
epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
_ref_epoch.store(_epoch, std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acquire);
^^^^^^^^^^^^^^^^^^^^^^
}
Still don't know how your pure C++ write up can handle this without an
std::atomic_thread_fence(std::memory_order_acquire).
No thread fence is necessary. The loads can move before
the store. They just can't move before the async
membar. After that membar any previously retired
objects are no longer reachable.
inline void unlock()
{
_ref_epoch.store(0, std::memory_order_release);
}
On 10/29/24 00:38, Chris M. Thomasson wrote:
On 10/28/2024 6:17 PM, jseigh wrote:
On 10/28/24 17:57, Chris M. Thomasson wrote:
On 10/28/2024 4:45 AM, jseigh wrote:
fwiw, here's the lock and unlock logic from smrproxy rewrite
inline void lock()
{
epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
_ref_epoch.store(_epoch, std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acquire);
}
inline void unlock()
{
_ref_epoch.store(0, std::memory_order_release);
}
epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n
Only your single polling thread can mutate the shadow_epoch, right?
Yes. It's just an optimization. The reader threads could read
from the global epoch but it would be in a separate cache line
and be an extra dependent load. So one dependent load and
same cache line.
On 10/28/2024 6:17 PM, jseigh wrote:
[...]
fwiw, here's the lock and unlock logic from smrproxy rewrite
inline void lock()
{
epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
_ref_epoch.store(_epoch, std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acquire);
}
inline void unlock()
{
_ref_epoch.store(0, std::memory_order_release);
}
epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n
[...]
Humm... I am not sure if it would work with just the release. The
polling thread would read from these per thread epochs, _ref_epoch,
using an acquire barrier? Still. Not sure if that would work. Need to
put my thinking cap on. ;^)
On 10/17/2024 5:10 AM, jseigh wrote:
I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.
It's way faster now.
It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.
No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.
For some reason you made me think of another very simple proxy
technique using per thread mutexes. It was an experiment a while back:
___________________
per_thread
{
std::mutex m_locks[2];
lock()
{
word ver = g_version;
m_locks[ver % 2].lock();
}
unlock(word ver)
{
m_locks[ver % 2].unlock();
}
}
___________________
The polling thread would increase the g_version counter then lock and
unlock all of the threads previous locks. Iirc, it worked way better
than a read write lock for sure. Basically:
___________________
word ver = g_version.inc(); // ver is the previous version
for all threads as t
{
t.m_locks[ver % 2].lock();
t.m_locks[ver % 2].unlock();
}
___________________
After that, it knew the previous generation was completed.
It was just a way for using a mutex to get distributed proxy like behavior.
On 10/28/2024 10:02 PM, Chris M. Thomasson wrote:
For some reason you made me think of another very simple proxy
technique using per-thread mutexes. It was an experiment a while back:
___________________
per_thread
{
    std::mutex m_locks[2];

    lock()
    {
        word ver = g_version;
        m_locks[ver % 2].lock();
    }

    unlock(word ver)
    {
        m_locks[ver % 2].unlock();
    }
}
___________________
Oops! lock() should return the version and then pass it on to unlock(). Sorry
for missing that in my pseudo-code. ;^o
On 10/29/2024 4:27 AM, jseigh wrote:
Yes. It's just an optimization. The reader threads could read
from the global epoch, but that would be in a separate cache line
and be an extra dependent load. This way it's one dependent load,
in the same cache line.
Are you taking advantage of the fancy alignment capabilities of C++?
https://en.cppreference.com/w/cpp/language/alignas
and friends? They seemed to work fine the last time I checked.
It's nice to have a standard way to pad and align on cache-line
boundaries. :^)
On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:
Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and then the explicit release; it will work. I don't see why it would not.
For some reason, I thought you were not going to use an async membar in
your C++ version. Sorry. However, it still would be fun to test
against... ;^)
Joe, can you call dtors for nodes after a single epoch?
On 10/29/24 17:56, Chris M. Thomasson wrote:
Are you taking advantage of the fancy alignment capabilities of C++?
https://en.cppreference.com/w/cpp/language/alignas
and friends? They seem to work fine wrt the last time I checked them.
It's nice to have a standard way to pad and align on cache line
boundaries. :^)
It's target-processor dependent. You need to query the cache line size
and pass it to the compiler as a define.
There's supposed to be a built-in define that would be target-system
dependent, but it's not implemented.
Joe Seigh
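The alignas exchange above, sketched. C++17 does name the "built-in" constant (std::hardware_destructive_interference_size), but the major standard libraries took years to ship it, so a build-time define plus a guessed fallback is the usual workaround. SMR_CACHE_LINE and ReaderEpoch are illustrative names, not from either poster's code:

```cpp
#include <cstddef>
#include <new>  // may provide std::hardware_destructive_interference_size (C++17)

// Pick a cache-line size: build-time define first (e.g. -DSMR_CACHE_LINE=64),
// then the C++17 constant if the library implements it, else a guess.
#if defined(SMR_CACHE_LINE)
constexpr std::size_t kCacheLine = SMR_CACHE_LINE;
#elif defined(__cpp_lib_hardware_interference_size)
constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kCacheLine = 64;  // assumption: common x86/ARM line size
#endif

// One reader-epoch record per thread, padded out to a whole cache line
// so adjacent threads' records never false-share.
struct alignas(kCacheLine) ReaderEpoch {
    unsigned long epoch;
};

static_assert(alignof(ReaderEpoch) == kCacheLine, "record is line-aligned");
static_assert(sizeof(ReaderEpoch) % kCacheLine == 0, "record fills whole lines");
```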
On 10/29/24 18:05, Chris M. Thomasson wrote:
The C version has both versions. The C++ version does only the
async membar version. But I'm not publishing that code, so it's
a moot point.
On 10/30/2024 9:39 AM, jseigh wrote:
The C version has both versions. The C++ version does only the
async membar version. But I'm not publishing that code, so it's
a moot point.
I got side-tracked with more heavy math. The problem with C++ code that
uses an async memory barrier is that it's automatically rendered
non-portable... Yikes! Imvvvvvho, C/C++ should think about
including them in some future standard. It would be nice. Well, for us
at least! ;^)
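What the "async membar" amounts to can be sketched under the assumption that Linux's membarrier(2) syscall stands in for the missing standard facility: the reader's fast path uses only a compiler fence (atomic_signal_fence), and the reclaimer pays for the real barrier by asking the kernel to run one on every CPU. Non-Linux builds fall back to an ordinary symmetric fence. Function and variable names are illustrative:

```cpp
#include <atomic>
#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

std::atomic<int> g_in_critical{0};  // illustrative per-reader flag

// Reader fast path: a plain store plus a compiler-only fence.  No CPU
// barrier instruction is emitted here; the reclaimer supplies it.
void reader_enter() {
    g_in_critical.store(1, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Reclaimer slow path: force a memory barrier on all CPUs running this
// process.  MEMBARRIER_CMD_GLOBAL == 1 in the kernel ABI.
void reclaimer_membar() {
#if defined(__linux__) && defined(__NR_membarrier)
    if (syscall(__NR_membarrier, 1 /* MEMBARRIER_CMD_GLOBAL */, 0) == 0)
        return;  // kernel >= 4.3 and command supported
#endif
    // Portable fallback: an ordinary (symmetric) full fence.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

This is exactly the non-portability being complained about: nothing in standard C++ expresses the asymmetric fast-path/slow-path split, so the code has to reach for an OS-specific syscall.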
On 11/4/24 00:14, Chris M. Thomasson wrote:
That's never going to happen. DWCAS has been around for more than
50 years and c++ doesn't support that and probably never will.
You can't write lock-free queues that are ABA free and
are performant without that. So async memory barriers won't
happen any time soon either.
Long term I think c++ will fade into irrelevance along with
all the other programming languages based on an imperfect
knowledge of concurrency, which is basically all of them
right now.
On Mon, 4 Nov 2024 07:46:37 -0500
jseigh <jseigh_es00@xemaps.com> boring babbled:
Given that most concurrent operating systems are written in these "imperfect" languages, how does that square with your definition? And how would your perfect language run on them?
Anyway, concurrency is the job of the OS, not the language. C++ threading is just a wrapper around pthreads on *nix and Windows threads on Windows. The language just needs an interface to the underlying OS functionality; it should not try to implement the functionality itself, as it'll always be a hack.
On 11/4/2024 6:09 AM, Muttley@DastartdlyHQ.org wrote:
A start would be C++ having an "always lock free" CAS for two contiguous words on systems that support it, yes, even 64-bit. Ala:
struct anchor {
    word a;
    word b;
};
cmpxchg8b for x86, cmpxchg16b for x64, etc...
https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b
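The anchor idea can at least be expressed today with std::atomic over a trivially copyable two-word struct. To stay runnable everywhere, this sketch packs two 32-bit words into one 64-bit atomic; widening "word" to the native 64-bit size is exactly where C++ gives no always-lock-free guarantee (you need cmpxchg16b, -mcx16, and a cooperative library). Names are illustrative:

```cpp
#include <atomic>
#include <cstdint>

// Two contiguous words updated in one CAS: a value plus an ABA counter.
struct Anchor {
    std::uint32_t head;  // e.g. top index of a freelist (illustrative)
    std::uint32_t aba;   // bumped on every successful swap
};

static std::atomic<Anchor> g_anchor{Anchor{0, 0}};

// ABA-safe update: the swap succeeds only if *both* words still match,
// so a recycled 'head' value alone can't fool the CAS.
bool swap_head(std::uint32_t new_head) {
    Anchor oldv = g_anchor.load(std::memory_order_relaxed);
    Anchor newv{new_head, oldv.aba + 1};
    return g_anchor.compare_exchange_strong(oldv, newv,
                                            std::memory_order_acq_rel,
                                            std::memory_order_relaxed);
}
```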