Forum: War Ensemble BBS

smrproxy v2

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Thu Oct 17 08:10:20 2024

From Newsgroup: comp.lang.c++

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

Joe Seigh

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 13:10:43 2024

From Newsgroup: comp.lang.c++

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Thu Oct 17 17:08:04 2024

From Newsgroup: comp.lang.c++

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Joe Seigh
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 16:40:00 2024

From Newsgroup: comp.lang.c++

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a distributed
seqlock for some reason. Are you using an asymmetric membar in here? in smr_poll ?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 17:16:55 2024

From Newsgroup: comp.lang.c++

On 10/17/2024 4:40 PM, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a distributed seqlock for some reason. Are you using an asymmetric membar in here? in smr_poll ?

I remember a long time ago I was messing around where each thread had
two version counters:

pseudo code:

per_thread
{
word m_version[2];

word acquire()
{
word ver = load(global_version);
m_version[ver % 2] = ver ;
return ver ;
}

void release(word ver)
{
m_version[ver % 2] = 0;
}
}

The global_version would only be incremented by the polling thread. This
was WAY back. I think I might of posted about it on cpt.

So, when a node was made unreachable, it would be included in the
polling logic. The polling could increment the version counter then wait
for all the threads prior m_versions to be zero. Collect the current generation of objects in a defer list. Then on the next cycle it would increment the version counter, wait until all threads prior versions
were zero, then delete the defer count, and transfer the current gen to
the defer.

It went something like that.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 17:24:08 2024

From Newsgroup: comp.lang.c++

On 10/17/2024 5:16 PM, Chris M. Thomasson wrote:

On 10/17/2024 4:40 PM, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free >>>>> instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC.   I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a distributed
seqlock for some reason. Are you using an asymmetric membar in here?
in smr_poll ?

I remember a long time ago I was messing around where each thread had
two version counters:

pseudo code:

per_thread
{
    word m_version[2];

    word acquire()
    {
        word ver = load(global_version);
        m_version[ver % 2] = ver ;
        return ver ;
    }

    void release(word ver)
    {
        m_version[ver % 2] = 0;
    }
}

The global_version would only be incremented by the polling thread. This
was WAY back. I think I might of posted about it on cpt.

So, when a node was made unreachable, it would be included in the
polling logic. The polling could increment the version counter then wait
for all the threads prior m_versions to be zero. Collect the current generation of objects in a defer list. Then on the next cycle it would increment the version counter, wait until all threads prior versions
were zero, then delete the defer count, and transfer the current gen to
the defer.

It went something like that.

Iirc, I was using FlushProcessWriteBuffers back then for an asymmetric
barrier for my experiment. The polling thread would execute one after it increased the global version. Actualy, I can't remember where I placed
it exactly, after or before. The defer list made things work.
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Fri Oct 18 08:07:11 2024

From Newsgroup: comp.lang.c++

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a distributed seqlock for some reason. Are you using an asymmetric membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical region
is 3 instructions unless you use atomics which are expensive and have
memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.

Earlier at one point I was going to have smrproxy use hazard pointer
logic or qsbr logic as a config option, but the extra code complexity
and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There coming up with an api for
proxies and testcases which tend to be more work than the code that
they are testing.

Joe Seigh

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Oct 25 15:00:15 2024

From Newsgroup: comp.lang.c++

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free >>>>> instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a distributed
seqlock for some reason. Are you using an asymmetric membar in here?
in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical region
is 3 instructions unless you use atomics which are expensive and have
memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.

Earlier at one point I was going to have smrproxy use hazard pointer
logic or qsbr logic as a config option, but the extra code complexity
and the fact that qsbr required 2 grace periods kind of made that unfeasible. The qsbr logic was mostly ripped out but there were still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There coming up with an api for
proxies and testcases which tend to be more work than the code that
they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get back to
you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Fri Oct 25 18:56:16 2024

From Newsgroup: comp.lang.c++

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free >>>>>> instead of mostly wait-free. The reader lock logic after loading >>>>>> the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>> it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric
membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical region
is 3 instructions unless you use atomics which are expensive and have
memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.

Earlier at one point I was going to have smrproxy use hazard pointer
logic or qsbr logic as a config option, but the extra code complexity
and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There coming up with an api for
proxies and testcases which tend to be more work than the code that
they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get back to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Joe Seigh
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 12:33:46 2024

From Newsgroup: comp.lang.c++

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free >>>>>>> instead of mostly wait-free. The reader lock logic after loading >>>>>>> the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>>> it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens. >>>>>>

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric
membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical region
is 3 instructions unless you use atomics which are expensive and have
memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.

Earlier at one point I was going to have smrproxy use hazard pointer
logic or qsbr logic as a config option, but the extra code complexity
and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There coming up with an api for
proxies and testcases which tend to be more work than the code that
they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get back
to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric membar?
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Sun Oct 27 18:29:43 2024

From Newsgroup: comp.lang.c++

On 10/27/24 15:33, Chris M. Thomasson wrote:

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait- >>>>>>>> free
instead of mostly wait-free. The reader lock logic after loading >>>>>>>> the address of the reader lock object into a register is now 2 >>>>>>>> instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>>>> it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent. >>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>> the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens. >>>>>>>

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric
membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical region >>>> is 3 instructions unless you use atomics which are expensive and have
memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.

Earlier at one point I was going to have smrproxy use hazard pointer
logic or qsbr logic as a config option, but the extra code complexity
and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were still >>>> some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There coming up with an api for
proxies and testcases which tend to be more work than the code that
they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get back
to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric membar?

I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors and those have memory barriers. This is
old, 1973, patent 3,947,823 cited by the patent I did.

Anyway, I version the code so there's a asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.

Joe Seigh

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 15:32:33 2024

From Newsgroup: comp.lang.c++

On 10/27/2024 3:29 PM, jseigh wrote:

On 10/27/24 15:33, Chris M. Thomasson wrote:

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now >>>>>>>>> wait- free
instead of mostly wait-free. The reader lock logic after loading >>>>>>>>> the address of the reader lock object into a register is now 2 >>>>>>>>> instructions a load followed by a store. The unlock is same >>>>>>>>> as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>>>>> it to c++ and don't want to waste any more time on c version. >>>>>>>>>
No idea of it's a new algorithm. I suspect that since I use >>>>>>>>> the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent. >>>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>>> the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens. >>>>>>>>

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric
membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical region >>>>> is 3 instructions unless you use atomics which are expensive and have >>>>> memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier. >>>>>
Earlier at one point I was going to have smrproxy use hazard pointer >>>>> logic or qsbr logic as a config option, but the extra code complexity >>>>> and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were still >>>>> some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There coming up with an api for
proxies and testcases which tend to be more work than the code that
they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get
back to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn
asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric
membar?

I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors and those have memory barriers. This is
old, 1973, patent 3,947,823 cited by the patent I did.

Anyway, I version the code so there's a asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.

Ahh, nice! acquire/release, no seq_cst, right? ;^)

--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Sun Oct 27 20:35:54 2024

From Newsgroup: comp.lang.c++

On 10/27/24 18:32, Chris M. Thomasson wrote:

On 10/27/2024 3:29 PM, jseigh wrote:

On 10/27/24 15:33, Chris M. Thomasson wrote:

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now >>>>>>>>>> wait- free
instead of mostly wait-free. The reader lock logic after loading >>>>>>>>>> the address of the reader lock object into a register is now 2 >>>>>>>>>> instructions a load followed by a store. The unlock is same >>>>>>>>>> as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>>>>>> it to c++ and don't want to waste any more time on c version. >>>>>>>>>>
No idea of it's a new algorithm. I suspect that since I use >>>>>>>>>> the term epoch that it will be claimed that it's ebr, epoch >>>>>>>>>> based reclamation, and that all ebr algorithms are equivalent. >>>>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>>>> the quiescent states are.

I have to take a look at it! Been really busy lately. Shit
happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric >>>>>>> membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical
region
is 3 instructions unless you use atomics which are expensive and have >>>>>> memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look >>>>>> somewhat similar so you have to know how the reclaim logic uses it. >>>>>> In this case I am slingshotting off of the asymmetric memory barrier. >>>>>>
Earlier at one point I was going to have smrproxy use hazard pointer >>>>>> logic or qsbr logic as a config option, but the extra code complexity >>>>>> and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were >>>>>> still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work >>>>>> besides just rewriting smrproxy. There coming up with an api for >>>>>> proxies and testcases which tend to be more work than the code that >>>>>> they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get
back to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn
asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric
membar?

I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors and those have memory barriers. This is
old, 1973, patent 3,947,823 cited by the patent I did.

Anyway, I version the code so there's a asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.

Ahh, nice! acquire/release, no seq_cst, right? ;^)

The membar version? That's a store/load membar so it is expensive.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:02:58 2024

From Newsgroup: comp.lang.c++

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

On 10/27/2024 3:29 PM, jseigh wrote:

On 10/27/24 15:33, Chris M. Thomasson wrote:

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now >>>>>>>>>>> wait- free
instead of mostly wait-free. The reader lock logic after >>>>>>>>>>> loading
the address of the reader lock object into a register is now 2 >>>>>>>>>>> instructions a load followed by a store. The unlock is same >>>>>>>>>>> as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on >>>>>>>>>>> porting
it to c++ and don't want to waste any more time on c version. >>>>>>>>>>>
No idea of it's a new algorithm. I suspect that since I use >>>>>>>>>>> the term epoch that it will be claimed that it's ebr, epoch >>>>>>>>>>> based reclamation, and that all ebr algorithms are equivalent. >>>>>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>>>>> the quiescent states are.

I have to take a look at it! Been really busy lately. Shit >>>>>>>>>> happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric >>>>>>>> membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical >>>>>>> region
is 3 instructions unless you use atomics which are expensive and >>>>>>> have
memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look >>>>>>> somewhat similar so you have to know how the reclaim logic uses it. >>>>>>> In this case I am slingshotting off of the asymmetric memory
barrier.

Earlier at one point I was going to have smrproxy use hazard pointer >>>>>>> logic or qsbr logic as a config option, but the extra code
complexity
and the fact that qsbr required 2 grace periods kind of made that >>>>>>> unfeasible. The qsbr logic was mostly ripped out but there were >>>>>>> still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work >>>>>>> besides just rewriting smrproxy. There coming up with an api for >>>>>>> proxies and testcases which tend to be more work than the code that >>>>>>> they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get
back to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in >>>>> part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn
asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric
membar?

I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors and those have memory barriers. This is
old, 1973, patent 3,947,823 cited by the patent I did.

Anyway, I version the code so there's a asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.

Ahh, nice! acquire/release, no seq_cst, right? ;^)

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I say
C++, I mean pure C++, no calls to FlushProcessWriteBuffers and things
like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:08:23 2024

From Newsgroup: comp.lang.c++

On 10/27/2024 3:29 PM, jseigh wrote:

On 10/27/24 15:33, Chris M. Thomasson wrote:

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now >>>>>>>>> wait- free
instead of mostly wait-free. The reader lock logic after loading >>>>>>>>> the address of the reader lock object into a register is now 2 >>>>>>>>> instructions a load followed by a store. The unlock is same >>>>>>>>> as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>>>>> it to c++ and don't want to waste any more time on c version. >>>>>>>>>
No idea of it's a new algorithm. I suspect that since I use >>>>>>>>> the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent. >>>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>>> the quiescent states are.

I have to take a look at it! Been really busy lately. Shit happens. >>>>>>>>

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric
membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical region >>>>> is 3 instructions unless you use atomics which are expensive and have >>>>> memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look
somewhat similar so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier. >>>>>
Earlier at one point I was going to have smrproxy use hazard pointer >>>>> logic or qsbr logic as a config option, but the extra code complexity >>>>> and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were still >>>>> some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work
besides just rewriting smrproxy. There coming up with an api for
proxies and testcases which tend to be more work than the code that
they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get
back to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn
asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric
membar?

I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors and those have memory barriers. This is
old, 1973, patent 3,947,823 cited by the patent I did.

Anyway, I version the code so there's a asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.

I should get back into one of my older proxy algorithms. Things are
mostly wait free, if XADD can be wait free itself. No CAS in sight. I
just found an older version I posted.. Almost forgot I make this post:

https://groups.google.com/g/comp.lang.c++/c/FBqOMvqWpR4/m/bDZZLUmAAgAJ

https://pastebin.com/raw/nPVYXbWM
(raw text, no ad bullshit)

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:09:54 2024

From Newsgroup: comp.lang.c++

On 10/27/2024 9:08 PM, Chris M. Thomasson wrote:

On 10/27/2024 3:29 PM, jseigh wrote:

On 10/27/24 15:33, Chris M. Thomasson wrote:

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now >>>>>>>>>> wait- free
instead of mostly wait-free. The reader lock logic after loading >>>>>>>>>> the address of the reader lock object into a register is now 2 >>>>>>>>>> instructions a load followed by a store. The unlock is same >>>>>>>>>> as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>>>>>> it to c++ and don't want to waste any more time on c version. >>>>>>>>>>
No idea of it's a new algorithm. I suspect that since I use >>>>>>>>>> the term epoch that it will be claimed that it's ebr, epoch >>>>>>>>>> based reclamation, and that all ebr algorithms are equivalent. >>>>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>>>> the quiescent states are.

I have to take a look at it! Been really busy lately. Shit
happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric >>>>>>> membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical
region
is 3 instructions unless you use atomics which are expensive and have >>>>>> memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look >>>>>> somewhat similar so you have to know how the reclaim logic uses it. >>>>>> In this case I am slingshotting off of the asymmetric memory barrier. >>>>>>
Earlier at one point I was going to have smrproxy use hazard pointer >>>>>> logic or qsbr logic as a config option, but the extra code complexity >>>>>> and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were >>>>>> still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work >>>>>> besides just rewriting smrproxy. There coming up with an api for >>>>>> proxies and testcases which tend to be more work than the code that >>>>>> they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get
back to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn
asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric
membar?

I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors and those have memory barriers. This is
old, 1973, patent 3,947,823 cited by the patent I did.

Anyway, I version the code so there's a asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.

I should get back into one of my older proxy algorithms. Things are
mostly wait free, if XADD can be wait free itself. No CAS in sight. I
just found an older version I posted.. Almost forgot I make this post:

https://groups.google.com/g/comp.lang.c++/c/FBqOMvqWpR4/m/bDZZLUmAAgAJ

https://pastebin.com/raw/nPVYXbWM
(raw text, no ad bullshit)

It beats a read write lock, but it has trouble beating one that does not
use any atomic RMW's in the fast path ala:
___________________
collector& acquire()
{
// increment the master count _and_ obtain current collector.
std::uint32_t current =
m_current.fetch_add(ct_ref_inc, std::memory_order_acquire);

// decode the collector index.
return m_collectors[current & ct_proxy_mask];
}

void release(collector& c)
{
// decrement the collector.
std::uint32_t count =
c.m_count.fetch_sub(ct_ref_inc, std::memory_order_release);

// check for the completion of the quiescence process.
if ((count & ct_ref_mask) == ct_ref_complete)
{
// odd reference count and drop-to-zero condition detected!
prv_quiesce_complete(c);
}
}
___________________
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:13:05 2024

From Newsgroup: comp.lang.c++

On 10/27/2024 9:08 PM, Chris M. Thomasson wrote:

On 10/27/2024 3:29 PM, jseigh wrote:

On 10/27/24 15:33, Chris M. Thomasson wrote:

On 10/25/2024 3:56 PM, jseigh wrote:

On 10/25/24 18:00, Chris M. Thomasson wrote:

On 10/18/2024 5:07 AM, jseigh wrote:

On 10/17/24 19:40, Chris M. Thomasson wrote:

On 10/17/2024 2:08 PM, jseigh wrote:

On 10/17/24 16:10, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now >>>>>>>>>> wait- free
instead of mostly wait-free. The reader lock logic after loading >>>>>>>>>> the address of the reader lock object into a register is now 2 >>>>>>>>>> instructions a load followed by a store. The unlock is same >>>>>>>>>> as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting >>>>>>>>>> it to c++ and don't want to waste any more time on c version. >>>>>>>>>>
No idea of it's a new algorithm. I suspect that since I use >>>>>>>>>> the term epoch that it will be claimed that it's ebr, epoch >>>>>>>>>> based reclamation, and that all ebr algorithms are equivalent. >>>>>>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>>>>>> the quiescent states are.

I have to take a look at it! Been really busy lately. Shit
happens.

There's a quick and dirty explanation at
http://threadnought.wordpress.com/

repo at https://github.com/jseigh/smrproxy

I'll need to create some memory access diagrams that
visualize how it works at some point.

Anyway if it's new, another algorithm to use without
attribution.

Interesting. From a quick view, it kind of reminds me of a
distributed seqlock for some reason. Are you using an asymmetric >>>>>>> membar in here? in smr_poll ?

Yes, linux membarrier() in smr_poll.

Not seqlock, not least for the reason that exiting the critical
region
is 3 instructions unless you use atomics which are expensive and have >>>>>> memory barriers usually.

A lot of the qsbr and ebr reader lock/unlock code is going to look >>>>>> somewhat similar so you have to know how the reclaim logic uses it. >>>>>> In this case I am slingshotting off of the asymmetric memory barrier. >>>>>>
Earlier at one point I was going to have smrproxy use hazard pointer >>>>>> logic or qsbr logic as a config option, but the extra code complexity >>>>>> and the fact that qsbr required 2 grace periods kind of made that
unfeasible. The qsbr logic was mostly ripped out but there were >>>>>> still
some pieces there.

Anyway I'm working a c++ version which involves a lot of extra work >>>>>> besides just rewriting smrproxy. There coming up with an api for >>>>>> proxies and testcases which tend to be more work than the code that >>>>>> they are testing.

Damn! I almost missed this post! Fucking Thunderbird... Will get
back to you. Working on something else right now Joe, thanks.

https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/

No problem. The c++ work is progressing pretty slowly, not least in
part because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on on rust because I though it was
more complicated than it needed to be.

Never even tried Rust, shit, I am behind the times. ;^)

Humm... I don't think we can get 100% C++ because of the damn
asymmetric membar for these rather "specialized" algorithms?

Is C++ thinking about creating a standard way to gain an asymmetric
membar?

I don't think so. It's platform dependent. Apart from linux, mostly
it's done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers) which involves IPI calls
to all the processors and those have memory barriers. This is
old, 1973, patent 3,947,823 cited by the patent I did.

Anyway, I version the code so there's a asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.

I should get back into one of my older proxy algorithms. Things are
mostly wait free, if XADD can be wait free itself. No CAS in sight. I
just found an older version I posted.. Almost forgot I make this post:

https://groups.google.com/g/comp.lang.c++/c/FBqOMvqWpR4/m/bDZZLUmAAgAJ

https://pastebin.com/raw/nPVYXbWM
(raw text, no ad bullshit)

then we can do something like a per-reader-thread incrementing its own
version counter on every say 1000 reads and do a store followed by a
release membar to allow the polling thread to pick it up via an acquire
load. This is in pure C++, no asymmetric magic here at all.
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Mon Oct 28 07:45:15 2024

From Newsgroup: comp.lang.c++

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and things
like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

For cmpxchg, it has full cst_seq. For other rmw atomics I don't
know. I have to ask on c.a. I think some data dependency and/or
control dependency might factor in.

Joe Seigh
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 14:57:23 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 4:45 AM, jseigh wrote:

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

Wrt acquiring a lock the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Petersons, and some
others along those lines, so to speak. This is for the store and load
version. Now, RMW on x86 basically implies a StoreLoad wrt the LOCK
prefix, XCHG aside for it has an implied LOCK prefix. For instance the original SMR algo requires a storeload as is on x86/x64. MFENCE or LOCK prefix.

Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in RMO
mode. On x86, the LOCK prefix handles that wrt the RMW's themselves.
This is a lot different than using stores and loads. The original SMR
and Peterson's algo needs that "store followed by a load to a different location" action to hold true, aka, storeload...

Now, I don't think that a data-dependant load can act like a storeload.
I thought that they act sort of like an acquire, aka #LoadStore |
#LoadLoad wrt SPARC. SPARC in RMO mode honors data-dependencies. Now,
the DEC Alpha is a different story... ;^)

For cmpxchg, it has full cst_seq. For other rmw atomics I don't
know. I have to ask on c.a. I think some data dependency and/or
control dependency might factor in.

Joe Seigh

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 15:09:39 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 2:57 PM, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

Wrt acquiring a lock the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Petersons, and some others along those lines, so to speak. This is for the store and load version. Now, RMW on x86 basically implies a StoreLoad wrt the LOCK
prefix, XCHG aside for it has an implied LOCK prefix. For instance the original SMR algo requires a storeload as is on x86/x64. MFENCE or LOCK prefix.

Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in RMO mode. On x86, the LOCK prefix handles that wrt the RMW's themselves.

The fun part is that since its pure C++, those membars are going to be
emitted for me no matter what arch it's compiling for.

This is a lot different than using stores and loads. The original SMR
and Peterson's algo needs that "store followed by a load to a different location" action to hold true, aka, storeload...

Now, I don't think that a data-dependant load can act like a storeload.
I thought that they act sort of like an acquire, aka #LoadStore |
#LoadLoad wrt SPARC. SPARC in RMO mode honors data-dependencies. Now,
the DEC Alpha is a different story... ;^)

For cmpxchg, it has full cst_seq. For other rmw atomics I don't
know. I have to ask on c.a. I think some data dependency and/or
control dependency might factor in.

Joe Seigh

--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Mon Oct 28 21:17:55 2024

From Newsgroup: comp.lang.c++

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

Wrt acquiring a lock the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Petersons, and some others along those lines, so to speak. This is for the store and load version. Now, RMW on x86 basically implies a StoreLoad wrt the LOCK
prefix, XCHG aside for it has an implied LOCK prefix. For instance the original SMR algo requires a storeload as is on x86/x64. MFENCE or LOCK prefix.

Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in RMO mode. On x86, the LOCK prefix handles that wrt the RMW's themselves.
This is a lot different than using stores and loads. The original SMR
and Peterson's algo needs that "store followed by a load to a different location" action to hold true, aka, storeload...

Now, I don't think that a data-dependant load can act like a storeload.
I thought that they act sort of like an acquire, aka #LoadStore |
#LoadLoad wrt SPARC. SPARC in RMO mode honors data-dependencies. Now,
the DEC Alpha is a different story... ;^)

fwiw, here's the lock and unlock logic from smrproxy rewrite

inline void lock()
{
epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
_ref_epoch.store(_epoch, std::memory_order_relaxed);
std::atomic_signal_fence(std::memory_order_acquire);
}

inline void unlock()
{
_ref_epoch.store(0, std::memory_order_release);
}

epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n

x1 < (x1 + n)

for any value of x1 and any value of n from 0 to 2**63;
eg.
0xfffffffffffffff0 < 0x0000000000000001

The rewrite is almost complete except for some thread_local
stuff. I think I might break off there. Most of the
additional work is writing the test code. I'm considering
rewriting it in Rust.

Joe Seigh
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 21:35:33 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

Wrt acquiring a lock the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Petersons, and
some others along those lines, so to speak. This is for the store and
load version. Now, RMW on x86 basically implies a StoreLoad wrt the
LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
instance the original SMR algo requires a storeload as is on x86/x64.
MFENCE or LOCK prefix.

Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in
RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
themselves. This is a lot different than using stores and loads. The
original SMR and Peterson's algo needs that "store followed by a load
to a different location" action to hold true, aka, storeload...

Now, I don't think that a data-dependant load can act like a
storeload. I thought that they act sort of like an acquire, aka
#LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
dependencies. Now, the DEC Alpha is a different story... ;^)

fwiw, here's the lock and unlock logic from smrproxy rewrite

    inline void lock()
    {
        epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
        _ref_epoch.store(_epoch, std::memory_order_relaxed);

        std::atomic_signal_fence(std::memory_order_acquire);

^^^^^^^^^^^^^^^^^^^^^^

    }

Still don't know how your pure C++ write up can handle this without an std::atomic_thread_fence(std::memory_order_acquire).

    inline void unlock()
    {
        _ref_epoch.store(0, std::memory_order_release);
    }

epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n

    x1 < (x1 + n)

for any value of x1 and any value of n from 0 to 2**63;
eg.
   0xfffffffffffffff0 < 0x0000000000000001

For some reason it reminds me of your atomic 63 bit counter from a while
back. If your smr_poll thread is the only thread that can increment the
epoch, then it will be able to detect rolling to zero, right? For one of
my older proxys the thread would have two version counters:

per_thread
{
word m_ver[2];

lock()
{
word ver = g_version;
m_ver[ver % 2] = ver;
}

unlock(word ver)
{
m_ver[ver % 2] = 0;
}
}

So, it's different logic. The only way I got around using memory
barriers (compiler barriers aside for a moment), here was for the
polling thread to use an asymmetric membar. For a pure C++ version, I
think it would require:

per_thread
{
word m_ver[2];

lock()
{
word ver = g_version;
m_ver[ver % 2] = ver;
#LoadStore | #LoadLoad;
}

unlock(word ver)
{
#LoadStore | #StoreStore;
m_ver[ver % 2] = 0;
}
}

I think.... Humm...

The rewrite is almost complete except for some thread_local
stuff. I think I might break off there. Most of the
additional work is writing the test code. I'm considering
rewriting it in Rust.

Joe Seigh

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 21:38:09 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

Wrt acquiring a lock the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Petersons, and
some others along those lines, so to speak. This is for the store and
load version. Now, RMW on x86 basically implies a StoreLoad wrt the
LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
instance the original SMR algo requires a storeload as is on x86/x64.
MFENCE or LOCK prefix.

Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in
RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
themselves. This is a lot different than using stores and loads. The
original SMR and Peterson's algo needs that "store followed by a load
to a different location" action to hold true, aka, storeload...

Now, I don't think that a data-dependant load can act like a
storeload. I thought that they act sort of like an acquire, aka
#LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
dependencies. Now, the DEC Alpha is a different story... ;^)

fwiw, here's the lock and unlock logic from smrproxy rewrite

    inline void lock()
    {
        epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
        _ref_epoch.store(_epoch, std::memory_order_relaxed);
        std::atomic_signal_fence(std::memory_order_acquire);
    }

    inline void unlock()
    {
        _ref_epoch.store(0, std::memory_order_release);
    }

epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n

Only your single polling thread can mutate the shadow_epoch, right?

    x1 < (x1 + n)

for any value of x1 and any value of n from 0 to 2**63;
eg.
   0xfffffffffffffff0 < 0x0000000000000001

The rewrite is almost complete except for some thread_local
stuff. I think I might break off there. Most of the
additional work is writing the test code. I'm considering
rewriting it in Rust.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 21:41:12 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive.

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when I
say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
things like that.

I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

Wrt acquiring a lock the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Petersons, and
some others along those lines, so to speak. This is for the store and
load version. Now, RMW on x86 basically implies a StoreLoad wrt the
LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
instance the original SMR algo requires a storeload as is on x86/x64.
MFENCE or LOCK prefix.

Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in
RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
themselves. This is a lot different than using stores and loads. The
original SMR and Peterson's algo needs that "store followed by a load
to a different location" action to hold true, aka, storeload...

Now, I don't think that a data-dependant load can act like a
storeload. I thought that they act sort of like an acquire, aka
#LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
dependencies. Now, the DEC Alpha is a different story... ;^)

fwiw, here's the lock and unlock logic from smrproxy rewrite

    inline void lock()
    {
        epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
        _ref_epoch.store(_epoch, std::memory_order_relaxed);
        std::atomic_signal_fence(std::memory_order_acquire);
    }

    inline void unlock()
    {
        _ref_epoch.store(0, std::memory_order_release);
    }

epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n

[...]

Humm... I am not sure if it would work with just the release. The
polling thread would read from these per thread epochs, _ref_epoch,
using an acquire barrier? Still. Not sure if that would work. Need to
put my thinking cap on. ;^)
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 22:02:23 2024

From Newsgroup: comp.lang.c++

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

For some reason you made me think of another very simple proxy technique
using per thread mutexes. It was an experiment a while back: ___________________
per_thread
{
std::mutex m_locks[2];

lock()
{
word ver = g_version;
m_locks[ver % 2].lock();
}

unlock(word ver)
{
m_locks[ver % 2].unlock();
}
}
___________________

The polling thread would increase the g_version counter then lock and
unlock all of the threads previous locks. Iirc, it worked way better
than a read write lock for sure. Basically:
___________________
word ver = g_version.inc(); // ver is the previous version

for all threads as t
{
t.m_locks[ver % 2].lock();
t.m_locks[ver % 2].unlock();
}
___________________

After that, it knew the previous generation was completed.

It was just a way for using a mutex to get distributed proxy like behavior.
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Tue Oct 29 07:23:56 2024

From Newsgroup: comp.lang.c++

On 10/29/24 00:35, Chris M. Thomasson wrote:

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

fwiw, here's the lock and unlock logic from smrproxy rewrite

     inline void lock()
     {
         epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
         _ref_epoch.store(_epoch, std::memory_order_relaxed);

         std::atomic_signal_fence(std::memory_order_acquire);

^^^^^^^^^^^^^^^^^^^^^^

     }

Still don't know how your pure C++ write up can handle this without an std::atomic_thread_fence(std::memory_order_acquire).

No thread fence is necessary. The loads can move before
the store. They just can't move before the async
membar. After that membar any previously retired
objects are no longer reachable.

     inline void unlock()
     {
         _ref_epoch.store(0, std::memory_order_release);
     }

--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Tue Oct 29 07:27:40 2024

From Newsgroup: comp.lang.c++

On 10/29/24 00:38, Chris M. Thomasson wrote:

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

fwiw, here's the lock and unlock logic from smrproxy rewrite

     inline void lock()
     {
         epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
         _ref_epoch.store(_epoch, std::memory_order_relaxed);
         std::atomic_signal_fence(std::memory_order_acquire);
     }

     inline void unlock()
     {
         _ref_epoch.store(0, std::memory_order_release);
     }

epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n

Only your single polling thread can mutate the shadow_epoch, right?

Yes. It's just an optimization. The reader threads could read
from the global epoch but it would be in a separate cache line
and be an extra dependent load. So one dependent load and
same cache line.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 14:51:58 2024

From Newsgroup: comp.lang.c++

On 10/29/2024 4:23 AM, jseigh wrote:

On 10/29/24 00:35, Chris M. Thomasson wrote:

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

fwiw, here's the lock and unlock logic from smrproxy rewrite

     inline void lock()
     {
         epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
         _ref_epoch.store(_epoch, std::memory_order_relaxed);

         std::atomic_signal_fence(std::memory_order_acquire);

^^^^^^^^^^^^^^^^^^^^^^

     }

Still don't know how your pure C++ write up can handle this without an
std::atomic_thread_fence(std::memory_order_acquire).

No thread fence is necessary. The loads can move before
the store. They just can't move before the async
membar. After that membar any previously retired
objects are no longer reachable.

Ahhhh. I thought your C++ version is going to be one that did not use an asymmetric membar in order to make it so called "100%" pure... Not sure
about C++ including one in the actual standard. If they did I think it
would have some info about it, perhaps akin to:

https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free

Well, even then if an asymmetric was not available on the arch the code
is being compiled for, it can say, well, shit happens! Then allow you to
fall back to another version...?

Also, the C++ wording of this is interesting:

https://en.cppreference.com/w/cpp/atomic/atomic/is_always_lock_free

wrt:
_____________
0 for the built-in atomic types that are never lock-free,
1 for the built-in atomic types that are sometimes lock-free,
2 for the built-in atomic types that are always lock-free.
_____________

That's fun... ;^)

     inline void unlock()
     {
         _ref_epoch.store(0, std::memory_order_release);
     }

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 14:56:16 2024

From Newsgroup: comp.lang.c++

On 10/29/2024 4:27 AM, jseigh wrote:

On 10/29/24 00:38, Chris M. Thomasson wrote:

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

fwiw, here's the lock and unlock logic from smrproxy rewrite

     inline void lock()
     {
         epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
         _ref_epoch.store(_epoch, std::memory_order_relaxed);
         std::atomic_signal_fence(std::memory_order_acquire);
     }

     inline void unlock()
     {
         _ref_epoch.store(0, std::memory_order_release);
     }

epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n

Only your single polling thread can mutate the shadow_epoch, right?

Yes. It's just an optimization. The reader threads could read
from the global epoch but it would be in a separate cache line
and be an extra dependent load. So one dependent load and
same cache line.

Are you taking advantage of the fancy alignment capabilities of C++?

https://en.cppreference.com/w/cpp/language/alignas

and friends? They seem to work fine wrt the last time I checked them.

It's nice to have a standard way to pad and align on cache line
boundaries. :^)
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 15:05:47 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:

On 10/28/2024 6:17 PM, jseigh wrote:

On 10/28/24 17:57, Chris M. Thomasson wrote:

On 10/28/2024 4:45 AM, jseigh wrote:

On 10/28/24 00:02, Chris M. Thomasson wrote:

On 10/27/2024 5:35 PM, jseigh wrote:

On 10/27/24 18:32, Chris M. Thomasson wrote:

The membar version? That's a store/load membar so it is expensive. >>>>>

I was wondering in your c++ version if you had to use any seq_cst
barriers. I think acquire/release should be good enough. Now, when
I say C++, I mean pure C++, no calls to FlushProcessWriteBuffers
and things like that.

I take it that your pure C++ version has no atomic RMW, right? Just >>>>> loads and stores?

While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.

Wrt acquiring a lock the only class of mutex logic that comes to mind
that requires an explicit storeload style membar is Petersons, and
some others along those lines, so to speak. This is for the store and
load version. Now, RMW on x86 basically implies a StoreLoad wrt the
LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
instance the original SMR algo requires a storeload as is on x86/x64.
MFENCE or LOCK prefix.

Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in
RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
themselves. This is a lot different than using stores and loads. The
original SMR and Peterson's algo needs that "store followed by a load
to a different location" action to hold true, aka, storeload...

Now, I don't think that a data-dependant load can act like a
storeload. I thought that they act sort of like an acquire, aka
#LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
dependencies. Now, the DEC Alpha is a different story... ;^)

fwiw, here's the lock and unlock logic from smrproxy rewrite

     inline void lock()
     {
         epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
         _ref_epoch.store(_epoch, std::memory_order_relaxed);
         std::atomic_signal_fence(std::memory_order_acquire);
     }

     inline void unlock()
     {
         _ref_epoch.store(0, std::memory_order_release);
     }

epoch_t is interesting. It's uint64_t but handles wrapped
compares, ie. for an epoch_t x1 and uint64_t n

[...]

Humm... I am not sure if it would work with just the release. The
polling thread would read from these per thread epochs, _ref_epoch,
using an acquire barrier? Still. Not sure if that would work. Need to
put my thinking cap on. ;^)

Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the the explicit release, well, it will work. I
don't see why it would not work.

For some reason, I thought you were going to not use an async membar in
your C++ version. Sorry. However, it still would be fun to test
against... ;^)
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:40:12 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 10:02 PM, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC.   I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

For some reason you made me think of another very simple proxy technique using per thread mutexes. It was an experiment a while back: ___________________
per_thread
{
    std::mutex m_locks[2];

    lock()
    {
        word ver = g_version;
        m_locks[ver % 2].lock();
    }

    unlock(word ver)
    {
        m_locks[ver % 2].unlock();
    }
}
___________________

The polling thread would increase the g_version counter then lock and
unlock all of the threads previous locks. Iirc, it worked way better
than a read write lock for sure. Basically:
___________________
word ver = g_version.inc(); // ver is the previous version

for all threads as t
{
   t.m_locks[ver % 2].lock();
   t.m_locks[ver % 2].unlock();
}
___________________

After that, it knew the previous generation was completed.

It was just a way for using a mutex to get distributed proxy like behavior.

There are fun things to do here. A thread can do an unlock lock cycle
ever often, say 1000 iterations. The fun part is that this can beat a
read write lock.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:42:03 2024

From Newsgroup: comp.lang.c++

On 10/28/2024 10:02 PM, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC.   I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

For some reason you made me think of another very simple proxy technique using per thread mutexes. It was an experiment a while back: ___________________
per_thread
{
    std::mutex m_locks[2];

    lock()
    {
        word ver = g_version;
        m_locks[ver % 2].lock();
    }

    unlock(word ver)
    {
        m_locks[ver % 2].unlock();
    }

Oops! lock() should return the version then pass it on to unlock. Sorry
for missing that in my pseudo-code. ;^o

}
___________________

The polling thread would increase the g_version counter then lock and
unlock all of the threads previous locks. Iirc, it worked way better
than a read write lock for sure. Basically:
___________________
word ver = g_version.inc(); // ver is the previous version

for all threads as t
{
   t.m_locks[ver % 2].lock();
   t.m_locks[ver % 2].unlock();
}
___________________

After that, it knew the previous generation was completed.

It was just a way for using a mutex to get distributed proxy like behavior.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:53:19 2024

From Newsgroup: comp.lang.c++

On 10/29/2024 11:42 PM, Chris M. Thomasson wrote:

On 10/28/2024 10:02 PM, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC.   I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

For some reason you made me think of another very simple proxy
technique using per thread mutexes. It was an experiment a while back:
___________________
per_thread
{
     std::mutex m_locks[2];

     lock()
     {
         word ver = g_version;
         m_locks[ver % 2].lock();
     }

     unlock(word ver)
     {
         m_locks[ver % 2].unlock();
     }

Oops! lock() should return the version then pass it on to unlock. Sorry
for missing that in my pseudo-code. ;^o

}
___________________

The polling thread would increase the g_version counter then lock and
unlock all of the threads previous locks. Iirc, it worked way better
than a read write lock for sure. Basically:
___________________
word ver = g_version.inc(); // ver is the previous version

for all threads as t
{
    t.m_locks[ver % 2].lock();
    t.m_locks[ver % 2].unlock();
}
___________________

After that, it knew the previous generation was completed.

It was just a way for using a mutex to get distributed proxy like
behavior.

Iirc, I still had to use a defer list where the dtors were called for
the nodes.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:54:33 2024

From Newsgroup: comp.lang.c++

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

Joe, can you call dtors for nodes after a single epoch?
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Wed Oct 30 12:36:45 2024

From Newsgroup: comp.lang.c++

On 10/29/24 17:56, Chris M. Thomasson wrote:

On 10/29/2024 4:27 AM, jseigh wrote:

Yes. It's just an optimization. The reader threads could read
from the global epoch but it would be in a separate cache line
and be an extra dependent load. So one dependent load and
same cache line.

Are you taking advantage of the fancy alignment capabilities of C++?

https://en.cppreference.com/w/cpp/language/alignas

and friends? They seem to work fine wrt the last time I checked them.

It's nice to have a standard way to pad and align on cache line
boundaries. :^)

It's target processor dependent. You need to query the cache line size
and pass to compile as a define variable.

There's supposed to be built in define that would be target system
dependent but it's not implemented.
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Wed Oct 30 12:39:38 2024

From Newsgroup: comp.lang.c++

On 10/29/24 18:05, Chris M. Thomasson wrote:

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:

Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the the explicit release, well, it will work. I don't see why it would not work.

For some reason, I thought you were going to not use an async membar in
your C++ version. Sorry. However, it still would be fun to test
against... ;^)

The C version has both versions. The C++ version does only the
async member version. But I'm not publishing that code so it's
a moot point.
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Wed Oct 30 12:43:09 2024

From Newsgroup: comp.lang.c++

On 10/30/24 02:54, Chris M. Thomasson wrote:

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

Joe, can you call dtors for nodes after a single epoch?

Yes since I can check that epochs are being referenced
directly instead of indirectly via quiescent states.
It's the inferring from events vs. conditions thing.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Nov 1 12:20:39 2024

From Newsgroup: comp.lang.c++

On 10/30/2024 9:36 AM, jseigh wrote:

On 10/29/24 17:56, Chris M. Thomasson wrote:

On 10/29/2024 4:27 AM, jseigh wrote:

Yes. It's just an optimization. The reader threads could read
from the global epoch but it would be in a separate cache line
and be an extra dependent load. So one dependent load and
same cache line.

Are you taking advantage of the fancy alignment capabilities of C++?

https://en.cppreference.com/w/cpp/language/alignas

and friends? They seem to work fine wrt the last time I checked them.

It's nice to have a standard way to pad and align on cache line
boundaries. :^)

It's target processor dependent. You need to query the cache line size
and pass to compile as a define variable.

There's supposed to be built in define that would be target system
dependent but it's not implemented.

What about:

https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size

?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Nov 2 13:50:51 2024

From Newsgroup: comp.lang.c++

On 10/17/2024 5:10 AM, jseigh wrote:

I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic after loading
the address of the reader lock object into a register is now 2
instructions a load followed by a store. The unlock is same
as before, just a store.

It's way faster now.

It's on the feature/003 branch as a POC. I'm working on porting
it to c++ and don't want to waste any more time on c version.

No idea of it's a new algorithm. I suspect that since I use
the term epoch that it will be claimed that it's ebr, epoch
based reclamation, and that all ebr algorithms are equivalent.
Though I suppose you could argue it's qsbr if I point out what
the quiescent states are.

Joe Seigh

Another crude experiment I was testing out back in the day. Heart beat
for per threads, wrt a version that does not use asymmetric membars to
test against:

per_thread
{
word cur = 0;

void beat()
{
word global = g_version;
if (cur == global) return;
store_seq_cst(&cur, global);
}
}

Well, this case requires worker threads to beat every now and then when
they are outside of critical sections. Just a test case against my
asymmetric versions. The _single_ poll thread would increment the
g_version then check for threads that were equal to the new version.
This was a quiescent period when all threads went through it.

Of course the asymmetric version beat it, but it was fun to test against anyway.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Nov 3 21:14:37 2024

From Newsgroup: comp.lang.c++

On 10/30/2024 9:39 AM, jseigh wrote:

On 10/29/24 18:05, Chris M. Thomasson wrote:

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:

Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the the explicit release, well, it will work.
I don't see why it would not work.

For some reason, I thought you were going to not use an async membar
in your C++ version. Sorry. However, it still would be fun to test
against... ;^)

The C version has both versions. The C++ version does only the
async member version. But I'm not publishing that code so it's
a moot point.

I got side tracked with more heavy math. The problem with C++ code that
uses an async memory barrier is that its automatically rendered into a non-portable state... Yikes! Imvvvvvho, C/C++ should think about
including them in some future standard. It would be nice. Well, for us
at least! ;^)
--- Synchronet 3.20a-Linux NewsLink 1.114

From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Mon Nov 4 07:46:37 2024

From Newsgroup: comp.lang.c++

On 11/4/24 00:14, Chris M. Thomasson wrote:

On 10/30/2024 9:39 AM, jseigh wrote:

On 10/29/24 18:05, Chris M. Thomasson wrote:

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:

Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the the explicit release, well, it will
work. I don't see why it would not work.

For some reason, I thought you were going to not use an async membar
in your C++ version. Sorry. However, it still would be fun to test
against... ;^)

The C version has both versions. The C++ version does only the
async member version. But I'm not publishing that code so it's
a moot point.

I got side tracked with more heavy math. The problem with C++ code that
uses an async memory barrier is that its automatically rendered into a non-portable state... Yikes! Imvvvvvho, C/C++ should think about
including them in some future standard. It would be nice. Well, for us
at least! ;^)

That's never going to happen. DWCAS has been around for more than
50 years and c++ doesn't support that and probably never will.
You can't write lock-free queues that are ABA free and
are performant without that. So async memory barriers won't
happen any time soon either.

Long term I think c++ will fade into irrelevance along with
all the other programming languages based on an imperfect
knowledge of concurrency, which is basically all of them
right now.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Muttley@Muttley@DastartdlyHQ.org to comp.lang.c++ on Mon Nov 4 14:09:14 2024

From Newsgroup: comp.lang.c++

On Mon, 4 Nov 2024 07:46:37 -0500
jseigh <jseigh_es00@xemaps.com> boring babbled:

On 11/4/24 00:14, Chris M. Thomasson wrote:

On 10/30/2024 9:39 AM, jseigh wrote:

On 10/29/24 18:05, Chris M. Thomasson wrote:

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:

Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the the explicit release, well, it will
work. I don't see why it would not work.

For some reason, I thought you were going to not use an async membar
in your C++ version. Sorry. However, it still would be fun to test
against... ;^)

The C version has both versions. The C++ version does only the
async member version. But I'm not publishing that code so it's
a moot point.

I got side tracked with more heavy math. The problem with C++ code that
uses an async memory barrier is that its automatically rendered into a
non-portable state... Yikes! Imvvvvvho, C/C++ should think about
including them in some future standard. It would be nice. Well, for us
at least! ;^)

That's never going to happen. DWCAS has been around for more than
50 years and c++ doesn't support that and probably never will.
You can't write lock-free queues that are ABA free and
are performant without that. So async memory barriers won't
happen any time soon either.

Long term I think c++ will fade into irrelevance along with
all the other programming languages based on an imperfect
knowledge of concurrency, which is basically all of them
right now.

Given most concurrent operating systems are written in these "imperfect" languages how does that square with your definition? And how would your perfect language run on them?

Anyway, concurrency is the job of the OS, not the language. C++ threading is just a wrapper around pthreads on *nix and windows threads on Windows. The language just needs an interface to the underlying OS functionality, it should not try and implement the functionality itself as it'll always be a hack.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Nov 4 12:42:53 2024

From Newsgroup: comp.lang.c++

On 11/4/2024 6:09 AM, Muttley@DastartdlyHQ.org wrote:

On Mon, 4 Nov 2024 07:46:37 -0500
jseigh <jseigh_es00@xemaps.com> boring babbled:

On 11/4/24 00:14, Chris M. Thomasson wrote:

On 10/30/2024 9:39 AM, jseigh wrote:

On 10/29/24 18:05, Chris M. Thomasson wrote:

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:

Ahhh, if you are using an async membar in your upcoming C++ version, >>>>> then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the the explicit release, well, it will
work. I don't see why it would not work.

For some reason, I thought you were going to not use an async membar >>>>> in your C++ version. Sorry. However, it still would be fun to test
against... ;^)

The C version has both versions. The C++ version does only the
async member version. But I'm not publishing that code so it's
a moot point.

I got side tracked with more heavy math. The problem with C++ code that
uses an async memory barrier is that its automatically rendered into a
non-portable state... Yikes! Imvvvvvho, C/C++ should think about
including them in some future standard. It would be nice. Well, for us
at least! ;^)

That's never going to happen. DWCAS has been around for more than
50 years and c++ doesn't support that and probably never will.
You can't write lock-free queues that are ABA free and
are performant without that. So async memory barriers won't
happen any time soon either.

Long term I think c++ will fade into irrelevance along with
all the other programming languages based on an imperfect
knowledge of concurrency, which is basically all of them
right now.

Given most concurrent operating systems are written in these "imperfect" languages how does that square with your definition? And how would your perfect language run on them?

Anyway, concurrency is the job of the OS, not the language. C++ threading is just a wrapper around pthreads on *nix and windows threads on Windows. The language just needs an interface to the underlying OS functionality, it should
not try and implement the functionality itself as it'll always be a hack.

A start would be C++ having an "always lock free" CAS for two contiguous
words on systems that support it, yes, even 64 bit. ala:

struct anchor {
word a;
word b;
};

cmpxchg8b for x86, cmpxchg16b for x64, ect...

https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Nov 9 13:51:26 2024

From Newsgroup: comp.lang.c++

On 11/4/2024 12:42 PM, Chris M. Thomasson wrote:

On 11/4/2024 6:09 AM, Muttley@DastartdlyHQ.org wrote:

On Mon, 4 Nov 2024 07:46:37 -0500
jseigh <jseigh_es00@xemaps.com> boring babbled:

On 11/4/24 00:14, Chris M. Thomasson wrote:

On 10/30/2024 9:39 AM, jseigh wrote:

On 10/29/24 18:05, Chris M. Thomasson wrote:

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:

Ahhh, if you are using an async membar in your upcoming C++ version, >>>>>> then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the the explicit release, well, it will
work. I don't see why it would not work.

For some reason, I thought you were going to not use an async membar >>>>>> in your C++ version. Sorry. However, it still would be fun to test >>>>>> against... ;^)

The C version has both versions. The C++ version does only the
async member version. But I'm not publishing that code so it's
a moot point.

I got side tracked with more heavy math. The problem with C++ code that >>>> uses an async memory barrier is that its automatically rendered into a >>>> non-portable state... Yikes! Imvvvvvho, C/C++ should think about
including them in some future standard. It would be nice. Well, for us >>>> at least! ;^)

That's never going to happen. DWCAS has been around for more than
50 years and c++ doesn't support that and probably never will.
You can't write lock-free queues that are ABA free and
are performant without that. So async memory barriers won't
happen any time soon either.

Long term I think c++ will fade into irrelevance along with
all the other programming languages based on an imperfect
knowledge of concurrency, which is basically all of them
right now.

Given most concurrent operating systems are written in these "imperfect"
languages how does that square with your definition? And how would your
perfect language run on them?

Anyway, concurrency is the job of the OS, not the language. C++
threading is
just a wrapper around pthreads on *nix and windows threads on Windows.
The
language just needs an interface to the underlying OS functionality,
it should
not try and implement the functionality itself as it'll always be a hack.

A start would be C++ having an "always lock free" CAS for two contiguous words on systems that support it, yes, even 64 bit. ala:

struct anchor {
word a;
word b;
};

Even better:

struct anchor {
void* a;
word b;
};

where sizeof(void*) = sizeof(word)... ;^)

cmpxchg8b for x86, cmpxchg16b for x64, ect...

https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b

--- Synchronet 3.20a-Linux NewsLink 1.114

Who's Online
Recent Visitors
- Grey Gamer
  Thu Nov 21 07:37:11 2024
  from Show Low, Az via Telnet
- Microbot
  Thu Nov 21 03:10:00 2024
  from Moore, Ok via Telnet
- Winston
  Wed Nov 20 09:30:02 2024
  from Kerrville, Tx via SSH
- Microbot
  Wed Nov 20 05:27:23 2024
  from Moore, Ok via Telnet

System Info

Sysop:	DaiTengu
Location:	Appleton, WI
Users:	991
Nodes:	10 (1 / 9)
Uptime:	75:29:29
Calls:	12,948
Calls today:	2
Files:	186,574
Messages:	3,264,524

smrproxy v2

Who's Online

Recent Visitors

System Info