• smrproxy v2

    From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Thu Oct 17 08:10:20 2024
    From Newsgroup: comp.lang.c++

    I replaced the hazard pointer logic in smrproxy. It's now wait-free
    instead of mostly wait-free. The reader lock logic, after loading
    the address of the reader lock object into a register, is now 2
    instructions: a load followed by a store. The unlock is the same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC. I'm working on porting
    it to c++ and don't want to waste any more time on the c version.

    No idea if it's a new algorithm. I suspect that since I use
    the term epoch, it will be claimed that it's ebr, epoch
    based reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.
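    In case it helps to picture the fast path, here is a rough C++ sketch of
    what such a 2-instruction epoch reader lock could look like. All names are
    made up for illustration; this is not smrproxy's actual code, and the
    relaxed reader-side ordering only makes sense if the reclaimer issues an
    asymmetric barrier when it polls (discussed later in the thread).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch of a 2-instruction epoch reader lock; names and
// layout are illustrative, not smrproxy's actual API.
std::atomic<uint64_t> global_epoch{1};

struct reader_lock {
    // 0 means "not in a critical region" (quiescent).
    std::atomic<uint64_t> epoch{0};
};

// Lock: one load (the global epoch) + one store (the per-thread slot).
// Plain/relaxed accesses suffice on the reader side only because the
// reclaimer issues an asymmetric barrier (e.g. membarrier()) when polling.
inline void rl_lock(reader_lock* rl) {
    uint64_t e = global_epoch.load(std::memory_order_relaxed);
    rl->epoch.store(e, std::memory_order_relaxed);
}

// Unlock: a single store back to the quiescent value.
inline void rl_unlock(reader_lock* rl) {
    rl->epoch.store(0, std::memory_order_release);
}

// Reclaimer side: a reader is past a retired generation if it is
// quiescent or has observed a newer epoch.
inline bool rl_past(const reader_lock* rl, uint64_t retired_epoch) {
    uint64_t e = rl->epoch.load(std::memory_order_acquire);
    return e == 0 || e > retired_epoch;
}
```

    The point of the sketch is the instruction count: lock is a load and a
    store, unlock is a store, and all the expensive ordering lives on the
    polling side.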

    Joe Seigh



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 13:10:43 2024
    From Newsgroup: comp.lang.c++

    On 10/17/2024 5:10 AM, jseigh wrote:
    [...]

    I have to take a look at it! Been really busy lately. Shit happens.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Thu Oct 17 17:08:04 2024
    From Newsgroup: comp.lang.c++

    On 10/17/24 16:10, Chris M. Thomasson wrote:
    [...]

    I have to take a look at it! Been really busy lately. Shit happens.


    There's a quick and dirty explanation at
    http://threadnought.wordpress.com/

    repo at https://github.com/jseigh/smrproxy

    I'll need to create some memory access diagrams that
    visualize how it works at some point.

    Anyway, if it's new, it's another algorithm for people to use
    without attribution.

    Joe Seigh
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 16:40:00 2024
    From Newsgroup: comp.lang.c++

    On 10/17/2024 2:08 PM, jseigh wrote:
    On 10/17/24 16:10, Chris M. Thomasson wrote:
    [...]

    There's a quick and dirty explanation at
    http://threadnought.wordpress.com/

    repo at https://github.com/jseigh/smrproxy

    I'll need to create some memory access diagrams that
    visualize how it works at some point.

    Anyway if it's new, another algorithm to use without
    attribution.

    Interesting. From a quick view, it kind of reminds me of a distributed
    seqlock for some reason. Are you using an asymmetric membar in here,
    in smr_poll?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 17:16:55 2024
    From Newsgroup: comp.lang.c++

    On 10/17/2024 4:40 PM, Chris M. Thomasson wrote:
    [...]

    Interesting. From a quick view, it kind of reminds me of a distributed seqlock for some reason. Are you using an asymmetric membar in here? in smr_poll ?

    I remember a long time ago I was messing around where each thread had
    two version counters:

    pseudo code:

    per_thread
    {
        word m_version[2];

        word acquire()
        {
            word ver = load(global_version);
            m_version[ver % 2] = ver;
            return ver;
        }

        void release(word ver)
        {
            m_version[ver % 2] = 0;
        }
    }


    The global_version would only be incremented by the polling thread. This
    was WAY back. I think I might have posted about it on cpt.

    So, when a node was made unreachable, it would be included in the
    polling logic. The polling thread could increment the version counter,
    wait for all the threads' prior m_versions to be zero, and collect the
    current generation of objects in a defer list. Then on the next cycle
    it would increment the version counter again, wait until all threads'
    prior versions were zero, delete the deferred objects, and transfer
    the current gen to the defer list.

    It went something like that.
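    A rough C++ rendering of that scheme, single-threaded for clarity (all
    names are illustrative; the real experiment also needed an asymmetric
    barrier on the polling side, which is omitted here):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Illustrative sketch of the two-slot version-counter scheme described
// above; names are made up and the memory ordering is simplified.
using word = uint64_t;

std::atomic<word> global_version{1};

struct per_thread {
    // Two slots, indexed by version parity, so two generations of
    // readers can be tracked independently.
    std::atomic<word> m_version[2] {{0}, {0}};

    word acquire() {
        word ver = global_version.load(std::memory_order_relaxed);
        m_version[ver % 2].store(ver, std::memory_order_relaxed);
        return ver;
    }

    void release(word ver) {
        m_version[ver % 2].store(0, std::memory_order_release);
    }
};

// Polling side: after bumping global_version, check that every
// thread's slot for the prior version's parity has drained to zero.
bool prior_slots_quiet(const std::vector<per_thread*>& threads, word prior) {
    for (per_thread* t : threads)
        if (t->m_version[prior % 2].load(std::memory_order_acquire) != 0)
            return false;
    return true;
}
```

    Once prior_slots_quiet() returns true, no reader can still be inside a
    critical region that began under the old version, so the corresponding
    defer list can be reclaimed.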

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Oct 17 17:24:08 2024
    From Newsgroup: comp.lang.c++

    On 10/17/2024 5:16 PM, Chris M. Thomasson wrote:
    [...]


    Iirc, I was using FlushProcessWriteBuffers back then for an asymmetric
    barrier for my experiment. The polling thread would execute one after it
    incremented the global version. Actually, I can't remember exactly where
    I placed it, after or before. The defer list made things work.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Fri Oct 18 08:07:11 2024
    From Newsgroup: comp.lang.c++

    On 10/17/24 19:40, Chris M. Thomasson wrote:
    [...]

    Interesting. From a quick view, it kind of reminds me of a distributed seqlock for some reason. Are you using an asymmetric membar in here? in smr_poll ?

    Yes, linux membarrier() in smr_poll.

    It's not a seqlock, not least because exiting the critical region
    would be 3 instructions, unless you use atomics, which are expensive
    and usually have memory barriers.

    A lot of the qsbr and ebr reader lock/unlock code is going to look
    somewhat similar so you have to know how the reclaim logic uses it.
    In this case I am slingshotting off of the asymmetric memory barrier.
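    As a sketch of what that reclaimer-side barrier could look like
    (illustrative only: heavy_barrier is a made-up name, and this is not
    smr_poll's actual code), one can try the Linux membarrier() syscall and
    fall back to an ordinary symmetric fence where it isn't available:

```cpp
#include <atomic>

#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Values from the Linux uapi, defined here in case the installed
// headers predate membarrier's expedited commands.
#ifndef MEMBARRIER_CMD_PRIVATE_EXPEDITED
#define MEMBARRIER_CMD_PRIVATE_EXPEDITED (1 << 3)
#define MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED (1 << 4)
#endif

// Sketch of a reclaimer-side asymmetric barrier. After a successful
// membarrier() call, every running thread of this process has executed
// a full memory barrier, so the readers' plain loads/stores suffice.
bool heavy_barrier() {
#if defined(__linux__) && defined(SYS_membarrier)
    // Registration is required once before the expedited command works.
    static const bool registered =
        syscall(SYS_membarrier,
                MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0) == 0;
    if (registered &&
        syscall(SYS_membarrier,
                MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0) == 0)
        return true;
#endif
    // Fallback: an ordinary symmetric fence on this thread only; for
    // correctness the readers would then also need real barriers,
    // i.e. the much slower explicit-barrier configuration.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return true;
}
```

    The asymmetry is the whole trick: the reclaimer pays for one expensive
    syscall per poll so that the readers never pay for ordering at all.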

    Earlier, at one point, I was going to have smrproxy use hazard pointer
    logic or qsbr logic as a config option, but the extra code complexity,
    and the fact that qsbr required 2 grace periods, made that
    infeasible. The qsbr logic was mostly ripped out, but there were still
    some pieces left.

    Anyway, I'm working on a c++ version, which involves a lot of extra
    work besides just rewriting smrproxy. There's coming up with an api
    for proxies, and testcases, which tend to be more work than the code
    they're testing.

    Joe Seigh



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Oct 25 15:00:15 2024
    From Newsgroup: comp.lang.c++

    On 10/18/2024 5:07 AM, jseigh wrote:
    [...]

    Damn! I almost missed this post! Fucking Thunderbird... Will get back to
    you. Working on something else right now Joe, thanks.

    https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Fri Oct 25 18:56:16 2024
    From Newsgroup: comp.lang.c++

    On 10/25/24 18:00, Chris M. Thomasson wrote:
    [...]

    Damn! I almost missed this post! Fucking Thunderbird... Will get back to you. Working on something else right now Joe, thanks.

    https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/


    No problem. The c++ work is progressing pretty slowly, not least
    because the documentation is not always clear as to what
    something does or even what problem it is supposed to solve.
    To think I took a pass on rust because I thought it was
    more complicated than it needed to be.

    Joe Seigh
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 12:33:46 2024
    From Newsgroup: comp.lang.c++

    On 10/25/2024 3:56 PM, jseigh wrote:
    [...]


    No problem.  The c++ work is progressing pretty slowly, not least in
    part because the documentation is not always clear as to what
    something does or even what problem it is supposed to solve.
    To think I took a pass on on rust because I though it was
    more complicated than it needed to be.

    Never even tried Rust, shit, I am behind the times. ;^)

    Humm... I don't think we can get 100% C++ because of the damn asymmetric membar for these rather "specialized" algorithms?

    Is C++ thinking about creating a standard way to gain an asymmetric membar?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Sun Oct 27 18:29:43 2024
    From Newsgroup: comp.lang.c++

    On 10/27/24 15:33, Chris M. Thomasson wrote:
    [...]

    Humm... I don't think we can get 100% C++ because of the damn asymmetric membar for these rather "specialized" algorithms?

    Is C++ thinking about creating a standard way to gain an asymmetric membar?

    I don't think so. It's platform dependent. Apart from linux, mostly
    it's done with a call to some virtual memory function that flushes
    the TLBs (translation lookaside buffers), which involves IPI calls
    to all the processors, and those have memory barriers. This is
    old: 1973, patent 3,947,823, cited by the patent I did.

    Anyway, I version the code so there's an asymmetric memory barrier
    version and an explicit memory barrier version, the latter
    being much slower.

    Joe Seigh

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 15:32:33 2024
    From Newsgroup: comp.lang.c++

    On 10/27/2024 3:29 PM, jseigh wrote:
    [...]

    I don't think so.  It's platform dependent.  Apart from linux, mostly
    it's done with a call to some virtual memory function that flushes
    the TLBs (translation lookaside buffers) which involves IPI calls
    to all the processors and those have memory barriers.  This is
    old, 1973, patent 3,947,823 cited by the patent I did.

    Anyway, I version the code so there's a asymmetric memory barrier
    version and an explicit memory barrier version, the latter
    being much slower.

    Ahh, nice! acquire/release, no seq_cst, right? ;^)

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Sun Oct 27 20:35:54 2024
    From Newsgroup: comp.lang.c++

    On 10/27/24 18:32, Chris M. Thomasson wrote:
    [...]

    Anyway, I version the code so there's a asymmetric memory barrier
    version and an explicit memory barrier version, the latter
    being much slower.

    Ahh, nice! acquire/release, no seq_cst, right? ;^)


    The membar version? That's a store/load membar so it is expensive.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:02:58 2024
    From Newsgroup: comp.lang.c++

    On 10/27/2024 5:35 PM, jseigh wrote:
    [...]

    Ahh, nice! acquire/release, no seq_cst, right? ;^)


    The membar version?  That's a store/load membar so it is expensive.

    I was wondering in your c++ version if you had to use any seq_cst
    barriers. I think acquire/release should be good enough. Now, when I say
    C++, I mean pure C++, no calls to FlushProcessWriteBuffers and things
    like that.

    I take it that your pure C++ version has no atomic RMW, right? Just
    loads and stores?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:08:23 2024
    From Newsgroup: comp.lang.c++

    On 10/27/2024 3:29 PM, jseigh wrote:
    [...]

    Anyway, I version the code so there's a asymmetric memory barrier
    version and an explicit memory barrier version, the latter
    being much slower.

    I should get back into one of my older proxy algorithms. Things are
    mostly wait free, if XADD can be wait free itself. No CAS in sight. I
    just found an older version I posted. Almost forgot I made this post:

    https://groups.google.com/g/comp.lang.c++/c/FBqOMvqWpR4/m/bDZZLUmAAgAJ

    https://pastebin.com/raw/nPVYXbWM
    (raw text, no ad bullshit)



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:09:54 2024
    From Newsgroup: comp.lang.c++

    On 10/27/2024 9:08 PM, Chris M. Thomasson wrote:
    [...]

    I should get back into one of my older proxy algorithms. Things are
    mostly wait free, if XADD can be wait free itself. No CAS in sight. I
    just found an older version I posted.. Almost forgot I make this post:

    https://groups.google.com/g/comp.lang.c++/c/FBqOMvqWpR4/m/bDZZLUmAAgAJ

    https://pastebin.com/raw/nPVYXbWM
    (raw text, no ad bullshit)




    It beats a read write lock, but it has trouble beating one that does not
    use any atomic RMW's in the fast path, a la:
    ___________________
    collector& acquire()
    {
        // increment the master count _and_ obtain current collector.
        std::uint32_t current =
            m_current.fetch_add(ct_ref_inc, std::memory_order_acquire);

        // decode the collector index.
        return m_collectors[current & ct_proxy_mask];
    }

    void release(collector& c)
    {
        // decrement the collector.
        std::uint32_t count =
            c.m_count.fetch_sub(ct_ref_inc, std::memory_order_release);

        // check for the completion of the quiescence process.
        if ((count & ct_ref_mask) == ct_ref_complete)
        {
            // odd reference count and drop-to-zero condition detected!
            prv_quiesce_complete(c);
        }
    }
    ___________________
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Oct 27 21:13:05 2024
    From Newsgroup: comp.lang.c++

    On 10/27/2024 9:08 PM, Chris M. Thomasson wrote:
    [...]

    I should get back into one of my older proxy algorithms. Things are
    mostly wait free, if XADD can be wait free itself. No CAS in sight. I
    just found an older version I posted.. Almost forgot I make this post:

    https://groups.google.com/g/comp.lang.c++/c/FBqOMvqWpR4/m/bDZZLUmAAgAJ

    https://pastebin.com/raw/nPVYXbWM
    (raw text, no ad bullshit)




    then we can do something like each reader thread incrementing its own
    version counter on every, say, 1000 reads, with a release store, to
    allow the polling thread to pick it up via an acquire load. This is in
    pure C++, no asymmetric magic here at all.
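
    A minimal sketch of that idea (invented names, pure standard C++):
    each reader owns a slot and bumps its version counter with a release
    store every N reads; the polling thread snapshots the counter and
    treats an observed advance, via an acquire load, as evidence the
    reader passed a quiescent point.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

struct reader_slot
{
    std::atomic<std::uint64_t> version{0};
};

// reader side: called once per N reads; only the owning thread writes it
inline void reader_checkpoint(reader_slot& slot)
{
    slot.version.store(slot.version.load(std::memory_order_relaxed) + 1,
                       std::memory_order_release);
}

// polling side: returns once the reader has checkpointed past `snapshot`,
// i.e. it has left any critical section begun before the snapshot was taken
inline void wait_for_quiescence(reader_slot& slot, std::uint64_t snapshot)
{
    while (slot.version.load(std::memory_order_acquire) == snapshot)
        std::this_thread::yield();
}
```

    The qsbr-style catch, of course: a reader thread that stops doing
    reads (and hence stops checkpointing) stalls the polling thread
    indefinitely.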
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Mon Oct 28 07:45:15 2024
    From Newsgroup: comp.lang.c++

    On 10/28/24 00:02, Chris M. Thomasson wrote:
    On 10/27/2024 5:35 PM, jseigh wrote:
    On 10/27/24 18:32, Chris M. Thomasson wrote:


    The membar version?  That's a store/load membar so it is expensive.

    I was wondering in your c++ version if you had to use any seq_cst
    barriers. I think acquire/release should be good enough. Now, when I say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and things
    like that.

    I take it that your pure C++ version has no atomic RMW, right? Just
    loads and stores?

    While a lock action has acquire memory order semantics, if the
    implementation has internal stores, you have to ensure those stores
    are complete before any accesses from the critical section.
    So you may need a store/load memory barrier.

    For cmpxchg, it has full seq_cst.  For other rmw atomics I don't
    know.  I have to ask on c.a.  I think some data dependency and/or
    control dependency might factor in.

    Joe Seigh
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 14:57:23 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 4:45 AM, jseigh wrote:
    On 10/28/24 00:02, Chris M. Thomasson wrote:
    On 10/27/2024 5:35 PM, jseigh wrote:
    On 10/27/24 18:32, Chris M. Thomasson wrote:


    The membar version?  That's a store/load membar so it is expensive.

    I was wondering in your c++ version if you had to use any seq_cst
    barriers. I think acquire/release should be good enough. Now, when I
    say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
    things like that.

    I take it that your pure C++ version has no atomic RMW, right? Just
    loads and stores?

    While a lock action has acquire memory order semantics, if the
    implementation has internal stores, you have to those stores
    are complete before any access from the critical section.
    So you may need a store/load memory barrier.

    Wrt acquiring a lock, the only class of mutex logic that comes to mind
    that requires an explicit storeload style membar is Peterson's, and some
    others along those lines, so to speak. This is for the store-and-load
    version. Now, RMW on x86 basically implies a StoreLoad wrt the LOCK
    prefix, XCHG aside, for it has an implied LOCK prefix. For instance, the
    original SMR algo requires a storeload as-is on x86/x64: MFENCE or a
    LOCK prefix.

    Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
    fetch-add. It needs explicit membars (though no #StoreLoad) on SPARC in
    RMO mode. On x86, the LOCK prefix handles that wrt the RMW's themselves.
    This is a lot different than using stores and loads. The original SMR
    and Peterson's algo need that "store followed by a load to a different
    location" action to hold true, aka, storeload...

    Now, I don't think that a data-dependent load can act like a storeload.
    I thought that they act sort of like an acquire, aka #LoadStore |
    #LoadLoad wrt SPARC. SPARC in RMO mode honors data dependencies. Now,
    the DEC Alpha is a different story... ;^)




    For cmpxchg, it has full cst_seq.  For other rmw atomics I don't
    know.  I have to ask on c.a.  I think some data dependency and/or
    control dependency might factor in.

    Joe Seigh

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 15:09:39 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 2:57 PM, Chris M. Thomasson wrote:
    [...]

    Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
    fetch-add. It needs explicit membars (though no #StoreLoad) on SPARC in
    RMO mode. On x86, the LOCK prefix handles that wrt the RMW's themselves.

    The fun part is that since it's pure C++, those membars are going to be
    emitted for me no matter what arch it's compiling for.



    [...]


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Mon Oct 28 21:17:55 2024
    From Newsgroup: comp.lang.c++

    On 10/28/24 17:57, Chris M. Thomasson wrote:
    [...]


    fwiw, here's the lock and unlock logic from smrproxy rewrite

        inline void lock()
        {
            epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
            _ref_epoch.store(_epoch, std::memory_order_relaxed);
            std::atomic_signal_fence(std::memory_order_acquire);
        }

        inline void unlock()
        {
            _ref_epoch.store(0, std::memory_order_release);
        }
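    To make the above concrete, here is a hedged, self-contained sketch of
    how a polling side might consume these per-reader epochs. This is an
    assumption, not Joe's actual smrproxy code: `generation_quiesced`, the
    reader list, and the seq_cst thread fence standing in for the
    asymmetric membar are all invented for illustration.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <cstdint>
    #include <vector>

    using epoch_t = std::uint64_t;

    std::atomic<epoch_t> shadow_epoch{2};   // published by the polling thread

    struct reader
    {
        std::atomic<epoch_t> _ref_epoch{0};

        void lock()
        {
            epoch_t e = shadow_epoch.load(std::memory_order_relaxed);
            _ref_epoch.store(e, std::memory_order_relaxed);
            std::atomic_signal_fence(std::memory_order_acquire);
        }

        void unlock()
        {
            _ref_epoch.store(0, std::memory_order_release);
        }
    };

    // Hypothetical polling-side check: after an asymmetric membar (a
    // seq_cst thread fence stands in here), a reader slot of 0, or one
    // strictly after the retiring epoch, means that reader can no longer
    // reach objects retired in the old generation.
    bool generation_quiesced(const std::vector<reader*>& readers, epoch_t retiring)
    {
        std::atomic_thread_fence(std::memory_order_seq_cst); // stand-in membar
        for (reader* r : readers) {
            epoch_t e = r->_ref_epoch.load(std::memory_order_acquire);
            if (e != 0 && static_cast<std::int64_t>(e - retiring) <= 0)
                return false; // reader still inside an old critical section
        }
        return true;
    }

    int main()
    {
        reader r;
        std::vector<reader*> all{&r};
        r.lock();                              // holds epoch 2
        assert(!generation_quiesced(all, 2));  // generation 2 not drained yet
        r.unlock();
        assert(generation_quiesced(all, 2));   // now safe to reclaim
    }
    ```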

    epoch_t is interesting. It's uint64_t but handles wrapped
    compares, ie. for an epoch_t x1 and uint64_t n

    x1 < (x1 + n)

    for any value of x1 and any value of n from 0 to 2**63;
    eg.
    0xfffffffffffffff0 < 0x0000000000000001
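    The wrapped compare described above can be sketched with a signed
    difference: subtract in unsigned arithmetic and interpret the result as
    a signed 64-bit value, so ordering survives wraparound for any nonzero
    distance below 2**63. The helper name `epoch_less` is mine, not from
    smrproxy.

    ```cpp
    #include <cassert>
    #include <cstdint>

    using epoch_t = std::uint64_t;

    // Wrap-tolerant "a < b": true when b is ahead of a by less than 2**63.
    inline bool epoch_less(epoch_t a, epoch_t b)
    {
        return static_cast<std::int64_t>(a - b) < 0;
    }

    int main()
    {
        // The wrapped example from the post:
        assert(epoch_less(0xfffffffffffffff0ull, 0x0000000000000001ull));
        // Ordinary, non-wrapped cases still behave as expected:
        assert(epoch_less(3, 5));
        assert(!epoch_less(5, 3));
    }
    ```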


    The rewrite is almost complete except for some thread_local
    stuff. I think I might break off there. Most of the
    additional work is writing the test code. I'm considering
    rewriting it in Rust.

    Joe Seigh
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 21:35:33 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:
    On 10/28/24 00:02, Chris M. Thomasson wrote:
    On 10/27/2024 5:35 PM, jseigh wrote:
    On 10/27/24 18:32, Chris M. Thomasson wrote:


    The membar version?  That's a store/load membar so it is expensive.

    I was wondering in your c++ version if you had to use any seq_cst
    barriers. I think acquire/release should be good enough. Now, when I
    say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
    things like that.

    I take it that your pure C++ version has no atomic RMW, right? Just
    loads and stores?

    While a lock action has acquire memory order semantics, if the
    implementation has internal stores, you have to ensure those stores
    are complete before any access from the critical section.
    So you may need a store/load memory barrier.

    Wrt acquiring a lock the only class of mutex logic that comes to mind
    that requires an explicit storeload style membar is Peterson's, and
    some others along those lines, so to speak. This is for the store and
    load version. Now, RMW on x86 basically implies a StoreLoad wrt the
    LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
    instance the original SMR algo requires a storeload as is on x86/x64.
    MFENCE or LOCK prefix.

    Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
    fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in
    RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
    themselves. This is a lot different than using stores and loads. The
    original SMR and Peterson's algo needs that "store followed by a load
    to a different location" action to hold true, aka, storeload...

    Now, I don't think that a data-dependent load can act like a
    storeload. I thought that they act sort of like an acquire, aka
    #LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
    dependencies. Now, the DEC Alpha is a different story... ;^)


    fwiw, here's the lock and unlock logic from smrproxy rewrite

        inline void lock()
        {
            epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
            _ref_epoch.store(_epoch, std::memory_order_relaxed);


            std::atomic_signal_fence(std::memory_order_acquire);
    ^^^^^^^^^^^^^^^^^^^^^^

        }

    Still don't know how your pure C++ write up can handle this without an
    std::atomic_thread_fence(std::memory_order_acquire).


        inline void unlock()
        {
            _ref_epoch.store(0, std::memory_order_release);
        }






    epoch_t is interesting.  It's uint64_t but handles wrapped
    compares, ie. for an epoch_t x1 and uint64_t n

        x1 < (x1 + n)

    for any value of x1 and any value of n from 0 to 2**63;
    eg.
       0xfffffffffffffff0 < 0x0000000000000001

    For some reason it reminds me of your atomic 63 bit counter from a while
    back. If your smr_poll thread is the only thread that can increment the
    epoch, then it will be able to detect rolling to zero, right? For one of
    my older proxies the thread would have two version counters:

    per_thread
    {
        word m_ver[2];

        lock()
        {
            word ver = g_version;
            m_ver[ver % 2] = ver;
        }

        unlock(word ver)
        {
            m_ver[ver % 2] = 0;
        }
    }

    So, it's different logic. The only way I got around using memory
    barriers (compiler barriers aside for a moment) here was for the
    polling thread to use an asymmetric membar. For a pure C++ version, I
    think it would require:


    per_thread
    {
        word m_ver[2];

        lock()
        {
            word ver = g_version;
            m_ver[ver % 2] = ver;
            #LoadStore | #LoadLoad;
        }

        unlock(word ver)
        {
            #LoadStore | #StoreStore;
            m_ver[ver % 2] = 0;
        }
    }

    I think.... Humm...
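    A hedged C++ rendering of the pseudo-code above, mapping the pseudo
    barriers onto standard fences: "#LoadStore | #LoadLoad" after
    publishing the slot becomes an acquire fence, and "#LoadStore |
    #StoreStore" before clearing it becomes a release fence. The
    `per_thread` shape and `word` alias follow the post; the rest is a
    sketch, not a tested proxy.

    ```cpp
    #include <atomic>
    #include <cassert>

    using word = unsigned long;

    std::atomic<word> g_version{1};   // bumped by a polling thread

    struct per_thread
    {
        std::atomic<word> m_ver[2] {};

        word lock()
        {
            word ver = g_version.load(std::memory_order_relaxed);
            m_ver[ver % 2].store(ver, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire); // #LoadStore | #LoadLoad
            return ver;
        }

        void unlock(word ver)
        {
            std::atomic_thread_fence(std::memory_order_release); // #LoadStore | #StoreStore
            m_ver[ver % 2].store(0, std::memory_order_relaxed);
        }
    };

    int main()
    {
        per_thread t;
        word v = t.lock();                 // samples g_version == 1
        assert(t.m_ver[v % 2].load() == v); // slot advertises the version
        t.unlock(v);
        assert(t.m_ver[v % 2].load() == 0); // slot cleared on exit
    }
    ```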



    The rewrite is almost complete except for some thread_local
    stuff.  I think I might break off there.  Most of the
    additional work is writing the test code.  I'm considering
    rewriting it in Rust.

    Joe Seigh

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 21:38:09 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:
    On 10/28/24 00:02, Chris M. Thomasson wrote:
    On 10/27/2024 5:35 PM, jseigh wrote:
    On 10/27/24 18:32, Chris M. Thomasson wrote:


    The membar version?  That's a store/load membar so it is expensive.

    I was wondering in your c++ version if you had to use any seq_cst
    barriers. I think acquire/release should be good enough. Now, when I
    say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
    things like that.

    I take it that your pure C++ version has no atomic RMW, right? Just
    loads and stores?

    While a lock action has acquire memory order semantics, if the
    implementation has internal stores, you have to ensure those stores
    are complete before any access from the critical section.
    So you may need a store/load memory barrier.

    Wrt acquiring a lock the only class of mutex logic that comes to mind
    that requires an explicit storeload style membar is Peterson's, and
    some others along those lines, so to speak. This is for the store and
    load version. Now, RMW on x86 basically implies a StoreLoad wrt the
    LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
    instance the original SMR algo requires a storeload as is on x86/x64.
    MFENCE or LOCK prefix.

    Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
    fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in
    RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
    themselves. This is a lot different than using stores and loads. The
    original SMR and Peterson's algo needs that "store followed by a load
    to a different location" action to hold true, aka, storeload...

    Now, I don't think that a data-dependent load can act like a
    storeload. I thought that they act sort of like an acquire, aka
    #LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
    dependencies. Now, the DEC Alpha is a different story... ;^)


    fwiw, here's the lock and unlock logic from smrproxy rewrite

        inline void lock()
        {
            epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
            _ref_epoch.store(_epoch, std::memory_order_relaxed);
            std::atomic_signal_fence(std::memory_order_acquire);
        }

        inline void unlock()
        {
            _ref_epoch.store(0, std::memory_order_release);
        }

    epoch_t is interesting.  It's uint64_t but handles wrapped
    compares, ie. for an epoch_t x1 and uint64_t n

    Only your single polling thread can mutate the shadow_epoch, right?



        x1 < (x1 + n)

    for any value of x1 and any value of n from 0 to 2**63;
    eg.
       0xfffffffffffffff0 < 0x0000000000000001


    The rewrite is almost complete except for some thread_local
    stuff.  I think I might break off there.  Most of the
    additional work is writing the test code.  I'm considering
    rewriting it in Rust.



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 21:41:12 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:
    On 10/28/24 00:02, Chris M. Thomasson wrote:
    On 10/27/2024 5:35 PM, jseigh wrote:
    On 10/27/24 18:32, Chris M. Thomasson wrote:


    The membar version?  That's a store/load membar so it is expensive.

    I was wondering in your c++ version if you had to use any seq_cst
    barriers. I think acquire/release should be good enough. Now, when I
    say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
    things like that.

    I take it that your pure C++ version has no atomic RMW, right? Just
    loads and stores?

    While a lock action has acquire memory order semantics, if the
    implementation has internal stores, you have to ensure those stores
    are complete before any access from the critical section.
    So you may need a store/load memory barrier.

    Wrt acquiring a lock the only class of mutex logic that comes to mind
    that requires an explicit storeload style membar is Peterson's, and
    some others along those lines, so to speak. This is for the store and
    load version. Now, RMW on x86 basically implies a StoreLoad wrt the
    LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
    instance the original SMR algo requires a storeload as is on x86/x64.
    MFENCE or LOCK prefix.

    Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
    fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in
    RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
    themselves. This is a lot different than using stores and loads. The
    original SMR and Peterson's algo needs that "store followed by a load
    to a different location" action to hold true, aka, storeload...

    Now, I don't think that a data-dependent load can act like a
    storeload. I thought that they act sort of like an acquire, aka
    #LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
    dependencies. Now, the DEC Alpha is a different story... ;^)


    fwiw, here's the lock and unlock logic from smrproxy rewrite

        inline void lock()
        {
            epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
            _ref_epoch.store(_epoch, std::memory_order_relaxed);
            std::atomic_signal_fence(std::memory_order_acquire);
        }

        inline void unlock()
        {
            _ref_epoch.store(0, std::memory_order_release);
        }

    epoch_t is interesting.  It's uint64_t but handles wrapped
    compares, ie. for an epoch_t x1 and uint64_t n

    [...]

    Humm... I am not sure if it would work with just the release. The
    polling thread would read from these per thread epochs, _ref_epoch,
    using an acquire barrier? Still. Not sure if that would work. Need to
    put my thinking cap on. ;^)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Oct 28 22:02:23 2024
    From Newsgroup: comp.lang.c++

    On 10/17/2024 5:10 AM, jseigh wrote:
    I replaced the hazard pointer logic in smrproxy.  It's now wait-free
    instead of mostly wait-free.  The reader lock logic after loading
    the address of the reader lock object into a register is now 2
    instructions a load followed by a store.  The unlock is same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC.   I'm working on porting
    it to c++ and don't want to waste any more time on c version.

    No idea if it's a new algorithm.  I suspect that since I use
    the term epoch that it will be claimed that it's ebr, epoch
    based reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.

    For some reason you made me think of another very simple proxy
    technique using per thread mutexes. It was an experiment a while back:
    ___________________
    per_thread
    {
        std::mutex m_locks[2];

        lock()
        {
            word ver = g_version;
            m_locks[ver % 2].lock();
        }

        unlock(word ver)
        {
            m_locks[ver % 2].unlock();
        }
    }
    ___________________

    The polling thread would increase the g_version counter then lock and
    unlock all of the threads previous locks. Iirc, it worked way better
    than a read write lock for sure. Basically:
    ___________________
    word ver = g_version.inc(); // ver is the previous version

    for all threads as t
    {
        t.m_locks[ver % 2].lock();
        t.m_locks[ver % 2].unlock();
    }
    ___________________

    After that, it knew the previous generation was completed.

    It was just a way of using a mutex to get distributed proxy-like behavior.
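    A hedged, runnable sketch of the per-thread two-mutex proxy described
    above. It folds in the correction posted later in the thread (lock()
    returns the sampled version, which is then passed to unlock()). The
    type names and the single-thread walkthrough in main() are mine; real
    use would have one `per_thread` per reader and a real polling thread.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <mutex>

    using word = unsigned long;

    std::atomic<word> g_version{0};

    struct per_thread
    {
        std::mutex m_locks[2];

        word lock()
        {
            word ver = g_version.load(std::memory_order_acquire);
            m_locks[ver % 2].lock();   // enter under the current generation
            return ver;
        }

        void unlock(word ver)
        {
            m_locks[ver % 2].unlock();
        }
    };

    int main()
    {
        per_thread t;

        // Reader side: enter and leave a read-side critical section.
        word v = t.lock();
        t.unlock(v);

        // Poller side: bump the version, then pass through every thread's
        // previous-generation mutex; once that succeeds, all readers of
        // the old generation have drained.
        word prev = g_version.fetch_add(1, std::memory_order_acq_rel);
        t.m_locks[prev % 2].lock();    // would block on a live old reader
        t.m_locks[prev % 2].unlock();
        assert(g_version.load() == prev + 1);
    }
    ```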
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Tue Oct 29 07:23:56 2024
    From Newsgroup: comp.lang.c++

    On 10/29/24 00:35, Chris M. Thomasson wrote:
    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:


    fwiw, here's the lock and unlock logic from smrproxy rewrite

         inline void lock()
         {
             epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
             _ref_epoch.store(_epoch, std::memory_order_relaxed);


             std::atomic_signal_fence(std::memory_order_acquire);
    ^^^^^^^^^^^^^^^^^^^^^^

         }

    Still don't know how your pure C++ write up can handle this without an
    std::atomic_thread_fence(std::memory_order_acquire).

    No thread fence is necessary. The loads can move before
    the store. They just can't move before the async
    membar. After that membar any previously retired
    objects are no longer reachable.



         inline void unlock()
         {
             _ref_epoch.store(0, std::memory_order_release);
         }

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Tue Oct 29 07:27:40 2024
    From Newsgroup: comp.lang.c++

    On 10/29/24 00:38, Chris M. Thomasson wrote:
    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:


    fwiw, here's the lock and unlock logic from smrproxy rewrite

         inline void lock()
         {
             epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
             _ref_epoch.store(_epoch, std::memory_order_relaxed);
             std::atomic_signal_fence(std::memory_order_acquire);
         }

         inline void unlock()
         {
             _ref_epoch.store(0, std::memory_order_release);
         }

    epoch_t is interesting.  It's uint64_t but handles wrapped
    compares, ie. for an epoch_t x1 and uint64_t n

    Only your single polling thread can mutate the shadow_epoch, right?


    Yes. It's just an optimization. The reader threads could read
    from the global epoch but it would be in a separate cache line
    and be an extra dependent load. So one dependent load and
    same cache line.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 14:51:58 2024
    From Newsgroup: comp.lang.c++

    On 10/29/2024 4:23 AM, jseigh wrote:
    On 10/29/24 00:35, Chris M. Thomasson wrote:
    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:


    fwiw, here's the lock and unlock logic from smrproxy rewrite

         inline void lock()
         {
             epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
             _ref_epoch.store(_epoch, std::memory_order_relaxed);


             std::atomic_signal_fence(std::memory_order_acquire);
    ^^^^^^^^^^^^^^^^^^^^^^

         }

    Still don't know how your pure C++ write up can handle this without an
    std::atomic_thread_fence(std::memory_order_acquire).

    No thread fence is necessary.  The loads can move before
    the store.  They just can't move before the async
    membar.  After that membar any previously retired
    objects are no longer reachable.

    Ahhhh. I thought your C++ version was going to be one that did not use
    an asymmetric membar in order to make it so-called "100%" pure... Not
    sure about C++ including one in the actual standard. If they did I
    think it would have some info about it, perhaps akin to:

    https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free

    Well, even then if an asymmetric membar was not available on the arch
    the code is being compiled for, it can say, well, shit happens! Then
    allow you to fall back to another version...?

    Also, the C++ wording of this is interesting:

    https://en.cppreference.com/w/cpp/atomic/atomic/is_always_lock_free

    wrt:
    _____________
    0 for the built-in atomic types that are never lock-free,
    1 for the built-in atomic types that are sometimes lock-free,
    2 for the built-in atomic types that are always lock-free.
    _____________

    That's fun... ;^)






         inline void unlock()
         {
             _ref_epoch.store(0, std::memory_order_release);
         }


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 14:56:16 2024
    From Newsgroup: comp.lang.c++

    On 10/29/2024 4:27 AM, jseigh wrote:
    On 10/29/24 00:38, Chris M. Thomasson wrote:
    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:


    fwiw, here's the lock and unlock logic from smrproxy rewrite

         inline void lock()
         {
             epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
             _ref_epoch.store(_epoch, std::memory_order_relaxed);
             std::atomic_signal_fence(std::memory_order_acquire);
         }

         inline void unlock()
         {
             _ref_epoch.store(0, std::memory_order_release);
         }

    epoch_t is interesting.  It's uint64_t but handles wrapped
    compares, ie. for an epoch_t x1 and uint64_t n

    Only your single polling thread can mutate the shadow_epoch, right?


    Yes.  It's just an optimization. The reader threads could read
    from the global epoch but it would be in a separate cache line
    and be an extra dependent load.  So one dependent load and
    same cache line.

    Are you taking advantage of the fancy alignment capabilities of C++?

    https://en.cppreference.com/w/cpp/language/alignas

    and friends? They seem to work fine wrt the last time I checked them.

    It's nice to have a standard way to pad and align on cache line
    boundaries. :^)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 15:05:47 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:
    On 10/28/2024 6:17 PM, jseigh wrote:
    On 10/28/24 17:57, Chris M. Thomasson wrote:
    On 10/28/2024 4:45 AM, jseigh wrote:
    On 10/28/24 00:02, Chris M. Thomasson wrote:
    On 10/27/2024 5:35 PM, jseigh wrote:
    On 10/27/24 18:32, Chris M. Thomasson wrote:


    The membar version?  That's a store/load membar so it is expensive.

    I was wondering in your c++ version if you had to use any seq_cst
    barriers. I think acquire/release should be good enough. Now, when
    I say C++, I mean pure C++, no calls to FlushProcessWriteBuffers
    and things like that.

    I take it that your pure C++ version has no atomic RMW, right? Just
    loads and stores?

    While a lock action has acquire memory order semantics, if the
    implementation has internal stores, you have to ensure those stores
    are complete before any access from the critical section.
    So you may need a store/load memory barrier.

    Wrt acquiring a lock the only class of mutex logic that comes to mind
    that requires an explicit storeload style membar is Peterson's, and
    some others along those lines, so to speak. This is for the store and
    load version. Now, RMW on x86 basically implies a StoreLoad wrt the
    LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
    instance the original SMR algo requires a storeload as is on x86/x64.
    MFENCE or LOCK prefix.

    Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
    fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in
    RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
    themselves. This is a lot different than using stores and loads. The
    original SMR and Peterson's algo needs that "store followed by a load
    to a different location" action to hold true, aka, storeload...

    Now, I don't think that a data-dependent load can act like a
    storeload. I thought that they act sort of like an acquire, aka
    #LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
    dependencies. Now, the DEC Alpha is a different story... ;^)


    fwiw, here's the lock and unlock logic from smrproxy rewrite

         inline void lock()
         {
             epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
             _ref_epoch.store(_epoch, std::memory_order_relaxed);
             std::atomic_signal_fence(std::memory_order_acquire);
         }

         inline void unlock()
         {
             _ref_epoch.store(0, std::memory_order_release);
         }

    epoch_t is interesting.  It's uint64_t but handles wrapped
    compares, ie. for an epoch_t x1 and uint64_t n

    [...]

    Humm... I am not sure if it would work with just the release. The
    polling thread would read from these per thread epochs, _ref_epoch,
    using an acquire barrier? Still. Not sure if that would work. Need to
    put my thinking cap on. ;^)

    Ahhh, if you are using an async membar in your upcoming C++ version,
    then it would be fine. No problem. A compiler fence ala
    atomic_signal_fence, and then the explicit release, well, it will
    work. I don't see why it would not work.

    For some reason, I thought you were going to not use an async membar in
    your C++ version. Sorry. However, it still would be fun to test
    against... ;^)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:40:12 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 10:02 PM, Chris M. Thomasson wrote:
    On 10/17/2024 5:10 AM, jseigh wrote:
    I replaced the hazard pointer logic in smrproxy.  It's now wait-free
    instead of mostly wait-free.  The reader lock logic after loading
    the address of the reader lock object into a register is now 2
    instructions a load followed by a store.  The unlock is same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC.   I'm working on porting
    it to c++ and don't want to waste any more time on c version.

    No idea if it's a new algorithm.  I suspect that since I use
    the term epoch that it will be claimed that it's ebr, epoch
    based reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.

    For some reason you made me think of another very simple proxy
    technique using per thread mutexes. It was an experiment a while back:
    ___________________
    per_thread
    {
        std::mutex m_locks[2];

        lock()
        {
            word ver = g_version;
            m_locks[ver % 2].lock();
        }

        unlock(word ver)
        {
            m_locks[ver % 2].unlock();
        }
    }
    ___________________

    The polling thread would increase the g_version counter then lock and
    unlock all of the threads previous locks. Iirc, it worked way better
    than a read write lock for sure. Basically:
    ___________________
    word ver = g_version.inc(); // ver is the previous version

    for all threads as t
    {
       t.m_locks[ver % 2].lock();
       t.m_locks[ver % 2].unlock();
    }
    ___________________

    After that, it knew the previous generation was completed.

    It was just a way of using a mutex to get distributed proxy-like behavior.

    There are fun things to do here. A thread can do an unlock/lock cycle
    every so often, say every 1000 iterations. The fun part is that this
    can beat a read-write lock.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:42:03 2024
    From Newsgroup: comp.lang.c++

    On 10/28/2024 10:02 PM, Chris M. Thomasson wrote:
    On 10/17/2024 5:10 AM, jseigh wrote:
    I replaced the hazard pointer logic in smrproxy.  It's now wait-free
    instead of mostly wait-free.  The reader lock logic after loading
    the address of the reader lock object into a register is now 2
    instructions a load followed by a store.  The unlock is same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC.   I'm working on porting
    it to c++ and don't want to waste any more time on c version.

    No idea if it's a new algorithm.  I suspect that since I use
    the term epoch that it will be claimed that it's ebr, epoch
    based reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.

    For some reason you made me think of another very simple proxy
    technique using per thread mutexes. It was an experiment a while back:
    ___________________
    per_thread
    {
        std::mutex m_locks[2];

        lock()
        {
            word ver = g_version;
            m_locks[ver % 2].lock();
        }

        unlock(word ver)
        {
            m_locks[ver % 2].unlock();
        }

    Oops! lock() should return the version then pass it on to unlock. Sorry
    for missing that in my pseudo-code. ;^o



    }
    ___________________

    The polling thread would increase the g_version counter then lock and
    unlock all of the threads previous locks. Iirc, it worked way better
    than a read write lock for sure. Basically:
    ___________________
    word ver = g_version.inc(); // ver is the previous version

    for all threads as t
    {
       t.m_locks[ver % 2].lock();
       t.m_locks[ver % 2].unlock();
    }
    ___________________

    After that, it knew the previous generation was completed.

    It was just a way of using a mutex to get distributed proxy-like behavior.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:53:19 2024
    From Newsgroup: comp.lang.c++

    On 10/29/2024 11:42 PM, Chris M. Thomasson wrote:
    On 10/28/2024 10:02 PM, Chris M. Thomasson wrote:
    On 10/17/2024 5:10 AM, jseigh wrote:
    I replaced the hazard pointer logic in smrproxy.  It's now wait-free
    instead of mostly wait-free.  The reader lock logic after loading
    the address of the reader lock object into a register is now 2
    instructions a load followed by a store.  The unlock is same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC.   I'm working on porting
    it to c++ and don't want to waste any more time on c version.

    No idea if it's a new algorithm.  I suspect that since I use
    the term epoch that it will be claimed that it's ebr, epoch
    based reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.

    For some reason you made me think of another very simple proxy
    technique using per thread mutexes. It was an experiment a while back:
    ___________________
    per_thread
    {
         std::mutex m_locks[2];

         lock()
         {
             word ver = g_version;
             m_locks[ver % 2].lock();
         }

         unlock(word ver)
         {
             m_locks[ver % 2].unlock();
         }

    Oops! lock() should return the version then pass it on to unlock. Sorry
    for missing that in my pseudo-code. ;^o



    }
    ___________________

    The polling thread would increase the g_version counter then lock and
    unlock all of the threads previous locks. Iirc, it worked way better
    than a read write lock for sure. Basically:
    ___________________
    word ver = g_version.inc(); // ver is the previous version

    for all threads as t
    {
        t.m_locks[ver % 2].lock();
        t.m_locks[ver % 2].unlock();
    }
    ___________________

    After that, it knew the previous generation was completed.

    It was just a way of using a mutex to get distributed proxy-like
    behavior.


    Iirc, I still had to use a defer list where the dtors were called for
    the nodes.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Oct 29 23:54:33 2024
    From Newsgroup: comp.lang.c++

    On 10/17/2024 5:10 AM, jseigh wrote:
    I replaced the hazard pointer logic in smrproxy.  It's now wait-free
    instead of mostly wait-free.  The reader lock logic after loading
    the address of the reader lock object into a register is now 2
    instructions a load followed by a store.  The unlock is same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC.   I'm working on porting
    it to c++ and don't want to waste any more time on c version.

    No idea if it's a new algorithm.  I suspect that since I use
    the term epoch that it will be claimed that it's ebr, epoch
    based reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.

    Joe, can you call dtors for nodes after a single epoch?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Wed Oct 30 12:36:45 2024
    From Newsgroup: comp.lang.c++

    On 10/29/24 17:56, Chris M. Thomasson wrote:
    On 10/29/2024 4:27 AM, jseigh wrote:


    Yes.  It's just an optimization. The reader threads could read
    from the global epoch but it would be in a separate cache line
    and be an extra dependent load.  So one dependent load and
    same cache line.

    Are you taking advantage of the fancy alignment capabilities of C++?

    https://en.cppreference.com/w/cpp/language/alignas

    and friends? They seem to work fine wrt the last time I checked them.

    It's nice to have a standard way to pad and align on cache line
    boundaries. :^)

    It's target processor dependent. You need to query the cache line size
    and pass it to the compiler as a define variable.

    There's supposed to be a built-in define that would be target system
    dependent but it's not implemented.
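    For reference, a sketch of the alignas approach under discussion. The
    standard feature-test macro `__cpp_lib_hardware_interference_size`
    gates on whether the library actually provides
    std::hardware_destructive_interference_size; where it doesn't, a
    compiler define (as Joe describes) is the fallback. The fallback value
    of 64 and the `reader_slot` type are assumptions for illustration.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <new>

    #if defined(__cpp_lib_hardware_interference_size)
    constexpr std::size_t cache_line = std::hardware_destructive_interference_size;
    #else
    constexpr std::size_t cache_line = 64; // assumed; pass via -DCACHE_LINE=... in practice
    #endif

    // One per-reader slot, padded out to its own cache line so reader
    // stores don't false-share with neighbors.
    struct alignas(cache_line) reader_slot
    {
        unsigned long long ref_epoch;
    };

    static_assert(alignof(reader_slot) >= cache_line,
                  "slot not cache-line aligned");

    int main()
    {
        // sizeof is rounded up to the alignment, so an array of slots
        // places each one on its own line.
        assert(sizeof(reader_slot) % cache_line == 0);
    }
    ```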
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Wed Oct 30 12:39:38 2024
    From Newsgroup: comp.lang.c++

    On 10/29/24 18:05, Chris M. Thomasson wrote:
    On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:


    Ahhh, if you are using an async membar in your upcoming C++ version,
    then it would be fine. No problem. A compiler fence ala
    atomic_signal_fence, and the explicit release, well, it will work.
    I don't see why it would not work.

    For some reason, I thought you were going to not use an async membar in
    your C++ version. Sorry. However, it still would be fun to test
    against... ;^)

    The C version has both versions.  The C++ version does only the
    async membar version.  But I'm not publishing that code so it's
    a moot point.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Wed Oct 30 12:43:09 2024
    From Newsgroup: comp.lang.c++

    On 10/30/24 02:54, Chris M. Thomasson wrote:
    On 10/17/2024 5:10 AM, jseigh wrote:
    I replaced the hazard pointer logic in smrproxy.  It's now wait-free
    instead of mostly wait-free.  The reader lock logic, after loading
    the address of the reader lock object into a register, is now two
    instructions: a load followed by a store.  The unlock is the same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC.  I'm working on porting
    it to c++ and don't want to waste any more time on the c version.

    No idea if it's a new algorithm.  I suspect that since I use
    the term epoch it will be claimed that it's ebr, epoch-based
    reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.

    Joe, can you call dtors for nodes after a single epoch?

    Yes, since I can check that epochs are being referenced
    directly instead of indirectly via quiescent states.
    It's the inferring-from-events-vs.-conditions thing.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Nov 1 12:20:39 2024
    From Newsgroup: comp.lang.c++

    On 10/30/2024 9:36 AM, jseigh wrote:
    On 10/29/24 17:56, Chris M. Thomasson wrote:
    On 10/29/2024 4:27 AM, jseigh wrote:


    Yes.  It's just an optimization.  The reader threads could read
    from the global epoch, but it would be in a separate cache line
    and be an extra dependent load.  This way it's one dependent load,
    in the same cache line.

    Are you taking advantage of the fancy alignment capabilities of C++?

    https://en.cppreference.com/w/cpp/language/alignas

    and friends? They seem to work fine wrt the last time I checked them.

    It's nice to have a standard way to pad and align on cache line
    boundaries. :^)

    It's target-processor dependent.  You need to query the cache line
    size and pass it to the compiler as a define.

    There's supposed to be a built-in define that would be target-system
    dependent, but it's not implemented.

    What about:

    https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size

    ?
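
    It is a constexpr value (C++17), so it can feed alignas directly
    where the library ships it; a hedged sketch with a feature-test
    fallback (the fallback value of 64 is an assumption):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <new>  // std::hardware_destructive_interference_size

// Note: this is a compile-time constant, not a runtime query, so it
// can still differ from the actual deployment target's line size.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t cache_line =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t cache_line = 64;  // assumed fallback
#endif

struct alignas(cache_line) padded_counter {
    std::atomic<std::uint64_t> value{0};
};
```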
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Nov 2 13:50:51 2024
    From Newsgroup: comp.lang.c++

    On 10/17/2024 5:10 AM, jseigh wrote:
    I replaced the hazard pointer logic in smrproxy.  It's now wait-free
    instead of mostly wait-free.  The reader lock logic, after loading
    the address of the reader lock object into a register, is now two
    instructions: a load followed by a store.  The unlock is the same
    as before, just a store.

    It's way faster now.

    It's on the feature/003 branch as a POC.  I'm working on porting
    it to c++ and don't want to waste any more time on the c version.

    No idea if it's a new algorithm.  I suspect that since I use
    the term epoch it will be claimed that it's ebr, epoch-based
    reclamation, and that all ebr algorithms are equivalent.
    Though I suppose you could argue it's qsbr if I point out what
    the quiescent states are.

    Joe Seigh

    Another crude experiment I was testing out back in the day: a
    per-thread heartbeat, wrt a version that does not use asymmetric
    membars, to test against:

    per_thread
    {
        word cur = 0;

        void beat()
        {
            word global = g_version;      // read the global version
            if (cur == global) return;    // already caught up, skip the store
            store_seq_cst(&cur, global);  // publish the heartbeat
        }
    }


    Well, this case requires worker threads to beat every now and then when
    they are outside of critical sections. Just a test case against my
    asymmetric versions. The _single_ poll thread would increment
    g_version, then check for threads whose copy was equal to the new
    version. A quiescent period had passed once all threads went through it.

    Of course the asymmetric version beat it, but it was fun to test
    against anyway.
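
    The poll-thread side described above can be sketched as follows
    (wait_for_quiescence and the worker registry are my naming, not
    code from the thread):

```cpp
#include <atomic>
#include <thread>
#include <vector>

using word = long;

std::atomic<word> g_version{0};

struct per_thread {
    std::atomic<word> cur{0};

    // Called by workers outside their critical sections.
    void beat() {
        word global = g_version.load(std::memory_order_relaxed);
        if (cur.load(std::memory_order_relaxed) == global) return;
        cur.store(global, std::memory_order_seq_cst);
    }
};

// The single poll thread: bump the version, then spin until every
// registered worker has echoed it back -- a quiescent period.
void wait_for_quiescence(const std::vector<per_thread*>& workers) {
    word target = g_version.fetch_add(1, std::memory_order_seq_cst) + 1;
    for (per_thread* w : workers)
        while (w->cur.load(std::memory_order_seq_cst) < target)
            std::this_thread::yield();
}
```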

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Nov 3 21:14:37 2024
    From Newsgroup: comp.lang.c++

    On 10/30/2024 9:39 AM, jseigh wrote:
    On 10/29/24 18:05, Chris M. Thomasson wrote:
    On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:


    Ahhh, if you are using an async membar in your upcoming C++ version,
    then it would be fine. No problem. A compiler fence ala
    atomic_signal_fence, and the explicit release, well, it will work.
    I don't see why it would not work.

    For some reason, I thought you were going to not use an async membar
    in your C++ version. Sorry. However, it still would be fun to test
    against... ;^)

    The C version has both versions.  The C++ version does only the
    async membar version.  But I'm not publishing that code so it's
    a moot point.

    I got side tracked with more heavy math. The problem with C++ code that
    uses an async memory barrier is that it's automatically rendered into a
    non-portable state... Yikes! Imvvvvvho, C/C++ should think about
    including them in some future standard. It would be nice. Well, for us
    at least! ;^)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jseigh@jseigh_es00@xemaps.com to comp.lang.c++ on Mon Nov 4 07:46:37 2024
    From Newsgroup: comp.lang.c++

    On 11/4/24 00:14, Chris M. Thomasson wrote:
    On 10/30/2024 9:39 AM, jseigh wrote:
    On 10/29/24 18:05, Chris M. Thomasson wrote:
    On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:


    Ahhh, if you are using an async membar in your upcoming C++ version,
    then it would be fine. No problem. A compiler fence ala
    atomic_signal_fence, and the explicit release, well, it will
    work. I don't see why it would not work.

    For some reason, I thought you were going to not use an async membar
    in your C++ version. Sorry. However, it still would be fun to test
    against... ;^)

    The C version has both versions.  The C++ version does only the
    async membar version.  But I'm not publishing that code so it's
    a moot point.

    I got side tracked with more heavy math. The problem with C++ code that
    uses an async memory barrier is that it's automatically rendered into a
    non-portable state... Yikes! Imvvvvvho, C/C++ should think about
    including them in some future standard. It would be nice. Well, for us
    at least! ;^)

    That's never going to happen. DWCAS has been around for more than
    50 years and c++ doesn't support that and probably never will.
    You can't write lock-free queues that are ABA free and
    are performant without that. So async memory barriers won't
    happen any time soon either.

    Long term I think c++ will fade into irrelevance along with
    all the other programming languages based on an imperfect
    knowledge of concurrency, which is basically all of them
    right now.



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Muttley@Muttley@DastartdlyHQ.org to comp.lang.c++ on Mon Nov 4 14:09:14 2024
    From Newsgroup: comp.lang.c++

    On Mon, 4 Nov 2024 07:46:37 -0500
    jseigh <jseigh_es00@xemaps.com> boring babbled:
    On 11/4/24 00:14, Chris M. Thomasson wrote:
    On 10/30/2024 9:39 AM, jseigh wrote:
    On 10/29/24 18:05, Chris M. Thomasson wrote:
    On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:


    Ahhh, if you are using an async membar in your upcoming C++ version,
    then it would be fine. No problem. A compiler fence ala
    atomic_signal_fence, and the explicit release, well, it will
    work. I don't see why it would not work.

    For some reason, I thought you were going to not use an async membar
    in your C++ version. Sorry. However, it still would be fun to test
    against... ;^)

    The C version has both versions.  The C++ version does only the
    async membar version.  But I'm not publishing that code so it's
    a moot point.

    I got side tracked with more heavy math. The problem with C++ code that
    uses an async memory barrier is that it's automatically rendered into a
    non-portable state... Yikes! Imvvvvvho, C/C++ should think about
    including them in some future standard. It would be nice. Well, for us
    at least! ;^)

    That's never going to happen. DWCAS has been around for more than
    50 years and c++ doesn't support that and probably never will.
    You can't write lock-free queues that are ABA free and
    are performant without that. So async memory barriers won't
    happen any time soon either.

    Long term I think c++ will fade into irrelevance along with
    all the other programming languages based on an imperfect
    knowledge of concurrency, which is basically all of them
    right now.

    Given most concurrent operating systems are written in these "imperfect" languages how does that square with your definition? And how would your perfect language run on them?

    Anyway, concurrency is the job of the OS, not the language. C++ threading is just a wrapper around pthreads on *nix and windows threads on Windows. The language just needs an interface to the underlying OS functionality, it should not try and implement the functionality itself as it'll always be a hack.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Nov 4 12:42:53 2024
    From Newsgroup: comp.lang.c++

    On 11/4/2024 6:09 AM, Muttley@DastartdlyHQ.org wrote:
    On Mon, 4 Nov 2024 07:46:37 -0500
    jseigh <jseigh_es00@xemaps.com> boring babbled:
    On 11/4/24 00:14, Chris M. Thomasson wrote:
    On 10/30/2024 9:39 AM, jseigh wrote:
    On 10/29/24 18:05, Chris M. Thomasson wrote:
    On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:


    Ahhh, if you are using an async membar in your upcoming C++ version,
    then it would be fine. No problem. A compiler fence ala
    atomic_signal_fence, and the explicit release, well, it will
    work. I don't see why it would not work.

    For some reason, I thought you were going to not use an async membar
    in your C++ version. Sorry. However, it still would be fun to test
    against... ;^)

    The C version has both versions.  The C++ version does only the
    async membar version.  But I'm not publishing that code so it's
    a moot point.

    I got side tracked with more heavy math. The problem with C++ code that
    uses an async memory barrier is that it's automatically rendered into a
    non-portable state... Yikes! Imvvvvvho, C/C++ should think about
    including them in some future standard. It would be nice. Well, for us
    at least! ;^)

    That's never going to happen. DWCAS has been around for more than
    50 years and c++ doesn't support that and probably never will.
    You can't write lock-free queues that are ABA free and
    are performant without that. So async memory barriers won't
    happen any time soon either.

    Long term I think c++ will fade into irrelevance along with
    all the other programming languages based on an imperfect
    knowledge of concurrency, which is basically all of them
    right now.

    Given most concurrent operating systems are written in these "imperfect" languages how does that square with your definition? And how would your perfect language run on them?

    Anyway, concurrency is the job of the OS, not the language. C++ threading is just a wrapper around pthreads on *nix and windows threads on Windows. The language just needs an interface to the underlying OS functionality, it should
    not try and implement the functionality itself as it'll always be a hack.


    A start would be C++ having an "always lock free" CAS for two contiguous
    words on systems that support it, yes, even 64 bit. ala:

    struct anchor {
        word a;
        word b;
    };

    cmpxchg8b for x86, cmpxchg16b for x64, etc...

    https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Nov 9 13:51:26 2024
    From Newsgroup: comp.lang.c++

    On 11/4/2024 12:42 PM, Chris M. Thomasson wrote:
    On 11/4/2024 6:09 AM, Muttley@DastartdlyHQ.org wrote:
    On Mon, 4 Nov 2024 07:46:37 -0500
    jseigh <jseigh_es00@xemaps.com> boring babbled:
    On 11/4/24 00:14, Chris M. Thomasson wrote:
    On 10/30/2024 9:39 AM, jseigh wrote:
    On 10/29/24 18:05, Chris M. Thomasson wrote:
    On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:


    Ahhh, if you are using an async membar in your upcoming C++ version,
    then it would be fine. No problem. A compiler fence ala
    atomic_signal_fence, and the explicit release, well, it will
    work. I don't see why it would not work.

    For some reason, I thought you were going to not use an async membar
    in your C++ version. Sorry. However, it still would be fun to test
    against... ;^)

    The C version has both versions.  The C++ version does only the
    async membar version.  But I'm not publishing that code so it's
    a moot point.

    I got side tracked with more heavy math. The problem with C++ code that
    uses an async memory barrier is that it's automatically rendered into a
    non-portable state... Yikes! Imvvvvvho, C/C++ should think about
    including them in some future standard. It would be nice. Well, for us
    at least! ;^)

    That's never going to happen.  DWCAS has been around for more than
    50 years and c++ doesn't support that and probably never will.
    You can't write lock-free queues that are ABA free and
    are performant without that.  So async memory barriers won't
    happen any time soon either.

    Long term I think c++ will fade into irrelevance along with
    all the other programming languages based on an imperfect
    knowledge of concurrency, which is basically all of them
    right now.

    Given most concurrent operating systems are written in these "imperfect"
    languages how does that square with your definition? And how would your
    perfect language run on them?

    Anyway, concurrency is the job of the OS, not the language. C++
    threading is just a wrapper around pthreads on *nix and windows
    threads on Windows. The language just needs an interface to the
    underlying OS functionality, it should not try and implement the
    functionality itself as it'll always be a hack.


    A start would be C++ having an "always lock free" CAS for two contiguous words on systems that support it, yes, even 64 bit. ala:

    struct anchor {
        word a;
        word b;
    };

    Even better:

    struct anchor {
        void* a;
        word b;
    };

    where sizeof(void*) = sizeof(word)... ;^)


    cmpxchg8b for x86, cmpxchg16b for x64, etc...

    https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b



    --- Synchronet 3.20a-Linux NewsLink 1.114