• Re: Zen Microcode

    From mitchalsup@mitchalsup@aol.com (MitchAlsup1) to comp.arch on Fri Mar 7 20:29:26 2025
    From Newsgroup: comp.arch

    A "good try" at encryption is what engineers show management
    in order to claim they know what they are doing {{even when
    they really don't}}.

    I was in the meetings where the AMD architecture team discussed
    this "security issue" and I can name names.
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Mar 9 16:20:10 2025
    From Newsgroup: comp.arch

    On 3/7/2025 2:29 PM, MitchAlsup1 wrote:
    > A "good try" at encryption is what engineers show management
    > in order to claim they know what they are doing {{even when
    > they really don't}}.
    >
    > I was in the meetings where the AMD architecture team discussed
    > this "security issue" and I can name names.


    Not sure about the specifics of this case.


    But, sometimes one can also use encryption mostly as a legal tool (say,
    for anti-tampering).
    Like, if it is just bare data, they can't do as much.
    But, if encryption or similar is involved, they can bring in the full
    force of the law...


    In the latter case, the encryption would often be something like XOR'ing
    with a bit pattern or a Caesar cipher or similar.


    Like, say, lazy man's encryption could be something like:
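    #include <stdint.h>

    //adds key to each 64-bit word; sz is rounded up to a multiple of 8,
    //so both buffers need to be padded out to that size
    void encode(void *dst, void *src, int sz, uint64_t key)
    {
        uint64_t *cs, *ct, *cse;
        cs=src; cse=cs+((sz+7)>>3); ct=dst; //end = start + word count
        while(cs<cse)
            { *ct++=(*cs++)+key; }
    }
    //decode by adding the two's complement negation of the key
    void decode(void *dst, void *src, int sz, uint64_t key)
        { encode(dst, src, sz, (~key)+1); }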

    Where, in this case, the strength (or lack thereof) doesn't really matter.

    If you happen to already know some of the non-encoded data, breaking
    this is trivial (and figuring out 8 bytes is enough to decode the whole
    thing). Only reason to do it 8 bytes at a time (vs 1 byte) is because 8
    bytes is faster.
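
    As a rough sketch of the known-plaintext break described above: one
    encoded 64-bit word and its matching plaintext word give the key by a
    single subtraction. The names add_encode and recover_key are made up
    for illustration, and the sketch assumes the buffer size is a multiple
    of 8 and that the known bytes start on an 8-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same idea as the additive scheme above: add the key to each 64-bit word.
   memcpy is used so the buffers do not need to be 8-byte aligned.
   Assumes sz is a multiple of 8 (any trailing bytes are left untouched). */
static void add_encode(void *dst, const void *src, size_t sz, uint64_t key)
{
    size_t i;
    for (i = 0; i + 8 <= sz; i += 8) {
        uint64_t v;
        memcpy(&v, (const uint8_t *)src + i, 8);
        v += key;
        memcpy((uint8_t *)dst + i, &v, 8);
    }
}

/* Known-plaintext attack: cipher = plain + key, so key = cipher - plain. */
static uint64_t recover_key(const void *cipher_word, const void *plain_word)
{
    uint64_t c, p;
    memcpy(&c, cipher_word, 8);
    memcpy(&p, plain_word, 8);
    return c - p;
}
```

    Once the key is recovered, running add_encode with (~key)+1 decodes
    the rest of the buffer.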

    But, if encoding a known format (say, PE/COFF or WAV or similar), could
    probably crack it very quickly relying on some basic knowledge of the
    file format (eg, where to find magic numbers and blobs of NUL bytes).
    Could potentially break it in under 1000 clock-cycles this way.
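
    One way the NUL-blob idea could look, as a sketch only: under the
    additive scheme a zero word encodes to the key itself, so a run of
    identical encoded words is a strong hint. The function name and the
    "longest run wins" heuristic are assumptions for illustration; a real
    file could contain other repeated fill patterns that would fool it.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan the encoded words once and return the value of the longest run of
   identical consecutive words. If that run came from NUL padding in the
   original file, the returned value is the additive key (0 + key). */
static uint64_t guess_key_from_longest_run(const uint64_t *enc, size_t nwords)
{
    uint64_t best = 0;
    size_t best_len = 0, run = 0, i;

    for (i = 0; i < nwords; i++) {
        /* extend the current run, or start a new one */
        run = (i > 0 && enc[i] == enc[i - 1]) ? run + 1 : 1;
        if (run > best_len) {
            best_len = run;
            best = enc[i];
        }
    }
    return best;
}
```

    The guess can then be checked against a known magic number at the
    start of the file before trusting it.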



    Or, maybe they could make it a little stronger by using PRNG...

    uint64_t permuteKey(uint64_t key)
    {
        uint64_t ckey, cklo, ckhi, cka;
        cklo=((uint32_t)(key>> 0))*0xE20B7AC6ULL; //*1
        ckhi=((uint32_t)(key>>32))*0xE20B7AC6ULL;
        cka=(ckhi>>32)|((cklo>>32)<<32);
        ckey=key+cka;
        return(ckey);
    }
    *1: Use cases that can be turned into a (faster) 32-bit widening
    multiply. Where, full 64-bit multiply is unreasonably slow. In this
    case, the multiplies serve to mix the bits around somewhat.

    void encode(void *dst, void *src, int sz,
        uint64_t key1, uint64_t key2)
    {
        uint64_t ckey, cka, ckb, ckc, ckstep, v;
        uint64_t *cs, *ct, *cse;
        int n;

        cs=src; cse=cs+((sz+7)>>3); ct=dst; //end = start + word count

        //setup cost, likely expensive, probably unavoidable
        cka=key1; ckb=key2; ckc=key1^key2;
        ckey=((uint32_t)ckc)*0xE20B7AC6ULL;
        n=(ckey>>32)&63;
        while(n--)
            cka=permuteKey(cka);
        n=(ckey>>38)&63;
        while(n--)
            ckb=permuteKey(ckb);
        n=(ckey>>44)&15;
        while(n--)
            ckc=permuteKey(ckc);

        ckey=cka+ckb; n=64;
        ckstep=ckey+ckc;
        ckey=permuteKey(ckey); //(strength boost)
        ckstep=permuteKey(ckstep); //?

        while(cs<cse)
        {
            v=(*cs++);
            n--;
            *ct++=v^ckey;
            ckey+=ckstep; //weak, but cheap-ish...
            ckstep=(ckstep<<1)^(ckstep>>27); //? (strength boost)

            //permute key, stronger but slow
            if(!n)
            {
                cka=permuteKey(cka);
                ckb=permuteKey(ckb);
                ckc=permuteKey(ckc);
                ckey=cka+ckb;
                ckstep=ckey+ckc;
                ckey=permuteKey(ckey); //? (strength boost)
                ckstep=permuteKey(ckstep); //?
                n=64; //so only do it rarely
            }
        }
    }

    Where, it would no longer be sufficient to know N bytes of payload data
    to break it. Whether it would be acceptably cheap/fast is unknown.

    To try to limit computational cost, only permute keys once every 512
    bytes or so (64 words of 8 bytes each; though, it would still be fairly
    weak within each 512-byte block; but doing this too often could
    negatively affect data throughput).

    Could be made faster (say, by working 32 bytes at a time), but would
    probably get a bit too bulky for use as an example here (but, I suspect
    it could be possible to get it within around 80% of memcpy speed with
    some creative unrolling).

    Switched to XOR in the example (as the final data-facing step), which
    avoids needing a separate decoder function.
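
    A small sketch of why XOR removes the need for a separate decoder:
    applying the same keystream twice gives back the original, since
    (v^k)^k == v. The xor_stream function below is a hypothetical,
    cut-down stand-in for the inner loop (no key permutation at all), not
    the full encoder from the example.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR each word with a running key. Because XOR is its own inverse,
   calling this again with the same starting ckey/ckstep undoes it.
   dst and src may be the same buffer. */
static void xor_stream(uint64_t *dst, const uint64_t *src, size_t nwords,
                       uint64_t ckey, uint64_t ckstep)
{
    size_t i;
    for (i = 0; i < nwords; i++) {
        dst[i] = src[i] ^ ckey;
        ckey += ckstep; /* same cheap key advance as in the example */
    }
}
```

    With the earlier additive version, by contrast, decoding needed the
    negated key; here encode and decode are the same call.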


    Or, a possible faster/cheaper intermediate option being to not
    re-permute mid-stream.

    Though, if one had a chunk of known data (*2), it could be possible to
    work out the step values (using the power of integer subtract), and
    break the rest. So, probably not sufficient... (Maybe passable if this
    strategy would only break a small chunk of data).

    *2: Say, magic numbers or known locations where one is likely to find
    blobs of NUL bytes or similar given the file format.
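
    To make the subtraction attack concrete, here is a sketch against the
    no-re-permute variant. Two consecutive known plaintext words expose two
    consecutive key values; their difference is the current step, and since
    the step update is a fixed public function, everything after that point
    decodes. stream_xor, next_step and break_from_known_pair are
    illustrative names, and stream_xor is an assumed simplification of the
    inner loop with the every-64-words re-permute left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-word step update, as in the example's inner loop. */
static uint64_t next_step(uint64_t ckstep)
{
    return (ckstep << 1) ^ (ckstep >> 27);
}

/* Inner loop only: XOR with ckey, advance ckey by ckstep, update ckstep. */
static void stream_xor(uint64_t *dst, const uint64_t *src, size_t nwords,
                       uint64_t ckey, uint64_t ckstep)
{
    size_t i;
    for (i = 0; i < nwords; i++) {
        dst[i] = src[i] ^ ckey;
        ckey += ckstep;
        ckstep = next_step(ckstep);
    }
}

/* Given known plaintext words p0, p1 at positions at and at+1
   (requires at+1 < nwords), decode enc[at..nwords-1] into out. */
static void break_from_known_pair(uint64_t *out, const uint64_t *enc,
                                  size_t nwords, size_t at,
                                  uint64_t p0, uint64_t p1)
{
    uint64_t k0 = enc[at] ^ p0;      /* key used for word at      */
    uint64_t k1 = enc[at + 1] ^ p1;  /* key used for word at+1    */
    uint64_t ckstep = k1 - k0;       /* step added between them   */

    /* Resume the keystream from the captured state. */
    stream_xor(out + at, enc + at, nwords - at, k0, ckstep);
}
```

    This only recovers data forward from the known pair. With the
    mid-stream re-permute in place, the captured state would stop being
    valid at the next 64-word boundary, which is the "small chunk" case.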


    Say, it probably at least needs to look like it would be hard to break,
    and not something where someone can look at it and figure out that the
    key could be broken by subtracting pairs of values and then effectively
    having captured the key-state for the whole message...

    While, ideally, also not adding too much computational overhead.

    Though, not sure where exactly would be the lower bar here (probably
    needs to at least appear like it would work).

    ...


  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Thu Apr 3 07:47:42 2025
    From Newsgroup: comp.arch

    On Sun, 9 Mar 2025 16:20:10 -0500, BGB wrote:

    > In the latter case, the encryption would often be something like XOR'ing
    > with a bit pattern or a Caesar cipher or similar.

    XOR is perfectly fine as an encryption technique, provided that the
    sequence being XORed with is sufficiently strongly pseudorandom.

    This is known as a “stream” cipher. Basically, any “block” cipher can be
    turned into a stream cipher by using it to generate the XOR sequence.
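
    A minimal sketch of that construction, assuming counter (CTR) mode as
    the way of generating the XOR sequence and XTEA as the block cipher
    (picked here only because it fits in a few lines; neither choice comes
    from the post). The function names and the 32-bit nonce / 32-bit
    counter layout are illustrative assumptions, and this is not vetted
    for real use.

```c
#include <stddef.h>
#include <stdint.h>

/* XTEA block encryption: 64-bit block, 128-bit key, 32 cycles. */
static void xtea_encrypt_block(uint32_t v[2], const uint32_t key[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    const uint32_t delta = 0x9E3779B9u;
    int i;

    for (i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* CTR mode: encrypt (nonce, counter) to get 8 keystream bytes, XOR them
   into the data, then bump the counter. The same call encrypts and
   decrypts. A (key, nonce) pair must never be reused for two messages. */
static void ctr_xor(uint8_t *dst, const uint8_t *src, size_t len,
                    const uint32_t key[4], uint32_t nonce)
{
    uint32_t counter = 0;
    size_t i = 0;

    while (i < len) {
        uint32_t block[2];
        int b;

        block[0] = nonce;
        block[1] = counter++;
        xtea_encrypt_block(block, key);

        /* consume the keystream bytes, low byte of each word first */
        for (b = 0; b < 8 && i < len; b++, i++) {
            uint8_t ks = (uint8_t)(block[b >> 2] >> (8 * (b & 3)));
            dst[i] = src[i] ^ ks;
        }
    }
}
```

    Only the block cipher's encrypt direction is needed, and the length
    does not have to be a multiple of the block size.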