• Re: PSA: Clipboard differences between Chromium & Firefox across platforms

    From Maria Sophia@mariasophia@comprehension.com to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Sun Feb 15 21:01:50 2026
    From Newsgroup: alt.os.linux

    Thanks go to Lawrence for making me find a clipboard analyzer on Windows.
    To my knowledge, in decades on these ngs, nobody has ever discussed them.

    This is the one I used:
    InsideClipboard v1.30
    Web site: https://www.nirsoft.net

    For those in this thread who are not on Windows, there are certain Windows-specific oddities that may need to be explained in that output.

    The "format" field is a numeric identifier that Windows uses internally to label a clipboard format.
    Format ID 1 = CF_TEXT
    Format ID 7 = CF_OEMTEXT
    Format ID 13 = CF_UNICODETEXT
    Format ID 16 = CF_LOCALE
    These are built-in Windows formats.
    They exist on every Windows system.

    Yet the important ones in this test were:
    Format ID 49426 = HTML Format
    Format ID 49661 = Chromium internal source RFH token
    Format ID 49683 = Chromium internal source URL

    For each of those, Chromium told Windows (taking one as an example):
    "I want to register a clipboard format named HTML Format"
    and Windows assigned it ID 49426.

    Windows doesn't care what it's called.
    Windows just assigns it an available number.

    Why is the second set of numbers so big?
    Because Windows built-in formats have small fixed numbers:
    1, 7, 13, 16, etc.
    While application-registered formats get numbers from a reserved
    high range (0xC000 through 0xFFFF, i.e. 49152 to 65535):
    49426, 49661, 49683.
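
    A minimal sketch of that numbering split, in Python. It assumes the
    documented Windows ranges: the predefined CF_* constants are small fixed
    numbers, while formats registered by name at run time get IDs from
    0xC000 through 0xFFFF. The function name classify_format_id is made up
    for illustration, and the three registered IDs from this test are only
    meaningful for the Windows session in which they were captured.

```python
# Predefined Windows clipboard formats seen in the InsideClipboard output.
PREDEFINED = {
    1: "CF_TEXT",
    7: "CF_OEMTEXT",
    13: "CF_UNICODETEXT",
    16: "CF_LOCALE",
}

# IDs handed out for formats registered by name fall in this range.
REGISTERED_FIRST = 0xC000  # 49152
REGISTERED_LAST = 0xFFFF   # 65535


def classify_format_id(format_id: int) -> str:
    """Label a clipboard format ID as predefined, registered, or other."""
    if format_id in PREDEFINED:
        return PREDEFINED[format_id]
    if REGISTERED_FIRST <= format_id <= REGISTERED_LAST:
        return "registered by an application"
    return "other (not covered by this sketch)"


if __name__ == "__main__":
    for fid in (1, 7, 13, 16, 49426, 49661, 49683):
        print(fid, hex(fid), classify_format_id(fid))
```

    All three Chromium IDs land in the registered range, which is why they
    look so much bigger than the built-in ones.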

    These indicate that Chromium placed multiple formats on the clipboard:
    HTML Format
    Chromium internal source RFH token
    Chromium internal source URL

    But when your Notepad++ macro rewrote the clipboard, all of those formats disappeared.
    A. This is why Ctrl+A started working again.
    B. The shortcuts.xml CTRL+B macro removed the HTML Fragment land mine.

    This proves the PSA (at least the problematic Chromium portion of the PSA).
    And it proves the single-step solution (on Windows).

    Voila!
    I love one-step automation!
    --
    If we think we understand something, we don't. But if we think we don't
    quite yet understand something, then we're just beginning to understand it.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Sun Feb 15 21:29:15 2026
    From Newsgroup: alt.os.linux

    Maria Sophia wrote:
    Each section of your output helps confirm the pattern I was trying to describe in the PSA and your suggestion for me to do the same resonated.

    Lawrence and Paul and Carlos were way ahead of me all along, but,
    I think I finally figured it out, where I'm usually the last to know. :)

    This reminds me, in a small way, of how I felt when reading Einstein's 1916 book (later revised in the early 1920's, which lost copyright 100 years
    later) in that every revelation reveals a new mystery to resolve next.

    Keeping in mind the whole thing started when I pasted Chromium text into Notepad++ which caused Control+A to die, this is a short explanation.

    Windows assigns every clipboard format a numeric ID, e.g., 1, 7, 13, 16.
    CF_TEXT is the old ANSI text format.
    CF_OEMTEXT is the old OEM codepage text format.
    CF_UNICODETEXT is the modern Unicode text format.
    CF_LOCALE tells Windows what language or locale the text came from
    etc.

    While NirSoft InsideClipboard shows those IDs, it turns out that Chromium registers its own formats by name, so Windows assigns those names whatever available (usually large) numbers it has, such as 49426 or 49683.
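
    For anyone who wants the same ID-and-name listing without a GUI tool,
    here is a rough Python sketch using ctypes and the Win32 calls
    OpenClipboard, EnumClipboardFormats, GetClipboardFormatNameW, and
    CloseClipboard. It is Windows only, the function name
    list_clipboard_formats is hypothetical, and it is not how
    InsideClipboard itself is implemented (that is not known here).

```python
import ctypes
import sys


def list_clipboard_formats():
    """Return (id, name) pairs for every format currently on the clipboard.

    Windows only. Predefined formats such as CF_UNICODETEXT have no
    registered name, so their name comes back as an empty string.
    """
    if sys.platform != "win32":
        raise RuntimeError("this sketch only works on Windows")

    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.OpenClipboard.argtypes = [wintypes.HWND]
    user32.OpenClipboard.restype = wintypes.BOOL
    user32.CloseClipboard.argtypes = []
    user32.CloseClipboard.restype = wintypes.BOOL
    user32.EnumClipboardFormats.argtypes = [wintypes.UINT]
    user32.EnumClipboardFormats.restype = wintypes.UINT
    user32.GetClipboardFormatNameW.argtypes = [
        wintypes.UINT, wintypes.LPWSTR, ctypes.c_int]
    user32.GetClipboardFormatNameW.restype = ctypes.c_int

    if not user32.OpenClipboard(None):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        formats = []
        # Passing 0 starts the enumeration; a return of 0 ends it.
        format_id = user32.EnumClipboardFormats(0)
        while format_id:
            buf = ctypes.create_unicode_buffer(256)
            length = user32.GetClipboardFormatNameW(format_id, buf, len(buf))
            formats.append((format_id, buf.value if length else ""))
            format_id = user32.EnumClipboardFormats(format_id)
        return formats
    finally:
        user32.CloseClipboard()


if __name__ == "__main__":
    for format_id, name in list_clipboard_formats():
        print(format_id, name or "(predefined format)")
```

    Run it right after a Ctrl+C in the browser, before anything else
    touches the clipboard.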

    Notepad++ does not use most of those CF clipboard formats directly.
    Notepad++ almost always just asks Windows only for CF_UNICODETEXT.

    The important detail is that Windows uses the HTML Format entry to generate
    the plain text that Notepad++ receives. That conversion step is where the invisible CTRL+A land mine comes from.

    That means the presence of HTML Format changes how the plain
    text is produced, even though Notepad++ never reads the HTML itself.

    When the control+B shortcuts.xml macro rewrites the clipboard, it removes
    HTML Format and all Chromium internal formats.

    With only plain text formats left, Windows no longer has to convert from
    HTML, so the plain text becomes clean and Ctrl+A works again.

    But what exactly is causing Control+A to stop working in Notepad++?
    The reason Ctrl+A dies is not that HTML is pasted into the file.

    The problem actually happens earlier, inside Windows, when Windows converts
    the HTML Format entry into CF_UNICODETEXT for Notepad++.

    When Chromium puts HTML Format on the clipboard, Windows must run its HTML-to-text converter. That converter uses the StartHTML, EndHTML, StartFragment, and EndFragment offsets inside the HTML Fragment block.

    If those offsets are wrong, or if the HTML fragment is malformed, the
    converter can produce a CF_UNICODETEXT stream with hidden control
    characters, mismatched boundaries, or an unexpected buffer length.
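
    For reference, the four offsets live in a short text header at the
    start of the HTML Format data, written as zero-padded decimal byte
    positions (per Microsoft's HTML Clipboard Format description). A
    minimal Python sketch that builds such a payload and checks that the
    offsets are consistent. The helper names are hypothetical and the
    sample is hand-made, not captured from Chromium. It only checks the
    header; it says nothing about what Windows or Scintilla does with it.

```python
import re

HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{:010d}\r\n"
    "EndHTML:{:010d}\r\n"
    "StartFragment:{:010d}\r\n"
    "EndFragment:{:010d}\r\n"
)
PREFIX = b"<html><body>\r\n<!--StartFragment-->"
SUFFIX = b"<!--EndFragment-->\r\n</body></html>"
KEYS = ("StartHTML", "EndHTML", "StartFragment", "EndFragment")


def build_cf_html(fragment: str) -> bytes:
    """Wrap an HTML fragment in a header with matching byte offsets."""
    body = fragment.encode("utf-8")
    # The header has a fixed length because every number is 10 digits wide.
    start_html = len(HEADER_TEMPLATE.format(0, 0, 0, 0).encode("utf-8"))
    start_fragment = start_html + len(PREFIX)
    end_fragment = start_fragment + len(body)
    end_html = end_fragment + len(SUFFIX)
    header = HEADER_TEMPLATE.format(
        start_html, end_html, start_fragment, end_fragment)
    return header.encode("utf-8") + PREFIX + body + SUFFIX


def read_offsets(payload: bytes) -> dict:
    """Pull the four offsets out of the header as integers."""
    offsets = {}
    for key in KEYS:
        match = re.search(
            rb"^" + key.encode("ascii") + rb":(-?\d+)", payload, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing {key} in header")
        offsets[key] = int(match.group(1))
    return offsets


def offsets_are_consistent(payload: bytes) -> bool:
    """True when the offsets are in order and inside the payload."""
    o = read_offsets(payload)
    return (0 <= o["StartHTML"] <= o["StartFragment"]
            <= o["EndFragment"] <= o["EndHTML"] <= len(payload))


def extract_fragment(payload: bytes) -> str:
    """Return the bytes between StartFragment and EndFragment as text."""
    o = read_offsets(payload)
    return payload[o["StartFragment"]:o["EndFragment"]].decode("utf-8")
```

    Saving the raw HTML Format bytes from a clipboard viewer and feeding
    them to offsets_are_consistent would be one way to test the "wrong
    offsets" part of the theory instead of guessing.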

    Notepad++ receives that CF_UNICODETEXT stream and loads it into its
    internal Scintilla buffer. If the buffer contains an unexpected control sequence or a broken length field, Scintilla can fail to compute the
    full document range.

    Bingo!

    When that happens, Ctrl+A does not select the whole buffer because
    Scintilla thinks the document ends earlier than it actually does.

    The Control+B macro fixes the issue because it wipes the clipboard and
    replaces it with plain text only (among other things that it does).

    With no HTML Format present, Windows does not run the HTML-to-text
    converter again, so the CF_UNICODETEXT stream is finally clean
    and Scintilla can compute the correct document length.

    Once the buffer is clean, Ctrl+A works again.
    Whew!

    Given this took me hours to debug & resolve, the whole point of this PSA is
    to help the next person not have to do all the work that I just had to do!
    --
    "Everything should be made as simple as possible, but not simpler."
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Sun Feb 15 21:58:46 2026
    From Newsgroup: alt.os.linux

    Lawrence D’Oliveiro wrote:
    On Sun, 15 Feb 2026 17:13:16 -0500, Paul wrote:

    If we're to have clipboard management, there should be
    a way to tell exactly what the source program offered
    and no more.

    That is exactly how it works.

    Speaking of "how it works", I'm not sure if it's a bug, or not, but what do
    you think of this description I created below of how (I think) it works?

    All that was left after fixing the issue was understanding what actually
    went wrong in the first place (which killed the control+A in Notepad++).

    It turns out that Windows does not convert the HTML Format entry into text until an application explicitly asks for a text format.

    So the corruption happens at the moment Notepad++ requests CF_UNICODETEXT.

    The sequence (as far as I can reconstruct it) is...

    1. With Ctrl+C, Chromium places several formats on the clipboard,
    including HTML Format, CF_UNICODETEXT, and its internal metadata.

    2. With Ctrl+V, Notepad++ asks Windows:
    "Give me CF_UNICODETEXT."

    3. Windows sees that HTML Format is available and may choose to generate
    the CF_UNICODETEXT stream by converting the HTML fragment.

    Kaboom!

    4. That conversion step can produce a corrupted CF_UNICODETEXT stream.
    The corruption is not visible text. Which is why I couldn't "see" it.
    It is a bad length field or a hidden control character (apparently).

    Is that a bug?
    I don't know.

    5. Scintilla loads that corrupted stream into its internal buffer.
    But the buffer boundaries are now wrong, so Ctrl+A fails because
    Scintilla thinks the document ends earlier than it actually does.

    So why didn't I see it in the Notepad++ hex editor?

    The HTML is never pasted into the file, so it can't be seen.
    But it affects the text Windows hands to Notepad++ at paste time.
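
    One way to look for invisible characters directly, rather than by eye,
    is to scan the pasted text for Unicode control and format characters.
    This Python sketch is only an assumption-laden aid: the function name
    find_hidden_characters is made up, and whether the paste really
    contains such characters is still the unproven part of the theory.

```python
import unicodedata

# Whitespace that a text editor is expected to handle normally.
EXPECTED = {"\t", "\n", "\r"}


def find_hidden_characters(text: str):
    """Yield (index, code point, Unicode name) for invisible characters.

    Flags the Unicode categories Cc (control) and Cf (format), which cover
    things like NUL, zero-width spaces, and byte-order marks.
    """
    for index, char in enumerate(text):
        if char in EXPECTED:
            continue
        if unicodedata.category(char) in ("Cc", "Cf"):
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(char, "<unnamed control>")
            yield index, f"U+{ord(char):04X}", name


if __name__ == "__main__":
    import sys

    # Usage: python find_hidden.py pasted.txt  (file saved as UTF-8)
    with open(sys.argv[1], encoding="utf-8", newline="") as handle:
        for hit in find_hidden_characters(handle.read()):
            print(*hit)
```

    If this prints nothing for a file saved right after the bad paste, the
    hidden-character explanation becomes less likely.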

    Well then, why does adding and deleting a character fix it?

    Because the corruption lives only in Scintilla's internal buffer
    structures, not in the visible text. When the macro inserts a space,
    Scintilla is forced to rebuild its entire buffer. That rebuild wipes out
    the corrupted boundary. Removing the space forces a second rebuild,
    which simply restores the original content. The second rebuild is not
    needed for the fix; it is only needed to undo the temporary change.

    After that, the macro selects all and cuts the text. Cutting forces
    Windows to create a brand new clipboard entry. This new clipboard entry contains only plain text formats, because Scintilla does not generate
    HTML Format or any Chromium internal formats.

    I don't know if this is a bug or not, as all I know, in the end, is...
    1. The corrupted CF_UNICODETEXT stream from the original paste is gone.
    2. The clipboard now contains only clean plain text.
    3. Scintilla now has a clean buffer with correct boundaries.
    4. Ctrl+A works again.

    Woo hoo!

    So the fix works because Windows created the problem when converting the
    HTML fragment into text, which corrupted Scintilla's internal buffer.
    Adding and removing a character forces Scintilla to rebuild its buffer,
    and cutting the text forces Windows to rebuild the clipboard without
    HTML Format. The corruption cannot survive those two rebuilds.

    I think we explained it as simply as we could, but not simpler.
    --
    How wonderful that we have met with a paradox.
    Now we have some hope of making progress.
  • From Lawrence D’Oliveiro@ldo@nz.invalid to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Mon Feb 16 03:11:50 2026
    From Newsgroup: alt.os.linux

    On Sun, 15 Feb 2026 21:58:46 -0500, Maria Sophia wrote:

    Is that a bug?

    Only happens under Windows?

    Probably.
  • From vallor@vallor@vallor.earth to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Mon Feb 16 03:15:35 2026
    From Newsgroup: alt.os.linux

    At Sun, 15 Feb 2026 21:01:50 -0500, Maria Sophia <mariasophia@comprehension.com> wrote:

    Thanks go to Lawrence for making me find a clipboard analyzer on Windows.
    To my knowledge, in decades on these ngs, nobody has ever discussed them.

    This is the one I used:
    InsideClipboard v1.30
    Web site: https://www.nirsoft.net

    For those in this thread who are not on Windows, there are certain Windows-specific oddities that may need to be explained in that output.

    The "format" field is a numeric identifier that Windows uses internally to label a clipboard format.
    Format ID 1 = CF_TEXT
    Format ID 7 = CF_OEMTEXT
    Format ID 13 = CF_UNICODETEXT
    Format ID 16 = CF_LOCALE
    These are built in Windows formats.
    They exist on every Windows system.

    Yet the important ones in this test were:
    Format ID 49426 = HTML Format
    Format ID 49661 = Chromium internal source RFH token
    Format ID 49683 = Chromium internal source URL

    For each of those, Chromium told Windows (taking one as an example):
    "I want to register a clipboard format named HTML Format"
    and Windows assigned it ID 49426.

    Windows doesn't care what it's called.
    Windows just assigns it an available number.

    Why are the second set of numbers so big?
    Because Windows built-in formats use up all the small numbers:
    1, 7, 13, 16, etc.
    While application-registered formats use available large numbers:
    49426, 49661, 49683.

    These indicate that Chromium placed multiple formats on the clipboard
    HTML Format
    Chromium internal source RFH token
    Chromium internal source URL


    Did this unholy horrorshow of a plan come from Cutler and
    VMS?

    fu2: alt.os.linux
    --
    -v System76 Thelio Mega v1.1 x86_64 Mem: 258G
    OS: Linux 6.18.10 D: Mint 22.3 DE: Xfce 4.18 (X11)
    NVIDIA GeForce RTX 3090Ti (24G) (580.105.08)
    "You have two choices for dinner: Take it or Leave it."
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Sun Feb 15 22:30:32 2026
    From Newsgroup: alt.os.linux

    Carlos E. R. wrote:
    Using the same selection as in my previous test.
    Firefox:

    cer@Laicolasse:~> xclip -selection clipboard -t TARGETS -o
    TIMESTAMP
    TARGETS
    MULTIPLE
    SAVE_TARGETS
    text/html
    text/_moz_htmlcontext
    text/_moz_htmlinfo
    UTF8_STRING
    COMPOUND_TEXT
    TEXT
    STRING
    text/plain;charset=utf-8
    text/plain
    text/x-moz-url-priv
    cer@Laicolasse:~>

    Chrome

    cer@Laicolasse:~> xclip -selection clipboard -t TARGETS -o
    TIMESTAMP
    TARGETS
    SAVE_TARGETS
    MULTIPLE
    STRING
    TEXT
    UTF8_STRING
    text/plain;charset=utf-8
    text/plain
    text/html
    chromium/x-internal-source-rfh-token
    chromium/x-source-url
    cer@Laicolasse:~>

    Hi Carlos,

    (I'm going to remove the mac users fu because they're only trolling us.)

    Woo hoo!

    That was WONDERFUL. I love learning how operating systems, um, operate!

    Thank you for investing time and energy into the test.
    a. Lawrence proved the PSA using wl-paste in Wayland on Linux
    b. You proved the PSA using xclip in X11 on Linux
    c. I proved the PSA using InsideClipboard on Windows
    d. The Apple users are likely way out of their element
    (In hindsight, I should not have included them as they only trolled.)

    We are all reproducing the PSA exactly as expected.
    Working together as a team, we are making progress here for everyone.

    Your X11 results line up exactly with what Lawrence demonstrated on Wayland
    and what I captured on Windows with InsideClipboard.

    Three platforms, three toolchains, and the same pattern every time.

    Firefox exposes text/plain, UTF8_STRING, text/html, and its own Mozilla metadata formats. Chromium exposes text/plain, UTF8_STRING, text/html, and
    its Chromium-specific metadata formats. The names differ across platforms,
    but the structure is identical.
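
    The comparison can be made mechanical. This Python sketch takes the
    two xclip TARGETS dumps quoted above, drops the selection housekeeping
    entries, and splits the rest into shared and browser-specific targets.
    The helper names are hypothetical; read_targets simply wraps the same
    xclip command Carlos ran and needs X11 with xclip installed.

```python
import subprocess

# Entries that describe the selection mechanism, not the copied data.
HOUSEKEEPING = {"TIMESTAMP", "TARGETS", "MULTIPLE", "SAVE_TARGETS"}

FIREFOX_DUMP = """TIMESTAMP
TARGETS
MULTIPLE
SAVE_TARGETS
text/html
text/_moz_htmlcontext
text/_moz_htmlinfo
UTF8_STRING
COMPOUND_TEXT
TEXT
STRING
text/plain;charset=utf-8
text/plain
text/x-moz-url-priv
"""

CHROME_DUMP = """TIMESTAMP
TARGETS
SAVE_TARGETS
MULTIPLE
STRING
TEXT
UTF8_STRING
text/plain;charset=utf-8
text/plain
text/html
chromium/x-internal-source-rfh-token
chromium/x-source-url
"""


def read_targets() -> str:
    """Run the xclip command from the dumps above and return its output."""
    result = subprocess.run(
        ["xclip", "-selection", "clipboard", "-t", "TARGETS", "-o"],
        check=True, capture_output=True, text=True)
    return result.stdout


def parse_targets(dump: str) -> set:
    """Turn a TARGETS dump into a set of data targets."""
    names = {line.strip() for line in dump.splitlines() if line.strip()}
    return names - HOUSEKEEPING


def compare(first: str, second: str) -> dict:
    """Split two dumps into shared and one-sided targets."""
    a, b = parse_targets(first), parse_targets(second)
    return {"shared": a & b, "first_only": a - b, "second_only": b - a}


if __name__ == "__main__":
    for label, names in compare(FIREFOX_DUMP, CHROME_DUMP).items():
        print(label, sorted(names))
```

    With these two dumps, the browser-specific leftovers are the Mozilla
    and Chromium metadata targets plus Firefox's COMPOUND_TEXT.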

    Your X11 TARGETS dump confirms that the clipboard behavior is consistent
    across Wayland, X11, and Windows. The only thing that changes is how the platform expresses the formats:
    a. MIME types on X11/Wayland
    b. CF_* formats on Windows
    c. Who knows how macOS does it (least of all the mac users!)
    The underlying reality is the same on all the tested platforms.

    So yes, you have reproduced Lawrence's Wayland results and my Windows
    results exactly. This closes the loop and confirms the PSA across all three environments.

    Let me attempt to break down what your X11 output shows, AFAICT,
    in the same way we analyzed Lawrence's Wayland data and my Windows data.

    FIREFOX ON X11
    --------------
    text/html
    text/_moz_htmlcontext
    text/_moz_htmlinfo
    text/plain;charset=utf-8
    UTF8_STRING
    COMPOUND_TEXT
    TEXT
    STRING
    text/plain
    text/x-moz-url-priv

    This matches Lawrence's Wayland results line for line:

    Firefox always exports text/html.
    Firefox always exports its own Mozilla-specific metadata formats:
    text/_moz_htmlcontext
    text/_moz_htmlinfo
    text/x-moz-url-priv
    Firefox always exports multiple plain-text formats:
    UTF8_STRING, TEXT, STRING, text/plain, etc.

    This is exactly what we saw on Wayland, and it is exactly what I saw on
    Windows (just expressed as CF_* formats instead of MIME types).

    CHROMIUM ON X11
    ---------------
    text/html
    chromium/x-internal-source-rfh-token
    chromium/x-source-url
    text/plain;charset=utf-8
    text/plain
    UTF8_STRING
    TEXT
    STRING

    This is the X11 equivalent of what both Lawrence and I observed:

    Chromium always exports text/html.
    Chromium always exports its own internal metadata formats:
    chromium/x-internal-source-rfh-token
    chromium/x-source-url
    Chromium always exports multiple plain-text formats.

    On Windows, these appear as:
    HTML Format
    Chromium internal source RFH token
    Chromium internal source URL
    CF_TEXT
    CF_OEMTEXT
    CF_UNICODETEXT
    CF_LOCALE

    On Wayland, Lawrence saw:
    text/html
    chromium/x-source-url
    chromium/x-internal-source-rfh-token
    text/plain;charset=utf-8
    UTF8_STRING
    TEXT
    STRING

    Your X11 results match both sets exactly.

    THE CROSS-PLATFORM PATTERN
    --------------------------

    Across all three environments:

    Firefox exports HTML plus Mozilla metadata plus plain text.
    Chromium exports HTML plus Chromium metadata plus plain text.
    The names differ (MIME types on X11/Wayland, CF_* formats on Windows),
    but the structure is identical.
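
    To make "same structure, different names" concrete, here is a small
    Python sketch that maps each platform's format names onto three
    buckets (HTML, plain text, browser metadata) and compares the result.
    The bucket rules are a simplification chosen for this thread's data,
    not an official mapping, and the names category and structure are
    hypothetical.

```python
HTML_NAMES = {"text/html", "HTML Format"}
PLAIN_TEXT_NAMES = {
    "text/plain", "text/plain;charset=utf-8", "UTF8_STRING", "TEXT",
    "STRING", "COMPOUND_TEXT", "CF_TEXT", "CF_OEMTEXT", "CF_UNICODETEXT",
}
METADATA_PREFIXES = (
    "chromium/", "Chromium internal", "text/_moz_", "text/x-moz-")

# Format lists as reported in this thread.
CHROMIUM_WINDOWS = [
    "HTML Format", "Chromium internal source RFH token",
    "Chromium internal source URL", "CF_TEXT", "CF_OEMTEXT",
    "CF_UNICODETEXT", "CF_LOCALE",
]
CHROMIUM_WAYLAND = [
    "text/html", "chromium/x-source-url",
    "chromium/x-internal-source-rfh-token", "text/plain;charset=utf-8",
    "UTF8_STRING", "TEXT", "STRING",
]
CHROMIUM_X11 = [
    "text/html", "chromium/x-internal-source-rfh-token",
    "chromium/x-source-url", "text/plain;charset=utf-8", "text/plain",
    "UTF8_STRING", "TEXT", "STRING",
]
FIREFOX_X11 = [
    "text/html", "text/_moz_htmlcontext", "text/_moz_htmlinfo",
    "text/plain;charset=utf-8", "UTF8_STRING", "COMPOUND_TEXT", "TEXT",
    "STRING", "text/plain", "text/x-moz-url-priv",
]


def category(name: str) -> str:
    """Put one format name into a coarse bucket."""
    if name in HTML_NAMES:
        return "html"
    if name in PLAIN_TEXT_NAMES:
        return "plain text"
    if name.startswith(METADATA_PREFIXES):
        return "browser metadata"
    return "other"  # e.g. CF_LOCALE, which has no MIME-style counterpart here


def structure(names) -> set:
    """Return the set of buckets a clipboard offer covers."""
    return {category(name) for name in names} - {"other"}


if __name__ == "__main__":
    for label, names in [("Chromium/Windows", CHROMIUM_WINDOWS),
                         ("Chromium/Wayland", CHROMIUM_WAYLAND),
                         ("Chromium/X11", CHROMIUM_X11),
                         ("Firefox/X11", FIREFOX_X11)]:
        print(label, sorted(structure(names)))
```

    All four lists reduce to the same three buckets, which is the pattern
    described above.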

    This is the key point: Chromium always includes HTML Format and its
    metadata, even when the selection looks like plain text.

    That is the same behavior that triggered the Windows HTML-to-text
    conversion bug that corrupted Scintilla's buffer and killed Ctrl+A.

    CONFIRMATION OF THE PSA
    ------------------------
    Lawrence confirmed the behavior on Wayland.
    I confirmed the behavior on Windows.
    You have now confirmed the behavior on X11.

    Three platforms, three toolchains, one consistent truth:

    Chromium always places HTML Format and internal metadata on the clipboard,
    and that HTML fragment can influence how downstream applications interpret
    the plain text.

    This is exactly the Chromium side of the PSA, now verified independently on
    all major desktop environments.

    Beautiful work, Carlos. Lawrence. Paul. Andy. And others.
    All of us provided facts that had to be fit into Occam's Razor.

    I'm not sure if this is a bug, but at least we can begin to understand it.
    --
    My brain is wired for Occam's Razor, but to be always 100% logically
    correct, I need to have as many facts as I can to fit into the picture.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Sun Feb 15 22:44:41 2026
    From Newsgroup: alt.os.linux

    Lawrence D’Oliveiro wrote:
    On Sun, 15 Feb 2026 21:58:46 -0500, Maria Sophia wrote:

    Is that a bug?

    Only happens under Windows?

    Probably.

    Hi Lawrence,

    That's a good point because we got so wrapped up in proving that the
    underlying mechanisms were the same across the various platforms (although
    the mac users only trolled us so I'm gonna remove them in the fup line)
    that we forgot that there is a real "issue" with the Control+A breaking.

    It does seem to be a Windows corruption of Scintilla's internal indexing
    tables when Windows hands Chromium data to Scintilla.
    a. line boundaries
    b. document length
    c. byte offsets
    d. character offsets

    But why doesn't it happen with Firefox?

    The more confused I am, the more I learn, so I think it all comes down to
    how each browser constructs its clipboard data differently.

    Which, after all, is the SUBJECT line of this thread, as it's all about
    Firefox versus Chromium, which are on "all common consumer platforms". :)

    Back to Firefox, my tentative assessment is Chromium always puts a full
    HTML Fragment block on the clipboard, complete with StartHTML, EndHTML, StartFragment, EndFragment offsets, and its own chromium/x-* metadata.

    Yet Firefox does not.

    Unfortunately, Windows uses that HTML Fragment to generate CF_UNICODETEXT.

    If the HTML fragment or offsets are malformed, Windows produces a
    corrupted CF_UNICODETEXT stream. That corrupted stream breaks Scintilla.

    I can only assume Firefox's HTML fragment is simpler and hence Firefox's
    HTML fragment does not trigger the buggy Windows conversion path.

    What do you think of that hypothesis of why only Chromium, not Firefox?
    --
    This is getting more and more like Heisenberg's uncertainty principle.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux on Sun Feb 15 22:53:44 2026
    From Newsgroup: alt.os.linux

    Carlos E. R. wrote:
    Regarding ONLY your comment:
    "I already tested and found no bug."
    I don't think anyone suggested there was a bug, where we're all simply
    openly and honestly discussing a, oh, shall we say, "quirk" in the system.

    The problem you have with ctrl-A in Notepad++ after pasting html from Chrome.

    Hi Carlos,

    I'm gonna remove the mac folks in the fup because they've not participated
    save to troll us, but after discussing this with Lawrence, and using
    Occam's Razor to put all the data together, I don't think anyone has
    suggested that graphical Linux editors have the same control+A disabling.

    So I think I agree now that there may well be a bug in how Windows hands
    the text from Chromium (but not from Firefox) to Scintilla/Notepad++.

    In keeping with the original SUBJECT of this thread, where Firefox and
    Chromium run on all common consumer platforms (well, not on iOS), my best
    guess is that Firefox's HTML fragment is simpler and hence Firefox's
    HTML fragment does not trigger the buggy Windows conversion path.

    Who knows how many $EDITORs are affected...
    --
    How wonderful that we have met with a paradox.
    Now we have some hope of making progress.