• Re: Help with Streaming and Chunk Processing for Large JSON Data(60 GB) from Kenna API

    From Grant Edwards@grant.b.edwards@gmail.com to comp.lang.python on Mon Sep 30 11:44:50 2024
    From Newsgroup: comp.lang.python

    On 2024-09-30, Left Right via Python-list <python-list@python.org> wrote:
    Whether and to what degree you can stream JSON depends on JSON
    structure. In general, however, JSON cannot be streamed (but commonly
    it can be).

    Imagine a pathological case of this shape: 1... <60GB of digits>. This
    is still valid JSON (it doesn't have any limits on how many digits a
    number can have). And you cannot parse this number in a streaming way
    because in order to do that, you need to start with the least
    significant digit.

    Which is how Arabic numbers were originally parsed, but when
    westerners adopted them from an R->L written language, they didn't flip
    them around to match the L->R written language into which they were
    being adopted.

    So now long numbers can't be parsed as a stream in software. They
    should have anticipated this problem back in the 13th century and
    flipped the numbers around.




    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Grant Edwards@grant.b.edwards@gmail.com to comp.lang.python on Mon Sep 30 14:41:46 2024

    On 2024-09-30, Dan Sommers via Python-list <python-list@python.org> wrote:
    On 2024-09-30 at 11:44:50 -0400,
    Grant Edwards via Python-list <python-list@python.org> wrote:

    On 2024-09-30, Left Right via Python-list <python-list@python.org> wrote:
    [...]
    Imagine a pathological case of this shape: 1... <60GB of digits>. This
    is still valid JSON (it doesn't have any limits on how many digits a
    number can have). And you cannot parse this number in a streaming way
    because in order to do that, you need to start with the least
    significant digit.

    Which is how Arabic numbers were originally parsed, but when
    westerners adopted them from an R->L written language, they didn't
    flip them around to match the L->R written language into which they
    were being adopted.

    Interesting.

    So now long numbers can't be parsed as a stream in software. They
    should have anticipated this problem back in the 13th century and
    flipped the numbers around.

    What am I missing? Handwavingly, start with the first digit, and as
    long as the next character is a digit, multiply the accumulated
    result by 10 (or the appropriate base) and add the next value.
    [...] But why do I need to start with the least significant digit?

    Excellent question. That's actually a pretty standard way to parse
    numeric literals. I accepted the claim at face value that in JSON
    there is something that requires parsing numeric literals from the
    least significant end -- but I can't think of why the usual algorithms
    used by other languages' lexers for yonks wouldn't work for JSON.
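    The usual lexer approach Dan describes can be sketched in Python. This is
    a minimal illustration under stated assumptions, not any library's API:
    parse_json_uint and its list-of-chunks input are hypothetical, and signs,
    fractions, and exponents are left out. The one streaming-specific wrinkle
    is that the parser cannot know a number is finished until it sees the
    character after the last digit, so a chunk boundary falling inside the
    literal must not be mistaken for its end.

```python
from typing import Iterable


def parse_json_uint(chunks: Iterable[str]) -> int:
    """Parse an unsigned JSON integer literal that arrives in chunks.

    Digits are consumed most-significant first: each new digit shifts the
    running total one decimal place left (multiply by 10) and adds the
    digit's value, so only the total is kept, never the literal's text.

    Assumptions for this sketch: no sign, fraction, or exponent, and the
    first non-digit character ends the literal.
    """
    value = 0
    ndigits = 0
    for chunk in chunks:
        for ch in chunk:
            if "0" <= ch <= "9":
                # JSON forbids leading zeros such as "07".
                if ndigits == 1 and value == 0:
                    raise ValueError("leading zero in JSON number")
                value = value * 10 + (ord(ch) - ord("0"))
                ndigits += 1
            elif ndigits == 0:
                raise ValueError(f"expected a digit, got {ch!r}")
            else:
                # A delimiter (",", "]", "}", whitespace, ...) ends the
                # number; only now is its value known to be final.
                return value
    if ndigits == 0:
        raise ValueError("no digits in input")
    # End of input also terminates the literal.
    return value


print(parse_json_uint(["12", "34", "5"]))  # chunk boundaries mid-literal
print(parse_json_uint(["42", "]"]))        # terminated by a delimiter
```

    Memory for the input text stays constant, but the integer itself does
    not: a 60 GB decimal literal still becomes roughly 25 GB of binary
    integer, and in CPython each value * 10 + d step costs time proportional
    to the current size of value, so the whole parse is quadratic.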

    --
    Grant