Dear Python Experts,
I am working with the Kenna Application's API to retrieve vulnerability
data. The API endpoint provides a single, massive JSON file in gzip format, approximately 60 GB in size. Handling such a large dataset in one go is proving to be quite challenging, especially in terms of memory management.
I am looking for guidance on how to efficiently stream this data and
process it in chunks using Python. Specifically, is there a way to use the requests library, or any other library, to pull data
from the API endpoint in a memory-efficient manner?
Here are the relevant API endpoints from Kenna:
- Kenna API Documentation
<https://apidocs.kennasecurity.com/reference/welcome>
- Kenna Vulnerabilities Export
<https://apidocs.kennasecurity.com/reference/retrieve-data-export>
If anyone has experience with similar use cases or can offer any advice, it would be greatly appreciated.
Thank you in advance for your help!
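For the transfer itself, the body can be written to disk in fixed-size chunks so memory stays bounded no matter how large the export is. A minimal sketch using only the standard library (the URL, header name, and token below are placeholders, not Kenna's actual values):

```python
import urllib.request

def download_in_chunks(url, dest_path, headers=None, chunk_size=1 << 20):
    """Stream an HTTP response body to dest_path one chunk at a time,
    so memory use stays bounded regardless of the file's size."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)  # read at most chunk_size bytes
            if not chunk:                  # empty read means end of body
                break
            out.write(chunk)

# Hypothetical usage -- the header name/token are illustrative only:
# download_in_chunks(export_url, "export.json.gz",
#                    headers={"X-Risk-Token": "..."})
```

The same shape works with `requests` by iterating `requests.get(url, stream=True).iter_content(chunk_size)` instead of calling `resp.read()`.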
Best regards
Asif Ali
--
https://mail.python.org/mailman/listinfo/python-list
On 9/30/2024 11:30 AM, Barry via Python-list wrote:
On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
import polars as pl
pl.read_json("file.json")
This is not going to work unless the computer has a lot more than 60 GiB of RAM.
As later suggested, a streaming parser is required.
Note that gzip itself is not the obstacle: gzip is designed for incremental
decompression, so you do not need the whole file before you can start unzipping. The real problem is that the decompressed JSON is even larger than 60 GB, so parsing it all in memory is out; you need a streaming JSON parser downstream of the streaming decompressor.
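Since gzip decompresses incrementally, the whole pipeline can stay streaming end to end. A sketch under two assumptions: the export is (or contains) one large JSON array of objects, and no string value containing `[` precedes that array. `zlib.decompressobj` handles the gunzip side and `json.JSONDecoder.raw_decode` peels complete elements off a growing text buffer:

```python
import codecs
import json
import zlib

def iter_json_array(chunks):
    """Incrementally decompress an iterable of gzipped byte chunks and
    yield the elements of the first JSON array found in the stream,
    without materializing the whole document in memory."""
    # wbits = MAX_WBITS | 32 makes zlib auto-detect the gzip header.
    gunzip = zlib.decompressobj(wbits=zlib.MAX_WBITS | 32)
    # Incremental decoder copes with multi-byte UTF-8 split across chunks.
    utf8 = codecs.getincrementaldecoder("utf-8")()
    decoder = json.JSONDecoder()
    buf = ""
    in_array = False
    for chunk in chunks:
        buf += utf8.decode(gunzip.decompress(chunk))
        if not in_array:
            start = buf.find("[")       # naive scan for the array start;
            if start == -1:             # see the caveat in the lead-in
                continue
            buf = buf[start + 1:]
            in_array = True
        while True:
            buf = buf.lstrip(" \t\r\n,")
            if not buf or buf[0] == "]":
                break                   # end of array, or need more input
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break                   # element incomplete; await next chunk
            yield obj
            buf = buf[end:]
```

Fed from `requests.get(url, stream=True).iter_content(1 << 20)` (or an open file's chunks), memory use stays roughly one record plus the decompressor's window, regardless of the 60 GB total.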
In Common Lisp, integers can be written in any integer base from two
to thirty-six, inclusive. So knowing the last digit doesn't tell
you whether an integer is even or odd until you know the base
anyway.
In Common Lisp, you can write integers as #nnR[digits], where nn is the decimal representation of the base (possibly without a leading zero),
the # and the R are literal characters, and the digits are written in
the intended base. So the input #16RFFFF is read as the integer 65535.
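For comparison, Python's int() takes an explicit base from 2 to 36, so the reader-macro example above maps directly:

```python
# Equivalent of Common Lisp's #16RFFFF: parse "FFFF" in base 16.
assert int("FFFF", 16) == 65535

# Any base from 2 to 36 is accepted; digits past 9 use letters,
# case-insensitively, much as in the Lisp reader.
assert int("zz", 36) == 35 * 36 + 35
```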
By that definition of "streaming", no parser can ever be streaming,
because there will be some constructs that must be read in their
entirety before a suitably-structured piece of output can be
emitted.