Python - Trying to stream my (very large) json file with ijson - is it formatted wrong?

2024-03-11 20:00:08

I'm trying to stream through a large JSON file using ijson in Python. This is my first time trying this.

My code is really simple right now:

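import ijson

with open('file.json', 'rb') as f:
    # The 'item' prefix selects each element of a top-level JSON array.
    # Iterate inside the with block so the file is still open while ijson reads it.
    for item in ijson.items(f, 'item'):
        print('x')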

This raises a "trailing garbage" error. Essentially, the second item in the file is treated as garbage, I think because of the file format.

My JSON file is this one from Kaggle, and it is formatted like this:

{"_id":{"$oid":"6457879fd1187d621cbbba9c"},"sourceCC":"us",...etc...}
{"_id":{"$oid":"6457879fd1187d621cbddd8a"},"sourceCC":"us",...etc...}

It is about 3 GB in size, so I'm unable to open it.

If I use 'multiple_values=True', I believe it treats all the items as multiple values of the same item, so it no longer raises an error, but it also returns nothing.

What can I do?

Thanks.

Solution:

That's not actually a single JSON document. It is a series of JSON documents separated by newlines, a layout often called JSON Lines or NDJSON. You don't need ijson to read it; you can instead read it line by line and use the built-in json module:

import json

# Each line is a complete JSON document, so only one line is in memory at a time.
with open('myfile.json') as fd:
    for line in fd:
        obj = json.loads(line)
        # do something with obj here
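
If you want something a bit more defensive, the same line-by-line idea can be wrapped in a generator. This is only a sketch: the name iter_json_lines is made up for illustration, and it assumes the file is UTF-8 and that every non-empty line holds exactly one JSON document. It skips blank lines (json.loads would fail on them) and includes the line number in the error when a line cannot be parsed.

```python
import json
from typing import Any, Iterator


def iter_json_lines(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of the file at `path`.

    Hypothetical helper, not part of json or ijson. Assumes UTF-8 input with
    one complete JSON document per line.
    """
    with open(path, encoding="utf-8") as fd:
        for line_number, line in enumerate(fd, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Point at the offending line instead of failing without context.
                raise ValueError(
                    f"{path}: invalid JSON on line {line_number}: {exc}"
                ) from exc
```

You would then loop with "for obj in iter_json_lines('file.json'):" and handle each obj in turn. Because it is a generator, only one line is parsed at a time, so the 3 GB file never has to fit in memory.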