EN VI

Snowflake - Flatten JSON?

How to Snowflake - Flatten JSON

i have a json like this:

{
    "col1": 5905,
    "col2": "4540",
    "col3": "23",
    "obj1": {
        "col4": 5905,
        "col5": 3100000000000,
        "col6": null
    },
    "obj2": [
        {
            "col7": null,
            "col8": 1
        },
        {
            "col7": null,
            "col8": 2
        }
    ],
    
    "obj3": {
        "col9": 2611937,
        "col10": "ABCD",
        "obj4": {
            "col11": "1-1PT-515",
            "col12": "XXXXXXX"
        },
        "col13": 152
    }
}

I'm not an expert on json formatting so i have this script in snowflake to flatten this file and obtaining a columnar table with each "colx" as a column and each value as a new row but the performance is really bad

select 
    tab1.col1::VARCHAR AS col_1
    ,tab1.col2::VARCHAR AS col_2
    ,tab1.col3::VARCHAR AS col_3
    ,sub1.this:col4::VARCHAR AS col_4
    ,sub1.this:col5::VARCHAR AS col_5
    ,sub1.this:col6::VARCHAR AS col_6
    ,sub2.this:col7::VARCHAR AS col_7
    ,sub2.this:col8::VARCHAR AS col_8
    ,sub3.this:col9::VARCHAR AS col_9
    ,sub3.this:col10::VARCHAR AS col_10
    ,sub4.this:col11::VARCHAR AS col_11
    ,sub4.this:col12::VARCHAR AS col_12
    ,sub3.this:col13::VARCHAR AS col_13
FROM
    DB.SCHEMA.TABLE_1 tab1
    ,lateral flatten(input => obj1) sub1
    ,lateral flatten(input => obj2) sub2
    ,lateral flatten(input => obj3) sub3
    ,lateral flatten(sub3.this:obj4) sub4
GROUP BY ALL;

I had to add a "group by all" condition in the end because is generating thousands of rows, for example, with 4.500 rows, generates over 260k repeating each row multiple times,

Thoughts on how to approach this?? I really do hope there's a better way to do this because is taking a lot!

Thanks

Solution:

with this data AND sql you only get one row.

So if we see the the SQL without the GROUP BY:

with table_1(j) as (
select parse_json('{
    "col1": 5905,
    "col2": "4540",
    "col3": "23",
    "obj1": {
        "col4": 5905,
        "col5": 3100000000000,
        "col6": null
    },
    "obj2": [
        {
            "col7": null,
            "col8": 1
        },
        {
            "col7": null,
            "col8": 2
        }
    ],
    
    "obj3": {
        "col9": 2611937,
        "col10": "ABCD",
        "obj4": {
            "col11": "1-1PT-515",
            "col12": "XXXXXXX"
        },
        "col13": 152
    }
}')
)
select 
    tab1.j:col1::VARCHAR AS col_1
    ,tab1.j:col2::VARCHAR AS col_2
    ,tab1.j:col3::VARCHAR AS col_3
    ,sub1.this:col4::VARCHAR AS col_4
    ,sub1.this:col5::VARCHAR AS col_5
    ,sub1.this:col6::VARCHAR AS col_6
    ,sub2.this:col7::VARCHAR AS col_7
    ,sub2.this:col8::VARCHAR AS col_8
    ,sub3.this:col9::VARCHAR AS col_9
    ,sub3.this:col10::VARCHAR AS col_10
    ,sub4.this:col11::VARCHAR AS col_11
    ,sub4.this:col12::VARCHAR AS col_12
    ,sub3.this:col13::VARCHAR AS col_13
FROM
    TABLE_1 as tab1
    ,lateral flatten(input => j:obj1) as sub1
    ,lateral flatten(input => j:obj2) as sub2
    ,lateral flatten(input => j:obj3) as sub3
    ,lateral flatten(sub3.this:obj4) as sub4
--GROUP BY ALL
;

gives 48 rows:

so given you only have one array, obj2, that should be the only row flattened:

with table_1(j) as (
select parse_json('{
    "col1": 5905,
    "col2": "4540",
    "col3": "23",
    "obj1": {
        "col4": 5905,
        "col5": 3100000000000,
        "col6": null
    },
    "obj2": [
        {
            "col7": null,
            "col8": 1
        },
        {
            "col7": null,
            "col8": 2
        }
    ],
    
    "obj3": {
        "col9": 2611937,
        "col10": "ABCD",
        "obj4": {
            "col11": "1-1PT-515",
            "col12": "XXXXXXX"
        },
        "col13": 152
    }
}')
)
select 
    tab1.j:col1::VARCHAR AS col_1
    ,tab1.j:col2::VARCHAR AS col_2
    ,tab1.j:col3::VARCHAR AS col_3
    ,tab1.j:obj1:col4::VARCHAR AS col_4
    ,tab1.j:obj1:col5::VARCHAR AS col_5
    ,tab1.j:obj1:col6::VARCHAR AS col_6
    ,sub2.value:col7::VARCHAR AS col_7
    ,sub2.value:col8::VARCHAR AS col_8
    ,tab1.j:obj3:col9::VARCHAR AS col_9
    ,tab1.j:obj3:col10::VARCHAR AS col_10
    ,tab1.j:obj3:obj4:col11::VARCHAR AS col_11
    ,tab1.j:obj3:obj4:col12::VARCHAR AS col_12
    ,tab1.j:obj3:col13::VARCHAR AS col_13
FROM
    TABLE_1 as tab1
    ,lateral flatten(input => j:obj2) as sub2
;

gives: