EN VI

SQL - How to fill in missing values based on 2 columns?

2024-03-13 16:00:06
SQL - How to fill in missing values based on 2 columns
+-----------+----------------------+-----------------------+-----------------------+
| review_id |      old_group       |       new_group       |    desired_result     |
+-----------+----------------------+-----------------------+-----------------------+
|         1 | null                 | null                  | BATCH                 |
|         2 | null                 | null                  | BATCH                 |
|         3 | BATCH                | LEVEL_1               | LEVEL_1               |
|         4 | null                 | null                  | LEVEL_1               |
|         5 | LEVEL_1              | LEVEL_2               | LEVEL_2               |
|         6 | LEVEL_2              | L3                    | LEVEL_3               |
|         7 | null                 | null                  | LEVEL_3               |
|         8 | null                 | null                  | LEVEL_3               |
|         9 | null                 | null                  | LEVEL_3               |
+-----------+----------------------+-----------------------+-----------------------+

I am working in snowflake and need to write a SQL query that based one new_group and old_group will create a new column filling in the blanks Each review is assigned a new and old group, I need to combine those values in order to have a separate column that would show specific review group on that review.

I've tried FIRST_VALUE(new_group IGNORE NULLS) OVER (ORDER BY review_id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) but, it does not provide the right result: the last reviews are null as there is no upcoming group and the reviews are a bit shifted.

Solution:

Here is one approach, which uses a LAG() trick to detect when a new block of NULL missing values starts and ends, to be filled in with the non NULL value from that group.

WITH cte1 AS (
    SELECT t.*,
        CASE WHEN old_group IS NOT NULL AND
                  (LAG(old_group) OVER (ORDER BY review_id) IS NULL OR
                   LAG(old_group) OVER (ORDER BY review_id) <> old_group)
             THEN 1 ELSE 0 END AS old_flag,
        CASE WHEN new_group IS NOT NULL AND
                  (LAG(new_group) OVER (ORDER BY review_id) IS NULL OR
                   LAG(new_group) OVER (ORDER BY review_id) <> new_group)
             THEN 1 ELSE 0 END AS new_flag
    FROM yourTable t
),
cte2 AS (
    SELECT t.*, SUM(old_flag) OVER (ORDER BY review_id) AS old_grp,
                SUM(new_flag) OVER (ORDER BY review_id) AS new_grp
    FROM cte1 t
)

SELECT review_id, old_group, new_group,
       COALESCE(MAX(new_group) OVER (PARTITION BY new_grp),
                MAX(CASE WHEN new_grp = 1 THEN old_group END) OVER ()) AS desired_result
FROM cte2
ORDER BY review_id;

screen capture from demo link below

Demo

Answer

Login


Forgot Your Password?

Create Account


Lost your password? Please enter your email address. You will receive a link to create a new password.

Reset Password

Back to login