Python - Comparing two data frames on a condition and removing rows that do not qualify

2024-03-14 07:00:04

I have two data frames. I have generated a small sample to show what I am looking for; any suggestion or help is appreciated.

import pandas as pd

df = pd.DataFrame({'policy number':[11,22,33,44,55,66,77,88,99], 'policy status':['good', 'good', 'good', 'good', 'good','good', 'good', 'good', 'good']})

df_2 = pd.DataFrame({'policy number':[11,83,63,44,55,66,67,88,99,100], 'policy status':['bad','bad', 'good', 'good', 'bad', 'good','bad', 'good', 'average', 'good']})

I want to compare the two data frames by policy number. If a policy's [policy status] in the second data frame is still good, I want to keep that policy; otherwise I want to remove it from my first data frame.

Is there an easier way to do this? I have tried iterating over the rows of both data frames and comparing them, but this takes a lot of time because my real datasets are much bigger.

Thanks in advance!

Solution:

If I understand you correctly, you can use pd.Series.isin to build a boolean mask for this task:

print(
    df[
        df["policy number"].isin(
            df_2.loc[df_2["policy status"] == "good", "policy number"]
        )
    ]
)

Prints:

   policy number  policy status
3             44           good
5             66           good
7             88           good
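
For reference, the same idea can be spelled out step by step. This is only a sketch built on the sample data from the question; the names still_good, mask, kept, not_good and kept_lenient are illustrative and not part of the original answer. It also shows one possible variant, under the assumption that policies missing from df_2 (22, 33 and 77 here) should be kept rather than dropped, which the question does not state either way.

```python
import pandas as pd

# Sample data from the question (column names without stray spaces)
df = pd.DataFrame({
    "policy number": [11, 22, 33, 44, 55, 66, 77, 88, 99],
    "policy status": ["good"] * 9,
})
df_2 = pd.DataFrame({
    "policy number": [11, 83, 63, 44, 55, 66, 67, 88, 99, 100],
    "policy status": ["bad", "bad", "good", "good", "bad",
                      "good", "bad", "good", "average", "good"],
})

# Step 1: policy numbers whose status in df_2 is still "good"
still_good = df_2.loc[df_2["policy status"] == "good", "policy number"]

# Step 2: boolean mask over df, True where the policy is still good
mask = df["policy number"].isin(still_good)

# Step 3: keep only the rows where the mask is True
kept = df[mask]
print(kept)

# Variant (assumption): remove only policies that df_2 explicitly marks
# as not good, and keep policies that do not appear in df_2 at all
not_good = df_2.loc[df_2["policy status"] != "good", "policy number"]
kept_lenient = df[~df["policy number"].isin(not_good)]
print(kept_lenient)
```

Both versions avoid looping over rows in Python, since isin works on the whole column at once. Which one fits depends on how policies that are absent from the second data frame should be treated.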