
Python - How to select rows based on a specified set of values in one column which share a value in another column with pandas?

In reality I have a dataset with 5281 rows and ~40 columns. From this I need to select rows holding a certain set of values in one column, where those rows are duplicates of each other in another column.

To simplify, I have reduced it to a df with 2 columns, A and B.

import pandas as pd

d = {'A': [2, 1, 2, 2, 1, 1, 3, 1], 'B': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']}
df = pd.DataFrame(d)

What I want from this df: the rows whose values of A, taken together for one shared value of B, form exactly the set (1, 2). Here that is the two rows with B = 'a'.

A little bit of context: I need to drop rows which have duplicates in one column (here col B), but only if the duplicates have a certain set of values in another column (here the set 1, 2 in col A). And I would like to apply all of this directly on the df.

Solution:

Try this:

In this solution, df.groupby('B')['A'].agg(set) returns a Series indexed by the values of B, where each entry is the set of A values in that group. eq(s) then checks whether each set is equal to the target set. Lastly, df['B'].map(...) maps that boolean Series back onto the original rows, so df.loc can select just the rows we need.

# target set of values in A
s = {1, 2}
# keep rows whose B group has exactly the A values in s
df.loc[df['B'].map(df.groupby('B')['A'].agg(set).eq(s))]

Output:

   A  B
0  2  a
1  1  a
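
Since the question asks about dropping those rows rather than selecting them, here is a minimal sketch of the same idea split into steps, with the mask inverted at the end. The names sets_per_b, matches, mask, selected and remaining are only for illustration, and the set comparison is written with an explicit lambda instead of eq(s), which should give the same result here.

```python
import pandas as pd

d = {'A': [2, 1, 2, 2, 1, 1, 3, 1], 'B': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']}
df = pd.DataFrame(d)
s = {1, 2}  # target set of values in A

# Step 1: one set of A values per value of B (B becomes the index).
sets_per_b = df.groupby('B')['A'].agg(set)

# Step 2: True for each B group whose set equals the target set.
matches = sets_per_b.map(lambda values: values == s)

# Step 3: map the per-group result back onto the original rows.
mask = df['B'].map(matches).astype(bool)

selected = df.loc[mask]    # rows with index 0 and 1 (B == 'a')
remaining = df.loc[~mask]  # everything else, i.e. the df with those rows dropped
```

If the drop should replace the original frame, assigning df = df.loc[~mask] would do it.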
