Python - Take the last occurrence of a string value based on the latest date?

2024-03-12 17:30:08
There are 4 columns: Item, Item type, Date and Sales. The item type changes frequently: an item (say, Apple) might have one item type on a particular date ("Healthy" on Jan 1) and a different one on another date ("Vegan" on Jan 5). There are more than 500 such items.

Example dataframe:

Item    Item type    Date     Sales
Apple   Healthy      Jan 1    10
Apple   Vegan        Jan 5    5

The output needs to be:

Item    Item type    Sales
Apple   Vegan        15

Here Vegan is the item type because it is the last entry in the data, and 15 is the cumulative total of sales (10 + 5).

Now, if another entry is made on Jan 7 (Item: Apple, Item type: Fruit, Sales: 2), the dataframe becomes:

Item    Item type    Date     Sales
Apple   Healthy      Jan 1    10
Apple   Vegan        Jan 5    5
Apple   Fruit        Jan 7    2

The required output:

Item    Item type    Sales
Apple   Fruit        17

I tried using df.groupby(...).last(), but it seems to be applicable only to numerical data.

Solution:

The groupby last function should work for this use case, including on string columns. As long as you can confirm that your data is entered in chronological order, you can just do:

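import pandas as pd

# Rows must already be in chronological order for 'last' to pick the latest entry.
df = pd.DataFrame(columns=['item', 'type', 'date', 'sales'])
df.loc[0] = ["Apple", "unripe", "Jan 1", 10]
df.loc[1] = ["Apple", "ripe", "Jan 10", 5]
df.loc[2] = ["Orange", "ripe", "Jan 12", 6]
# Per item: keep the last type and date, and total the sales.
df.groupby("item").agg({'type': 'last', 'date': 'last', 'sales': 'sum'})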

That groupby takes the last row of each group for type and date, so it returns the latest entries only when the rows are already in chronological order. Sales is the sum over all rows in the group.

The output looks like

        type    date    sales
item            
Apple   ripe    Jan 10  15
Orange  ripe    Jan 12  6
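
If the rows are not guaranteed to be in chronological order, one option is to parse the dates and sort by them before grouping. The following is a minimal sketch, not part of the original answer. It assumes the dates look like "Jan 7" with no year (so they all fall in the same year); the sample rows, the Orange entry and the parsed_date column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data based on the question's example,
# deliberately listed out of chronological order.
df = pd.DataFrame(
    {
        "item": ["Apple", "Apple", "Apple", "Orange"],
        "type": ["Fruit", "Healthy", "Vegan", "Citrus"],
        "date": ["Jan 7", "Jan 1", "Jan 5", "Jan 3"],
        "sales": [2, 10, 5, 6],
    }
)

# Assumption: dates have no year, so "%b %d" is enough to order them.
df["parsed_date"] = pd.to_datetime(df["date"], format="%b %d")

result = (
    df.sort_values("parsed_date")  # oldest first, so 'last' is the latest entry
    .groupby("item")
    .agg(type=("type", "last"), sales=("sales", "sum"))
    .reset_index()
)
print(result)
```

With this data, Apple should come out with type Fruit and total sales 17, matching the output asked for in the question. If the real data spans several years, the date column would need to carry the year for the sort to be meaningful.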