There are 3 columns, say: Item, Item type and Sales. Now item type keeps changing frequently. Sometimes an item (say, Apple) might have item type as one thing on a particular date ("Healthy" on Jan 1) but might have something else on another date ("Vegan" on Jan 5). There are more than 500 such items.
Eg dataframe:
Item | Item type | Date | Sales |
---|---|---|---|
Apple | Healthy | Jan 1 | 10 |
Apple | Vegan | Jan 5 | 5 |
The output needs to be:
Item | Item type | Sales |
---|---|---|
Apple | Vegan | 15 |
(assuming Vegan is the last entry in the data) and the cumulative total sales is 15.
Now if on Jan 7 another entry is made (Apple, Fruit, 2) for Item, item type, sales, the output changes to: Eg dataframe:
Item | Item type | Date | Sales |
---|---|---|---|
Apple | Healthy | Jan 1 | 10 |
Apple | Vegan | Jan 5 | 5 |
Apple | Fruit | Jan 7 | 2 |
The required output:
Item | Item type | Sales |
---|---|---|
Apple | Fruit | 17 |
I tried using the df.groupby
.last
function but it's only applicable on numerical data.