EN VI

Python - How do I display row names instead of index numbers while doing df.compare()?

I'm using df.compare where I'm doing a diff between 2 csv's which have same index/row names, but when it does df.compare, it does the diff as expected but gives the row index numbers as 0,2,5,... where ever it find the diff between the csv's. What I'm looking out is instead of the row numbers where It finds the diff, I need df.compare to show me the row text.

diff_out_csv = old.compare(latest,align_axis=1).rename(columns={'self':'old','other':'latest'})

Current output -

            NUMBER1          NUMBER2       NUMBER3            
            old   latest     old  latest   old latest
    0  -14.1685 -14.0132 -1.2583 -1.2611   NaN    NaN
    2   -9.7875 -12.2739 -0.3532 -0.3541  86.0  100.0
    3   -0.0365  -0.0071 -0.0099 -0.0039   6.0    2.0
    4   -1.9459  -1.5258 -0.5402 -0.0492  73.0  131.0

Desired Output -

            NUMBER1          NUMBER2          NUMBER3       
            old   latest     old  latest      old latest
    JACK  -14.1685 -14.0132 -1.2583 -1.2611   NaN    NaN
    JASON  -9.7875 -12.2739 -0.3532 -0.3541   86.0  100.0
    JACOB  -0.0365 -0.0071  -0.0099 -0.0039    6.0    2.0
    JIMMY  -1.9459 -1.5258  -0.5402 -0.0492   73.0  131.0

I was able to replace the column names using df.compare.rename(columns={}) but how do I replace 0, 2, 3, 4 with the text names ?

Solution:

All you need to do here is set the index to the column(s) you want to show row index labels for in your output.

import pandas as pd  # 1.5.3

df1 = pd.DataFrame({
    'name':    ['Alice', 'Bob', 'Charlie', 'Dana', 'Edward', 'Favin'],
    'number1': [1, 2, 3, 4, 5, 6],
    'number2': [1, 2, 3, 4, 5, 6],
})
# print(df1)
#       name  number1  number2
# 0    Alice        1        1
# 1      Bob        2        2
# 2  Charlie        3        3
# 3     Dana        4        4
# 4   Edward        5        5
# 5    Favin        6        6

df2 = df1.assign(
    number1=[1,  2, -1, -1,  5,  6],
    number2=[1, -1, -1,  4,  5, -1],
)
# print(df2)
#       name  number1  number2
# 0    Alice        1        1
# 1      Bob        2       -1
# 2  Charlie       -1       -1
# 3     Dana       -1        4
# 4   Edward        5        5
# 5    Favin        6       -1


# ① set the index
df1 = df1.set_index('name')
df2 = df2.set_index('name')

# ② compare (use result_names=…) to avoid needing to use (.rename(…))
print(
    df1.compare(df2, result_names=('old', 'latest'))
)
#         number1        number2
#             old latest     old latest
# name
# Bob         NaN    NaN     2.0   -1.0
# Charlie     3.0   -1.0     3.0   -1.0
# Dana        4.0   -1.0     NaN    NaN
# Favin       NaN    NaN     6.0   -1.0