EN VI

Python - Unstack only the last three columns?

2024-03-12 20:00:06
How to Python - Unstack only the last three columns

We start with this data:

import numpy as np
import pandas as pd
data=pd.DataFrame(data=np.random.rand(10,5),columns=['headA','headB','tailA','tailB','tailC'])

enter image description here

Now I want to perform a certain unstack operation which unstacks only the last three columns. Hence, this should be the new index:

pd.MultiIndex.from_product([data.columns[-3:],data.index])

MultiIndex([('tailA', 0),('tailA', 1),('tailA', 2),('tailA', 3),('tailA', 4),('tailA', 5),('tailA', 6),('tailA', 7),('tailA', 8),('tailA', 9),('tailB', 0),('tailB', 1),('tailB', 2),('tailB', 3),('tailB', 4),('tailB', 5),('tailB', 6),('tailB', 7),('tailB', 8),('tailB', 9),('tailC', 0),('tailC', 1),('tailC', 2),('tailC', 3),('tailC', 4),('tailC', 5),('tailC', 6),('tailC', 7),('tailC', 8),('tailC', 9)],)

I think, to perform this, I have to put the first two columns into another level than the remaining three columns. I don't know how I can do that in an elegant way. Any suggestions?

Solution:

You could melt with ignore_index=False, then set the new variable column as index with set_index append=True:

out = (data.melt(data.columns[:-3], ignore_index=False, var_name='tail')
           .set_index('tail', append=True).swaplevel()
      )

Alternatively with a more classical stack:

cols = data.columns[:-3].tolist()

out = (data.set_index(cols, append=True).stack()
           .reset_index(cols, name='value')
           .swaplevel()
          #.sort_index() # only if order is important
      )

Or with lreshape:

cols = data.columns[-3:]
out = (pd.lreshape(data, {'value': cols})
         .set_index(pd.MultiIndex.from_product([cols, data.index]))
      )

Output:

            headA     headB     value
tail                                 
tailA 0  0.587533  0.472803  0.650371
      1  0.465910  0.384050  0.623001
      2  0.376165  0.724400  0.741483
      3  0.589693  0.243686  0.833794
      4  0.695016  0.808199  0.599436
      5  0.976881  0.594713  0.315471
      6  0.599184  0.263577  0.701009
      7  0.524138  0.786585  0.975606
      8  0.030385  0.903692  0.722221
      9  0.175104  0.600267  0.830920
tailB 0  0.587533  0.472803  0.031079
      1  0.465910  0.384050  0.693012
      2  0.376165  0.724400  0.455640
      3  0.589693  0.243686  0.331427
      4  0.695016  0.808199  0.971165
      5  0.976881  0.594713  0.277134
      6  0.599184  0.263577  0.720298
      7  0.524138  0.786585  0.786354
      8  0.030385  0.903692  0.701858
      9  0.175104  0.600267  0.854581
tailC 0  0.587533  0.472803  0.991319
      1  0.465910  0.384050  0.167928
      2  0.376165  0.724400  0.336915
      3  0.589693  0.243686  0.193647
      4  0.695016  0.808199  0.877337
      5  0.976881  0.594713  0.026955
      6  0.599184  0.263577  0.138335
      7  0.524138  0.786585  0.727579
      8  0.030385  0.903692  0.087960
      9  0.175104  0.600267  0.578285

timings

10 rows

# melt + set_index
2.01 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# stack (without sort_index)
1.5 ms ± 48.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# lreshape
1.42 ms ± 35 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

100k rows

# melt + set_index
38.3 ms ± 2.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# stack (without sort_index)
12.6 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

# stack (with sort_index)
45.2 ms ± 2.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# lreshape
29.2 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Answer

Login


Forgot Your Password?

Create Account


Lost your password? Please enter your email address. You will receive a link to create a new password.

Reset Password

Back to login