
Python - Adding numpy arrays to cells of a pandas DataFrame depends on initialisation?

2024-03-13 17:00:04

I was trying to store numpy arrays as the elements of individual cells in a pandas DataFrame, using:

df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]),np.array([2,3])]

Whether or not this is allowed seems to depend on how I initialise df:

[Screenshot: testing two different initialisations of df]

Can someone explain to me what's going on?

Here's the code as text for everyone to try:

The code that's not working

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['A','B'])
a = [1,2,0,4,5]
b = [3,4,4,7,3]

df['A'] = a
df['B'] = b

df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]),np.array([2,3])]
df

The code that's working

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['A','B'])
a = [1,2,0,4,5]
b = [3,4,4,7,3]

for i in range(len(a)):
    df.loc[i,'A'] = a[i]
    df.loc[i,'B'] = b[i]

df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]),np.array([2,3])]
df

Solution:

You have to create a Series with the correct index:

m = df['B']==4

df.loc[m, 'A'] = pd.Series([np.array([5, 6, 7, 8]), np.array([2, 3])],
                           index=df.index[m])

Note that a future version of pandas might raise an error here, since the original dtype of column A is integer. To be safe, convert the column to object first:

df['A'] = df['A'].astype(object)

m = df['B']==4

df.loc[m, 'A'] = pd.Series([np.array([5, 6, 7, 8]), np.array([2, 3])],
                           index=df.index[m])

Output:

              A  B
0             1  3
1  [5, 6, 7, 8]  4
2        [2, 3]  4
3             4  7
4             5  3
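
The same idea can be wrapped in a small function. This is only a sketch: `set_array_cells` is a hypothetical helper name, not part of pandas, and the DataFrame is built from a dict here for brevity rather than from the empty frame used in the question.

```python
import numpy as np
import pandas as pd


def set_array_cells(df, mask, column, arrays):
    """Hypothetical helper: store one array per row selected by `mask`."""
    target_index = df.index[mask]
    if len(arrays) != len(target_index):
        raise ValueError("need exactly one array per selected row")
    # Convert to object first so the column can hold arbitrary Python objects.
    df[column] = df[column].astype(object)
    # An index-aligned Series makes pandas treat each array as a single cell value.
    df.loc[mask, column] = pd.Series(list(arrays), index=target_index)
    return df


# Assumption: same data as in the question, built directly from a dict.
df = pd.DataFrame({'A': [1, 2, 0, 4, 5], 'B': [3, 4, 4, 7, 3]})
set_array_cells(df, df['B'] == 4, 'A', [np.array([5, 6, 7, 8]), np.array([2, 3])])
print(df)
```

The length check is there because a Series with the wrong number of entries would otherwise be aligned silently and could leave cells unset.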

Why does the second approach work?

I'm not sure. It is most likely due to a peculiar internal state of the DataFrame (I suspect because it starts as an empty object-dtype DataFrame and is filled solely through the loop). This is most likely not supposed to work, and it is very unstable.
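
One way to look into this yourself is to compare the column dtypes that the two initialisations produce. This is a rough diagnostic sketch, not an explanation of the internals; `df_direct` and `df_loop` are names introduced here for illustration. Whole-column assignment should give integer columns, while I would expect the loop-filled frame to stay object dtype, though that may depend on the pandas version, so the snippet only prints the result.

```python
import pandas as pd

a = [1, 2, 0, 4, 5]
b = [3, 4, 4, 7, 3]

# Initialisation 1: assign whole columns at once.
df_direct = pd.DataFrame(columns=['A', 'B'])
df_direct['A'] = a
df_direct['B'] = b

# Initialisation 2: fill the empty frame cell by cell.
df_loop = pd.DataFrame(columns=['A', 'B'])
for i in range(len(a)):
    df_loop.loc[i, 'A'] = a[i]
    df_loop.loc[i, 'B'] = b[i]

# Compare the dtypes each initialisation ends up with.
print(df_direct.dtypes)
print(df_loop.dtypes)
```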

For instance, it fails as soon as you add another column (even one of object dtype):

df = pd.DataFrame(columns=['A','B'])
a = [1,2,0,4,5]
b = [3,4,4,7,3]

df['C'] = 'X'  # we just add one extra column

for i in range(len(a)):
    df.loc[i,'A'] = a[i]
    df.loc[i,'B'] = b[i]

df.loc[df['B']==4,'A'] = [np.array([5, 6, 7, 8]), np.array([2, 3])]
# error

But creating the DataFrame from a single object-dtype numpy array (stored as one internal block) works:

a = [1,2,0,4,5]
b = [3,4,4,7,3]
df = pd.DataFrame(np.c_[a, b].astype('object'), columns=['A','B'])
df.loc[df['B']==4, 'A'] = [np.array([5, 6, 7, 8]), np.array([2, 3])]
# no error