
Python - Dividing each column in a Polars DataFrame by a column-specific scalar from another DataFrame?

I'm a Polars noob. Given an m x n Polars DataFrame df and a 1 x n Polars DataFrame of scalars, I want to divide each column in df by the corresponding scalar in the other frame.

import numpy as np
import polars as pl

cols = list('abc')
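# orient="row": the first array axis is rows (avoids ambiguous inference for square arrays)
df = pl.DataFrame(np.linspace(1, 9, 9).reshape(3, 3),
                  schema=cols, orient="row")
scalars = pl.DataFrame(np.linspace(1, 3, 3)[None, :],
                       schema=cols, orient="row")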

In [13]: df
Out[13]: 
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 1.0 ┆ 2.0 ┆ 3.0 │
│ 4.0 ┆ 5.0 ┆ 6.0 │
│ 7.0 ┆ 8.0 ┆ 9.0 │
└─────┴─────┴─────┘

In [14]: scalars
Out[14]: 
shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 1.0 ┆ 2.0 ┆ 3.0 │
└─────┴─────┴─────┘

I can do this easily in pandas, as shown below, by delegating to NumPy broadcasting, but I was wondering what the best way is to do it without going back and forth between Polars and pandas representations.

In [16]: df.to_pandas() / scalars.to_numpy()
Out[16]: 
     a    b    c
0  1.0  1.0  1.0
1  4.0  2.5  2.0
2  7.0  4.0  3.0

I found a similar question where the scalar constant is already a row in the original frame, but I don't see how to leverage a row from another frame.

The best I can come up with so far is appending the scalars row to df, dividing every column by its last value, and then dropping that row again with head(-1). Nasty-looking :D

In [31]: (pl.concat([df, scalars])
    ...:    .with_columns(pl.all() / pl.all().tail(1))
    ...:    .head(-1))
Out[31]: 
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 1.0 ┆ 1.0 ┆ 1.0 │
│ 4.0 ┆ 2.5 ┆ 2.0 │
│ 7.0 ┆ 4.0 ┆ 3.0 │
└─────┴─────┴─────┘

Solution:

I think you found a rather unique and clever solution. You could also simply iterate over the columns:

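# Iterate over the Series themselves; each 1-row scalars[name] Series broadcasts
df.select(column / scalars[column.name] for column in df.iter_columns())

# Build one expression per column name
df.select(pl.col(k) / scalars[k] for k in df.columns)

# Same idea via with_columns and the Expr.truediv method
df.with_columns(pl.col(k).truediv(scalars[k]) for k in df.columns)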