Questions (4)
2024-03-11 11:30:04
You can use the map_filter function, like below.

    df.withColumn(
      "filtered_map",
      expr("size(map_filter(Maptype_col, (k, v) -> v > 1))")
    )

    +-------------+------------------------+------------+
    |...
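To make that concrete, here is a minimal runnable sketch, assuming a spark-shell session (so spark and its implicits are in scope); the sample rows and the id column are hypothetical stand-ins for the asker's schema:

    import org.apache.spark.sql.functions.expr
    import spark.implicits._

    // Hypothetical data with a map column like Maptype_col above.
    val df = Seq(
      ("a", Map("x" -> 1, "y" -> 2, "z" -> 3)),
      ("b", Map("x" -> 0, "y" -> 5))
    ).toDF("id", "Maptype_col")

    // map_filter keeps only entries whose value is > 1;
    // size then counts the entries that survive the filter.
    df.withColumn(
      "filtered_map",
      expr("size(map_filter(Maptype_col, (k, v) -> v > 1))")
    ).show(false)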
2024-03-11 18:00:12
Without using RDDs, or rather DataFrames as is the norm these days, no parallelization will occur; the same applies to a pandas DataFrame. That is to say, there is no point in running it on Spark. You can run it on Databricks, of course...
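A small sketch of the contrast being drawn, assuming a spark-shell session; the workload is invented purely for illustration:

    // Plain Scala collection: computed entirely on the driver JVM,
    // with no Spark parallelization regardless of cluster size.
    val driverOnly = (1L to 1000000L).map(_ * 2).sum

    // Equivalent DataFrame job: planned by Catalyst and executed
    // across partitions on the executors.
    val distributed = spark.range(1L, 1000001L)
      .selectExpr("sum(id * 2)")
      .first()
      .getLong(0)

    println(s"driver-only: $driverOnly, distributed: $distributed")

Only the second computation benefits from Spark's executors; the first runs as ordinary single-JVM Scala no matter where it is deployed.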
2024-03-14 13:00:07
You can use arrays_zip to zip both arrays and inline to explode the zipped array column into rows; either the inline or inline_outer function works.

    df.select(
      $"name",
      inline(
        arrays_zip(...
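A self-contained sketch under an assumed schema (a name plus two parallel arrays; all sample data below is hypothetical), using the inline DSL function available in the functions object since Spark 3.4:

    import org.apache.spark.sql.functions.{arrays_zip, inline}
    import spark.implicits._

    // Hypothetical schema: a name plus two parallel arrays.
    val df = Seq(
      ("alice", Seq(1, 2, 3), Seq("a", "b", "c"))
    ).toDF("name", "ids", "labels")

    // arrays_zip pairs the arrays element-wise into an array of structs;
    // inline explodes that array into one row per element, with one
    // column per struct field. inline_outer would additionally keep
    // rows whose array is null or empty.
    df.select(
      $"name",
      inline(arrays_zip($"ids", $"labels"))
    ).show(false)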
Tags: scala apache-spark
2024-03-14 19:00:12
Each of the partitions will ask the external data source to filter the data at the source and send back only the rows required by the condition; how the source implements that is not Spark's concern. In your case...
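As one concrete illustration using Spark's built-in JDBC source (the connection URL, table, and column names below are placeholders, not from the question):

    // Placeholder connection details, assumed for illustration only.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host:5432/db")
      .option("dbtable", "events")
      .option("partitionColumn", "id")
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "4")
      .load()

    // Each of the 4 partitions issues its own SQL query; the predicate
    // below is pushed into each query's WHERE clause alongside the
    // partition range, so the database returns only matching rows.
    val filtered = df.filter("status = 'ACTIVE'")

    // The physical plan lists the pushed predicate under PushedFilters.
    filtered.explain(true)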
