FAQ

Is tidypolars slower than polars?

No, or only marginally. tidypolars does not modify the data: its only goal is to translate tidyverse syntax into polars syntax. polars still does all the data manipulation under the hood.

Therefore, there might be a small overhead, because we still need to parse the expressions and rewrite them in polars syntax (see the Parsing expressions vignette), but it should be marginal. Here’s a small benchmark comparing the performance of polars and tidypolars:

library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

# 20 million rows: a grouping column (26 letters) and two integer columns
pl_test <- pl$DataFrame(
  grp = sample(letters, 2*1e7, TRUE),
  val1 = sample(1:1000, 2*1e7, TRUE),
  val2 = sample(1:1000, 2*1e7, TRUE)
)

bench::mark(
  polars = pl_test$
    group_by("grp")$
    agg(
      pl$col("val1")$mean()$alias("x"), 
      pl$col("val2")$sum()$alias("y"),
      pl$col("val1")$median()$alias("z")
    ),
  tidypolars = pl_test |> 
    group_by(grp) |> 
    summarize(
      x = mean(val1),
      y = sum(val2),
      z = median(val1)
    ),
  check = FALSE,
  iterations = 15
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars        327ms    347ms      2.85  415.75KB    0.204
#> 2 tidypolars    523ms    528ms      1.87    3.71MB    1.25

bench::mark(
  polars = pl_test$
    filter(pl$col("grp") == "a" | pl$col("grp") == "b"),
  tidypolars = pl_test |> 
    filter(grp == "a" | grp == "b"),
  check = FALSE,
  iterations = 15
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars       70.8ms   72.5ms      13.7    21.6KB     0   
#> 2 tidypolars   76.1ms   79.2ms      12.6    82.2KB     1.94
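The parsing step can be pictured with a toy translator. The function `to_polars_expr()` below is hypothetical and is not tidypolars' internal code: it only walks an unevaluated R expression recursively and replaces bare column names with `pl$col()` calls, whereas tidypolars also translates functions such as `mean()` or `median()` into polars methods (see the Parsing expressions vignette). It uses only base R, because it builds the polars expression as code without evaluating it.

```r
# Hypothetical helper (not part of tidypolars): rewrite bare column names in an
# unevaluated R expression into pl$col("<name>") calls, leaving the rest as is.
to_polars_expr <- function(expr, cols) {
  # A bare symbol that matches a column name becomes pl$col("<name>")
  if (is.symbol(expr) && as.character(expr) %in% cols) {
    return(as.call(list(quote(pl$col), as.character(expr))))
  }
  # For a call such as `grp == "a"`, keep the function and translate each argument
  if (is.call(expr)) {
    args <- lapply(as.list(expr)[-1], to_polars_expr, cols = cols)
    return(as.call(c(list(expr[[1]]), args)))
  }
  # Constants such as "a" or 1 are returned unchanged
  expr
}

tidy_expr <- quote(grp == "a" | grp == "b")
polars_expr <- to_polars_expr(tidy_expr, cols = c("grp", "val1", "val2"))
print(polars_expr)
#> pl$col("grp") == "a" | pl$col("grp") == "b"
```

The tidyverse-style condition from the `filter()` benchmark becomes the expression written by hand in the polars version. Walking a small expression tree like this costs little compared with filtering 20 million rows.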

Am I stuck with tidypolars?

No. tidypolars always returns polars DataFrames, LazyFrames, or Series. Therefore, if at some point you want to switch to polars syntax, either because you need more control or because you want to reduce your number of dependencies, you can easily do so.
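A minimal sketch of mixing both styles, assuming a toy DataFrame `df` made up for illustration: a tidypolars verb is applied first, and because its result is still a polars DataFrame, a native polars method (`$with_columns()`) can be called on it directly.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

# Toy data, made up for this example
df <- pl$DataFrame(grp = c("a", "b", "a"), val = c(1, 2, 3))

# Start with a tidypolars verb: the result is still a polars DataFrame...
filtered <- df |>
  filter(val > 1)

# ...so polars methods and expressions can be used on it directly
res <- filtered$with_columns((pl$col("val") * 2)$alias("val2"))
res
```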

Do I still need to load polars?

Yes. tidypolars doesn’t provide any functions to create polars DataFrames or LazyFrames, or to read data, so you still need polars for this.
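A minimal sketch of the division of labor, assuming a small temporary CSV file written just for the example: polars reads the data (lazily, with `pl$scan_csv()`), tidypolars verbs transform the resulting LazyFrame, and polars' `$collect()` runs the query.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

# Write a small CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
write.csv(data.frame(grp = c("a", "b", "c"), val = 1:3), path, row.names = FALSE)

# Reading is done with polars, since tidypolars has no reader of its own
lf <- pl$scan_csv(path)

# tidypolars verbs then work on the LazyFrame
query <- lf |>
  filter(grp != "b") |>
  mutate(val2 = val * 10)

# polars' $collect() runs the lazy query and returns a DataFrame
res <- query$collect()
res
```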

Can I see some benchmarks with other tools?

Making accurate benchmarks of data wrangling tools is difficult, and I won’t try to do it here. Instead, refer to the DuckDB benchmarks.