---
title: "FAQ"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{FAQ}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Is `tidypolars` slower than `polars`?
No, or just marginally. The objective of `tidypolars` is *not* to modify the
data, simply to translate the `tidyverse` syntax to `polars` syntax. `polars`
is still in charge of doing all the data manipulations under the hood.
Therefore, there might be minor overhead because we still need to parse the
expressions and rewrite them in `polars` syntax (see the [Parsing expressions](https://tidypolars.etiennebacher.com/articles/parsing-expressions.html)
vignette) but this should be marginal. Here's a small benchmark to compare the
performance of `polars` and `tidypolars`:
```{r}
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
pl_test <- pl$DataFrame(
grp = sample(letters, 2*1e7, TRUE),
val1 = sample(1:1000, 2*1e7, TRUE),
val2 = sample(1:1000, 2*1e7, TRUE)
)
bench::mark(
polars = pl_test$
group_by("grp")$
agg(
pl$col("val1")$mean()$alias("x"),
pl$col("val2")$sum()$alias("y"),
pl$col("val1")$median()$alias("z")
),
tidypolars = pl_test |>
group_by(grp) |>
summarize(
x = mean(val1),
y = sum(val2),
z = median(val1)
),
check = FALSE,
iterations = 15
)
bench::mark(
polars = pl_test$
filter(pl$col("grp") == "a" | pl$col("grp") == "b"),
tidypolars = pl_test |>
filter(grp == "a" | grp == "b"),
check = FALSE,
iterations = 15
)
```
# Am I stuck with `tidypolars`?
No, `tidypolars` will always return `DataFrame`s, `LazyFrame`s or `Series`.
Therefore, if at some point you want to use `polars` because you need more
control or because you want to reduce your number of dependencies, you can
easily do so.
# Do I still need to load `polars`?
Yes, because `tidypolars` doesn't provide any functions to create `polars`
`DataFrame` or `LazyFrame` or to read data. You'll still need to use `polars`
for this.
# Can I see some benchmarks with other tools?
Making accurate benchmarks of data wrangling tools is difficult and I won't try
to do it here. You should refer to [DuckDB benchmarks](https://duckdblabs.github.io/db-benchmark/).