---
title: "Getting started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{css, echo=FALSE}
.custom_note {
  border: solid 3px #08505e;
  background-color: #0b788e;
  padding: 5px;
  margin-bottom: 10px;
  border-radius: 3px;
}
.custom_note > p, .custom_note > p > code {
  color: white;
}
```
The first thing to do before using `tidypolars` is to get some data as a Polars
`DataFrame` or `LazyFrame`. You can read files with `polars::pl$read_*()`
functions (to import them as `DataFrame`s) or with `polars::pl$scan_*()`
functions (to import them as `LazyFrame`s). `polars` can read various file
formats, such as CSV, Parquet, or JSON.
You could also read data with other packages and then convert it with
`as_polars_df()` (or `as_polars_lf()` if you want to make it a
`LazyFrame`).
Note: `as_polars_df()` and `as_polars_lf()` are merely convenience functions to
quickly convert data to a Polars object, which is useful for showcase purposes.
However, converting data from R to Polars has some cost. In real-life use cases,
be sure to load the data with the `pl$scan_*()` or `pl$read_*()` functions.
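
As a minimal sketch of loading data directly with Polars, the chunk below writes `mtcars` to a temporary CSV file and then imports it eagerly with `pl$read_csv()` and lazily with `pl$scan_csv()`. The file and the object names (`csv_path`, `cars_pl`, `cars_pl_lazy`) are only there for illustration.

```{r}
library(polars)

# Write a small CSV file to a temporary location (illustrative data only)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Eager import: the file is read into a Polars DataFrame right away
cars_pl <- pl$read_csv(csv_path)

# Lazy import: the file is only scanned when the query is evaluated
cars_pl_lazy <- pl$scan_csv(csv_path)

cars_pl
```
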
Here, we're going to use the `who` dataset that is available in the `tidyr`
package. We import it both as a classic R `data.frame` and as a Polars `DataFrame`
so that we can easily compare `dplyr` and `tidypolars` functions.
```{r setup}
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)
who_df <- tidyr::who
who_pl <- as_polars_df(tidyr::who)
```
`tidypolars` provides methods for `dplyr` and `tidyr` S3 generics. Put simply,
you can use the same functions on a Polars `DataFrame` or `LazyFrame` as in a
classic `tidyverse` workflow, and they should just work (if they don't,
please [open an issue](https://github.com/etiennebacher/tidypolars/issues)).
Note that you still need to load `dplyr` and `tidyr` in your code.
Here's an example of some `dplyr` and `tidyr` code on the classic R `data.frame`:
```{r}
who_df |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year, matches("^newrel(.*)_f")) |>
  arrange(iso3, year) |>
  rename_with(.fn = toupper) |>
  head()
```
We can simply use our Polars dataset instead:
```{r}
who_pl |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year, matches("^newrel(.*)_f")) |>
  arrange(iso3, year) |>
  rename_with(.fn = toupper) |>
  head()
```
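One way to check that the two approaches agree is to convert the Polars result back to an R `data.frame` with `as.data.frame()` and compare it with the `dplyr` result. The sketch below uses a shorter pipeline, and `res_df` and `res_pl` are illustrative names. Both comparisons are expected to hold if the two backends behave the same on this data.

```{r}
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)

who_df <- tidyr::who
who_pl <- as_polars_df(tidyr::who)

# Same pipeline on the R data.frame...
res_df <- who_df |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year) |>
  arrange(iso3, year)

# ...and on the Polars DataFrame, converted back to R at the end
res_pl <- who_pl |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year) |>
  arrange(iso3, year) |>
  as.data.frame()

# Compare the shape and the column names of both results
nrow(res_df) == nrow(res_pl)
identical(names(res_df), names(res_pl))
```
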
If you use the Polars lazy API, you need to call `compute()` at the end of the
chained expression to evaluate the query:
```{r}
who_pl_lazy <- as_polars_lf(tidyr::who)
who_pl_lazy |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year, matches("^newrel(.*)_f")) |>
  arrange(iso3, year) |>
  rename_with(.fn = toupper) |>
  compute() |>
  head()
```
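Because a lazy query only describes the work to do, it can be stored in a variable and extended before it is evaluated. Here is a minimal sketch of that pattern, where `query` and `result` are illustrative names:

```{r}
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)

who_pl_lazy <- as_polars_lf(tidyr::who)

# Build the query step by step; nothing is evaluated at this point
query <- who_pl_lazy |>
  filter(year > 1990) |>
  select(iso3, year, newrel_f3544)

# The stored query can be extended later
query <- query |>
  drop_na(newrel_f3544)

# Evaluate the whole query at once
result <- compute(query)
head(result)
```
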
`tidypolars` also supports many functions from `base`, `lubridate` or `stringr`.
When these are used inside `filter()`, `mutate()` or `summarize()`, `tidypolars`
will automatically convert them to use the Polars engine under the hood. Take
a look at the vignette ["R and Polars expressions"](https://tidypolars.etiennebacher.com/articles/r-and-polars-expressions) for more information.
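
As a small illustration, the sketch below uses two `stringr` functions inside `filter()` and `mutate()` on the Polars `DataFrame`. It assumes that `stringr` is installed and that `str_detect()` and `str_to_upper()` are among the functions supported by your version of `tidypolars`. The names `who_fr` and `country_upper` are illustrative.

```{r}
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(stringr)

who_pl <- as_polars_df(tidyr::who)

who_fr <- who_pl |>
  # Keep countries whose name starts with "Fr"
  filter(str_detect(country, "^Fr")) |>
  # Add an upper-case version of the country name
  mutate(country_upper = str_to_upper(country)) |>
  select(country, country_upper, year)

head(who_fr)
```
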