--- title: "Getting started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{css, echo=FALSE} .custom_note { border: solid 3px #08505e; background-color: #0b788e; padding: 5px; margin-bottom: 10px; border-radius: 3px; } .custom_note > p, .custom_note > p > code { color: white; } ``` The first thing to do before using `tidypolars` is to get some data as a Polars `DataFrame` or `LazyFrame`. You can read files with `polars::pl$read_*()` functions (to import them as `DataFrame`s) or with `polars::pl$scan_*()` functions (to import them as `LazyFrame`s). `polars` can read various file formats, such as CSV, Parquet, or JSON. You could also read data with other packages and then convert it with `as_polars_df()` (or `as_polars_lf()` if you want to make it a `LazyFrame`).

Note: as_polars_df() and as_polars_lf() are merely convenience functions to quickly convert data to a polars object, which is useful for showcase purposes. However, converting data from R to polars has some cost. In real-life usecases, be sure to load the data with the pl\$scan_\*() or the pl\$read_\*() functions.

Here, we're going to use the `who` dataset that is available in the `tidyr` package. I import it both as a classic R `data.frame` and as a Polars `DataFrame` so that we can easily compare `dplyr` and `tidypolars` functions. ```{r setup} library(polars) library(tidypolars) library(dplyr, warn.conflicts = FALSE) library(tidyr, warn.conflicts = FALSE) who_df <- tidyr::who who_pl <- as_polars_df(tidyr::who) ``` `tidypolars` provides methods for `dplyr` and `tidyr` S3 generics. In simpler words, it means that you can use the same functions on a Polars `DataFrame` or `LazyFrame` as in a classic `tidyverse` workflow and it should just work (if it doesn't, please [open an issue](https://github.com/etiennebacher/tidypolars/issues)). Note that you still need to load `dplyr` and `tidyr` in your code. Here's an example of some `dplyr` and `tidyr` code on the classic R `data.frame`: ```{r} who_df |> filter(year > 1990) |> drop_na(newrel_f3544) |> select(iso3, year, matches("^newrel(.*)_f")) |> arrange(iso3, year) |> rename_with(.fn = toupper) |> head() ``` We can simply use our Polars dataset instead: ```{r} who_pl |> filter(year > 1990) |> drop_na(newrel_f3544) |> select(iso3, year, matches("^newrel(.*)_f")) |> arrange(iso3, year) |> rename_with(.fn = toupper) |> head() ``` If you use Polars lazy API, you need to call `compute()` at the end of the chained expression to evaluate the query: ```{r} who_pl_lazy <- as_polars_lf(tidyr::who) who_pl_lazy |> filter(year > 1990) |> drop_na(newrel_f3544) |> select(iso3, year, matches("^newrel(.*)_f")) |> arrange(iso3, year) |> rename_with(.fn = toupper) |> compute() |> head() ``` `tidypolars` also supports many functions from `base`, `lubridate` or `stringr`. When these are used inside `filter()`, `mutate()` or `summarize()`, `tidypolars` will automatically convert them to use the Polars engine under the hood. Take a look at the vignette ["R and Polars expressions"](https://tidypolars.etiennebacher.com/articles/r-and-polars-expressions) for more information.