| Title: | Open-Access Computational Biology Datasets |
|---|---|
| Description: | Efficiently access the 'Bedrock Bio' library of open-access computational biology datasets. Lazily query datasets backed by 'DuckDB' and 'Apache Iceberg', with support for predicate pushdown and column projection to the cloud storage backend. This enables quick, iterative access to otherwise massive, unwieldy datasets without downloading them in full. See <https://bedrock.bio> for available datasets and documentation. |
| Authors: | Liam Abbott [aut, cre, cph] |
| Maintainer: | Liam Abbott <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.3.1 |
| Built: | 2026-05-18 18:13:53 UTC |
| Source: | https://github.com/bedrock-bio/bedrock-bio-client |
Describe a table's metadata, citation, and columns
describe_table(name)
| name | Table identifier (e.g., "ukb_ppp.pqtls") |
|---|---|
A named list with name, description, citation, source_url, license, and columns.
    ## Not run:
    library(bedrockbio)
    info <- describe_table("ukb_ppp.pqtls")
    info$name
    ## End(Not run)
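As a rough sketch of how the returned list might be used, the snippet below builds a short text summary from a few of the documented fields. The helper `format_table_summary()` is hypothetical and not part of the package, and it assumes that `name`, `license`, and `source_url` are single strings. The structure of `columns` is not documented here, so the sketch only prints it with `str()`.

```r
# Hypothetical helper (not part of bedrockbio): summarise a few of the
# fields that describe_table() is documented to return.
# Assumption: name, license, and source_url are length-one character values.
format_table_summary <- function(info) {
  paste0(
    info$name, "\n",
    "  License: ", info$license, "\n",
    "  Source:  ", info$source_url
  )
}

# The live call needs the package and network access, so it only runs in an
# interactive session where bedrockbio is installed.
if (interactive() && requireNamespace("bedrockbio", quietly = TRUE)) {
  info <- bedrockbio::describe_table("ukb_ppp.pqtls")
  cat(format_table_summary(info), "\n")
  str(info$columns)  # no particular structure is assumed for `columns`
}
```

Keeping the formatting in a separate function means it can be checked against a hand-built list without contacting the remote catalogue.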
List available tables in the Bedrock Bio library
list_tables()
A character vector of table identifiers
    ## Not run:
    library(bedrockbio)
    list_tables()
    ## End(Not run)
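Because the result is a plain character vector, it can be filtered with base R. The sketch below assumes identifiers follow the dotted `prefix.table` form seen in the examples on this page (such as "ukb_ppp.pqtls"); that convention is an inference, and `tables_with_prefix()` is a hypothetical helper rather than a package function.

```r
# Hypothetical helper (not part of bedrockbio): keep identifiers that start
# with "<prefix>.".
# Assumption: identifiers use a dotted "prefix.table" form.
tables_with_prefix <- function(tables, prefix) {
  tables[startsWith(tables, paste0(prefix, "."))]
}

# The live call needs the package and network access, so it only runs in an
# interactive session where bedrockbio is installed.
if (interactive() && requireNamespace("bedrockbio", quietly = TRUE)) {
  all_tables <- bedrockbio::list_tables()
  print(tables_with_prefix(all_tables, "ukb_ppp"))
}
```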
Lazily query a table
load_table(name)
| name | Table identifier (e.g., "ukb_ppp.pqtls") |
|---|---|
A lazy tbl backed by DuckDB, compatible with dplyr verbs.
Use describe_table() to see a table's partition columns and the allowed
values for each column; filtering on partition columns gives the fastest reads.
    ## Not run:
    library(bedrockbio)
    library(dplyr)
    df <- load_table("dbsnp.vcf") |>
      filter(assembly == "GRCh38", chromosome == "22") |>
      select(rsid, position, ref_allele, alt_allele) |>
      head(5) |>
      collect()
    ## End(Not run)
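One way to see the lazy behaviour is to build the query first and inspect it before collecting. The sketch below wraps the filter and projection from the example above in a hypothetical helper, `first_variants()`, which is not part of the package. It assumes the object returned by load_table() behaves like a dbplyr-style lazy table, so that `show_query()` prints the generated SQL; that is an assumption based on the "lazy tbl backed by DuckDB" description, not something confirmed here.

```r
library(dplyr)

# Hypothetical helper (not part of bedrockbio): build the query without
# executing it. `!!` injects the argument values so the same code works on
# a lazy table and on an ordinary data frame.
first_variants <- function(tbl, assembly_id, chrom, n = 5) {
  tbl |>
    filter(assembly == !!assembly_id, chromosome == !!chrom) |>
    select(rsid, position, ref_allele, alt_allele) |>
    head(n)
}

# The live call needs the package and network access, so it only runs in an
# interactive session where bedrockbio is installed.
if (interactive() && requireNamespace("bedrockbio", quietly = TRUE)) {
  query <- first_variants(bedrockbio::load_table("dbsnp.vcf"), "GRCh38", "22")
  show_query(query)     # assumed to print SQL; no rows are fetched yet
  df <- collect(query)  # runs the query and returns the rows locally
}
```

Filtering on `assembly` and `chromosome` before `collect()` follows the advice above about partition columns, on the assumption that these are partition columns for this table; describe_table() is the place to confirm that.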