--- title: "NCBI" output: rmarkdown::html_vignette description: > This vignette describes how webseq can interact with the NCBI databases. vignette: > %\VignetteIndexEntry{NCBI} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r} library(webseq) ``` ## Download genome assemblies Download genbank file for GCF_003007635.1. The function will access files within this directory: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/007/635/ Let's download GCF_003007635.1_ASM300763v1_genomic.gbff.gz! ```{r, eval = FALSE} ncbi_download_genome("GCF_003007635.1", type = "genomic.gbff") ``` If we try to download it again, the function will indicate that the file is already downloaded. ```{r, eval = FALSE} ncbi_download_genome("GCF_003007635.1", type = "genomic.gbff") ``` ## Download metadata Let's download some metadata for this assembly! ```{r, eval = FALSE} assembly_meta <- ncbi_get_meta("GCF_003007635.1", db = "assembly") ``` Using this metadata we can find the BioSample ID associated with the assembly. Let's use this ID to get the BioSample UID of the sample within the NCBI BioSample database. ```{r, eval = FALSE} biosample_uid <- ncbi_get_uid(assembly_meta$biosample, db = "biosample") biosample_uid ``` And then get the metadata itself ```{r, eval = FALSE} biosample_meta <- get_meta(biosample_uid$uid, db = "biosample") biosample_meta ``` ## Parse metadata files Accessing metadata for one assembly at a time can take quite a while if you have a large number fo queries. However, if you want to access metadata for all hits of a search term, you can follow a hybrid approach: download the metadata manually and parse it with webseq. Let's download assembly metadata for all Thiobacillus denitrificans genomes! NCBI link: https://www.ncbi.nlm.nih.gov/assembly/?term=Thiobacillus+denitrificans upper right corner -> send to -> file -> format = xml -> create file Download the file and then parse it. ```{r, eval = FALSE} ncbi_parse_assembly_xml("assembly_summary.xml") ``` Let's download biosample metadata as well for all Thiobacillus denitrificans samples! NCBI link: https://www.ncbi.nlm.nih.gov/biosample/?term=Thiobacillus+denitrificans upper right corner -> send to -> file -> format = full (text) -> create file Download the file and then parse it. ```{r, eval = FALSE} ncbi_parse_biosample_txt("biosample_summary.txt") ```