.. _usagescenarios:

#############
Case Studies
#############

=============
Case Study 1
=============

Consider a scenario where someone is interested in searching for single-cell RNA-seq datasets.
In particular, the interest is in studying the retina:

::

    $ pysradb search --query "single-cell rna-seq retina"
    study_accession experiment_accession experiment_title sample_taxon_id sample_scientific_name experiment_library_strategy experiment_library_source experiment_library_selection sample_accession sample_alias experiment_instrument_model pool_member_spots run_1_size run_1_accession run_1_total_spots run_1_total_bases
    SRP299803 SRX9756769 GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq 10090 Mus musculus ATAC-seq GENOMIC other SRS7946094 GSM4995565 Illumina NovaSeq 6000 55435867 2637580797 SRR13329759 55435867 6874047508
    SRP299803 SRX9756768 GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946093 GSM4995564 Illumina NovaSeq 6000 96123725 4107807391 SRR13329758 96123725 12688331700
    SRP299803 SRX9756767 GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946092 GSM4995563 Illumina NovaSeq 6000 94345783 4056010488 SRR13329757 94345783 12453643356
    SRP299803 SRX9756766 GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946091 GSM4995562 Illumina NovaSeq 6000 99487074 4240172698 SRR13329756 99487074 13132293768
    SRP299803 SRX9756765 GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7946090 GSM4995561 Illumina NovaSeq 6000 88048461 3817540828 SRR13329755 88048461 11622396852
    SRP257758 SRX9537754 GSM4916438: Pou4f2-tdTomato/+ E17.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743995 GSM4916438 Illumina HiSeq 2500 364683840 8246658699 SRR13091939 364683840 32456861760
    SRP257758 SRX9537753 GSM4916437: Atoh7-zsGreen/lacZ E17.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743994 GSM4916437 Illumina HiSeq 2500 530456067 11895864680 SRR13091938 530456067 47210589963
    SRP257758 SRX9537752 GSM4916436: Atoh7-zsGreen/+ E17.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743993 GSM4916436 Illumina HiSeq 2500 389849416 8671923722 SRR13091937 389849416 34696598024
    SRP257758 SRX9537751 GSM4916435: Atoh7-zsGreen/lacZ E14.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743992 GSM4916435 Illumina HiSeq 2500 328878355 7875737709 SRR13091936 328878355 29270173595
    SRP257758 SRX9537750 GSM4916434: Atoh7-zsGreen/+ E14.5 scRNA-seq; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA SRS7743991 GSM4916434 Illumina HiSeq 2500 522040155 12760941656 SRR13091935 522040155 46461573795
    ERP118072 ERX3614517 NextSeq 500 sequencing; 3' mRNA-seq of protrusions and cell bodies of BJ, PC-3M, RPE-1, U-87 and WM-266.4 cells 9606 Homo sapiens OTHER TRANSCRIPTOMIC Oligo-dT ERS3920269 SAMEA6120013 NextSeq 500 5818488 43355751 ERR3619129 1457318 109897743
    ERP118072 ERX3614516 NextSeq 500 sequencing; 3' mRNA-seq of protrusions and cell bodies of BJ, PC-3M, RPE-1, U-87 and WM-266.4 cells 9606 Homo sapiens OTHER TRANSCRIPTOMIC Oligo-dT ERS3920268 SAMEA6120012 NextSeq 500 5422441 40645479 ERR3619125 1359663 102468758
    SRP288715 SRX9369597 RPE1_SS119_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591452 RPE1_SS119_p10.bam Illumina HiSeq 2000 5062938 88426773 SRR12904705 5062938 202517520
    SRP288715 SRX9369596 RPE1_SS119_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591451 RPE1_SS119_p0.bam Illumina HiSeq 2000 978835 19219630 SRR12904706 978835 39153400
    SRP288715 SRX9369595 RPE1_SS111_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591450 RPE1_SS111_p10.bam Illumina HiSeq 2000 6205827 108129733 SRR12904707 6205827 248233080
    SRP288715 SRX9369594 RPE1_SS111_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591449 RPE1_SS111_p0.bam Illumina HiSeq 2000 928703 18488436 SRR12904708 928703 37148120
    SRP288715 SRX9369593 RPE1_SS51_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591448 RPE1_SS51_p10.bam Illumina HiSeq 2000 6088168 106065537 SRR12904709 6088168 243526720
    SRP288715 SRX9369592 RPE1_SS51_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591447 RPE1_SS51_p0.bam Illumina HiSeq 2000 1624227 30610200 SRR12904710 1624227 64969080
    SRP288715 SRX9369591 RPE1_SS48_p10 9606 Homo sapiens OTHER GENOMIC other SRS7591446 RPE1_SS48_p10.bam Illumina HiSeq 2000 8117881 139408135 SRR12904711 8117881 324715240
    SRP288715 SRX9369590 RPE1_SS48_p0 9606 Homo sapiens OTHER GENOMIC other SRS7591445 RPE1_SS48_p0.bam Illumina HiSeq 2000 776140 15821200 SRR12904712 776140 31045600

By default, ``search`` returns the first 20 hits. ``SRP299803`` seems like a project of interest.
However, the information returned by the ``search`` command is fairly limited.
We want to look up more detailed information about this project:

::

    $ pysradb metadata SRP299803 | head
    study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
    SRP299803 SRX9756769 GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq 10090 Mus musculus ATAC-seq GENOMIC other PAIRED SRS7946094 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 55435867 2637580797 SRR13329759 55435867 6874047508
    SRP299803 SRX9756768 GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946093 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 96123725 4107807391 SRR13329758 96123725 12688331700
    SRP299803 SRX9756767 GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946092 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 94345783 4056010488 SRR13329757 94345783 12453643356
    SRP299803 SRX9756766 GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946091 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 99487074 4240172698 SRR13329756 99487074 13132293768
    SRP299803 SRX9756765 GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946090 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 88048461 3817540828 SRR13329755 88048461 11622396852

It is also possible to get more detailed information using the ``--detailed`` flag:

::

    $ pysradb metadata SRP299803 --detailed
    run_accession study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_total_spots run_total_bases run_alias sra_url experiment_alias source_name strain background genotype tissue/cell type molecule subtype ena_fastq_http ena_fastq_http_1 ena_fastq_http_2 ena_fastq_ftp ena_fastq_ftp_1 ena_fastq_ftp_2
    SRR13329759 SRP299803 SRX9756769 GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq GSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq 10090 Mus musculus ATAC-seq GENOMIC other PAIRED SRS7946094 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 55435867 2637580797 55435867 6874047508 GSM4995565_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/013017/SRR13329759 GSM4995565 wild type_retina C57BL/6 wild type retina http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/059/SRR13329759/SRR13329759_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/059/SRR13329759/SRR13329759_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/059/SRR13329759/SRR13329759_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/059/SRR13329759/SRR13329759_2.fastq.gz
    SRR13329758 SRP299803 SRX9756768 GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq GSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946093 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 96123725 4107807391 96123725 12688331700 GSM4995564_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra70/SRR/013017/SRR13329758 GSM4995564 Vsx2SE Δ/Δ_retina C57BL/6 Vsx2SE {delta}/{delta} retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/058/SRR13329758/SRR13329758_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/058/SRR13329758/SRR13329758_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/058/SRR13329758/SRR13329758_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/058/SRR13329758/SRR13329758_2.fastq.gz
    SRR13329757 SRP299803 SRX9756767 GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq GSM4995563: scRNA_Retina_VSX2SEKO_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946092 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 94345783 4056010488 94345783 12453643356 GSM4995563_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra79/SRR/013017/SRR13329757 GSM4995563 Vsx2SE Δ/Δ_retina C57BL/6 Vsx2SE {delta}/{delta} retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/057/SRR13329757/SRR13329757_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/057/SRR13329757/SRR13329757_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/057/SRR13329757/SRR13329757_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/057/SRR13329757/SRR13329757_2.fastq.gz
    SRR13329756 SRP299803 SRX9756766 GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq GSM4995562: scRNA_Retina_WT_Rep2; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946091 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 99487074 4240172698 99487074 13132293768 GSM4995562_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/013017/SRR13329756 GSM4995562 wild type_retina C57BL/6 wild type retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/056/SRR13329756/SRR13329756_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/056/SRR13329756/SRR13329756_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/056/SRR13329756/SRR13329756_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/056/SRR13329756/SRR13329756_2.fastq.gz
    SRR13329755 SRP299803 SRX9756765 GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq GSM4995561: scRNA_Retina_WT_Rep1; Mus musculus; RNA-Seq 10090 Mus musculus RNA-Seq TRANSCRIPTOMIC cDNA PAIRED SRS7946090 Illumina NovaSeq 6000 Illumina NovaSeq 6000 ILLUMINA 88048461 3817540828 88048461 11622396852 GSM4995561_r1 https://sra-download.ncbi.nlm.nih.gov/traces/sra72/SRR/013017/SRR13329755 GSM4995561 wild type_retina C57BL/6 wild type retina 3' RNA http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/055/SRR13329755/SRR13329755_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR133/055/SRR13329755/SRR13329755_2.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/055/SRR13329755/SRR13329755_1.fastq.gz era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR133/055/SRR13329755/SRR13329755_2.fastq.gz

Having made sure this dataset is indeed of interest, we want to save some work and
see if the processed dataset has been made available on GEO by the authors:

::

    $ pysradb srp-to-gse SRP299803
    study_accession study_alias
    SRP299803 GSE164044

So a GEO project does exist for this SRA dataset.

Note that the GEO information was also visible in the output of ``metadata --detailed``.
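The GSM ids visible in the metadata table can also be pulled out programmatically. The sketch below is a minimal illustration and not part of pysradb: it assumes the metadata has been saved as a tab-separated file (for example with the ``--saveto`` option) and that GEO-derived experiment titles start with the GSM id, as they do in the output above. The helper name ``gsm_ids_from_metadata`` and the inline ``SAMPLE_TSV`` are hypothetical.

```python
import csv
import io
import re

# Hypothetical sample: three columns of the metadata table, tab-separated.
# The values are copied from the SRP299803 output shown above.
SAMPLE_TSV = (
    "study_accession\texperiment_accession\texperiment_title\n"
    "SRP299803\tSRX9756769\tGSM4995565: scATAC_Retina_WT; Mus musculus; ATAC-seq\n"
    "SRP299803\tSRX9756768\tGSM4995564: scRNA_Retina_VSX2SEKO_Rep2; Mus musculus; RNA-Seq\n"
)

# Assumption: a GEO-derived title begins with "GSM<digits>:".
GSM_PATTERN = re.compile(r"^(GSM\d+):")


def gsm_ids_from_metadata(handle):
    """Map each experiment accession (SRX) to the GSM id leading its title."""
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        match = GSM_PATTERN.match(row["experiment_title"])
        if match:  # titles without a GSM prefix are skipped
            mapping[row["experiment_accession"]] = match.group(1)
    return mapping


if __name__ == "__main__":
    for srx, gsm in gsm_ids_from_metadata(io.StringIO(SAMPLE_TSV)).items():
        print(srx, gsm)
```

Reading columns by name avoids depending on column positions, which differ between the plain and ``--detailed`` outputs.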
Assume we were in possession of the GSM id of one of the experiments to start off with, say ``GSM4995565``.
Starting from this GSM id, we want to get the following information:

* SRP id of the project
* GSE id of the project
* SRX id of the experiment
* SRR id(s) corresponding to the experiment

Get SRP id:

::

    $ pysradb gsm-to-srp GSM4995565
    experiment_alias study_accession
    GSM4995565 SRP299803

Get GSE id:

::

    $ pysradb gsm-to-gse GSM4995565
    experiment_alias study_alias
    GSM4995565 GSE164044

Get SRX id:

::

    $ pysradb gsm-to-srx GSM4995565
    experiment_alias experiment_accession
    GSM4995565 SRX9756769

Get SRR id(s):

::

    $ pysradb gsm-to-srr GSM4995565
    experiment_alias run_accession
    GSM4995565 SRR13329759

=============
Case Study 2
=============

Our first case study covered metadata search. Next, we explore downloading datasets.
We have an SRP id to start off with: ``SRP000941``. We want to quickly check out its contents:

::

    $ pysradb metadata SRP000941 --detailed| head
    study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_name library_strategy library_source library_selection library_layout sample_accession sample_title instrument instrument_model instrument_model_desc total_spots total_size run_accession run_total_spots run_total_bases
    SRP000941 SRX056722 Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells 9606 Homo sapiens SAK270 ChIP-Seq GENOMIC ChIP SINGLE SRS184466 Illumina HiSeq 2000 Illumina HiSeq 2000 ILLUMINA 26900401 531654480 SRR179707 26900401 807012030
    SRP000941 SRX027889 Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells 9606 Homo sapiens SAK201 ChIP-Seq GENOMIC ChIP SINGLE SRS116481 Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA 37528590 779578968 SRR067978 37528590 1351029240
    SRP000941 SRX027888 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens LLH1U ChIP-Seq GENOMIC RANDOM SINGLE SRS116483 Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA 13603127 3232309537 SRR067977 13603127 489712572
    SRP000941 SRX027887 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens DM219 ChIP-Seq GENOMIC RANDOM SINGLE SRS116562 Illumina Genome Analyzer II Illumina Genome Analyzer II ILLUMINA 22430523 506327844 SRR067976 22430523 807498828

This project is a collection of multiple assays.

::

    $ pysradb metadata SRP000941 --detailed | tr -s ' ' | cut -f5 -d ' ' | sort | uniq -c
    999 Bisulfite-Seq
    768 ChIP-Seq
    1 library_strategy
    121 OTHER
    353 RNA-Seq
    28 WGS

However, we only want to download the ``RNA-Seq`` samples:

::

    $ pysradb metadata SRP000941 --detailed | grep 'study\|RNA-Seq' | pysradb download

This downloads all ``RNA-Seq`` samples from this project using ``aspera-client``, if available.
Alternatively, it can use ``wget``.

Downloading an entire project is easy:

::

    $ pysradb download -p SRP000941

Downloads are organized as ``SRP/SRX/SRR``, mirroring the hierarchy of SRA projects.
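Column values such as ``experiment_title`` contain spaces, so reading columns by name is more robust than counting whitespace-separated fields. The sketch below shows one way to tally ``library_strategy`` and to lay out paths in the ``SRP/SRX/SRR`` form. It is an illustration under stated assumptions, not pysradb's implementation: the metadata is assumed to be tab-separated (for example saved with ``--saveto``), the helper names and the ``downloads`` directory are hypothetical, and the exact file names pysradb writes are not modelled. The sample rows reuse accessions from Case Study 1.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical sample holding only the columns used here (tab-separated).
# Accessions are copied from the SRP299803 output in Case Study 1.
SAMPLE_TSV = (
    "study_accession\texperiment_accession\tlibrary_strategy\trun_accession\n"
    "SRP299803\tSRX9756769\tATAC-seq\tSRR13329759\n"
    "SRP299803\tSRX9756768\tRNA-Seq\tSRR13329758\n"
    "SRP299803\tSRX9756767\tRNA-Seq\tSRR13329757\n"
)


def read_rows(text):
    """Parse tab-separated metadata text into a list of dicts keyed by column."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def count_strategies(rows):
    """Count how many runs use each library strategy."""
    return Counter(row["library_strategy"] for row in rows)


def run_paths(rows, strategy, out_dir="downloads"):
    """Build an out_dir/SRP/SRX/SRR path for every run with the given strategy.

    Assumption: only the SRP/SRX/SRR nesting is taken from the text above;
    the out_dir default is a placeholder.
    """
    return [
        PurePosixPath(
            out_dir,
            row["study_accession"],
            row["experiment_accession"],
            row["run_accession"],
        )
        for row in rows
        if row["library_strategy"] == strategy
    ]


if __name__ == "__main__":
    rows = read_rows(SAMPLE_TSV)
    for strategy, count in count_strategies(rows).items():
        print(count, strategy)
    for path in run_paths(rows, "RNA-Seq"):
        print(path)
```

Because the header row is consumed by ``csv.DictReader``, the tally does not pick up a stray ``library_strategy`` entry the way a plain ``sort | uniq -c`` over the column does.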