When you think bioinformaticians are really dumb.
UC San Diego researchers team up with Illumina to speed-read your microbiome
The human microbiome -- the total collection of bacteria, viruses and other microorganisms living in and on your body -- has been linked to a variety of health and disease states, including obesity, allergies, asthma, and a rapidly growing list of other conditions. But as researchers try to sort out the complex relationship between microbial populations and human health and use that information to diagnose or treat disease, they are generating a deluge of microbial sequence data that first needs to be organized and analyzed.
To this end, University of California, San Diego School of Medicine's Rob Knight, PhD, and his team built a microbiome analysis platform called QIIME (pronounced "chime" and short for "Quantitative Insights Into Microbial Ecology"). This software will now be more readily accessible to hundreds of thousands of researchers around the world through BaseSpace, a cloud-based app store offered by Illumina, a San Diego-based company that develops life science tools for the analysis of genetic variation.
"Previously, we relied on personal contacts and scientific publications to spread the word about QIIME, and then users needed to download several different software packages to their own computers. Users also needed some technical programming skills to use QIIME," said Knight, professor of pediatrics and computer science and engineering. "By working with Illumina, not only will many more researchers now be able to access QIIME from the cloud, the BaseSpace interface will make it much easier for non-technical researchers to analyze their data. This advancement will significantly ease the bottleneck in a variety of human and environmental microbiome studies."
Working in #qiime #equinedata #dontlogmeoutoriwillkillyou #pickingotus #bioinformatics #hopenobodyneedthecomputer #itsgoingtobetwodays #evillaugh #soorynotsorry (at Ford Hall)
It's been an up and down computer day.
It was "have a backup drive" day in lab. So, our computer guru (you know how in some labs everyone has a set of pipettes and graduated cylinders? In my lab, we report to the computer guru, who pays us and hands out used MacBooks, keyboards and power adaptors when we ask nicely) went around today passing out 2TB drives so everyone can back up their machines before we move. He said I could have any color I wanted as long as it was black, and then he found me a red drive. So... now I have space to back up my hard drive.
My code got merged! (CELEBRATION!) Like, now people who are not me can analyze the statistical power of their microbiome data, which no one else had an approach for before, because most microbiome data don't follow a normal distribution. So you can't just apply Cohen's d and expect to get estimates that match reality.
And then I tried to get the official version of my merged code and managed to royally fuck up my Qiime install. I'm smart enough to use Qiime, but I'm not smart enough to install Qiime.
And now my rat, who is supposed to be helping with data analysis, is using his claws to stick to my chest. He's, like, hanging horizontally from my clavicle. Not that this is related. Except that maybe it's a magical position for figuring out why I no longer have a Fortran compiler, which means we can't install SciPy.
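The merged power-analysis code isn't reproduced in this post, so what follows is not that implementation. It is a minimal sketch of one generic, distribution-free alternative to a Cohen's d calculation: Monte Carlo power estimation. Simulate many studies from a skewed distribution, run a rank-based test on each, and report the fraction that reject. The log-normal generator, the Mann-Whitney U test, and the function names `mann_whitney_p` and `simulated_power` are all illustrative assumptions; real microbiome counts with many zeros would need a different generator and a tie correction.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value using the normal approximation.

    No tie or continuity correction: fine for the continuous (tie-free)
    draws simulated below, not for real count data with many zeros.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Rank sum (ranks start at 1) of the first group.
    r1 = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(n_per_group, fold_change, sigma=1.0, alpha=0.05,
                    n_sims=1000, seed=0):
    """Estimate power as the fraction of simulated studies with p < alpha.

    Abundances are drawn from a log-normal distribution (an illustrative
    assumption for skewed data); group B's median is fold_change times
    group A's.
    """
    rng = random.Random(seed)
    shift = math.log(fold_change)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.lognormvariate(0.0, sigma) for _ in range(n_per_group)]
        b = [rng.lognormvariate(shift, sigma) for _ in range(n_per_group)]
        if mann_whitney_p(a, b) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Power curve for a 2-fold difference in medians at several sample sizes.
    for n in (5, 10, 20, 40):
        print(n, simulated_power(n, fold_change=2.0))
```

Because the rejection rate is computed directly from simulated data, nothing here relies on normality. Swapping in a generator that matches your own data (for example, one resampled from a pilot study) is the main design decision.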
After lots of variations on this analysis pipeline, I've found that joining paired ends unnecessarily removes lots of sequences. Likely this is due to quality dropoff at about 250bp (fig below).
So I've concatenated the reads from both MiSeq runs (without joining) and then clipped them all at 250bp. The UPARSE documentation demands this, but it is not very straightforward in their pipeline, so I used the FASTX-Toolkit to do it.
So this script is a mix of UPARSE, QIIME, and FastX. This has resulted in keeping more sequences (especially in low yield samples) than any other method I tried.
############# commands to run interactive node on aciss cluster
qsub -q fatnodes -I
module load usearch
module load fastx_toolkit/0.0.13
module load qiime/1.8.0
printenv PATH # check to make sure lots of qiime dependencies are loaded.
# Qiime has so many dependencies, it is difficult to load them all.
# So might have to reload module version. This makes no sense.
source /usr/local/packages/Modules/setmodule 3.2.9
source /usr/local/packages/Modules/setmodule 3.2.10
################

##### Check quality - this goes into R script Pickle2014/R/quality/
# split_libraries_fastq.py -v -q 0 -i raw1/r1readTRIM.fastq -b raw1/barcodesRenamed.fastq -o splitLib1F/ -m map.txt --barcode_type 16
# split_libraries_fastq.py -v -q 0 -i raw2/r1readTRIM.fastq -b raw2/barcodesRenamed.fastq -o splitLib2F/ -m map.txt --barcode_type 16
# split_libraries_fastq.py -v -q 0 -i raw1/r2readTRIM.fastq -b raw1/barcodesRenamed.fastq -o splitLib1R/ -m map.txt --barcode_type 16
# split_libraries_fastq.py -v -q 0 -i raw2/r2readTRIM.fastq -b raw2/barcodesRenamed.fastq -o splitLib2R/ -m map.txt --barcode_type 16

# Ended up combining forward reads and just analyzing these instead of joining.
cat raw1/r1readTRIM.fastq raw2/r1readTRIM.fastq > seqs.fastq
cat raw1/barcodesRenamed.fastq raw2/barcodesRenamed.fastq > barcodes.fastq
split_libraries_fastq.py -v -q 0 --store_demultiplexed_fastq -i seqs.fastq -b barcodes.fastq -o splitLib/ -m map.txt --barcode_type 16 # -n 300

# trim to 250 length. This is not straightforward or well documented in UPARSE, so
# farm out to fastx.
fastx_trimmer -l 250 -i splitLib/seqs.fastq -o splitLib/seqs.trimmed.fastq -Q33

# get quality stats
usearch -fastq_stats splitLib/seqs.trimmed.fastq -log splitLib/seqs.stats.log

# remove low quality reads
mkdir qF
usearch -fastq_filter splitLib/seqs.trimmed.fastq -fastq_maxee 0.5 -fastaout qF/seqs.filtered.fasta

# dereplicate sequences. Last step with files separate.
mkdir deRep
usearch -derep_fulllength qF/seqs.filtered.fasta -output deRep/seqs.filtered.derep.fasta -sizeout

# filter singletons - this removes singletons - decided to do without
# mkdir filterSingles
# usearch -sortbysize deRep/seqs.filtered.derep.fasta -minsize 2 -output filterSingles/seqs.filtered.derep.mc2.fasta

# cluster OTUs
mkdir OTUs
usearch -cluster_otus deRep/seqs.filtered.derep.fasta -otus OTUs/seqs.filtered.derep.repset.fasta

# reference chimera check
mkdir chiCheck
usearch -uchime_ref OTUs/seqs.filtered.derep.repset.fasta -db scripts/gold.fa -strand plus -nonchimeras chiCheck/seqs.filtered.derep.repset.nochimeras.fasta

# label OTUs using python script from UPARSE
mkdir labelOTUs
python scripts/fasta_number.py chiCheck/seqs.filtered.derep.repset.nochimeras.fasta OTU_ > labelOTUs/seqs.filtered.derep.repset.nochimeras.otus.fasta

# match original quality filtered reads back to otus - this is with bash derep workaround.
mkdir matchOTUs
usearch -usearch_global qF/seqs.filtered.fasta -db labelOTUs/seqs.filtered.derep.repset.nochimeras.otus.fasta -strand plus -id 0.97 -uc matchOTUs/otu.map.uc

# make otu table
mkdir otuTable
python scripts/uc2otutab_mod.py matchOTUs/otu.map.uc > otu-table.txt

# convert to biom
biom convert --table-type="OTU table" -i otu-table.txt -o otu-table.biom

# **use QIIME 1.7, not 1.8** Dependency problem
# assign taxonomy
assign_taxonomy.py -t gg_13_5_otus/taxonomy/97_otu_taxonomy.txt -r gg_13_5_otus/rep_set/97_otus.fasta -i labelOTUs/seqs.filtered.derep.repset.nochimeras.otus.fasta -o assigned_taxonomy

# add taxonomy to BIOM table
biom add-metadata --sc-separated taxonomy --observation-header OTUID,taxonomy --observation-metadata-fp assigned_taxonomy/seqs.filtered.derep.repset.nochimeras.otus_tax_assignments.txt -i otu-table.biom -o otu_table.biom

# check sequencing depth.
# print_biom_table_summary.py -i otu_table.biom ## for qiime <1.8
biom summarize-table -i otu_table.biom -o otu_table_summary.txt # for qiime >=1.8
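The two read-level steps above (`fastx_trimmer -l 250`, then `usearch -fastq_filter -fastq_maxee 0.5`) can be sketched in plain Python to show what they do. The expected-error (EE) value of a read is the sum over its bases of 10^(-Q/10), and reads with EE above the threshold are dropped. This is a minimal illustration and not a replacement for either tool. The helper names, Phred+33 encoding, and the detail that reads shorter than 250 bases pass through untruncated are assumptions made for the sketch.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10), Phred+33."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def truncate_and_filter(records, length=250, max_ee=0.5):
    """Cut reads to their first `length` bases, keep those with EE <= max_ee.

    Reads shorter than `length` pass through untruncated in this sketch.
    """
    for header, seq, qual in records:
        seq, qual = seq[:length], qual[:length]
        if expected_errors(qual) <= max_ee:
            yield header, seq


def to_fasta(header, seq):
    """Turn a FASTQ header ('@id ...') and sequence into a FASTA record."""
    return ">" + header[1:] + "\n" + seq + "\n"
```

For example, `truncate_and_filter(read_fastq(open("splitLib/seqs.fastq")))` yields the reads that would survive, in order. Computing EE on the already-truncated read mirrors the order of the pipeline above, so quality dropoff after base 250 doesn't count against a read.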
I've been trying to find alternatives to the standard QIIME OTU clustering, such as UPARSE (published here).
Here is the workflow I am currently using to process sequence data using a combination of QIIME and UPARSE. Ann W put this together based on Mike Robeson's post here.
The only step that takes lots of time is the OTU table python script. No idea why it is so slow, and it might be worth rewriting that script to speed things up.
# Split libraries with QIIME
split_libraries_fastq.py -v -q 0 --store_demultiplexed_fastq -i $COMBINED/seqs.fastq -b $COMBINED/barcodes.fastq -o splitLib/ -m map.txt --barcode_type 16 # -n 300

# get quality stats
usearch -fastq_stats splitLib/seqs.fastq -log splitLib/seqs.stats.log

# remove low quality reads - trimmed short seqs - presumably didn't join correctly.
mkdir qF
usearch -fastq_filter splitLib/seqs.fastq -fastq_maxee 0.5 -fastaout qF/seqs.filtered.tmp.fasta -fastq_minlen 400 # -fastq_trunclen 296
sed 's/>/>barcodelabel=/' qF/seqs.filtered.tmp.fasta > qF/seqs.filtered.fasta

# dereplicate sequences. Last step with files separate.
mkdir deRep
usearch -derep_fulllength qF/seqs.filtered.fasta -output deRep/seqs.filtered.derep.fasta -sizeout

# filter singletons
mkdir filterSingles
usearch -sortbysize deRep/seqs.filtered.derep.fasta -minsize 2 -output filterSingles/seqs.filtered.derep.mc2.fasta

# cluster OTUs
mkdir OTUs
usearch -cluster_otus filterSingles/seqs.filtered.derep.mc2.fasta -otus OTUs/seqs.filtered.derep.mc2.repset.fasta

# reference chimera check
mkdir chiCheck
usearch -uchime_ref OTUs/seqs.filtered.derep.mc2.repset.fasta -db scripts/gold.fa -strand plus -nonchimeras chiCheck/seqs.filtered.derep.mc2.repset.nochimeras.fasta

# label OTUs using python script from UPARSE
mkdir labelOTUs
python scripts/fasta_number.py chiCheck/seqs.filtered.derep.mc2.repset.nochimeras.fasta OTU_ > labelOTUs/seqs.filtered.derep.mc2.repset.nochimeras.otus.fasta

# match original quality filtered reads back to otus - this is with bash derep workaround.
mkdir matchOTUs
usearch -usearch_global qF/seqs.filtered.fasta -db labelOTUs/seqs.filtered.derep.mc2.repset.nochimeras.otus.fasta -strand plus -id 0.97 -uc matchOTUs/otu.map.uc

# make otu table
mkdir otuTable
# python scripts/uc2otutab.py matchOTUs/otu.map.uc > otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.txt
python scripts/uc2otutab_jl.py matchOTUs/otu.map.uc > otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.txt
#### still slow - running pbs script Monday afternoon.

# convert to biom
biom convert --table-type="OTU table" -i otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.txt -o otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.biom

# assign taxonomy
assign_taxonomy.py -t gg_13_5_otus/taxonomy/97_otu_taxonomy.txt -r gg_13_5_otus/rep_set/97_otus.fasta -i labelOTUs/seqs.filtered.derep.mc2.repset.nochimeras.otus.fasta -o assigned_taxonomy

# add taxonomy to BIOM table
biom add-metadata --sc-separated taxonomy --observation-header OTUID,taxonomy --observation-metadata-fp assigned_taxonomy/seqs.filtered.derep.mc2.repset.nochimeras.otus_tax_assignments.txt -i otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.biom -o otuTable/otu_table.biom
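We don't know what makes the OTU table script slow. As a point of comparison, the conversion can be done in a single pass over the `.uc` file with dictionary counts, so run time should grow roughly linearly with the number of reads. This is a minimal sketch, not the UPARSE `uc2otutab.py` script or its modified versions. It assumes the USEARCH `.uc` layout as we understand it (tab-separated, record type in column 1, query label in column 9, target label in column 10, `H` for hits). It also assumes QIIME-style read labels such as `SampleA_17 ...`, optionally prefixed with `barcodelabel=` by the `sed` step. The helpers `sample_id`, `uc_to_otu_table` and `format_table` are hypothetical.

```python
from collections import defaultdict


def sample_id(query_label):
    """Hypothetical helper: sample ID from a QIIME split_libraries label.

    Assumes labels like 'SampleA_17 <original header>', optionally prefixed
    with 'barcodelabel=' (as the sed step produces).
    """
    token = query_label.split()[0]
    if token.startswith("barcodelabel="):
        token = token[len("barcodelabel="):]
    return token.rsplit("_", 1)[0]


def uc_to_otu_table(uc_lines):
    """One pass over a .uc file, counting reads per (OTU, sample)."""
    counts = defaultdict(lambda: defaultdict(int))
    samples = set()
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        # Column 1 is the record type; only 'H' (hit) records map a read.
        if len(fields) < 10 or fields[0] != "H":
            continue
        sample = sample_id(fields[8])  # column 9: query (read) label
        otu = fields[9]                # column 10: target (OTU) label
        counts[otu][sample] += 1
        samples.add(sample)
    return counts, sorted(samples)


def format_table(counts, samples):
    """Tab-separated OTU table: one row per OTU, one column per sample."""
    rows = ["#OTU ID\t" + "\t".join(samples)]
    for otu in sorted(counts):
        row = counts[otu]
        rows.append(otu + "\t" + "\t".join(str(row.get(s, 0)) for s in samples))
    return "\n".join(rows) + "\n"
```

Usage would look like `counts, samples = uc_to_otu_table(open("matchOTUs/otu.map.uc"))`, then writing `format_table(counts, samples)` to a text file for `biom convert`. The `#OTU ID` header row is the classic tab-separated OTU table layout.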
I've been running new Urban Air sequencing data through QIIME recently. While trying to parallelize the pick_open_reference_otus.py script, I ran into a problem where 1 of the 12 parallel jobs failed. It turns out a poller.py script runs in the background, waiting for all the jobs to finish and write their output to a specific directory before the workflow moves on to subsequent steps. So if one of the jobs fails, the poller keeps looking for those output files indefinitely (see more details here).
As those more detailed instructions say, you just need to extract and rerun the commands for the failed job in the file ending with _jobs.txt found in the output directory for the parallel script. There is also a script to help you figure out which job failed: identify_missing_files.py.
For example, this is part of a PBS script to rerun the failed job:
##execute program here:
pick_otus.py -i picked_otus/prefilter_otus/POTU_B57u_/POTU_B57u_.10.fasta -r /home4/adamea/greengenes_13_5/gg_13_5_otus/rep_set/97_otus.fasta -m uclust_ref --suppress_new_clusters -o picked_otus/prefilter_otus/POTU_B57u_ -s 0.6 --max_accepts 20 --max_rejects 500 --stepwords 20 --w 12
mv picked_otus/prefilter_otus/POTU_B57u_/POTU_B57u_.10_otus.log picked_otus/prefilter_otus//POTU_B57u_.10_otus.log
mv picked_otus/prefilter_otus/POTU_B57u_/POTU_B57u_.10_otus.txt picked_otus/prefilter_otus//POTU_B57u_.10_otus.txt
mv picked_otus/prefilter_otus/POTU_B57u_/POTU_B57u_.10_failures.txt picked_otus/prefilter_otus//POTU_B57u_.10_failures.txt
mv picked_otus/prefilter_otus/POTU_B57u_/POTU_B57u_10_clusters.uc picked_otus/prefilter_otus//POTU_B57u_10_clusters.uc
exit
So next time you're waiting for a large parallel QIIME script to finish, make sure it's not just one failed job that's holding you back.
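QIIME's identify_missing_files.py is the supported way to find the failed job. The sketch below only illustrates the idea behind the check. It assumes each line of the `_jobs.txt` file is one job whose commands are separated by semicolons, and that the job's final `mv SRC DST` commands name the output files the poller waits for, as in the rerun command above. It reports every job with at least one missing destination. The function names are hypothetical, and the naive split on `;` would break if a semicolon ever appeared inside a quoted argument.

```python
import os
import shlex


def expected_outputs(job_line):
    """Destination paths of the 'mv SRC DST' commands in one job line."""
    outputs = []
    for command in job_line.split(";"):
        parts = shlex.split(command)
        if len(parts) == 3 and parts[0] == "mv":
            outputs.append(parts[2])
    return outputs


def failed_jobs(job_lines, exists=os.path.exists):
    """Return (line_index, job_line) for jobs with any missing output.

    Jobs without any 'mv' step are never flagged by this check.
    """
    failed = []
    for i, line in enumerate(job_lines):
        line = line.strip()
        if not line:
            continue
        if any(not exists(path) for path in expected_outputs(line)):
            failed.append((i, line))
    return failed
```

To use it, read the lines of the `_jobs.txt` file in the parallel script's output directory and pass them to `failed_jobs`. Each returned line holds the commands to paste into a PBS script like the one above. The `exists` parameter makes the check easy to test without touching the filesystem.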