Loading...
Loading...
Loading...
---
title: '454 SOP'
redirect_from: '/wiki/454_SOP'
---
**NOTE:** Although this is an SOP, it is something of a work in progress
and continues to be modified as we learn more. If you are using this
protocol in a paper, you must cite the Schloss et al. 2011 PLoS ONE
paper and cite the date you accessed this page:
Schloss PD, Gevers D, Westcott SL. (2011). Reducing the effects of PCR
amplification and sequencing artifacts on 16S rRNA-based studies. PloS
ONE. 6:e27310.
This goal of this tutorial is to demonstrate the standard operating
procedure (SOP) that the Schloss lab uses to process their 16S rRNA gene
sequences that are generated by 454 pyrosequencing. Throughout the
tutorial several alternative approaches will be mentioned along with why
we might deviate from the SOP to use these approaches. This SOP is
largely the product of a series of manuscripts that we have published
and users are advised to consult these for more details and background
data. The workflow is being divided into several parts shown here in the
table of contents for the tutorial:
## Logistics
The first step of the tutorial is to understand how we typically set up
a plate of barcodes for sequencing. First, you can obtain a set of
barcodes from multiple sources. We utilize the "Broad" barcodes that
were used for the HMP 16S rRNA gene sequencing effort. You can find
these barcodes and primers to sequence the V13, V35, and V69 regions
online. Next, it is worth noting that the sequencing is done in the
"reverse direction" - e.g. starting at the 3' end of the V5 region,
we sequence back towards the 5' end of the V3 region. We picked the V35
region because we liked these primers the best for the types of samples
that we generally process (i.e. human and mouse). You may choose to use
another region based on the expected biodiversity of your samples.
Second, this set of barcodes will allow us to simultaneously sequence 96
samples. A priori, we dedicate two of these barcodes to controls. The
first is for resequencing a mock community made up of a defined
consortia of 16S rRNA gene sequences where we know the true sequence.
This is helpful in assessing sequencing errors and drift in the process.
The second is a "generous donor" sample, which is a representative
sample (e.g. stool) that we resequence on each plate. This provides a
more realistic understanding of how processing drift might be affecting
our analysis. Finally, we generally utilize half of a 454 plate for each
set of 96 barcodes. A typical plate has two halves (duh) and it is not
advisable to put the same barcode-primer combination on these two halves
if they are used for different samples as there is leakage across the
gasket.
Starting out we need to first determine, what was our question? The
Schloss lab is interested in understanding the effect of normal
variation in the gut microbiome on host health. To that end we collected
fresh feces from mice on a daily basis for 365 days post weaning (we're
accepting applications). During the first 150 days post weaning (dpw),
nothing was done to our mice except allow them to eat, get fat, and be
merry. We were curious whether the rapid change in weight observed
during the first 10 dpw affected the stability microbiome compared to
the microbiome observed between days 140 and 150. We will address this
question in this tutorial using a combination of OTU, phylotype, and
phylogenetic methods. To make this tutorial easier to execute, we are
providing only part of the data - you are given the flow files for one
animal at 10 time points (5 early and 5 late). In addition, to
sequencing samples from mice fecal material, we resequenced a mock
community composed of genomic DNA from 21 bacterial and archaeal
strains. We will use the 10 fecal samples to look at how to analyze
microbial communities and the mock community to measure the error rate
and its effect on other analyses.
When we get our sequences back they arrive as an sff file. Often times
people will only get back a fasta file or they may get back the fasta
and qual files. You really need the sff file or at least the fasta,
qual, and flow file data. If your sequence provider can't or won't
provide these data find a new sequence provider. It is also critical to
know the barcode and primer sequence used for each sample. We have heard
reports from many users that their sequence provider has already trimmed
and quality trimmed the data for them. Ahemm\... don't trust them. If
your sequence provider won't give you your primer or barcode sequences,
move on and use someone else. For this tutorial you will need several
sets of files. To speed up the tutorial we provide some of the
downstream files that take awhile to generate (e.g. the output of
shhh.flows):
- [ example data from schloss lab](https://mothur.s3.us-east-2.amazonaws.com/wiki/sopdata.zip) that
will be used with this tutorial. It was extracted from the very
large [ full sff file](https://mothur.s3.us-east-2.amazonaws.com/wiki/sop_sff.bz2)
- [ lookup data for use with shhh.flows](/wiki/Lookup_files)
- [ SILVA-based bacterial reference
alignment](https://mothur.s3.us-east-2.amazonaws.com/wiki/silva.bacteria.zip)
- [ mothur-formatted version of the RDP training set
(v.9)](https://mothur.s3.us-east-2.amazonaws.com/wiki/trainset9_032012.pds.zip)
It is generally easiest to decompress these files and to then move the
contents of the Trainset9\_032012.pds and the silva.bacteria folders
into the Schloss\_SOP folder. You will also want to move the contents of
the mothur executable folder there as well. If you are a sysadmin wiz
(or novice) you can probably figure out how to put mothur in your path,
but this will get you what you need for now.
In addition, you probably want to get your hands on the following\...
- [textwranger](https://www.barebones.com/products/textwrangler/) /
[emacs](https://www.gnu.org/software/emacs/) /
[vi](https://www.vim.org/) / or some other text editor
- [R](https://www.r-project.org/), Excel, or another program to graph
data
- Adobe Illustrator, Safari, or [inkscape](https://inkscape.org/)
- [figtree](https://github.com/rambaut/figtree/)
It is generally easiest to use the "current" option for many of the
commands since the file names get very long. Because this tutorial is
meant to show people how to use mothur at a very nuts and bolts level,
we will only selectively use the current option to demonstrate how it
works. Generally, we will use the full file names.
## Getting started
Because of the large size of the original sff file (\~800 MB) and that
the data are not published, the folder you pulled down above does not
contain the sff file. Instead we are giving you 21 of the 96 flow files
that were outputted from the first several commands here. This tutorial
will tell you when to pick up with the data in the folder\...
As mentioned above we get data in the form of an sff file and will use
mothur's implementation of [sffinfo](/wiki/sffinfo) to extract the
fasta, qual, and flow data from the binary file:
mothur > sffinfo(sff=GQY1XT001.sff, flow=T)
Sometimes it is not possible to get the sff file because your sequence
provider is sketchy or because you are pulling data down from some other
scientist's study who didn't provide the data. In those cases you may
need to skip that step. Regardless, let's see what our sequences look
like that are in the fasta file usig the
[summary.seqs](/wiki/summary.seqs) command:
mothur > summary.seqs(fasta=GQY1XT001.fasta)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 40 40 0 2 1
2.5%-tile: 1 117 117 0 4 19085
25%-tile: 1 455 455 0 4 190844
Median: 1 513 513 0 5 381687
75%-tile: 1 540 540 0 5 572530
97.5%-tile: 1 559 559 1 6 744289
Maximum: 1 1041 1041 28 31 763373
Mean: 1 471 471 0 4
# of Seqs: 763373
This is the sequence profile for the entire half plate. In the flow
files we have provided to you, there are a total of \~69000 sequences.
## Reducing sequencing error
### Using flowgrams
Our SOP will use the [shhh.flows](/wiki/shhh.flows) command, which
is the mothur implementation of the PyroNoise component of the
AmpliconNoise suite of programs. Others that don't have the
computational capacity or don't have their flow data may want to jump
ahead to the trim.seqs section of the tutorial. The rest of the SOP will
assume that you are using the SOP approach.
To run shhh.flows, we need to process our flow data to do several
things. First, we need to separate each flowgram according to the
barcode and primer combination. We also need to make sure that our
sequences are a minimum length and we need to cap the the number of
flowgrams to that length. The default of
[trim.flows](/wiki/trim.flows) is 450 flows. If you are using GSFLX
data you will probably want to play around with something in the order
of 300-350 flows. We will also allow for 1 mismatch to the barcode and 2
mismatches to the primer. Running [trim.flows](/wiki/trim.flows)
will make use of the [oligos\_file](/wiki/Oligos_File) that is
provided with the SOP Data that you pulled down from the wiki.
mothur > trim.flows(flow=GQY1XT001.flow, oligos=GQY1XT001.oligos, pdiffs=2, bdiffs=1)
This will create 96 separate trim.flows and scrap.flows files as well as
a file called GQY1XT001.flow.files, which holds the names of each of the
trim.flows files. The folder you have pulled down has 21 of these flow
files along with a file called GQY1XT001.flow.files. Next we will want
to run [shhh.flows](/wiki/shhh.flows) to de-noise our sequence
data. If you don't want take the time to run shhh.flows, you can use
the processed files that are in the folder you pulled down above.
shhh.flows can be run several ways depending on your hardware or
operating system. The first way is to use the precomputed output for
this dataset that came with the files you downloaded. Clearly this
won't work for your own data :). Alternatively, we can stay within
mothur and run the following:
mothur > shhh.flows(file=GQY1XT001.flow.files)
It took me about 17 minutes to run shhh.flows on the 96 samples
in GQY1XT001.flow.files with 16 processors. Note that if you are using
data recently generated on the new 454 platform and chemistry there is a
different flow order. If this is your case, you'll want to include
order=B in trim.flows and shhh.flows. If you're using IonTorrent
(why!?), you'll want to use order=I.
The two important files from running shhh.flows are the shhh.fasta and
shhh.names. You will now feed these into
[trim.seqs](/wiki/trim.seqs) to actually remove the barcode and
primer sequences, make sure everything is 200 bp long, remove sequences
with homoopolymers longer than 8 bp, and get the reverse complement for
each sequence. This will also create a new [count file](/wiki/count_file), which maps counts of redundant
sequences to a unique sequence and indicates what group or sample each
sequence came from:
mothur > trim.seqs(fasta=GQY1XT001.shhh.fasta, name=GQY1XT001.shhh.names, oligos=GQY1XT001.oligos, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T)
Lets see what the new fasta data look like:
mothur > summary.seqs(fasta=GQY1XT001.shhh.trim.fasta, count=GQY1XT001.shhh.trim.count_table)
To demonstrate the current option, notice that you can also run this
command as follows and get the same output:
mothur > summary.seqs(count=current)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 247 247 0 3 1
2.5%-tile: 1 255 255 0 4 1725
25%-tile: 1 261 261 0 4 17244
Median: 1 266 266 0 4 34488
75%-tile: 1 274 274 0 5 51732
97.5%-tile: 1 292 292 0 6 67251
Maximum: 1 312 312 0 8 68975
Mean: 1 268.629 268.629 0 4.36483
# of unique seqs: 21886
total # of seqs: 68975
If you are ever curious what is "current" and what isn't, you can run
the [get.current](/wiki/get.current) command:
mothur > get.current()
### Using quality scores
If for some reason you are unable to run shhh.flows with your data, a
good alternative is to use the [trim.seqs](/wiki/trim.seqs) command
using a 50-bp sliding window and to trim the sequence when the average
quality score over that window drops below 35. Our results suggest that
the sequencing error rates by this method are very good, but not quite
as good as by shhh.flows and that the resulting sequences tend to be a
bit shorter.
mothur > trim.seqs(fasta=GQY1XT001.fasta, oligos=GQY1XT001.oligos, qfile=GQY1XT001.qual, maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50)
If you are in a bind and do not have access to the quality score data
the following will get you through\...
mothur > trim.seqs(fasta=GQY1XT001.fasta, oligos=GQY1XT001.oligos, maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, keepfirst=200)
Basically what we've found is that the sequence quality really nose
dives after 250 bp for Titanium data. Please send all complaints to 454,
not me. Let's take a look at what the sequences look like from this
approach:
mothur > summary.seqs(fasta=GQY1XT001.trim.fasta)
Again, you will have fewer sequences and they will be shorter than those
obtained by the shhh.flows approach.
## Processing improved sequences
Again, this SOP uses the output from the shhh.flows approach. If you
used the trim.seqs approach, you will need to adjust the file names
appropriately. The next step is to simplify the dataset by working with
only the unique sequences. We aren't chucking anything here, we're
just making the life of your CPU and RAM a bit easier. We will do this
with the [unique.seqs](/wiki/unique.seqs) command. Because we
already have a count file going, we will need to provide that as input
with our fasta sequence:
mothur > unique.seqs(fasta=GQY1XT001.shhh.trim.fasta, count=GQY1XT001.shhh.trim.count_table)
Looking at the fasta data we see\...
mothur > summary.seqs(fasta=GQY1XT001.shhh.trim.unique.fasta, count=GQY1XT001.shhh.trim.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 247 247 0 3 1
2.5%-tile: 1 255 255 0 4 1725
25%-tile: 1 261 261 0 4 17244
Median: 1 266 266 0 4 34488
75%-tile: 1 274 274 0 5 51732
97.5%-tile: 1 292 292 0 6 67251
Maximum: 1 312 312 0 8 68975
Mean: 1 268.629 268.629 0 4.36483
# of unique seqs: 14255
total # of seqs: 68975
We have basically simplified the dataset by about 5-fold. This means
that a number of the downstream analyses will be 5 to 25 times faster
than if we were working with the unique data. Next, we need to generate
an alignment of our data using the [align.seqs](/wiki/align.seqs)
command by aligning our data to the SILVA-compatible [alignment
database](/wiki/alignment_database) reference alignment. I prefer
the [silva reference alignment](/wiki/silva_reference_files),
for the reasons I articulated in a recent [PLoS Computational
Biology](https://www.ncbi.nlm.nih.gov/pubmed/20011594) paper that I
published. For now, if your computer has less than 2 GB of RAM you
should probably stick with the [ greengenes reference
alignment](https://mothur.s3.us-east-2.amazonaws.com/wiki/greengenes.alignment.zip) and tell your PI
to [order you some more
RAM](https://www.google.com/search?client=safari&rls=en&q=ram&ie=UTF-8&oe=UTF-8#q=ram&hl=en&client=safari&rls=en&prmd=ivnsr&source=univ&tbs=shop:1&tbo=u&sa=X&ei=ywtoTbuxOJC6tgftipHmAw&ved=0CHwQrQQ&biw=1290&bih=1468&bav=on.1,or.&fp=61ddd4f1c4c47812).
mothur > align.seqs(fasta=GQY1XT001.shhh.trim.unique.fasta, reference=silva.bacteria.fasta)
Alternatively, you can run it as\...
mothur > align.seqs(reference=silva.bacteria.fasta)
I takes about 3 seconds to run on my MacBook Pro. Taking a look at the
newly aligned sequences we see:
mothur > summary.seqs(fasta=GQY1XT001.shhh.trim.unique.align, count=GQY1XT001.shhh.trim.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 15695 27583 247 0 3 1
2.5%-tile: 16116 27659 255 0 4 1725
25%-tile: 16312 27659 261 0 4 17244
Median: 16352 27659 266 0 4 34488
75%-tile: 16421 27659 274 0 5 51732
97.5%-tile: 21280 27659 292 0 6 67251
Maximum: 21304 28429 312 0 8 68975
Mean: 17044 27659 268 0 4
# of unique seqs: 14255
total # of seqs: 68975
We can see that pretty much all of our sequences end at position 27659
of the alignment space (the full alignment is 50,000 columns long). We
also see that 97.5% of our sequences are at least 255 bp long.
To make sure that all of the sequences overlap in the same alignment
space we need to remove those sequences that are outside the desired
range using the [screen.seqs](/wiki/screen.seqs) command. There are
several ways to run this command that are perfectly acceptable. The goal
is to think about and then optimize the criteria so that you have as
many long sequences as possible - these two factors are inversely
related to each other. Our preferred approach is to set the start and
end positions. These parameters indicate the position by which all
sequences must start by and end after. Setting these will allow you to
dictate the alignment coordinates of your sequences so that they all
overlap. We prefer this to setting a minimum length because sequences
(especially in the V1 region) vary in length when they cover the same
region. We can do this as follows:
mothur > screen.seqs(fasta=GQY1XT001.shhh.trim.unique.align, count=GQY1XT001.shhh.trim.count_table, end=27659, optimize=start, criteria=95)
Optimizing start to 21274
What this is doing is to require all sequences to end after position
27659 and it will select a start position such that 95% of your
sequences start before that position. You can also "hard code" the
start value or alter the criteria value to have more short sequences or
fewer long sequences. Play with this a bit and see what options you
think work best for your dataset.
mothur > summary.seqs(fasta=GQY1XT001.shhh.trim.unique.good.align, count=GQY1XT001.shhh.trim.good.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 15695 27659 252 0 3 1
2.5%-tile: 15969 27659 258 0 4 1617
25%-tile: 16310 27659 262 0 4 16168
Median: 16352 27659 269 0 4 32336
75%-tile: 16421 27659 275 0 5 48504
97.5%-tile: 21274 27659 292 0 6 63055
Maximum: 21274 28429 312 0 8 64671
Mean: 16763 27659 269 0 4
# of unique seqs: 13839
total # of seqs: 64671
We see that all of our sequences start by position 21274 and end after
position 27659. Also, all of the sequences are at least 249 bp in
length. The sequence that is 312 bp is suspicious, but oh well, let's
move on. Next, we need to filter our alignment so that all of our
sequences only overlap in the same region and to remove any columns in
the alignment that don't contain data. We do this by running the
[filter.seqs](/wiki/filter.seqs) command. Sometimes people run this
and find that they don't have anything left; this is typically because
of problems in the screen.seqs step above.
mothur > filter.seqs(fasta=GQY1XT001.shhh.trim.unique.good.align, vertical=T, trump=.)
In this command trump=. will remove any column that has a "."
character, which indicates missing data. The vertical=T option will
remove any column that contains exclusively gaps. We should see that our
alignment is now 438 columns long. That's another pretty significant
reduction in data that we have to process, plus it makes our analysis
more robust. Because we forced everything to the same alignment space
with no overhangs, we probably have a number of sequences that are now
redundant over this region. Let's run unique.seqs again to further
simplify the dataset:
mothur > unique.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.fasta, count=GQY1XT001.shhh.trim.good.count_table)
This will reduce the number of sequences from 13839 to 10740. The final
step we can take to reduce our sequencing error is to use the
[pre.cluster](/wiki/pre.cluster) command to merge sequence counts
that are within 2 bp of a more abundant sequence. As a rule of thumb we
use a difference of 1 bp per 100 bp of sequence length:
mothur > pre.cluster(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.count_table, diffs=2)
This implementation of the command will split the sequences by group and
then within each group it will pre-cluster those sequences that are with
1 or 2 bases of a more abundant sequence. It will then merge all the
sequences back into one file that is outputted as
GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.fasta and a
count file that is
GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.count_table. Let's
see what we have left
mothur > summary.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 438 247 0 3 1
2.5%-tile: 1 438 254 0 4 1617
25%-tile: 1 438 256 0 4 16168
Median: 1 438 258 0 4 32336
75%-tile: 1 438 258 0 5 48504
97.5%-tile: 1 438 258 0 6 63055
Maximum: 2 438 264 0 8 64671
Mean: 1 438 257 0 4
# of unique seqs: 7528
total # of seqs: 64671
We still have the same total number of sequences, but we now have 7528
unique sequences.
## Removing chimeras
To this point we have been concerned with removing sequencing errors.
The next thing we need to do is to identify chimeras. mothur has a
number of methods for [ tools for detecting
chimeras](/wiki/chimera.vsearch). Our preferred method is to use
[chimera.vsearch](/wiki/chimera.vsearch) using the sequences as their
own reference:
mothur > chimera.vsearch(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.count_table, dereplicate=t)
The above method will first split the sequences by group and then check
each sequence within a group using the more abundant sequences as
reference sequences. This enables us to parallelize the entire process
by putting each group onto a separate processor. The upside of this
approach is that you get to use your more abundant sequences as the
reference database. The idea is that chimeras should be rarer than their
more abundant parent sequences. By default dereplicate=f, meaning if
[chimera.vsearch](/wiki/chimera.vsearcg) calls a sequence as chimeric
in one group, it considers it a chimera in all samples and will flag all
for removal. Setting dereplicate=t will force mothur to only remove sequences from the samples they were found to be chimeric in.
The chimera.vsearch command will automatically remove the chimeras from your files for you.
Let's see what we've got left\...
mothur > summary.seqs(count=current)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 438 248 0 3 1
2.5%-tile: 1 438 254 0 4 1324
25%-tile: 1 438 256 0 4 13231
Median: 1 438 258 0 4 26462
75%-tile: 1 438 258 0 4 39692
97.5%-tile: 1 438 258 0 6 51599
Maximum: 2 438 264 0 8 52922
Mean: 1 438 257 0 4
# of unique seqs: 2481
total # of seqs: 52922
## Removing "contaminants"
There is one more step we can take to improve our data quality. Because
chloroplasts and mitochondria are generally considered to be "former"
bacteria and not an active component to a microbial community, the next
thing we will do is to remove sequences affiliated with organelles from
our dataset. To do this, we first need to classify our sequences using
the mothur version of the "Bayesian" classifier. We can do this with
the [classify.seqs](/wiki/classify.seqs) command using the RDP
reference files you downloaded above:
mothur > classify.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.count_table, template=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax)
Now let's use the [remove.lineage](/wiki/remove.lineage) command
to remove those sequences that classified as "Chloroplast",
"Mitochondria", or "unknown" (those sequences that could not be
classified at the Kingdom level) as well as archaeal and eukaryotic
16S/18S rRNAs, since our primers are not designed to amplify these
populations:
mothur > remove.lineage(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.count_table, taxonomy=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pds.wang.taxonomy, taxon=Mitochondria-Chloroplast-Archaea-Eukaryota-unknown)
mothur > summary.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 438 248 0 3 1
2.5%-tile: 1 438 254 0 4 1324
25%-tile: 1 438 256 0 4 13231
Median: 1 438 258 0 4 26461
75%-tile: 1 438 258 0 4 39691
97.5%-tile: 1 438 258 0 6 51598
Maximum: 2 438 264 0 8 52920
Mean: 1 438 257 0 4
# of unique seqs: 2479
total # of seqs: 52920
While not a huge reduction, we see that there was 1 Chloroplast and 1
Mitochondrial sequence in the dataset. Later we will come back and
assess how good our data actually look, but let's kick that down the
road and get going with the analysis. If you have a good reason to think
that some sequences are contaminants, you can use a similar approach to
remove those.
## Error analysis
Before we get rolling with the actual analysis of the data, let's take
a gander at the quality of our data to this point. In the folder you
pulled down was a called HMP\_MOCK.v35.align which is a full alignment
of the sequences that were put into the mock community and sequenced.
Because our sequences have been filtered, we need to use the filter that
was generated when we ran [filter.seqs](/wiki/filter.seqs)
previously to filter this alignment. We do this as follows:
mothur > filter.seqs(fasta=HMP_MOCK.v35.align, hard=GQY1XT001.filter)
Next, we need to get the sequences that correspond to our mock community
(the group name is MOCK.GQY1XT001):
mothur > get.groups(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, groups=MOCK.GQY1XT001)
Now we can use the [seq.error](/wiki/seq.error) command to get the
error rate for those sequences by comparing our sequences to the
references:
mothur > seq.error(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, reference=HMP_MOCK.v35.filter.fasta)
Overall error rate: 7.30556e-05
Errors Sequences
0 5882
1 7
2 3
3 13
4 6
5 4
6 1
7 0
8 0
9 1
It took 1 secs to check 101 sequences.
That's right, we've reduced the sequencing error rate from \~0.8% to 0\.007%.
That rocks, eh? We can now cluster the sequences into OTUs to see how many spurious OTUs we have:
mothur > dist.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.fasta, cutoff=0.03)
mothur > cluster(column=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.dist, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table)
mothur > make.shared(list=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.opti_mcc.list, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table)
mothur > rarefaction.single(shared=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.opti_mcc.shared)
This string of commands will produce a file for you called GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.opti_mcc.list.groups.rarefaction. Open it. You'll see that for 6001 sequences, we'd have 41 OTUs from the Mock community. This number of course includes some stealthy chimeras that escaped our detection methods. This is not a perfect world. But this is pretty darn good!
So now we know how good (or bad) our data are. If you wanted you could
start back at the step after the sffinfo command and start messing
around and seeing how much this output changes. I've done this for 90
mock communities. What you've been doing is what I came up with.
Clearly there is room for improvement in the area of chimera prevention
and detection, but I'd say that we've got the actual sequencing error
down pretty well. I'll add that the only way we can do this type of
analysis is because we dedicated one barcode to a mock community. If we
were sequencing on numerous plates we could see whether there was
variation in the error rates or in the relative abundance of the
OTUs/phylotypes. Finally, if your PI (or sequencing center) freaks out
that you "threw away" a bunch of data, feel free to reanalyze these
data however you (or he/she) see fit and show them how it changes the
output.
## Preparing inputs for analysis
Let's remove the mock community data so we're only using our "real"
data.
mothur > remove.groups(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, count=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=/GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pds.wang.pick.taxonomy, groups=MOCK.GQY1XT001)
We pride ourselves on generating files with the longest possible file
names. I mean who doesn't love typing out
GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta
final.fasta? Let's use the mothur [rename.file](/wiki/rename.file)
command to make copies of some files so that we don't develop carpel
tunnel syndrome.
mothur > rename.file(fasta=current, count=current, prefix=final, deleteold=false)
To get ready for doing the fun part of analysis, we need to remember
that there are three general approaches to analyzing 16S rRNA gene
sequences. You can analyze sequences binned into OTUs or phylotypes or
you can use the phylogenetic tree-based approaches.
### OTUs
**Option 1** To build OTUs within mothur there are several options. The
SOP in the Schloss lab is to first generate a distance matrix using the
[dist.seqs](/wiki/dist.seqs) command. We will use a cutoff of 0.03,
which means we'll chuck any pairwise distance larger than 0.03:
mothur > dist.seqs(fasta=final.fasta, cutoff=0.03)
If the output of this command is in the \>100 GBs of memory range you
have probably done something incorrectly above or chosen to ignore some
of my advice\... Next, we want to cluster these sequences into OTUs
based on the newly created distance matrix and the name file we've
been keeping track of along the way using the
[cluster](/wiki/cluster) command:
mothur > cluster(column=final.dist, count=final.count_table)
Typically we are interested in cutoffs of 0.03 for further analysis.
**Option 2** Generally, I'll set up the cluster command and let it rip
overnight and all will be well in the morning. If I'm in a pinch (e.g.
I have to give a talk in an hour and still don't have my slides done),
I'll use the cluster.split command. It is wicked fast and the only
reason I don't put it in the SOP is because I think that sometimes
things that are hard are better. Regardless, if you use a taxlevel of
2-3 (Phylum or Class) the output should be about the same and you won't
owe Gaia any carbon credits:
mothur > cluster.split(fasta=final.fasta, taxonomy=final.taxonomy, count=final.count_table, taxlevel=3)
Like I said above, the output from Option 2 should be about the same as
Option 1. The remainder of this tutorial uses Option 1. Now that we have
a list file from one of these options, we would like to create a table
that indicates the number of times an OTU shows up in each sample. This
is called a [shared file](/wiki/shared_file) and can be created
using the [make.shared](/wiki/make.shared) file:
mothur > make.shared(list=final.opti_mcc.list, count=final.count_table)
The final step to getting good OTU data is to normalize the number of
sequences in each sample. We have observed that the number of spurious
OTUs is correlated with the number of sequences. Because of this, we
standardize the number of sequences across samples so that they all have
the same opportunity to be "wrong". Most of these spurious OTUs are
chimeras that we could not detect. To do this we will pick a minimum
number of sequences that a sample has and randomly select this number of
sequences from each sample. This is not the ideal solution because you
could get lucky/unlucky in how you pick sequences (filed under things to
worry about in the middle of the night if you've run out of other
things to worry about). First we need to know how many sequences are in
each step. You can do this with the
[count.groups](/wiki/count.groups) command:
mothur > count.groups()
From the output we see that the sample with the fewest sequences had
3844 sequences in it. That seems like a reasonable size, so let's
sub-sample all the samples to this number of seqeunces:
mothur > sub.sample(shared=final.opti_mcc.shared, size=3844)
Typically, this will remove a few groups that that didn't have enough
sequences. Get on the horn and yell at the sequencing center or read a
book about the probability of getting 96 evenly represented samples with
5000 sequences each.
The last thing we'd like to do is to get the taxonomy information for
each of our OTUs. To do this we will use the
[classify.otu](/wiki/classify.otu) command to give us the majority
consensus taxonomy:
mothur > classify.otu(list=final.opti_mcc.list, count=final.count_table, taxonomy=final.taxonomy, label=0.03)
### Phylotype
With the completion of that last step, I generally have no reason to
assign sequences to phylotypes. We do the following steps to make
clinicians and people that think the names mean something happy. The
[phylotype](/wiki/phylotype) command goes through the [taxonomy
file](/wiki/taxonomy_file) and bins sequences together that have
the same taxonomy. Here we do it to the genus level:
mothur > phylotype(taxonomy=final.taxonomy, count=final.count_table, label=1)
Like above, we want to make a shared file and standardize the number of
sequences in each group:
mothur > make.shared(list=final.tx.list, count=final.count_table, label=1)
mothur > sub.sample(shared=final.tx.shared, size=3844)
Finally, just to keep things consistent, we get the taxonomy of each
phylotype:
mothur > classify.otu(list=final.tx.list, count=final.count_table, taxonomy=final.taxonomy, label=1)
### Phylogenetic tree
There are many ways to construct phylogenetic trees. We have ported
[clearcut](/wiki/clearcut) into mothur to generate neighbor joining
trees. By default we do not use their heuristic, so these are real
neighbor joining trees which you may or may not think are "real".
First we need to subsample the sequences from each group and then
construct a phylip-formatted distance matrix, which we calculate with
[dist.seqs](/wiki/dist.seqs):
mothur > dist.seqs(fasta=final.fasta, output=phylip)
Now we call on [clearcut](/wiki/clearcut):
mothur > clearcut(phylip=final.phylip.dist)
## Analysis
Moving on, let's do something more interesting and actually analyze our
data. Remember that our initial question had to do with the stability
and change in community structure in these samples when comparing early
and late samples. Keep in mind that the group names have either a F or M
(sex of animal) followed by a number (number of animal) followed by a D
and a three digit number (number of days post weaning).
### OTU-based analysis
#### Alpha diversity
Let's start our analysis by analyzing the alpha diversity of the
samples. First we will generate collector's curve of the [
Chao1](/wiki/chao) richness estimators and the [ inverse
Simpson](/wiki/invsimpson) diversity index. To do this we will use
the [collect.single](/wiki/collect.single) command:
mothur > collect.single(shared=final.opti_mcc.shared, calc=chao-invsimpson, freq=100)
This command will generate file ending in \*.chao and \*.invsimpson for
each sample, which can be plotted in your favorite graphing software
package. When you look at these plots you will see that the Chao1 curves
continue to climb with sampling; however, the inverse Simpson diversity
indices are relatively stable. In otherwords, it isn't really possible
to compare the richness of these communities based on the Chao1 index,
but it is with the inverse Simpson index. As a quick aside, it is
important to point out that Chao1 is really a measure of the minimum
richneess in a community, not the full richness of the community. One
method often used to get around this problem is to look at rarefaction
curves describing the number of OTUs observed as a function of sampling
effort. We'll do this with the
[rarefaction.single](/wiki/rarefaction.single) command:
mothur > rarefaction.single(shared=final.opti_mcc.shared, calc=sobs, freq=100)
This will generate files ending in \*.rarefaction, which again can be
plotted in your favorite graphing software package. Alas, rarefaction is
not a measure of richness, but a measure of diversity. If you consider
two communities with the same richness, but different evenness then
after sampling a large number of individuals their rarefaction curves
will asymptote to the same value. Since they have different evennesses
the shapes of the curves will differ. Therefore, selecting a number of
individuals to cutoff the rarefaction curve isn't allowing a researcher
to compare samples based on richness, but their diversity. Finally,
let's get a table containing the [ number of
sequences](/wiki/nseqs), the sample
[coverage](/wiki/coverage), the number of [ observed
OTUs](/wiki/sobs), and the [invsimpson](/wiki/invsimpson)
diversity estimate using the [summary.single](/wiki/summary.single)
command. To standardize everything, let's randomly select 3844
sequences from each sample 1000 times and calculate the average:
mothur > summary.single(calc=nseqs-coverage-sobs-invsimpson, subsample=t)
These data will be outputted to a table in a file called
final.an.groups.ave-std.summary. Interestingly, the sample coverages
were all above 97%, indicating that we did a pretty good job of sampling
the communities. Plotting the richness or diversity of the samples would
show that there was little difference between the different animals or
between the early and late time points. You could follow this up with a
repeated-measures ANOVA and find that there was no significant
difference based on sex or early vs. late.
#### Beta diversity measurements
Let's use two non-parametric estimators to see what the predicted
minimum number of overlapping OTUs is for the same samples using the
[summary.shared](/wiki/summary.shared) command:
mothur > summary.shared(calc=sharedchao, groups=F003D000-F003D002-F003D004-F003D006, all=T)
Opening the final.opti_mcc.multiple.summary file
we see a prediction that over these four time points the core microbiome
contained at least 85 OTUs.
Next, let's generate a dendrogram to describe the similarity of the
samples to each other. We will generate a dendrogram using the
[jclass](/wiki/jclass) and [thetayc](/wiki/thetayc)
calculators within the [tree.shared](/wiki/tree.shared) command:
mothur > tree.shared(calc=thetayc-jclass, subsample=T)
This command generates two newick-formatted tree files -
final.opti_mcc.thetayc.0.03.ave.tre and final.opti_mcc.jclass.0.03.ave.tre - that
are the result of subsampling 3844 sequences 1000 times. The trees can
be visualized in software like TreeView or FigTree. Inspection of the
both trees shows that individuals' communities cluster with themselves
to the exclusion of the others. We can test to deterine whether the
clustering within the tree is statistically significant or not using by
choosing from the [parsimony](/wiki/parsimony),
[unifrac.unweighted](/wiki/unifrac.unweighted), or
[unifrac.weighted](/wiki/unifrac.weighted) commands. To run these
we will first need to create a design file that indicates which
treatment each sample belongs to. There is a file called
mouse.sex\_time.design in the folder you downloaded that looks vaguely
like this:
F003D000 F003Early
F003D002 F003Early
F003D004 F003Early
F003D006 F003Early
F003D008 F003Early
F003D142 F003Late
F003D144 F003Late
F003D146 F003Late
F003D148 F003Late
F003D150 F003Late
Using the parsimony command let's look at the pairwise comparisons.
Specifically, let's focus on the early vs. late comparisons for each
mouse:
mothur > parsimony(tree=final.opti_mcc.thetayc.0.03.ave.tre, group=mouse.sex_time.design, groups=all)
1 F003Early-F003Late 1.000000 0.1080
There was clearly a significant difference between the clustering of the
early and late time points. Recall that this method ignores the branch
length. Let's incorporate the branch length using the unifrac commands:
mothur > unifrac.weighted(tree=final.opti_mcc.thetayc.0.03.ave.tre, group=mouse.sex_time.design, random=T)
Tree# Groups WScore WSig
1 F003Early-F003Late 0.952297 0.1190
Clearly when we incorporate the branch length from the dendrogram and
incorporate the weighting, the early and late time series are
significantly different for each mouse. Let's do the same analysis but
without correcting for the weightings:
mothur > unifrac.unweighted(tree=final.opti_mcc.thetayc.0.03.ave.tre, group=mouse.sex_time.design, random=T, groups=all)
Tree# Groups UWScore UWSig
1 F003Early-F003Late 0.954742 0.1190
Here we see a similar result to what we observed for the weighted
UniFrac.
Another popular way of visualizing beta-diversity information is through
ordination plots. We can calculate distances between samples using the
[dist.shared](/wiki/dist.shared) command:
mothur > dist.shared(shared=final.opti_mcc.shared, calc=thetayc-jclass, subsample=T)
The two resulting distance matrices (i.e.
final.opti_mcc.thetayc.0.03.lt.ave.dist and final.opti_mcc.jclass.0.03.lt.ave.dist)
can then be visualized using the [pcoa](/wiki/pcoa) or
[nmds](/wiki/nmds) plots. Principal Coordinates (PCoA) uses an
eigenvector-based approach to represent multidimensional data in as few
dimesnsions as possible. Our data is highly dimensional (\~9
dimensions).
mothur > pcoa(phylip=final.opti_mcc.thetayc.0.03.lt.ave.dist)
mothur > pcoa(phylip=final.opti_mcc.jclass.0.03.lt.ave.dist)
The output of these commands are three files ending in \*dist, \*pcoa,
and \*pcoa.loadings. The final.opti_mcc.thetayc.0.03.lt.ave.pcoa.loadings file
will tell you what fraction of the total variance in the data are
represented by each of the axes. In this case the first and second axis
represent about 39 and 23% of the variation (62% of the total) for the
thetaYC distances. The output to the screen indicates that the R-squared
between the original distance matrix and the distance between the points
in 2D PCoA space was 0.51, but that if you add a third dimension the
R-squared value increases to 0.95. All in all, not bad.
Alternatively, non-metric multidimensional scaling (NMDS) tries to
preserve the distance between samples using a user defined number of
dimensions. We can run our data through NMDS with 2 dimensions with the
following commands
mothur > nmds(phylip=final.opti_mcc.thetayc.0.03.lt.ave.dist)
mothur > nmds(phylip=final.opti_mcc.jclass.0.03.lt.ave.dist)
Opening the final.opti_mcc.thetayc.0.03.lt.ave.nmds.stress file we can inspect
the stress and R2 values, which describe the quality of the ordination.
Each line in this file represents a different iteration and the
configuration obtained in the iteration with the lowest stress is
reported in the final.an.thetayc.0.03.lt.ave.nmds.axes file. In this
file we find that the lowest stress value was 0.13 with an R-squared
value of 0.90; that stress level is actually pretty good. You can test
what happens with three dimensions by the following:
mothur > nmds(phylip=final.opti_mcc.thetayc.0.03.lt.ave.dist, mindim=3, maxdim=3)
The stress value drops to 0.05 and the R2 value goes up to 0.98. Not
bad. In general, you would like a stress value below 0.20 and a value
below 0.10 is even better. Thus, we can conclude that, NMDS is better
than PCoA. We can plot the three dimensions of the NMDS data by plotting
the contents of final.opti_mcc.subsample.pick.thetayc.0.03.lt.nmds.axes.
Again, it is clear that the early and late samples cluster separately
from each other. Ultimately, ordination is a data visualization tool. We
might ask if the spatial separation that we see between the early and
late plots in the NMDS plot is statistically significant. To do this we
have two statistical tools at our disposal. The first analysis of
molecular variance ([amova](/wiki/amova)), tests whether the
centers of the clouds representing a group are more separated than the
variation among samples of the same treatment. This is done using the
distance matrices we created earlier and does not actually use
ordination.
mothur > amova(phylip=final.opti_mcc.thetayc.0.03.lt.ave.dist, design=mouse.sex_time.design)
F003Early-F003Late Among Within Total
SS 0.195146 0.311797 0.506943
df 1 8 9
MS 0.195146 0.0389746
Fs: 5.00702
p-value: 0.018*
Here we see from the AMOVA that the "cloud" early and late time points
has a significantly different centroid for this mouse. Thus, the
observed separation in early and late samples is statistically
significant.
Next, we might ask which OTUs are responsible for shifting the samples
along the two axes. We can determine this by measuring the correlation
of the relative abundance of each OTU with the two axes in the NMDS
dataset. We do this with the [corr.axes](/wiki/corr.axes) command:
mothur > corr.axes(axes=final.opti_mcc.thetayc.0.03.lt.ave.nmds.axes, shared=final.opti_mcc.0.03.subsample.shared, method=spearman, numaxes=3)
This command generates the
final.opti_mcc.0.03.subsample.spearman.corr.axes file. The data for
the first five OTUs look like this\...
OTU axis1 p-value axis2 p-value axis3 p-value length
Otu001 -0.733333 0.015801 -0.103030 0.757252 -0.466667 0.161513 0.875312
Otu002 0.503030 0.131276 -0.890909 0.000542 0.309091 0.353785 1.068782
Otu003 0.127273 0.702596 -0.660606 0.037588 0.466667 0.161513 0.818765
Otu004 -0.376901 0.258180 0.462008 0.165739 0.303953 0.361843 0.669249
Otu005 0.127660 0.701734 -0.188451 0.571834 0.607906 0.062255 0.649122
Otu006 0.418182 0.209644 -0.260606 0.434321 0.769697 0.009222 0.913906
Otu007 -0.236364 0.478268 0.200000 0.548506 0.187879 0.573002 0.362169
Otu008 0.175758 0.598004 0.127273 0.702596 0.309091 0.353785 0.377659
Otu009 -0.806061 0.004862 -0.006061 0.985494 -0.769697 0.009222 1.114542
Otu010 -0.684848 0.028883 -0.115152 0.729753 -0.806061 0.004862 1.063960
What these results show is that OTUs 1 and 2 are responsible for moving
points in a negative direction along axis 3, whereas OTU 3 moves it
along the negative direction on axis 2. Recalling that we classified
each OTU earlier, we can open final.opti_mcc.0.03.cons.taxonomy to see that
these first 2 OTUs are mainly members of the Porphyromonadaceae. This
helps to illustrate the power of OTUs over phylotypes since each of
these OTUs is behaving differently. These data can be plotted in what's
known as a biplot where lines radiating from the origin (axis1=0,
axis2=0, axis3=0) to the correlation values with each axis are mapped on
top of the PCoA or NMDS plots. Later, using the metastats command, we
will see another method for describing which populations are responsible
for differences seen between specific treatments.
An alternative approach to building a biplot would be to provide data
indicating metadata about each sample. For example, we may know the
weight, height, blood pressure, etc. of the subjects in these samples.
For discussion purposes the file mouse.dpw.metadata is provided and
looks something like this:
group age
F003D000 0
F003D002 2
F003D004 4
...
mothur > corr.axes(axes=final.opti_mcc.thetayc.0.03.lt.ave.nmds.axes, metadata=mouse.dpw.metadata, method=spearman, numaxes=3)
Opening the file mouse.dpw.spearman.corr.axes, we see:
Feature axis1 p-value axis2 p-value axis3 p-value length
dpw -0.903030 0.000344 0.248485 0.455997 -0.284848 0.392803 0.978952
Indicating that as the dpw increases the communities shift to in the
negative direction along axis 1 and positively along the axis 2.
#### Population-level analysis
In addition to the use of [corr.axes](/wiki/corr.axes) we can also
use [metastats](/wiki/metastats) to determine whether there are any
OTUs that are differentially represented between the samples from men
and women in this study. Run the following in mothur:
mothur > metastats(shared=final.opti_mcc.0.03.subsample.shared, design=mouse.sex_time.design)
Looking in the first 10 OTUs from
final.opti_mcc.0.03.subsample.0.03.F003Late_F003Early.metastats file
we see the following\...
OTU mean(group1) variance(group1) stderr(group1) mean(group2) variance(group2) stderr(group2) p-value
Otu001 0.137009 0.000189 0.006147 0.073030 0.000493 0.009927 0.000512
Otu002 0.073342 0.000269 0.007339 0.124265 0.003668 0.027085 0.078354
Otu003 0.078127 0.000532 0.010310 0.081612 0.000841 0.012969 0.873280
Otu004 0.071730 0.000448 0.009464 0.066008 0.003067 0.024766 0.871890
Otu005 0.056489 0.000041 0.002865 0.065020 0.000242 0.006952 0.300787
Otu006 0.048999 0.000181 0.006019 0.069129 0.000464 0.009631 0.086360
Otu007 0.057581 0.000187 0.006110 0.055657 0.000134 0.005172 0.852256
Otu008 0.039740 0.000599 0.010943 0.054616 0.000563 0.010615 0.388159
Otu009 0.055241 0.000162 0.005696 0.019922 0.000127 0.005045 0.000970
...
These data tell us that OTU 1 was significantly different between the
early and late samples.
### Phylotype-based analysis
Phylotype-based analysis is the same as OTU-based analysis, but at a
different taxonomic scale. We will leave you on your own to replicate
the OTU-based analyses described above with the phylotype data.
### Phylogeny-based analysis
OTU and phylotype-based analyses are taxonomic approaches that depend on
a binning procedure. In contrast, phylogeny-based approaches attempt
similar types of analyses using a phylogenetic tree as input instead of
a shared file. Because of this difference these methods compare the
genetic diversity of different communities.
#### Alpha diversity
When using phylogenetic methods, alpha diversity is calculated as the
total of the unique branch length in the tree. This is done using the
[phylo.diversity](/wiki/phylo.diversity) command. Because of
differences in sampling depth we will rarefy the output:
mothur > phylo.diversity(tree=final.phylip.tre, count=final.count_table, rarefy=T)
This will generate a file called final.phylip.1.phylodiv.rarefaction.
#### Beta diversity
The unifrac-based metrics are used to assess the similarity between two
communities membership
([unifrac.unweighted](/wiki/unifrac.unweighted)) and structure
([unifrac.weighted](/wiki/unifrac.weighted)). We will use these
metrics and generate PCoA plots to compare our samples. There are two
beta-diversity metrics that one can use - [
unweighted](/wiki/unifrac.unweighted) and [
weighted](/wiki/unifrac.weighted). We will also have mothur
subsample the trees 1000 times and report the average:
mothur > unifrac.unweighted(tree=final.phylip.tre, count=final.count_table, distance=lt, random=F, subsample=t)
mothur > unifrac.weighted(tree=final.phylip.tre, count=final.count_table, distance=lt, random=F, subsample=t)
These commands will distance matrices
(final.phylip.tre1.unweighted.ave.dist and
final.phylip.tre1.weighted.ave.dist) that can be analyzed using all of
the beta diversity approaches described above for the OTU-based
analyses.
## Conclusion
There are many other ways that one could analyze these data (hey, we've
gotta save something for the paper!). I encourage you to go back and
change the settings, use different calculators, come up with a
hypothesis using the data and test it. If you think of an analysis that
you wish mothur would do, please let us know and we'll see about adding
it to the package. There is a certain "pipeline" aspect to this
analysis; however, it is also very much an art of working with
sequences. If you want to basically do everything that was described
above, you can use a batch file and use mothur in the [batch mode](/wiki/batch_mode)
as follows:
$ ./mothur batch
The file "batch" would contain each line you used above on its own
line in a text file
## Revisions
* 1.47.0 Updates to reflect changes in 1.47.0
어떠한 문서나 스크립트가 다른 **프로토콜 / 포트 / 호스트** 에 있는 리소스 사용하는 것을 제한하는 정책. 예를 들어, 다음과 같은 사이트에서 리소스를 다른 곳으로 요청한다고 하자.
* **Production MDB**: updated monthly.
This document outlines the mandatory procedures for developing and verifying VCR elements (shaders, manifests, and assets) to ensure high-fidelity, centered, and non-clipping renders.
http://localhost:8000