Releases: grunwaldlab/poppr
Poppr version 2.8.3
This version of poppr includes a bug fix, a much-needed improvement to the display of single-population nodes in minimum spanning networks (MSNs) by @fdchevalier, and minor stability changes that users should not notice.
The MSN functions no longer draw single-population nodes as pies, so they look more presentable.
I accidentally left the NEWS file out of the official release, but you can still keep track of it here:
BUG FIX
- read.genalex() now correctly parses strata when the imported data contain duplicated data AND some individuals are named as zero-padded integers less than the number of samples in the data (see #202).
NEW FEATURES
- MSN functions: nodes with a single population are displayed as circles instead of pies (@fdchevalier, #203).
MISC
- mlg.vector() is now safer, as it uses a for loop instead of a function with the out-of-scope (superassignment) operator, <<- (see #205).
- shufflepop() is now safer, as it uses a for loop instead of a function with the out-of-scope (superassignment) operator, <<- (see #205).
- The MLG class gains a new distenv slot, which stores the environment where the distance function or matrix exists. This is accompanied by an accessor of the same name (see #206).
- "mlg.filter<-"() replacement methods will no longer search the global environment when evaluating the distance function or matrix (see #206).
- Tests for mlg.filter() no longer assign objects to the global environment.
- DOIs for the publications have been added to the DESCRIPTION.
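The distenv change concerns where R looks up a distance function that is referenced by name. The following is a minimal base R sketch of the idea, not poppr's MLG class or its accessor: record_distance() and resolve_distance() are hypothetical helpers that store the environment in which a distance function was defined and look the function up there, so a function defined inside another function is found even though it does not exist in the global environment.

```r
# Hypothetical helpers; they only illustrate the idea behind a distenv slot.
# Record a distance function by name together with the environment it lives in.
record_distance <- function(distname, env = parent.frame()) {
  list(distname = distname, distenv = env)
}

# Resolve the function in the recorded environment rather than in globalenv().
resolve_distance <- function(rec) {
  get(rec$distname, envir = rec$distenv, mode = "function")
}

analysis <- function(x) {
  # Defined locally, so it does not exist in the global environment
  my_dist <- function(m) dist(m, method = "manhattan")
  rec <- record_distance("my_dist")  # default env is this function's frame
  resolve_distance(rec)(x)
}

res <- analysis(matrix(c(0, 1, 3, 4), nrow = 2))
res  # a one-element "dist" object holding the Manhattan distance, 2
exists("my_dist", envir = globalenv(), inherits = FALSE)  # FALSE
```

Because the environment travels with the record, nothing has to be assigned to (or searched for in) the global environment.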
Poppr version 2.8.2
This is a maintenance release that has no visible impact for users.
Version 2.8.1
This is a maintenance release for poppr.
BUG FIX
- An error that appeared in some AMOVA calls on genind objects with character-based alleles was fixed (see #190 for details).
DOCUMENTATION
- The aboot() documentation was updated to add the citation and clarify its purpose and limitations.
MISC
- DOIs have been hyperlinked to doi.org instead of dx.doi.org (#188, @katrinleinweber)
poppr version 2.8.0
This release contains an updated win.ia(), AMOVA for genlight objects, and a faster and more efficient calculation of Euclidean distance for genlight objects (see image).
NEWS
BUG FIX
- win.ia() now has more consistent behavior with chromosome structure and will no longer result in an integer overflow (see #179). Thanks to @MarisaMiller for the detailed bug report.
- plot_filter_stats() will plot stats if supplied a list of thresholds.
ALGORITHMIC CHANGE
- win.ia() may give slightly different results because of two changes:
  - The windows will now always start at position one on any given chromosome. This will result in some windows at the beginning of chromosomes having a value of NA if the first variant starts beyond the first window.
  - Windows are now calculated for each chromosome independently. The previous version first concatenated chromosomes with at least a window-sized gap between them, but failed to ensure that the window always started at the beginning of the chromosome. This version fixes that issue (see #179).
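A minimal base R sketch of the new windowing rule, assuming hypothetical SNP positions and a window size of 500 (this illustrates the rule described above; it is not the code inside win.ia()). Windows are numbered separately on each chromosome starting at position 1, so window k covers positions (k - 1) * 500 + 1 through k * 500, and a window that contains no variants comes out as NA.

```r
# Hypothetical SNP positions on two chromosomes
snps <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  pos   = c(250, 420, 1100, 700, 980)
)
window_size <- 500

# Windows are numbered per chromosome and always start at position 1:
# window k covers positions ((k - 1) * window_size + 1) to (k * window_size)
snps$window <- (snps$pos - 1) %/% window_size + 1

# Count variants per window, chromosome by chromosome; empty windows become NA
per_chrom <- lapply(split(snps$window, snps$chrom), function(w) {
  counts <- tabulate(w, nbins = max(w))
  replace(counts, counts == 0, NA)
})
per_chrom$chr1  # 2 NA 1: the middle window has no variants
per_chrom$chr2  # NA 2: the first variant lies beyond the first window
```

In win.ia() the per-window value is an index of association rather than a variant count; the sketch only shows how positions map to windows and where a leading NA comes from.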
DEPRECATION
- The chromosome_buffer argument for win.ia() has been permanently set to TRUE and deprecated, as it is no longer used.
NEW FEATURES
- poppr.amova() will now handle genlight/snpclone objects. See #185 for details.
- bitwise.dist() now has two new options: euclidean and scale_missing. When both of these are set to TRUE, the distance measured will be Euclidean distance scaled for the amount of missing data in each comparison. This matches the output of base R's dist() function at a fraction of the time and memory. See #176 for details.
- make_haplotypes() is now a generic defined for both genind and genlight objects.
- genind2genalex() will no longer write to "genalex.csv" by default. Instead, it will warn the user and write to a temporary file. See #175 for details.
- genind2genalex() now has an overwrite parameter, set to FALSE, to prevent accidental overwriting of files.
- win.ia() has a new argument, name_window, which names each element of the result after the terminal position of its window. Thanks to @MarisaMiller for the suggestion!
- pair.ia() can now calculate p-values via permutations (I forgot to add this in the official NEWS).
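The new genind2genalex() defaults (warn and fall back to a temporary file when no file name is given, and refuse to replace an existing file unless overwrite = TRUE) follow a common base R pattern. The sketch below uses a hypothetical helper, write_export(), on a plain data frame; only the overwrite argument name comes from these notes, and the rest is an assumption, not poppr's implementation.

```r
# Hypothetical helper that mimics the described defaults; not poppr code.
write_export <- function(df, filename = "", overwrite = FALSE) {
  if (!nzchar(filename)) {
    # No file name supplied: warn and fall back to a temporary file
    filename <- tempfile(fileext = ".csv")
    warning("no file name supplied; writing to ", filename)
  }
  if (file.exists(filename) && !overwrite) {
    # Refuse to clobber an existing file unless explicitly asked to
    stop("'", filename, "' already exists; use overwrite = TRUE to replace it")
  }
  write.csv(df, filename, row.names = FALSE)
  invisible(filename)
}

out <- suppressWarnings(write_export(data.frame(x = 1:3)))
file.exists(out)                                        # TRUE
try(write_export(data.frame(x = 4:6), filename = out))  # error: file exists
write_export(data.frame(x = 4:6), filename = out, overwrite = TRUE)
read.csv(out)$x                                         # 4 5 6
```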
DOCUMENTATION
- cutoff_predictor() was added to the MLG vignette.
poppr version 2.8.0 release candidate
This release contains an updated win.ia(), AMOVA for genlight objects, and a faster and more efficient calculation of Euclidean distance for genlight objects (see image).
NEWS
BUG FIX
- win.ia() now has more consistent behavior with chromosome structure and will no longer result in an integer overflow (see #179). Thanks to @MarisaMiller for the detailed bug report.
- plot_filter_stats() will plot stats if supplied a list of thresholds.
ALGORITHMIC CHANGE
- win.ia() may give slightly different results because of two changes:
  - The windows will now always start at position one on any given chromosome. This will result in some windows at the beginning of chromosomes having a value of NA if the first variant starts beyond the first window.
  - Windows are now calculated for each chromosome independently. The previous version first concatenated chromosomes with at least a window-sized gap between them, but failed to ensure that the window always started at the beginning of the chromosome. This version fixes that issue (see #179).
DEPRECATION
- The chromosome_buffer argument for win.ia() has been permanently set to TRUE and deprecated, as it is no longer used.
NEW FEATURES
- poppr.amova() will now handle genlight/snpclone objects. See #185 for details.
- bitwise.dist() now has two new options: euclidean and scale_missing. When both of these are set to TRUE, the distance measured will be Euclidean distance scaled for the amount of missing data in each comparison. This matches the output of base R's dist() function at a fraction of the time and memory. See #176 for details.
- make_haplotypes() is now a generic defined for both genind and genlight objects.
- genind2genalex() will no longer write to "genalex.csv" by default. Instead, it will warn the user and write to a temporary file. See #175 for details.
- genind2genalex() now has an overwrite parameter, set to FALSE, to prevent accidental overwriting of files.
- win.ia() has a new argument, name_window, which names each element of the result after the terminal position of its window. Thanks to @MarisaMiller for the suggestion!
DOCUMENTATION
- cutoff_predictor() was added to the MLG vignette.
poppr version 2.7.1
This minor release restores the AMOVA documentation, which was accidentally lost in version 2.7.0.
Polysat has also been added to Imports, since it is a lightweight package.
poppr version 2.7.0
Changes in poppr version 2.7
Poppr version 2.7 introduces a change to how AMOVA is calculated (thanks to Patrick Meirmans for the impetus and sample data) and two new functions for data conversion:
- make_haplotypes() for splitting data into pseudo-haplotypes
- as.genambig() for converting genind/genclone objects to polysat's genambig class
The changes are outlined here.
Calculating $\rho$: AMOVA from allele frequencies
Rho ($\rho$) is a measure of population differentiation calculated in the AMOVA framework without considering within-individual variance. It is analogous to $F_{ST}$ for use with autotetraploid organisms (Ronfort et al. 1998; Meirmans and Liu 2018). It is based on the Euclidean distance between allele frequencies and can be calculated by setting within = FALSE.
library("poppr")
data("Pinf")
Pinf
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 72 multilocus genotypes
#> 86 tetraploid individuals
#> 11 codominant loci
#>
#> Population information:
#>
#> 2 strata - Continent, Country
#> 2 populations defined - South America, North America
# be sure to recode your polyploid data so that there are no zeroes for placeholders
(prc <- recode_polyploids(Pinf, newploidy = TRUE))
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 72 multilocus genotypes
#> 86 diploid (55) and triploid (31) individuals
#> 11 codominant loci
#>
#> Population information:
#>
#> 2 strata - Continent, Country
#> 2 populations defined - South America, North America
# calculate rho
rho <- poppr.amova(prc, ~Continent/Country, within = FALSE, cutoff = .1)
rho$statphi
#> Phi
#> Phi-samples-total 0.12713922
#> Phi-samples-Continent 0.05269217
#> Phi-Continent-total 0.07858802
Here, the value of $\rho$ (Phi-samples-total) is 0.1271392.
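To make "Euclidean distance of allele frequencies" concrete, here is a toy base R sketch for a single tetraploid locus with hypothetical genotypes, assuming frequencies are taken within each individual. Each genotype is converted to allele frequencies and dist() computes Euclidean distances between those frequency vectors; poppr.amova() builds its own distance matrix internally, so this only illustrates the kind of distance involved.

```r
# Hypothetical tetraploid genotypes at one locus (four allele copies each)
genos <- list(
  ind1 = c("A", "A", "B", "C"),
  ind2 = c("A", "B", "B", "B"),
  ind3 = c("C", "C", "C", "C")
)
alleles <- sort(unique(unlist(genos)))

# Rows: individuals; columns: within-individual allele frequencies
freqs <- t(vapply(genos, function(g) {
  as.numeric(table(factor(g, levels = alleles))) / length(g)
}, numeric(length(alleles))))
colnames(freqs) <- alleles
# ind1 = (0.50, 0.25, 0.25), ind2 = (0.25, 0.75, 0), ind3 = (0, 0, 1)

# Euclidean distances between allele-frequency vectors, e.g.
# ind1 vs ind2: sqrt(0.25^2 + 0.5^2 + 0.25^2) = sqrt(0.375)
d <- dist(freqs)
d
```

With several loci, one would concatenate the per-locus frequency columns before calling dist().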
Changes in AMOVA for poppr 2.7 can affect your results
The process of calculating AMOVA in poppr involved four steps:
- If the data were diploid, genotypes were split into pseudo-haplotypes.
- A distance matrix was calculated using diss.dist() and the square root was taken.
- The matrix and hierarchy were prepared for either ade4 or pegas.
- AMOVA was calculated using either ade4 or pegas.
In this new version of poppr, you have direct access to the function that splits genotypes into pseudo-haplotypes: make_haplotypes().
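Conceptually, splitting diploid genotypes into pseudo-haplotypes turns each individual into two rows, one per allele copy, while recording which individual each row came from. The base R sketch below does this for two hypothetical individuals at two hypothetical loci written as "allele/allele"; it is not the implementation of make_haplotypes(), which works on genind and genlight objects.

```r
# Hypothetical diploid genotypes, one row per individual
geno <- data.frame(
  ind  = c("cat1", "cat2"),
  loc1 = c("120/124", "120/120"),
  loc2 = c("98/102", "102/106"),
  stringsAsFactors = FALSE
)
loci <- c("loc1", "loc2")

# For each locus, split "a/b" into a matrix: rows = individuals, cols = copies
copies <- lapply(geno[loci], function(g) {
  do.call(rbind, strsplit(g, "/", fixed = TRUE))
})

# One row per allele copy; the Individual column lets an AMOVA hierarchy
# nest haplotypes within individuals
haps <- data.frame(
  Individual = rep(geno$ind, each = 2),
  lapply(copies, function(m) as.vector(t(m))),
  stringsAsFactors = FALSE
)
nrow(haps)  # 4: twice the number of individuals
```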
The major change in poppr 2.7 is that dist() has replaced diss.dist() as the default distance function for AMOVA.
Changing diss.dist() to dist()
The default distance calculation for all AMOVA was diss.dist(), which is a dissimilarity distance. For haploid or pseudo-haploid data, this is equivalent to a squared Euclidean distance, and was appropriate when the within = TRUE option was set (the default). This method, however, was not appropriate when within-individual variation is not considered.
For example, this is how previous versions of poppr would have calculated $\rho$:
dissim <- diss.dist(prc)
old <- poppr.amova(prc, ~Continent/Country, within = FALSE, cutoff = .1,
dist = dissim, squared = TRUE)
#>
#> No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
old$statphi
#> Phi
#> Phi-samples-total 0.15032208
#> Phi-samples-Continent 0.12439575
#> Phi-Continent-total 0.02960965
Comparing this result (0.1503221) to the one above (0.1271392) shows a distinct difference in the value of $\rho$.
AMOVA with missing data
The dist() function handles missing data differently than diss.dist(), so you may see small differences in your results (for details, see this StackOverflow answer: https://stackoverflow.com/a/18117751/2752888).
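For reference, base R's dist() documentation states that columns with missing values are excluded from each pairwise comparison and that, for Euclidean distance, the sum is scaled up in proportion to the number of columns used. The sketch below checks that rule on a hypothetical 0/1/2-coded matrix using scaled_euclidean(), a hand-written helper for illustration only.

```r
# Euclidean distance that skips columns missing in either row and rescales
# the sum of squares by (total columns / columns used), as ?dist describes.
scaled_euclidean <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)
  if (!any(keep)) return(NA_real_)
  sqrt(sum((x[keep] - y[keep])^2) * length(x) / sum(keep))
}

# Hypothetical genotype matrix (rows = samples) with missing data
geno <- rbind(
  a = c(0, 1, NA, 2),
  b = c(1, 1, 0, NA),
  c = c(2, 0, 0, 2)
)

n <- nrow(geno)
manual <- matrix(0, n, n, dimnames = list(rownames(geno), rownames(geno)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    manual[i, j] <- scaled_euclidean(geno[i, ], geno[j, ])
  }
}

# a vs b compares only the first two columns: sqrt((1 + 0) * 4 / 2) = sqrt(2)
manual["a", "b"]
all.equal(manual, as.matrix(dist(geno)))  # TRUE
```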
For example, the nancycats data set has an average of 2.3% missing data. This results in a small shift in the $\Phi$ statistics. Here are the results with version 2.7:
data(nancycats)
strata(nancycats) <- data.frame(colony = pop(nancycats))
new <- poppr.amova(nancycats, ~colony, cutoff = .1)
#>
#> No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
new$statphi
#> Phi
#> Phi-samples-total 0.1971382
#> Phi-samples-colony 0.1235778
#> Phi-colony-total 0.0839327
To show the results from previous versions, we need to use the new make_haplotypes() function to create pseudo-haplotypes:
nanhaps <- make_haplotypes(nancycats)
# confirm that the number of individuals is double that of the original data
nInd(nanhaps)
#> [1] 474
2 * nInd(nancycats)
#> [1] 474
# calculate squared Euclidean distance
d2n <- diss.dist(nanhaps)
# calculate AMOVA
old <- poppr.amova(nanhaps, ~colony/Individual, cutoff = .1,
dist = d2n, squared = TRUE)
#>
#> No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
old$statphi
#> Phi
#> Phi-samples-total 0.19292024
#> Phi-samples-colony 0.12109840
#> Phi-colony-total 0.08171772
The different treatment of missing data has created a difference of 0.004218 in $\Phi_{ST}$.
Converting genind/genclone to polysat
Polysat is a package that works with polyploid microsatellite data. You can install it from CRAN with install.packages("polysat"). The poppr function as.genambig() will convert from genind to genambig:
library("polysat") # load polysat
Pinf
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 72 multilocus genotypes
#> 86 tetraploid individuals
#> 11 codominant loci
#>
#> Population information:
#>
#> 2 strata - Continent, Country
#> 2 populations defined - South America, North America
Pinf.ga <- as.genambig(Pinf) # Convert to genambig
summary(Pinf.ga) # Show the summary of the contents
#> Dataset with allele copy number ambiguity.
#> Insert dataset description here.
#> Number of missing genotypes: 10
#> 86 samples, 11 loci.
#> 2 populations.
#> Ploidies: 2 3 NA
#> Length(s) of microsatellite repeats: NA
Once you have your genambig object, you can use all the functions polysat has
available.
Created on 2018-03-16 by the reprex package (v0.2.0).
References
Ronfort, Joëlle, Eric Jenczewski, Thomas Bataillon, and François Rousset. "Analysis of population structure in autotetraploid species." Genetics 150, no. 2 (1998): 921-930.
Meirmans, Patrick G., and Shenglin Liu. "Analysis of Molecular Variance (AMOVA) for autopolyploids." Submitted (2018).
poppr version 2.6.1
This is a bug-fix release. The bug caused poppr to attempt to read computer memory that had not been allocated.
BUG FIX
- An out-of-bounds memory access error in bitwise.dist() was fixed. See #169 for details.
poppr version 2.6.0
The biggest feature of this release is the scaling of nodes by area in the minimum spanning networks. More details on that here: https://zkamvar.github.io/blog/poppr-2-6-0-better-network-plotting/
NEW FUNCTIONS
- The new function boot.ia() is conceptually similar to resample.ia(), except it resamples with replacement.
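The two schemes differ only in whether sampled individuals can repeat. A minimal base R sketch on hypothetical sample indices (this is not the code inside either function, and a bootstrap replicate the same size as the data is an assumption):

```r
set.seed(42)
ids <- seq_len(10)  # hypothetical sample indices

# Subsampling without replacement (resample.ia() style): no repeats
sub <- sample(ids, size = 5, replace = FALSE)

# Bootstrap with replacement (boot.ia() style): same size, repeats allowed
boot <- sample(ids, size = length(ids), replace = TRUE)

anyDuplicated(sub)  # 0: every sampled index is unique
length(boot)        # 10
```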
NEW FEATURES
- The function resample.ia() can now resample individuals weighted by their Psex value.
- The minimum spanning networks will now scale nodes by area instead of radius. This gives a more accurate picture of the differences between MLGs. See #154 for details.
- A legend for samples per node is now added to all minimum spanning networks. See #158 for details.
- The imsn() option for node size scale has been changed to a slider.
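Scaling by area matters because a node's visual size is its area, which grows with the square of the radius. Assuming, for illustration only, that the earlier networks set each node's radius in proportion to its count (poppr's plotting code may apply additional scaling), the apparent size grew with the square of the count; a radius proportional to the square root of the count makes the area proportional to the count. A base R sketch with hypothetical node sizes:

```r
n <- c(1, 4, 16)  # hypothetical samples per node (MLG)
area <- function(r) pi * r^2

radius_by_count <- n        # radius scales with the count
radius_by_area  <- sqrt(n)  # radius scales with sqrt(count)

# Area of each node relative to the smallest node
area(radius_by_count) / area(radius_by_count[1])  # 1  16 256: exaggerated
area(radius_by_area)  / area(radius_by_area[1])   # 1   4  16: proportional to n
```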
BUG FIX
- An issue where data with sample names containing apostrophes could not be imported was fixed (identified in #156).
- A bug in imsn() where custom MLGs would result in an error was fixed. See #155 for details.
- A bug in plot_poppr_msn() where setting scale.leg = FALSE would result in a very small MSN plot was fixed.
- mlg() now works properly for snpclone and genlight objects. See #155 for details.
DEPENDENCIES
- The minimum version of igraph has been set to 1.0.0.
poppr version 2.5.0
Version 2.5.0 of poppr contains very important bug fixes for read.genalex() and all functions that use Bruvo's distance (see details).
ALGORITHMIC CHANGE
- Identified in #139, Bruvo's distance will now consider all possible combinations of ordered alleles in the calculation under the genome addition and loss models for missing data. This will affect those who have polyploid data that contain more than one missing allele at any genotype.
  To facilitate comparison, the global option old.bruvo.model has been created. By default it is set to FALSE, indicating that poppr should use the ordered allele combinations. If the user wants to use the method considering unordered allele combinations, they can set options(old.bruvo.model = TRUE).
  To reiterate, this does not affect haploid or diploid comparisons, comparisons that use the infinite alleles model, or data that do not have more than one missing allele at any genotype.
DEPRECATION
- The warning for a short repeat length vector for Bruvo's distance is deprecated and will become an error in the future.
- jack.ia() is deprecated in favor of resample.ia() for clarity.
BUG FIX
- A bug in read.genalex() where removed samples would have incorrect strata labels was fixed. Thanks to Hernán Dario Capador-Barreto for identifying it. See #147.
MISC
- The internal plotting function for mlg.table() now uses tidy evaluation for dplyr versions > 0.5.0.
- The package reshape2 was removed from imports and replaced with base functions (see #144 for details).
NEW IMPORTS
- Due to the migration to dplyr version 0.7.0, poppr now imports the !! operator from the rlang package.