-
Notifications
You must be signed in to change notification settings - Fork 24
3. Using the stereology module
Stereology is a set of mathematical methods designed to provide quantitative information about a three-dimensional feature from measurements made on two-dimensional sections. Unlike tomography, which aims to reconstruct the 3D geometry of a material, stereology is used to estimate specific 3D features. Here we will use two stereological methods to approximate the true grain size distribution from the apparent grain size distribution measured on a section: 1) the Saltykov method, and 2) the two-step method.
What is it?
The Saltykov method is a stereological technique that approximates the true grain size distribution from the histogram of the distribution of apparent grain size sections. The method does not assume any particular type of statistical distribution.What do I use it for?
In the geosciences, the Saltykov method is used primarily to estimate the volume fraction of a given range of grain sizes, but also to estimate the actual average grain size (see cautionary note on this later).What are its limitations?
Despite its utility, the Saltykov method has several limitations when applied to rocks:
- Assumption of Non-Touching Spheres. The method assumes non-touching spheres uniformly distributed in a matrix (e.g. bubbles in a piece of glass), a condition rarely met in polycrystalline rocks. To apply the method, the grains should be at least approximately equiaxed, which is typically the case for recrystallized grains.
- Dependence on Histogram Classes. The accuracy of the method is affected by the number of classes in the histogram. There's a trade-off: fewer classes improve the numerical stability of the method but worsen the approximation of the target distribution and vice versa. Currently, there is no exact method for finding the optimal number of classes, and this must be set by the user or determined by a rule of thumb. Using the histogram also means that we cannot get a complete description of the grain size distribution.
- Lack of Error Estimation Formulation. The Saltykov method lacks a formulated procedure for estimating errors during the unfolding process, which limits the ability to assess the reliability of the results
- Inability to Estimate True Average Grain Size. It's impossible to obtain an estimate of the true average grain size (3D) because individual data is lost when using the histogram. In other words, the Saltykov method attempts to reconstruct the histogram of the true grain size population, not to convert each apparent diameter into the true one. While there are methods to estimate an average from the resulting histogram, it's important to emphasize that this estimate is derived from a stereological model, not actual empirical data.
To apply this method, we use the Saltykov()
function of the stereology module. This function has the following parameters:
diameters : array_like
the apparent diameters of the grains.
numbins : positive integer, optional
the number of bins/classes of the histogram. If not declared,
is set to 10 by default.
calc_vol : positive scalar or None, optional
if the user specifies a diameter, the function will return the volume
occupied by the grain fraction up to that diameter.
text_file : string or None, optional
if the user specifies a name, the function will store a csv or tsv/txt file
with that name containing the Saltykov output.
return_data : bool, optional
if True the function will return the position of the midpoints and
the frequencies.
left_edge : positive scalar or 'min', optional
set the left edge of the histogram. Default is zero.
The only input required is the distribution of apparent diameters (diameters
). From here the user can specify the number of classes (numbins
), calculate the volume occupied by a grain size fraction up to a specific size (calc_vol
), generate a tabular (csv, tsv or txt) file with the data (text_file
), retrieve some data for further calculations (return_data
) or set the initial (left) limit of the histogram (left_edge
) if needed. We will see some examples below.
# Import a dataset with grain sectional areas
filepath = 'C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/DATA/data_set.txt'
dataset = pd.read_csv(filepath, sep='\t')
# estimate equivalent circular diameters from sectional areas
dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi)
# apply the Saltykov method from stereology module
fig1, (ax1, ax2) = stereology.Saltykov(dataset['diameters'], numbins=11, calc_vol=50)
=================================================
volume fraction (up to 50 microns) = 41.65 %
=================================================
calculated bin size = 14.24
In the example above, in addition to providing the apparent diameters, the unfolding of the histogram was set to use eleven classes and to determine the volume fraction represented by particles smaller than 50 microns. By default, the function returns a figure with the unfolded histogram approximating the actual grain size distribution (left) and the cumulative volume as a function of particle diameter (right), bin/class size, and the required volume fraction.
If we want to save the data generated by Saltykov's method to a text file, we need to specify the file name and type, csv, tsv or txt, in the text_file
parameter as follows.
stereology.Saltykov(dataset['diameters'], numbins=11, text_file='mydata.csv')
This will generate a file with four columns containing the midpoints of each class, their frequencies, their frequencies normalised to one, and the cumulative volume for each class.
The Saltykov function can be used to directly extract the density and midpoint values of the classes as follows
mid_points, densities = stereology.Saltykov(dataset['diameters'], numbins=11, return_data=True)
print(densities)
[1.31536871e-03 2.17302235e-02 2.25631643e-02 1.45570771e-02
6.13586532e-03 2.24830266e-03 1.29306084e-03 3.60326809e-04
0.00000000e+00 0.00000000e+00 4.11071036e-05]
As you can see, these densities do not add up to 1 or 100.
np.sum(densities)
0.07024449636922886
This is because the script has normalised the frequencies of the different classes so that the integral over the range (not the sum) is one (see FAQs for an explanation of this). If you want to calculate the relative proportion for each class, you need to multiply the value of the densities by the bin size. Then you can check that the relative densities sum to one (i.e. they are proportions relative to one).
corrected_densities = densities * 14.236
print(f'Sum = {np.sum(corrected_densities):.1f} %')
Sum = 1.0 %
What is it?
The two-step method is a stereological technique that approximates the true grain size distribution from the histogram of the distribution of apparent grain size intervals. It differs from the Saltykov method in that the population is not described by a histogram but by a mathematical distribution. The method is thus distribution dependent, i.e. it assumes that the grain sizes follow a lognormal distribution. The method fits a lognormal distribution to the output of the Saltykov method, hence the name "two-step method". More information here.What do I use it for?
The Two-Step Method is primarily used to estimate the lognormal distribution of grain size, which involves determining the shape and location of the distribution. It can also be used to estimate the volume fraction of a particular range of grain sizes.What are its limitations?
- Distribution Dependence. The method assumes a lognormal grain size distribution, which may not accurately represent certain aggregates. Besides, the method only works corectly for unimodal grain size distributions.
- Inherited limitations from the Saltykov method. The method is partly based on the Saltykov method and therefore inherits some of its limitations. However, the method does not require a specific number of classes to be defined.
To apply this method, we use the two_step()
function of the stereology module. This function has the following parameters:
diameters : array_like
the apparent diameters of the grains
class_range : tupe or list with two values, optional
the range of classes considered. The algorithm will estimate the optimal
number of classes within the defined range. Default = (10, 20)
The only data required is the distribution of apparent diameters (diameters
). From here, the user can only specify a range of number of classes (numbins
), if necessary, an algorithm will take care of estimating the optimal number of classes within this range.
fig2, axe = stereology.two_step(dataset['diameters'])
=======================================
OPTIMAL VALUES
Number of classes: 11
MSD (lognormal shape) = 1.63 ± 0.06
Geometric mean (scale) = 36.05 ± 1.27
=======================================
The method returns the values of the lognormal distribution that best fits our model, in this case defined by two values: the geometric mean (scale) and its MSD (shape) with an estimate of their errors.
Note
Understanding the MSD Value and its Purpose
MSD, or Multiplicative Standard Deviation, is a parameter that characterizes the shape of a grain size distribution using a single value, under the assumption that the distribution follows a lognormal pattern. In simpler terms, the MSD value provides a measure of the asymmetry or skewness of the grain size distribution. An MSD value equal to one corresponds to a normal (Gaussian) distribution, while values greater than one indicate log-normal distributions with varying degrees of asymmetry (Figure below). Scale-Independent Comparison. The advantage of this approach is that a single parameter, the MSD, can define the shape of the grain size distribution independently of its scale (Figure below). This makes it very convenient for comparing the shape or asymmetry of two or more grain size distributions.
Figure. Probability density functions of selected lognormal distributions taken from Lopez-Sanchez and Llana-Fúnez (2016). (a) Lognormal distributions with different MSD values (shapes) and the same median/geometric mean (4). (b) Lognormal distributions with the same shape corresponding to an MSD value (1.5) and different medians/geometric means (note that different medians/geometric means imply different scales in the horizontal and vertical directions). Let's see some examples.
The lognormal distribution can also be used to calculate other averages. For example, to estimate the arithmetic mean or the frequency peak (mode) we use the following formulation
where
geo_mean = 36.05
msd = 1.63
mu = np.log(geo_mean)
sigma = np.log(msd)
amean = np.exp(mu + (sigma**2 / 2))
mode = np.exp(mu - sigma**2)
print(f'mean = {amean:.2f}')
print(f'mode = {mode:.2f}')
mean = 40.62
mode = 28.39
Note
In a perfect lognormal distribution, the median and geometric mean coincide, so there is no need to calculate the median. In real distributions, which never match a perfect model, there is a difference between these two values.
Caution
Why prefer averages from apparent grain size to those estimated from unfolded grain size distributions in palaeopiezometry?
While one might be tempted to use a stereological method to estimate the midpoint of the modal interval or some other unidimensional parameter based on the calculated grain size distribution, we argue that this approach offers no advantages and comes with serious disadvantages.
The reason for this is that the grain size distributions estimated by stereological methods are based on a stereological model. This means that the accuracy of the estimates depends not only on measurement errors but also on the robustness of the model itself. Unfortunately, stereological methods are based on weak geometric assumptions, and their results will always be, at best, approximate. As a result, the precision and accuracy of averages estimated from unfolded grain size distributions are significantly inferior in performance and reliability to those based on the original distribution of grain sections. The latter, although estimating an apparent grain size, is based on real data rather than a model.
Recommendation. In summary, it's advisable to use stereological methods only when there’s a need to estimate the volume occupied by a particular grain size fraction, to investigate the shape of the true grain size distribution or when you need to use an average based on actual grain sizes (e.g. when you need to compare the average grain size calculated by a tomographic technique with that estimated from a section). Otherwise, for better precision and accuracy, opt for averages based on the apparent grain size distribution.
Using the optimal lognormal parameters calculated by the two step method, one can estimate a volume fraction occupied by any range of grain sizes using the calc_volume_fraction
function from the Stereology module as follows:
# lognormal parameters
geomean = 36.05
std = np.log(1.63)
# set min and max grain size in the population
min_size = 0
max_size = dataset['diameters'].max()
# get the volume fraction of a specific grain size
stereology.calc_volume_fraction(lognorm_params=(geomean, std),
total_size_range=(min_size, max_size),
interest_size_range=(0, 50))
Volume fraction occupied by grains between 0 and 50 microns: 22.7 %
Important
If the grain size population follows a lognormal distribution, you should use this method. It is far superior to the Saltykov method for calculating volume fractions within specific grain size ranges. Note that the volume fraction below 50 microns calculated by this method is very different from that calculated by the Saltykov method due to inherent limitations of the Saltykov method. As an example, try using different numbers of bins in the Saltykov methods on the same population of grain sizes and see how the estimated volume fraction changes for a similar range of grain sizes.