Commit
Merge branch 'develop'
svkucheryavski committed Apr 7, 2018
2 parents c2d4507 + c2a4384 commit e6b949e
Showing 7 changed files with 68 additions and 68 deletions.
Binary file modified docs/_main_files/figure-html/unnamed-chunk-43-1.png
6 changes: 3 additions & 3 deletions docs/calibration-and-validation.html
@@ -184,7 +184,7 @@ <h1>
<section class="normal" id="section-">
<div id="calibration-and-validation" class="section level2 unnumbered">
<h2>Calibration and validation</h2>
<p>The model calibration is similar to PCA, but there are several additional arguments which are important for classification. The first one is a class name: a string that can be used later, e.g. for identifying class members for testing. The second important argument is the significance level, <code>alpha</code>. This parameter is used for computing the statistical limits and can be considered as the probability of false negatives. The default value is 0.05. Finally, the parameter <code>lim.type</code> selects the method for computing the critical limits for the residuals, as described in the PCA chapter.</p>
<p>The model calibration is similar to PCA, but there are several additional arguments which are important for classification. The first one is a class name: a string that can be used later, e.g. for identifying class members for testing. The second important argument is the significance level, <code>alpha</code>. This parameter is used for computing the statistical limits and can be considered as the probability of false negatives. The default value is 0.05. Finally, the parameter <code>lim.type</code> selects the method for computing the critical limits for the distances, as described in the PCA chapter.</p>
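<p>As an illustration of these arguments, a calibration call could look like the following sketch (this assumes the <code>simca()</code> constructor from mdatools accepts the arguments described above; the data matrix <code>x.cal</code>, containing objects of the target class only, and the chosen values are hypothetical):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(mdatools)

# x.cal is a hypothetical matrix with measurements of the target class only
m = simca(x.cal, classname = "setosa", ncomp = 3,
          alpha = 0.05, lim.type = "ddmoments")
summary(m)</code></pre></div>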
<p>In this chapter, as well as for describing the other classification methods, we will use the famous Iris dataset available in R. The dataset includes 150 measurements of three Iris species: <em>Setosa</em>, <em>Virginica</em> and <em>Versicolor</em>. The measurements are length and width of petals and sepals in cm. Use <code>?iris</code> for more details.</p>
<p>Let’s get the data and split it into calibration and test sets.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">data</span>(iris)
@@ -298,10 +298,10 @@ <h3>Predictions and validation with a test set</h3>
<div id="class-belonging-probabilities" class="section level3 unnumbered">
<h3>Class belonging probabilities</h3>
<p>In addition to the array with predicted classes, the object with SIMCA results also contains an array with class belonging probabilities. The probabilities are calculated depending on how close a particular object is to the critical limit border.</p>
<p>To compute the probability we use the same theoretical distribution for the Q and T<sup>2</sup> residuals as for computing the critical values (defined by the parameter <code>lim.type</code>). The distribution is used to calculate a p-value, the chance of getting an object with the given residual value or larger. The p-value is then compared with the significance level, <span class="math inline">\(\alpha\)</span>, and the probability, <span class="math inline">\(\pi\)</span>, is calculated as follows:</p>
<p>To compute the probability we use the same theoretical distribution for the Q and T<sup>2</sup> distances as for computing the critical values (defined by the parameter <code>lim.type</code>). The distribution is used to calculate a p-value, the chance of getting an object with the given distance value or larger. The p-value is then compared with the significance level, <span class="math inline">\(\alpha\)</span>, and the probability, <span class="math inline">\(\pi\)</span>, is calculated as follows:</p>
<p><span class="math display">\[\pi = 0.5 (p / \alpha) \]</span></p>
<p>So if the p-value equals the significance level (which happens when an object lies exactly on the acceptance border), the probability is 0.5. If the p-value is e.g. 0.04, <span class="math inline">\(\pi = 0.4\)</span>, or 40%, and the object is rejected as a stranger (here we assume that <span class="math inline">\(\alpha = 0.05\)</span>). If the p-value is e.g. 0.06, <span class="math inline">\(\pi = 0.6\)</span>, or 60%, and the object is accepted as a member of the class. If the p-value is larger than <span class="math inline">\(2\times\alpha\)</span>, the probability is set to 1.</p>
<p>In the case of a rectangular acceptance area (<code>lim.type = 'jm'</code> or <code>'chisq'</code>) the probability is computed separately for the Q and T<sup>2</sup> residuals and the smaller of the two is taken. In the case of a triangular acceptance area (<code>lim.type = 'ddmoments'</code> or <code>'ddrobust'</code>) the probability is calculated for a combination of the residuals.</p>
<p>In the case of a rectangular acceptance area (<code>lim.type = 'jm'</code> or <code>'chisq'</code>) the probability is computed separately for the Q and T<sup>2</sup> values and the smaller of the two is taken. In the case of a triangular acceptance area (<code>lim.type = 'ddmoments'</code> or <code>'ddrobust'</code>) the probability is calculated for a combination of the distances.</p>
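<p>The rule above can be summarised as a small sketch (an illustration of the formula only, not the internal code of the package):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># class belonging probability from a p-value, following the rule above
# (the value is capped at 1 once the p-value reaches 2 * alpha)
class.prob = function(p, alpha = 0.05) {
  pmin(1, 0.5 * p / alpha)
}

class.prob(c(0.01, 0.04, 0.05, 0.06, 0.20))
# 0.1 0.4 0.5 0.6 1.0</code></pre></div>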
<p>Here is how to show the probability values that correspond to the predictions shown in the previous code chunk.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">show</span>(res<span class="op">$</span>p.pred[<span class="dv">31</span><span class="op">:</span><span class="dv">40</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">3</span>, <span class="dv">1</span>])</code></pre></div>
<pre><code>## Comp 1 Comp 2 Comp 3
2 changes: 1 addition & 1 deletion docs/plotting-methods.html
@@ -189,7 +189,7 @@ <h2>Plotting methods</h2>
<span class="kw">mdaplot</span>(m<span class="op">$</span>calres<span class="op">$</span>scores, <span class="dt">type =</span> <span class="st">&#39;p&#39;</span>, <span class="dt">show.labels =</span> T, <span class="dt">show.lines =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">0</span>))
<span class="kw">mdaplot</span>(m<span class="op">$</span>loadings, <span class="dt">type =</span> <span class="st">&#39;p&#39;</span>, <span class="dt">show.labels =</span> T, <span class="dt">show.lines =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">0</span>))</code></pre></div>
<p><img src="_main_files/figure-html/unnamed-chunk-55-1.png" width="864" /></p>
<p>To simplify this routine, every model and result class also has a number of functions for visualization. For PCA, for example, the function list includes scores and loadings plots, explained variance and cumulative explained variance plots, T<sup>2</sup> vs. Q residuals plots and many others.</p>
<p>To simplify this routine, every model and result class also has a number of functions for visualization. For PCA, for example, the function list includes scores and loadings plots, explained variance and cumulative explained variance plots, T<sup>2</sup> distances vs. Q residuals plots and many others.</p>
<p>A function that does the same thing for different models and results always has the same name. For example, <code>plotPredictions</code> will show a predicted vs. measured plot for a PLS model and PLS result, an MLR model and MLR result, a PCR model and PCR result and so on. The first argument must always be either a model or a result object.</p>
<p>The major difference between plots for a model and plots for a result is the following. A plot for a result always shows one set of data objects: one set of points, lines or bars. For example, predicted vs. measured values for the calibration set, or score values for the test set, and so on. For such plots the method <code>mdaplot()</code> is used and you can provide any arguments available for this method (e.g. color grouping of the scores for calibration results).</p>
<p>A plot for a model, in contrast, in most cases shows several sets of data objects, e.g. predicted values for calibration and validation. In this case the corresponding method uses <code>mdaplotg()</code> and you can therefore adjust the plot using the arguments described for that method.</p>
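<p>For example, calling the same function for a model and for one of its results gives the two kinds of plots described above (a hypothetical sketch, assuming a PLS model <code>m.pls</code> has been calibrated earlier):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># model plot: several sets of objects (e.g. calibration and validation), uses mdaplotg()
plotPredictions(m.pls)

# result plot: one set of objects (calibration results only), uses mdaplot()
plotPredictions(m.pls$calres)</code></pre></div>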
96 changes: 48 additions & 48 deletions docs/randomized-pca-algorithms.html
@@ -201,12 +201,12 @@ <h2>Randomized PCA algorithms</h2>
t1 =<span class="st"> </span><span class="kw">system.time</span>({m1 =<span class="st"> </span><span class="kw">pca</span>(D, <span class="dt">ncomp =</span> <span class="dv">2</span>)})
<span class="kw">show</span>(t1)</code></pre></div>
<pre><code>## user system elapsed
## 60.262 3.103 65.328</code></pre>
## 59.397 3.086 62.987</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># randomized SVD with p = 5 and q = 1</span>
t2 =<span class="st"> </span><span class="kw">system.time</span>({m2 =<span class="st"> </span><span class="kw">pca</span>(D, <span class="dt">ncomp =</span> <span class="dv">2</span>, <span class="dt">rand =</span> <span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">1</span>))})
<span class="kw">show</span>(t2)</code></pre></div>
<pre><code>## user system elapsed
## 34.870 3.322 42.607</code></pre>
## 34.448 2.643 37.416</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># compare variances</span>
<span class="kw">summary</span>(m1)</code></pre></div>
<pre><code>##
@@ -215,8 +215,8 @@ <h2>Randomized PCA algorithms</h2>
## Info:
##
## Eigvals Expvar Cumexpvar
## Comp 1 112.597 62.18 62.18
## Comp 2 49.704 27.45 89.63</code></pre>
## Comp 1 112.699 62.17 62.17
## Comp 2 49.897 27.52 89.69</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">summary</span>(m2)</code></pre></div>
<pre><code>##
## PCA model (class pca) summary
@@ -226,33 +226,33 @@ <h2>Randomized PCA algorithms</h2>
##
## Parameters for randomized algorithm: q = 5, p = 1
## Eigvals Expvar Cumexpvar
## Comp 1 112.597 62.18 62.18
## Comp 2 49.704 27.45 89.63</code></pre>
## Comp 1 112.699 62.17 62.17
## Comp 2 49.897 27.52 89.69</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># compare loadings</span>
<span class="kw">show</span>(m1<span class="op">$</span>loadings[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>, ])</code></pre></div>
<pre><code>## Comp 1 Comp 2
## [1,] 3.049489e-06 -0.0000979183
## [2,] -3.900622e-02 0.0692210369
## [3,] -6.844649e-02 0.0747632444
## [4,] -8.140565e-02 0.0121164621
## [5,] -7.440682e-02 -0.0612545176
## [6,] -4.905702e-02 -0.0782478934
## [7,] -1.158937e-02 -0.0229581801
## [8,] 2.871800e-02 0.0533882207
## [9,] 6.184087e-02 0.0801510534
## [10,] 7.973446e-02 0.0329061559</code></pre>
## [1,] 7.799545e-05 -1.616151e-05
## [2,] -3.861521e-02 -6.939509e-02
## [3,] -6.814709e-02 -7.538469e-02
## [4,] -8.140091e-02 -1.254632e-02
## [5,] -7.476187e-02 6.095674e-02
## [6,] -4.946691e-02 7.770438e-02
## [7,] -1.173607e-02 2.264508e-02
## [8,] 2.901043e-02 -5.343363e-02
## [9,] 6.228145e-02 -8.011626e-02
## [10,] 7.983831e-02 -3.260527e-02</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">show</span>(m2<span class="op">$</span>loadings[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>, ])</code></pre></div>
<pre><code>## Comp 1 Comp 2
## [1,] 3.049489e-06 -0.0000979183
## [2,] -3.900622e-02 0.0692210369
## [3,] -6.844649e-02 0.0747632444
## [4,] -8.140565e-02 0.0121164621
## [5,] -7.440682e-02 -0.0612545176
## [6,] -4.905702e-02 -0.0782478934
## [7,] -1.158937e-02 -0.0229581801
## [8,] 2.871800e-02 0.0533882207
## [9,] 6.184087e-02 0.0801510534
## [10,] 7.973446e-02 0.0329061559</code></pre>
## [1,] 7.799545e-05 1.616151e-05
## [2,] -3.861521e-02 6.939509e-02
## [3,] -6.814709e-02 7.538469e-02
## [4,] -8.140091e-02 1.254632e-02
## [5,] -7.476187e-02 -6.095674e-02
## [6,] -4.946691e-02 -7.770438e-02
## [7,] -1.173607e-02 -2.264508e-02
## [8,] 2.901043e-02 5.343363e-02
## [9,] 6.228145e-02 8.011626e-02
## [10,] 7.983831e-02 3.260527e-02</code></pre>
<p>As you can see, the explained variance values, eigenvalues and loadings are practically identical in the two models (the sign of a loading vector may be flipped, which is arbitrary in PCA), and the second method is about twice as fast.</p>
<p>It is possible to make the PCA decomposition even faster if only loadings and scores are needed. In this case you can use the method <code>pca.run()</code> and skip the other steps, such as calculation of residuals, variances, critical limits and so on. However, in this case the data matrix must be centered (and scaled if necessary) manually prior to the decomposition. Here is an example using the data generated in the previous code.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">D =<span class="st"> </span><span class="kw">scale</span>(D, <span class="dt">center =</span> T, <span class="dt">scale =</span> F)
@@ -261,37 +261,37 @@ <h2>Randomized PCA algorithms</h2>
t1 =<span class="st"> </span><span class="kw">system.time</span>({P1 =<span class="st"> </span><span class="kw">pca.run</span>(D, <span class="dt">method =</span> <span class="st">&#39;svd&#39;</span>, <span class="dt">ncomp =</span> <span class="dv">2</span>)})
<span class="kw">show</span>(t1)</code></pre></div>
<pre><code>## user system elapsed
## 26.312 0.272 27.052</code></pre>
## 25.966 0.261 26.293</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># randomized SVD with p = 5 and q = 1</span>
t2 =<span class="st"> </span><span class="kw">system.time</span>({P2 =<span class="st"> </span><span class="kw">pca.run</span>(D, <span class="dt">method =</span> <span class="st">&#39;svd&#39;</span>, <span class="dt">ncomp =</span> <span class="dv">2</span>, <span class="dt">rand =</span> <span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">1</span>))})
<span class="kw">show</span>(t2)</code></pre></div>
<pre><code>## user system elapsed
## 2.120 0.045 2.166</code></pre>
## 2.085 0.043 2.130</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># compare loadings</span>
<span class="kw">show</span>(P1<span class="op">$</span>loadings[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>, ])</code></pre></div>
<pre><code>## [,1] [,2]
## [1,] 3.049489e-06 -0.0000979183
## [2,] -3.900622e-02 0.0692210369
## [3,] -6.844649e-02 0.0747632444
## [4,] -8.140565e-02 0.0121164621
## [5,] -7.440682e-02 -0.0612545176
## [6,] -4.905702e-02 -0.0782478934
## [7,] -1.158937e-02 -0.0229581801
## [8,] 2.871800e-02 0.0533882207
## [9,] 6.184087e-02 0.0801510534
## [10,] 7.973446e-02 0.0329061559</code></pre>
## [1,] 7.799545e-05 -1.616151e-05
## [2,] -3.861521e-02 -6.939509e-02
## [3,] -6.814709e-02 -7.538469e-02
## [4,] -8.140091e-02 -1.254632e-02
## [5,] -7.476187e-02 6.095674e-02
## [6,] -4.946691e-02 7.770438e-02
## [7,] -1.173607e-02 2.264508e-02
## [8,] 2.901043e-02 -5.343363e-02
## [9,] 6.228145e-02 -8.011626e-02
## [10,] 7.983831e-02 -3.260527e-02</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">show</span>(P2<span class="op">$</span>loadings[<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>, ])</code></pre></div>
<pre><code>## [,1] [,2]
## [1,] 3.049489e-06 -0.0000979183
## [2,] -3.900622e-02 0.0692210369
## [3,] -6.844649e-02 0.0747632444
## [4,] -8.140565e-02 0.0121164621
## [5,] -7.440682e-02 -0.0612545176
## [6,] -4.905702e-02 -0.0782478934
## [7,] -1.158937e-02 -0.0229581801
## [8,] 2.871800e-02 0.0533882207
## [9,] 6.184087e-02 0.0801510534
## [10,] 7.973446e-02 0.0329061559</code></pre>
## [1,] 7.799545e-05 1.616151e-05
## [2,] -3.861521e-02 6.939509e-02
## [3,] -6.814709e-02 7.538469e-02
## [4,] -8.140091e-02 1.254632e-02
## [5,] -7.476187e-02 -6.095674e-02
## [6,] -4.946691e-02 -7.770438e-02
## [7,] -1.173607e-02 -2.264508e-02
## [8,] 2.901043e-02 5.343363e-02
## [9,] 6.228145e-02 8.011626e-02
## [10,] 7.983831e-02 3.260527e-02</code></pre>
<p>As you can see, the loadings are essentially the same (again up to an arbitrary sign) but the randomized algorithm is about 15 times faster.</p>

</div>
