<!doctype html><html lang=en-uk><head><script data-goatcounter=https://ruivieira-dev.goatcounter.com/count async src=//gc.zgo.at/count.js></script><script src=https://unpkg.com/@alpinejs/intersect@3.x.x/dist/cdn.min.js></script><script src=https://unpkg.com/alpinejs@3.x.x/dist/cdn.min.js></script><script type=module src=https://ruivieira.dev/js/deeplinks/deeplinks.js></script><link rel=preload href=https://ruivieira.dev/lib/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/lib/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/lib/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/fonts/firacode/FiraCode-Regular.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/fonts/vollkorn/Vollkorn-Regular.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=stylesheet href=https://ruivieira.dev/css/kbd.css type=text/css><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><title>Feature scaling · Rui Vieira</title>
<link rel=canonical href=https://ruivieira.dev/feature-scaling.html><meta name=viewport content="width=device-width,initial-scale=1"><meta name=robots content="all,follow"><meta name=googlebot content="index,follow,snippet,archive"><meta property="og:title" content="Feature scaling"><meta property="og:description" content="TechniquesThe most common techniques for feature scaling are normalisation and standardisation. For the examples, we will use the reference dataframe
import pandas as pd import numpy as np import matplotlib.pyplot as plt df = pd.DataFrame({'x': np.random.rand(100)*10.0, 'y': np.random.rand(100)*2.0}) print(df) x y 0 7.338272 0.963962 1 9.282307 0.799143 2 2.505291 0.664340 3 3.212283 0.137100 4 4.370920 0.383998 .. ... ... 95 1.454787 0.773893 96 3.847065 1.478079 97 4.198221 0.308595 98 9.986268 0."><meta property="og:type" content="article"><meta property="og:url" content="https://ruivieira.dev/feature-scaling.html"><meta property="article:section" content="posts"><meta property="article:modified_time" content="2023-09-04T00:15:05+01:00"><meta name=twitter:card content="summary"><meta name=twitter:title content="Feature scaling"><meta name=twitter:description content="TechniquesThe most common techniques for feature scaling are normalisation and standardisation. For the examples, we will use the reference dataframe
import pandas as pd import numpy as np import matplotlib.pyplot as plt df = pd.DataFrame({'x': np.random.rand(100)*10.0, 'y': np.random.rand(100)*2.0}) print(df) x y 0 7.338272 0.963962 1 9.282307 0.799143 2 2.505291 0.664340 3 3.212283 0.137100 4 4.370920 0.383998 .. ... ... 95 1.454787 0.773893 96 3.847065 1.478079 97 4.198221 0.308595 98 9.986268 0."><link rel=stylesheet href=https://ruivieira.dev/css/styles.css><!--[if lt IE 9]><script src=https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js></script><script src=https://oss.maxcdn.com/respond/1.4.2/respond.min.js></script><![endif]--><link rel=icon type=image/png href=https://ruivieira.dev/images/favicon.ico></head><body class="max-width mx-auto px3 ltr" x-data="{currentHeading: undefined}"><div class="content index py4"><div id=header-post><a id=menu-icon href=#><i class="fas fa-eye fa-lg"></i></a>
<a id=menu-icon-tablet href=#><i class="fas fa-eye fa-lg"></i></a>
<a id=top-icon-tablet href=# onclick='$("html, body").animate({scrollTop:0},"fast")' style=display:none aria-label="Top of Page"><i class="fas fa-chevron-up fa-lg"></i></a>
<span id=menu><span id=nav><ul><li><a href=https://ruivieira.dev/>Home</a></li><li><a href=https://ruivieira.dev/blog/>Blog</a></li><li><a href=https://ruivieira.dev/draw/>Drawings</a></li><li><a href=https://ruivieira.dev/map/>All pages</a></li><li><a href=https://ruivieira.dev/search.html>Search</a></li></ul></span><br><div id=share style=display:none></div><div id=toc><h4>Contents</h4><nav id=TableOfContents><ul><li><a href=#example :class="{'toc-h2':true, 'toc-highlight': currentHeading == '#example' }">Example</a></li></ul></nav><h4>Related</h4><nav><ul><li class="header-post toc"><span class=backlink-count>1</span>
<a href=https://ruivieira.dev/machine-learning.html>Machine Learning</a></li></ul></nav></div></span></div><article class=post itemscope itemtype=http://schema.org/BlogPosting><header><h1 class=posttitle itemprop="name headline">Feature scaling</h1><div class=meta><div class=postdate>Updated <time datetime="2023-09-04 00:15:05 +0100 BST" itemprop=datePublished>2023-09-04</time>
<span class=commit-hash>(<a href=https://ruivieira.dev/log/index.html#0514585>0514585</a>)</span></div></div></header><div class=content itemprop=articleBody><h1 id=techniques x-intersect="currentHeading = '#techniques'">Techniques</h1><p>The most common techniques for feature scaling are <em>normalisation</em> and <em>standardisation</em>.
For the examples, we will use the following reference dataframe:</p><div class=highlight><pre tabindex=0 style=background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python><span style=display:flex><span><span style=font-weight:700>import</span> <span style=color:#555>pandas</span> <span style=font-weight:700>as</span> <span style=color:#555>pd</span>
</span></span><span style=display:flex><span><span style=font-weight:700>import</span> <span style=color:#555>numpy</span> <span style=font-weight:700>as</span> <span style=color:#555>np</span>
</span></span><span style=display:flex><span><span style=font-weight:700>import</span> <span style=color:#555>matplotlib.pyplot</span> <span style=font-weight:700>as</span> <span style=color:#555>plt</span>
</span></span><span style=display:flex><span>
</span></span><span style=display:flex><span>df <span style=font-weight:700>=</span> pd<span style=font-weight:700>.</span>DataFrame({<span style=color:#b84>'x'</span>: np<span style=font-weight:700>.</span>random<span style=font-weight:700>.</span>rand(<span style=color:#099>100</span>)<span style=font-weight:700>*</span><span style=color:#099>10.0</span>,
</span></span><span style=display:flex><span>                   <span style=color:#b84>'y'</span>: np<span style=font-weight:700>.</span>random<span style=font-weight:700>.</span>rand(<span style=color:#099>100</span>)<span style=font-weight:700>*</span><span style=color:#099>2.0</span>})
</span></span><span style=display:flex><span><span style=color:#999>print</span>(df)
</span></span></code></pre></div><pre><code>           x         y
0   7.338272  0.963962
1   9.282307  0.799143
2   2.505291  0.664340
3   3.212283  0.137100
4   4.370920  0.383998
..       ...       ...
95  1.454787  0.773893
96  3.847065  1.478079
97  4.198221  0.308595
98  9.986268  0.298912
99  4.940190  0.916740

[100 rows x 2 columns]
</code></pre><h1 id=min-max-scaler x-intersect="currentHeading = '#min-max-scaler'">Min-Max scaler</h1><p>A common scaler which maps a feature from its original range $[A, B]$, where $A = x_{min}$ and $B = x_{max}$ are the smallest and largest observed values, onto a target range $[A^{\prime}, B^{\prime}]$.
Typically, $[A^{\prime}, B^{\prime}]=[0, 1]$, in which case the transformation is:</p><p>$$
x^{\prime}=\frac{x-x_{min}}{x_{max}-x_{min}}
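$$</p><p>As a minimal sketch of this transformation in plain Python, the hypothetical helper <code>min_max_scale</code> below (an illustration only, not part of scikit-learn) rescales a list of numbers to $[0, 1]$ by default. Its <code>new_min</code> and <code>new_max</code> parameters are assumed names for the bounds $A^{\prime}$ and $B^{\prime}$ of the target range, and the handling of a constant feature is likewise an assumption.</p><div class=highlight><pre tabindex=0 style=background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python># Hypothetical helper (not part of scikit-learn): min-max scaling in plain Python.
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Map values linearly from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


data = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_scale(data)  # [0.0, 0.25, 0.5, 1.0]

# A single outlier sets the maximum and squeezes the other values towards 0.
with_outlier = min_max_scale(data + [1000.0])
</code></pre></div><p>For a general target range $[A^{\prime}, B^{\prime}]$, the sketch assumes the usual linear generalisation of the formula above:</p><p>$$
x^{\prime}=A^{\prime}+\frac{(x-x_{min})(B^{\prime}-A^{\prime})}{x_{max}-x_{min}}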
$$</p><p>The Min-Max scaler is a reasonable choice when the data cannot be assumed to be normally distributed. It is, however, very sensitive to outliers: a single extreme value determines $x_{min}$ or $x_{max}$ and compresses the remaining values into a narrow part of the target range.</p><h2 id=example x-intersect="currentHeading = '#example'">Example</h2><div class=highlight><pre tabindex=0 style=background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python><span style=display:flex><span><span style=font-weight:700>from</span> <span style=color:#555>sklearn.preprocessing</span> <span style=font-weight:700>import</span> MinMaxScaler
</span></span><span style=display:flex><span>
</span></span><span style=display:flex><span>scaler <span style=font-weight:700>=</span> MinMaxScaler()
</span></span><span style=display:flex><span>df_scaled <span style=font-weight:700>=</span> pd<span style=font-weight:700>.</span>DataFrame(scaler<span style=font-weight:700>.</span>fit_transform(df), columns<span style=font-weight:700>=</span>[<span style=color:#b84>'x'</span>,<span style=color:#b84>'y'</span>])
</span></span></code></pre></div><p><img src=https://ruivieira.dev/Feature%20scaling_files/figure-gfm/cell-4-output-1.png alt loading=lazy></p></div></article><div id=footer-post-container><div id=footer-post><div id=nav-footer style=display:none><ul><li><a href=https://ruivieira.dev/>Home</a></li><li><a href=https://ruivieira.dev/blog/>Blog</a></li><li><a href=https://ruivieira.dev/draw/>Drawings</a></li><li><a href=https://ruivieira.dev/map/>All pages</a></li><li><a href=https://ruivieira.dev/search.html>Search</a></li></ul></div><div id=toc-footer style=display:none><nav id=TableOfContents><ul><li><a href=#example>Example</a></li></ul></nav></div><div id=share-footer style=display:none></div><div id=actions-footer><a id=menu-toggle class=icon href=# onclick='return $("#nav-footer").toggle(),!1' aria-label=Menu><i class="fas fa-bars fa-lg" aria-hidden=true></i> Menu</a>
<a id=toc-toggle class=icon href=# onclick='return $("#toc-footer").toggle(),!1' aria-label=TOC><i class="fas fa-list fa-lg" aria-hidden=true></i> TOC</a>
<a id=share-toggle class=icon href=# onclick='return $("#share-footer").toggle(),!1' aria-label=Share><i class="fas fa-share-alt fa-lg" aria-hidden=true></i> share</a>
<a id=top style=display:none class=icon href=# onclick='$("html, body").animate({scrollTop:0},"fast")' aria-label="Top of Page"><i class="fas fa-chevron-up fa-lg" aria-hidden=true></i> Top</a></div></div></div><footer id=footer><div class=footer-left>Copyright © 2024 Rui Vieira</div><div class=footer-right><nav><ul><li><a href=https://ruivieira.dev/>Home</a></li><li><a href=https://ruivieira.dev/blog/>Blog</a></li><li><a href=https://ruivieira.dev/draw/>Drawings</a></li><li><a href=https://ruivieira.dev/map/>All pages</a></li><li><a href=https://ruivieira.dev/search.html>Search</a></li></ul></nav></div></footer></div></body><link rel=stylesheet href=https://ruivieira.dev/css/fa.min.css><script src=https://ruivieira.dev/js/jquery-3.6.0.min.js></script><script src=https://ruivieira.dev/js/mark.min.js></script><script src=https://ruivieira.dev/js/main.js></script><script>MathJax={tex:{inlineMath:[["$","$"],["\\(","\\)"]]},svg:{fontCache:"global"}}</script><script type=text/javascript id=MathJax-script async src=https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js></script></html>