Commit

Site updated: 2024-09-11 22:17:12
cxzlw committed Sep 11, 2024
1 parent a1d4f74 commit 9a90788
Showing 69 changed files with 61 additions and 91 deletions.
26 changes: 7 additions & 19 deletions 2023/07/05/zhihu-aac-old/index.html
@@ -30,7 +30,7 @@
<meta property="og:locale" content="zh_CN">
<meta property="og:image" content="https://blog.cxzlw.top/img/image.png">
<meta property="article:published_time" content="2023-07-04T17:49:31.000Z">
<meta property="article:modified_time" content="2024-09-11T13:57:29.450Z">
<meta property="article:modified_time" content="2024-09-11T14:16:47.517Z">
<meta property="article:author" content="cxzlw">
<meta property="article:tag" content="cxzlw">
<meta property="article:tag" content="Python">
@@ -336,9 +336,7 @@ <h1 id="seo-header">聊聊知乎盐选反爬 (回答页篇)</h1>

<p>最近,知乎上线了针对专栏<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="专栏反爬现已更新,故本文只以回答反爬为演示。">[1]</span></a></sup>中盐选文章的反爬系统,随后该系统也被运用在知乎回答页面中的盐选文章上。具体表现为爬取的文章内容中出现大量的错乱词汇。而在本篇文章中,我们将一步步带领各位解开这些乱码。在这个过程中,我们将对字体反爬有更深入的认识,并学到运用字体反爬时需要注意的问题。</p>
<h2 id="一、知乎反爬效果"><a href="#一、知乎反爬效果" class="headerlink" title="一、知乎反爬效果"></a>一、知乎反爬效果</h2><p>来自知乎回答<a target="_blank" rel="noopener" href="https://www.zhihu.com/question/41922324/answer/3073556909">不被爱是一种什么样的感受? - 知乎</a></p>
<p>
<img src="/../img/image.png.webp" srcset="/img/loading.gif" lazyload alt="乱码示意图">
</p>
<p><img src="/../img/image.png" srcset="/img/loading.gif" lazyload alt="乱码示意图"> </p>
<p>如图所示,在页面源码中出现了大量乱码,例如(原字,错字):<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="由于知乎回答页反爬使用了两套字体,故本文所有截图,代码运行结果等内容可能与实际不符。你可以选择以实际为主或刷新页面直到页面显示的内容与本文一致。">[2]</span></a></sup></p>
<ul>
<li>中 -&gt; 在</li>
@@ -348,27 +346,17 @@ <h2 id="一、知乎反爬效果"><a href="#一、知乎反爬效果" class="hea
<p>这些乱码使得文章可读性大大下降,那么乱码是怎么产生的?又如何解决这个问题呢?</p>
<h2 id="二、找寻乱码真凶"><a href="#二、找寻乱码真凶" class="headerlink" title="二、找寻乱码真凶"></a>二、找寻乱码真凶</h2><p>观察上述现象,页面源码中的字,在被显示到页面后,居然变成了正确的字。因此我们初步推断知乎在该页面运用了字体反爬。</p>
<p>接下来我们打开 F12 -&gt; Network 页面,选择 Font,观察知乎加载的字体。</p>
<p>
<img src="/../img/image-1.png.webp" srcset="/img/loading.gif" lazyload alt="知乎加载的字体">
</p>
<p><img src="/../img/image-1.png" srcset="/img/loading.gif" lazyload alt="知乎加载的字体"></p>
<p>右键选择 Open in new tab 将字体保存下来。</p>
<p>
<img src="/../img/image-2.png.webp" srcset="/img/loading.gif" lazyload alt="下载的字体文件">
</p>
<p><img src="/../img/image-2.png" srcset="/img/loading.gif" lazyload alt="下载的字体文件"></p>
<p>将字体后缀名改为 .ttf <sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label=".ttf 是因为 `data:font/ttf;...` 代表该字体是 ttf 格式的。">[3]</span></a></sup> 并打开。</p>
<div class="group-image-container"><div class="group-image-row"><div class="group-image-wrap">
<img src="/../img/image-3.png.webp" srcset="/img/loading.gif" lazyload alt="正常字体">
</div><div class="group-image-wrap">
<img src="/../img/image-4.png.webp" srcset="/img/loading.gif" lazyload alt="反爬字体">
</div></div></div>
<div class="group-image-container"><div class="group-image-row"><div class="group-image-wrap"><img src="/../img/image-3.png" srcset="/img/loading.gif" lazyload alt="正常字体"></div><div class="group-image-wrap"><img src="/../img/image-4.png" srcset="/img/loading.gif" lazyload alt="反爬字体"></div></div></div>
<figcaption aria-hidden="true" class="image-caption">左:正常字体 右:反爬字体</figcaption>

<p>与正常字体对比,我们下载的字体明显替换了部分字体,这便是知乎用于反爬的字体了。接下来我们将分析这个字体并给出应对方案。</p>
<h2 id="三、致命缺陷"><a href="#三、致命缺陷" class="headerlink" title="三、致命缺陷"></a>三、致命缺陷</h2><p>字体反爬的根本原理是替换原本的字为一个新字,再用字体将新字渲染为原字,这样对程序而言就只见到新字而不是旧字了,而用户看到的还是原本的内容。因此只要找到新字与原字间的对应关系便可解决该反爬。而要找到这个对应关系,抓住字体中各个字形的特征是必不可少的一环。</p>
<p>我们打开 <a target="_blank" rel="noopener" href="https://fontdrop.info/">FontDrop!</a> 加载字体,向下翻,观察字形的特征。</p>
<p>
<img src="/../img/image-5.png.webp" srcset="/img/loading.gif" lazyload alt="字体中的字形">
</p>
<p><img src="/../img/image-5.png" srcset="/img/loading.gif" lazyload alt="字体中的字形"></p>
<p>我们发现字形的 Glyph 为 uni662F 而 Unicode 为 65F6,接下来我们试着查询这两个十六进制数对应的字:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs python">glyph = <span class="hljs-string">&quot;\u662F&quot;</span><br>unicode = <span class="hljs-string">&quot;\u65F6&quot;</span><br><span class="hljs-built_in">print</span>(glyph, unicode)<br><span class="hljs-comment"># output: 是 时</span><br></code></pre></td></tr></table></figure>

@@ -812,5 +800,5 @@ <h4 class="modal-title w-100 font-weight-bold">搜索</h4>
<noscript>
<div class="noscript-warning">博客在允许 JavaScript 运行的环境下浏览效果更佳</div>
</noscript>
<!-- hexo injector body_end start --><script async src="/js/image-ng.js"></script><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
<!-- hexo injector body_end start --><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
</html>
4 changes: 2 additions & 2 deletions 2023/07/06/zerotier-planet-convert/index.html
@@ -29,7 +29,7 @@
<meta property="og:description" content="由于国内特殊的网络原因,Zerotier 官方提供的 Planet 用户体验不佳。为此,不少人选择自建私有 Moon,甚至私有 Planet 服务器。然而,正如官方文档所说,使用私有 Planet 服务器会使你的节点无法找到其他的标准节点。本文试图提出一种方案在使用私有 Planet 服务器的同时与标准节点通信。">
<meta property="og:locale" content="zh_CN">
<meta property="article:published_time" content="2023-07-06T04:37:41.000Z">
<meta property="article:modified_time" content="2024-09-11T13:57:29.450Z">
<meta property="article:modified_time" content="2024-09-11T14:16:47.517Z">
<meta property="article:author" content="cxzlw">
<meta property="article:tag" content="cxzlw">
<meta property="article:tag" content="Zerotier">
@@ -752,5 +752,5 @@ <h4 class="modal-title w-100 font-weight-bold">搜索</h4>
<noscript>
<div class="noscript-warning">博客在允许 JavaScript 运行的环境下浏览效果更佳</div>
</noscript>
<!-- hexo injector body_end start --><script async src="/js/image-ng.js"></script><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
<!-- hexo injector body_end start --><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
</html>
4 changes: 2 additions & 2 deletions 2023/08/04/permission-system-design-share/index.html
@@ -29,7 +29,7 @@
<meta property="og:description" content="最近在参与某 OJ 的开发,过程中我们需要一个权限系统。作为一个热爱 MC 的开发者,我很喜欢 luckperms 的设计,于是这个小东西就出来了。在这里给大家分享我们的权限系统设计。">
<meta property="og:locale" content="zh_CN">
<meta property="article:published_time" content="2023-08-04T07:32:59.000Z">
<meta property="article:modified_time" content="2024-09-11T13:57:29.450Z">
<meta property="article:modified_time" content="2024-09-11T14:16:47.517Z">
<meta property="article:author" content="cxzlw">
<meta property="article:tag" content="cxzlw">
<meta property="article:tag" content="权限系统">
@@ -776,5 +776,5 @@ <h4 class="modal-title w-100 font-weight-bold">搜索</h4>
<noscript>
<div class="noscript-warning">博客在允许 JavaScript 运行的环境下浏览效果更佳</div>
</noscript>
<!-- hexo injector body_end start --><script async src="/js/image-ng.js"></script><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
<!-- hexo injector body_end start --><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
</html>
16 changes: 5 additions & 11 deletions 2023/08/31/cell-structure/index.html
@@ -32,7 +32,7 @@
<meta property="og:image" content="https://blog.cxzlw.top/img/plant.jpg">
<meta property="og:image" content="https://blog.cxzlw.top/img/prokaryotic.jpg">
<meta property="article:published_time" content="2023-08-31T15:11:29.000Z">
<meta property="article:modified_time" content="2024-09-11T13:57:29.450Z">
<meta property="article:modified_time" content="2024-09-11T14:16:47.517Z">
<meta property="article:author" content="cxzlw">
<meta property="article:tag" content="cxzlw">
<meta property="article:tag" content="Blog">
@@ -437,9 +437,7 @@ <h3 id="Ribosomes"><a href="#Ribosomes" class="headerlink" title="Ribosomes"></a
<li>Make proteins by translating RNA codes</li>
</ul>
<h2 id="Animal-cell"><a href="#Animal-cell" class="headerlink" title="Animal cell"></a>Animal cell</h2><p>Animal cell has all above.</p>
<h3 id="Diagram"><a href="#Diagram" class="headerlink" title="Diagram"></a>Diagram</h3><p>
<img src="/../img/animal.jpg.webp" srcset="/img/loading.gif" lazyload alt="Animal cell diagram">
</p>
<h3 id="Diagram"><a href="#Diagram" class="headerlink" title="Diagram"></a>Diagram</h3><p><img src="/../img/animal.jpg" srcset="/img/loading.gif" lazyload alt="Animal cell diagram"></p>
<h2 id="Plant-cell"><a href="#Plant-cell" class="headerlink" title="Plant cell"></a>Plant cell</h2><p>Plant cell especially also have:</p>
<h3 id="Cell-wall"><a href="#Cell-wall" class="headerlink" title="Cell wall"></a>Cell wall</h3><p>Rigid to keep the shape of the cell, strengthens the cell.</p>
<ul>
@@ -455,19 +453,15 @@ <h3 id="Chloroplast"><a href="#Chloroplast" class="headerlink" title="Chloroplas
<ul>
<li>In green parts of plants</li>
</ul>
<h3 id="Diagram-1"><a href="#Diagram-1" class="headerlink" title="Diagram"></a>Diagram</h3><p>
<img src="/../img/plant.jpg.webp" srcset="/img/loading.gif" lazyload alt="Plant cell diagram">
</p>
<h3 id="Diagram-1"><a href="#Diagram-1" class="headerlink" title="Diagram"></a>Diagram</h3><p><img src="/../img/plant.jpg" srcset="/img/loading.gif" lazyload alt="Plant cell diagram"></p>
<h2 id="Prokaryotic-cell"><a href="#Prokaryotic-cell" class="headerlink" title="Prokaryotic cell"></a>Prokaryotic cell</h2><p>Prokaryotic cell <strong>do not</strong> have Mitochondria and Nucleus but organelles below.</p>
<h3 id="Cell-wall-1"><a href="#Cell-wall-1" class="headerlink" title="Cell wall"></a>Cell wall</h3><p>Rigid to keep the shape of the cell, strengthens the cell.</p>
<ul>
<li>Made of <strong>peptidoglycan</strong> not cellulose</li>
</ul>
<h3 id="Circular-DNA"><a href="#Circular-DNA" class="headerlink" title="Circular DNA"></a>Circular DNA</h3><p>Instead of chromosomes, also called <em>Nucleoid</em>, is essential for controlling the acticity and reproduction of the<br>prokaryotic cell.</p>
<h3 id="Plasmid"><a href="#Plasmid" class="headerlink" title="Plasmid"></a>Plasmid</h3><p>Small circles of DNA.</p>
<h3 id="Diagram-2"><a href="#Diagram-2" class="headerlink" title="Diagram"></a>Diagram</h3><p>
<img src="/../img/prokaryotic.jpg.webp" srcset="/img/loading.gif" lazyload alt="Prokaryotic cell diagram">
</p>
<h3 id="Diagram-2"><a href="#Diagram-2" class="headerlink" title="Diagram"></a>Diagram</h3><p><img src="/../img/prokaryotic.jpg" srcset="/img/loading.gif" lazyload alt="Prokaryotic cell diagram"></p>
<h2 id="Differences"><a href="#Differences" class="headerlink" title="Differences"></a>Differences</h2><h3 id="Organelles-in-cells"><a href="#Organelles-in-cells" class="headerlink" title="Organelles in cells"></a>Organelles in cells</h3><table>
<thead>
<tr>
@@ -931,5 +925,5 @@ <h4 class="modal-title w-100 font-weight-bold">Search</h4>
<noscript>
<div class="noscript-warning">Blog works best with JavaScript enabled</div>
</noscript>
<!-- hexo injector body_end start --><script async src="/js/image-ng.js"></script><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
<!-- hexo injector body_end start --><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
</html>
22 changes: 6 additions & 16 deletions 2023/10/03/busuanzi-bug/index.html
@@ -34,7 +34,7 @@
<meta property="og:image" content="https://blog.cxzlw.top/img/img-20231004-chrome-busuanzi.png">
<meta property="og:image" content="https://blog.cxzlw.top/img/image-8.png">
<meta property="article:published_time" content="2023-10-03T15:06:16.000Z">
<meta property="article:modified_time" content="2024-09-11T13:57:29.450Z">
<meta property="article:modified_time" content="2024-09-11T14:16:47.517Z">
<meta property="article:author" content="cxzlw">
<meta property="article:tag" content="Hexo">
<meta property="article:tag" content="不蒜子">
@@ -338,26 +338,16 @@ <h1 id="seo-header">不蒜子在 Safari 中计数异常</h1>
<div class="markdown-body">

<p>近期,我的博客在 Safari 中,文章访问量异常的大。经过抓包,确认了问题与 Referer 头相关。然而,Referrer-Policy 并没有解决问题。这与其接口设计有关。本文将进行解释,并提出我的解决方案。</p>
<h2 id="问题表现"><a href="#问题表现" class="headerlink" title="问题表现"></a>问题表现</h2><div class="group-image-container"><div class="group-image-row"><div class="group-image-wrap">
<img src="/../img/image-6.png.webp" srcset="/img/loading.gif" lazyload alt="Safari 打开效果">
</div><div class="group-image-wrap">
<img src="/../img/image-7.png.webp" srcset="/img/loading.gif" lazyload alt="正常情况打开效果">
</div></div></div>
<h2 id="问题表现"><a href="#问题表现" class="headerlink" title="问题表现"></a>问题表现</h2><div class="group-image-container"><div class="group-image-row"><div class="group-image-wrap"><img src="/../img/image-6.png" srcset="/img/loading.gif" lazyload alt="Safari 打开效果"></div><div class="group-image-wrap"><img src="/../img/image-7.png" srcset="/img/loading.gif" lazyload alt="正常情况打开效果"></div></div></div>

<figcaption aria-hidden="true" class="image-caption">左:Safari 打开效果 右:正常情况打开效果</figcaption>

<p>如图所示,Safari 显示的访问量高达 1k+,比实际访问量高出了 1448 次。这是很奇怪的,引起了我的注意。</p>
<h2 id="问题原因"><a href="#问题原因" class="headerlink" title="问题原因"></a>问题原因</h2><p>为了找到引起这个问题的原因,我们对上述页面分别向 busuanzi 发的请求都进行抓包。</p>
<h3 id="Safari"><a href="#Safari" class="headerlink" title="Safari"></a>Safari</h3><p>
<img src="/../img/img-20231004-safari-busuanzi.png.webp" srcset="/img/loading.gif" lazyload alt="Safari 向 busuanzi 发的请求">
</p>
<h3 id="Chrome"><a href="#Chrome" class="headerlink" title="Chrome"></a>Chrome</h3><p>
<img src="/../img/img-20231004-chrome-busuanzi.png.webp" srcset="/img/loading.gif" lazyload alt="Chrome 向 busuanzi 发的请求">
</p>
<h3 id="Safari"><a href="#Safari" class="headerlink" title="Safari"></a>Safari</h3><p><img src="/../img/img-20231004-safari-busuanzi.png" srcset="/img/loading.gif" lazyload alt="Safari 向 busuanzi 发的请求"></p>
<h3 id="Chrome"><a href="#Chrome" class="headerlink" title="Chrome"></a>Chrome</h3><p><img src="/../img/img-20231004-chrome-busuanzi.png" srcset="/img/loading.gif" lazyload alt="Chrome 向 busuanzi 发的请求"></p>
<p>从上面的两个请求,可以看出Safari 向 busuanzi 发送的请求 Referer 头是错误的。这导致 Safari 获得了 <code>https://blog.cxzlw.top/</code> 的浏览量信息。这是为什么呢?</p>
<h2 id="高人指点"><a href="#高人指点" class="headerlink" title="高人指点"></a>高人指点</h2><p>加入 busuanzi QQ 群<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="419260983,来自 busunazi 官网 [不蒜子 | 不如](https://ibruce.info/2015/04/04/busuanzi/#more:~:text=%E5%A3%B0%E6%98%8E%EF%BC%9A%E7%BB%8F%E7%94%A8%E6%88%B7%E5%BB%BA%E8%AE%AE%EF%BC%8C%E6%96%B0%E5%A2%9E%E4%B8%8D%E8%92%9C%E5%AD%90%E4%BA%A4%E6%B5%81QQ%E7%BE%A4%EF%BC%9A419260983%EF%BC%8C%E6%AC%A2%E8%BF%8E%E5%A4%A7%E5%AE%B6%E5%8A%A0%E5%85%A5%E3%80%82%E2%80%94%E2%80%94%20%E4%B8%8D%E5%A6%82%EF%BC%8C2017.02)">[1]</span></a></sup>后,我在群中询问。并得到了大佬「 」的回复。<br>
<img src="/../img/image-8.png.webp" srcset="/img/loading.gif" lazyload alt="大佬的解答">
</p>
<h2 id="高人指点"><a href="#高人指点" class="headerlink" title="高人指点"></a>高人指点</h2><p>加入 busuanzi QQ 群<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="419260983,来自 busunazi 官网 [不蒜子 | 不如](https://ibruce.info/2015/04/04/busuanzi/#more:~:text=%E5%A3%B0%E6%98%8E%EF%BC%9A%E7%BB%8F%E7%94%A8%E6%88%B7%E5%BB%BA%E8%AE%AE%EF%BC%8C%E6%96%B0%E5%A2%9E%E4%B8%8D%E8%92%9C%E5%AD%90%E4%BA%A4%E6%B5%81QQ%E7%BE%A4%EF%BC%9A419260983%EF%BC%8C%E6%AC%A2%E8%BF%8E%E5%A4%A7%E5%AE%B6%E5%8A%A0%E5%85%A5%E3%80%82%E2%80%94%E2%80%94%20%E4%B8%8D%E5%A6%82%EF%BC%8C2017.02)">[1]</span></a></sup>后,我在群中询问。并得到了大佬「 」的回复。<br><img src="/../img/image-8.png" srcset="/img/loading.gif" lazyload alt="大佬的解答"></p>
<p>图中的链接是:<a target="_blank" rel="noopener" href="https://zhufan.net/2020/10/14/referrer-policy%E9%82%A3%E4%BA%9B%E4%BA%8B/">Referrer Policy那些事 | 煮饭🍚</a></p>
<h2 id="Referrer-Policy"><a href="#Referrer-Policy" class="headerlink" title="Referrer-Policy"></a>Referrer-Policy</h2><p>发生了什么呢?浏览器开始用 <code>strict-origin-when-cross-origin</code> 替换之前的 <code>no-referrer-when-downgrade</code> 作为 <code>Referrer-Policy</code> 的默认值,而这个新策略破坏了向不蒜子发送的 <code>Referer</code> 头。</p>
<p>于是我设置了 <code>Referrer-Policy</code>。具体来说,我将下面的代码加入了我的博客:</p>
@@ -814,5 +804,5 @@ <h4 class="modal-title w-100 font-weight-bold">搜索</h4>
<noscript>
<div class="noscript-warning">博客在允许 JavaScript 运行的环境下浏览效果更佳</div>
</noscript>
<!-- hexo injector body_end start --><script async src="/js/image-ng.js"></script><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
<!-- hexo injector body_end start --><script async src="/js/progressbar-done.js"></script><!-- hexo injector body_end end --></body>
</html>