From be868179e10f838f0af15a0fe71d00f8e0a67e6c Mon Sep 17 00:00:00 2001
From: cxzlw
Date: Wed, 5 Jul 2023 14:01:15 +0000
Subject: [PATCH] Site updated: 2023-07-05 14:01:14

---
 2023/07/{05 => 04}/zhihu-aac-old/index.html | 30 +-
 404.html | 4 +-
 about/index.html | 8 +-
 archives/2023/07/index.html | 6 +-
 archives/2023/index.html | 6 +-
 archives/index.html | 6 +-
 categories/index.html | 4 +-
 favicon.ico | Bin 0 -> 13982 bytes
 {imgs => img}/image-1.png | Bin
 {imgs => img}/image-2.png | Bin
 {imgs => img}/image-3.png | Bin
 {imgs => img}/image-4.png | Bin
 {imgs => img}/image-5.png | Bin
 {imgs => img}/image.png | Bin
 index.html | 12 +-
 links/index.html | 454 --------------------
 local-search.xml | 6 +-
 sitemap.xml | 2 +-
 tags/index.html | 4 +-
 "tags/\345\217\215\347\210\254/index.html" | 6 +-
 "tags/\347\237\245\344\271\216/index.html" | 6 +-
 21 files changed, 50 insertions(+), 504 deletions(-)
 rename 2023/07/{05 => 04}/zhihu-aac-old/index.html (96%)
 create mode 100644 favicon.ico
 rename {imgs => img}/image-1.png (100%)
 rename {imgs => img}/image-2.png (100%)
 rename {imgs => img}/image-3.png (100%)
 rename {imgs => img}/image-4.png (100%)
 rename {imgs => img}/image-5.png (100%)
 rename {imgs => img}/image.png (100%)
 delete mode 100644 links/index.html

diff --git a/2023/07/05/zhihu-aac-old/index.html b/2023/07/04/zhihu-aac-old/index.html
similarity index 96%
rename from 2023/07/05/zhihu-aac-old/index.html
rename to 2023/07/04/zhihu-aac-old/index.html
index ca72712..6ac7919 100644
--- a/2023/07/05/zhihu-aac-old/index.html
+++ b/2023/07/04/zhihu-aac-old/index.html
@@ -7,8 +7,8 @@
-
-
+
+
@@ -19,18 +19,18 @@
-
+
-
-
-
+
+
+
-
+
@@ -288,7 +288,7 @@

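A Look at Zhihu Yanxuan (盐选) Anti-Scraping (Answer Page Edition)

Recently, Zhihu rolled out an anti-scraping system for Yanxuan articles in columns[1], and later applied the same system to Yanxuan articles on answer pages. The visible symptom is that scraped article text contains a large number of scrambled words. In this post we will decode that garbled text step by step. Along the way we will gain a deeper understanding of font-based anti-scraping and learn what to watch out for when deploying it.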

1. What Zhihu's Anti-Scraping Looks Like

From the Zhihu answer "What does it feel like to be unloved?" (不被爱是一种什么样的感受? - 知乎)

-

Illustration of the garbled text

+

Illustration of the garbled text

As the figure shows, the page source contains a large amount of garbled text, for example (original character, wrong character):[2]