Docs: 1. improve AI-extraction docs

platonai · Apr 5, 2024 · f01c486
1 parent ce1d388
commit f01c486

Showing 2 changed files with 7 additions and 6 deletions.
diff --git a/docs/get-started/14AI-extraction.md b/docs/get-started/14AI-extraction.md
@@ -10,10 +10,9 @@ Platon.ai's algorithm can transform web pages into data with 100% zero human int
 and even without machine learning training. It is driven by unsupervised machine learning, similar to how humans read 
 and understand the internet.
 
-After rendering each web page in a browser, we use JavaScript to calculate a series of properties for each web page 
-element, mainly including the element's position and size. At the same time, we construct more interesting implicit 
-features of web page elements, such as topological and semantic features. 
-Thus, **a web page can be visualized as a geometric graph composed of many rectangles with attributes, and when 
+We calculate a series of features for each element on a webpage after rendering it in a browser, including visual, 
+geometric, topological, and semantic features.
+**A web page can be considered as a geometric graph composed of many rectangles with attributes, and when 
 combined, it resembles a bundle of newspapers. The World Wide Web (WWW) can be viewed as a fiber bundle with a 
 three-dimensional manifold as the base space.**
 

diff --git a/docs/get-started/zh/14AI-extraction.md b/docs/get-started/zh/14AI-extraction.md
@@ -1,11 +1,13 @@
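 AI Auto-Extraction
 =
 
-Platon.ai's goal is to develop an AI that efficiently crawls, reads, and understands complex websites, and outputs data and knowledge completely and accurately. So far we have open-sourced the "efficient crawling" part; the "reading comprehension" part is a long-term and arduous task. We have released a [preview version](https://github.com/platonai/PulsarRPAPro#run-auto-extract) that "reads and understands **web page structure** and outputs data completely and accurately", and this version will also be open-sourced in the near future.
+Platon.ai's goal is to develop an AI that efficiently crawls, reads, and understands complex websites, and outputs data and knowledge completely and accurately. So far we have open-sourced the "efficient crawling" part; the "reading comprehension" part is a long-term
+and arduous task. We have released a [preview version](https://github.com/platonai/PulsarRPAPro#run-auto-extract) that "reads and understands **web page structure** and outputs data completely and accurately", and this version will also be open-sourced in the near future.
 
 Platon.ai's algorithm can turn web pages into data with 100% no human intervention: no rules to configure, and no machine learning training either. It is driven by unsupervised machine learning and reads and understands the internet the way a human does.
 
-After rendering each web page in a browser, we use JS to compute a series of properties for each web page element, mainly the element's position and size. We also construct more interesting implicit features of web page elements, such as topological and semantic features. Currently, including position and size, we construct more than 100 independent features for each web page element. Thus, **a web page can be viewed as a geometric graph composed of many rectangles with attributes; pressing all web pages together, like a bundle of newspapers, the World Wide Web (WWW) can be viewed as a fiber bundle with a three-dimensional manifold as the base space.**
+After rendering each web page in a browser, we compute a series of features for each web page element, including visual, geometric, topological, and semantic features. **A web page can be regarded as a geometric graph composed of many rectangles with attributes;
+pressing all web pages together, like a bundle of newspapers, the World Wide Web (WWW) can be viewed as a fiber bundle with a three-dimensional manifold as the base space.**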
 
 <div style="text-align: center">
     <img width="400px" src=https://pica.zhimg.com/80/v2-1262abb4d28b31a00bcf1199b1aba441_1440w.jpeg?source=d16d100b  alt="auto extracted chart"/>