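diff --git a/docs/get-started/14AI-extraction.md b/docs/get-started/14AI-extraction.md
index 738f4ab47..d09a11fd7 100644
--- a/docs/get-started/14AI-extraction.md
+++ b/docs/get-started/14AI-extraction.md
@@ -10,10 +10,9 @@ Platon.ai's algorithm can transform web pages into data with 100% zero human int
 and even without machine learning training. It is driven by unsupervised machine learning, similar to how humans read and understand the internet.
-After rendering each web page in a browser, we use JavaScript to calculate a series of properties for each web page
-element, mainly including the element's position and size. At the same time, we construct more interesting implicit
-features of web page elements, such as topological and semantic features.
-Thus, **a web page can be visualized as a geometric graph composed of many rectangles with attributes, and when
+We calculate a series of features for each element on a webpage after rendering it in a browser, including visual,
+geometric, topological, and semantic features.
+**A web page can be considered as a geometric graph composed of many rectangles with attributes, and when
 combined, it resembles a bundle of newspapers.
 The World Wide Web (WWW) can be viewed as a fiber bundle with a three-dimensional manifold as the base space.**
diff --git a/docs/get-started/zh/14AI-extraction.md b/docs/get-started/zh/14AI-extraction.md
index eff37feda..d375029d1 100644
--- a/docs/get-started/zh/14AI-extraction.md
+++ b/docs/get-started/zh/14AI-extraction.md
@@ -1,11 +1,13 @@
 AI Automatic Extraction
 =
-Platon.ai's goal is to develop an AI that efficiently collects, reads, and understands complex websites, and outputs data and knowledge completely and accurately. So far we have open-sourced the "efficient collection" part; the "reading comprehension" part is a long-term and arduous task. We have released a [preview version](https://github.com/platonai/PulsarRPAPro#run-auto-extract) that "reads and understands **web page structure** and outputs data completely and accurately", and this version will also be open-sourced in the near future.
+Platon.ai's goal is to develop an AI that efficiently collects, reads, and understands complex websites, and outputs data and knowledge completely and accurately. So far we have open-sourced the "efficient collection" part; the "reading comprehension" part is a long-term
+and arduous task. We have released a [preview version](https://github.com/platonai/PulsarRPAPro#run-auto-extract) that "reads and understands **web page structure** and outputs data completely and accurately", and this version will also be open-sourced in the near future.
 Platon.ai's algorithm can turn web pages into data with 100% no human intervention -- no rules to configure and no machine learning training required. It is driven by unsupervised machine learning and reads and understands the internet the way a person does.
-After rendering each web page in a browser, we use JS to compute a series of properties for each web page element, mainly the element's position and size. At the same time, we construct more interesting implicit features of web page elements, such as topological and semantic features. Currently, including position and size, we construct more than 100 independent features for each web page element. Thus, **a web page can be viewed as a geometric figure (geometric graph) composed of many rectangles with attributes; pressing all web pages together is like a bundle of newspapers, and the World Wide Web (WWW) can be viewed as a fiber bundle with a three-dimensional manifold as the base space.**
+After rendering each web page in a browser, we compute a series of features for each web page element, including visual, geometric, topological, and semantic features. **A web page can be regarded as a geometric figure composed of many rectangles with attributes
+(geometric graph); pressing all web pages together is like a bundle of newspapers, and the World Wide Web (WWW) can be viewed as a fiber bundle with a three-dimensional manifold as the base space.**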
auto extracted chart