-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsnapshot__snapshot_tests____docs__bots.snap
186 lines (142 loc) · 7.65 KB
/
snapshot__snapshot_tests____docs__bots.snap
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
---
source: repology-webapp/tests/snapshot_tests/mod.rs
expression: snapshot
snapshot_kind: text
---
Status: 200
Header: content-type: text/html
Header: content-length: 7652
---
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Multiple package repositories analyzer">
<meta name="keywords" content="repository, package, packages, version">
<meta name="author" content="Dmitry Marakasov">
<title>Bots information - Repology</title>
<link rel="stylesheet" href="/static/bootstrap.min.v3.3.7.8dc6d358477be27a.css">
<link rel="stylesheet" href="/static/repology.v21.bb7b05a7b3bfbdb9.css">
<link rel="icon" href="/static/repology.v1.6108dff405ea1a42.ico" sizes="16x16 32x32 64x64" type="image/x-icon">
<link rel="search" type="application/opensearchdescription+xml" title="Repology packages" href="/opensearch/project.xml">
<link rel="search" type="application/opensearchdescription+xml" title="Repology maintainers" href="/opensearch/maintainer.xml">
</head>
<body>
<nav class="navbar navbar-default navbar-static-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#repology-navbar-collapse" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">
<img alt="Repology" src="/static/repology40x40.v1.0ea026630d9180a2.png" width="40" height="40">
</a>
</div>
<div class="collapse navbar-collapse" id="repology-navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="/projects/">Projects</a></li>
<li><a href="/maintainers/">Maintainers</a></li>
<li><a href="/repositories/statistics">Repositories</a></li>
<li><a href="/tools">Tools</a></li>
<li><a href="/security/recent-cves">Security</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="/news">News</a></li>
<li class="active"><a href="/docs">Docs</a></li>
</ul>
</div>
</div>
</nav>
<div class="container">
<h1 class="page-header">Repology bots</h1>
</div>
<div class="container">
<p>You're likely seeing this because your site had been visited by Repology.</p>
<h3>What's repology?</h3>
<p>Repology is a free and open source service which monitors a huge number of
<a href="https://en.wikipedia.org/wiki/Package_(package_management_system)">package</a>
<a href="https://en.wikipedia.org/wiki/Software_repository">repositories</a>,
comparing versions of packaged software across them and gathering other information
on free and open source projects which may be useful to the
<a href="https://en.wikipedia.org/wiki/Free_and_Open-Source_Software">F/OSS</a>
community.</p>
<h3>What robots are used by repology?</h3>
<h4>repology-fetcher</h4>
<p>Identifies itself as <code>repology-fetcher/0 (+https://repology.org/docs/bots)</code></p>
<p>This process regularly retrieves information from software repositories. The
preferred way is to get a single file which describes all the available packages,
but for repositories which don't support this the robot may iterate over some
web API. The robot visits a site on each update cycle (~2-3 hours currently) and
fetches files it needs sequentially (e.g. it never does parallel requests).</p>
<p>You may find metadata on which repositories are fetched
<a href="https://github.com/repology/repology-updater/tree/master/repos.d">here</a>
and the fetcher code
<a href="https://github.com/repology/repology-updater/tree/master/repology/fetchers">here</a>.
</p>
<p>If you think the robot creates excess load on your site, feel free to drop an
<a href="https://github.com/repology/repology-updater/issues/new">issue</a> in the GitHub.
If Repology gets information on your repository through web API, we'd greatly
appreciate if you provide a regular dump of package information from your repository
(data used by Repology include package name, version, one-line summary, list of maintainers,
list of categories/tags, homepage and download URLs, license information) in machine readable
format (preferably JSON) as well. This will allow more frequent updates with less load
on the repository side, and faster update process, simpler parsing code and probably more
useful data for Repology.</p>
<h4>repology-linkchecker</h4>
<p>Identifies itself as <code>repology-linkchecker/1 (+https://repology.org/docs/bots)</code></p>
<p>This process pokes links retrieved from package metadata to check that they
are alive. Dead links and links which involve redirects are reported to package
maintainers so the package metadata could be correspondingly updated. If this robot
visits you site, this means it is mentioned in some package metadata.</p>
<p>The process visits each link once a week. It issues
<a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods">HEAD</a>
request first, and only if that fails it falls back to a
<a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods">GET</a>
request. This means that in most cases the robot won't retrieve the contents of a URL,
using only marginal amount of web traffic. Also, there's a delay of 3 seconds between
consecutive requests to a single hostname, to ensure no excess site load is generated.</p>
<p>You may see the link checker source code
<a href="https://github.com/repology/repology-linkchecker">here</a>.</p>
<h4>repology-vulnupdater</h4>
<p>Identifies itself as <code>repology-vulnupdater/1 (+https://repology.org/docs/bots)</code></p>
<p>This process maintains up to date information on software security vulnerabilities
in Repology by periodically fetching <a href="https://nvd.nist.gov/vuln/data-feeds#JSON_FEED">NVD JSON feeds</a>.
It issues a <a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods">GET</a>
request to each feed every 10 minutes, <code>Etag</code> and <code>If-None-Match</code> HTTP headers
are used to avoid refetching the files if they have not changed since the last request.</p>
<p>You may see the vulnupdater's source code
<a href="https://github.com/repology/repology-vulnupdater">here</a>.</p>
<h3>robots.txt policy</h3>
<p>Please note that none of our robots is a crawler. Unlike most search engines which would
try to gather all available URLs from a specific website, so it may be required to restrict
them through <a href="https://en.wikipedia.org/wiki/Robots_exclusion_standard">robots.txt</a>
file, Repology only interacts with a <strong>fixed</strong> small set of <strong>man-made</strong>
links, and needs unconditional access to them to perform its tasks (e.g. retrieving repository
information and link availability checking), so neither of repology robots respects
<a href="https://en.wikipedia.org/wiki/Robots_exclusion_standard">robots.txt</a>.</p>
</div>
<footer class="footer">
<div class="container">
<p class="pull-right footer-links">
GitHub repositories:
<a href="https://github.com/repology/repology-rs/tree/master/repology-webapp">webapp</a>,
<a href="https://github.com/repology/repology-updater">updater</a>,
<a href="https://github.com/repology/repology-rules">ruleset</a>
</p>
<p>
Copyright (C) 2016-2025 Dmitry Marakasov<br>
Code licensed under GPLv3+.<br>
Powered by Rust.
</p>
</div>
</footer>
<script src="/static/jquery-3.7.1.min.cc6904672d4db9a3.js"></script>
<script src="/static/bootstrap.min.v3.3.7.66502a3c5769c640.js"></script>
<script src="/static/moment.min.v2.29.2.b7efa16ed0d164d6.js"></script>
<script src="/static/repology.v2.ee0228ab3d88406f.js"></script>
</body>
</html>