-
Notifications
You must be signed in to change notification settings - Fork 1
/
devposts.html
411 lines (402 loc) · 17.7 KB
/
devposts.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
<!DOCTYPE html>
<html lang="en">
<head>
<title>Elite: Dangerous - Unofficial Dev Forum Posts Tracker, by Athanasius</title>
<link href="style.css" rel="stylesheet" type="text/css"/>
<link rel="alternate" type="application/rss+xml" title="Elite: Dangerous Dev Posts" href="https://ed.miggy.org/devtracker/ed-dev-posts.rss" />
<link rel="alternate" type="application/rss+xml" title="Elite: Dangerous Dev Posts (Full Text)" href="https://ed.miggy.org/devtracker/ed-dev-posts-fulltext.rss" />
</head>
<body>
<h1>Now collecting new posts on new forums</h1>
<p>
As I type this it is 2019-04-08 and the Frontier forums have been up
again on the XenForo v2 version for 11 days. Sadly it turned out
that the 'Dev Feed' has two problems:
<ol>
<li>
Whilst the "<a
href="https://forums.frontier.co.uk/find-threads/staffpost">Staff
/ Developer Posts</a>" is listing both threads that 'developers's
started and any others they posted in, they're listed in order of
a latest reply by <i>anyone</i>. Also the 'Dev Post' button goes
to the latest 'developer' post in the given thread, leaving you no
easy way to find prior ones.
</li>
<li>
There is RSS output available for that URL (append '?rss=1'), but
the resulting feed only includes the [i]first[/i] 'developer'
post per thread.
</li>
</ol>
</p>
<p>
So I felt obligated to update my scraper to work against the new
forums. In some ways this proved to be easier than for the old site
(e.g. consistent Content-Type of utf8, instead of randomly changing, so no
hacks required to deal with that). I have also been given a key for
the XF2 API which makes retrieving some of the data cleaner, with
less load on the forums servers.
</p>
<p>
For historical purposes I'll keepo the three archives of the pre-XF2 posts available:
<ol>
<li>
Postgresql 'pg_dump --format=plain': <a href="devtracker/archives/athan-ed-dev-rss-archive-20190325.sql.gz">athan-ed-dev-rss-archive-20190325.sql.gz</a>
</li>
<li>
RSS, precis text only: <a href="devtracker/archives/athan-ed-dev-rss-archive-20190325.rss.gz">athan-ed-dev-rss-archive-20190325.rss.gz</a>
</li>
<li>
RSS, fulltext: <a href="devtracker/archives/athan-ed-dev-rss-fulltext-archive-20190325.rss.gz">athan-ed-dev-rss-archive-20190325.rss.gz</a>
</li>
</ol>
</p>
<h2>
What is this?
</h2>
<p>
If you're a fan of <a href="http://www.elitedangerous.com/">Elite:
Dangerous</a> then you may also follow the
<a href="http://frontier.co.uk/">Frontier Developments</a> <a
href="http://forums.frontier.co.uk/">forums</a>. If you do that then you may be
frustrated at the lack of an easy way to see just the developer
posts.
</p>
<p>
You could learn who they are and check around their profile pages
manually now and then.
</p>
<p>
You could write a <a href="https://elitedevtracker.com/">simple scraper that aggregates all the known
developer posts onto one page</a>.
</p>
<p>
Or you could decide that you much prefer an <a
href="http://www.rssboard.org/rss-specification">RSS feed</a>, so write a
scraper and RSS output for it all. That's what I did. I even ended
up making an additional feed with the full "clicked-through" text of
each post.
</p>
<h2>
How do I get it?
</h2>
<p>
If your web browser is set up with an RSS feed extension then you
should see the RSS icon up in your URL bar. Simply click on that and
get on with adding one of the feeds to your reader.
</p>
<p>
NB: The feeds are updated at most every 5 minutes, more slowly if the
forums are especially busy. Thus there's no point polling these feeds
more often than every 5 minutes. Also the output contains the last 28 days
worth of posts, so if you check the feed any less frequently than that
you will miss posts.
</p>
<p>
If for some reason you need to enter a feed manually, then these are
the URLs:
</p>
<ol>
<li>
The legacy 'precis text only' feed - <a href="https://ed.miggy.org/devtracker/ed-dev-posts.rss">https://ed.miggy.org/devtracker/ed-dev-posts.rss</a>
</li>
<li>
The newer 'full text' feed - <a href="https://ed.miggy.org/devtracker/ed-dev-posts-fulltext.rss">https://ed.miggy.org/devtracker/ed-dev-posts-fulltext.rss</a>
</li>
</ol>
<p>
<b>Caveat 1</b> - The feeds will only contain the fully public
posts from the forums, i.e. nothing from the Private Backers Forum or
the Alpha Backers Forum. Also you'll see <i>all Frontier Developments
forum posts, not just those relating to Elite: Dangerous, <b>unless I
got around to blacklisting a particular non-ED forum</b>.</i>
</p>
<p>
<b>Caveat 2</b> - Each post in the 'precis text only' feed will be as it is seen on the
appropriate member profile page's "Activity List". That means you'll
only ever see the new text (only a portion of it for longer posts),
and never any quoted text.
<br/>
<br/>
The 'full text' feed will usually contain full text, including any
quotes, from the relevant post. Sometimes it may have fallen back to
just the precis text. This would happen if the 'click-through' to
retrieve the full post text failed. Instead of failing the whole
post, and possibly missing one entirely, I decided to store, and use,
at least what I could.
<p>
<b>Caveat 3</b> - I have specifically chosen to not include the QA-*
or Support-* accounts in my scraping, as they seldom post anything of
much value (to anyone but who they're replying to).
</p>
<p>
<b>Caveat 4</b> - Each post will be as it appeared on the forums at
the time the scraper first saw it. Posts in the RSS feed are never
updated for later edits.
</p>
<h2>Searching</h2>
<p>
There's a search available on <a href="devtracker/">the main feed
page</a>. Or maybe you're curious about <a
href="devtracker/frontier-dev-latest-post-times.html">the latest
seen and scraped post for each tracked developer</a>. This will
update after each run of the collector script (so currently every 5
minutes)
</p>
<h2>
Who are you?
</h2>
<p>
I go by the name 'Athanasius' online, and indeed I'm 'Cmdr Athanasius'
in Elite: Dangerous. You'll also find me as 'Athan' on <a
href="https://www.quakenet.org/">QuakeNet</a>'s
<a href="http://webchat.quakenet.org/?channels=elite-dangerous&uio=d4">#elite-dangerous</a>
and on the <a
href="http://forums.frontier.co.uk/member.php?u=28846">Frontier
Forums</a>.
</p>
<h2>
But what if you stop supporting this and it breaks?
</h2>
<p>
Well, I do have a history of stopping playing Elite Dangerous, yet I
still keep this project going for my own use. That said, if I do
completely lose interest this will keep running unless the server
goes away (it's paid for and used by other people). If it does
disappear then hopefully someone has bookmarked <a
href="https://github.com/Athanasius/ed-devtracker">the public github
repository</a> and knows enough perl and postgresql to get it up and
running elsewhere.
</p>
<p>
As of 2019-08-28 I <i>am</i> pondering a re-write in Python (3.x, not
2.7.x when that's EOL 1st Jan 2020!) so as to increase the chances of
someone else wanting to pick it up.
</p>
<hr/>
<a name="news"></a>
<h1>
News
</h1>
<p><u>2019-11-05 09:58 UTC</u><br />
New "Planet Zoo" forums added to the ignore list.
</p>
<p><u>2019-10-17 16:40 UTC</u><br />
I've just added support for BBCode [s]...[/s] to come out properly as
<s>strikethrough</s>.
</p>
<p><u>2019-05-22 09:47 UTC</u><br />
I've decided to permanently keep the RSS feeds at "last 28 days" of
posts, rather than the old-forums "7 days".
</p>
<p><u>2019-04-08 10:56 UTC</u><br />
Now scraping against the XF2 forums for the live feed. There's still
a little fix up that needs doing on some historical posts (where they
started a thread, rather than being a reply) to have the URLs point
to the actual thread on the new forums, rather than "Not Found".
</p>
<p>
Other than that everything should be working again, including the
search interface (although with a double '/' in some URLs for now,
which is harmless).
</p>
<p><u>2019-03-25 11:00 UTC</u><br />
The old forums are now down for the move to the new forums. I've
stopped my scraper from running and will shortly make archives of the
posts so far available.
</p>
<p><u>2018-07-13 15:30 UTC</u><br />
I've added "Off-Topic Discussion" to the list of ignored forums, for
what should be obvious reasons.
</p>
<p><u>2018-06-12 16:50 UTC</u><br />
All of the new JWE related forums are now on the ignore list. A
bunch got added for the game launch, so a few posts leaked into this
RSS feed.
</p>
<p><u>2018-03-28 15:45 UTC</u><br />
I've just implemented ignoring of selected forums. This is initially
the current four "Jurassic World: Evolution" forums. Once I'm
confident this is working without causing issues I may expand that
list so that any forum not specific to ED is ignored.
</p>
<p><u>2018-02-03 22:53 UTC</u><br />
I've removed 'Dale Emasiri' from the scraping as he's no longer
working at Frontier.
</p>
<p><u>2017-11-06 13:50 UTC</u><br />
I've just removed Drew Wagar from the monitored forum accounts. He's
no longer actively writing for ED and his posts aren't really
relevant to the game any more. Someone let me know if this changes
and I'll consider adding him back.
</p>
<p><u>2017-10-10 16:53 UTC</u><br />
Updated all the URLS to utilise https://ed.miggy.org/ instead of
https://miggy.org/games/elite-dangerous/
</p>
<p><u>2017-09-21 09:15 UTC</u><br />
The new 'full text' feed hasn't shown any issues in over a week of
testing, so it's now described and linked to on this page.
</p>
<p><u>2017-06-05 20:05 UTC</u><br />
Added Lloyd Morgan-Moore (157490) to the scraped forum accounts.
</p>
<p><u>2017-03-10 17:20 UTC</u><br />
I've just finished work to use what I hope are truly unique
<strong>and static</strong> GUIDs for each post. Before this change
if someone on the Frontier Forums edited the title of a topic then the
scraper would use the changed URL as the permaLink and thus put it
out as an additional new post. Where several posts had already been
made within the topic this would cause spam after the edit.
</p>
<p>I now strip out the embedded topic title from the URL, using just
those parts that are necessary to reach it on the forums. A
side-effect of setting this up is that the feed just got spammed with
a huge amount of not-actually-new posts due to changing over to using
these better GUIDs in the permaLink property of each post. This will
be a one-off thing.
</p>
<p><u>2016-12-01 02:49 UTC</u><br />
Changed the behaviour of the scraper so it no longer takes the
<em>first</em> known post it sees as evidence that it is up to date
for that forum member's activity. It will still mean skipping that
individual post, but it will continue to check all other posts
visible on the activity page. <strong>NB: This assumes there'll be
no more than 50 posts on an Activity List at any one time, else it
will think additional ones are new.</strong>
</p>
<p><u>2016-12-01 02:49 UTC</u><br />
Removed Greg Ryder from the list of monitored accounts, as apparently
he's been gone from Frontier for over a year now.
</p>
<p><u>2016-11-15 13:28 UTC</u><br />
I've just added Drew Wagar to the monitored forum posters. He's the
author of one of the official books and driver of the 'Formidine Rift'
mystery. He doesn't seem to post too much, but when he does it's
likely about said mystery.
</p>
<p><u>2016-08-26 15:12 UTC</u><br />
I've just changed the code to strip the topic titles out of posts
URLs. If they change, and they can with a moderator edit, the code
would see all the posts in that thread as new again. This is what
happened over the past day or so with the "The Galaxy - Is its size
now considered to be a barrier to gameplay by the Developers?" thread
in "Dangerous Discussion".
</p>
<p><u>2016-08-13 09:41 UTC</u><br />
I've now removed Ben Parry from the collector. He only ever posts in
there Star Citizen thread now.
</p>
<p><u>2016-06-12 18:50 UTC</u><br />
The issue with the post URLs seems to have been fixed.
</p><p>
Do note that because the URLs now include the thread title in text
form it means my scraper has seen a lot of older posts as now-new, as
it uses exactly that URL to know if it's seen a post before. So
there's been a splurge of ~50 posts repeated in the feed.
</p>
<p><u>2016-06-12</u><br />
There have been some forum changes that are breaking my scraper. I've
got code to work around one (putting text like " 4 replies and 242
views" inside the element of class=date). But the links in the
Activity Lists are now also broken, not properly containing the anchor
to the specific post within the thread.
</p><p>
I've PM'd BrettC about both and hopefully he'll fix things.
</p>
<p><u>2016-05-05 00:41 UTC</u><br/>
All URLs have been changed to use https://miggy.org/... rather than
http://www.miggy.org/. You may want to update your RSS readers. The
certificate for HTTPS/SSL is from LetsEncrypt and includes a full
enough chain of Certificate Authorities that it should validate in
any modern, up to date, web browser.
</p>
<p><u>2016-05-05 00:17 UTC</u><br/>
Added " (<Forum Name>)" text to the end of item titles. This
will only be the sub-forum name, but in most cases should help with
things like knowing if a post is in an Xbox forum, or a more general
one.
</p>
<p><u>2015-07-02 19:00 BST</u><br/>
As the QA-* accounts mostly seem to post things like "thanks for that
report", and otherwise things that aren't all that relevant I've
decided to no longer scrape them for this feed. Note that FD do now
seem to be pushing patch notes/changlogs to <a
href="https://community.elitedangerous.com/patch-notes">the Community
Site</a>, and that page has its own RSS feed.
<br/>
<br/>
Any posts I'd already scraped for QA-* accounts will stay in the
database but no new ones will appear. Also, lack of probably
usefulness is why I've not added the Support-* accounts either.
</p>
<p><u>2015-06-16 00:00 BST</u><br/>
I've just finished putting together a very basic <a href="devtracker/">Search
Interface</a> for the 'precis' text of all the posts in the database
that drives this RSS feed. Any feedback appreciated through the
usual channels.
</p>
<p><u>2015-06-12 16:04 BST</u><br/>
A small tweak to the scraping code means white space in posts is no
longer collapsed, this will preserve the formatting where a post
contains multiple paragraphs, bullet points etc.<br/>
Past posts will remain 'broken' in this respect.
</p>
<p><u>2015-06-12 15:45 BST</u><br/>
Tweaked the output code to use the https:// URI scheme instead of
http://, given the FD forums are enforcing the former now anyway.
<b>NB: this change will have caused GUIDs to change and thus the last
7 days worth of posts to re-appear in your feed.</b>
</p>
<p><u>2015-06-05 20:00 BST</u><br/>
I've just changed the RSS generation to output the last 7 days worth
of posts, rather than the last 100 posts as it was before. This
should hopefully avoid any issue in the future with low-frequency
pollers missing posts due to high dev post rate.
</p>
<p><u>2015-05-25 21:15 BST</u><br/>
Further to the last news post there's also an issue with superscripts
in posts. I've not yet time to look into this, but it will do things
like cause '10<sup>11</sup>' to be collapsed to '1011'.<br/>
<br/>
Also, there's a becoming-a-FAQ; "Why don't you show the full post,
including quotes where present, on each post in the feed?". The
answer being:
<blockquote>Currently the scraping is simple, look at (only the first
page/initial load of) each users' latest posts on their activity list.
That's what only contains what I scrape. To get the full posts and
quotes I'd have to follow each post link, incurring an extra page
load. As I've already been co-ordinating with the head moderator
BrettC on ensuring my scraper bot doesn't cause problems with
hammering the forums, particularly when during those times the forums
are already under high load, I can't afford to cause that extra load.
</blockquote>
</p>
<p><u>2015-01-27 20:05 GMT</u><br/>
It's come to my attention that despite claiming utf-8 output the feed
sometimes attempts to include characters from other encodings.
Unfortunately the source of the problem is that the FD forums are
telling my scraper script that they're in iso-8859-1 format but
allowing characters from windows-1252 encoding through as well.
</p>
<p>
Whilst I could try to hack around this with changing the input
encoding or doing some search/replace that feels like it will break
on something else in the future, so for now you'll have to put up
with the weirdness and blame FD's forum setup for allowing the 'bad'
characters'.
</p>
<p><u>2014-10-25 00:47 BST</u><br/>
I've now updated my scripts for the new forums. All looks to be OK,
although you might have refreshed the feed just as I had a date error
in it (the cause of which has been found and rectified).
</p>
<p>
If you did then you might fail to pick up new items until one appears
after Sat, 25 Oct 2014 16:24:00 +0000, depending on how your RSS
reader acts.
</p>
<hr/>
<a href="http://feedvalidator.org/check.cgi?url=https%3A//ed.miggy.org/devtracker/ed-dev-posts.rss"><img src="valid-rss-rogers.png" alt="[Valid RSS]" title="Validate my RSS feed" /></a>
</body>
</html>