Updates comments explaining the bug further + Reduces test batch to 25
empty-codes committed Nov 8, 2024
1 parent b32ca58 commit a9db735
Showing 2 changed files with 25 additions and 25 deletions.
16 changes: 8 additions & 8 deletions lib/wikidata/diff/api.rb
@@ -35,24 +35,24 @@ def self.fetch_revision_data(client, revision_ids)

  def self.parse_revisions(pages)
    parsed_contents = {}

    pages.each_key do |page|
      revisions = pages[page]['revisions']

      # Some API responses may include a `page` entry without a 'revisions' property.
      # To handle this, skip the iteration if `revisions` is nil or false, ensuring that
      # pages without revisions do not cause errors during processing.
      # MediaWiki API responses can be truncated, in which case the `revisions`
      # property is omitted for some pages. As a temporary workaround, the guard below
      # skips pages that appear to have no revisions (`revisions` is nil or false)
      # so that processing does not raise an error.
      next unless revisions

      revisions.each do |revision|
        parsed_content = parse_revision(revision)
        parsed_contents[revision['revid']] = parsed_content if parsed_content
      end
    end

    parsed_contents
  end

  def self.parse_revision(revision)
    content_model = revision['slots']['main']['contentmodel']

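The effect of the guard can be illustrated in isolation. The following is a minimal sketch, not the project's code: `collect_revisions` is a hypothetical, simplified stand-in for `parse_revisions` that also records which pages were skipped, and the sample `revid` is made up for illustration.

```ruby
# Hypothetical, simplified stand-in for Api.parse_revisions. It collects
# revisions by revid and records the pageids that the guard skipped.
def collect_revisions(pages)
  parsed = {}
  skipped_page_ids = []

  pages.each_value do |page|
    revisions = page['revisions']

    # Same idea as the guard in the diff: skip pages whose 'revisions'
    # key is missing, but remember them so the gap is visible to callers.
    unless revisions
      skipped_page_ids << page['pageid']
      next
    end

    revisions.each do |revision|
      parsed[revision['revid']] = revision
    end
  end

  [parsed, skipped_page_ids]
end

# Sample data shaped like the `pages` hash of a query response.
# The revid 1001 is illustrative and not taken from a real response.
pages = {
  '54252' => { 'pageid' => 54252, 'ns' => 0, 'title' => 'Q52053',
               'revisions' => [{ 'revid' => 1001 }] },
  '188280' => { 'pageid' => 188280, 'ns' => 0, 'title' => 'Q189784' }
}

parsed, skipped = collect_revisions(pages)
```

Returning the skipped page IDs alongside the parsed data is one possible design choice; it keeps the workaround from silently hiding truncated results.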
34 changes: 17 additions & 17 deletions spec/wikidata/diff/api_spec.rb
@@ -6,31 +6,31 @@
require 'rspec'

# Test case for the Api.get_revision_contents method
# Some API responses may include a `page` entry that does not have a 'revisions' property.
# Example: The course with course_id 10023, has 2 pages that do not have a 'revisions' property as seen below:
# API Response: {"pageid": 188280, "ns": 0, "title": "Q189784"}, {"pageid": 265881, "ns": 0, "title": "Q274897"}
# In contrast, pages with revisions have responses such as:
# {"pageid": 54252, "ns": 0, "title": "Q52053", "revisions": [...]}
# MediaWiki API responses may be truncated for large queries, causing the `revisions` property to be omitted.
# This happens not because the pages lack revisions, but because of response size limits imposed by the MediaWiki API.
# As a result, only a subset of the requested revisions is returned, and some pages appear to have no revisions.
# Example: the course with course_id 10023 has 2 pages that appear to have no 'revisions' property
# because of truncation: {"pageid": 188280, "ns": 0, "title": "Q189784"}, {"pageid": 265881, "ns": 0, "title": "Q274897"}
# In contrast, a complete page entry looks like: {"pageid": 54252, "ns": 0, "title": "Q52053", "revisions": [...]}
# As a temporary mitigation, a guard statement skips pages without revisions.
# A proper solution will need logic that handles truncated responses.

RSpec.describe Api do
  describe '.get_revision_contents' do
    let(:revision_ids) do
      [
        # A batch of 50 revision IDs associated with a course (course_id: 10023).
        # This batch includes pages (pageid: 188280 and 265881) known to lack the 'revisions' property.
        2266122608, 2266122618, 2266122626, 2266122646, 2266122666, 2266122683,
        2266122709, 2266122730, 2266122739, 2266122747, 2266122763, 2266122777,
        2266122783, 2266122790, 2266122808, 2266122817, 2266122829, 2266122850,
        2266122880, 2266122931, 2266122949, 2266122973, 2266122994, 2266123011,
        2266123017, 2266123021, 2266341034, 2266123060, 2266123123, 2266123148,
        2266123175, 2266123210, 2266123270, 2266123325, 2266123373, 2266123418,
        2266341148, 2266123442, 2266123459, 2266123479, 2266123502, 2266123529,
        2266123536, 2266123548, 2266123562, 2266123568, 2266341782, 2266123581,
        2266123596, 2266123602
        # A batch of 25 revision IDs associated with a course (course_id: 10023).
        # JSON: https://www.wikidata.org/w/api.php?action=query&prop=revisions&revids=2266123021|2266341034|2266123060|2266123123|2266123148|2266123175|2266123210|2266123270|2266123325|2266123373|2266123418|2266341148|2266123442|2266123459|2266123479|2266123502|2266123529|2266123536|2266123548|2266123562|2266123568|2266341782|2266123581|2266123596|2266123602&rvslots=main&rvprop=content|ids|comment&format=json
        # This batch includes pages (pageid: 188280 and 265881) that appear to lack the 'revisions' property.
        2266123021, 2266341034, 2266123060, 2266123123, 2266123148,
        2266123175, 2266123210, 2266123270, 2266123325, 2266123373,
        2266123418, 2266341148, 2266123442, 2266123459, 2266123479,
        2266123502, 2266123529, 2266123536, 2266123548, 2266123562,
        2266123568, 2266341782, 2266123581, 2266123596, 2266123602
      ]
    end

    it 'returns without raising an error when an API response has a page with no revisions.' do
    it 'returns without raising an error and handles both cases where revisions are present or absent' do
      expect {
        Api.get_revision_contents(revision_ids)
      }.not_to raise_error

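The spec comment notes that a proper solution would need logic for truncated responses. One possible direction, sketched here under assumptions, is to compare the requested revision IDs with those actually returned and re-request the missing ones. `collect_with_retry`, its `max_rounds` parameter, and the fake fetcher are hypothetical names for illustration; none of them belong to this repository or to the MediaWiki API.

```ruby
# Hypothetical sketch: keep re-requesting revision IDs that were missing
# from earlier responses, up to max_rounds attempts. The block receives the
# pending IDs and must return a `pages` hash shaped like the API response.
def collect_with_retry(requested_ids, max_rounds: 3, &fetch_pages)
  collected = {}
  pending = requested_ids

  max_rounds.times do
    break if pending.empty?

    pages = fetch_pages.call(pending)
    pages.each_value do |page|
      # Pages without 'revisions' contribute nothing in this round.
      (page['revisions'] || []).each do |revision|
        collected[revision['revid']] = revision
      end
    end

    # Anything requested but still not collected is retried next round.
    pending = requested_ids - collected.keys
  end

  collected
end

# Fake fetcher that simulates truncation: every requested ID gets a page
# entry, but only the first two carry a 'revisions' array.
fake_fetch = lambda do |ids|
  pages = {}
  ids.each { |id| pages["p#{id}"] = { 'pageid' => id } }
  ids.first(2).each { |id| pages["p#{id}"]['revisions'] = [{ 'revid' => id }] }
  pages
end

result = collect_with_retry([1, 2, 3, 4, 5], &fake_fetch)
```

The round limit keeps the loop bounded when a revision can never be returned, for example if it was deleted; in that case the caller simply receives fewer entries than requested.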