Mirror staged landsat demo collections in prod #138

Merged
anayeaye merged 3 commits into main from feat/mirror-staged-landsat-demo-collections on Jun 6, 2024


anayeaye
Contributor

@anayeaye anayeaye commented May 31, 2024

What

This PR adds the collection metadata for publishing the hand-curated Landsat sample collections, as well as a notebook for selecting, correcting, and publishing both collection and item metadata to the production account.

How tested

I ran the notebook included in this PR to:

  1. correct and publish the collection metadata to the test catalog
  2. iterate over the items in the source catalog, fix metadata where possible, and publish only the valid items to the test catalog
  3. repeat both steps for production and verify that the item count in the production catalog is not significantly smaller than in the staging catalog (it is expected to be lower because the staging catalog had invalid records with hrefs to files that do not exist in S3)
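
The fix, validate, publish loop described above can be sketched roughly as follows. This is not the notebook's code: `publish_valid_items` and its `fix`, `validate`, and `publish` callables are hypothetical stand-ins for whatever the notebook uses to correct metadata, check an item, and post it to the ingest API. The returned counts are the kind of numbers one could compare between staging and production.

```python
from typing import Callable, Dict, Iterable, List


def publish_valid_items(
    items: Iterable[dict],
    fix: Callable[[dict], dict],
    validate: Callable[[dict], List[str]],
    publish: Callable[[dict], None],
) -> Dict[str, int]:
    """Fix each item, publish it only when validation reports no errors,
    and return counts for a simple source-vs-published audit."""
    counts = {"source": 0, "published": 0, "skipped": 0}
    for item in items:
        counts["source"] += 1
        fixed = fix(item)
        errors = validate(fixed)  # hypothetical: empty list means valid
        if errors:
            # Invalid items (e.g. hrefs to missing files) are not published.
            counts["skipped"] += 1
            continue
        publish(fixed)
        counts["published"] += 1
    return counts
```

A published count below the source count is expected whenever the source catalog contains items that cannot be made valid.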

Caveats

  1. Many of the items in these collections had invalid classification metadata, so I removed classification from the declared stac_extensions.
  2. Many items had hrefs to non-existent files. When I hit this validation error, I decided to stop incrementally fixing the item metadata and publish only the valid STAC records to production. The production counts in the audit are therefore lower than the staging counts, but only because invalid items had been published to the staging catalog. It is unlikely that anyone will miss records with links to TIFs that do not exist.
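
For the first caveat, removing the classification declaration could look something like the sketch below. It assumes the extension is declared by a schema URL containing `/classification/` (as the published STAC classification extension schemas are); the function name `drop_classification_extension` is made up for illustration, and the notebook may do this differently.

```python
import copy


def drop_classification_extension(item: dict) -> dict:
    """Return a copy of a STAC item dict without the classification
    extension in its declared stac_extensions."""
    cleaned = copy.deepcopy(item)  # leave the caller's item untouched
    if "stac_extensions" in cleaned:
        cleaned["stac_extensions"] = [
            url
            for url in cleaned["stac_extensions"]
            # Assumption: the extension is identified by its schema URL path.
            if "/classification/" not in url
        ]
    return cleaned
```

Note that this only removes the declaration, so validators stop checking the item against the classification schema; any `classification:*` fields already in the item are left in place.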

Formerly Blocked

No longer blocked, but I am leaving these notes here in case we see a similar validation error in the future.

Currently items cannot be published; validation fails with an Asset not accessible: Forbidden error for every asset href:

"[{'loc': ('body', 'assets', 'red', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'blue', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'green', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'nir08', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'swir16', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'swir22', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'ANG.txt', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'MTL.txt', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'MTL.xml', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'coastal', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'MTL.json', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'qa_pixel', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'qa_radsat', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'thumbnail', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'qa_aerosol', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'reduced_resolution_browse', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}]"

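An error dump like the one above is easier to read when grouped by message. A minimal sketch, assuming the error text is a Python literal (a list of dicts with `loc`, `msg`, and `type` keys, optionally wrapped in an outer pair of double quotes as it is here); `summarize_validation_errors` is a hypothetical helper, not part of the ingest API.

```python
import ast
from collections import defaultdict
from typing import Dict, List


def summarize_validation_errors(raw: str) -> Dict[str, List[str]]:
    """Group validation errors by message, listing the affected asset keys."""
    errors = ast.literal_eval(raw)
    if isinstance(errors, str):
        # The dump was wrapped in quotes, so unwrap one more level.
        errors = ast.literal_eval(errors)
    by_message = defaultdict(list)
    for err in errors:
        loc = err["loc"]
        # loc looks like ('body', 'assets', '<asset key>', 'href')
        if len(loc) > 2 and loc[1] == "assets":
            where = loc[2]
        else:
            where = "/".join(str(part) for part in loc)
        by_message[err["msg"]].append(where)
    return dict(by_message)
```

When every asset of an item fails with the same Forbidden message, as in this case, that points at a permissions problem on the caller's side rather than at individual bad hrefs.
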
However, the role that is used by titiler, and that should also be used by the ingest API, is able to access these assets. For example, for s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2023/001/113/LC08_L2SR_001113_20230125_20230208_02_T2/LC08_L2SR_001113_20230125_20230208_02_T2_SR_B4.TIF:

curl -X 'GET' \
  'https://test.openveda.cloud/api/raster/cog/info?url=s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2023/001/113/LC08_L2SR_001113_20230125_20230208_02_T2/LC08_L2SR_001113_20230125_20230208_02_T2_SR_B4.TIF' \
  -H 'accept: application/json' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   369  100   369    0     0    405      0 --:--:-- --:--:-- --:--:--   405
{
  "bounds": [
    -106.8322594139634,
    -76.02196986271962,
    -96.60693094236582,
    -73.34367090509093
  ],
  "minzoom": 4,
  "maxzoom": 10,
  "band_metadata": [
    [
      "b1",
      {}
    ]
  ],
  "band_descriptions": [
    [
      "b1",
      ""
    ]
  ],
  "dtype": "uint16",
  "nodata_type": "Nodata",
  "colorinterp": [
    "gray"
  ],
  "scales": [
    1.0
  ],
  "offsets": [
    0.0
  ],
  "driver": "GTiff",
  "count": 1,
  "width": 8381,
  "height": 8441,
  "overviews": [
    2,
    4,
    8,
    16,
    32,
    64
  ],
  "nodata_value": 0.0
}
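
The same check can be scripted for several assets. The sketch below builds the `/cog/info` request used in the curl call above with only the standard library; the base URL is the test endpoint from that call, while `cog_info_url` and `fetch_cog_info` are illustrative names and the fetch assumes the endpoint is reachable from wherever this runs.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RASTER_API = "https://test.openveda.cloud/api/raster"


def cog_info_url(asset_href: str, raster_api: str = RASTER_API) -> str:
    """Build the cog/info URL for an asset href, percent-encoding the href."""
    return f"{raster_api}/cog/info?{urlencode({'url': asset_href})}"


def fetch_cog_info(asset_href: str, raster_api: str = RASTER_API) -> dict:
    """Request COG info for one asset and return the decoded JSON body.
    Raises urllib.error.HTTPError if the raster API cannot read the asset."""
    request = Request(
        cog_info_url(asset_href, raster_api),
        headers={"accept": "application/json"},
    )
    with urlopen(request) as response:
        return json.load(response)
```

A successful response for an href that the ingest API rejects suggests the two services are reading S3 with different roles or settings.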

@anayeaye
Contributor Author

anayeaye commented Jun 5, 2024

Unblocked by adding requester-pays configuration to the ingest-api: NASA-IMPACT/veda-backend#388
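
For reference, this is roughly what a requester-pays asset check looks like at the S3 level. It is not the code from veda-backend#388, only a sketch of the general idea: the caller has to opt in to paying for the request, otherwise S3 answers Forbidden even when the role has read permission. `s3_client` is expected to behave like a boto3 S3 client (its `head_object` call accepts `RequestPayer="requester"`); the helper names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_href(href: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(href)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 href: {href}")
    return parsed.netloc, parsed.path.lstrip("/")


def asset_is_accessible(s3_client, href: str) -> bool:
    """Return True if a HEAD request with requester pays succeeds."""
    bucket, key = split_s3_href(href)
    try:
        s3_client.head_object(Bucket=bucket, Key=key, RequestPayer="requester")
    except Exception:
        # With boto3 this would normally be botocore.exceptions.ClientError;
        # a broad except keeps the sketch free of third-party imports.
        return False
    return True
```

With boto3 installed, `asset_is_accessible(boto3.client("s3"), href)` would run the check against a real bucket using the caller's credentials.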

@anayeaye anayeaye changed the title from "BLOCKED by inaccessible assets feat/mirror staged landsat demo collections" to "Mirror staged landsat demo collections in prod" Jun 5, 2024
@anayeaye anayeaye marked this pull request as ready for review June 5, 2024 20:48
@anayeaye anayeaye requested a review from smohiudd as a code owner June 5, 2024 20:48
@anayeaye anayeaye merged commit f6cd153 into main Jun 6, 2024
2 checks passed
@anayeaye anayeaye deleted the feat/mirror-staged-landsat-demo-collections branch June 6, 2024 21:36