4. Remove near metadata (nearby) duplicates

Ingrid M. Angel Benavides edited this page Jan 4, 2021 · 5 revisions

Two profiles are near metadata duplicates if their latitudes and longitudes, truncated to 1 decimal digit, are the same and their time difference is smaller than 1 day (this differs from the previous 2019v1 update, in which time was truncated to the day). Uses the function box_meta_neardup.m.
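The criterion above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the code in box_meta_neardup.m; the function names are hypothetical, and truncation toward zero is an assumption (the MATLAB function may handle negative coordinates differently).

```python
import math
from datetime import datetime, timedelta


def truncate_tenths(value):
    """Return the coordinate truncated to 1 decimal digit, expressed in tenths.

    Assumption: truncation is toward zero (e.g. -20.55 -> -205 tenths).
    Working in integer tenths avoids comparing floating-point values.
    """
    return math.trunc(value * 10)


def is_near_metadata_duplicate(lat1, lon1, time1, lat2, lon2, time2):
    """Check the near metadata duplicate rule for two profiles."""
    same_box = (
        truncate_tenths(lat1) == truncate_tenths(lat2)
        and truncate_tenths(lon1) == truncate_tenths(lon2)
    )
    # Time difference strictly smaller than 1 day.
    close_in_time = abs(time1 - time2) < timedelta(days=1)
    return same_box and close_in_time
```

For example, profiles at (35.27, -20.55) and (35.21, -20.51) taken 20 hours apart satisfy the rule, while the same positions sampled two days apart do not.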

Each pair of near metadata duplicates is then checked to see whether it is also a content duplicate (> 95% threshold). See Figure 12 of the MOCCA report.

Decision

  • If the pair is a content duplicate: delete the worst profile (which profile is worst is decided exactly as for the exact metadata duplicates).
  • If the pair is not a content duplicate: keep both profiles.
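The decision can be written as a small function. This is a minimal sketch under stated assumptions: the coincidence fraction is computed elsewhere, and `pick_worst` is a hypothetical callback standing in for the ranking used for the exact metadata duplicates, which is not described on this page.

```python
def resolve_near_duplicate(profile_a, profile_b, coincidence, pick_worst, threshold=0.95):
    """Return the list of profiles to keep for one near metadata duplicate pair.

    coincidence: fraction (0..1) of matching content, computed elsewhere.
    pick_worst:  hypothetical callback returning the profile to delete.
    """
    if coincidence > threshold:
        # Content duplicate: drop the worst profile, keep the other one.
        worst = pick_worst(profile_a, profile_b)
        return [p for p in (profile_a, profile_b) if p is not worst]
    # Not a content duplicate: keep both profiles.
    return [profile_a, profile_b]
```

Note that the comparison is strict, following the "> 95%" wording: a pair with exactly 95% coincidence keeps both profiles.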

The same script can be used to thin the database (e.g. the Argo reference database) in regions with many profiles, by comparing the content of nearby profiles in a certain depth range (e.g. below 900 dbar, which is the range used for the DMQC). Uses profcompcont_deep(upper depth limit, coincidence %).
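As an illustration of a depth-limited content comparison, the following Python sketch computes a coincidence fraction using only the levels at or below an upper pressure limit. It is not the implementation of profcompcont_deep: the definition of coincidence (share of deep values of one profile that also occur in the other, within a tolerance), the tolerance itself, and the function name are all assumptions.

```python
def deep_coincidence(pres_a, vals_a, pres_b, vals_b, upper_limit=900.0, tol=1e-3):
    """Fraction of deep values of profile A that also occur in profile B.

    Only levels with pressure >= upper_limit (dbar) are compared.
    Assumption: two values coincide if they differ by at most tol.
    """
    deep_a = [v for p, v in zip(pres_a, vals_a) if p >= upper_limit]
    deep_b = [v for p, v in zip(pres_b, vals_b) if p >= upper_limit]
    if not deep_a or not deep_b:
        # No deep data in one of the profiles: nothing to compare.
        return 0.0
    matched = sum(
        1 for va in deep_a if any(abs(va - vb) <= tol for vb in deep_b)
    )
    return matched / len(deep_a)
```

The resulting fraction could then be compared against the chosen coincidence percentage to decide whether one of the two nearby profiles can be dropped.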

Note: the content duplicates found in this step are those that were not found by the method used in step 3 to find possible content duplicates (Figure 13 of the MOCCA report).