-
add even more additional sorting to stabilize remove stale segments
-
add additional sorting to stabilize remove stale segments
-
simplify checks for bad message
-
fix tests now that we are relying on input being sanitized of bad values
-
better null checks
-
check using is_null not is None
-
switch checks to take into acount Andres's fixes to the segmenter that convert invalid values to None
-
now that bad courses become nans correctly handle them
-
PR#85: Improve seg id generation so we don't get occasional duplicate seg ids.
Changes the segmenter identity matching logic PR#73
This changes the identity matching window to 15 minutes, since 6 minutes
is too low for the data quality we have on 2012. It also removes the
receiver from the identity cache keys, because that field is causing a
lot of problems due to hard inconsistencies.
Updates the version in the init.py
add basic tests for current identity assignment PR#74
move identity addition so it works forward and backwards; adjust tests PR#74
Adds version update information
Adds changelog information for the last 2 releases
run through black, isort, and flak8
start adding comments about where we should refactor
first pass of refactor; break up process in small functions; py.test passes and flake8 happy
break out matching code to separate class
break out msg processing code into separate class
fix imports with isort
Removed class inheritance on Segmentizer as it no longer needs DiscrepancyCalculator functions. The only one is uses is a static method so it is now called directly from the class rather than from self. PR#77
Updated a few loop yields to use instead for cleaner code
Changed one last yield for loop to a
Added docstrings to all internal functions. Renamed_clean
to_clean_segment
for more clarity.
Added docstrings to DiscrepancyCalculator class. Change one function to be denoted internal with underscore.
Some flake8 cleanup
Finished docstrings
Changed Segmentizer to Segmenter and placed in segmenter.py to maintain consistency. core.py is kept with pass-through import and Segmentizer class.
Ran pre-commit on all files for linting/styling
add new regression tests
Updated the path to the repo in the README.rst
Changed query that pulls expected tracks for regression testing to be deterministic by adding msgid to sort criteria. Update .pre-commit-config.yaml because there were so many conflicts to reorder checks because isort and flake8 were colliding.
Modified the gitignore for .ipynb_checkpoints/
Fixed a bug in test_edge_cases where test messages were not given types and pytest was passing on the wrong type of ValueError. Modified to add type to messages and explicitly check that ValueError message mentioned 'unsorted'
Added new unit tests aimed at msg_processor.py. Fixed a bug and duplicate code in msg_processor.py. Moved a test from test_edge_cases.py to test_msg_processor.py since it was more relevant there.
Added a readme to gpsdio_segment/ with diagram for how a message moves through the segmenter.
Added unit test for last BAD_MESSAGE test.
Removed SAFE_SPEED check and updated unit tests to reflect
Added link between the two READMEs
Update README.rst
allow newish builting timezone objects
Atomic identities (PR#78)
switch to atomic identities
fix identity storage
another identity fix
still trying to get something for identities that will work well pipe-segment
add destintions; clean out unused code
make destination have tuple type similar to identity
add some debug logging to track down where all my data went
add more debug logging to track down where all my data went
turn down logging
refactor to allow pipe-segment to use metrics
bump default hours to 24
protect against negative hours
fix all fast tests; comment out identity tests since they need to be redone
Updated identity unit tests to reflect new atomic identities. (PR#81)
Co-authored-by: Jenn Van Osdel jenn@globalfishingwatch.org
Updated regression tests to reflect atomic identities PR#82. Some segmenter changes made segments span over more days, changing the expected seg_id for many messages. Some SSVID regression tests were automatically deleted as they did not have data in 2021010* as at some point these files had been created for the larger time range of 202101*. Biggest change is converting new namedtuples for Identity and Destination into JSON serializable objects in test_expected.add_expected().
fix for Issue#83-PR#84 test_unsorted transient failure
untested implementation of new way to generate ids
fix spelling of itertools
Co-authored-by: Andres Arana andres@globalfishingwatch.org
Co-authored-by: Andrés Arana and2arana@gmail.com
Co-authored-by: Matias Piano matias@globalfishingwatch.org
Co-authored-by: Jenn Van Osdel jenn@globalfishingwatch.org
Co-authored-by: jennvanosdel jennifervo07@gmail.com
Co-authored-by: pwoods25443 pwoods25443@gmail.com -
PR#72 Pipe 3 Segmenter Improvements / Add Stitcher
make sticher work on day by day identity instead of cumjutive identity info
updated sticher version that uses lookahead
filter empty segments before sort to avoid comparison issues
bump max hours back up to 8
tweak stitcher
update tests
fix tests so work under py27 and py37
use identity count to improve longterm stability
improve stitcher performance
start implementing framework for looking at active tracks
trim number of active tracks to try to eleiminate hot spots
trim number of active tracks further
fix bug that resulted in make_tracks hanging
loosen tolerances, but make dependent on track number
bulletproof index function
bug fixes to improve performance, tweaks to improve results
add index to tracks; clean up code somewhat
Support VMS and require type field in messages.
switch back to dev tag
bump version to make it clearer that we're working on tracks
emit active status of track
tweak metrics to improve stitching
minor tuning tweaks / cleanup of metrics
switch to primarily using namedtuples in track generation, begin working on support items for next prototype
commit still working version before tearing the rest of the way apart to try MHT
cleaned up MHT version
allow placeholder segments
allow placeholder segments - fix
add more fields to Track so we keep track of last message
skip sig checks for now when first seg is missing
Keep track of signature in tracks
clean up costs a bit
remove unused track_sigs parameter
Protect signature cost function from overflow
another try at preventing overflow in signature code
protect against small magnitude vectors
catch errors in sig calculator and log warning instead
add notebook showing examples of track based voyages
update TrackBasedVoyages notebook
update to use daily message counts
clarify logic when pruning tracks
fix bug in warning for large decay
fix bug in assembling identity data for tracks
add ommitted sig_cost
lower weight of signature, as it was breaking causing trouble
reduce weight for signature
add logging of costs; reduce signature weight
Fix typo in setting debug level
debug signatures, tweak values
use all of signature for matching
fix signatures so intermediate dates get correct values
use last message of day rather than first because it's always available
store whole previous tracks instead of just historical signatures
switch from storing tracks_by_date to parent_track
bump warning up to debug so logging doesn't get swamped
only use last_msg from track to avoid reconstituting fake segments
use last_message for cost key rather than segment
remove unnecessary test for seg_ids / prefix
tweak parameters to fix up some of the examples after recent changes
sort by last message of day to be consistent with way we reconstitute tracks
cleanup and try to eliminate difference between running daily and batch
more changes to make less sensitive to restarts
bug fix
reinstate length cost; more cleanup
try fix duplicate segment emission; updates to trackidexample
add destination to recorded fields; tweak stitcher params
fix bug where n_destinations left in add_info
add daily message counts to segments
support destination as a signature value
merge changes from develop
add destinations, lengths and widths to signature
bug fix; missing lengths
add missing argumnents when creating segment
port idenity changes from 2.5 -
undo changes meant for other branch
-
Add changes; bump version to 3.0
-
ignore notebooks in ipynb format