Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find a solution for mapping_types in SynonymTerm #31

Open
RichJackson opened this issue Jun 10, 2024 · 0 comments
Open

find a solution for mapping_types in SynonymTerm #31

RichJackson opened this issue Jun 10, 2024 · 0 comments

Comments

@RichJackson
Copy link
Collaborator

getting rid of mapping types would be a useful thing to do for many reasons:

  1. no one really cares about them it seems
  2. they complicate hashing behaviour of SynonymTerm and thus SynonymDB can have inconsistent behaviour and results

Originally, we had them as a term can have multiple disjointed mapping types in a KB.

original comment

          the `get_all` call 'smells' wrong to me here for a couple of reasons:
  1. We're having to iterate over all the synonym terms for every annotation here.
  2. Within the get_all call, we get the synonym term associated with the term_norm, but don't capture it here, and then have to recalculate it within drop_equivalent_id_set_from_synonym_term when we do synonym_term = self._syns_database_by_syn[name][synonym].

Is there a problem I'm missing with replacing:

        self._associated_id_sets_by_id: Dict[
            ParserName, Dict[str, Set[FrozenSet[EquivalentIdSet]]]
        ] = {}

with

        self._synonym_terms_by_id: Dict[
            ParserName, Dict[str, Set[SynonymTerm]]
        ] = {}

?

If we do that, then this function doesn't need to loop over all the synonym terms for the parser, and can just be:

    def drop_equivalent_id_set_containing_id_from_all_synonym_terms(
        self, name: ParserName, id_to_drop: Idx
    ) -> Tuple[int, int]:
        """
        remove all EquivalentIdSet's that contain this id from all SynonymTerms
        in the database

        :param name:
        :param id_to_drop:
        :return:
        """

        terms_modified = 0
        terms_dropped = 0

        syn_term: SynonymTerm
        for syn_term in self.get_associated_syn_terms_for_id(name, id_to_drop):
            new_id_sets = frozenset(  # or set( if we leave it as-is
                equiv_id_set
                for equiv_id_set in syn_term.associated_id_sets
                if id_to_drop not in equiv_id_set.ids
            )

            result = self._modify_or_drop_synonym_term_after_id_set_change(
                new_id_sets, name, syn_term
            )

            if result == DBModificationResult.ID_SET_MODIFIED:
                terms_modified += 1
            elif result == DBModificationResult.SYNONYM_TERM_DROPPED:
                terms_dropped += 1

        return terms_modified, terms_dropped

Although doing that makes me wonder if we actually need both this new self._synonym_terms_by_id and the existing self._syns_by_aggregation_strategy, since that also goes from an id to some data associated with a set of SynonymTerms (their term_norms). I wonder if we could use a single data structure here - but equally that might trade off a small-ish amount of space for a bit of performance, so not sure how bothered we would be.

Original comment from @EFord36

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant