Skip to content
This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

Added from_config() to all classes #63

Merged
merged 46 commits into from
Oct 16, 2023
Merged

Added from_config() to all classes #63

merged 46 commits into from
Oct 16, 2023

Conversation

igiloh-pinecone
Copy link
Contributor

@igiloh-pinecone igiloh-pinecone commented Oct 4, 2023

Problem

Add functionality of loading all classes from config

Solution

Created two Mixins:

  • FactoryMixin - allows loading a chose derived class from by calling BaseClass.from_config({'type': 'DerivedClass', 'params': {'param_a': 'b'})
  • ConfigurableMixin - load a class that has several dependencies, by creating a class-level mapping of {dependency_name: default_class}

Type of Change

  • New feature (non-breaking change which adds functionality)

Test Plan

WIP. Will add tests soon

Wasn't following the naming convention
Automatic dict of subclasses, using newer Python 3.6 syntax
This is complicated to generalize for all of our main components
I had to make base class mandatory, to make sure we're checking isinstance() correctly
This is mostly testing _set_component(), which needs to be tested in its own separate UT later
Need to rename it now (in a separate commit)
Fully hierarchical loading.
Needed to separate Tokenizer to its own file, so it can import a default calss
This way the QueryGenerator can be loaded in a simple {type: , params:} manner without dependencies
That was actually very easy
Need to call the proper FactoryMixin to convert class string names to actual types
In order to support hierarachical from_config().
Can't say I like it...
Just for better visibility
This way it's more readable + mypy is happy
Should be identical to defaults now
@igiloh-pinecone igiloh-pinecone changed the title WIP: added from_config() to all classes Added from_config() to all classes Oct 10, 2023
Needed to change following latest DEFAULTs change
Previous fix was incorrect...
I change the ConfigurableMixin so it will allow the class constructor to have some mandatory args, as long as the class overrides from_config() and takes care of them
config/config.yaml Outdated Show resolved Hide resolved
resin/utils/config.py Outdated Show resolved Hide resolved
resin/utils/config.py Outdated Show resolved Hide resolved
resin/utils/config.py Outdated Show resolved Hide resolved
resin/utils/config.py Outdated Show resolved Hide resolved
resin/utils/config.py Outdated Show resolved Hide resolved
resin/utils/config.py Outdated Show resolved Hide resolved
resin/utils/config.py Show resolved Hide resolved
self._top_k = top_k
self._system_prompt = prompt or DEFAULT_SYSTEM_PROMPT
self._function_description = \
function_description or DEFAULT_FUNCTION_DESCRIPTION
self._prompt_builder = PromptBuilder(HistoryPruningMethod.RAISE, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually creates a bottleneck where you can't make your system handle long histories, and there is no way to config it otherwise. Can we simply have history_pruning an init param for this class and take it from the config? If we really want we can also reference in the yaml to the value inside chat engine params

Anyway, IMO the default should be here recent and not raise. This is the chat engine default, and also for query generation you probably need less history on the general use case than for chat engine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I'll make it a param.

Regarding long histories - the assumption is that we don't take history longer than max_prompt_tokens, and we do raise in that case (just like OpenAI would).
The reason we need pruning in ChatEngine is that we add gazillion tokens of Context - so in a way we would error even for a history length that OpeanAI wouldn't have errored for.
Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a correct answer here, but if the idea is to mimic OpenAI behavior and raise an error for history longer than max prompt tokens, it should happen in the context builder and not in the query generator. I think right now the default for context builder is to enable history larger than all prompt size, even for kb context of length zero

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my suggestion is to put here history truncation in default, and if we want the OpenAI behavior of raising exception for history longer than prompt size it should happen in the context builder, and by a configurable flag. For now I'm fine with any choice you make. My vote is to put here truncation in default since I think it's the more common behavior for the chat with RAG usecase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. I'll add a param for it (in a separate PR), then we can decide about default behavior

resin/knoweldge_base/knowledge_base.py Show resolved Hide resolved
This solves some multiple inheritance edge cases
Got feedback in comments of PR #57 that this code is a bit unreadable, so I decided to implement explicitly in the constructors.
It's a bit more code duplication, but much more readable.

While at it, I also made some arguments mandatory, while keeping them optional in the config
We need to call `.from_config()` for every class in _DEFAULT_COMPONENTS, even if it doesn't appear in the config
The Mixin now fully supports both loading one of several base class options with a `type` field, or a specific class with `params`.
It also supports recursively loading sub-components
Need to change from_config so it won't change the dict (no `pop()` essentially). Otherwise behavior is very weird from user's perspective.
To avoide altering the user's original config dictionary.
@igiloh-pinecone igiloh-pinecone changed the base branch from dev to remove_create_new October 12, 2023 18:59
Base automatically changed from remove_create_new to dev October 15, 2023 18:39

@classmethod
def from_config(cls, config: Dict[str, Any]):
return cls._from_config(config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we don't need this anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, because calling super.from_config() won't work here.
Maybe in the future we can improve that

@igiloh-pinecone igiloh-pinecone merged commit 18f9010 into dev Oct 16, 2023
9 checks passed
@igiloh-pinecone igiloh-pinecone deleted the from_config_mixin branch October 16, 2023 07:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants