Added `from_config()` to all classes #63

igiloh-pinecone · 2023-10-04T21:47:03Z

Problem

Add the ability to load all classes from a config

Solution

Created two Mixins:

FactoryMixin - allows loading a chosen derived class by calling BaseClass.from_config({'type': 'DerivedClass', 'params': {'param_a': 'b'}})
ConfigurableMixin - loads a class that has several dependencies, by creating a class-level mapping of {dependency_name: default_class}
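
As an illustration of the FactoryMixin idea, here is a minimal sketch, not the code in this PR. It assumes subclasses are registered automatically through `__init_subclass__` (the Python 3.6 hook) and that `from_config()` looks up the `type` string in that registry. The `_SUBCLASSES` attribute and the toy `BaseClass`/`DerivedClass` definitions are hypothetical.

```python
from typing import Any, Dict


class FactoryMixin:
    """Hypothetical sketch: every direct user of the mixin owns a registry."""

    _SUBCLASSES: Dict[str, type] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if FactoryMixin in cls.__bases__:
            # A base class (e.g. BaseClass) starts a fresh, empty registry.
            cls._SUBCLASSES = {}
        else:
            # A derived class registers itself under its own class name.
            cls._SUBCLASSES[cls.__name__] = cls

    @classmethod
    def from_config(cls, config: Dict[str, Any]):
        type_name = config["type"]
        if type_name not in cls._SUBCLASSES:
            raise ValueError(
                f"Unknown type {type_name!r}, expected one of "
                f"{sorted(cls._SUBCLASSES)}"
            )
        # 'params' is optional; a missing key means "use constructor defaults".
        return cls._SUBCLASSES[type_name](**config.get("params", {}))


class BaseClass(FactoryMixin):
    pass


class DerivedClass(BaseClass):
    def __init__(self, param_a: str = "a"):
        self.param_a = param_a


obj = BaseClass.from_config({"type": "DerivedClass", "params": {"param_a": "b"}})
```

Keeping one registry per base class means two unrelated hierarchies can reuse the same class name without colliding.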

Type of Change

New feature (non-breaking change which adds functionality)

Test Plan

WIP. Will add tests soon

Wasn't following the naming convention

Automatic dict of subclasses, using newer Python 3.6 syntax

This is complicated to generalize for all of our main components

It's coming along

This can be used in the constructor

I had to make base class mandatory, to make sure we're checking isinstance() correctly

This is mostly testing _set_component(), which needs to be tested in its own separate UT later

Need to rename it now (in a separate commit)

Fully hierarchical loading.

Needed to separate Tokenizer to its own file, so it can import a default class

This way the QueryGenerator can be loaded in a simple {type: , params:} manner without dependencies

That was actually very easy

Need to call the proper FactoryMixin to convert class string names to actual types

So we can instantiate it

In order to support hierarchical from_config(). Can't say I like it...

Just for better visibility

This way it's more readable + mypy is happy

Should be identical to defaults now

Needed to change following latest DEFAULTs change

Previous fix was incorrect...

I changed the ConfigurableMixin so it allows the class constructor to have some mandatory args, as long as the class overrides from_config() and takes care of them

config/config.yaml

resin/utils/config.py

acatav · 2023-10-10T13:59:49Z

resin/chat_engine/query_generator/function_calling.py

        self._top_k = top_k
        self._system_prompt = prompt or DEFAULT_SYSTEM_PROMPT
        self._function_description = \
            function_description or DEFAULT_FUNCTION_DESCRIPTION
+        self._prompt_builder = PromptBuilder(HistoryPruningMethod.RAISE, 1)


This actually creates a bottleneck: the system can't handle long histories, and there is no way to configure it otherwise. Can we simply make history_pruning an init param of this class and take it from the config? If we really want, the YAML could also reference the value inside the chat engine params.

Anyway, IMO the default here should be recent and not raise. That is the chat engine default, and in the general use case query generation probably needs less history than the chat engine.

You're right, I'll make it a param.

Regarding long histories: the assumption is that we don't accept a history longer than max_prompt_tokens, and we do raise in that case (just like OpenAI would).
The reason we need pruning in ChatEngine is that we add a gazillion tokens of context, so in a way we would error even for a history length that OpenAI wouldn't have errored on.
Does that make sense?

I don't think there is a correct answer here, but if the idea is to mimic OpenAI's behavior and raise an error for a history longer than max prompt tokens, it should happen in the context builder and not in the query generator. I think the current default for the context builder allows a history larger than the whole prompt size, even for a KB context of length zero.

So my suggestion is to make history truncation the default here, and if we want the OpenAI behavior of raising an exception for a history longer than the prompt size, it should happen in the context builder, behind a configurable flag. For now I'm fine with any choice you make. My vote is to make truncation the default here, since I think it's the more common behavior for the chat-with-RAG use case.

I don't have a strong opinion. I'll add a param for it (in a separate PR), then we can decide about default behavior
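
To make the two options in this thread concrete, here is a rough sketch of a pruning method exposed as an init param. It is not the `PromptBuilder` from this repo. The enum values, the whitespace-based token count, `prune_history()` and `ToyQueryGenerator` are all assumptions made for illustration only.

```python
from enum import Enum
from typing import List


class HistoryPruningMethod(Enum):
    # RAISE appears in the diff above; the string values here are assumed.
    RAISE = "raise"
    RECENT = "recent"


def count_tokens(message: str) -> int:
    # Stand-in tokenizer: one token per whitespace-separated word.
    return len(message.split())


def prune_history(messages: List[str], max_tokens: int,
                  method: HistoryPruningMethod) -> List[str]:
    if sum(count_tokens(m) for m in messages) <= max_tokens:
        return list(messages)
    if method is HistoryPruningMethod.RAISE:
        raise ValueError("History is longer than the token budget")
    # RECENT: keep the newest messages that still fit in the budget.
    kept: List[str] = []
    budget = max_tokens
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return list(reversed(kept))


class ToyQueryGenerator:
    """Hypothetical class that takes the pruning method from its config."""

    def __init__(self, history_pruning: str = "recent", max_tokens: int = 6):
        self._pruning = HistoryPruningMethod(history_pruning)
        self._max_tokens = max_tokens

    def build_history(self, messages: List[str]) -> List[str]:
        return prune_history(messages, self._max_tokens, self._pruning)


history = ["a b c", "d e f", "g h i"]
recent = ToyQueryGenerator(history_pruning="recent").build_history(history)
```

With `history_pruning="raise"` the same call raises instead of truncating, which is the OpenAI-like behavior discussed above.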

resin/knoweldge_base/knowledge_base.py

resin/utils/config.py

This solves some multiple inheritance edge cases

Got feedback in comments of PR #57 that this code is a bit unreadable, so I decided to implement explicitly in the constructors. It's a bit more code duplication, but much more readable. While at it, I also made some arguments mandatory, while keeping them optional in the config

We need to call `.from_config()` for every class in _DEFAULT_COMPONENTS, even if it doesn't appear in the config

The Mixin now fully supports both loading one of several base class options with a `type` field, or a specific class with `params`. It also supports recursively loading sub-components

The error message has changed

Need to change from_config so it won't change the dict (no `pop()` essentially). Otherwise behavior is very weird from user's perspective.

To avoid altering the user's original config dictionary.

config/config.yaml

This simplifies the code a lot

acatav · 2023-10-16T07:31:56Z

resin/utils/config.py

+
+    @classmethod
+    def from_config(cls, config: Dict[str, Any]):
+        return cls._from_config(config)


maybe we don't need this anymore?

We do, because calling super.from_config() won't work here.
Maybe in the future we can improve that

igiloh-pinecone added 26 commits September 27, 2023 18:34

[context] Renamed BaseContextBuilder

ef6c1bd

Wasn't following the naming convention

[kb] Trying different approach for from_config

d018f16

Automatic dict of subclasses, using newer Python 3.6 syntax

[kb] Keep playing with from_config for KB

ec466b8

This is complicated to generalize for all of our main components

WIP: Continuing to implement new from_config()

d84e9f8

It's coming along

[config] minor typo fix

0999f9f

[config] Added _set_component(), for loading component that has defaults

b8e0061

This can be used in the constructor

Merge branch 'add-missing-defaults' into from_config_mixin

54cbd80

[config] Improved set_component()

b29d669

I had to make base class mandatory, to make sure we're checking isinstance() correctly

[test] Added KB tests for init()

c9ada1a

This is mostly testing _set_component(), which needs to be tested in its own separate UT later

trying to implement a single from_config() without overrides

aaba33b

Merge remote-tracking branch 'origin/dev' into from_config_mixin

efedd06

moved my utils.py into the new utils/ package

c132ca5

Need to rename it now (in a separate commit)

[config] Refactored utils.py to utils/config.py

ea05b30

[config] Finished config of ContextEngine

903f556

Fully hierarchical loading.

[tokenizer] Added initialize_from_config()

a8aaa91

Needed to separate Tokenizer to its own file, so it can import a default class

[chat] Made LLM optional for QueryGenerator

0513c44

This way the QueryGenerator can be loaded in a simple {type: , params:} manner without dependencies

[chat] Made ChatEngine configurable

3094013

That was actually very easy

[tokenizer] Fix load_from_config

607f843

Need to call the proper FactoryMixin to convert class string names to actual types

[llm] Made BaseLLM inherit from FactoryMixin

4a0087d

So we can instantiate it

[config] Config fixes

69b520c

[kb] Made index_name optional

2e7d29c

In order to support hierarchical from_config(). Can't say I like it...

[config] Added list_supported_types()

ec4fc69

Just for better visibility

Make flake8 happy

d36cd52

[config] Changed the signature of FactoryMixin.from_config()

3dfdca4

This way it's more readable + mypy is happy

[config] Finalize hierarchical config

b392b26

Should be identical to defaults now

[kb] Cleanups

bfcbf8b

igiloh-pinecone requested review from miararoy and acatav October 4, 2023 21:47

igiloh-pinecone changed the title ~~WIP: added from_config() to all classes~~ Added from_config() to all classes Oct 10, 2023

[kb] Bug fix - fixed create_with_new()

e24467f

Needed to change following latest DEFAULTs change

igiloh-pinecone added 2 commits October 10, 2023 14:52

[kb] Bug fix

b92d14e

Previous fix was incorrect...

[config] Allow for mandatory params

3df1072

I changed the ConfigurableMixin so it allows the class constructor to have some mandatory args, as long as the class overrides from_config() and takes care of them

acatav reviewed Oct 10, 2023

resin/utils/config.py

igiloh-pinecone added 12 commits October 11, 2023 16:31

[config] Simplify code - don't rely on abc.ABC

9d9df50

This solves some multiple inheritance edge cases

[config] Bug fix - support minimal config

2a79435

We need to call `.from_config()` for every class in _DEFAULT_COMPONENTS, even if it doesn't appear in the config

[config] Removed FactoryMixin, merged with ConfigurableMixin

65e965b

The Mixin now fully supports both loading one of several base class options with a `type` field, or a specific class with `params`. It also supports recursively loading sub-components
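
The merged behavior described in this commit could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the actual `ConfigurableMixin`: the `_resolve_type()` helper and the `ToyTokenizer`, `ToyContextBuilder` and `ToyStuffingContextBuilder` classes are hypothetical, and only `_DEFAULT_COMPONENTS`, `type` and `params` come from the PR text.

```python
from typing import Any, Dict


class ConfigurableMixin:
    # Maps a constructor argument name to its default component class.
    _DEFAULT_COMPONENTS: Dict[str, type] = {}

    @classmethod
    def _resolve_type(cls, type_name: str) -> type:
        # Search cls and all of its transitive subclasses by class name.
        pending = [cls]
        while pending:
            candidate = pending.pop()
            if candidate.__name__ == type_name:
                return candidate
            pending.extend(candidate.__subclasses__())
        raise ValueError(f"Unknown type {type_name!r} for {cls.__name__}")

    @classmethod
    def from_config(cls, config: Dict[str, Any]):
        # With a 'type' field pick one of several options, else load cls itself.
        target = cls._resolve_type(config["type"]) if "type" in config else cls
        params = dict(config.get("params", {}))  # copy, leave the input intact
        for name, default_cls in target._DEFAULT_COMPONENTS.items():
            # Load every default component, even if the config omits it.
            params[name] = default_cls.from_config(params.get(name, {}))
        return target(**params)


class ToyTokenizer(ConfigurableMixin):
    def __init__(self, model_name: str = "default-model"):
        self.model_name = model_name


class ToyContextBuilder(ConfigurableMixin):
    _DEFAULT_COMPONENTS = {"tokenizer": ToyTokenizer}

    def __init__(self, tokenizer: ToyTokenizer, max_tokens: int = 100):
        self.tokenizer = tokenizer
        self.max_tokens = max_tokens


class ToyStuffingContextBuilder(ToyContextBuilder):
    pass


minimal = ToyContextBuilder.from_config({})
nested = ToyContextBuilder.from_config({
    "type": "ToyStuffingContextBuilder",
    "params": {"max_tokens": 50, "tokenizer": {"params": {"model_name": "m"}}},
})
```

The empty config exercises the "minimal config" case from the earlier bug fix: the tokenizer is still built from its default class.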

remove commented out code

45bc120

flake8

5270867

[tests] test_kb - fixed after removing _set_component()

83eb2b7

The error message has changed

Merge remote-tracking branch 'origin/dev' into from_config_mixin

ec0456b

Merge branch 'remove_create_new' into from_config_with_new

6ad0d57

[test] Started adding config tests

3cd9517

Need to change from_config so it won't change the dict (no `pop()` essentially). Otherwise behavior is very weird from user's perspective.

[config] Bug fix - shallow copy the config file

6e53c21

To avoid altering the user's original config dictionary.
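
A small sketch of why the copy matters, assuming the loader consumes keys with `pop()`. The `split_config()` helper is hypothetical and only illustrates the pattern.

```python
from typing import Any, Dict, Optional, Tuple


def split_config(config: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical helper: consume 'type' and 'params' without side effects."""
    config = dict(config)  # shallow copy, so pop() never touches the caller's dict
    type_name = config.pop("type", None)
    params = config.pop("params", {})
    if config:
        # Anything left over is a key the loader does not understand.
        raise ValueError(f"Unexpected config keys: {sorted(config)}")
    return type_name, params


user_config = {"type": "DerivedClass", "params": {"param_a": "b"}}
type_name, params = split_config(user_config)
```

Because the copy is shallow, the nested `params` dictionary is still shared with the caller, so code that goes on to modify it would need its own copy.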

[test] Finalized ConfigurableMixin tests

5278c7d

igiloh-pinecone changed the base branch from dev to remove_create_new October 12, 2023 18:59

igiloh-pinecone added 2 commits October 12, 2023 22:04

[test] Bug fix - had duplicate test name

b2952b6

Make flake8 happy

fd645e3

acatav reviewed Oct 15, 2023

config/config.yaml

Merge branch 'remove_create_new' into from_config_mixin

0e7705f

Base automatically changed from remove_create_new to dev October 15, 2023 18:39

igiloh-pinecone added 2 commits October 16, 2023 09:46

[config] Removed the kwargs option of from_config()

fb00e39

This simplifies the code a lot

Merge remote-tracking branch 'origin/dev' into from_config_mixin

4b961fd

acatav approved these changes Oct 16, 2023

igiloh-pinecone merged commit 18f9010 into dev Oct 16, 2023
9 checks passed

igiloh-pinecone deleted the from_config_mixin branch October 16, 2023 07:57
