SDv2 Dreambooth LoRA fine-tuning API #312
Conversation
@@ -37,20 +41,16 @@ class RayRuntimeSpec:
 @dataclass
-class RayExecutor:
+class RayExecutor(metaclass=SingletonMetaclass):
What was the motivation for making this a singleton? Did we ever have more than one executor before?
It was already a singleton: the .get() classmethod returned a singleton instance. This just provides a simpler way to instantiate singleton classes.
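For reference, here's a minimal sketch of what a metaclass-based singleton along the lines of SingletonMetaclass could look like (the name comes from the diff; the body below is an assumption, not the actual nos implementation):

```python
import threading


class SingletonMetaclass(type):
    """Metaclass that returns the same instance for every instantiation of a class.

    Sketch only: the real SingletonMetaclass in nos may differ.
    """

    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Lazily create the single instance the first time the class is called.
        if cls not in cls._instances:
            with cls._lock:
                if cls not in cls._instances:
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class RayExecutor(metaclass=SingletonMetaclass):
    pass


assert RayExecutor() is RayExecutor()  # every call returns the same executor
```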
from nos.logging import logger

RUNTIME_ENVS = {
Would like to discuss this more I think. Is there a way we can do this all inside of a single env?
The need for this is to avoid polluting the main repo with all the dependencies, especially for training. For diffusers, we needed a specific revision, which made it difficult to support in the base conda env.
oof, yea this might make the case for dedicated training containers, each with the dependencies needed for a particular training flow.
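To make the isolation concrete, here's a hedged sketch of how a per-task RUNTIME_ENVS mapping could feed Ray's runtime_env mechanism so that training-only dependencies (e.g. a pinned diffusers revision) never touch the base conda env; the keys, pins, and task function below are illustrative, not the actual nos configuration:

```python
import ray

# Illustrative only: keys and pins are assumptions, not the real nos RUNTIME_ENVS.
RUNTIME_ENVS = {
    "inference": {"pip": []},  # base env already carries the inference deps
    "dreambooth-lora": {
        # Training-only deps, including a pinned diffusers revision, stay out of the base env.
        "pip": [
            "git+https://github.com/huggingface/diffusers.git@<pinned-revision>",
            "peft",
        ],
    },
}


@ray.remote(num_gpus=1, runtime_env=RUNTIME_ENVS["dreambooth-lora"])
def train_dreambooth_lora(config: dict) -> str:
    """Runs inside the isolated runtime env with the pinned training deps."""
    ...
```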
self,
prompts: Union[str, List[str]],
num_images: int = 1,
num_inference_steps: int = 50,
Will want to move these to the config eventually (or maybe expose in the API, so we can kick off longer training jobs)
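One option (a sketch only; the field names and defaults are assumptions, not the actual nos config) is to hoist these hard-coded keyword defaults into a config object that the API can also accept and override per request:

```python
from dataclasses import dataclass


@dataclass
class StableDiffusionInferenceConfig:
    """Sketch of a config object for the defaults currently hard-coded as kwargs."""

    num_images: int = 1
    num_inference_steps: int = 50
    guidance_scale: float = 7.5  # assumed field, shown only to illustrate the idea


# The method could then take `config: StableDiffusionInferenceConfig` instead of
# bare kwargs, and the API could pass a larger config to kick off longer jobs.
```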
@@ -108,3 +108,14 @@ service InferenceService {
 // TODO (spillai): To be implemented later (for power-users)
 // rpc DeleteModel(DeleteModelRequest) returns (DeleteModelResponse) {}
 }

+message TrainingRequest {
Going to be a hassle to represent the full state needed for training over gRPC, I think. As discussed, training might be something that doesn't run through the client (i.e. deeper integrations with pixeltable so it can run nos server code directly).
Yes, agreed. I'm not sure if we want to represent this here, since it's not standard across training tasks.
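If some of this does end up going over the wire, one possibility (purely an illustration, not what this PR implements) is to keep the typed surface minimal and ship the task-specific knobs as an opaque JSON payload, since they aren't standard across training tasks:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TrainingJobSpec:
    """Illustrative client-side spec; not the actual TrainingRequest proto."""

    method: str                                            # e.g. "stable-diffusion-dreambooth-lora"
    inputs: Dict[str, Any] = field(default_factory=dict)   # task-specific, intentionally untyped

    def to_wire(self) -> bytes:
        # Only `method` is structured; the task-specific payload travels as opaque bytes.
        return json.dumps({"method": self.method, "inputs": self.inputs}).encode()


spec = TrainingJobSpec(
    method="stable-diffusion-dreambooth-lora",
    inputs={"instance_prompt": "a photo of sks dog", "max_train_steps": 500},
)
payload = spec.to_wire()
```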
@@ -9,12 +11,19 @@

def cached_repo(
Is there a way to implement the training flow without pulling/installing these repos in their entirety? Is this so the whole thing can be dynamic and not require each env to be declared as a pip dependency?
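For context, here's a rough sketch of what a cached_repo helper along these lines might do (the signature, cache layout, and shallow-clone behavior are assumptions; the implementation in this PR may differ): clone the repo once at a pinned branch/tag into a local cache and reuse it on later calls, so at least the dynamic pull isn't repeated:

```python
import hashlib
import subprocess
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "nos" / "repos"  # assumed cache location


def cached_repo(repo_url: str, branch: str = "main") -> Path:
    """Shallow-clone `repo_url` at `branch` into a local cache and return the path.

    Sketch only: the actual cached_repo in this PR may have a different signature.
    """
    key = hashlib.sha256(f"{repo_url}@{branch}".encode()).hexdigest()[:16]
    target = CACHE_DIR / key
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        # --depth 1 keeps the download small; a full fetch would be needed for arbitrary SHAs.
        subprocess.run(
            ["git", "clone", "--depth", "1", "--branch", branch, repo_url, str(target)],
            check=True,
        )
    return target
```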
Summary
Related issues
Checks
- make lint: I've run make lint to lint the changes in this PR.
- make test: I've made sure the tests (make test-cpu or make test) are passing.