
SDv2 Dreambooth LoRA fine-tuning API #312

Merged: 4 commits into main from spillai/sdv2-finetuning-api on Sep 1, 2023

Conversation

@spillai (Contributor) commented on Aug 29, 2023

Summary

  • Support for LoRA-based fine-tuning of Stable Diffusion (SDv2) via Dreambooth
  • Added LoRA Dreambooth-based inference with attn_procs swapping (see the sketch below)
  • Added a test training service
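
For reference, a minimal sketch of what attn_procs swapping for LoRA inference typically looks like with diffusers; the model ID, weights path, and prompt are placeholders, and the actual nos service wraps this differently:

```python
# Illustrative sketch only; paths and model ID are placeholders, not the nos API.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Swap in the LoRA attention processors produced by Dreambooth fine-tuning.
pipe.unet.load_attn_procs("path/to/lora_weights")

image = pipe("a photo of sks dog in a bucket", num_inference_steps=50).images[0]
```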

Related issues

Checks

  • make lint: I've run make lint to lint the changes in this PR.
  • make test: I've made sure the tests (make test-cpu or make test) are passing.
  • Additional tests:
    • Benchmark tests (when contributing new models)
    • GPU/HW tests

@spillai added the feature (New feature or request) label on Aug 29, 2023
@spillai added this to the NOS v0.0.10 milestone on Aug 29, 2023
@spillai self-assigned this on Aug 29, 2023
@@ -37,20 +41,16 @@ class RayRuntimeSpec:


 @dataclass
-class RayExecutor:
+class RayExecutor(metaclass=SingletonMetaclass):
Contributor:

What was the motivation for making this a singleton? Did we ever have more than one executor before?

Contributor Author (@spillai):

It was already a singleton, with the .get() classmethod returning a singleton instance. This just provides a simpler way to instantiate singleton classes.
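
For reference, a minimal sketch of the pattern being discussed; the actual SingletonMetaclass in nos may differ in detail:

```python
class SingletonMetaclass(type):
    """Metaclass that returns the same instance for every call to the class."""

    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance once, then reuse it on subsequent calls.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class RayExecutor(metaclass=SingletonMetaclass):
    ...


assert RayExecutor() is RayExecutor()  # the same object every time
```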

from nos.logging import logger


RUNTIME_ENVS = {
Contributor:

I'd like to discuss this more. Is there a way we can do this all inside a single env?

Contributor Author (@spillai):

The need for this is to avoid polluting the main repo with all the dependencies, especially for training. For diffusers, we needed a specific revision, which made it difficult to support in the base conda env.

Contributor:

oof, yea this might make the case for dedicated training containers, each with the dependencies needed for a particular training flow.
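
For context, a hedged sketch of how a per-task Ray runtime_env can pin a dependency revision without touching the base conda env; the entries and the pinned diffusers revision below are illustrative, not the actual RUNTIME_ENVS in nos:

```python
# Illustrative only; the real RUNTIME_ENVS entries and pinned revision differ.
import ray

RUNTIME_ENVS = {
    "dreambooth-lora": {
        "pip": [
            # hypothetical pinned diffusers revision required for LoRA Dreambooth
            "git+https://github.com/huggingface/diffusers.git@main",
            "accelerate",
        ],
    },
}


@ray.remote(runtime_env=RUNTIME_ENVS["dreambooth-lora"])
def train_dreambooth_lora(config: dict) -> str:
    # The task imports diffusers from its task-specific env, not the base env.
    ...
```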

self,
prompts: Union[str, List[str]],
num_images: int = 1,
num_inference_steps: int = 50,
Contributor:

We'll want to move these to the config eventually (or maybe expose them in the API so we can kick off longer training jobs).
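
One hypothetical shape for hoisting these knobs out of the method signature, as the comment suggests; the field names simply mirror the current keyword arguments and are not part of the existing nos API:

```python
from dataclasses import dataclass


@dataclass
class StableDiffusionInferenceConfig:
    # Defaults mirror the keyword arguments in the current signature.
    num_images: int = 1
    num_inference_steps: int = 50
```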

@@ -108,3 +108,14 @@ service InferenceService {
 // TODO (spillai): To be implemented later (for power-users)
 // rpc DeleteModel(DeleteModelRequest) returns (DeleteModelResponse) {}
 }
+
+
+message TrainingRequest {
Contributor:

I think it's going to be a hassle to represent the full state needed for training over gRPC. As discussed, training might be something that doesn't run through the client (i.e. deeper integrations with Pixeltable so it can run nos server code directly).

Contributor Author (@spillai):

Yes, agreed. I'm not sure if we want to represent this here, since it's not standard across training tasks.
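
One hedged option in that direction is to keep the gRPC surface generic and treat the task-specific training inputs as an opaque payload; the names below are purely illustrative, not the nos API:

```python
# Purely illustrative; not the nos API.
import cloudpickle


def build_training_request(method: str, inputs: dict) -> dict:
    """Pack task-specific training inputs into a generic, serializable payload."""
    return {"method": method, "inputs": cloudpickle.dumps(inputs)}


request = build_training_request(
    "stable-diffusion-dreambooth-lora",
    {"instance_directory": "path/to/images", "instance_prompt": "a photo of sks dog"},
)
```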

@@ -9,12 +11,19 @@


def cached_repo(
Contributor:

Is there a way to implement the training flow without pulling/installing these repos in their entirety? Is this so the whole thing can be dynamic and not require each env to be declared as a pip dependency?
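
For reference, a rough sketch of what a cached_repo helper typically does: clone a repo at a pinned revision into a local cache directory once and reuse it on later calls. The actual signature and cache layout in nos may differ.

```python
# Rough sketch; the real cached_repo in nos may differ.
import subprocess
from pathlib import Path


def cached_repo(url: str, revision: str, cache_dir: str = "~/.cache/nos/repos") -> Path:
    """Clone `url` at `revision` into the cache directory once, then reuse it."""
    target = Path(cache_dir).expanduser() / f"{Path(url).stem}-{revision}"
    if not target.exists():
        subprocess.run(["git", "clone", url, str(target)], check=True)
        subprocess.run(["git", "checkout", revision], cwd=target, check=True)
    return target
```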

@spillai merged commit 5007e83 into main on Sep 1, 2023
1 check passed
@spillai deleted the spillai/sdv2-finetuning-api branch on Sep 5, 2023