Proposal for a Generic Custom Backend Template #7377

Open
hrzhao76 opened this issue Jun 26, 2024 · 0 comments

Dear Triton Team, thank you for developing such an exceptional package for cloud inference. We are a group working on High Energy Physics experiments, looking to leverage your inference-as-a-service model to run complex inference pipelines on remote GPUs. We are particularly interested in an efficient custom backend capable of supporting our extensive multi-module pipelines.

I would like to ask whether it would be possible to provide a generic custom backend template, similar to the TritonPythonModel class available in the Python backend. Such a template would allow developers to focus solely on defining the initialization, execution, and finalization functions, without having to manage the intricacies of the backend API, such as device selection.
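For context, here is a minimal sketch of the interface the Python backend's TritonPythonModel already provides, which is the pattern this proposal would mirror in C++ (based on the Python backend documentation; the contents of args and the response construction are simplified):

import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # Called once when the model is loaded; `args` carries the model
        # configuration and instance information (e.g. which device to use).
        pass

    def execute(self, requests):
        # Called for each batch of requests; must return one
        # InferenceResponse per incoming InferenceRequest.
        responses = []
        for request in requests:
            responses.append(pb_utils.InferenceResponse(output_tensors=[]))
        return responses

    def finalize(self):
        # Optional cleanup when the model is unloaded.
        pass

A C++ analogue could hide device selection and request/response handling behind the template in the same way.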

Could we explore the feasibility of this proposal? I am ready and willing to volunteer my time to assist in the development of this feature.

To illustrate, I envision a base class structured as follows:

#include <vector>

// Abstract interface that a pipeline developer would implement.
class BaseCustomBackend {
public:
    virtual void initializePipeline() = 0;
    virtual std::vector<int> runPipeline(std::vector<int> inputs) = 0;
    virtual ~BaseCustomBackend() = default;
};

// Example user implementation containing only pipeline-specific logic.
class CustomPipeline : public BaseCustomBackend {
public:
    void initializePipeline() override {
        // Insert custom initialization logic here
    }
    std::vector<int> runPipeline(std::vector<int> inputs) override {
        // Insert custom pipeline processing logic here
        return inputs;
    }
};

And within the backend code:

#include <memory>

extern "C" TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceExecute(
    TRITONBACKEND_ModelInstance* instance, TRITONBACKEND_Request** requests,
    const uint32_t request_count)
{
    std::unique_ptr<BaseCustomBackend> customPipeline =
        std::make_unique<CustomPipeline>();

    // Assuming the inputs have been gathered from the requests,
    // e.g. via the BackendInputCollector utility
    std::vector<int> inputs = {1, 2, 3}; // example inputs

    // Execute the pipeline
    std::vector<int> outputs = customPipeline->runPipeline(inputs);

    // Prepare and send the responses, e.g. via BackendOutputResponder ...
    return nullptr; // success
}

Is your feature request related to a problem? Please describe.
Yes, our aim is to reduce the complexity of developing custom backends for intricate pipelines, thereby improving both development efficiency and usability.

Describe the solution you'd like
A templated custom backend that abstracts lower-level details and allows developers to focus on pipeline-specific logic.

Describe alternatives you've considered
While other solutions may be considered, integrating a templated approach directly within Triton could substantially streamline development efforts.

Additional context
I am available to discuss this proposal further and provide additional use cases or details as needed.

Tagging my colleagues @xju2 @ytchoutw @yongbinfeng @kpedro88.
