[Dashboard | Core?] Py-Spy and Python 3.12 - Lack of support #47388

Open
mtralka opened this issue Aug 28, 2024 · 9 comments
Labels
bug Something that is supposed to be working; but isn't core Issues that should be addressed in Ray Core @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. P1 Issue that should be fixed within a few weeks

Comments

@mtralka

mtralka commented Aug 28, 2024

What happened + What you expected to happen

Py-Spy does not yet support Python 3.12, and there seems to be little momentum toward adding it (benfred/py-spy#670, benfred/py-spy#661, benfred/py-spy#633). On Python 3.12 and Ray 2.32, the CPU flame graph and stack trace features of the Ray Dashboard are non-functional because upstream py-spy lacks 3.12 support.

Does Ray have plans to transition to another statistical profiler (e.g. https://github.com/P403n1x87/austin) to support the 3.12 releases? What level of effort would this be? Is the Ray team open to PRs here?

I understand Ray is in Python 3.12 alpha support per #45904. Is this issue a prerequisite for a non-alpha release?

Pyroscope seems to have the same issue:
grafana/pyroscope-rs#168
grafana/pyroscope#3287

Versions / Dependencies

Python 3.12
Ray 2.32

Reproduction script

Following the documentation for Py-Spy in KubeRay - https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/pyspy.html#kuberay-pyspy-integration.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@mtralka mtralka added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Aug 28, 2024
@Superskyyy
Contributor

Superskyyy commented Aug 28, 2024

We have tried the Pyroscope continuous profiler and it works pretty well, but in the long run the new Elastic eBPF profiler would be the way to go, I imagine.

@anyscalesam anyscalesam added P1 Issue that should be fixed within a few weeks core Issues that should be addressed in Ray Core and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 3, 2024
@anyscalesam
Collaborator

TY for raising this; we'll review it internally this week and see what solution is feasible. IIRC there is a non-pyspy path for CPU flamegraphs.

@mtralka
Author

mtralka commented Sep 11, 2024

Thank you @anyscalesam. Any update on this path internally?

From an external view it seems prudent to create another CpuProfilingManager for a non-pyspy target. We would have to ensure it has flame graph and/or speedscope support, and then plumb ProfilingManager selection into the dashboard reporter agent. Thoughts? I'd be happy to implement something like this if the project is interested in such a path.

@anyscalesam
Collaborator

cc @alanwguo here > we're looking at Scalene and PyInstrument as possible substitutes.

@mtralka
Author

mtralka commented Sep 11, 2024

Thanks for the quick reply. Both of those look promising. The AI features of Scalene are slightly concerning from a naive perspective (I assume phoning home is opt-in?). Selecting the profiler does seem to be the hardest decision here.

A "bring your own" profiler approach seems potentially attractive. Should Ray lib make the choice of what profiler is shipped with it or should we enable the user to provide their own + register an adapter class (basically a ProfilingManager) as needed?

@alanwguo
Contributor

Hi @mtralka, thanks for offering to help out here. Your outline of what changes would be needed makes sense to me. I agree that the selection of the profiler is the hardest decision here.

I do like the idea of a "bring your own" profiler, although we'd probably still replace pyspy as the default in the future. How would you imagine a user could register an adapter class?

@mtralka
Author

mtralka commented Sep 11, 2024

Of course, happy to help where possible. I agree pyspy shouldn't be the default if py3.12 support is mainstreamed.

Regarding implementation - this could certainly be a rabbit hole! IMO a relatively simple / straightforward approach here seems prudent since there doesn't seem to be much user desire for new features here other than fixing py3.12 (please correct me if I'm wrong!).

Some questions / assumptions:

  • input from ray is always constant across profiler adapters (pid, native, etc)
  • output should be delivered in a compatible format with existing ray dashboard features
    • This format varies across profiling tasks
      • cpu profile - likely the same as currently supported - flamegraph, "raw", speedscope.
        • I assume "raw" is a pyspy specific format. This might be an issue if raw is interpreted downstream in Ray. If it isn't, then no worries
      • trace dump - I'm unaware of a "standard" format here, so we would need to find or establish one. I think it is safe to make the user adapt to this standard rather than try to handle every format variation internally. Do you have any domain expertise here?
      • memory profiling - likely the same as currently supported - flamegraph, table
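
The assumptions above could be captured as a small adapter contract. This is only a sketch: `ProfileResult`, `ProfilerAdapter`, and `EchoAdapter` are hypothetical names, not part of Ray. The only things taken from this thread are the `pid` / `native` inputs and the flamegraph / raw / speedscope output formats.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

# Output formats named in the thread; treated as plain strings here.
CPU_FORMATS = ("flamegraph", "raw", "speedscope")


@dataclass(frozen=True)
class ProfileResult:
    """Hypothetical uniform return value for every adapter."""
    success: bool
    output: str       # payload, or an error message when success is False
    fmt: str = "raw"  # which format the payload is in


@runtime_checkable
class ProfilerAdapter(Protocol):
    """Hypothetical contract: the same inputs for every profiler backend."""

    async def trace_dump(self, pid: int, native: bool = False) -> ProfileResult: ...

    async def cpu_profile(
        self, pid: int, fmt: str = "flamegraph", native: bool = False
    ) -> ProfileResult: ...


class EchoAdapter:
    """Stand-in backend that profiles nothing; it only shows the shape."""

    async def trace_dump(self, pid: int, native: bool = False) -> ProfileResult:
        return ProfileResult(True, f"no frames captured for pid {pid}")

    async def cpu_profile(
        self, pid: int, fmt: str = "flamegraph", native: bool = False
    ) -> ProfileResult:
        if fmt not in CPU_FORMATS:
            return ProfileResult(False, f"unsupported format: {fmt}", fmt)
        return ProfileResult(True, "", fmt)


result = asyncio.run(EchoAdapter().trace_dump(pid=1234))
```

Carrying the format alongside the payload would let the dashboard decide how to render each result without knowing which backend produced it.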

For profilers with "unique" features like Scalene's Web UI we'd have to consider how to present these features.

In its simplest form we can keep the existing class based approach. Pseudo ->

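# Registry of CPU profiling manager classes, keyed by PROFILER_NAME.
_PROFILERS: dict[str, type[CpuProfilingManager]] = {}
# Name of the manager registered with as_default=True, if any.
_DEFAULT_PROFILER_NAME = None

def register_profiler(profiler_manager: type[CpuProfilingManager], as_default: bool = False) -> None:
    """Register a CPU profiling manager.

    as_default: bool
        Whether to set the given profiler as the default one.
    """
    global _DEFAULT_PROFILER_NAME
    if not issubclass(profiler_manager, CpuProfilingManager):
        raise TypeError(f"{profiler_manager!r} is not a CpuProfilingManager subclass")

    _PROFILERS[profiler_manager.PROFILER_NAME] = profiler_manager
    if as_default:
        _DEFAULT_PROFILER_NAME = profiler_manager.PROFILER_NAME

def get_profiler(name: str) -> type[CpuProfilingManager]:
    # Returns the registered class; the caller instantiates it.
    return _PROFILERS[name]

async def GetTraceback(self, request, context):
    pid = request.pid
    native = request.native
    desired_profiler_name = ...  # pull from ENV / config singleton / however ray handles config
    profiler_type = get_profiler(desired_profiler_name)
    p = profiler_type(self._log_dir)
    success, output = await p.trace_dump(pid, native=native)
    return reporter_pb2.GetTracebackReply(output=output, success=success)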

Where we register subclasses of the CpuProfilingManager. The current CpuProfilingManager is PySpy-specific, but its implementation seems refactorable without changing the API.

Note that using a class-based profiler system delineated by CPU / MEM category is a bit limiting / obtuse for combined profilers such as Scalene which handle CPU / GPU / MEM all in one.
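
One way around that limitation would be to register a profiler once and let it declare which categories it covers. A rough sketch, where `Capability`, `register`, `resolve`, and the two example classes are all hypothetical and no real profiler is invoked:

```python
from enum import Flag, auto


class Capability(Flag):
    CPU = auto()
    MEMORY = auto()
    GPU = auto()


# Registered profiler classes, in registration order.
_REGISTRY: list[type] = []


def register(cls: type) -> type:
    """Class decorator: record a profiler that declares `capabilities`."""
    if not isinstance(getattr(cls, "capabilities", None), Capability):
        raise TypeError(f"{cls.__name__} must define a Capability flag")
    _REGISTRY.append(cls)
    return cls


def resolve(needed: Capability) -> type:
    """Return the first registered profiler covering every needed capability."""
    for cls in _REGISTRY:
        if needed in cls.capabilities:
            return cls
    raise LookupError(f"no registered profiler covers {needed!r}")


@register
class CpuOnlyProfiler:
    capabilities = Capability.CPU


@register
class CombinedProfiler:
    # A Scalene-style tool would declare everything it handles in one place.
    capabilities = Capability.CPU | Capability.MEMORY | Capability.GPU
```

With this shape, `resolve(Capability.CPU | Capability.MEMORY)` would pick the combined profiler, while a CPU-only request would still go to whichever CPU-capable profiler was registered first.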

Thoughts? Totally acceptable if I'm off base here

@alanwguo
Contributor

@mtralka, registering subclasses or functions can make sense.

We should see if the output of the profiling functions can be left pretty generic. With py-spy, the trace dump is raw text and the flamegraph is an image file (renderable by the browser).

Memory profiling is actually handled separately using "Memray" instead of py-spy. We should consider making that customizable as well. I believe the output of this is also raw text.

It seems that if the generic output format is any HTML-renderable document, that may be a simple interface to conform to.
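
If the contract is "anything the browser can render", plain-text output only needs escaping before it reaches the dashboard. A minimal sketch using the standard library; `to_html_document` is a hypothetical helper, and treating the flame graph image as SVG markup is an assumption made here for illustration:

```python
import html


def to_html_document(payload: str, kind: str) -> str:
    """Wrap profiler output so a browser can render it.

    kind: "text" for raw dumps, "svg" for flame graph markup (assumed).
    """
    if kind == "text":
        # Escape HTML special characters so frames like <module> show literally.
        body = f"<pre>{html.escape(payload)}</pre>"
    elif kind == "svg":
        # SVG is already browser-renderable; embed it unchanged.
        body = payload
    else:
        raise ValueError(f"unknown output kind: {kind!r}")
    return f"<!DOCTYPE html><html><body>{body}</body></html>"


page = to_html_document("Thread 1 (idle): <module> (app.py:3)", "text")
```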

As for class-based overrides, I don't think we necessarily need to adhere to the current pattern of using classes. We could override individual functions per profiling functionality. I don't think there is very much code reuse between the functions within the same class.
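
Overriding individual functions rather than whole classes might look like the sketch below. Everything here is hypothetical (`profiler_hook`, `run_profile`, the `"trace_dump"` key); the only assumption taken from this thread is that each hook receives a pid and a native flag and returns a success flag plus output, like `trace_dump` in the pseudocode earlier.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Each hook takes (pid, native) and returns (success, output).
ProfileFn = Callable[[int, bool], Awaitable[Tuple[bool, str]]]

_HOOKS: Dict[str, ProfileFn] = {}


def profiler_hook(kind: str) -> Callable[[ProfileFn], ProfileFn]:
    """Decorator: register `fn` as the implementation for one profiling kind."""
    def decorator(fn: ProfileFn) -> ProfileFn:
        _HOOKS[kind] = fn  # a later registration overrides an earlier one
        return fn
    return decorator


async def run_profile(kind: str, pid: int, native: bool = False) -> Tuple[bool, str]:
    """Dispatch to whichever function is registered for `kind`."""
    try:
        fn = _HOOKS[kind]
    except KeyError:
        return False, f"no profiler registered for {kind!r}"
    return await fn(pid, native)


@profiler_hook("trace_dump")
async def default_trace_dump(pid: int, native: bool) -> Tuple[bool, str]:
    return True, f"default dump for pid {pid}"


@profiler_hook("trace_dump")
async def custom_trace_dump(pid: int, native: bool) -> Tuple[bool, str]:
    # A user-supplied override; only this one function is replaced.
    return True, f"custom dump for pid {pid}"
```

Because each kind is keyed separately, a user could swap the trace dump implementation and leave CPU and memory profiling on their defaults.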

Do you want to give that a shot?

@anyscalesam anyscalesam added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Sep 24, 2024