Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP : Jupyter notebooks #4253

Closed
wants to merge 16 commits into from
Closed

Conversation

Quafadas
Copy link
Contributor

@Quafadas Quafadas commented Aug 11, 2022

This is an (initial, horrible) draft PR which goes after
scalameta/metals-feature-requests#236

I believe it sets out a viable solution sketch. If we were able to get the almond installer working, and include the classpath in it's .json file, my experiments suggest that VSCode would (automagically) do the rest for us. I think I would need (quite some) help, to get ready for merge.

@@ -1967,6 +1970,70 @@ class MetalsLanguageServer(
case ServerCommands.NewScalaProject() =>
newProjectProvider.createNewProjectFromTemplate().asJavaObject

case ServerCommands.SetupNotebookKernelForThisProject() =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably do this automatically if a notebook path is sent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you have a hint, where to look in metals, for when such a path might be sent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the method annotated with @JsonNotification("textDocument/didOpen") it's send every time a file is open.

https://github.com/scalameta/metals/blob/main/metals/src/main/scala/scala/meta/internal/metals/MetalsLanguageServer.scala#L1066

scribe.info("Found classpath ")
scribe.info("targetClasspath " + targetClasspath.mkString("\n"))
val jvmReprRepo = coursierapi.MavenRepository.of(
"https://maven.imagej.net/content/repositories/public/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need the additional repo?

Copy link
Contributor Author

@Quafadas Quafadas Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almond requires an artefact jvmrepr. https://github.com/jupyter/jvm-repr

The jvm repr artefact, is only available on jitpack, as far as I can tell.

"https://maven.imagej.net/content/repositories/public/"
)
// TODO check scala version is valid. For now use 2.13.7
val scalaVersion = "2.13.7"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably use the Scala version from the build target.

@Quafadas
Copy link
Contributor Author

@tgodzik My biggest worry at the moment, is that when I try to actually run this in metals, I get a permission error.

I'm not sure what the restrictions metals has, but I don't know how to deal with this. I'll try to look again, and see if I messed something up the first time... but ...

The other problems I (hope!) can be solved with some work, but if we simply aren't allowed to make changes to the user path, which I think is what almond would need to install itself, then this idea isn't going to work and I'd have to go back to the drawing board. Does the shell runner have the same rights as metals? Is there any way to elevate it's privilege's?

@tgodzik
Copy link
Contributor

tgodzik commented Aug 12, 2022

@tgodzik My biggest worry at the moment, is that when I try to actually run this in metals, I get a permission error.

I'm not sure what the restrictions metals has, but I don't know how to deal with this. I'll try to look again, and see if I messed something up the first time... but ...

The other problems I (hope!) can be solved with some work, but if we simply aren't allowed to make changes to the user path, which I think is what almond would need to install itself, then this idea isn't going to work and I'd have to go back to the drawing board. Does the shell runner have the same rights as metals? Is there any way to elevate it's privilege's?

The shell should have the same permissions as Metals, since it will be an inherited process. There shouldn't be any issues here, we were able to run other process such as ammonite, which are quite similar to this. What is the exception you are getting?

CC @alexarchambault

Copy link
Member

@tanishiking tanishiking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Quafadas, thank you very much for kicking the ball rolling around notebook support for Metals!

It looks like what this PR tries to do is something like downloading an almond kernel for the specific Scala version that the project *.ipynb file belongs to (as Metals do that for worksheet feature).

However, I don't think this approach doesn't work

  • As you can see from the following gif, we can easily switch the kernel by clicking the upper-right button in the vscode notebook. The point is, **the URI of *.ipynb file and Scala project structure **
    • ipynb
  • In my (data science experience, (not sure in the Scala case though), we often create notebook files in separate /notebooks directory, where it doesn't belong to any Scala project.

What the "notebook support" looks like

I was thinking the "notebook support" is like

  • Developers install almond kernel by themselves (don't need to install kernels from Metals)
    • It might be helpful if Metals provide a command to install almond kernel (?) but it's optional (and I'm not sure it's really helpful)
  • When developers open an *.ipynb file where metadata.kernelspec.language == 'scala', metals magically provide LSP features

What we need to do for "notebook support"

Rough sketch

  • Construct a valid runnable Scala (Ammonite?) script
    • In order to enjoy the LSP feature on notebooks, we have to combine all the Scala cells to a valid Scala script
    • It might be nice to rely on LSP 3.17's notebook document synchronization feature
      • I'm not sure how the client concatenates cells or customizes the way to contamination (almond cells need to preprocess before contamination).
  • Also, we need to deconstruct the LSP response for the concatenated document into the right place (of notebook cells)
    • I guess the LSP clients should handle that by something like this Middleware, and Scala should have Metals version of Middleware in somewhere (maybe in metals-vscode?) I'm not sure.
  • Metals should run for the given (notebook's) Scala version.
    • It should be available from language_info.version
    • (maybe) start BSP server of ... something (like scala-cli bsp) and connect to the server

VSCode would (automagically) do the rest for us

What do you point to by "the rest"?

…ating a launcher with two seperate classpaths... (not implemented). Updates lsp4k to 0.15.0 in order to get the server notebooks commands.
@Quafadas
Copy link
Contributor Author

@tgodzik I looked into this some more, based on your feedback. What I've done.

  • I found that didOpenTextDocument command isn't triggered when a notebook is opened, so...
  • Bumped lsp4j to 0.15.0 to gain access to the didOpenNotebook command, and added that command to metals initialisation options according to the LSP spec.
  • Added the coursier-launcher dependancy to metals
  • Moved the bootstrap out into it's own "Notebooks" file, and tidied up some of the cruft, version setting etc.

I think this puts the bootstrap in the "right" place, and deals with all the boilerplate... I tried a very naive strategy for setting the classpath to include the project, and was not surprised when it failed.

A couple days ago, Alex A popped up on the coursier discussions board, and essentially answered the question of how to do what I wanted, "properly".
coursier/coursier#2479

There are a couple of key insights.

  • Need the courtier-launcher dependency.
  • Need to somehow split the classpath into two parts. One for launching the kernel itself, and one for ammonite inside, it, to actually play with (if I understood correctly - I may not have).

That part is not done... progress is slow :-(...

@Quafadas
Copy link
Contributor Author

Quafadas commented Aug 21, 2022

@tanishiking It's cool that someone else is looking this... I'm trying... but I may not be the right person! This is my first foray into tooling.

Here are some guides to my thought process ...

In my (data science experience, (not sure in the Scala case though), we often create notebook files in separate /notebooks directory, where it doesn't belong to any Scala project.

I am not a data scientist, but have done similar. My question: would you keep this workflow, if a .ipynb file, was a first class citizen of the project, with the same classpath and integrations that a .scala file has?

My hypothesis, is no, and that I would (personally) change my workflow. If the answer yes, then I think your solution could be a very good one, and maybe we could bi-furcate a solution along those lines, depending on whether our .ipynb was a source file in the project.

It looks like what this PR tries to do is something like downloading an almond kernel for the specific Scala version

My progress is very slow! I recently pushed some extra commits that take it a (little) further. You are right that this was step 1 - I agree, that this is a trivial piece of information, and hence not that exciting.

However, metals is also in the (unique?) position of having access to a second piece of (non-trivial) information - the project classpath.

As you can see from the following gif, we can easily switch the kernel by clicking the upper-right button in the vscode notebook. The point is, **the URI of *.ipynb file and Scala project structure **

Exactly... what I want to test, is setting up a kernel per project, with the right classpath, for that project. When an .ipynb file is opened, that is in the source directory of a project... metals knows it's classpath! We can easily switch between many kernels with the mechanism you propose.

There would theoretically be a second way of doing this. I believe one could set a CLASSPATH variable in the .env file of the project root, and would require a single "global" kernel per scala version. This can be found in the in the VSCode discussion board if memory serves. I see some downsides with this, I particular if it was part of a project that used sbt .env or mill .env plugins, one could make the user rather angry by trampling their settings. For this reason, I concluded it way be better to totally isolate each project in a kernel per project.

This a hypothesis I wish to test. It may not be a good mechanism.

What the "notebook support" looks like

I think you see here a slightly different vision. I had hoped, that if we can get the classpath problem right inside the LSP mechanism, that

  • an *.ipynb file could ultimately end up as a first class project citizen. VSCode already has some great diff tools, so version control makes sense etc, would be great to get compilation information "in project", potentially even debugging etc.
  • that it should be possible to (easily?) benefit from all past (and future) work done on LSP in metals, by wheeling in existing functionality, rather than inventing a new mechanism. Again, a hypothesis I am not yet in a position to test.

At that rate I am progressing, and without help... I guess this may never happen!

As you read, I do not claim, that this is certainly the right way. But I think it is an exciting future if it all hung together... ultimately I think we're pulling the same way.

@Quafadas
Copy link
Contributor Author

@tanishiking Ummm, that was really long... sorry!

@tanishiking
Copy link
Member

tanishiking commented Aug 22, 2022

My question: would you keep this workflow, if a .ipynb file, was a first class citizen of the project, with the same classpath and integrations that a .scala file has?

src
`-- main
    `-- scala
        |-- A.scala
        `-- Foo.ipynb

Do you mean, we may want to put Foo.ipynb adjacent to A.scala in src/main/scala directory?
Basically, I don't think so. why do you want to put the *.ipynb there? Maybe I'm seeing a different vision.

  • We want to use the same Scala version with the A.scala?
    • Don't need to put *.ipynb there, because we can switch the kernel Scala version freely.
  • We want to use the same libraries with the A.scala?
    • Don't need to put *.ipynb there because we can download dependency by import $ivy.org.apache.spark::spark-sql:2.4.0, the way of using external dependency in Scala kernel is quite different from python.
  • We want to call something defined in A.scala from Foo.ipynb?

Don't worry, this task is really challenging one, and cutting-edge feature of LSP. Take you time :)

@alexarchambault
Copy link
Contributor

Hi all, first thanks @Quafadas for starting up work on this! This feature will probably require quite some work overall to fully work, but earlier groundwork will definitely be useful.

I agree with @tanishiking that Metals shouldn't install Almond itself. If notebook support in VSCode allowed (allows?) LSP servers to supply their own kernels, then Metals should use that, but I don't know if it's supported.

As @tanishiking develops in his comment (around "What we need to do for "notebook support"), Metals will need to have some model of a notebook (basically, the kind of info its gets about projects via BSP, such as Scala version, class path, ...) in order to offer LSP features for it. It'll also need semantic DB files for it.

I think getting such a model can be done more faithfully than building a script for the notebook content. Almond could expose such a model itself, via a BSP connection. For that, it would need to detect when it's run by VSCode. If that's the case, it would add the semanticdb scalac plugin when it (asks Ammonite to) compile(s) things, so that we would get semanticdb files. It would also run a BSP server, expecting clients to connect via a socket file say, at a pre-defined location (like somewhere in ~/.local/share/almond/bsp/{kernel PID}, exact location built with directories-jvm). That BSP server would mainly answer "descriptive" endpoints, such as workspace/buildTargets, buildTarget/dependencySources, and the like. buildTarget/compile and buildTarget/test would be no-op (compilation being handled via the Jupyter protocol).

Then Metals would have to connect to that BSP server to know about the notebook. And internally in Metals, we would have one "project" per notebook, as you already discussed above.

@alexarchambault
Copy link
Contributor

alexarchambault commented Aug 22, 2022

One hurdle though, is going to be the mismatch between the sources passed to scalac by Almond (via Ammonite) and the locations in the notebooks that Metals should be passed via LSP.

Basically, Almond / Ammonite creates one source per cell, looking roughly like

package ammonite.$sess
// imports of things from cmd0, cmd1, ... here (that is things defined in previous cells)
object cmd2 {
  // cell code goes here
  def $main() = {
    // generated code that prints variables, function names, etc. defined in cell code
  }
}

So the semantic DBs we get from scalac are about those files.

These semantic DBs will need to be post-processed, even maybe merged, so that they correspond to the document and locations that Metals gets asked about via LSP. As examples, post-processing (of diagnostics, semantic DBs, ...) in Scala CLI is handled around here. In Ammonite (when it runs a BSP server for scripts), it's handled around here.

Also, Metals sometimes needs to pass sources to the interactive compiler (at least for completions, and for some other features too). So it needs to be able to create on-the-fly a source that looks like the one above, so that the user code in it sees the previous cells in particular.

@Quafadas
Copy link
Contributor Author

Quafadas commented Aug 22, 2022

I want to think more about all the feedback above, but in response to;

If notebook support in VSCode allowed (allows?) LSP servers to supply their own kernels

I don't think that's the model.
microsoft/vscode-jupyter#10813

If this helps at all we do respect global jupyter paths for where we pick up kernelspecs.

https://jupyter-client.readthedocs.io/en/latest/kernels.html#kernel-specs

My understanding of that comment, is that VSCode will pick up any kernel it can find according to the Juypter kernelspec document. Hence I have not had the expectation, that the kernel was built into or owned by the LSP server itself. I was hoping to use metals server, to bootstrap kernels on demand, which are compatible with the current project...

@tanishiking Your examples above are great.

We want to call something defined in A.scala from Foo.ipynb?
Is it possible? Can we do that by feeding the project classpath to the almond?
It would be really great if we can do that, but it looks pretty advanced feature of almond (?)

This was my goal. Is it possible? I hope...

the previous two functionalities are waypoints that are needed to get there, but I would agree, not "goals" in their own right.

I guess I'd like to finish what I started, and see if it is indeed possible to get such a kernel to execute code found in "A.scala" in a Foo.ipynb notebook. Whether that was worthy of review / merge is then up to someone else :-).

Based on Alex's response, It sounds like true LSP support would then be a further, seemingly tougher step... I did read the LSP 3.17 notebook spec... it reads like it's designed to re-use as much prior art as possible, but I do not pretend, to understand how.

@Quafadas
Copy link
Contributor Author

Quafadas commented Aug 22, 2022

I had my best shot at what I wanted from this... and failed!

This branch does now appear to and setup a kernel which is recognised by VSCode when opening an ipynb file.

However, in a workspace with no dependancies, running 1 + 2 in a notebook, using that kernel, I get this

  dotty.tools.dotc.classpath.ZipAndJarFileLookupFactory.create(ZipAndJarFileLookupFactory.scala:28)
  dotty.tools.dotc.classpath.ZipAndJarFileLookupFactory.create$(ZipAndJarFileLookupFactory.scala:23)
  dotty.tools.dotc.classpath.ZipAndJarClassPathFactory$.create(ZipAndJarFileLookupFactory.scala:41)
  ammonite.compiler.Compiler.$anonfun$5(Compiler.scala:124)
  scala.collection.immutable.Vector2.map(Vector.scala:1872)
  scala.collection.immutable.Vector2.map(Vector.scala:433)
  ammonite.compiler.Compiler.ammonite$compiler$Compiler$$asDottyClassPath(Compiler.scala:124)
  ammonite.compiler.Compiler$$anon$9.classPath(Compiler.scala:76)

If I include dependancies in the project, then the kernel doesn't even start. That's both good and bad - I can see that those dependancies end up in the launcher jar - so it's trying to do the right thing. Just failing!

Point is, I have no idea where to even begin, on trying to sort this out, sadly. Further, based on the commentary above, it's not even clear that this is the "right" direction of travel.

If forced to guess, I would guess that this may not be a good strategy, but I have no evidence for that.
https://github.com/Quafadas/metals/blob/2f539ae3971c24ac3ee1ae33c9bbc0285aecc9ab/metals/src/main/scala/scala/meta/internal/metals/notebooks/Notebooks.scala#L80

Feedback welcomed... but, I'm stuck.

@Quafadas
Copy link
Contributor Author

If anyone wants to take a look and thinks this is a sensible direction of travel / they can help, most welcome and I'll reopen.

@Quafadas Quafadas closed this Aug 23, 2022
@tgodzik
Copy link
Contributor

tgodzik commented Aug 23, 2022

Thanks @Quafadas for all the work! It seems that we might indeed need to rethink the approach and how we handle notebooks 🤔 I don't have a lot of expertise there unfortunately, but I hoped it wouldn't be that complex :sigh:

@alexarchambault @tanishiking Let's keep the discussion running in the feature request.

@Quafadas
Copy link
Contributor Author

@tgodzik It was worth a punt!

It's healthy that there are lots of different views :-)... my own is that I guess I'm not that excited by LSP support (don't get me wrong, very cool)... but being able to just "run" a kernel, with the project context (even without the trimmings), I think could be super useful (for me)... and I was hoping too... a lot simpler :-)!

... so I gave it my best shot and failed! Which I'm fine with... I'm a bit bummed out that I don't have a distance function - do I have a typo... Or is it that trying to construct a classpath like this will never work... But hey ho - I learned a lot...

Most of which was how incredible the metals project is - thanks again for your willingness to answer!

@tanishiking
Copy link
Member

@Quafadas Thanks again for working on this!
While "Language support for notebook" is something different and pretty complex task, being able to set up kernel from Metals would be still useful ❤️

If you're interested, maybe we can open another ticket for "setting up kernel from Metals" 👍 "running kernel with the project context" will be a stretch goal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants