Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments in Code cells are taken as section and increment numbering #11835

Closed
mcanouil opened this issue Jan 10, 2025 · 6 comments · Fixed by #11841
Closed

Comments in Code cells are taken as section and increment numbering #11835

mcanouil opened this issue Jan 10, 2025 · 6 comments · Fixed by #11841
Assignees
Labels
bug Something isn't working latex LaTeX engines related libraries and technologies typst
Milestone

Comments

@mcanouil
Copy link
Collaborator

mcanouil commented Jan 10, 2025

Bug description

A comment in a code cell (qmd/ipynb) # a comment lead to the wrong overall header level to be used in the document for Typst and LaTeX formats.

Note that the behaviour is observed for all engines.

Steps to reproduce

Render the following document into Typst or LaTeX PDF.

InputOutput
---
title: XXXX
format:
  typst: default
  pdf: default
number-sections: true
jupyter: python3
---

## Introduction

```{python}
# Import libraries for loading the dataset

print("Hello World")
```

### Subsection

{{< lipsum 1 >}}
Image

Expected behavior

Without having to double the comment symbol or use shift-heading-level-by: -1.

InputOutput
---
title: XXXX
format:
  typst: default
  pdf: default
number-sections: true
jupyter: python3
---

## Introduction

```{python}
## Import libraries for loading the dataset

print("Hello World")
```

### Subsection

{{< lipsum 1 >}}
Image

Your environment

  • macOS Sequoia (latest)

Quarto check output

Quarto 99.9.9
[✓] Checking environment information...
      Quarto cache location: /Users/mcanouil/Library/Caches/quarto
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.4.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.46.3: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 99.9.9
      commit: d0d6f87e18e7522386bc8b1ab5b390ef7eca3799
      Path: /Users/mcanouil/Projects/quarto/quarto-cli/package/dist/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.11
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /Users/mcanouil/Library/TinyTeX/bin/universal-darwin
      Version: 2024

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.13.1
      Path: /Users/mcanouil/Projects/quarto/quarto-playground/.venv/bin/python3
      Jupyter: 5.7.2
      Kernels: julia-1.11, python3

[✓] Checking Jupyter engine render....OK

(|) Checking R installation...........ℹ R version 4.4.2 (2024-10-31)
! Config '~/.Rprofile' was loaded!
[✓] Checking R installation...........OK
      Version: 4.4.2
      Path: /Library/Frameworks/R.framework/Resources
      LibPaths:
        - /Users/mcanouil/Projects/quarto/quarto-playground/renv/library/macos/R-4.4/aarch64-apple-darwin20
        - /Users/mcanouil/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/f7156815
      knitr: 1.48
      rmarkdown: 2.29

[✓] Checking Knitr engine render......OK
@mcanouil mcanouil added bug Something isn't working jupyter latex LaTeX engines related libraries and technologies typst and removed jupyter labels Jan 10, 2025
@mcanouil mcanouil changed the title Comments in Code cells are taken as section for numbering Comments in Code cells are taken as section and increment numbering Jan 10, 2025
@mcanouil mcanouil added the triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone. label Jan 10, 2025
@cderv cderv self-assigned this Jan 10, 2025
@cscheid cscheid added this to the v1.7 milestone Jan 10, 2025
@cscheid cscheid removed the triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone. label Jan 10, 2025
@cscheid
Copy link
Collaborator

cscheid commented Jan 10, 2025

This bug almost certainly from the code that detects the highest level of headings. I bet we're using regexps there.

Another point for the "AST everywhere" project I keep talking about...

@cderv
Copy link
Collaborator

cderv commented Jan 10, 2025

It does not happen for HTML it seems, so I was curious.

This is indeed LaTeX and Typst specific

// Don't shift the headings if we see any H1s (we can't shift up any longer)
const hasLevelOneHeadings = !!markdown.match(/\n^#\s.*$/gm);
// pdfs with no other heading level oriented options get their heading level shifted by -1
if (
!hasLevelOneHeadings &&
autoShiftHeadings &&
(flags?.[kNumberSections] === true ||
format.pandoc[kNumberSections] === true) &&
flags?.[kTopLevelDivision] === undefined &&
format.pandoc?.[kTopLevelDivision] === undefined &&
flags?.[kShiftHeadingLevelBy] === undefined &&
format.pandoc?.[kShiftHeadingLevelBy] === undefined
) {
extras.pandoc = {
[kShiftHeadingLevelBy]: -1,
};
}

// unless otherwise specified, pdfs with only level 2 or greater headings get their
// heading level shifted by -1.
const hasLevelOneHeadings = !!markdown.match(/\n^#\s.*$/gm);
if (
!hasLevelOneHeadings &&
flags?.[kShiftHeadingLevelBy] === undefined &&
format.pandoc?.[kShiftHeadingLevelBy] === undefined
) {
pandoc[kShiftHeadingLevelBy] = -1;
}

So this is a processing specific to Quarto and we do indeed use regex in both place to search for headings.

// heading level shifted by -1.
const hasLevelOneHeadings = !!markdown.match(/\n^#\s.*$/gm);

// Don't shift the headings if we see any H1s (we can't shift up any longer)
const hasLevelOneHeadings = !!markdown.match(/\n^#\s.*$/gm);

On way to fix this would be a Pandoc call with a Lua filter giving us information if any H1.
Or a conversion with pandoc to json ast format to parse. Something like that, that does provided us with AST. In R Markdown ecosystem, I think we needed something like this, and we used a basic --to html conversion to then query DOM for any expected node.

In any case, this is another call to pandoc if we go that road.

@mcanouil
Copy link
Collaborator Author

It does not happen for HTML it seems, so I was curious.

This is indeed LaTeX and Typst specific

Yes, sorry I only implied it.
I did not test docx though.

@cscheid
Copy link
Collaborator

cscheid commented Jan 10, 2025

The truly correct fix here would be for us to operate on the AST (either via Pandoc or some other parser).

A slightly better way to this would be to use the MappedString infrastructure and breakQuartoMd, but I think it will require adding a few functions to the mapped-text.ts API, and I don't know if that's worth the trouble. But, you could:

  • create a version of searchAll in src/core/lib/mapped-text.ts that takes a mappedstring returns an array of mappedstring results.
  • use breakQuartoMd function to find code blocks, and then check that the returned mappedstring values are not inside any code blocks.

Edit: it's even easier than that, because formatExtras uses a string instead of a MappedString. I'll make a PR.

@cscheid
Copy link
Collaborator

cscheid commented Jan 10, 2025

:sadtrombone:

We can't really solve it with breakQuartoMd, because formatExtras operates on the post-engine markdown, which doesn't have executable code blocks, and the bug can happen with non-executable code blocks as well. Here's a simpler repro:

---
format: typst
number-sections: true
---

## Section 1

```r
# this is a comment that shouldn't break quarto
print("Hello, world")
```

A second call to pandoc it is.

@cderv
Copy link
Collaborator

cderv commented Jan 11, 2025

#11841 is exactly what I had in mind by using Pandoc ! Thanks for the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working latex LaTeX engines related libraries and technologies typst
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants