Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting of search results is poor #331

Open
esafak opened this issue Sep 29, 2022 · 17 comments · May be fixed by #341
Open

Formatting of search results is poor #331

esafak opened this issue Sep 29, 2022 · 17 comments · May be fixed by #341

Comments

@esafak
Copy link
Contributor

esafak commented Sep 29, 2022

Surely this is not a reasonable result for static statement:

Screen Shot 2022-09-28 at 21 12 47

The code block is broken, and a random sentence is printed in a huge font while the title's weight is weaker than the author's.

@PhilippMDoerner
Copy link
Contributor

PhilippMDoerner commented Jan 24, 2023

I just tried to take a look at this out of curiosity and received this output in the forum:

image

That seems reasonable enough to me.
I'm not 100%, but was the issue maybe fixed in the meantime and can thus be closed? Or am I overlooking anything?

@pietroppeter
Copy link

it does indeed seems it is fixed but looking at past commits it is unclear what fixed it (I do remember the issue in the past, do not remember recent occurences). and unclear whether or not the forum has been recently deployed. #317 would be useful :)

@ringabout
Copy link
Member

ringabout commented Jan 24, 2023

https://forum.nim-lang.org/t/7969#50770 seems to be broken: "Couldn't render historic post in #50770"

image

And it cannot be rendered properly in the search results.

@pietroppeter
Copy link

mmh, if this screenshot is to be trusted we do have a last deploy last october:
image

@pietroppeter
Copy link

https://forum.nim-lang.org/t/7969#50770 seems to be broken: "Couldn't render historic post in #50770"

this is related to this (still open) issue: #330

I think there are some cases still not working

@jyapayne
Copy link
Collaborator

jyapayne commented Jan 24, 2023

@pietroppeter and @PhilippMDoerner Search the forum for option.sepBy. You should get this:

image

The issue is only present when the code block gets cut off and doesn't have an ending and/or beginning ```

@PhilippMDoerner
Copy link
Contributor

PhilippMDoerner commented Jan 24, 2023

I can see the issue in the forum, however, I can't seem to replicate it when I run this locally.

I pretty much just copy pasted jack's post above to make a thread out of it, but no matter what I search, this is the kind of stuff I get:

image
image
image

This is with the forum compiled under 1.9.1, which shouldn't make a difference here.

@PhilippMDoerner
Copy link
Contributor

PhilippMDoerner commented Jan 24, 2023

Ohhhhh it's explicitly the comment ## match --<name>:<option> that is making an issue here.
The combination of the comment being ## and no indication that this should be a codeblock being present.
So basically it just sees the "##" in normal text and assumes "That's normal markdown, so that's a headline" and then we have our problem.

Now the next question that follows this is:
If you have a code-comment in a search-result code-block and the indicators for the code-block get cut off... how do you treat this then? That seems like a pretty ugly issue to fix.
To prevent this kind of thing you'd kind of have to parse the text to ensure if a code-block occurs that it is always closed at the very least at the end of the comment before ellipsing out.

And that is only if the snippet you want to show is near the start or the end of a codeblock.
If you're inside of a code-block you also need to scan this and put the entire snippet in between codeblock markings. I'll need to investigate whether that is even possible since I know this makes use of sqlite FTS, though I'm more familiar with FTS5 than FTS4

@jyapayne
Copy link
Collaborator

@PhilippMDoerner You are correct. This issue would likely be fairly difficult to solve based on the reasons you mentioned. With the current structure of the code, I'm also not sure how to approach it. FTS with the snippet extension only returns the partial text from each match, so you'd probably need to get the full text from sqlite to compare so you'd be able to see where the start/end code block markers are. An ugly problem indeed.

I was curious how Discourse solves this, and it looks like they just strip all formatting and render the plain text: https://discourse.codecombat.com/search?q=penPositions%20%3D. Maybe we could do that as well.

@pietroppeter
Copy link

they just strip all formatting and render the plain text

It seems a very reasonable thing to do indeed. Maybe highlighting (bold?) the search term as they do

@PhilippMDoerner
Copy link
Contributor

they just strip all formatting and render the plain text

It seems a very reasonable thing to do indeed. Maybe highlighting (bold?) the search term as they do

That actually runs into problems for exactly the example we were looking at.
Say the comment looks like this:

## [<p1> (<p2> <p1>)*]

And you have a searchquery of <p1>.
Now naturally you can just go and do a "replace" on the HTML string to insert <strong> <p1> </strong> for any hit to<p1> in the string.
The issue comes in rendering.
How do you render this int he frontend correctly now, because you want the <strong> tags to render as HTML, while you want <p1> to render as text?
If you try to render this as HTML, you'll get the <p1> bold but you also get interpreted to be rendered as HTML and it starts disappearing.

Demonstration:
image

As you can see, there is no to be found anywhere.

Now we can of course circumvent that, we can say to just render the text as is. But that means it'll also render the <strong> tags for text instead of as HTML.

Demonstration:
image

So either we just render plaintext, or we accept that any HTML-strings in the content itself will not be rendered as text.

As an additional issue though, I haven't found an easy way in karax so far to retain the whitespacing, so it all looks like the unreadable messy textblobs you see above.

@jyapayne
Copy link
Collaborator

jyapayne commented Jan 25, 2023

@PhilippMDoerner If you want to render a search of <p1>, you will have to escape the characters that HTML considers special characters. See here for a complete table of conversions. I believe you should only have to escape &, <, >, ", and '. See the lodash escape table.

@jyapayne
Copy link
Collaborator

So in your case, you would replace <p1> by <strong> &lt;p1&gt; </strong>

PhilippMDoerner added a commit to PhilippMDoerner/nimforum that referenced this issue Jan 26, 2023
When rendering search hits via karax, it is desired to
emphasise the search term in the hit that was originally looked for.

This requires you to surround the search term with <strong> tags.
However, any HTML-text in the search hit will also be rendered as HTML.
This is undesired. Thus all HTML is also escaped in the text.

The rest is just name refactoring for better clarity.

nim-lang#331
PhilippMDoerner added a commit to PhilippMDoerner/nimforum that referenced this issue Jan 26, 2023
When highlighting text with "*" in markdown, it needs space between
the hit enshrined by "**" to get boldened.
This is not always the case.
Therefore we add whitespaces before/after the **
This way we can enforce that it properly gets boldened.

nim-lang#331
PhilippMDoerner added a commit to PhilippMDoerner/nimforum that referenced this issue Jan 26, 2023
Sometimes search-results contain the string `## `.
They do occur as code comments in code-blocks.
If they get recognized, they get turned into headlines.
To avoid this, a replace statement was added.

nim-lang#331
@PhilippMDoerner
Copy link
Contributor

It does turn out we're overcomplicating things.
I just throw together a bit of a solution for multiple issues:

  1. The snippets that get rendered (using SQLITE FTS4-SQL feature called SNIPPET) sometimes fail to render properly. You instead get to actually see the ** that should get rendered into strong tags, because there's no whitespace after the match. I added those, that get rid of an entire class of rendering issues
  2. To prevent ## and # (the leading whitespace is important) to be rendered into a headline, we might as well do a simple replace of # in general to \\# . Most of the time that fixes the issue, sometimes that leads to \# being rendered instead of #. Which I can live with.

Mysteriously that even takes care of the HTML issue

image

@jyapayne
Copy link
Collaborator

That fixes the bolded text issue, but now you have more issues because the markdown is parsing proc sepBy[T, T2](p1: Parser[T], p2: Parser[P2]): Parser[seq[T]] = as proc sepBy T, T2: Parser[seq[T]]. You're losing information in the text because markdown sees [X](Y) and parses it as a link Y with text X. I still think you're better off using the plaintext and escaping the HTML special characters as I mentioned above. Otherwise you're going to have to keep finding hacks to workaround markdown parsing issues.

@PhilippMDoerner
Copy link
Contributor

PhilippMDoerner commented Jan 26, 2023

The alternative is plain-text and hacks that work around making things sometimes bold and sometimes not, we're running into hacks either way.
Because you'll have to escape HTML-characters first and then replace whatever placeholder you inserted around search-matches with <strong> and </strong> tags, Neither feels particularly clean. The benefit this one provides is that at the very least it provides some more readable formatting.

PhilippMDoerner added a commit to PhilippMDoerner/nimforum that referenced this issue Jan 26, 2023
Sometimes search-results contain the string `[`.
They do occur as generics in code-blocks.
If they get recognized, they get turned into links.
To avoid this, a replace statement was added.

nim-lang#331
@jyapayne
Copy link
Collaborator

jyapayne commented Jan 26, 2023

But they aren't hacks. HTML has special characters that need to be escaped if you want to display them normally. Replacing &, <, >, ", and ' with their respective escape codes is how you insert text into an HTML element in JS.

I get the formatting will be cleaner in the code samples, but that isn't really an issue when you can just click into the post and read the full code. There is a good reason that Discourse doesn't render text with formatting. It's tricky business and broken text gets rendered like it currently does in the Nim forum.

Also, if you start escaping Markdown characters, the formatting in the regular, non-code text will be broken and won't display headers and links correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants