-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace middleman-search with lunr with Algolia DocSearch #706
base: master
Are you sure you want to change the base?
Conversation
f34b2cf
to
5351a54
Compare
5351a54
to
bb85a7e
Compare
bb85a7e
to
9abd7c2
Compare
Hey, small feedback to improve your DocSearch results, you can update the Crawler config to better scope the |
9abd7c2
to
ba24618
Compare
@shortcuts Thanks, but I could not see any good element for |
The main title of the page, the navbar section/active subsection is what we recommend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice!
I agree that Algolia Docsearch seems easier for the moment.
Is it possible to recrawl the website on every deploy instead of daily?
While testing I noticed some extra search results which should not be needed. For example, when looking for "docker", one of the results is a link to the sidebar: https://bundler.io/guides/bundler_docker_guide.html#sidebar-wrapper. Can we avoid generating these redundant results?
source/layouts/base.haml
Outdated
@@ -30,3 +30,6 @@ | |||
|
|||
.footer | |||
= partial 'layouts/footer' | |||
= javascript_include_tag "https://cdn.jsdelivr.net/npm/@docsearch/js@3.1.1/dist/umd/index.min.js" | |||
= javascript_include_tag "https://cdn.jsdelivr.net/npm/bootstrap@5.2.0-beta1/dist/js/bootstrap.min.js" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this? Is it related to not longer needing Popover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used to think this is not needed. By removing the final load import { Popover } from 'bootstrap';
from assets/javascripts/search.js
, it looks like bootstrap esm JS will not loaded any more, which will break navbar collapse. So with the current situation of slowness of asset (https://bundler.io/application.min.js
) downloading from GitHub Pages.
The below image shows how fast JavaScripts are delivered from CDN (DocSearch JS, Bootstgrap JS, and our own JS from top to bottom):
See also https://getbootstrap.com/docs/5.1/getting-started/webpack/#importing-javascript
Is the search branded right now? That would be big 👎 for me. |
@simi Yes I see you, but Typesense will cause the same result... |
@simi As the next step to this PR, we can prepare our own cluster on Algolia (or Typesense cloud). We can estimate how much it will be if some sponsors pay for it.
|
In any case, it seems best for us to be in full control of when we run the crawler (after every deploy), so running our own cluster would seem like the way to go. |
Sure, we have a GitHub action available: https://github.com/algolia/algoliasearch-crawler-github-actions (which works with DocSearch :)) |
My previous comment is at least now incorrect (my memory might be from years ago...). As described in the doc below, we can schedule it more frequently and trigger it manually. Not sure we can do it via API/CLI though:
https://www.algolia.com/doc/tools/crawler/apis/configuration/schedule/ still says every 24 hours:
Crawler configuration can be modified at https://crawler.algolia.com/admin/crawlers/7f4b6579-bba3-4c1f-a0ff-462d9f408281/configuration/edit (I sent an invitation to you). Just hours ago, I updated it as follows: new Crawler({
rateLimit: 8,
startUrls: ["https://bundler.io"],
renderJavaScript: false,
sitemaps: [],
ignoreCanonicalTo: false,
discoveryPatterns: ["https://bundler.io/**"],
schedule: "at 5:44 PM on Sunday",
actions: [
{
indexName: "bundler",
pathsToMatch: ["https://bundler.io/**"],
recordExtractor: ({ helpers }) => {
return helpers.docsearch({
recordProps: {
lvl1: [
".commands h2#NAME",
"h1",
"#page-content-wrapper h2",
"head > title",
],
content: [
"#page-content-wrapper p, #page-content-wrapper li",
"p",
".team .card-body",
"span.contributor",
],
lvl0: {
selectors: "",
defaultValue: "Documentation",
},
lvl2: ["h3"],
lvl3: ["h4"],
lvl4: ["h5"],
lvl5: ["article h5", "main h5", "h5"],
lvl6: ["article h6", "main h6", "h6"],
},
aggregateContent: true,
recordVersion: "v3",
});
},
},
],
initialIndexSettings: {
bundler: {
attributesForFaceting: ["type", "lang"],
attributesToRetrieve: [
"hierarchy",
"content",
"anchor",
"url",
"url_without_anchor",
"type",
],
attributesToHighlight: ["hierarchy", "content"],
attributesToSnippet: ["content:10"],
camelCaseAttributes: ["hierarchy", "content"],
searchableAttributes: [
"unordered(hierarchy.lvl0)",
"unordered(hierarchy.lvl1)",
"unordered(hierarchy.lvl2)",
"unordered(hierarchy.lvl3)",
"unordered(hierarchy.lvl4)",
"unordered(hierarchy.lvl5)",
"unordered(hierarchy.lvl6)",
"content",
],
distinct: true,
attributeForDistinct: "url",
customRanking: [
"desc(weight.pageRank)",
"desc(weight.level)",
"asc(weight.position)",
],
ranking: [
"words",
"filters",
"typo",
"attribute",
"proximity",
"exact",
"custom",
],
highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
highlightPostTag: "</span>",
minWordSizefor1Typo: 3,
minWordSizefor2Typos: 7,
allowTyposOnNumericTokens: false,
minProximity: 1,
ignorePlurals: true,
advancedSyntax: true,
attributeCriteriaComputedByMinProximity: true,
removeWordsIfNoResults: "allOptional",
},
},
appId: "3JA5LRH987",
apiKey: "masked",
}); |
ba24618
to
eff461e
Compare
Updated the primary color for box-shadow etc. in searchbox and modal as I have forgotten the adaption of Bundler branding color. |
eff461e
to
846f7c4
Compare
846f7c4
to
e85d9be
Compare
e85d9be
to
dd60648
Compare
dd60648
to
d2fdc7e
Compare
d2fdc7e
to
2b5f07e
Compare
Admin role is now requested to Org members/Repo admins in #730 to set up periodic indexing on Actions 🙏 |
2b5f07e
to
093b187
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it soon.
3f2bf0e
to
5238184
Compare
Signed-off-by: Takuya Noguchi <takninnovationresearch@gmail.com>
5238184
to
2c82de3
Compare
@deivid-rodriguez How can I trigger a review app for this (old) PR (just for development purpose)? |
I do it manually from Heroku UI, I created it for you. |
What was the end-user problem that led to this PR?
As middleman-search (RubyGems, GitHub) has not been maintained for more than 5.5 years (for example manastech/middleman-search#38), bundler-site maintainers/contributors cannot upgrade lunr to the latest without their additional efforts (or other community's efforts than maintainers/contributors of this repo).
Closes #691
What was your diagnosis of the problem?
UI can be replaced by https://github.com/algolia/docsearch or its forks. Backend (current
/search/lunr-index.json
) can be replaced by Algolia DocSearch with automated crawling or Typesense cloud with manual crawling by us.What is your fix for the problem, implemented in this PR?
/application.min.js
is moved to outside of<head>
.The crawler is manually run by @tnir at any time.
Screenshots
PC (as-is)
PC (searching)
Old screenshots for previous changeset
Checklist
--docsearch-searchbox-background
is now #ebedf0 (default), which is away from "Bunlder color scheme", which does not exist 😁750px
-breakpoint for mobile while the current site powered by Bootstrap 5 has 768px for navbar.Why did you choose this fix out of the possible options?
Algolia's DocSearch for open source projects gets ready in a few days. If Typesense offers
we might be able to use #702, but this change looks clear at this moment.
Signed-off-by: Takuya Noguchi takninnovationresearch@gmail.com