From e67fb59c66c4c4e0ca859935d59fcc8294bdf2de Mon Sep 17 00:00:00 2001
From: "D. Bohdan"
Date: Tue, 12 Nov 2024 13:56:35 +0000
Subject: [PATCH] refactor: use `jsonl` and "JSON Lines" terminology

It is best to avoid ambiguity about whether the tools work with JSON or JSON Lines documents.
This breaks the JSON import workflow:

- `tools/dir2json` is renamed to `tools/dir2jsonl`.
- `tools/import json` becomes `tools/import jsonl`.

Convert the readme to SemBr.
---
 README.md                     | 109 ++++++++++++++++------------------
 tests/tests.tcl               |  28 ++++-----
 tinyfts-dev.tcl               |   2 +-
 tools/{dir2json => dir2jsonl} |  18 +++---
 tools/import                  |  14 ++---
 5 files changed, 83 insertions(+), 88 deletions(-)
 rename tools/{dir2json => dir2jsonl} (86%)

diff --git a/README.md b/README.md
index fb8cf3a..564a230 100644
--- a/README.md
+++ b/README.md
@@ -4,33 +4,32 @@
 
 A very small standalone full-text search HTTP/SCGI server.
 
-![A screenshot of what the unofficial tinyfts search service for the
-Tcler's Wiki looked like](screenshot.png)
+![A screenshot of what the unofficial tinyfts search service for the Tcler's Wiki looked like](screenshot.png)
 
 ## Contents
 
-* [Dependencies](#dependencies)
-* [Usage](#usage)
-* [Query syntax](#query-syntax)
-* [Setup](#setup)
-* [Operating notes](#operating-notes)
-* [License](#license)
+- [Dependencies](#dependencies)
+- [Usage](#usage)
+- [Query syntax](#query-syntax)
+- [Setup](#setup)
+- [Operating notes](#operating-notes)
+- [License](#license)
 
 
 ## Dependencies
 
 ### Server
 
-* Tcl 8.6
-* tclsqlite3 with [FTS5](https://sqlite.org/fts5.html)
+- Tcl 8.6
+- tclsqlite3 with [FTS5](https://sqlite.org/fts5.html)
 
 ### Building, tools, and tests
 
 The above and
 
-* Tcllib
-* kill(1), make(1), sqlite3(1)
-* tDOM and file(1) to run `tools/dir2json`
+- Tcllib
+- kill(1), make(1), sqlite3(1)
+- tDOM and file(1) to run `tools/dir2jsonl`
 
 On recent Debian and Ubuntu install the dependencies with
 
@@ -73,7 +72,7 @@ Options:
 The basic usage is
 
 ```sh
-tools/import json example.jsonl example.sqlite3
+tools/import jsonl example.jsonl example.sqlite3
 # Local server
 ./tinyfts --db-file example.sqlite3 --local 8080
 # Server available over the network
@@ -84,50 +83,48 @@ tools/import json example.jsonl example.sqlite3
 
 ### Default or "web"
 
-The default full-text search query syntax in tinyfts resembles that of a Web
-search engine. It can handle the following types of expressions.
+The default full-text search query syntax in tinyfts resembles that of a Web search engine.
+It can handle the following types of expressions.
 
-* `foo` — search for the word *foo*.
-* `"foo bar"` — search for the phrase *foo bar*.
-* `foo AND bar`, `foo OR bar`, `NOT foo` — search for both *foo* and *bar*, at
-least one of *foo* and *bar*, documents without *foo* respectively.
-*foo AND bar* is identical to *foo bar*. The operators *AND*, *OR*, and *NOT*
-must be in all caps.
-* `-foo`, `-"foo bar"` — the same as `NOT foo`, `NOT "foo bar"`.
+- `foo` — search for the word *foo*.
+- `"foo bar"` — search for the phrase *foo bar*.
+- `foo AND bar`, `foo OR bar`, `NOT foo` — search for both *foo* and *bar*,
+  at least one of *foo* and *bar*,
+  documents without *foo* respectively.
+  *foo AND bar* is identical to *foo bar*.
+  The operators *AND*, *OR*, and *NOT* must be in all caps.
+- `-foo`, `-"foo bar"` — the same as `NOT foo`, `NOT "foo bar"`.
 
 ### FTS5
 
 You can allow your users to write full
 [FTS5 queries](https://www.sqlite.org/fts5.html#full_text_query_syntax)
-with the command line option `--query-syntax fts5`. FTS5 queries are more
-powerful but expose the technical details of the underlying database. (For
-example, the column names.) Users who are unfamiliar with the FTS5 syntax
-will find it surprising and run into errors because they did not quote a word
-that has a special meaning.
+with the command line option `--query-syntax fts5`.
+FTS5 queries are more powerful but expose the technical details of the underlying database.
+(For example, the column names.)
+Users who are unfamiliar with the FTS5 syntax will find it surprising and run into errors because they did not quote a word that has a special meaning.
 
 
 ## Setup
 
-Tinyfts searches the contents of an SQLite database table with a particular
-schema. The bundled import tool `tools/import` can import serialized data
-(text files with one JSON object or Tcl dictionary per line) and wiki pages
-from a [Wikit](https://wiki.tcl-lang.org/page/Wikit)/Nikit database into
-a tinyfts database.
+Tinyfts searches the contents of an SQLite database table with a particular schema.
+The bundled import tool `tools/import` can import serialized data
+(text files with one [JSON object](https://jsonlines.org/) or Tcl dictionary per line)
+and wiki pages from a [Wikit](https://wiki.tcl-lang.org/page/Wikit)/Nikit database to a tinyfts database.
 
 ### Example
 
 This example shows how to set up search for a backup copy of the
-[Tcler's Wiki](https://wiki.tcl-lang.org/page/About+the+WIki). The
-instructions should work on most Linux distributions and FreeBSD with the
-dependencies and Git installed.
+[Tcler's Wiki](https://wiki.tcl-lang.org/page/About+the+WIki).
+The instructions should work on most Linux distributions and FreeBSD with the dependencies and Git installed.
 
 1\. Go to .
-Download and extract the last Wikit database snapshot of the Tcler's Wiki.
-Currently that is `wikit-20141112.zip`. Let's assume you have extracted the
-database file to `~/Downloads/wikit.tkd`.
+    Download and extract the last Wikit database snapshot of the Tcler's Wiki.
+Currently that is `wikit-20141112.zip`.
+    Let's assume you have extracted the database file to `~/Downloads/wikit.tkd`.
 
-2\. Download, build, and test tinyfts. In this example we use Git to get the
-latest development version.
+2\. Download, build, and test tinyfts.
+    In this example we use Git to get the latest development version.
 
 ```sh
 git clone https://github.com/dbohdan/tinyfts
@@ -135,17 +132,17 @@ cd tinyfts
 make
 ```
 
-3\. Create a tinyfts search database from the Tcler's Wiki database. The
-repository includes an import tool that supports Wikit databases. Depending
-on your hardware, this may take up to several minutes with an input database
-size in the hundreds of megabytes.
+3\. Create a tinyfts search database from the Tcler's Wiki database.
+    The repository includes an import tool that supports Wikit databases.
+    Depending on your hardware, this may take up to several minutes with an input database size in the hundreds of megabytes.
 
 ```sh
 ./tools/import wikit ~/Downloads/wikit.tkd /tmp/fts.sqlite3
 ```
 
-4\. Start tinyfts on . The server URL should open
-automatically in your browser. Try searching.
+4\. Start tinyfts on .
+    The server URL should open automatically in your browser.
+    Try searching.
 
 ```sh
 ./tinyfts --db-file /tmp/fts.sqlite3 --title 'tinyfts demo' --local 8080
@@ -154,17 +151,15 @@ automatically in your browser. Try searching.
 
 ## Operating notes
 
-* If you put tinyfts behind a reverse proxy, remember to start it with the
-command line option `--behind-reverse-proxy true`. It is necessary for
-correct client IP address detection, which rate limiting depends on. Do
-**not** enable `--behind-reverse-proxy` if tinyfts is not behind a reverse
-proxy. It will let clients spoof their IP with the header `X-Real-IP` or
-`X-Forwarded-For` and evade rate limiting themselves and rate limit others.
+- If you put tinyfts behind a reverse proxy, remember to start it with the command line option `--behind-reverse-proxy true`.
+  It is necessary for
+correct client IP address detection, which rate limiting depends on.
+  Do **not** enable `--behind-reverse-proxy` if tinyfts is not behind a reverse proxy.
+  It will let clients spoof their IP with the header `X-Real-IP` or `X-Forwarded-For` and evade rate limiting themselves and rate limit others.
 
 
 ## License
 
-MIT. [Wapp](https://wapp.tcl.tk/) is copyright (c) 2017-2022 D. Richard Hipp
-and is distributed under the Simplified BSD License.
-[Tacit](https://github.com/yegor256/tacit) is copyright (c) 2015-2020
-Yegor Bugayenko and is distributed under the MIT license.
+MIT.
+[Wapp](https://wapp.tcl.tk/) is copyright (c) 2017-2022 D. Richard Hipp and is distributed under the Simplified BSD License.
+[Tacit](https://github.com/yegor256/tacit) is copyright (c) 2015-2020 Yegor Bugayenko and is distributed under the MIT license.
diff --git a/tests/tests.tcl b/tests/tests.tcl
index bc10af0..6143f44 100755
--- a/tests/tests.tcl
+++ b/tests/tests.tcl
@@ -30,7 +30,7 @@ package require textutil
 
 cd [file dirname [info script]]/..
 
-set td(json-sample) [string map [list \n\n \n \n {}] {
+set td(json-lines-sample) [string map [list \n\n \n \n {}] {
     {
         "url": "https://fts.example.com/foo",
         "title": "Foo",
@@ -67,7 +67,7 @@ set td(json-sample) [string map [list \n\n \n \n {}] {
     }
 }]
 
-set td(tcl-sample) [join [lmap line [split $td(json-sample) \n] {
+set td(tcl-sample) [join [lmap line [split $td(json-lines-sample) \n] {
     json::json2dict $line
 }] \n]
 
@@ -147,20 +147,20 @@ tcltest::test tools-import-1.1.3 {Tcl import} -cleanup $td(cleanup) -body {
 } -match glob -result https://fts.example.com/foo\nhttps://fts.example.com/bar
 
 
-tcltest::test tools-import-1.2.1 {JSON import} -body {
-    tclsh tools/import json - $td(dbFile) << $td(json-sample)
+tcltest::test tools-import-1.2.1 {JSON Lines import} -body {
+    tclsh tools/import jsonl - $td(dbFile) << $td(json-lines-sample)
 } -cleanup $td(cleanup) -result {}
 
 
-tcltest::test tools-import-1.2.2 {JSON import} -cleanup $td(cleanup) -body {
-    tclsh tools/import json - $td(dbFile) --table blah \
-        << $td(json-sample)
+tcltest::test tools-import-1.2.2 {JSON Lines import} -cleanup $td(cleanup) -body {
+    tclsh tools/import jsonl - $td(dbFile) --table blah \
+        << $td(json-lines-sample)
     exec sqlite3 $td(dbFile) .schema
 } -match glob -result {*CREATE VIRTUAL TABLE "blah"*USING fts5*}
 
-tcltest::test tools-import-1.2.3 {JSON import} -cleanup $td(cleanup) -body {
-    tclsh tools/import json - $td(dbFile) --url-prefix http://example.com/ \
-        << $td(json-sample)
+tcltest::test tools-import-1.2.3 {JSON Lines import} -cleanup $td(cleanup) -body {
+    tclsh tools/import jsonl - $td(dbFile) --url-prefix http://example.com/ \
+        << $td(json-lines-sample)
     exec sqlite3 $td(dbFile) {SELECT url FROM tinyfts LIMIT 2}
 } -match glob -result https://fts.example.com/foo\nhttps://fts.example.com/bar
 
@@ -179,14 +179,14 @@ tcltest::test tools-import-2.3 {} -cleanup $td(cleanup) -body {
 } -returnCodes 1 -match glob -result *
 
 
-tcltest::test tools-dir2json-1.1 {Normal dir} -body {
-    tclsh tools/dir2json x/ tests/dir1/
+tcltest::test tools-dir2jsonl-1.1 {Normal dir} -body {
+    tclsh tools/dir2jsonl x/ tests/dir1/
 } -match regexp -result \
     {{"url":"x/bar.html","timestamp":\d+,"title":"bar.html","content":"Bar."}
 {"url":"x/foo.txt","timestamp":\d+,"title":"foo.txt","content":"Foo."}}
 
-tcltest::test tools-dir2json-2.1 {Bad HTML} -body {
-    tclsh tools/dir2json x/ tests/dir2/ 2>@1
+tcltest::test tools-dir2jsonl-2.1 {Bad HTML} -body {
+    tclsh tools/dir2jsonl x/ tests/dir2/ 2>@1
 } -match glob -result \
     {*can't parse HTML*Missing ">"*"content":"