I needed fulltext search in a text-heavy application that uses DataScript.
It works, but it's still early.
The search adapter maintains a fulltext index in a separate DataScript database. This is not ideal, but the current design of Datomic and DataScript does not support extensible indices.
The fulltext adapter:
- listens for changes in the source connection using (d/listen! conn),
- inspects the incoming tx-report,
- filters on attributes that have :db/fulltext true in their schema,
- tokenises the string value,
- removes stop words like "the" and "and",
- maintains a multi-cardinality attribute in the fulltext DataScript DB.
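The steps in the list above can be sketched roughly as follows. This is a minimal illustration and not the adapter's actual source: the names `stop-words`, `tokenise`, `index-tokens`, `fulltext-attrs`, `report->index-tx` and `install-sketch!` are all hypothetical, the stop-word list is made up, and only added datoms are handled (retractions are ignored).

```clojure
(ns fulltext-sketch.core
  (:require [clojure.string :as str]
            [datascript.core :as d]))

;; Hypothetical stop-word list; the adapter's real list is not shown here.
(def stop-words #{"the" "and" "a" "an" "of" "to"})

(defn tokenise
  "Lower-cases s and splits it on runs of non-alphanumeric characters."
  [s]
  (->> (str/split (str/lower-case s) #"[^a-z0-9]+")
       (remove str/blank?)))

(defn index-tokens
  "Distinct tokens of s with stop words removed."
  [s]
  (distinct (remove stop-words (tokenise s))))

(defn fulltext-attrs
  "Set of attributes whose schema entry has :db/fulltext true."
  [schema]
  (set (for [[attr props] schema
             :when (:db/fulltext props)]
         attr)))

(defn report->index-tx
  "Turns a tx-report into :db/add operations for the index database.
   Only added string datoms on fulltext attributes are considered."
  [{:keys [tx-data db-after]}]
  (let [attrs (fulltext-attrs (:schema db-after))]
    (vec (for [datom tx-data
               :when (and (:added datom)
                          (contains? attrs (:a datom))
                          (string? (:v datom)))
               token (index-tokens (:v datom))]
           ;; Same entity id and attribute as the source: one-to-one mapping.
           [:db/add (:e datom) (:a datom) token]))))

(defn install-sketch!
  "Creates an index connection and keeps it updated from source-conn."
  [source-conn]
  (let [attrs   (fulltext-attrs (:schema @source-conn))
        ;; Each indexed attribute holds many tokens per entity.
        ft-conn (d/create-conn
                 (zipmap attrs (repeat {:db/cardinality :db.cardinality/many})))]
    (d/listen! source-conn ::fulltext
               (fn [report]
                 (let [tx (report->index-tx report)]
                   (when (seq tx)
                     (d/transact! ft-conn tx)))))
    ft-conn))
```

After `(install-sketch! conn)`, transacting `{:db/id -1 :message/text "hi there"}` into `conn` should leave the tokens "hi" and "there" on the same entity id in the index database.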
Using a separate connection makes it convenient to have a one-to-one attribute mapping and to manage cache eviction, since the index could grow large. In practice the separate database is not a problem, because you can query across DataScript databases, e.g. (d/q '[:in $ $1 ...] db1 db2).
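To make the cross-database query concrete, here is a small self-contained sketch. The two databases are built by hand to stand in for a source database and its token index; the `search` function and the `$ft` source name are illustrative choices, not part of the library.

```clojure
(ns fulltext-sketch.cross-db
  (:require [datascript.core :as d]))

;; Stand-in for the source database.
(def source-db
  (d/db-with (d/empty-db {:message/text {:db/fulltext true}})
             [{:db/id 1 :message/text "hi there"}
              {:db/id 2 :message/text "goodbye"}]))

;; Stand-in for the index database: same entity ids, one token per datom.
(def index-db
  (d/db-with (d/empty-db {:message/text {:db/cardinality :db.cardinality/many}})
             [[:db/add 1 :message/text "hi"]
              [:db/add 1 :message/text "there"]
              [:db/add 2 :message/text "goodbye"]]))

(defn search
  "Finds entities whose index contains token and returns their original text."
  [source-db index-db token]
  (d/q '[:find ?e ?text
         :in $ $ft ?token
         :where
         [$ft ?e :message/text ?token] ; match the token in the index db
         [$ ?e :message/text ?text]]   ; join back to the source db
       source-db index-db token))

(search source-db index-db "hi") ;; => #{[1 "hi there"]}
```

Because both databases share entity ids, the join on `?e` is all that is needed to get from a matching token back to the original entity.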
(ns datascript-fulltext.example
  (:refer-clojure :exclude [atom])
  (:require [reagent.core :as r :refer [atom]]
            [datascript.core :as d]
            [com.theronic.datascript.fulltext :as ft]))

;; Source connection: :message/text is marked for fulltext indexing.
(def conn (d/create-conn {:message/text {:db/fulltext true}}))
(def !ft-conn (atom nil)) ;; holds the fulltext connection once installed

(defn parent-component [conn]
  (let [!input (atom "hi")] ;; todo text input
    [:div [:code "Matching fulltext entities: " (ft/query! @!input)]]))

(defn init! []
  (let [fulltext-conn (ft/install-fulltext! conn)]
    (d/transact! conn [{:db/id -1 :message/text "hi there"}]) ;; load from storage after connecting to sync.
    (ft/search @ft/ft-conn "hi") ;; => will yield message ID.
    (reset! !ft-conn fulltext-conn))
  ;; "app" is an assumed mount-point id; use whatever element your page provides.
  (r/render [parent-component conn] (.getElementById js/document "app")))

(init!)
- Delete index values on mutation. Should be quick to add, and then do a smart diff to avoid writes.
- Store hashed token values instead of strings.
- Use schema definition of source connection.
- Track source schema and rebuild index on change.
- Batch updates using queued web workers to prevent locking main thread.
- Add adapter for off-site storage, e.g. Redis.
- Match source transaction IDs if possible.
- Fork & extend DataScript to support (fulltext ...) search function.
- [in progress] Add soundex or double-metaphone.
- Maintain indexed token counts for relevance ranking. Maybe :db/index does this already?
- Support n-grams (can get heavy).
- Bloom filters.
- seq over matching datoms directly with pagination.
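
Three of the items above (hashed tokens, token counts and n-grams) are small enough to sketch in plain Clojure. None of this exists in the library yet; the function names are hypothetical, and whether `hash` is stable enough across platforms and versions for a persisted index is something to verify before relying on it.

```clojure
(ns fulltext-sketch.roadmap)

(defn hashed-token
  "Replaces a token string with its integer hash to keep the index small."
  [token]
  (hash token))

(defn token-counts
  "Counts how often each token occurs, as a basis for relevance ranking."
  [tokens]
  (frequencies tokens))

(defn char-ngrams
  "All contiguous character n-grams of token, e.g. for prefix or fuzzy matching."
  [n token]
  (map #(apply str %) (partition n 1 token)))

(token-counts ["hi" "there" "hi"]) ;; => {"hi" 2, "there" 1}
(char-ngrams 3 "hello")            ;; => ("hel" "ell" "llo")
```

The n-gram output shows why that item "can get heavy": a token of length k produces k - n + 1 index entries instead of one.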