Extending Kazu #20
Oops, I just hit the 'comment and close issue' button by accident midway through writing a reply, sorry! Real reply pending.
Yes, Kazu is very flexible; the downside is that it's so flexible that we haven't yet done a great job of documenting all of that flexibility. To bring in an additional (custom) ontology/KB, you would need to build your own model pack. Documenting how to do this has been on our backlog for a while, but unfortunately we don't have anything good yet. One note: we are currently in the process of releasing a new version of Kazu, 2.0. This doesn't change much for a user of the default model pack, but it changes some of the details of how you provide config for 'curating' knowledge bases, e.g. to filter out bad synonyms for NER. How urgently do you need this? If I waited until the new version is out sometime next week to give you a proper guide, would that be OK for you, or would you rather have something to get you started sooner, even if it means some rework if you want to upgrade to 2.0 later?
Disabling some of the existing ontologies on its own can be done in a way that should be considerably simpler, with the downside that Kazu's string matching facilities will still have the disabled ontology 'baked in' (which takes up memory but shouldn't affect compute much) unless the model pack is rebuilt. Is this something you're interested in, or are you mainly interested in adding additional ontologies, and therefore in building a custom model pack?
Yes, I can wait until the new release is out, so that I am working with the latest version once and for all. Next week works for me. I work day-to-day in this domain and have built a similar tool for my org. I see common approaches, themes, and packages like ahocorasick, but Kazu appears more mature and more robust for fuzzy matching, particularly when terms overlap. So I am thinking: why reinvent the wheel if I can build on and extend Kazu for my local needs?
At some point I may need to do both, but I will give priority to adding a custom KB.
Sounds good; in that case I think waiting for the new model pack and release is best. Incidentally, Kazu actually uses pyahocorasick "under the hood" for its exact string matching in the MemoryEfficientStringMatchingStep, and builds additional functionality on top of it.
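For anyone unfamiliar with pyahocorasick: it implements the Aho-Corasick algorithm, which finds every occurrence of every dictionary term in a single pass over the text, including nested and overlapping terms such as "EGFR" inside "EGFR inhibitor". Below is a minimal, standard-library-only sketch of that algorithm for illustration. It is not Kazu's MemoryEfficientStringMatchingStep and not pyahocorasick's API; the function names `build_automaton` and `find_all` and the example synonyms are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for an Aho-Corasick automaton (illustrative sketch)."""
    goto = [{}]     # goto[state][char] -> next state; state 0 is the root
    fail = [0]      # failure link: longest proper suffix that is also a trie prefix
    output = [[]]   # patterns recognised when the automaton reaches each state
    # Insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first traversal so shallower failure links are known before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also emits every pattern its failure target emits (nested matches).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return (start_index, pattern) for every exact match, in order of match end."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists (or we are at the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


if __name__ == "__main__":
    synonyms = ["EGFR", "EGFR inhibitor", "inhibitor"]
    print(find_all("an EGFR inhibitor was used", build_automaton(synonyms)))
    # → [(3, 'EGFR'), (3, 'EGFR inhibitor'), (8, 'inhibitor')]
```

Note that all three overlapping matches are reported; deciding which span "wins" is left to whatever logic sits on top of the matcher. pyahocorasick itself is a C extension, so it is much faster than this pure-Python sketch.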
To keep you in the loop, it's taken me a little longer this week to progress the next release, but we're making good progress; it should be out sometime next week. Sorry for the delay!
Thanks for the update.
> On Fri, 9 Feb 2024, 09:22 Elliot Ford wrote:
> To keep you in the loop, it's taken me a little longer this week to progress the next release, but we're making good progress, should be sometime next week. Sorry for the delay!
I am new to Kazu and quite fascinated by it. Is Kazu flexible enough that a developer can bring in an additional (or custom) ontology/knowledge base alongside what's already in use for a certain entity, or even disable what's built into Kazu in order to use a different one?