Basic html loader with crawly #22
base: main
warnero commented on Oct 20, 2023
- Added Document and DocumentLoader Behaviours
- Added Crawly DocumentLoader
* adding readme function overview image
* updated image
* fixing image alt text
* centering image?
* reduced image
* tweak image url for cache busting
* rename image to update
* chat chain logo image
* added logo image
* logo-32px
* don't commit graphic sources
* graphic updates
* fixed spelling
* cleanup the configuration README
* make chatgpt response tests more robust

  Even when given specific instructions like "Return the response 'Hi'." ChatGPT (and LLMs in general) don't always follow the instructions *exactly* (for example, ChatGPT will often respond to the above prompt with "Hi!"). As a result, equality testing on the response makes for flaky tests. This change keeps the test prompts, but instead matches on the responses with `=~`. Still not perfect, but less likely to be flaky, which in tests seems like a win.
* link to demo project
* add "update_custom_context" to LLMChain - added tests
* add support for setting the `OpenAI-Organization` header in requests to the OpenAI API
* set pattern match in `DataExtractionChain` to look for `role: :assistant` as it appears to be the only valid result at this stage
* improved the data extraction prompt - didn't consistently handle 'null' values
* update readme - add example of openai_org_id config
* improve pattern match on data extraction chain
* update version
* updated changelog
* put "Elixir" in the Readme title

---------

Co-authored-by: Mark Ericksen <brainlid@gmail.com>
Co-authored-by: Ben Swift <ben@benswift.me>
Co-authored-by: Adam Mokan <amokan@gmail.com>
Hey @brainlid, I wanted to split up my work into smaller chunks so I can get it in (and others can play with the blocks/revamp/etc.). How does this one look?
@brainlid I see this has been sitting for a while. I am planning on doing some data loading from APIs soon, and was wondering if there are plans to integrate this PR or some sort of document in general?
I think this effort has stalled out. I’m open to new work in this area.
What do you need?
On Sat, Aug 24, 2024 at 6:54 AM Matt Husby wrote:
> @brainlid I see this has been sitting for a while. I am planning on doing some data loading from APIs soon, and was wondering if there are plans to integrate this PR or some sort of document in general?
I am not doing anything too fancy, just planning to pull in some Jira tickets and maybe GitHub issues. My main question is: what do you think of using the Document model that is in this PR? I would like to stick to a standard way of doing the document loading, etc. At first glance this seems fine, but I wanted to make sure I wasn't missing something.
I think the Document model was incomplete. The idea was to base it on the TS/Python LangChain Document idea. I'm not using it personally, nor do I have any short-term needs for it. However, I'm open to that approach.