Basic html loader with crawly #22

warnero · 2023-10-20T22:27:33Z

Added Document and DocumentLoader Behaviours
Added Crawly DocumentLoader

* adding readme function overview image
* updated image
* fixing image alt text
* centering image?
* reduced image
* tweak image url for cache busting
* rename image to update
* chat chain logo image
* added logo image
* logo-32px
* don't commit graphic sources
* graphic updates
* fixed spelling
* cleanup the configuration README
* make chatgpt response tests more robust

  Even when given specific instructions like "Return the response 'Hi'." ChatGPT (and LLMs in general) don't always follow the instructions *exactly* (for example, ChatGPT will often respond to the above prompt with "Hi!"). As a result, equality testing on the response makes for flaky tests. This change keeps the test prompts, but instead matches on the responses with `=~`. Still not perfect, but less likely to be flaky, which in tests seems like a win.
* link to demo project
* add "update_custom_context" to LLMChain - added tests
* add support for setting the `OpenAI-Organization` header in requests to the OpenAI API
* set pattern match in `DataExtractionChain` to look for `role: :assistant` as it appears to be the only valid result at this stage
* improved the data extraction prompt - didn't consistently handle 'null' values
* update readme - add example of openai_org_id config
* improve pattern match on data extraction chain
* update version
* updated changelog
* put "Elixir" in the Readme title

---------

Co-authored-by: Mark Ericksen <brainlid@gmail.com>
Co-authored-by: Ben Swift <ben@benswift.me>
Co-authored-by: Adam Mokan <amokan@gmail.com>

warnero · 2023-10-24T22:12:21Z

Hey @brainlid I wanted to split up my work into smaller chunks so I can get it in (and others can play with the blocks/revamp/etc.). How does this one look?

matthusby · 2024-08-24T12:54:09Z

@brainlid I see this has been sitting for a while. I am planning on doing some data loading from APIs soon, and was wondering if there are plans to integrate this PR, or some sort of Document model in general?

brainlid · 2024-08-24T15:44:51Z

I think this effort has stalled out. I’m open to new work in this area. What do you need?

matthusby · 2024-08-24T16:07:10Z

I am not doing anything too fancy, just planning to pull in some Jira tickets and maybe GitHub issues.

My main question is what do you think of using the Document model that is in this PR? I would like to stick to a standard way of doing the document loading, etc. At first glance this seems fine, but I wanted to make sure I wasn't missing something.

brainlid · 2024-08-25T12:34:48Z

> what do you think of using the Document model that is in this PR

I think the Document model was incomplete. The idea was to base it on the TS/Python LangChain Document idea. I'm not using it personally nor do I have any short-term needs for it. However, I'm open to that approach.
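For context on the Document idea mentioned above, here is a minimal sketch in Python. It is not the code from this PR and not the LangChain library itself: the field names `page_content` and `metadata` mirror the Python LangChain Document, while `load_html` and `_TextExtractor` are hypothetical helpers that use only the standard library to show what a basic HTML loader might return.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Any


@dataclass
class Document:
    """Illustrative stand-in: text content plus free-form metadata."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping script/style bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def load_html(html: str, source: str) -> Document:
    """Hypothetical loader: turn an HTML string into a Document."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return Document(page_content=parser.text(), metadata={"source": source})
```

A loader for other sources (for example Jira tickets or GitHub issues, as discussed in this thread) would presumably return the same shape, with source-specific details placed in `metadata`.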

warnero and others added 3 commits October 14, 2023 16:52

Document struct and Crawly Document Loader

7352ac7

Merge branch 'main' into feature/basic_html_loader_with_crawly

a679613
