quick fix of retriever format and component's sequential change
liyin2015 committed Jul 12, 2024
1 parent 2d6a403 commit 143d2ff
Showing 5 changed files with 23 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/documentation.yml
@@ -3,7 +3,7 @@ name: Documentation
 on:
   push:
     branches:
-      - release # Trigger the workflow when changes are pushed to the release branch
+      - li # Trigger the workflow when changes are pushed to the release branch

 permissions:
   contents: write
2 changes: 2 additions & 0 deletions docs/source/tutorials/agent.rst
@@ -504,6 +504,8 @@ The above example will be formatted as:
 **Subclass ReActAgent**

 If you want to customize the agent further, you can subclass the :class:`ReActAgent<components.agent.react.ReActAgent>` and override the methods you want to change.
+
+
 .. .. figure:: /_static/images/query_1.png
 ..    :align: center
 ..    :alt: DataClass
6 changes: 3 additions & 3 deletions docs/source/tutorials/component.rst
@@ -253,7 +253,7 @@ Using a decorator is an even more convenient way to create a component from a function.

 .. code-block:: python

-    .. @fun_to_component
+    @fun_to_component
     def add_one(x):
         return x + 1

@@ -275,7 +275,7 @@ Let's put the ``FunComponent`` and ``DocQA`` together in a sequence:

 .. code-block:: python

-    from lightrag.core.component import Sequential
+    from lightrag.core.container import Sequential

     @fun_to_component
     def enhance_query(query:str) -> str:

@@ -318,7 +318,7 @@ The structure of the sequence using ``print(seq)``:

 - :class:`core.component.Component`
 - :class:`core.component.FunComponent`
-- :class:`core.component.Sequential`
+- :class:`core.container.Sequential`
 - :func:`core.component.fun_to_component`
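For readers following along without the library installed, the two pieces this diff touches — the ``fun_to_component`` decorator and the ``Sequential`` container whose import moved from ``core.component`` to ``core.container`` — can be sketched in a few lines of plain Python. These are toy stand-ins for illustration, not lightrag's implementations:

```python
class FunComponent:
    """Wraps a plain function so it can sit in a component pipeline."""

    def __init__(self, fun):
        self.fun = fun

    def __call__(self, *args, **kwargs):
        return self.fun(*args, **kwargs)


def fun_to_component(fun):
    """Decorator turning a plain function into a FunComponent."""
    return FunComponent(fun)


class Sequential:
    """Calls components in order, feeding each output into the next."""

    def __init__(self, *components):
        self.components = components

    def __call__(self, x):
        for component in self.components:
            x = component(x)
        return x


@fun_to_component
def add_one(x):
    return x + 1


seq = Sequential(add_one, add_one)
print(seq(1))  # 3
```

The chaining behavior is the point of the rename: ``Sequential`` is a container of components, not a component utility, which is presumably why it now lives in ``core.container``.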
5 changes: 2 additions & 3 deletions docs/source/tutorials/index.rst
@@ -59,7 +59,7 @@ Additionally, what shines in LightRAG is that all orchestrator components, like
 You can easily make each component work with different models from different providers by switching out the `ModelClient` and its `model_kwargs`.


-We will introduce the libraries starting from the core base classes, then move to the RAG essentials, and finally to the agent essentials.
+We will introduce the library starting from the core base classes, then move to the RAG essentials, and finally to the agent essentials.
 With these building blocks, we will further introduce optimizing, where the optimizer uses building blocks such as Generator for auto-prompting and retriever for dynamic few-shot in-context learning (ICL).

 Building
@@ -126,8 +126,7 @@ Code path: :ref:`lightrag.core<apis-core>`. For abstract classes:
    * - :doc:`embedder`
      - The component that orchestrates model client (Embedding models in particular) and output processors.
    * - :doc:`retriever`
-     - The base class for all retrievers who in particular retrieve relevant documents from a given database to add **context** to the generator.
-
+     - The base class for all retrievers, which in particular retrieve relevant documents from a given database to add *context* to the generator.

 Data Pipeline and Storage
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23 changes: 15 additions & 8 deletions docs/source/tutorials/retriever.rst
@@ -83,7 +83,7 @@ LightRAG library does not prioritize the coverage of integration for the following

 Instead, our design goals are:

-1. Representative and valable coverage:
+1. Cover representative and valuable retriever methods:

    a. High-precision retrieval methods and enabling them to work locally and in-memory so that researchers and developers can build and test more efficiently.
    b. Showcase how to work with cloud databases for large-scale data, utilizing their built-in search and filter methods.
@@ -254,7 +254,7 @@ In this note, we will use the following documents and queries for demonstration:
 The first query should retrieve the first and the last document, and the second query should retrieve the second and the third document.

 FAISSRetriever
-^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 First, let's do semantic search; here we will use the in-memory :class:`FAISSRetriever<components.retriever.faiss_retriever.FAISSRetriever>`.
 The FAISS retriever takes embeddings, which can be ``List[float]`` or ``np.ndarray``, and builds an index using the FAISS library.
 The query can take both embedding and ``str`` formats.
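Conceptually, what the FAISS retriever does can be sketched without any library: score documents by cosine similarity against the query embedding, rank them, and map scores into ``[0, 1]``. The 2-d embeddings below are invented for illustration (a real setup would get them from an Embedder model), and FAISS exists to do this approximately and at scale:

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_emb, doc_embs, top_k=2):
    """Rank documents by cosine similarity, mapped into [0, 1]."""
    # Shift cosine from [-1, 1] into a simulated probability in [0, 1].
    scores = [(i, (cosine(query_emb, d) + 1) / 2) for i, d in enumerate(doc_embs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


doc_embs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # toy document embeddings
results = retrieve([1.0, 0.1], doc_embs, top_k=2)
print(results)  # indices 0 and 1 score highest for this toy query
```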
@@ -334,7 +334,7 @@ By default, the score is a simulated probability in range ``[0, 1]`` using cosine similarity.
 You can check the retriever for more types of scores.

 BM25Retriever
-^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 So the semantic search works pretty well. We will see how :class:`BM25Retriever<components.retriever.bm25_retriever.BM25Retriever>` works in comparison.
 We reimplemented the code in [9]_ with one improvement: instead of using ``text.split(" ")``, we use a tokenizer to split the text. Here is a comparison of how they differ:

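A minimal illustration of why the tokenizer matters for BM25: splitting on spaces keeps punctuation glued to words, so a term like ``panels?`` never matches ``panels`` in the index. The regex tokenizer below is a simple stand-in, not the library's tokenizer:

```python
import re

query = "solar panels?"  # the second demo query from this tutorial

# Naive split keeps trailing punctuation attached to the token.
naive_tokens = query.split(" ")

# A word-level regex tokenizer separates words from punctuation.
regex_tokens = re.findall(r"\w+", query.lower())

print(naive_tokens)  # ['solar', 'panels?']
print(regex_tokens)  # ['solar', 'panels']
```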
@@ -408,7 +408,8 @@ This time the retrieval gives us the right answer.

 [RetrieverOutput(doc_indices=[2, 1], doc_scores=[0.5343238380789569, 0.4568096570283078], query='solar panels?', documents=None)]

 Reranker as Retriever
-^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Semantic search works well, and rerankers, based mostly on `cross-encoder` models, are supposed to work even better.
 We have integrated two rerankers: ``BAAI/bge-reranker-base`` [10]_ hosted on ``transformers`` and the rerankers provided by ``Cohere`` [11]_.
 These models follow the ``ModelClient`` protocol and are directly accessible as retrievers from :class:`RerankerRetriever<components.retriever.reranker_retriever.RerankerRetriever>`.
@@ -518,7 +519,8 @@ Also, if we use both the `title` and `content`, it will also get the right response.

 LLM as Retriever
-^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 There are different ways to use an LLM as a retriever:
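One common recipe, sketched here without the library: show the model the numbered documents, ask it to reply with the indices of the relevant ones, and parse that reply into ``doc_indices``. The LLM call is stubbed with a canned string; the defensive parsing is the part worth keeping:

```python
import json


def parse_indices(llm_reply: str) -> list[int]:
    """Parse an LLM reply expected to be a JSON list of document indices."""
    try:
        indices = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []  # fall back to no results on malformed model output
    return [i for i in indices if isinstance(i, int)]


# What a model might return for "How do solar panels impact the environment?"
canned_reply = "[1, 2]"
print(parse_indices(canned_reply))  # [1, 2]
print(parse_indices("sorry, I cannot answer"))  # []
```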
@@ -598,12 +600,16 @@ The response is:

 [RetrieverOutput(doc_indices=[1, 2], doc_scores=None, query='How do solar panels impact the environment?', documents=None)]

 PostgresRetriever
-^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Coming soon.

 Use Score Threshold instead of top_k
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
 In some cases, when the retriever has a computed score, you might prefer to use the score instead of ``top_k`` to filter the relevant documents.
 To do so, you can simply set ``top_k`` to the full size of the documents and use a post-processing step or a component (to chain with the retriever) to filter out documents with scores below the threshold.
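That post-processing step can be sketched as follows; ``RetrieverOutput`` here is a minimal stand-in for the library's output type, and the scores echo the BM25 example above:

```python
from dataclasses import dataclass


@dataclass
class RetrieverOutput:
    """Minimal stand-in for the library's retriever output."""
    doc_indices: list
    doc_scores: list


def filter_by_threshold(output: RetrieverOutput, threshold: float) -> RetrieverOutput:
    """Keep only documents whose score clears the threshold."""
    kept = [(i, s) for i, s in zip(output.doc_indices, output.doc_scores) if s >= threshold]
    return RetrieverOutput(
        doc_indices=[i for i, _ in kept],
        doc_scores=[s for _, s in kept],
    )


# Retrieve with top_k == corpus size, then threshold instead of truncating.
out = RetrieverOutput(doc_indices=[2, 1, 0, 3], doc_scores=[0.53, 0.46, 0.12, 0.05])
filtered = filter_by_threshold(out, threshold=0.4)
print(filtered.doc_indices)  # [2, 1]
```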

@@ -613,7 +619,8 @@ Use together with Database

 When the scale of data is large, we will use a database to store the computed embeddings and indexes from the documents.

 With LocalDB
-^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 We have previously computed embeddings; now let us use :class:`LocalDB<core.db.LocalDB>` to help with the persistence.
 (Although you can totally persist them yourself, such as by using pickle.)
 Additionally, ``LocalDB`` helps us keep track of our initial documents and their transformed documents.
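The parenthetical about pickle can be made concrete. This is plain-stdlib persistence of computed embeddings, without the tracking of initial and transformed documents that ``LocalDB`` adds on top:

```python
import pickle
import tempfile
from pathlib import Path

# Toy embeddings standing in for previously computed ones.
embeddings = {"doc_0": [0.1, 0.9], "doc_1": [0.7, 0.3]}

path = Path(tempfile.mkdtemp()) / "embeddings.pkl"

# Persist to disk...
with open(path, "wb") as f:
    pickle.dump(embeddings, f)

# ...and restore in a later session.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == embeddings)  # True
```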
