What is summarization / is it working? #2183
-
We are using an adaptation of the "ConversationSummaryBufferMemory" strategy to summarize messages. To learn more, see this article: https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/. To summarize (lol), summarization is triggered when the following conditions are met:
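The general shape of a summary-buffer strategy can be sketched as follows. This is a minimal illustration, not LibreChat's actual implementation: the names (`Message`, `maybe_summarize`) are hypothetical, the word-count tokenizer is a crude stand-in for a real one, and the real strategy condenses dropped messages with an LLM call rather than concatenating them.

```python
# Hedged sketch of a ConversationSummaryBufferMemory-style trigger.
# All names here are illustrative, not LibreChat's real code.

from dataclasses import dataclass

@dataclass
class Message:
    role: str
    text: str

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (e.g. tiktoken): ~1 token per word.
    return len(text.split())

def maybe_summarize(messages, summary, max_tokens):
    """If the running token count exceeds max_tokens, fold the oldest
    messages into the summary and keep only the most recent ones."""
    total = count_tokens(summary) + sum(count_tokens(m.text) for m in messages)
    if total <= max_tokens:
        return summary, messages  # under budget: nothing to summarize

    kept = list(messages)
    dropped = []
    # Drop oldest messages into the summary until back under budget.
    while kept and total > max_tokens:
        oldest = kept.pop(0)
        dropped.append(oldest)
        total -= count_tokens(oldest.text)

    # The real strategy would have an LLM condense `summary` + `dropped`;
    # plain concatenation stands in for that call here.
    new_summary = " ".join([summary] + [m.text for m in dropped]).strip()
    return new_summary, kept
```

The key point for this discussion is the `max_tokens` threshold: it is what scales with the model's context window, which is why large-context models delay the trigger so long.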
This worked well in the age of 4-8k-context models, when it was first implemented, operating within the "efficient" realm described in the article. However, it needs to be revisited now that we are in the age of ever-increasing context windows (gpt-4-turbo at 128k, Anthropic at 200k+): a conversation now has to reach roughly 60-100k tokens before summarization kicks in. While this still trims the cost of using the full context, it's sub-optimal. I would also like to add an option in the config file letting the user decide what the summary context window should be, first at the endpoint level and then even at the model level.
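As a sketch of what that proposed option could look like in the config file, something along these lines. To be clear, these keys are hypothetical and are not existing librechat.yaml options:

```yaml
# Hypothetical sketch only — these keys do not exist yet.
endpoints:
  azureOpenAI:
    summarize: true
    summaryTokenThreshold: 8000    # endpoint-level: trigger well below the full window
    models:
      gpt-4-turbo:
        summaryTokenThreshold: 16000   # optional per-model override
```

An endpoint-level default with a per-model override would cover both the "cheap endpoint, summarize early" and "long-context model, summarize late" cases.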
-
I am using the Azure OpenAI models and have enabled summarization at the endpoint level. I am not sure, however, where exactly something is being summarized now that I have enabled it. At first I thought this referred to the automatic generation of chat titles, but those are all called "New Chat" for me. Where can I find the summarization feature, and how can I tell whether it works?