Language programs #11
-
This was a delightful read. Not sure which format you suggest for adding questions here (I'm used to gdocs): adding a new comment for each chunk of questions as they come to mind?
-
Yes, this works; you can use the quote-reply feature for context.
-
Left general comments. One particular area I'd like to understand better is the interfaces that get closer to "prompt engineering" or templating for a particular LM call: I'm thinking of few-shot learning, instruction prompting, chain of thought. This interface is indeed "higher level", and so may be outside the scope of what you wanted to align on first!
So it seems like the `lm.Normal()` object is the next-token RV, which can be sampled?
So, the generator then fixes the sampling procedure to generate sequences? Hence, for OpenAI we might have params like `temperature`, `top_p`, `n`, `best_of`, `presence_penalty`, `frequency_penalty`, `logit_bias` in here. Makes perfect sense that we can view this as a program transformation, perhaps as a PDF over character-sequence space.
nit: noticed the same comment twice
Embeddings-based memory might be one to flesh out for this composition flow. Powerful. Others in SK, langchain, and our good friend the cookbook may be exercises for the reader.
Another section we could think about is the corresponding interface for auto-optimization of programs, particularly in the context of prompt engineering. I caught this recently.
Powerful.
Yeah I think this direction is pretty interesting. It should pair nicely with UQ methods that attempt to quantify a decoupled epistemic uncertainty.
Would be super curious how this would look for one of the visual-chatGPT or ViperGPT examples.
-
#1 defines what would be the equivalent of
In my current thinking the RV represents the sequence (i.e. everything before an EOS token).
If you're thinking about storing and retrieving from vector stores, this could be a simple Op, as discussed in #6.
Yes, this is the "just adding stuff" part of the development, where adding new features is a breeze. These could even be defined in external libraries by users should they need to.
Agreed. I'd really like to see what these guys do; my guess is just pattern-matching. It can be implemented as a program rewrite if that's the case.
This is exactly what David Dohan was after. He was kind of stuck because he wasn't seeing decoding methods as program transformations yet.
You could try to reproduce their results by implementing the operators they have in their toy library, sending the definition of the API to GPT as the beginning of the prompt, appending the question, and executing the returned code.
-
This all makes sense to me, and I find it very compelling, especially around program transformations. Where would you think to start: an end-to-end example? Perhaps the buildt use case, or something simpler from one of the aforementioned chaining libs?
-
An example similar to this one (without constraints to begin with) that plugs in a diffuser model for the hype:

```python
import txt

llm = txt.lm.Normal()

def meta_prompt(question):
    prompt = txt.prompt("""{{ question }}
I believe the best person to answer this question is {{ expert }}.
Indeed, {{ expert }} addressed this question: {{ answer }}""")
    expert = llm
    answer = llm
    return prompt(question=question, expert=expert, answer=answer)

out = txt.lm.beam_search(meta_prompt("What is the Earth's diameter?"))
out.eval()
```
-
## Language models are distributions over sequences
A language model is a distribution over sequences of tokens. Sampling from a language model returns sequences of tokens that follow the model's distribution. The output of a pre-trained language model parametrized by a prompt $P$ is a random variable:
What would this look like in code? In the following, $s_{rv}$ represents a random variable:
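A minimal sketch, using the hypothetical `txt` library from the example above (the exact call signature is an assumption):

```python
import txt

llm = txt.lm.Normal()  # a pre-trained LM, i.e. a distribution over sequences

# `s_rv` is a random variable, not a string; evaluating it draws a
# sequence that follows the model's distribution given the prompt P.
s_rv = llm("What is the Earth's diameter? ")
s_rv.eval()
```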
## Constrained language models
We can further constrain the output of the LM, in which case we are defining a new distribution.
Say we want the sequences to stop after a set of tokens has been found, or to start with a given set of tokens. The constraints apply to the LM distribution itself:
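A sketch using the `starts_with` and `stops_at` names introduced below; whether these constraints attach to the model or to the random variable is left open here:

```python
llm = txt.lm.Normal()

# Constraining the LM defines a new distribution over sequences.
constrained = llm.starts_with(["The"]).stops_at(["\n", "."])
s_rv = constrained("What is the Earth's diameter? ")
```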
We can expand these constraints to add more complex validation methods, for example for code-generation tasks (see this, this and this paper for instance). The LQML paper suggests an efficient way to apply these constraints.
An interesting case is when we limit the output of the LM to a finite number of tokens. In this case we define a new random variable we can truly sample from. The syntax is not yet clear in my mind, but I feel we should distinguish this case from the `starts_with` and `stops_at` constraints above:
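One possible syntax, reusing the `choose_between` construct that appears later in this document (an assumption, not a settled API):

```python
llm = txt.lm.Normal()

# Restricting the output to a finite set of sequences yields a random
# variable with finite support that we can truly sample from.
answer = llm("Is the Earth an oblate spheroid? ").choose_between(["Yes", "No"])
answer.eval()
```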
## Language generators

A language generator is a function that returns a token sequence given an input token sequence. It may be deterministic or stochastic, and may or may not be parametrized. The combination of an LLM with a decoding method (argmax, sample, beam search, nucleus sampling, etc.) is a language generator. Decoders can be seen as program transformations, the same way `joint_logprob` is in AePPL: they produce an execution graph that returns a string. Self-consistency is defined in this paper.
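A sketch of decoders as program transformations; `txt.lm.sample` is an assumed name, while `argmax` and `beam_search` are the operators discussed below:

```python
llm = txt.lm.Normal()
s_rv = llm("What is the Earth's diameter? ")

# Each decoding method transforms the graph into a generator whose
# evaluation returns a string.
txt.lm.argmax(s_rv).eval()       # deterministic generator
txt.lm.sample(s_rv).eval()       # stochastic generator
txt.lm.beam_search(s_rv).eval()  # parametrized, e.g. by the number of beams
```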
## Language programs
Language programs are Directed Acyclic Graphs that link different LM-distributed random variables together. They are typically applied recursively to an initial prompt that is augmented with the RVs:
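A minimal sketch, in the spirit of the `meta_prompt` example in the comments above:

```python
import txt

llm = txt.lm.Normal()

prompt = txt.prompt("""{{ question }}
I believe the best person to answer this question is {{ expert }}.
Indeed, {{ expert }} addressed this question: {{ answer }}""")

# Two LM-distributed random variables linked through the prompt template;
# the filled-in prompt is the root of the DAG.
program = prompt(
    question="What is the Earth's diameter?",
    expert=llm,
    answer=llm,
)
```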
In theory, executing this graph with e.g. `prompt.eval()` should return random strings (maybe with `ancestral_sampling`?). In practice, we often want to get an optimal-ish output. In this case we can transform the graph using the previously-defined operators. Different operators behave in different ways. For instance, `argmax` greedily decodes the graph, so the two programs in the sketch below are equivalent:
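A sketch, reusing `prompt` and `llm` from above:

```python
question = "What is the Earth's diameter?"

# Greedy decoding applied to the whole graph...
out = txt.lm.argmax(prompt(question=question, expert=llm, answer=llm))

# ...is equivalent to greedily decoding each LM variable separately:
out = prompt(question=question, expert=txt.lm.argmax(llm), answer=txt.lm.argmax(llm))
```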
Other program transformations, like `beam_search`, yield different results when they're applied to a whole graph or to individual LM RVs. When applied to a graph with multiple LM calls, the beams used to decode a variable are continued when decoding the next variable, thus trying to find the most likely sequence for the program as a whole (called *scripted beam search* in the LQML paper). When applied to the LM calls individually, the beams are re-initialized after each decoding:
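Again with the names from the sketches above:

```python
# Scripted beam search: the beams that decode `expert` are continued when
# decoding `answer`, maximizing the likelihood of the program as a whole.
out = txt.lm.beam_search(prompt(question=question, expert=llm, answer=llm))

# Per-variable beam search: beams are re-initialized after each decoding.
out = prompt(question=question, expert=txt.lm.beam_search(llm), answer=txt.lm.beam_search(llm))
```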
## Other random variables

Other random variables can be part of a language program. They are not affected by generators, in the sense that an `.eval()` call on the output will consist in first drawing from the random variables' distributions and then decoding. An example of such a random variable:
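A sketch; `txt.choice`, a uniform choice over a finite set, is a hypothetical constructor:

```python
import txt

llm = txt.lm.Normal()

# A random variable over candidate few-shot examples, independent of the LM.
example = txt.choice([
    "Q: What is the boiling point of water? A: 100°C",
    "Q: What is the speed of light? A: 299,792 km/s",
])

prompt = txt.prompt("""{{ example }}
Q: {{ question }} A: {{ answer }}""")

program = prompt(example=example, question="What is the Earth's diameter?", answer=llm)

# `.eval()` first draws `example` from its distribution, then decodes.
txt.lm.beam_search(program).eval()
```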
This also applies to `llm(prompt).choose_between(["The", "A", "All"])` types of random variables. Such variables can be used in a context where we want to infer the best few-shot prompts for a given task, for instance.

## Infer the posterior distribution of values
In a program where we do not apply a generating transformation (such as `beam_search`) to graphs containing LM-distributed random variables like `a_rv = model(prompt)`, it is not clear how to perform efficient inference, because defining good proposal distributions in this space is non-trivial afaik. It nevertheless remains an open possibility with this implementation.

It is however possible to perform simulation-based inference when using one of the generators, thus treating language programs as simulators. We can use humans in the loop to validate the samples, or apparently even use LMs as discriminators.
## Use tools (like web search)
Tools are operators that take a sequence as an input and return a sequence (or a list of sequences). They can thus easily be added to the graph:
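A sketch with a hypothetical `txt.ops.web_search` operator:

```python
import txt

llm = txt.lm.Normal()

# A tool is an operator: sequence in, sequence (or list of sequences) out.
context = txt.ops.web_search("Earth diameter")

prompt = txt.prompt("""Context: {{ context }}
Q: {{ question }} A: {{ answer }}""")

program = prompt(context=context, question="What is the Earth's diameter?", answer=llm)
txt.lm.argmax(program).eval()
```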
Here is a survey on augmented language models. We could use web search, API calls, code execution, etc.
We can even add humans in the loop, with for instance a `human_input` operator.

## Multi-modality
Multi-modality is achieved by defining an `ImageVariable` type, and defining operators that act on/return these types. For instance with a `stable_diffusion` operator:
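A sketch; `txt.ops.stable_diffusion` and the `ImageVariable` semantics are assumptions:

```python
import txt

llm = txt.lm.Normal()

# The LM generates a caption...
caption = llm("Describe a photograph of the Earth seen from the Moon: ")

# ...and `stable_diffusion` maps the sequence to an `ImageVariable`.
image = txt.ops.stable_diffusion(caption)

# Evaluating first decodes the caption, then generates the image.
image.eval()
```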