In short, prompt engineering is the practice of tailoring the input fed into a large language model (LLM) in order to get a specific response back. Prompt engineering can generate more accurate or complete answers, lead to answers that are formatted in a particular way, or change the tone or style of the text.
Prompt engineering is just about tailoring the input text until the expected response occurs, and it should not be treated as an inscrutable skill. However, the correct deployment of varied prompts is what will enable a range of very different applications to be built from a single multi-purpose language model.
Back to Basics
To understand prompt engineering, we should first review a few fundamentals of large language models. We also have a deeper overview of Transformers, GPT, and LLMs in general in a separate blog post.
The most important thing to remember: LLMs do not think and do not reason logically.
Models like GPT are trained to predict the next word in a sequence of words (other models vary slightly). With GPT-3 and GPT-4, these models have become very good at predicting the next word in a way that sounds natural and human. Users provide an input (“prompt”) and an answer comes back word by word (“completion”).
When a user gives a “prompt” to GPT, the model is actually just building on that initial block of text and continuing to extend it word by word. The model does not digest a question and formulate an answer; rather, the input text shifts the probabilities of each successive word in the completion.
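As a toy illustration of this word-by-word extension, the sketch below uses a hand-built, entirely hypothetical next-word table in place of a real model, and greedily extends a prompt one word at a time:

```python
# Hypothetical next-word probabilities, standing in for a real LLM's
# learned distribution over its vocabulary.
NEXT = {
    "I": {"ordered": 0.6, "ate": 0.4},
    "ordered": {"a": 1.0},
    "ate": {"a": 1.0},
    "a": {"burger": 0.7, "salad": 0.3},
    "burger": {"<end>": 1.0},
    "salad": {"<end>": 1.0},
}

def complete(prompt: str, max_words: int = 10) -> str:
    """Extend the prompt one word at a time, the way an LLM builds a completion."""
    words = prompt.split()
    for _ in range(max_words):
        choices = NEXT.get(words[-1], {"<end>": 1.0})
        word = max(choices, key=choices.get)  # greedy: always pick the most likely word
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)

print(complete("I"))  # greedy path through the toy table: "I ordered a burger"
```

A real LLM conditions on the entire prompt rather than just the last word, and samples rather than always taking the maximum, but the mechanic is the same: the prompt is simply the start of the text being extended.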
Based on the context, the model predicts which sides for a burger are most likely to appear next in the text.
So What is Prompting?
Users of ChatGPT already know that they can tune the response of the chat agent simply by choosing to add details to their requests. For instance, we might be able to have the agent respond as Darth Vader, or William Shakespeare.
What happens is simple: providing more text input changes the probabilities of the word that should come next. In fact, compared to past neural networks, one of the key differentiators of transformer models like GPT-3 is their ability to model and encode “long distance” relationships between words. This means the model will not look at just a few preceding words, but “understand” what is relevant to the next word over long context lengths.
Here, the word healthy in a prior sentence will heavily influence the probabilities of what comes next.
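To make that concrete, here is a toy sketch using hand-built, hypothetical counts (standing in for the statistics a real LLM learns from its training corpus) showing how an earlier word like “healthy” reshapes the distribution over what comes next:

```python
from collections import Counter

# Hypothetical counts of side dishes observed after each context phrase.
NEXT_WORD_COUNTS = {
    "sides for a burger": Counter({"fries": 8, "onion rings": 5, "coleslaw": 3}),
    "healthy sides for a burger": Counter({"salad": 9, "fruit": 4, "coleslaw": 2}),
}

def next_word_probs(context: str) -> dict:
    """Normalize raw counts into a probability distribution over the next word."""
    counts = NEXT_WORD_COUNTS[context]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

plain = next_word_probs("sides for a burger")
healthy = next_word_probs("healthy sides for a burger")
# One extra word earlier in the context changes the most likely continuation.
print(max(plain, key=plain.get), max(healthy, key=healthy.get))  # fries salad
```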
But, when boiled down, there’s very little that differs between adding the word “healthy” (which should trigger a change in sides), and changing “burger” to “chow mein” and getting an entirely different set of sides suitable for an American Chinese menu.
In short, every word in the input prompt has an effect on the probability of what comes next.
What does this look like in practice?
We can make any request, but we can also give the model context. This is powerful: it lets us shape the content of the answer and request results in particular formats.
Some have termed this “zero-shot” prompting. Simply by providing a little context (“multiplication” and “8-year-old”), the model is already able to give us an answer that makes sense:
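Mechanically, a zero-shot prompt is just a template with the task context filled in and no worked examples. A minimal sketch (the template wording here is our own, not a fixed API):

```python
def zero_shot_prompt(topic: str, age: int) -> str:
    """Build a zero-shot prompt: task context only, no worked examples."""
    return (
        f"Write a {topic} word problem suitable for a student who is {age} years old.\n"
        "Problem:"
    )

prompt = zero_shot_prompt("multiplication", 8)
print(prompt)
```

In a real application this string would be sent to the model's completion endpoint, and the model fills in everything after “Problem:”.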
However, without further prompting, asking for a 12-year-old’s math problems yields results similar to those for the 8-year-old.
By offering the model a few examples ahead of time, however, we are able to generate results that mimic what we’ve already offered.
This has been termed “few-shot” prompting, as we are giving a few cases of examples to the model for it to learn from.
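Under the hood, a few-shot prompt simply prepends the worked examples to the new input so the model continues the pattern. A sketch (the Q/A layout is one common convention, not the only one):

```python
def few_shot_prompt(examples, new_question: str) -> str:
    """Prepend worked (question, answer) pairs so the model mimics their pattern."""
    lines = []
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {new_question}")
    lines.append("A:")  # the model's completion continues from here
    return "\n".join(lines)

examples = [
    ("What is 3 x 4?", "3 x 4 = 12"),
    ("What is 6 x 7?", "6 x 7 = 42"),
]
print(few_shot_prompt(examples, "What is 8 x 9?"))
```

Because the examples shape the probabilities of what comes next, the completion tends to match their format and difficulty.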
There are infinite ways to tweak prompts, and tailoring the prompt to your own use case is most important. If you know the output you would like to receive and the format that you want it in, trial-and-error is the best way to accomplish the results you want.
Why is this important to us?
Chatbots are a narrow and limited use case for LLMs. For Einblick, we build powerful and user-friendly applications on top of LLMs, but as any software engineer can tell you, blobs of natural-language text (the raw output of LLMs) are not friendly to consume programmatically.
By prompting correctly, we can ensure that even though the input to an LLM may be chaotic natural language of any form, the response is always well-formed and can be consumed predictably by software functions.
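One common tactic, sketched below (the prompt wording and the expected schema are our own illustrative assumptions), is to ask the model for JSON and validate the completion before any downstream code touches it:

```python
import json

def make_prompt(request: str) -> str:
    """Ask the model for machine-readable output instead of free-form prose."""
    return (
        "Summarize the user request below. Respond ONLY with JSON of the form "
        '{"task": "<string>", "columns": ["<string>", ...]}.\n'
        f"Request: {request}"
    )

def parse_completion(completion: str) -> dict:
    """Reject any completion that is not well-formed, so downstream software
    functions never consume unpredictable free text."""
    data = json.loads(completion)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data.get("task"), str) or not isinstance(data.get("columns"), list):
        raise ValueError("completion missing required fields")
    return data

# A well-formed (hypothetical) model response parses cleanly:
result = parse_completion('{"task": "plot sales", "columns": ["month", "revenue"]}')
print(result["task"])
```

If the completion fails validation, the application can retry with an amended prompt rather than passing garbage downstream.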
Moreover, as we ambitiously create an interpreter for all data science problems, some tasks become complex very quickly (and errors in understanding might occur). By intelligently setting up prompts to correctly break down problems, we can then start automatically solving the class of “all data science problems.”
Specifically, we have designed prompts that can ask the LLMs to reason on how to solve the problem step-by-step, as above. Critically, these steps generated by the LLM can be directly linked to the software functions of Einblick. Therefore, each “thought” the LLM has can be met with a response in reality. As the LLM progresses step-by-step through tasks, it can observe success and pick the next step, or it might see failure and pivot.
This is known as the Reason+Act or ReAct framework. We can cue this by asking the model to “think step-by-step,” or by using a “few-shot” approach to provide examples as part of the prompt. For instance, here’s a simple attempt to ask for instructions to bake a cake step-by-step.
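Below is a minimal sketch of such a loop. Everything here is a hypothetical stand-in: `call_llm` is scripted rather than a real model call, and the tool names are invented for illustration; a real system would send the growing transcript to an actual LLM at each step.

```python
# Hypothetical tools the LLM's "actions" can invoke.
TOOLS = {
    "load_data": lambda arg: f"loaded table '{arg}' with 3 columns",
    "plot": lambda arg: f"rendered chart of '{arg}'",
}

def call_llm(transcript: str) -> str:
    """Scripted stand-in for a real LLM call: proposes the next step
    based on what the transcript shows has already happened."""
    if "loaded table" not in transcript:
        return "Action: load_data[sales]"
    if "rendered chart" not in transcript:
        return "Action: plot[sales]"
    return "Finish: chart of sales is ready"

def react_loop(task: str, max_steps: int = 5) -> str:
    """Alternate Reason+Act steps: the model proposes an action, the
    software executes it, and the observation is fed back into the prompt."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = call_llm(transcript)
        if step.startswith("Finish:"):
            return step
        # Parse "Action: tool[argument]" and execute the matching tool.
        name, arg = step.removeprefix("Action: ").rstrip("]").split("[")
        observation = TOOLS[name](arg)
        transcript += f"\n{step}\nObservation: {observation}"
    return "Failed: step limit reached"

print(react_loop("plot the sales table"))
```

The key design point is the feedback loop: each observation lands back in the transcript, so the model can see whether a step succeeded before choosing the next one, or pivot after a failure.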
In Conclusion: It’s Real, But Don’t Buy Any Hype
Large Language Models like GPT-4 have the ability to absorb a diverse range of inputs and produce a diverse range of outputs. Underlying all of prompt engineering is the ability of LLMs to encode the relationship between existing text (the user’s input) and the completion to be generated. Prompt engineering is just the class of everything we can do to help trigger the intended output as frequently as possible.
For us, as a software company, there will be a need for engineers to start writing LLM-based tests and for quality engineering to incorporate prompt testing into their workflows. This does not necessarily mean hiring a “prompt engineer” as a separate role; rather, understanding how to work with LLMs becomes another technical skill for engineers.
Finally, remember that it’s also possible to fine-tune models. For many tailored applications, it might make more sense to provide a few dozen examples to the LLM and create a fine-tuned version. For instance, an industry-specific chatbot might benefit from fine-tuning on a small set of examples rather than carefully engineered prompts that are long, ungainly, and sometimes still insufficient.
Einblick is an AI-native data science platform that provides data teams with an agile workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.