Pay 80% less to OpenAI with the `prompt compression` technique
As you know, when using a language model, we write a prompt and get a response. However, these prompts are not free. Casual daily usage might cost nothing, but if, like me, you are building a program on top of the API, you pay per token. The bill grows with prompt length, especially with methods like RAG, where long retrieved texts are fed into the prompt.
What if something could shorten the prompt for us? By shortening, I mean removing only the unnecessary parts.
Logic
- Instead of sending bulky prompts to OpenAI and paying more, we can shorten our prompts.
- This technique removes the unnecessary tokens from the prompt.
- It is significantly cost-effective: we send a shorter prompt but get almost the same answer.
- The idea behind the shortening is another, smaller language model.
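To build intuition for how a smaller model does the shortening: Selective Context scores each token by its self-information (how surprising it is under a small language model such as GPT-2) and drops the least informative ones. The toy sketch below imitates that idea using word frequencies within the text itself instead of a real language model; the `compress` function and its frequency-based scoring are my own illustrative stand-ins, not the library's code.

```python
import math
from collections import Counter

def compress(text: str, reduce_ratio: float = 0.5) -> str:
    """Keep the most 'surprising' (rarest) words, drop the rest.

    A real implementation uses a small LM to compute self-information;
    here we approximate it with -log(frequency), purely for illustration.
    """
    words = text.split()
    freq = Counter(words)
    total = len(words)
    # Self-information proxy: rare words carry more information.
    info = {w: -math.log(c / total) for w, c in freq.items()}
    n_keep = max(1, round(len(words) * (1 - reduce_ratio)))
    # Indices of the n_keep most informative words, kept in original order.
    ranked = sorted(range(len(words)), key=lambda i: info[words[i]], reverse=True)
    keep = sorted(ranked[:n_keep])
    return " ".join(words[i] for i in keep)

text = "the cat sat on the mat and the cat slept on the mat"
print(compress(text, reduce_ratio=0.5))
```

Frequent filler words like "the" score lowest and are dropped first, while rare content words survive; that is the same principle, in miniature, that the GPT-2-based scoring applies.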
Library
pip install selective-context
Have a look at the core snippet
from selective_context import SelectiveContext

sc = SelectiveContext(model_type='gpt2', lang='en')
context, reduced_content = sc(context_str)  # default compression
context, reduced_content = sc(context_str, reduce_ratio=0.5)  # drop ~50% of tokens
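A quick back-of-the-envelope calculation shows where the savings come from. The per-token rate below is a hypothetical placeholder (actual OpenAI pricing varies by model); the point is that input cost scales linearly with token count, so compressing the prompt by half roughly halves the input cost.

```python
# Hypothetical rate for illustration only; check current OpenAI pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # dollars

def input_cost(n_tokens: int) -> float:
    """Cost of sending n_tokens as prompt input at the rate above."""
    return n_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

original = input_cost(2000)    # e.g. a 2,000-token RAG prompt
compressed = input_cost(1000)  # the same prompt at reduce_ratio=0.5
print(f"original: ${original:.3f}, compressed: ${compressed:.3f}")
print(f"saved: {1 - compressed / original:.0%}")
```

At scale, this linear relationship is what makes compression worthwhile: a 50% reduction applied to millions of prompt tokens per day cuts the input portion of the bill by the same 50%.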
Notebook
Watch