Pay 80% less to OpenAI with the `prompt compression` technique
As you know, when using a language model, we write a prompt and get a response. However, these prompts are not free. Casual daily usage might cost nothing, but if, like me, you are building a program on top of the API, you pay per token. The bill grows with prompt length, especially with methods like RAG, where long retrieved texts are fed into the prompt.
What if something could shorten the prompt for us? By shortening, I mean removing only the unnecessary parts.
Logic
- Instead of sending bulky prompts to OpenAI and paying more, we can shorten our prompts.
- This technique removes the unnecessary tokens from the prompt.
- It is significantly cost-effective: we send a shorter prompt but get almost the same answer.
- The idea behind the shortening is another, smaller language model.
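To build intuition for how a smaller model does the shortening: Selective Context scores each token by its self-information (how surprising it is under a small language model such as GPT-2) and drops the least informative ones. The toy sketch below imitates that idea using word frequencies within the text itself instead of a real language model; the `compress` function and its frequency-based scoring are my own illustrative stand-ins, not the library's code.

```python
import math
from collections import Counter

def compress(text: str, reduce_ratio: float = 0.5) -> str:
    """Keep the most 'surprising' (rarest) words, drop the rest.

    A real implementation uses a small LM to compute self-information;
    here we approximate it with -log(frequency), purely for illustration.
    """
    words = text.split()
    freq = Counter(words)
    total = len(words)
    # Self-information proxy: rare words carry more information.
    info = {w: -math.log(c / total) for w, c in freq.items()}
    n_keep = max(1, round(len(words) * (1 - reduce_ratio)))
    # Indices of the n_keep most informative words, kept in original order.
    ranked = sorted(range(len(words)), key=lambda i: info[words[i]], reverse=True)
    keep = sorted(ranked[:n_keep])
    return " ".join(words[i] for i in keep)

text = "the cat sat on the mat and the cat slept on the mat"
print(compress(text, reduce_ratio=0.5))
```

Frequent filler words like "the" score lowest and are dropped first, while rare content words survive; that is the same principle, in miniature, that the GPT-2-based scoring applies.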
Library
pip install selective-context
Have a look at the core snippet
from selective_context import SelectiveContext

sc = SelectiveContext(model_type='gpt2', lang='en')
context, reduced_content = sc(context_str)  # default compression
context, reduced_content = sc(context_str, reduce_ratio=0.5)  # drop ~50% of tokens
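A quick back-of-the-envelope calculation shows where the savings come from. The per-token rate below is a hypothetical placeholder (actual OpenAI pricing varies by model); the point is that input cost scales linearly with token count, so compressing the prompt by half roughly halves the input cost.

```python
# Hypothetical rate for illustration only; check current OpenAI pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # dollars

def input_cost(n_tokens: int) -> float:
    """Cost of sending n_tokens as prompt input at the rate above."""
    return n_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

original = input_cost(2000)    # e.g. a 2,000-token RAG prompt
compressed = input_cost(1000)  # the same prompt at reduce_ratio=0.5
print(f"original: ${original:.3f}, compressed: ${compressed:.3f}")
print(f"saved: {1 - compressed / original:.0%}")
```

At scale, this linear relationship is what makes compression worthwhile: a 50% reduction applied to millions of prompt tokens per day cuts the input portion of the bill by the same 50%.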
Notebook
Watch