Use the Cheapest LLM Inference API to Build a Multi-agent App
A Quick Tutorial for Building LLM Apps Using OpenRouter
Recently, OpenAI updated the pricing for GPT-3.5-Turbo: input prices for the new model are reduced by 50% to $0.50 per million tokens, and output prices are reduced by 25% to $1.50 per million tokens. These continued price drops help multi-agent app developers and researchers the most, since those applications consume more tokens than other types.
Unfortunately, GPT-3.5 is no longer among the most competitive models in terms of evaluation results and practical usage. Many open-source models and their fine-tuned variants are released every day, each claiming to be a better choice for certain generation tasks. You may have noticed that the average scores of newly uploaded models on Hugging Face’s LLM leaderboard are remarkably high.
For those who are interested in these advanced models but lack the local GPU resources to run them, there are several quick approaches. You can search for the model’s Space or playground for a quick taste of the chatbot. Alternatively, running the inference code in Google Colab gives you free, time-limited access to small models (normally under 7B).
Obviously, these are not good choices when you want to run complicated application tests across a wide range of models. A fast, low-cost, pay-as-you-go service for remote model inference is the natural choice for most developers at the PoC (Proof of Concept) stage.
After searching for and comparing low-cost online inference services beyond OpenAI-like commercial APIs, I have a good recommendation for you: OpenRouter.
OpenRouter
OpenRouter is a very new platform that serves as an aggregator for AI models, offering both an API and a conversational interface accessible via openrouter.ai. This API enables developers to engage with an array of large language models, image generation models, and 3D model generation tools. Furthermore, developers have the opportunity to showcase their applications to the public through the OpenRouter platform.
In its model list, you will find 30+ popular open-source models and several commercial models from providers such as OpenAI, Google (Gemini), and Anthropic (Claude).
Surprisingly, there are a couple of 7B models that are free to use, including Mistral-7B-Instruct, Eagle-7B, and Zephyr-7B. The only limitation is a rate limit of 10 API requests per minute.
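To stay under a limit like 10 requests per minute, one option is to throttle on the client side before each call. Below is a minimal sketch of a sliding-window limiter. The class name MinuteRateLimiter and its parameters are my own illustration, not part of OpenRouter or the OpenAI package, and the server may count requests differently than this does.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Hypothetical client-side throttle: at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls=10, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()    # timestamps of calls inside the current window

    def _prune(self, now):
        # Forget calls that have aged out of the sliding window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until another request is allowed, then record it."""
        now = self.clock()
        self._prune(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._prune(now)
        self.calls.append(now)
```

Call limiter.wait() right before each API request. With the defaults, the first 10 calls go through immediately and the 11th pauses until the oldest one is a minute old.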
Most of the paid models are much cheaper than what you can find elsewhere on the market. For example, Meta’s Llama-2-13B-chat only costs $0.1474/M for input and output tokens, the hot Mixtral-8x7B-Instruct only costs $0.27/M, Yi-34B-Chat costs $0.72/M, and CodeLlama-70B-Instruct costs $0.81/M. Besides the token price, the throughput is also quite acceptable.
The 5 USD credit I purchased seems like it will last for quite a while!
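To get a feel for how far a small credit goes, here is a quick back-of-the-envelope sketch using the per-million-token prices quoted above. It assumes a single flat rate for input and output tokens and that prices are in USD, which is a simplification; the helper names are my own, and actual prices change over time.

```python
# Prices per million tokens as quoted in this post (assumed USD, flat for input and output).
PRICE_PER_M = {
    "Llama-2-13B-chat": 0.1474,
    "Mixtral-8x7B-Instruct": 0.27,
    "Yi-34B-Chat": 0.72,
    "CodeLlama-70B-Instruct": 0.81,
}


def cost_usd(tokens, price_per_m):
    """Cost of processing `tokens` tokens at `price_per_m` dollars per million."""
    return tokens / 1_000_000 * price_per_m


def tokens_for_budget(budget_usd, price_per_m):
    """Whole number of tokens a budget covers at the given price."""
    return int(budget_usd / price_per_m * 1_000_000)


if __name__ == "__main__":
    for name, price in PRICE_PER_M.items():
        millions = tokens_for_budget(5, price) / 1_000_000
        print(f"{name}: about {millions:.1f}M tokens for 5 USD")
```

At these rates, 5 USD covers roughly 18 million tokens on Mixtral-8x7B-Instruct and about 6 million on CodeLlama-70B-Instruct.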
Code Walkthrough
I would like to use the OpenAI-compatible API so that I don’t need to change any of my existing GPT code to run inference through OpenRouter.
Let’s write some simple code to test it.
1. Install the latest OpenAI package
pip install openai
2. Create the client
Make sure you use OpenRouter’s base URL and the API key from your OpenRouter account.
from openai import OpenAI

# Point the OpenAI client at OpenRouter instead of api.openai.com
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-v1-...",
)
3. Create a generation
Let’s try the free version of the Mistral-7B-Instruct model to write a blog post.
completion = client.chat.completions.create(
    model="mistralai/mistral-7b-instruct:free",
    messages=[
        {
            "role": "user",
            "content": "write a blog post in Bohol island including transportation, meals, hotels and activities.",
        },
    ],
)
print(completion.choices[0].message.content)
The exact model identifier, including its path, can be found on the model’s Browse page.
The response quality is quite good for such a small model.
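Since every model sits behind the same endpoint, comparing several of them is mostly a matter of looping over model identifiers. Here is a rough sketch under a few assumptions: the helper names make_client and ask_models are my own, and the OPENROUTER_API_KEY environment variable is just a convention I picked to keep the key out of the source code.

```python
import os


def make_client():
    """Build an OpenAI-compatible client pointed at OpenRouter."""
    # Imported here so ask_models can be used (and tested) without the package installed.
    from openai import OpenAI

    return OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],  # assumed variable name
    )


def ask_models(client, models, prompt):
    """Send the same prompt to each model and collect the replies by model name."""
    replies = {}
    for model in models:
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model] = completion.choices[0].message.content
    return replies
```

For example, ask_models(make_client(), ["mistralai/mistral-7b-instruct:free"], "write a short haiku") returns a dictionary keyed by model identifier. Add more identifiers from the Browse pages to compare outputs side by side, keeping the free-tier rate limit in mind.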
Building such a simple application is not much of a challenge, so let us explore how to use OpenRouter’s API in a multi-agent application.
AutoGen+OpenRouter
In this AutoGen demo, I will continue our last task, generating a blog post, and add a reviewer agent to provide professional review comments.