Use Open Source LLMs in AutoGen without GPU/CPU Resources
A Guide to Call Inference of Open Source LLMs on FireworksAI
Building on my previous exploration of integrating open-source Large Language Models with the AutoGen framework, I now turn my attention to an exciting new development: Fireworks AI’s inference API. This platform streamlines your access to open-source LLMs, providing a cost-effective solution that requires no local compute resources for your LLM applications. This guide will quickly equip you with the knowledge to tap into an inference API that promises to enhance your AutoGen projects while keeping resource consumption and usage expenses to a minimum.
Join me as we take a closer look at this implementation, which is quite simple.
AutoGen Framework
If you are not acquainted with my earlier tutorials on the AutoGen framework, allow me to give you a brief overview. Developed by Microsoft, AutoGen is an LLM development framework designed to facilitate automated conversations among multiple AI agents, enabling them to handle complicated tasks. It has been praised for its customizable LLM-powered agents and seamless integration with human input.
Two features of AutoGen stand out to me. First, it supports multiple conversation patterns crafted for a variety of scenarios, including group, collaborative, and 1-on-1 chats. In addition, AutoGen offers a large number of application templates in its code repository, which users can customize and embed into their own projects. The second feature is its integration with various prompting tools such as the OpenAI Assistant, RAG, and function calls. This integration equips the LLM-based agents with a larger knowledge base and more tools, thereby expanding their generation capabilities. A minimal two-agent conversation is sketched below.
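To make the agent model concrete, here is a minimal sketch of a basic two-agent chat with pyautogen; the model name and API key are placeholders for illustration, not part of the original tutorial.

import autogen

# Configure the LLM backend (placeholder model and key)
config_list = [{"model": "gpt-3.5-turbo", "api_key": "<OPENAI_API_KEY>"}]

# An assistant agent powered by the LLM
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)

# A user proxy that drives the conversation without human input
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# Kick off an automated conversation between the two agents
user_proxy.initiate_chat(assistant, message="What is 2**10?")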
In my last tutorial, I introduced how to use locally deployed open-source models to power text generation for these assistant agents. It’s a good way to reduce the token cost of running GPT models through long conversations among the agents. However, you must have a decent GPU with enough computational resources to host the models, or subscribe to Colab Pro to run models at or below the 7B parameter scale.
This is where Fireworks AI comes in: it meets the needs of developers who want a well-performing open-source LLM but don’t want to pay much for tokens.
Fireworks AI
Fireworks AI builds its service on a generative AI API platform, an innovative interface for inference of open-source LLMs with a focus on speed, affordability, and customization. It offers efficient LLM customization via PEFT and foundation models like LLaMA-2, bolstered by the team’s PyTorch expertise for improved throughput and latency, ultimately leading to substantial cost reductions. Compared to the cheapest OpenAI model, gpt-3.5-turbo-1106, which costs $1/1M input tokens and $2/1M output tokens, choosing a <16B model from Fireworks will save a lot if (some of) your AutoGen agents do not need strong generative capabilities. And if you choose competitive models that beat GPT-3.5, or even rival GPT-4 for certain types of tasks or languages, you will still not pay more than you would for GPT-3.5.
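A quick back-of-the-envelope calculation illustrates the point; the Fireworks per-token rate below is an assumption for illustration only, so check their current pricing page for real numbers.

# Illustrative cost comparison; the Fireworks rate is an assumed example
input_tokens = 10_000_000   # 10M input tokens
output_tokens = 2_000_000   # 2M output tokens

# gpt-3.5-turbo-1106: $1 per 1M input, $2 per 1M output (as quoted above)
gpt35_cost = input_tokens / 1e6 * 1.0 + output_tokens / 1e6 * 2.0

# Assumed flat rate of $0.20 per 1M tokens for a <16B Fireworks model
fireworks_cost = (input_tokens + output_tokens) / 1e6 * 0.20

print(f"GPT-3.5: ${gpt35_cost:.2f} vs. Fireworks <16B: ${fireworks_cost:.2f}")
# GPT-3.5: $14.00 vs. Fireworks <16B: $2.40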
With a developer tier that includes 1 dollar of free credit, OpenAI-compatible APIs, and tools for seamless integration, including partnerships like LangChain, the platform encourages community collaboration through a shared model repository and open-source fine-tuning cookbooks.
You will find their supported official and custom models here. After signing in with Google authentication, you can click the name of each model to play with it in the playground for free.
AutoGen + FireworksAI
Now let’s see how easy it is to integrate the Fireworks AI API into an AutoGen project.
The key enabler of this integration is that Fireworks AI’s APIs are compatible with OpenAI’s, so we don’t have to build a dedicated handler to harmonize the interface and prompt templates.
OpenAI Compatible API
If you are familiar with the OpenAI API, the code below will be easy to understand. It demonstrates how to use the API to generate text from a given prompt:
import openai

# Point the OpenAI client at the Fireworks AI inference endpoint
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Request a completion from an open-source model hosted on Fireworks
response = client.completions.create(
    model="accounts/fireworks/models/qwen-72b-chat",
    prompt="Say this is a test",
)
print(response.choices[0].text)
If you use Fireworks’ compatible API, you don’t have to install any special packages; the latest openai package is enough. The enabler is the base_url parameter of the OpenAI client object, which indicates the target API endpoint: simply change it to the inference endpoint https://api.fireworks.ai/inference/v1 and supply your Fireworks API key.
Go to your Fireworks AI user account page, create a new API key, and copy it into the api_key parameter.
Provide a model path in the model parameter. If you do not know the exact path name of a model, you can find it in the playground, where the exact path is listed in the block of example code.
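The compatible endpoint also exposes the chat-completions interface, which suits chat-tuned models like the one above. Here is a minimal sketch reusing the same client:

# Chat-style call through the same OpenAI-compatible endpoint
response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen-72b-chat",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
print(response.choices[0].message.content)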
That’s all it takes to use the compatible API.
Integrating with AutoGen
Let me again use the MathChat example to demonstrate a simple AutoGen application using models from Fireworks AI.
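Before the full walkthrough, here is a minimal sketch of the wiring: an AutoGen config list that routes the assistant’s generation to a Fireworks model through the compatible endpoint. The model choice and config keys follow the pattern shown above and recent pyautogen conventions, so treat this as an assumed setup rather than the exact MathChat configuration.

import autogen

# Route the agent's LLM calls to Fireworks AI (assumed model path)
config_list = [
    {
        "model": "accounts/fireworks/models/qwen-72b-chat",
        "base_url": "https://api.fireworks.ai/inference/v1",
        "api_key": "<FIREWORKS_API_KEY>",
    }
]

# The assistant generates reasoning and code with the Fireworks-hosted model
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)

# The user proxy executes generated code locally and relays the results
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "math", "use_docker": False},
)

user_proxy.initiate_chat(
    assistant,
    message="Solve the equation 2x + 3 = 11 and show your steps.",
)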