Chat Completion Azure OpenAI Development and Deployment Best Practices


Developing and deploying a chat completion model on Azure OpenAI requires careful planning and execution. To get started, ensure you have a Microsoft Azure account and the OpenAI service enabled.

Choose the right deployment type and capacity for your model, considering factors like throughput (tokens per minute), regional availability, and pricing. Higher capacity supports heavier workloads but increases costs.

Select an appropriate base model and, where needed, fine-tune it for your specific use case. This involves adjusting hyperparameters and training the model on your own dataset.

Setup

To set up Azure OpenAI, you'll first need to create an Azure OpenAI resource in your Azure subscription. This can be done by signing into the Azure portal at https://portal.azure.com.

You'll need to create the resource with specific settings, including choosing a region that has default quota for the model types you plan to use, to reduce the risk of hitting quota limits.

After creating the resource, wait for deployment to complete, then go to the deployed Azure OpenAI resource in the Azure portal.


Once you have your Azure OpenAI resource set up, you can configure LibreChat's librechat.yaml file with your Azure OpenAI settings.

Here's a step-by-step guide to configuring the librechat.yaml file:

  1. Open librechat.yaml for editing using your preferred text editor or IDE.
  2. Configure Azure OpenAI settings by specifying API keys, instance names, model groups, and other essential configurations.
  3. Remove any legacy settings, as the LibreChat server will detect these and remind you.
  4. Save your changes to the librechat.yaml file.
  5. Restart your LibreChat application for the changes to take effect.

Model Deployment and Configuration

To deploy a model, you'll start by using the Azure AI Foundry portal, which you can access from the Azure portal. In the Azure AI Foundry portal, select the Deployments page and view your existing model deployments. If you don't already have one, create a new deployment of the gpt-35-turbo-16k model with a rate limit of 5,000 tokens per minute.

Once the Azure OpenAI service is ready, you can also deploy the gpt-35-turbo model, try it out in the chat playground, and use a system message template to act as an Xbox customer support agent.

The models available to your users are determined by the model groupings specified in your azureOpenAI endpoint config. For example, a configuration that enables gpt-4-vision-preview, gpt-3.5-turbo, and gpt-4-turbo for your users, in the order they were defined, might look like the following sketch.
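(This is a sketch only: the structure follows LibreChat's Custom Config Docs, and the group name, key variable, instance name, and deployment names are placeholders to replace with your own.)

```yaml
endpoints:
  azureOpenAI:
    groups:
      - group: "my-azure-group"          # placeholder group name
        apiKey: "${AZURE_API_KEY}"       # placeholder environment variable
        instanceName: "my-instance"      # your Azure OpenAI instance name
        version: "2024-02-15-preview"
        models:
          # Users see these three models, in this order.
          gpt-4-vision-preview:
            deploymentName: gpt-4-vision-preview
          gpt-3.5-turbo:
            deploymentName: gpt-35-turbo
          gpt-4-turbo:
            deploymentName: gpt-4-turbo
```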

Model Deployments


Model deployments are a crucial step in making your Azure OpenAI resource available to users. As described above, the model groupings in your azureOpenAI endpoint config determine which models your users see, in the order they are defined; in the sketch earlier, each entry under models maps a user-facing model name to the Azure deployment that backs it.

To create the deployments themselves, use the Azure AI Foundry portal: view your existing model deployments and, if needed, create a new deployment of the gpt-35-turbo-16k model with a rate limit of 5,000 tokens per minute. This is more than adequate for completing exercises while leaving capacity for other users.

Using Plugins

To use the Plugins endpoint with Azure OpenAI, you need to configure your Azure OpenAI endpoint settings correctly: you need a deployment that supports function calling, and you must set the plugins field to true in your azureOpenAI endpoint config. With the current librechat.yaml configuration, the Plugins endpoint uses whichever primary model you select from the frontend.
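For example, a minimal sketch (same placeholder group as above; the plugins field follows the Custom Config Docs):

```yaml
endpoints:
  azureOpenAI:
    plugins: true                  # enable the Plugins endpoint for Azure models
    groups:
      - group: "my-azure-group"
        apiKey: "${AZURE_API_KEY}"
        instanceName: "my-instance"
        version: "2024-02-15-preview"
        models:
          # Use a deployment that supports function calling.
          gpt-4-turbo:
            deploymentName: gpt-4-turbo
```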

Enabling Auto-Generated Titles


Enabling auto-generated titles lets your conversational interface create titles for users' conversations automatically. This feature is particularly useful for Azure deployments.

To enable titling for Azure, set titleConvo to true. This simple step kickstarts the auto-generated title process.

You can also specify the model used for titling with titleModel, provided that model is configured in your group(s). The default is "gpt-3.5-turbo", so you can omit titleModel if you're happy with that exact model.

Note that if titleConvo is set to true but the titling model isn't configured in your group(s), the titling process will result in an error and no title will be generated.
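A minimal sketch of the relevant fields, assuming gpt-3.5-turbo is configured in one of your groups:

```yaml
endpoints:
  azureOpenAI:
    titleConvo: true             # enable auto-generated titles
    titleModel: "gpt-3.5-turbo"  # optional; this is the default model
```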

Required Fields

To properly integrate Azure OpenAI with LibreChat, specific fields must be accurately configured in your librechat.yaml file. These fields are validated through a combination of custom and environment variables; the group sketch shown earlier illustrates the core ones (apiKey, instanceName, version, and a models map).

For detailed requirements on each field, see the Azure OpenAI section in the Custom Config Docs.

Integration and API

To integrate Azure OpenAI with your application, you can use the Azure OpenAI SDK, which provides a consistent API for interacting with your deployed models. Install the SDK by adding the Azure OpenAI package to your project, then initialize the client with your Azure OpenAI endpoint and API key.

You can also use a custom connector, such as the AzureOpenAIService connector available on GitHub, to interact with the OpenAI APIs in Power Automate. To use the custom connector, you need to navigate to the Power Automate page, click Custom Connectors, and import the connector from GitHub. You can then create a new connection by providing the Azure OpenAI service instance name and API Key.


To enable use of assistants with Azure OpenAI, set the assistants field to true at the endpoint level, and add the assistants field to the groups that are compatible with Azure's Assistants API integration. Check the compatible regions and models in the Azure docs, and make sure the API version is 2024-02-15-preview or later.
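A sketch of that field placement (names and key are placeholders; confirm region and model compatibility in the Azure docs):

```yaml
endpoints:
  azureOpenAI:
    assistants: true                     # endpoint level
    groups:
      - group: "my-assistants-group"
        apiKey: "${AZURE_ASSISTANTS_KEY}"
        instanceName: "my-instance"
        version: "2024-02-15-preview"    # must be 2024-02-15-preview or later
        assistants: true                 # group level: compatible with the Assistants API
        models:
          gpt-4-turbo:
            deploymentName: gpt-4-turbo
```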

Integrate Service

To integrate the Azure OpenAI service, you need to add the Azure OpenAI SDK library to your code. This is done by replacing a comment in the code file for your preferred language, either C# or Python, with the necessary code to add the library.

The Azure OpenAI SDK library can be added by using the following code: `using Azure.AI.OpenAI;` for C# or `from openai import AzureOpenAI` for Python.

Once the library is added, you can initialize the Azure OpenAI client by replacing a comment in the application code. This involves creating a new instance of the `OpenAIClient` class and passing in the endpoint and API key.


Here are the specific steps to initialize the Azure OpenAI client:

  • For C#, use the following code: `OpenAIClient client = new OpenAIClient(new Uri(oaiEndpoint), new AzureKeyCredential(oaiKey));`
  • For Python, use the following code: `client = AzureOpenAI(azure_endpoint = azure_oai_endpoint, api_key=azure_oai_key, api_version="2024-02-15-preview")`

After initializing the client, you can create a system message to provide context to the model. This involves defining a string that describes your application and its purpose.

Here is an example of a system message: `"I am a hiking enthusiast named Forest who helps people discover hikes in their area. If no area is specified, I will default to near Rainier National Park. I will then provide three suggestions for nearby hikes that vary in length. I will also share an interesting fact about the local nature on the hikes when making a recommendation."`

To send a request to the Azure OpenAI model, you need to build a completion options object. This involves specifying the various parameters for your model, such as messages and temperature.


Here are the specific steps to build the completion options object:

  • For C#, use the following code: `ChatCompletionsOptions chatCompletionsOptions = new ChatCompletionsOptions() { Messages = { new ChatRequestSystemMessage(systemMessage), new ChatRequestUserMessage(inputText), }, MaxTokens = 400, Temperature = 0.7f, DeploymentName = oaiDeploymentName };`
  • For Python, use the following code: `response = client.chat.completions.create(model=azure_oai_deployment, temperature=0.7, max_tokens=400, messages=[{"role": "system", "content": system_message}, {"role": "user", "content": input_text}])`

Once the completion options object is built, you can send a request to the Azure OpenAI model using the client's `GetChatCompletions` method for C# or the `chat.completions.create` method for Python.

Finally, you need to save the changes to your code file to complete the integration process.

Here is a summary of the steps to integrate the Azure OpenAI service:

  1. Add the Azure OpenAI SDK library to your code.
  2. Initialize the Azure OpenAI client with your endpoint and API key.
  3. Create a system message to provide context to the model.
  4. Build a completion options object (or request) specifying messages, temperature, and other parameters.
  5. Send the request with GetChatCompletions (C#) or chat.completions.create (Python).
  6. Save the changes to your code file.

Chat Completion API

The Chat Completion API is a powerful tool for generating human-like text based on user input. You can use it in various applications, such as chatbots, virtual assistants, and even language translation software.

To get started, you'll need to integrate the Azure OpenAI service into your application. You can do this by adding the Azure OpenAI SDK library to your code and initializing the client with your Azure API credentials.

For example, in C# you can use the following code to initialize the client:


```csharp
OpenAIClient client = new OpenAIClient(new Uri(oaiEndpoint), new AzureKeyCredential(oaiKey));
```

Similarly, in Python you can use:

```python
client = AzureOpenAI(
    azure_endpoint=azure_oai_endpoint,
    api_key=azure_oai_key,
    api_version="2024-02-15-preview"
)
```

Once you have the client initialized, you can use it to send requests to the Azure OpenAI model. You'll need to specify the input text, temperature, and other parameters for the model to generate a response.

For example, in C# you can use the following code to send a request:

```csharp
ChatCompletionsOptions chatCompletionsOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage(systemMessage),
        new ChatRequestUserMessage(inputText),
    },
    MaxTokens = 400,
    Temperature = 0.7f,
    DeploymentName = oaiDeploymentName
};

ChatCompletions response = client.GetChatCompletions(chatCompletionsOptions);
```

Similarly, in Python you can use:

```python
response = client.chat.completions.create(
    model=azure_oai_deployment,
    temperature=0.7,
    max_tokens=400,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": input_text}
    ]
)
```
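Either way, the generated text comes back on the response object. In Python, for example, you can read it from the first choice:

```python
# Read the model's reply from the first choice in the response.
generated_text = response.choices[0].message.content
print("Response: " + generated_text + "\n")
```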

The Chat Completion API can also be used in Power Virtual Agents, where you can create a custom connector to call the API. To do this, create a new chatbot and add a topic named "xbox" with trigger phrases like "xbox" and "gaming". Then add an Ask a question node to get the user's response and store it in a variable.


Here's a summary of the steps to integrate the Chat Completion API with Power Virtual Agents:

  1. Create a new chatbot.
  2. Create a new topic with the name "xbox".
  3. Add trigger phrases like "xbox" and "gaming".
  4. Add an Ask a question node to get the user response and store it in a variable.
  5. Add a node to use the OpenAI custom connector.
  6. Configure the chat completion API.
  7. Return the answer from the Chat Completion API.

By following these steps, you can integrate the Chat Completion API with Power Virtual Agents and create a more conversational and interactive experience for your users.

Testing and Maintenance

Testing and maintenance are crucial for ensuring the accuracy and reliability of chat completion models like Azure OpenAI.

To start, regularly evaluate the performance of your model using metrics such as accuracy, F1 score, and completion rate.

This process is especially important when integrating Azure OpenAI with other systems, as any errors or inconsistencies can have a ripple effect.

It's also essential to update your model periodically to ensure it stays up-to-date with changing language patterns and user behaviors.

Test Your Application


To test your application, run it in the interactive terminal pane, ensuring the folder context is set to the folder for your preferred language. Enter the command to run the application, and when prompted, enter the text "What hike should I do near Rainier?".

You can use the Maximize panel size icon in the terminal toolbar to see more of the console text. Observe the output, taking note that the response follows the guidelines provided in the system message you added to the messages array.

The response may vary even when you provide the same text, because the temperature setting introduces randomness into the model's output. Run the request several times to see how the output changes.

Try using different values for your temperature with the same input. For example, change the temperature parameter value in your request to 1.0 and save the file. Then, run the application again using the prompts above, and observe the output.


Here's a step-by-step guide to testing your application:

  1. Run the application and enter the text "What hike should I do near Rainier?"
  2. Observe the output and take note of the response.
  3. Change the temperature parameter value to 1.0 and save the file.
  4. Run the application again using the prompts above.
  5. Observe the output and compare it to the previous response.

Maintain Conversation History

Maintaining conversation history is crucial for realistic interactions with an AI agent. By providing a history of the conversation in your prompt, you enable the AI model to reference past messages.

To achieve this, you need to add the previous prompt and response to the future prompt you're sending. This involves modifying the code to store conversation history. In C#, you can do this by initializing a list of chat request messages and adding the previous prompt and response to it.

Similarly, in Python, you can initialize an array to store the conversation history. The key is to append the previous input and response to the prompt array, which allows the model to understand the history of your conversation.

By doing so, you'll get a response that references the previous conversation, providing a much more realistic conversation experience. This is particularly useful when asking follow-up questions that require context from previous answers.


To control the number of required tokens, you can limit the length of the history to the most recent inputs and responses. This is especially important in production use, where the token count may be capped (1,200 tokens in this example). By controlling the conversation history, you ensure your application runs smoothly and efficiently; the sketch below illustrates the pattern.
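As a concrete illustration, here is a minimal Python sketch, reusing the client, deployment, and system message variables from the earlier examples, that appends each exchange to a running message list and trims older turns to control token usage:

```python
# Running conversation history, seeded with the system message.
messages = [{"role": "system", "content": system_message}]

def ask(user_text, max_history=10):
    messages.append({"role": "user", "content": user_text})
    # Keep the system message plus only the most recent turns,
    # so the prompt stays within the token budget.
    del messages[1:-max_history]
    response = client.chat.completions.create(
        model=azure_oai_deployment,
        temperature=0.7,
        max_tokens=400,
        messages=messages,
    )
    reply = response.choices[0].message.content
    # Store the assistant's reply so the next prompt has full context.
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("What hike should I do near Rainier?"))
print(ask("How long is the second one?"))  # follow-up relies on the history
```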

Advanced Features and Settings

You can unlock more functionality in your Azure OpenAI chat completion by tweaking certain settings.

The `titleModel` setting allows you to specify the model used for generating conversation titles, with options like `gpt-3.5-turbo` or `current_model`.

To enable conversation summarization, set the `summarize` option to `true`. This feature is disabled by default.

You can also customize the model used for generating conversation summaries with the `summaryModel` setting.

Conversational title generation is another feature that can be enabled with the `titleConvo` option, which is disabled by default.

The `titleMethod` setting determines the method used for generating conversation titles, with options like `"completion"` or `"functions"`.

Here are the Azure OpenAI settings at a glance:

  • titleConvo (disabled by default) — enables auto-generated conversation titles.
  • titleModel (default: "gpt-3.5-turbo") — the model used for generating titles; "current_model" is also an option.
  • titleMethod ("completion" or "functions") — the method used for generating titles.
  • summarize (disabled by default) — enables conversation summarization.
  • summaryModel — the model used for generating conversation summaries.

Security and Limits


Azure OpenAI Service has a feature called data zones: deployment options that keep processing within a data sovereignty boundary (such as the EU or the US) while routing traffic across the regions inside it, which can also ease load balancing. You can deploy a single instance with a data zone deployment within a single subscription, and as you hit the TPM/RPM cap for that subscription, repeat the process with a new subscription and load balance across the two.

It's also worth load testing the service to measure performance and identify any bottlenecks or issues before production traffic hits it.

To handle rate limiting, Azure OpenAI Service applies token-based limits (tokens per minute and requests per minute) to each deployment, which helps prevent abuse or excessive usage of the service. Azure API Management (APIM) can be placed in front of the service to track token usage, provide insight into how the service is being used, and help identify issues or bottlenecks. On the client side, a common complement is to retry throttled requests with exponential backoff, as in the sketch below.
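A minimal Python sketch of that client-side retry pattern, reusing the endpoint, key, and deployment variables from the earlier examples:

```python
import time

from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint=azure_oai_endpoint,
    api_key=azure_oai_key,
    api_version="2024-02-15-preview",
)

def create_with_backoff(messages, retries=5):
    # Retry 429 responses with exponential backoff (1s, 2s, 4s, ...).
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model=azure_oai_deployment,
                messages=messages,
            )
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after all retries")
```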

The service also supports streaming chat completions, which return tokens in real time as they are generated. Be aware that logging streamed completions can be a challenge, since the full response only exists once the stream ends.

Example and SDK

To get started with Azure OpenAI, you'll need to configure your settings. A comprehensive configuration might include endpoint-level, group-level, and model-level configurations. You can reference the Azure OpenAI Endpoint Configuration Docs for a working example.

To integrate Azure OpenAI with Portkey, you'll need to request access, create a resource, deploy a model, and select your foundation model. You can find the API version and API key in Azure OpenAI Studio under the "View code" UI element.


Here are the steps to integrate Portkey with Azure OpenAI:

  1. Request access to Azure OpenAI
  2. Create a resource in the Azure portal
  3. Deploy a model in Azure OpenAI Studio
  4. Select your Foundation Model

You can also add metadata to your requests, gateway configs to your Azure OpenAI requests, and tracing to your requests. Additionally, you can set up a fallback from OpenAI to Azure OpenAI APIs.

Example

A comprehensive configuration for the Azure OpenAI endpoint is a complex task, but it's achievable with the right settings. There are three levels of configuration to consider:

  1. Endpoint-level config — options that apply to the whole endpoint, such as titling.
  2. Group-level config — authentication and the Azure OpenAI instance for a group of deployments.
  3. Model-level config — model identification and the deployment backing each model.

A working example that combines all three levels might look like the following sketch, which includes the essential settings to get started with the azureOpenAI endpoint.
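(A sketch only: the structure follows LibreChat's Custom Config Docs, and every name and key below is a placeholder.)

```yaml
endpoints:
  azureOpenAI:
    # Endpoint-level config: options that apply to the whole endpoint.
    titleConvo: true
    titleModel: "gpt-3.5-turbo"
    plugins: true
    groups:
      # Group-level config: authentication and the Azure OpenAI instance.
      - group: "my-eastus-resource"
        apiKey: "${EASTUS_API_KEY}"
        instanceName: "my-instance-eastus"
        version: "2024-02-15-preview"
        # Model-level config: which deployment backs each model.
        models:
          gpt-3.5-turbo:
            deploymentName: gpt-35-turbo
          gpt-4-turbo:
            deploymentName: gpt-4-turbo
```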

SDK

To integrate Azure OpenAI with the Portkey SDK, you'll need to follow a few steps. First, add metadata to your requests: tags that identify things like the user, application, or environment behind each request.


Adding gateway configs to your Azure OpenAI requests is also crucial, as it helps manage how your requests are routed. This involves configuring settings for your requests, such as the API endpoint and authentication.

Tracing Azure OpenAI requests is another important aspect of SDK integration. This allows you to monitor and debug your requests, making it easier to identify any issues.

To set up a fallback from OpenAI to Azure OpenAI APIs, you'll need to configure your Portkey SDK settings to use Azure OpenAI as a fallback in case of errors or failures.

Here's a summary of the steps to integrate Azure OpenAI with the Portkey SDK:

  1. Add metadata to your requests.
  2. Add gateway configs to your Azure OpenAI requests.
  3. Add tracing to your requests.
  4. Set up a fallback from OpenAI to Azure OpenAI APIs.

Frequently Asked Questions

What is the difference between chat and completions in OpenAI?

The Chat Completions API takes a structured list of role-tagged messages (system, user, assistant), which makes it well suited to multi-turn conversations and context management, while the Completions API takes a single freeform prompt and prioritizes simplicity. If you need a straightforward one-shot response, the Completions API can be the more efficient choice.

Does Azure OpenAI use ChatGPT?

Azure OpenAI does not provide the ChatGPT product itself; instead, it offers the same underlying GPT model families as a managed Azure service, with your own deployments, security, and compliance controls.

Rosemary Boyer

Writer

Rosemary Boyer is a skilled writer with a passion for crafting engaging and informative content. With a focus on technical and educational topics, she has established herself as a reliable voice in the industry. Her writing has been featured in a variety of publications, covering subjects such as CSS Precedence, where she breaks down complex concepts into clear and concise language.
