Hey, you’ve probably been hearing about ChatGPT everywhere by now, right? Since November 2022, it feels like every single conversation in tech circles comes back to OpenAI and what GPT can do. I’ve been spending the last few weeks getting my hands dirty with Azure OpenAI Service, and I wanted to share what I’ve found so far.
Quick background if you’re just catching up: Azure OpenAI Service went generally available in January 2023, giving you access to OpenAI’s models (GPT-3.5 Turbo, Codex, DALL-E, and embeddings) through Azure’s infrastructure. And just yesterday, March 14, GPT-4 was announced. I’m still trying to catch up with everything that’s happening.
Why Azure OpenAI Instead of Just Using OpenAI Directly?
This is the first question everyone asks me, so let me get it out of the way.
At Spektra Systems, we run CloudLabs, which is our hands-on lab platform for tech training. We deal with enterprise customers daily. And enterprise customers care about things like data residency, private networking, RBAC, and compliance certifications. That’s where Azure OpenAI makes a real difference over using the OpenAI API directly.
With Azure OpenAI, your data stays within your Azure subscription. You get VNet integration, managed identity support, and all the Azure compliance certifications (SOC 2, ISO 27001, HIPAA, you name it). Your prompts and completions aren’t used to train OpenAI’s models either. For us, these aren’t nice-to-haves. They’re requirements.
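To make the managed identity point concrete: the pre-1.0 openai Python package lets you swap the static API key for a short-lived Azure AD token. This is just a configuration sketch, not something from the docs verbatim; it assumes the azure-identity package is installed, the resource URL is a placeholder, and the calling identity has the right role on the resource.

```python
# Sketch: authenticating to Azure OpenAI with Azure AD instead of an API key.
# Assumes azure-identity is installed and the identity has access to the resource.
import openai
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = "azure_ad"  # tells the SDK to send a bearer token
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
openai.api_version = "2023-03-15-preview"
openai.api_key = token.token  # short-lived token instead of a static key
```

The nice part is that with a managed identity on an Azure VM or App Service, there's no key to rotate or leak at all.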
Solo developer building a side project? The OpenAI API directly is probably fine. But for enterprise work, I’d recommend Azure OpenAI without hesitation.
Getting Access and Setting Things Up
Here’s the thing: you can’t just go create an Azure OpenAI resource right away. As of today, you need to apply for access through a form on Microsoft’s website. I applied back in early January and got approved in about a week. I’ve heard of some folks waiting longer, though, so apply early if you’re interested.

Once you’re approved, the setup is pretty straightforward:
- Create an Azure OpenAI resource in the Azure Portal (it’s under "AI + Machine Learning" in the marketplace)
- Pick your region. As of March 2023, it’s available in East US, South Central US, and West Europe. Limited regions for now.
- Once the resource is created, head over to Azure OpenAI Studio at oai.azure.com
The Studio is a nice web UI where you can deploy models and test them in a playground. I spent a good couple of hours just playing around in there before writing any code.
Deploying Your First Model
In Azure OpenAI, you don’t just call a model name directly. You create a "deployment" of a specific model, give it a name, and then call that deployment. This confused me at first.
To deploy GPT-3.5 Turbo:
- Go to Azure OpenAI Studio
- Click on Deployments
- Create new deployment
- Select gpt-35-turbo as the model (note: Azure uses a dash instead of a dot in model names)
- Give it a deployment name like "my-gpt35"
- Set your tokens-per-minute rate limit
That last part matters. Azure OpenAI uses a quota system based on Tokens Per Minute (TPM). You allocate TPM across your deployments. I started with 120K TPM, which was more than enough for testing.
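To sanity-check a TPM allocation before you commit to one, a back-of-the-envelope estimate is enough. This little helper is my own sketch; the ~4 characters per token ratio is a common rule of thumb for English text, not an exact tokenizer count.

```python
def estimate_tpm(requests_per_minute, avg_prompt_chars, avg_completion_tokens):
    """Rough tokens-per-minute estimate for a workload.

    Uses the ~4 English characters per token rule of thumb for prompts;
    completions are already counted in tokens.
    """
    prompt_tokens = avg_prompt_chars / 4
    return int(requests_per_minute * (prompt_tokens + avg_completion_tokens))

# e.g. 60 requests/min, ~2000-char prompts, ~150-token completions
print(estimate_tpm(60, 2000, 150))  # 39000 -- comfortably under 120K TPM
```

For real capacity planning you'd want an actual tokenizer, but this is plenty for deciding whether a deployment's allocation is in the right ballpark.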
My First API Call
Here’s the Python code I used for my first call:
```python
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
openai.api_version = "2023-03-15-preview"
openai.api_key = "YOUR-KEY-HERE"

response = openai.ChatCompletion.create(
    engine="my-gpt35",  # your deployment name, not the model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what Azure Arc does in two sentences."},
    ],
    temperature=0.7,
    max_tokens=150,
)

print(response.choices[0].message.content)
```
Nothing too scary, right? The main gotcha is that you use engine instead of model, and you pass your deployment name, not the model name. I tripped over that for about 20 minutes before figuring it out.
The response came back in about 1.5 seconds and was actually really good. I immediately started thinking about all the ways we could use this at Spektra.
What About Pricing?
Okay, let’s talk money because token costs add up fast. Watch your bills.
As of March 2023, GPT-3.5 Turbo pricing on Azure OpenAI is:
- $0.002 per 1K tokens (both input and output)
That’s the same as OpenAI’s direct pricing, which is nice. No Azure markup.
For the older models like text-davinci-003, you’re looking at $0.02 per 1K tokens, which is 10x more expensive. GPT-3.5 Turbo is the obvious choice for most use cases right now.
I ran about 500 test calls over a weekend and my total cost was under $2. Pretty affordable for experimenting.
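If you want to project costs before running a workload, the math is simple enough to script. A quick sketch using the March 2023 GPT-3.5 Turbo rate (the 800-tokens-per-call figure below is just an illustrative guess, not a measurement):

```python
def estimate_cost_usd(total_tokens, rate_per_1k=0.002):
    """Cost at a flat per-1K-token rate (GPT-3.5 Turbo, March 2023 pricing)."""
    return total_tokens / 1000 * rate_per_1k

# 500 test calls at roughly 800 tokens each (prompt + completion combined)
print(round(estimate_cost_usd(500 * 800), 2))  # 0.8
```

That lines up with what I saw on my bill: a weekend of experimenting stays well under a couple of dollars.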
Use Cases I’m Excited About
I’ve been jotting down ideas since I started playing with this. Here’s what I’m thinking about for our work at Spektra and CloudLabs:
Generating lab instructions is the big one. We create hundreds of hands-on lab guides every year. Right now that’s a manual process. I tested GPT-3.5 Turbo with some of our existing lab content as context, and it did a surprisingly decent job at generating step-by-step instructions for Azure services. It’s not going to replace our content authors, but it could cut the first-draft time in half.
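The pattern I've been testing for this is pretty simple: feed a few steps from existing labs into the prompt as style examples, then ask for a new draft. A sketch of the prompt construction; the helper name and the prompt wording are hypothetical, just what I've been experimenting with:

```python
def build_lab_messages(service_name, existing_steps):
    """Build a chat prompt that uses existing lab content as style context.

    Hypothetical helper -- existing_steps is a list of example step strings
    pulled from published lab guides.
    """
    system = (
        "You are a technical author writing hands-on lab guides for Azure. "
        "Write numbered, step-by-step instructions in the style of the examples."
    )
    context = "\n\n".join(existing_steps)
    user = (
        f"Example steps from our existing labs:\n{context}\n\n"
        f"Now draft step-by-step instructions for a lab on {service_name}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list drops straight into the `messages` parameter of a ChatCompletion call against your deployment.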
I’m also looking at building a smarter support bot for CloudLabs. When someone gets stuck in a lab, they usually need help with a specific Azure error message or a step that didn’t work. A GPT-powered bot that understands the lab context could be really helpful here.
Code generation is interesting too. I tested GPT-3.5 Turbo with a prompt like "Generate a Bicep template for an AKS cluster with 3 nodes and Azure CNI networking" and the output was about 85% correct. Needed some tweaking, but saved time.
Some of the Gotchas
It’s not all perfect. Here are a few things I ran into:
Regional availability is limited. If your other resources live in a region without Azure OpenAI, expect some latency on cross-region API calls. Not a dealbreaker, but worth planning for.
Content filtering is on by default and you can’t turn it off. Azure has a built-in moderation layer that checks both prompts and responses. For enterprise use this is actually a good thing, but I could see it being annoying if you keep hitting the filter on creative prompts.
The quota limits can be tight. If you’re planning to build anything that handles concurrent users, you’ll need to think carefully about your TPM allocation and possibly request a quota increase. The default quotas are fine for development but won’t cut it for production workloads.
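Until a quota increase comes through, you will hit rate-limit errors under load, so some retry logic is worth having from day one. A minimal sketch of exponential backoff with jitter; in real code you'd catch openai.error.RateLimitError specifically rather than a bare Exception:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter.

    Sketch only: in production, catch openai.error.RateLimitError here
    instead of swallowing every Exception.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, let the caller see the failure
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))

# usage sketch:
# with_backoff(lambda: openai.ChatCompletion.create(engine="my-gpt35", messages=msgs))
```

Doubling the delay each attempt spreads retries out so a burst of 429s doesn't just hammer the endpoint harder.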
And the models sometimes confidently generate wrong information. I caught GPT-3.5 Turbo giving incorrect Azure CLI commands a few times. Always verify the output, especially for technical content. Don’t just copy-paste into production.
What’s Next
GPT-4 just dropped yesterday and I can’t wait to try it out. From what I’ve seen so far, it’s supposed to be much better at reasoning and accuracy, which would directly help with the code generation and lab content use cases I mentioned.
I’m planning a follow-up post once I get access to GPT-4 on Azure. Specifically, I want to test how much better it handles ARM template generation and whether it makes fewer factual errors about Azure services.
If you’re working with Azure and haven’t applied for Azure OpenAI access yet, go do it now. Even without a specific use case in mind, the playground alone is worth a few hours of your time.
Happy building, folks!
Amit
Assisted by AI during writing