Microsoft Azure documentation

Deploy Hugging Face gated models on Microsoft Foundry


Some Hugging Face models on Microsoft Foundry are gated, meaning you must request access from the model publisher on the Hugging Face Hub before you can deploy them on Foundry. The access request is then tied to your Hugging Face identity and allows you to download the model weights from the Hub.

This guide walks you through how to deploy a Hugging Face gated model on Microsoft Foundry.

0. Prerequisites

To follow this guide, you will need the following prerequisites:

  • An Azure account with an active subscription.
  • The Azure CLI installed and logged in.
  • The Azure Machine Learning extension for the Azure CLI.
  • An Azure Resource Group.
  • A Hub-based project on Microsoft Foundry (classic, i.e., Azure AI Foundry Hub-based project).

For more information, please go through the steps in Configure Azure Machine Learning and Microsoft Foundry.

In this example, the Azure Machine Learning SDK for Python will be used to create the endpoint and the deployment. Along with it, you will also need to install azure-identity to authenticate with your Azure credentials via Python.

pip install azure-ai-ml azure-identity --upgrade --quiet

Then, set the following environment variables; they are used throughout the example to configure the Azure ML client, so update the values accordingly.

export LOCATION=eastus
export SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
export RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
export WORKSPACE_NAME=<YOUR_WORKSPACE_NAME>

You also need to define both the endpoint and deployment names. Those will be used throughout the example.

Endpoint names must be globally unique per Azure region: even if no endpoint with a given name is running under your subscription, the name may already be reserved by another Azure customer, in which case you cannot reuse it. Adding a timestamp or a custom identifier is recommended to avoid HTTP 400 validation errors caused by an already reserved name. Endpoint names must also be between 3 and 32 characters long.

import os
from uuid import uuid4

os.environ["ENDPOINT_NAME"] = f"endpoint-{str(uuid4())[:8]}"
os.environ["DEPLOYMENT_NAME"] = f"deployment-{str(uuid4())[:8]}"
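
As a quick local sanity check, you can validate generated names against the documented length limit; the pattern below is an approximation of Azure's naming rules (3 to 32 characters, letters, digits, and hyphens), not an official reference:

```python
import re

def is_valid_endpoint_name(name: str) -> bool:
    # Approximate rule: 3-32 characters, letters/digits/hyphens,
    # starting and ending with a letter or digit.
    return bool(re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9-]{1,30}[A-Za-z0-9]", name))

print(is_valid_endpoint_name("endpoint-1a2b3c4d"))  # True
```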

1. Identify a gated model in the catalog

When you select a model from the Hugging Face collection in the Microsoft Foundry catalog, a gated model shows a banner at the top stating that Gated Model Access is required, and has its isGated property set.
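
You can also check a model's gated status programmatically via the Hugging Face Hub. A minimal sketch, assuming huggingface_hub is installed (the helper name and the model ID in the comment are ours):

```python
def gated_status(model_id: str):
    # Deferred import so the sketch only needs huggingface_hub when called.
    from huggingface_hub import model_info

    # `gated` is False for open models, or "auto" / "manual" for gated ones.
    return model_info(model_id).gated

# Requires network access, e.g.:
# gated_status("meta-llama/Llama-3.1-8B")
```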

2. Request access on Hugging Face

The model page for a gated model on Microsoft Foundry links to the model on the Hugging Face Hub. There, use the Request access or Agree and access repository action (the wording varies by model), then wait for approval from the model authors or the organization in charge.

Note that some requests are approved instantly; others require manual review that can take up to a day.

3. Create a Hugging Face Token

You can manage your access tokens in your Hugging Face account settings. Create a fine-grained token with the Read access to contents of all public gated repositories you can access permission selected.

For organizations that need stronger oversight of their users' tokens, Hugging Face Team and Enterprise plans offer enhanced token governance capabilities.

4. Create an Azure Machine Learning Connection with Custom keys

Create a Custom keys workspace connection with your Hugging Face Token. This connection is how the deployment process authenticates to the Hugging Face Hub to validate access and download the gated model.

az ml connection create \
    --name HuggingFaceTokenConnection \
    --type "Generic" \
    --resource-group $RESOURCE_GROUP \
    --workspace-name $WORKSPACE_NAME \
    --secret "HF_TOKEN=<YOUR_HF_TOKEN_HERE>"

5. Create the Managed Online Endpoint with secret-store access enabled

First you need to authenticate into the Microsoft Foundry via Azure Machine Learning with the Python SDK:

import os
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.getenv("SUBSCRIPTION_ID"),
    resource_group_name=os.getenv("RESOURCE_GROUP"),
    workspace_name=os.getenv("WORKSPACE_NAME"),
)

Then, when creating the Managed Online Endpoint for the deployment, make sure the endpoint is configured to allow access to the default secret stores; this is what lets the Managed Online Endpoint in Azure Machine Learning read the connection secret. For now, this can only be done programmatically.

from azure.ai.ml.entities import ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(
    name=os.getenv("ENDPOINT_NAME"),
    properties={"enforce_access_to_default_secret_stores": "enabled"},
)

client.begin_create_or_update(endpoint).wait()

This allows the endpoint deployment to read the secret connection value (your token) at deployment time.

6. Create the Managed Online Deployment

After creating the endpoint, you need to create the ManagedOnlineDeployment via the Azure ML Python SDK, i.e., a deployment linked to that endpoint.

To create the Managed Online Deployment, you don't provide the model ID on the Hugging Face Hub but rather the model URI on Azure Machine Learning, formatted as follows:

model_id = "facebook/sam3" # example gated model

model_uri = f"azureml://registries/HuggingFace/models/{model_id.replace('/', '-').replace('_', '-').lower()}/labels/latest"
model_uri
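
The transformation above can be captured in a small helper (the function name is ours), so the convention — lowercase, with / and _ replaced by hyphens — is applied consistently:

```python
def hub_id_to_azureml_uri(model_id: str) -> str:
    # Lowercase the Hub ID and replace '/' and '_' with '-' to obtain the
    # asset name in the HuggingFace registry on Azure Machine Learning.
    asset_name = model_id.replace("/", "-").replace("_", "-").lower()
    return f"azureml://registries/HuggingFace/models/{asset_name}/labels/latest"

print(hub_id_to_azureml_uri("facebook/sam3"))
# azureml://registries/HuggingFace/models/facebook-sam3/labels/latest
```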

Make sure to select the instance type recommended for the model you want to deploy. You can find it in the model catalog deploy page.

from azure.ai.ml.entities import ManagedOnlineDeployment

deployment = ManagedOnlineDeployment(
    name=os.getenv("DEPLOYMENT_NAME"),
    endpoint_name=os.getenv("ENDPOINT_NAME"),
    model=model_uri,
    instance_type="Standard_NC40ads_H100_v5",
    instance_count=1,
)

client.online_deployments.begin_create_or_update(deployment).wait()
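
Once the deployment is live, you can smoke-test it from the same MLClient. The payload below is a hypothetical example, since the expected request schema depends on the deployed model's task; the invoke call itself needs live Azure credentials, so it is shown commented out:

```python
import json
import tempfile

# Hypothetical payload: the actual schema depends on the model's task.
payload = {"inputs": "a photo of a cat"}

# online_endpoints.invoke expects the request body as a file on disk.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(payload, f)
    request_file = f.name

# With the `client` from step 5 (requires live Azure credentials):
# response = client.online_endpoints.invoke(
#     endpoint_name=os.getenv("ENDPOINT_NAME"),
#     deployment_name=os.getenv("DEPLOYMENT_NAME"),
#     request_file=request_file,
# )
# print(response)
```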

That’s it, you deployed your first Hugging Face gated model on Microsoft Foundry!
