Integrate with Serverless
Runpod Serverless endpoints are REST APIs that accept HTTP requests, execute your code, and return the result via HTTP response. Each endpoint provides a unique URL and abstracts away the complexity of managing individual GPUs/CPUs. To integrate with Serverless:
- Create a handler function with the code for your application.
- Package your worker into a Docker image and push it to a Docker registry.
- Deploy a Serverless endpoint using the Runpod console or REST API.
- Start sending requests to the endpoint.
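As a rough sketch of the first step, a minimal handler for a Serverless worker using the runpod Python SDK might look like the following; the `prompt` input field is just a placeholder for whatever schema your application expects:

```python
# handler.py - runs inside your Serverless worker
import runpod

def handler(job):
    # "prompt" is a hypothetical input field; use whatever schema your app expects.
    prompt = job["input"].get("prompt", "")
    return {"output": prompt.upper()}

# Start the Serverless worker loop with your handler.
runpod.serverless.start({"handler": handler})
```

Once the endpoint is deployed, you can send it requests over HTTP. The sketch below assumes the synchronous `/runsync` route and Bearer-style authorization; the endpoint ID and API key are placeholders:

```python
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder
API_KEY = "YOUR_RUNPOD_API_KEY"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "hello"}},
    timeout=60,
)
print(resp.json())
```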
Integrate with Pods
Pods are self-contained compute environments that you can deploy on Runpod. They’re ideal for applications that require a consistent, predictable environment, such as web applications or backend services with a constant workload. There are two primary methods for integrating a Pod with your application:
HTTP proxy
For web-based APIs or UIs, Runpod provides an automated HTTP proxy. Any port you expose as an HTTP port in your template or Pod configuration is accessible via a unique URL. The URL follows this format: https://POD_ID-INTERNAL_PORT.proxy.runpod.net. For example, if your Pod’s ID is abc123xyz and you exposed port 8000, your application would send requests to https://abc123xyz-8000.proxy.runpod.net.
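As a quick illustration, using the example Pod ID and port above and assuming an HTTP service is listening on that port, a request through the proxy might look like this (the `/health` route is hypothetical):

```python
import requests

# abc123xyz and 8000 are the example Pod ID and internal port from above;
# "/health" is a hypothetical route exposed by your application.
resp = requests.get("https://abc123xyz-8000.proxy.runpod.net/health", timeout=30)
print(resp.status_code, resp.text)
```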
Direct TCP
For protocols that require persistent connections or fall outside of standard HTTP, use direct TCP ports. When you expose a TCP port, Runpod assigns a public IP address and a mapped external port. You can find these details using the GET /pods/POD_ID endpoint or the Pod connection menu in the Runpod console.
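For example, a raw TCP client in Python might connect as sketched below; the IP address and external port are placeholders for the values Runpod assigns to your Pod, and the `ping` payload is whatever your own service's protocol expects:

```python
import socket

# Placeholder values; use the public IP and mapped external port shown in
# the Pod connection menu or returned by the GET /pods/POD_ID endpoint.
PUBLIC_IP = "203.0.113.10"
EXTERNAL_PORT = 10250

with socket.create_connection((PUBLIC_IP, EXTERNAL_PORT), timeout=30) as sock:
    sock.sendall(b"ping\n")   # replace with your service's protocol
    print(sock.recv(1024))
```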
Integrate with OpenAI-compatible endpoints
Many external tools and agentic frameworks support OpenAI-compatible endpoints with little-to-no configuration required. Integration is usually straightforward: any library or framework that accepts a custom base URL for API calls will work with Runpod without specialized adapters or connectors. This means you can integrate Runpod with tools like n8n, CrewAI, LangChain, and many others by simply pointing them to your Runpod endpoint URL and providing your Runpod API key for authentication. You can integrate OpenAI-compatible tools with Runpod using any of the following methods:
Public Endpoints
Public Endpoints are pre-deployed AI models that you can use without setting up your own Serverless endpoint. They’re vLLM-compatible and return OpenAI-compatible responses, so you can get started quickly without deploying your own workers. The following Public Endpoint URLs are available for OpenAI-compatible models:
- Qwen3 32B AWQ: https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1
- IBM Granite-4.0-H-Small: https://api.runpod.ai/v2/granite-4-0-h-small/openai/v1
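For instance, here is a minimal sketch using the official OpenAI Python client against the Qwen3 Public Endpoint. The model identifier passed to the API is an assumption; if in doubt, list the models the endpoint actually serves with `client.models.list()`:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1",
    api_key="YOUR_RUNPOD_API_KEY",  # your Runpod API key, not an OpenAI key
)

# "qwen3-32b-awq" is an assumed model identifier for this endpoint.
response = client.chat.completions.create(
    model="qwen3-32b-awq",
    messages=[{"role": "user", "content": "Give me a one-line summary of Runpod."}],
)
print(response.choices[0].message.content)
```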
vLLM workers
Serverless vLLM workers are optimized for running large language models and return OpenAI-compatible responses, making them ideal for tools that expect OpenAI’s API format. When you deploy a vLLM worker, you can access it using the OpenAI-compatible API at this base URL: https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1, where ENDPOINT_ID is your Serverless endpoint ID.
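As a sketch, assuming an already-deployed vLLM worker and that the model name matches the Hugging Face model you configured (an assumption; adjust to your worker's settings), streaming a chat completion with the OpenAI client might look like this:

```python
from openai import OpenAI

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # your Serverless endpoint ID

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key="YOUR_RUNPOD_API_KEY",
)

# The model name below is a placeholder for the Hugging Face model your
# vLLM worker was deployed with.
stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```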