Supercharge Your OpenAI API Requests

Batch, schedule, and automate requests effortlessly. Ideal for serverless apps and streamlined job scheduling.

Batch Requests

Efficiently process large numbers of requests to LLM APIs in one go, saving time and resources. Designed to scale with your demand: run a single job at a time or schedule thousands of them, and we'll take care of the rest.

Standard API

Our APIs are designed to receive requests in the same format as the OpenAI and Mistral APIs, so you don't have to learn a different syntax.

Job Scheduling

Schedule and automate your jobs without babysitting their status. We ensure your prompts are delivered and retry failed jobs. Receive updates via webhooks (server-side) or WebSockets (client-side).
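For example, a server-side webhook receiver could look like the sketch below. This is a minimal TypeScript sketch using Express; the payload fields (id, status, meta, result) are assumptions for illustration, not a documented schema.

// Minimal Express webhook receiver for job updates.
// Payload fields are assumed for illustration; check the API reference for the exact schema.
import express from "express";

const app = express();
app.use(express.json());

app.post("/api/webhook", (req, res) => {
  const { id, status, meta, result } = req.body;

  if (status === "COMPLETED") {
    console.log(`Job ${id} for client ${meta?.clientId} finished:`, result);
  } else if (status === "FAILED") {
    console.warn(`Job ${id} failed after retries`);
  }

  // Acknowledge quickly so the delivery isn't considered failed and retried.
  res.sendStatus(200);
});

app.listen(3000);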

Serverless Friendly

Designed for serverless applications that shouldn't sit waiting for OpenAI API responses. Keep your workflows streamlined and avoid paying for compute time spent waiting on GPT output.
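In practice, a serverless handler only needs to enqueue the batch and return; it never holds a connection open while the model generates. Here is a minimal sketch, assuming a Bearer-token Authorization header and the request/response shapes from the example below:

// Serverless handler sketch: enqueue a batch and return immediately.
// The Authorization header is an assumption for illustration.
export async function handler(event: { prompts: string[] }) {
  const res = await fetch("https://api.efflo.ai/batch", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.EFFLO_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      webhook: "https://your-app.com/api/webhook",
      jobs: event.prompts.map((content, i) => ({
        messages: [{ role: "user", content }],
        meta: { clientId: i },
      })),
    }),
  });

  // Return the batch ID right away; results arrive later on the webhook.
  const { id } = await res.json();
  return { statusCode: 202, body: JSON.stringify({ batchId: id }) };
}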

POST https://api.efflo.ai/batch HTTP/1.1
{
  "model": "gpt-3.5-turbo",
  "webhook": "https://your-app.com/api/webhook",
  "jobs": [
    {
      "messages": [{ "role": "user", "content": "Who was the president of the US in 2005?" }],
      "meta": { "clientId": 1 }
    },
    {
      "messages": [{ "role": "user", "content": "What's the highest mountain in Peru?" }],
      "meta": { "clientId": 2 }
    }
  ]
}
HTTP/1.1 200 OK
{
  "id": "7GItCmBUP9",
  "jobs": [
    {
      "id": "fTGtwwdE48",
      "meta": { "clientId": 1 },
      "status": "PENDING"
    },
    {
      "id": "wbruqBpWtz",
      "meta": { "clientId": 2 },
      "status": "PENDING"
    }
  ]
}

Features

  • Schedule multiple generative tasks in a single, fast request without waiting for the final result 🚀
  • Attach custom key-value attributes to each job using the meta object
  • Receive responses at the provided webhook without keeping an active connection to the OpenAI APIs
  • Automatic job retries whenever there's a network failure or an API error
  • Optionally, retry a job automatically with a larger model when the context length has been exceeded
  • Check batch status and list jobs in progress using simple REST APIs
  • Subscribe to a WebSocket endpoint for the provided batch ID to receive real-time batch updates on the client side (see the sketch after this list)
  • Coming soon! Usage dashboard and cost analysis
  • Coming soon! Rate limiting with meta attributes
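As a client-side illustration of the WebSocket subscription above, a browser could listen for updates like this. The wss:// path and the message shape are assumptions for illustration, not a documented API:

// Browser-side sketch: subscribe to real-time updates for a batch ID.
// Endpoint path and message schema are assumptions for illustration.
const batchId = "7GItCmBUP9";
const ws = new WebSocket(`wss://api.efflo.ai/batch/${batchId}/updates`);

ws.onmessage = (event) => {
  const update = JSON.parse(event.data);
  // e.g. { jobId: "fTGtwwdE48", meta: { clientId: 1 }, status: "COMPLETED" }
  console.log(`Job ${update.jobId} is now ${update.status}`);
};

ws.onclose = () => console.log("Batch stream closed");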

How It Works

1. Batch Your Requests

Group multiple requests into a single batch to optimize processing and reduce response times. Supports the OpenAI and Mistral APIs, with more to come.

2. Schedule Jobs

Use the scheduling feature to automate your tasks: specify the frequency and timing of jobs for hands-free operation.

3. Receive Updates

Stay informed with real-time updates on job status. Webhooks and WebSockets ensure prompt notifications for quick decision-making.

4. Get Results

Retrieve the results of your batched requests efficiently. Easily access and integrate processed data into your applications.
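If you prefer pulling results over push notifications, a REST lookup by batch ID could look like the sketch below. The GET /batch/{id} route is an assumption inferred from the response shape shown earlier:

// Poll a batch by ID until every job has left the PENDING state.
// The GET route and job fields are assumptions for illustration.
async function waitForBatch(batchId: string) {
  while (true) {
    const res = await fetch(`https://api.efflo.ai/batch/${batchId}`, {
      headers: { Authorization: `Bearer ${process.env.EFFLO_API_KEY}` },
    });
    const batch = await res.json();

    if (batch.jobs.every((job: { status: string }) => job.status !== "PENDING")) {
      return batch.jobs;
    }
    // Back off for five seconds between polls.
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}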

Use Cases

Content Generation at Scale

Effortlessly generate large volumes of content using the OpenAI API, leveraging GPT-3.5 or GPT-4. Schedule content-creation jobs and receive the results through webhooks, making it easy to manage content pipelines.

Processing Large Amounts of Text

You can leverage Efflo.ai to analyze large amounts of text while avoiding context-length issues, all without maintaining your own queueing and scheduling systems!
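For instance, you could split a long document into chunks that each fit the model's context window and submit them as jobs in a single batch. A sketch, where the character-based chunk size is an illustrative stand-in for a real token budget:

// Split a long document into chunks and build one batch job per chunk.
// The 8000-character chunk size is illustrative; size chunks by tokens in practice.
function toJobs(document: string, chunkSize = 8000) {
  const jobs = [];
  for (let i = 0; i < document.length; i += chunkSize) {
    jobs.push({
      messages: [
        { role: "user", content: `Summarize:\n${document.slice(i, i + chunkSize)}` },
      ],
      meta: { chunk: i / chunkSize },
    });
  }
  return jobs; // pass as the "jobs" array of a POST /batch request
}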

Optimized Data Analysis

Batch requests for data analysis tasks and schedule recurring jobs for periodic insights. Streamline your data processing workflows and receive updates on job status in real-time.

Easily Switch Between API Providers

Do you want to try out large language models from Mistral or OpenAI and compare their results for your use case? Just bring your API key: with our unified interface, switching providers is a matter of modifying a single property!
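Concretely, switching providers is a one-property change to the batch request. The model identifiers below are examples; use whichever models your keys can access:

// Same batch request; switching providers means changing one property.
const batchRequest = {
  model: "gpt-4",            // OpenAI
  // model: "mistral-small", // Mistral: swap this single property
  webhook: "https://your-app.com/api/webhook",
  jobs: [
    { messages: [{ role: "user", content: "Hello!" }], meta: { clientId: 1 } },
  ],
};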

Frequently Asked Questions

Q: How does the batching process work?

A: Our platform allows you to group multiple OpenAI API requests into a single batch. This optimizes efficiency and reduces response times.

Q: What happens if a job fails to execute?

A: In case of job failures, we'll retry the job a few times. If it continues to fail, you will be notified through the provided webhook or WebSocket, and you can then take appropriate action based on the status updates.

Q: Is your service compatible with other LLM API providers?

A: You can integrate and batch requests for other API providers with capabilities similar to the OpenAI API, such as Mistral. More integrations are on the way!

Get Early Access

Efflo.ai hasn't been released to the public just yet. Sign up to be the first to start using it!