Speed Up Your LLM Workflows with Asyncio: Asynchronous Python for Faster AI
Why asyncio matters for AI
In many AI projects, much of the runtime is spent waiting — waiting for API responses, waiting for I/O, or waiting for multiple calls to complete. Python’s asyncio helps you avoid that idle time by letting I/O-bound tasks run concurrently inside a single thread using async/await syntax.
How asyncio works
asyncio schedules awaitable objects (usually coroutines) on an event loop. Instead of blocking while waiting for I/O, a coroutine yields control with await, letting other coroutines run. A simple analogy: synchronous code is a single checkout lane where the clerk stands idle while each customer digs for their wallet; asynchronous code is one clerk working several lanes, turning to the next customer whenever the current one is busy. You get more throughput when tasks spend most of their time waiting.
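To see that hand-off concretely, here is a minimal sketch (the worker name is illustrative): two coroutines share one thread, and their output interleaves because every await hands control back to the event loop.

import asyncio

async def worker(name: str):
    for step in range(3):
        print(f"{name}: step {step}")
        await asyncio.sleep(0.1)  # yield control so the other coroutine can run

async def main():
    # One thread, one event loop, two coroutines taking turns
    await asyncio.gather(worker("A"), worker("B"))

if __name__ == "__main__":
    asyncio.run(main())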
Examples: synchronous vs asynchronous tasks
Synchronous example: three sequential calls to a function that sleeps for 2 seconds each. The total time is roughly the sum of the waits, about 6 seconds.
import time

def say_hello():
    print("Hello...")
    time.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

def main():
    say_hello()
    say_hello()
    say_hello()

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Finished in {time.time() - start:.2f} seconds")
Asynchronous version: the three coroutines start almost simultaneously and sleep concurrently, so the total runtime is about the longest individual wait (roughly 2 seconds) rather than the sum.
import asyncio
import time

import nest_asyncio
nest_asyncio.apply()  # only needed in notebooks, where an event loop is already running

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

async def main():
    # Run all three coroutines concurrently
    await asyncio.gather(
        say_hello(),
        say_hello(),
        say_hello()
    )

if __name__ == "__main__":
    start = time.time()
    asyncio.run(main())
    print(f"Finished in {time.time() - start:.2f} seconds")
Download simulation: multiple downloads running concurrently. Each download simulates a variable duration with asyncio.sleep; because they run concurrently, all five finish in roughly the time of the slowest single download.
import asyncio
import random
import time

async def download_file(file_id: int):
    print(f"Start downloading file {file_id}")
    download_time = random.uniform(1, 3)  # simulate variable download time
    await asyncio.sleep(download_time)  # non-blocking wait
    print(f"Finished downloading file {file_id} in {download_time:.2f} seconds")
    return f"File {file_id} content"

async def main():
    files = [1, 2, 3, 4, 5]
    start_time = time.time()
    # Run all downloads concurrently
    results = await asyncio.gather(*(download_file(f) for f in files))
    end_time = time.time()
    print("\nAll downloads completed.")
    print(f"Total time taken: {end_time - start_time:.2f} seconds")
    print("Results:", results)

if __name__ == "__main__":
    asyncio.run(main())
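If you would rather handle each download the moment it finishes instead of waiting for the whole batch, asyncio.as_completed yields tasks in finish order. A minimal sketch built on the same simulated download (simplified from the version above):

import asyncio
import random

async def download_file(file_id: int) -> str:
    await asyncio.sleep(random.uniform(1, 3))  # simulated variable download time
    return f"File {file_id} content"

async def main():
    tasks = [asyncio.create_task(download_file(f)) for f in (1, 2, 3, 4, 5)]
    # Iterate in completion order: the fastest download is processed first
    for finished in asyncio.as_completed(tasks):
        print("Got:", await finished)

if __name__ == "__main__":
    asyncio.run(main())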
Applying asyncio to LLM-based AI workflows
LLM calls (OpenAI, Anthropic, Hugging Face, etc.) are typical I/O-bound operations: each request spends most time waiting for the server. Running many prompts sequentially quickly adds up. Below is a practical comparison using the OpenAI client: first a synchronous approach, then an async one.
Prerequisites and setup (example):
!pip install openai

import asyncio
import os
from getpass import getpass

from openai import AsyncOpenAI

os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
Synchronous client example (makes requests one by one):
import time
from openai import OpenAI

# Create sync client (reads OPENAI_API_KEY from the environment)
client = OpenAI()

def ask_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = []
    for prompt in prompts:  # each call blocks until the previous one finishes
        results.append(ask_llm(prompt))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Synchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    main()
Asynchronous client example (starts all requests nearly at once):
import asyncio
import time
from openai import AsyncOpenAI

# Create async client
client = AsyncOpenAI()

async def ask_llm(prompt: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    # Start all requests nearly at once and wait for every response
    results = await asyncio.gather(*(ask_llm(p) for p in prompts))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Asynchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    asyncio.run(main())
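One caveat: gather fires every request at once, which can trip provider rate limits on large prompt lists. A common pattern is to cap in-flight requests with asyncio.Semaphore. A minimal sketch under that assumption (the limit of 5 and the ask_llm_limited name are illustrative; tune the limit to your quota):

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
semaphore = asyncio.Semaphore(5)  # at most 5 requests in flight at any moment

async def ask_llm_limited(prompt: str) -> str:
    async with semaphore:  # wait for a free slot before calling the API
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

async def run_all(prompts: list[str]) -> list[str]:
    # Still concurrent, but never more than 5 simultaneous requests
    return await asyncio.gather(*(ask_llm_limited(p) for p in prompts))

You keep most of the concurrency win while bounding how hard you hit the API.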
Practical benefits for AI workflows
- Improved throughput: multiple API calls execute while others are waiting, so overall runtime drops.
- Cost and resource efficiency: faster processing can reduce compute time and associated costs.
- Better user experience: concurrent handling of requests makes interactive apps more responsive.
- Scalability: async patterns let you handle many simultaneous tasks without linearly increasing resources.
When to choose asyncio
Use asyncio when tasks are I/O-bound (API calls, network requests, database queries). If your workload is CPU-bound, multiprocessing or other parallelism techniques may be a better fit.
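To illustrate that boundary, here is a minimal sketch (cpu_heavy and the pool size are illustrative, not from the original): run_in_executor hands CPU-bound work to separate processes while the event loop stays responsive.

import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    # Pure computation: no I/O to overlap, so asyncio alone won't speed it up
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call runs in its own process, sidestepping the GIL
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, cpu_heavy, 1_000_000) for _ in range(4))
        )
    print(results)

if __name__ == "__main__":
    asyncio.run(main())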
If your AI app makes many LLM requests, combines multiple API calls, or serves many users concurrently, adding asyncio can drastically reduce waiting times and make your system feel snappier.