Generating AI Presentations Locally with Ollama: A Complete Developer Guide

Run Presenton with Local AI Models for Privacy-First Presentation Generation


What Exactly is Ollama?

Think of Ollama as Docker for AI models. Instead of making API calls to OpenAI or Anthropic, you download AI models directly to your machine and run them locally. No internet required (after the initial download), no per-token costs, and absolutely no data leaving your infrastructure.

Ollama takes care of all the complex stuff – model management, memory optimization, GPU acceleration – so you can just say "run Llama 3.2" and it works. It's like having ChatGPT running on your laptop, except it's completely under your control.
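
If you want to try Ollama on its own before wiring it into Presenton, the CLI workflow looks like this minimal sketch (the model tag is just an example):

# Download a model to the local cache (roughly analogous to docker pull)
ollama pull llama3.2:3b

# Start an interactive chat, or pass a one-off prompt
ollama run llama3.2:3b "Outline a five-slide presentation on renewable energy"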

The best part? These open-source models have gotten seriously capable. Many perform at GPT-3.5 levels for presentation generation tasks, and some excel in specific areas. All running on hardware you control.

Why This Matters for Presentation Generation

When you're generating presentations with AI, you're often working with sensitive information – financial data, strategic plans, customer details, competitive analysis. Many organizations prefer to keep this data on their own infrastructure rather than sending it to external services or APIs.

There's also the cost consideration. If you're generating presentations regularly – automated reports, training materials, or batch processing – API costs can become significant. With local models, you pay once for the hardware and generate unlimited presentations.

Setting Up Local AI Presentations

The setup is straightforward if you're comfortable with Docker. Presenton can automatically manage Ollama models, or connect to an existing Ollama server you're already running.

Basic Setup with Auto-Managed Models

Start with this command:

docker run -it --name presenton -p 5000:80 \
  -e LLM="ollama" \
  -e OLLAMA_MODEL="llama3.2:3b" \
  -e PEXELS_API_KEY="your_pexels_api_key" \
  -e CAN_CHANGE_KEYS="false" \
  -v "./user_data:/app/user_data" \
  ghcr.io/presenton/presenton:latest

This setup automatically downloads and manages the specified model. The first run takes a few minutes while the model downloads; after that, generation is fast.
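
To confirm the container started and to watch the initial model download, you can tail the logs (assuming the container name and port mapping above):

docker logs -f presenton
# Once the download finishes, the app is available at http://localhost:5000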

Connecting to Existing Ollama

If you already have Ollama running elsewhere, point Presenton to your existing setup:

docker run -it --name presenton -p 5000:80 \
  -e LLM="ollama" \
  -e OLLAMA_MODEL="llama3.2:3b" \
  -e OLLAMA_URL="http://your-ollama-server:11434" \
  -e PEXELS_API_KEY="your_pexels_api_key" \
  -e CAN_CHANGE_KEYS="false" \
  -v "./user_data:/app/user_data" \
  ghcr.io/presenton/presenton:latest

This approach works well when you have a dedicated server running Ollama with GPU acceleration, or want to share models across multiple applications.
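
Before pointing Presenton at an external server, it helps to confirm the Ollama API is reachable from the Docker host; Ollama's tags endpoint lists the models already pulled on that server:

curl http://your-ollama-server:11434/api/tags
# Returns JSON describing the locally available models; pull any missing model on that server first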

Model Selection Guide

Different models offer different trade-offs between resource usage and output quality:

  • For Testing: llama3.2:3b (2GB) - Lightweight, runs on most hardware, good for basic presentations
  • For Production: llama3.3:70b (43GB) - High quality output, supports charts and graphs, requires substantial resources
  • Balanced Option: deepseek-r1:32b (20GB) - Good quality with reasonable resource requirements
  • Minimal Resources: gemma3:1b (815MB) - Very lightweight, basic functionality

The key distinction: larger models with graph support (such as llama3.3:70b) can generate actual charts and diagrams, while smaller models are limited to text-based slides.
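
If you run your own Ollama server, ollama list shows which models are already downloaded and how much disk space each one takes, which is useful when weighing these options:

ollama list
# Prints each local model's name, ID, size on disk, and last-modified time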

GPU Acceleration

Adding GPU support significantly speeds up generation. With NVIDIA GPUs, add --gpus=all:

docker run -it --name presenton --gpus=all -p 5000:80 \
  -e LLM="ollama" \
  -e OLLAMA_MODEL="llama3.3:70b" \
  -e PEXELS_API_KEY="your_pexels_api_key" \
  -e CAN_CHANGE_KEYS="false" \
  -v "./user_data:/app/user_data" \
  ghcr.io/presenton/presenton:latest

This can reduce generation time from minutes to seconds, especially with larger models. It requires the NVIDIA Container Toolkit to be installed on the host.
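
To verify that Docker can actually see the GPU before launching Presenton, run nvidia-smi in a throwaway CUDA container (the image tag here is only an example; any CUDA base image works):

docker run --rm --gpus=all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
# If this prints the GPU table, the NVIDIA Container Toolkit is configured correctly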

Configuration Options

Here's what each environment variable controls:

  • LLM="ollama" - Selects Ollama as the AI backend
  • OLLAMA_MODEL="llama3.2:3b" - Specifies which model to use (downloads automatically if needed)
  • OLLAMA_URL="http://..." - Points to external Ollama server (optional)
  • PEXELS_API_KEY="..." - Enables automatic stock image integration
  • CAN_CHANGE_KEYS="false" - Locks configuration for production deployments
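
Instead of repeating -e flags, these settings can live in an env file passed with Docker's --env-file option. A sketch (the file name is arbitrary):

cat > presenton.env <<'EOF'
LLM=ollama
OLLAMA_MODEL=llama3.2:3b
PEXELS_API_KEY=your_pexels_api_key
CAN_CHANGE_KEYS=false
EOF

docker run -it --name presenton -p 5000:80 \
  --env-file presenton.env \
  -v "./user_data:/app/user_data" \
  ghcr.io/presenton/presenton:latest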

Adding Stock Images

For professional-looking presentations, get a free Pexels API key at pexels.com/api. This enables automatic stock image integration, giving you 200 requests per hour on the free tier.

Performance Expectations

Performance varies significantly based on hardware:

  • Apple Silicon M1/M2: 3B models run smoothly, 8B models are slower but usable, 70B models are very slow
  • RTX 4080/4090: All models run well, including large 70B models
  • CPU Only: Best to stick with smaller models (3B or less) for reasonable generation times
  • Server Hardware: Dedicated GPU servers provide the best performance for production use

API Usage

Once running, the API works identically to cloud-based setups:

curl -X POST http://localhost:5000/api/v1/ppt/generate/presentation \
  -F "prompt=Quarterly sales review with growth charts and competitor analysis" \
  -F "n_slides=10" \
  -F "theme=royal_blue" \
  -F "export_as=pptx"

The advantage is that all data processing happens locally – no external API calls or data transmission.
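
From scripts, you can capture the response for further processing; treat this as a sketch, since the exact response fields depend on your Presenton version:

curl -s -X POST http://localhost:5000/api/v1/ppt/generate/presentation \
  -F "prompt=Team onboarding overview" \
  -F "n_slides=6" \
  -F "theme=royal_blue" \
  -F "export_as=pptx" \
  -o response.json

# Inspect the returned JSON; field names vary between Presenton versions
cat response.json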

Production Considerations

For production deployments, consider these optimizations:

Model Persistence: Add -v "./ollama_models:/root/.ollama" to avoid re-downloading models after container restarts.
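
For example, the basic command from earlier with the model cache persisted on the host (same settings as before, plus the extra volume):

docker run -it --name presenton -p 5000:80 \
  -e LLM="ollama" \
  -e OLLAMA_MODEL="llama3.2:3b" \
  -e PEXELS_API_KEY="your_pexels_api_key" \
  -e CAN_CHANGE_KEYS="false" \
  -v "./user_data:/app/user_data" \
  -v "./ollama_models:/root/.ollama" \
  ghcr.io/presenton/presenton:latest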

Memory Planning: Each model loads entirely into RAM/VRAM. Ensure sufficient memory for your chosen models.

Network Isolation: Can run completely offline after initial model downloads.

Model Management: You can switch models without restarting Presenton by updating environment variables.

Common Issues and Solutions

  • Model download fails: Usually network connectivity or insufficient disk space
  • GPU not detected: Verify NVIDIA Container Toolkit installation
  • Out of memory errors: Model too large for available hardware - try a smaller model
  • Slow generation: Enable GPU acceleration or use a lighter model

Benefits of Local AI for Presentations

The local approach offers several advantages:

  • Privacy: Sensitive data never leaves your infrastructure
  • Cost Control: No per-token fees after initial hardware investment
  • Reliability: No dependency on external API availability
  • Customization: Full control over the AI pipeline and models
  • Compliance: Easier to meet data governance requirements

Expanding Your Setup

Once you have basic presentation generation working, you can:

  • Automate presentation creation from data sources (see the sketch after this list)
  • Create organization-specific templates and themes
  • Build presentation APIs for internal applications
  • Scale with multiple Presenton instances
  • Integrate with existing business intelligence tools
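
As a starting point for the first item above, a small shell loop can feed prompts from a file into the API (a sketch assuming the container from earlier is running on localhost:5000; prompts.txt is a hypothetical file with one prompt per line):

i=0
while IFS= read -r prompt; do
  i=$((i + 1))
  curl -s -X POST http://localhost:5000/api/v1/ppt/generate/presentation \
    -F "prompt=${prompt}" \
    -F "n_slides=8" \
    -F "theme=royal_blue" \
    -F "export_as=pptx" \
    -o "response_${i}.json"
done < prompts.txt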

Getting Started

The local AI approach works well for organizations that prioritize data privacy, want predictable costs, or need reliable offline operation. Start with a small model to test the workflow, then scale up based on your quality and performance requirements.

For detailed setup instructions, check the Presenton documentation. The Discord community is also helpful for troubleshooting and model recommendations.
