🤖 Ollama AI Cheat Sheet

Complete Guide to Local AI Model Management

Master Ollama with this comprehensive cheat sheet covering installation, model management, API usage, and advanced configurations for running AI models locally.

📦 Installation

Linux

curl -fsSL https://ollama.ai/install.sh | sh

# Manual installation:
curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/local/bin/ollama
chmod +x /usr/local/bin/ollama

Windows

# Download the installer from https://ollama.ai/download
# Or use the Windows Package Manager:
winget install Ollama.Ollama

Docker

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With GPU support (NVIDIA):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
💡 Pro Tip
After installation, the Ollama service starts automatically. You can verify it's running by opening http://localhost:11434 in your browser.
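
If you prefer to verify from a script rather than a browser, here is a minimal sketch using only the Python standard library (it assumes the default host and port):

# Quick health check against the local Ollama server
from urllib.request import urlopen

with urlopen('http://localhost:11434') as resp:
    print(resp.read().decode())  # prints "Ollama is running" when the service is up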

Basic Commands

Run a Model
Start an interactive session with a model
ollama run [model_name]
List Models
Show all installed models
ollama list
Pull Model
Download a model from the registry
ollama pull [model_name]
Remove Model
Delete a model from local storage
ollama rm [model_name]
Show Model Info
Display model details and metadata
ollama show [model_name]
Copy Model
Create a copy of a model
ollama cp [source] [destination]

Common Usage Examples

# Start an interactive chat with Llama 2
ollama run llama2

# Single prompt (the model responds once and exits)
ollama run llama2 "Say hello"
ollama run llama2 "Explain quantum computing in simple terms"
ollama run llama2 "Write a Python function to sort a list"

# Pull the latest Code Llama model
ollama pull codellama

# List installed models
ollama list
NAME                ID              SIZE      MODIFIED
llama2:latest       e38ae474bf77    3.8 GB    2 hours ago
codellama:7b        8fdf8f752f6e    3.8 GB    3 hours ago

# Get model information
ollama show llama2

# Remove a model
ollama rm llama2:13b

🧠 Available Models

Llama 2 (7B, 13B, 70B)
Meta's flagship language model
Llama 3 (8B, 70B)
Latest Meta model with improved performance
Code Llama (7B, 13B, 34B)
Specialized for code generation
Mistral (7B)
High-performance 7B model
Mixtral (8x7B, 8x22B)
Mixture of experts model
DeepSeek Coder (1.3B, 6.7B, 33B)
Advanced coding model with fill-in-the-middle
DeepSeek LLM (7B, 67B)
General purpose reasoning model
WizardLM (7B, 13B)
Instruction-following model
WizardCoder (15B, 34B)
Advanced code generation
WizardMath (7B, 13B, 70B)
Mathematical reasoning specialist
OpenHermes 2.5 (7B)
High-quality synthetic training data
Neural Chat (7B)
Fine-tuned for conversations
Starling (7B)
RLHF fine-tuned model
Orca Mini (3B, 7B, 13B)
Compact reasoning model
Vicuna (7B, 13B, 33B)
ChatGPT-style conversations
Zephyr (7B)
Helpful assistant model
Dolphin (2.7B, 7B, 13B, 70B)
Uncensored fine-tune
Nous Hermes 2 (7B, 13B, 70B)
Versatile assistant model
Yi (6B, 34B)
Bilingual Chinese-English model
Qwen (7B, 14B, 72B)
Alibaba's multilingual model
Phi-2 (2.7B)
Microsoft's compact reasoning model
GPT-OSS (20B, 120B)
OpenAI's open-weight models

Model Tags and Versions

📚 Base Language Models

# Llama family
ollama pull llama2:7b
ollama pull llama2:13b
ollama pull llama2:70b
ollama pull llama3:8b
ollama pull llama3:70b

# Mistral family
ollama pull mistral:latest
ollama pull mixtral:8x7b

# GPT-OSS models
ollama pull gpt-oss:20b
ollama pull gpt-oss:120b

💻 Code-Specialized Models

# Code Llama variants
ollama pull codellama:7b-code
ollama pull codellama:7b-instruct
ollama pull codellama:7b-python

# DeepSeek Coder
ollama pull deepseek-coder:6.7b
ollama pull deepseek-coder:33b

# WizardCoder
ollama pull wizardcoder:15b

🧙 Specialized Task Models

# Instruction following
ollama pull wizardlm:7b
ollama pull wizardlm:13b

# Mathematical reasoning
ollama pull wizardmath:7b

# General reasoning
ollama pull deepseek-llm:7b
ollama pull deepseek-llm:67b

🌟 Community & Fine-tuned Models

# Conversational models
ollama pull openhermes:7b
ollama pull nous-hermes2:7b

# Multilingual models
ollama pull qwen:7b
ollama pull yi:34b

# Compact models
ollama pull phi:2.7b
⚠️ Storage Requirements
Models can be large! 7B models ~4GB, 13B models ~7GB, 70B models ~40GB. Ensure you have sufficient disk space.

🔌 API Usage

Generate Completion

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
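
The same request can be made from Python without extra dependencies; the sketch below assumes the default endpoint and that the llama2 model is already pulled:

# Non-streaming call to /api/generate using only the standard library
import json
from urllib.request import Request, urlopen

payload = json.dumps({
    'model': 'llama2',
    'prompt': 'Why is the sky blue?',
    'stream': False,
}).encode()

req = Request('http://localhost:11434/api/generate', data=payload,
              headers={'Content-Type': 'application/json'})
with urlopen(req) as resp:
    print(json.loads(resp.read())['response'])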

Chat Completion

curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Hello, how are you?" }
  ]
}'

Python Client

pip install ollama

# Python usage
import ollama

response = ollama.chat(model='llama2', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])

JavaScript/Node.js

npm install ollama

// JavaScript usage
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)

Streaming Responses

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Tell me a story",
  "stream": true
}'

# Python streaming
for chunk in ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Tell me a story'}],
    stream=True,
):
    print(chunk['message']['content'], end='', flush=True)

API Endpoints

Endpoint Method Description
/api/generate POST Generate text completion
/api/chat POST Chat with conversational models
/api/tags GET List available models
/api/show POST Show model information
/api/pull POST Download a model
/api/push POST Upload a model
/api/delete DELETE Remove a model
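
As a quick example of the read-only endpoints, the sketch below lists installed models through /api/tags (field names match the current API, but verify against your Ollama version):

# List locally installed models via GET /api/tags
import json
from urllib.request import urlopen

with urlopen('http://localhost:11434/api/tags') as resp:
    models = json.loads(resp.read())['models']

for m in models:
    print(m['name'], m['size'])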

🔊 Text-to-Speech Integration

Using espeak (Linux)

# Install espeak
sudo apt install espeak

# Pipe Ollama output to speech
ollama run llama2 "Tell me a joke" | espeak

# With custom voice settings
ollama run llama2 "Hello world" | espeak -s 150 -v en+f3
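
The same idea works from Python: generate a response with the client library and hand it to espeak through a subprocess. This is a sketch that assumes both the ollama package and espeak are installed:

# Generate text with Ollama and speak it with espeak
import subprocess

import ollama  # pip install ollama

reply = ollama.generate(model='llama2', prompt='Tell me a joke')['response']
subprocess.run(['espeak', '-s', '150'], input=reply.encode())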

Using Festival (Linux)

# Install Festival
sudo apt install festival

# Pipe Ollama output to speech
ollama run llama2 "Tell me a joke" | festival --tts

🚀 Advanced Configuration

Environment Variables

# Set custom model storage directory
export OLLAMA_MODELS=/path/to/models

# Change default host and port
export OLLAMA_HOST=0.0.0.0:11434

# Enable debug logging
export OLLAMA_DEBUG=1

# Maximum number of models kept loaded in memory
export OLLAMA_MAX_LOADED_MODELS=3

# Maximum parallel requests per loaded model
export OLLAMA_NUM_PARALLEL=4

# Restrict which GPUs Ollama can use (NVIDIA)
export CUDA_VISIBLE_DEVICES=0,1

Model Configuration

# Create a custom model with a Modelfile
cat > Modelfile << EOF
FROM llama2
PARAMETER temperature 0.8
PARAMETER num_predict 100
SYSTEM "You are a helpful AI assistant."
EOF

# Build the custom model
ollama create mymodel -f Modelfile
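
Once built, the custom model behaves like any other tag. For example, with the Python client (assuming the mymodel name created above):

# Chat with the custom model created from the Modelfile
import ollama

response = ollama.chat(model='mymodel', messages=[
    {'role': 'user', 'content': 'Introduce yourself.'},
])
print(response['message']['content'])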

Performance Tuning

Parameter Description Default
num_predict Maximum tokens to generate 128
temperature Randomness (0.0-2.0) 0.8
top_k Top-K sampling 40
top_p Top-P sampling 0.9
repeat_penalty Repetition penalty 1.1
num_ctx Context window size 2048
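
These parameters can be baked into a Modelfile with PARAMETER lines or overridden per request. Below is a sketch of the per-request form with the Python client (the options field is also accepted by the raw /api/generate and /api/chat endpoints):

# Override generation parameters for a single request
import ollama

response = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Summarize Ollama in one sentence.'}],
    options={
        'temperature': 0.2,   # less random output
        'num_ctx': 4096,      # larger context window
        'num_predict': 256,   # cap the number of generated tokens
    },
)
print(response['message']['content'])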

Service Management

# Start the Ollama service manually
ollama serve

# Run as a systemd service (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama

# Check service logs
journalctl -u ollama -f

🔧 Troubleshooting

Common Issues

Connection Refused
Check if Ollama service is running: ps aux | grep ollama
Restart service: ollama serve
Out of Memory
Try smaller models (7B instead of 13B/70B)
Increase system swap space
Close other applications
GPU Not Detected
Install NVIDIA drivers and CUDA toolkit
Check: nvidia-smi
Restart Ollama after driver installation

Diagnostic Commands

# Check Ollama version
ollama --version

# Check system resources
free -h
df -h
nvidia-smi

# Test API connectivity
curl http://localhost:11434/api/tags

# Check which models are currently loaded
ollama ps

# View logs (Linux)
journalctl -u ollama --no-pager

# View logs (Windows, PowerShell)
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50

Performance Optimization

💡 Optimization Tips
• Use SSD storage for model files
• Ensure sufficient RAM (8GB+ for 7B models)
• Enable GPU acceleration when available
• Adjust context window size based on use case
• Use model quantization for lower memory usage

Network Configuration

# Allow external connections
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

# Configure firewall (Ubuntu)
sudo ufw allow 11434

# Test remote connection
curl http://your-server-ip:11434/api/tags
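
Clients can then point at the remote server. With the Python library that looks roughly like this (your-server-ip is a placeholder for your actual address):

# Connect to a remote Ollama server from the Python client
from ollama import Client

client = Client(host='http://your-server-ip:11434')
response = client.chat(model='llama2', messages=[
    {'role': 'user', 'content': 'Hello from a remote client!'},
])
print(response['message']['content'])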