🤖 Ollama AI Cheat Sheet

Complete Guide to Local AI Model Management

Master Ollama with this comprehensive cheat sheet covering installation, model management, API usage, and advanced configurations for running AI models locally.

📦 Installation

Linux

curl -fsSL https://ollama.ai/install.sh | sh

# Manual installation:
curl -L https://ollama.ai/download/ollama-linux-amd64 -o /usr/local/bin/ollama
chmod +x /usr/local/bin/ollama

Windows

# Download the installer from https://ollama.ai/download
# Or use the Windows Package Manager:
winget install Ollama.Ollama

Docker

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With GPU support (NVIDIA):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
💡 Pro Tip
After installation, the Ollama service starts automatically. You can verify it's running by opening http://localhost:11434 in your browser.
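
If you prefer to verify from a script rather than a browser, here is a minimal sketch using only the Python standard library (it assumes the default host and port):

# Quick health check against the local Ollama server
from urllib.request import urlopen

with urlopen('http://localhost:11434') as resp:
    print(resp.read().decode())  # prints "Ollama is running" when the service is up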

Basic Commands

Run a Model
Start an interactive session with a model
ollama run [model_name]
List Models
Show all installed models
ollama list
Pull Model
Download a model from the registry
ollama pull [model_name]
Remove Model
Delete a model from local storage
ollama rm [model_name]
Show Model Info
Display model details and metadata
ollama show [model_name]
Copy Model
Create a copy of a model
ollama cp [source] [destination]

Common Usage Examples

# Start an interactive chat with Llama 2
ollama run llama2

# Single prompt (the model responds once and exits)
ollama run llama2 "Say hello"
ollama run llama2 "Explain quantum computing in simple terms"
ollama run llama2 "Write a Python function to sort a list"

# Pull the latest Code Llama model
ollama pull codellama

# List installed models
ollama list
NAME                ID              SIZE      MODIFIED
llama2:latest       e38ae474bf77    3.8 GB    2 hours ago
codellama:7b        8fdf8f752f6e    3.8 GB    3 hours ago

# Get model information
ollama show llama2

# Remove a model
ollama rm llama2:13b

🧠 Available Models

Llama 2 (7B, 13B, 70B)
Meta's flagship language model
Llama 3 (8B, 70B)
Latest Meta model with improved performance
Code Llama (7B, 13B, 34B)
Specialized for code generation
Mistral (7B)
High-performance 7B model
Mixtral (8x7B, 8x22B)
Mixture of experts model
DeepSeek Coder (1.3B, 6.7B, 33B)
Advanced coding model with fill-in-the-middle
DeepSeek LLM (7B, 67B)
General purpose reasoning model
WizardLM (7B, 13B)
Instruction-following model
WizardCoder (15B, 34B)
Advanced code generation
WizardMath (7B, 13B, 70B)
Mathematical reasoning specialist
OpenHermes 2.5 (7B)
High-quality synthetic training data
Neural Chat (7B)
Fine-tuned for conversations
Starling (7B)
RLHF fine-tuned model
Orca Mini (3B, 7B, 13B)
Compact reasoning model
Vicuna (7B, 13B, 33B)
ChatGPT-style conversations
Zephyr (7B)
Helpful assistant model
Dolphin (2.7B, 7B, 13B, 70B)
Uncensored fine-tune
Nous Hermes 2 (7B, 13B, 70B)
Versatile assistant model
Yi (6B, 34B)
Bilingual Chinese-English model
Qwen (7B, 14B, 72B)
Alibaba's multilingual model
Phi-2 (2.7B)
Microsoft's compact reasoning model
GPT-OSS (20B, 120B)
OpenAI's open-weight models

Model Tags and Versions

📚 Base Language Models

# Llama family
ollama pull llama2:7b
ollama pull llama2:13b
ollama pull llama2:70b
ollama pull llama3:8b
ollama pull llama3:70b

# Mistral family
ollama pull mistral:latest
ollama pull mixtral:8x7b

# GPT-OSS models
ollama pull gpt-oss:20b
ollama pull gpt-oss:120b

💻 Code-Specialized Models

# Code Llama variants
ollama pull codellama:7b-code
ollama pull codellama:7b-instruct
ollama pull codellama:7b-python

# DeepSeek Coder
ollama pull deepseek-coder:6.7b
ollama pull deepseek-coder:33b

# WizardCoder
ollama pull wizardcoder:15b

🧙 Specialized Task Models

# Instruction following
ollama pull wizardlm:7b
ollama pull wizardlm:13b

# Mathematical reasoning
ollama pull wizardmath:7b

# General reasoning
ollama pull deepseek-llm:7b
ollama pull deepseek-llm:67b

🌟 Community & Fine-tuned Models

# Conversational models
ollama pull openhermes:7b
ollama pull nous-hermes2:7b

# Multilingual models
ollama pull qwen:7b
ollama pull yi:34b

# Compact models
ollama pull phi:2.7b
⚠️ Storage Requirements
Models can be large! 7B models ~4GB, 13B models ~7GB, 70B models ~40GB. Ensure you have sufficient disk space.

🔌 API Usage

Generate Completion

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
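
The same request can be made from Python without extra dependencies; the sketch below assumes the default endpoint and that the llama2 model is already pulled:

# Non-streaming call to /api/generate using only the standard library
import json
from urllib.request import Request, urlopen

payload = json.dumps({
    'model': 'llama2',
    'prompt': 'Why is the sky blue?',
    'stream': False,
}).encode()

req = Request('http://localhost:11434/api/generate', data=payload,
              headers={'Content-Type': 'application/json'})
with urlopen(req) as resp:
    print(json.loads(resp.read())['response'])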

Chat Completion

curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Hello, how are you?" }
  ]
}'

Python Client

pip install ollama

# Python usage
import ollama

response = ollama.chat(model='llama2', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])

JavaScript/Node.js

npm install ollama

// JavaScript usage
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)

Streaming Responses

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Tell me a story",
  "stream": true
}'

# Python streaming
for chunk in ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Tell me a story'}],
    stream=True,
):
    print(chunk['message']['content'], end='', flush=True)

API Endpoints

Endpoint Method Description
/api/generate POST Generate text completion
/api/chat POST Chat with conversational models
/api/tags GET List available models
/api/show POST Show model information
/api/pull POST Download a model
/api/push POST Upload a model
/api/delete DELETE Remove a model
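
As a quick example of the read-only endpoints, the sketch below lists installed models through /api/tags (field names match the current API, but verify against your Ollama version):

# List locally installed models via GET /api/tags
import json
from urllib.request import urlopen

with urlopen('http://localhost:11434/api/tags') as resp:
    models = json.loads(resp.read())['models']

for m in models:
    print(m['name'], m['size'])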

🔊 Text-to-Speech Integration

Using espeak (Linux)

# Install espeak
sudo apt install espeak

# Pipe Ollama output to speech
ollama run llama2 "Tell me a joke" | espeak

# With custom voice settings
ollama run llama2 "Hello world" | espeak -s 150 -v en+f3
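
The same idea works from Python: generate a response with the client library and hand it to espeak through a subprocess. This is a sketch that assumes both the ollama package and espeak are installed:

# Generate text with Ollama and speak it with espeak
import subprocess

import ollama  # pip install ollama

reply = ollama.generate(model='llama2', prompt='Tell me a joke')['response']
subprocess.run(['espeak', '-s', '150'], input=reply.encode())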

Using Festival (Linux)

# Install Festival
sudo apt install festival

# Pipe Ollama output to speech
ollama run llama2 "Tell me a joke" | festival --tts

🚀 Advanced Configuration

Environment Variables

# Set custom model storage directory
export OLLAMA_MODELS=/path/to/models

# Change default host and port
export OLLAMA_HOST=0.0.0.0:11434

# Enable debug logging
export OLLAMA_DEBUG=1

# Maximum number of models kept loaded in memory
export OLLAMA_MAX_LOADED_MODELS=3

# Maximum parallel requests per loaded model
export OLLAMA_NUM_PARALLEL=4

# Restrict which GPUs Ollama can use (NVIDIA)
export CUDA_VISIBLE_DEVICES=0,1

Model Configuration

# Create a custom model with a Modelfile
cat > Modelfile << EOF
FROM llama2
PARAMETER temperature 0.8
PARAMETER num_predict 100
SYSTEM "You are a helpful AI assistant."
EOF

# Build the custom model
ollama create mymodel -f Modelfile
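
Once built, the custom model behaves like any other tag. For example, with the Python client (assuming the mymodel name created above):

# Chat with the custom model created from the Modelfile
import ollama

response = ollama.chat(model='mymodel', messages=[
    {'role': 'user', 'content': 'Introduce yourself.'},
])
print(response['message']['content'])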

Performance Tuning

Parameter Description Default
num_predict Maximum tokens to generate 128
temperature Randomness (0.0-2.0) 0.8
top_k Top-K sampling 40
top_p Top-P sampling 0.9
repeat_penalty Repetition penalty 1.1
num_ctx Context window size 2048
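
These parameters can be baked into a Modelfile with PARAMETER lines or overridden per request. Below is a sketch of the per-request form with the Python client (the options field is also accepted by the raw /api/generate and /api/chat endpoints):

# Override generation parameters for a single request
import ollama

response = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Summarize Ollama in one sentence.'}],
    options={
        'temperature': 0.2,   # less random output
        'num_ctx': 4096,      # larger context window
        'num_predict': 256,   # cap the number of generated tokens
    },
)
print(response['message']['content'])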

Service Management

# Start the Ollama service manually
ollama serve

# Run as a systemd service (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama

# Check service logs
journalctl -u ollama -f

🔧 Troubleshooting

Common Issues

Connection Refused
Check if Ollama service is running: ps aux | grep ollama
Restart service: ollama serve
Out of Memory
Try smaller models (7B instead of 13B/70B)
Increase system swap space
Close other applications
GPU Not Detected
Install NVIDIA drivers and CUDA toolkit
Check: nvidia-smi
Restart Ollama after driver installation

Diagnostic Commands

# Check Ollama version
ollama --version

# Check system resources
free -h
df -h
nvidia-smi

# Test API connectivity
curl http://localhost:11434/api/tags

# Check which models are currently loaded
ollama ps

# View logs (Linux)
journalctl -u ollama --no-pager

# View logs (Windows, PowerShell)
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50

Performance Optimization

💡 Optimization Tips
• Use SSD storage for model files
• Ensure sufficient RAM (8GB+ for 7B models)
• Enable GPU acceleration when available
• Adjust context window size based on use case
• Use model quantization for lower memory usage

Network Configuration

# Allow external connections
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

# Configure firewall (Ubuntu)
sudo ufw allow 11434

# Test remote connection
curl http://your-server-ip:11434/api/tags
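
Clients can then point at the remote server. With the Python library that looks roughly like this (your-server-ip is a placeholder for your actual address):

# Connect to a remote Ollama server from the Python client
from ollama import Client

client = Client(host='http://your-server-ip:11434')
response = client.chat(model='llama2', messages=[
    {'role': 'user', 'content': 'Hello from a remote client!'},
])
print(response['message']['content'])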