Master Ollama with this comprehensive cheat sheet covering installation, model management,
API usage, and advanced configurations for running AI models locally.
Windows
# Download installer from https://ollama.ai/download
# Or use Windows Package Manager:
winget install Ollama.Ollama
Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With GPU support (NVIDIA):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
💡 Pro Tip
After installation, the Ollama service starts automatically. You can verify it's running by opening http://localhost:11434 in your browser.
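To script the same check, here is a minimal sketch in Python; it assumes the requests package is installed, and the root endpoint simply returns a short status message:
# Minimal reachability check for the local Ollama server
# (assumption: pip install requests)
import requests

try:
    r = requests.get('http://localhost:11434', timeout=5)
    print(r.status_code, r.text)  # expect 200 and a short "running" status message
except requests.ConnectionError:
    print('Ollama does not appear to be running on port 11434')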
⚡ Basic Commands
Run a Model
Start an interactive session with a model
ollama run [model_name]
List Models
Show all installed models
ollama list
Pull Model
Download a model from the registry
ollama pull [model_name]
Remove Model
Delete a model from local storage
ollama rm [model_name]
Show Model Info
Display model details and metadata
ollama show [model_name]
Copy Model
Create a copy of a model
ollama cp [source] [destination]
Common Usage Examples
# Start an interactive chat with Llama 2
ollama run llama2

# Single prompt (AI responds once)
ollama run llama2 "Say hello"
ollama run llama2 "Explain quantum computing in simple terms"
ollama run llama2 "Write a Python function to sort a list"

# Pull the latest Code Llama model
ollama pull codellama

# List all installed models
ollama list
NAME               ID              SIZE     MODIFIED
llama2:latest      e38ae474bf77    3.8 GB   2 hours ago
codellama:7b       8fdf8f752f6e    3.8 GB   3 hours ago

# Get model information
ollama show llama2

# Remove a model
ollama rm llama2:13b
JavaScript Library
npm install ollama

// JavaScript usage
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)
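The Python client follows the same pattern; a minimal non-streaming sketch, assuming the ollama package is installed (pip install ollama) and the llama2 model has already been pulled:
# Python usage (assumption: pip install ollama)
import ollama

response = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response['message']['content'])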
Streaming Responses
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Tell me a story",
"stream": true
}'

# Python streaming
import ollama

for chunk in ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Tell me a story'}],
    stream=True,
):
    print(chunk['message']['content'], end='', flush=True)
API Endpoints
Endpoint        Method   Description
/api/generate   POST     Generate text completion
/api/chat       POST     Chat with conversational models
/api/tags       GET      List available models
/api/show       POST     Show model information
/api/pull       POST     Download a model
/api/push       POST     Upload a model
/api/delete     DELETE   Remove a model
🔊 Text-to-Speech Integration
Using espeak (Linux)
# Install espeak
sudo apt install espeak

# Pipe Ollama output to speech
ollama run llama2 "Tell me a joke" | espeak

# With custom voice settings
ollama run llama2 "Hello world" | espeak -s 150 -v en+f3
Using Festival (Linux)
# Install Festival
sudo apt install festival

# Pipe Ollama output to Festival
ollama run llama2 "Tell me a joke" | festival --tts
🚀 Advanced Configuration
Environment Variables
# Set custom model storage directory
export OLLAMA_MODELS=/path/to/models

# Change default host and port
export OLLAMA_HOST=0.0.0.0:11434

# Enable debug logging
export OLLAMA_DEBUG=1

# Maximum number of models kept loaded in memory
export OLLAMA_MAX_LOADED_MODELS=3

# Number of parallel requests handled per model
export OLLAMA_NUM_PARALLEL=4

# GPU selection (NVIDIA)
export CUDA_VISIBLE_DEVICES=0,1
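If OLLAMA_HOST points somewhere other than the default, clients need to target the same address. A minimal sketch using the Client class from the ollama Python package; the address below is only an example:
# Point the Python client at a non-default host
# (assumption: pip install ollama; the address is illustrative)
from ollama import Client

client = Client(host='http://192.168.1.50:11434')
response = client.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Hello'}],
)
print(response['message']['content'])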
Model Configuration
# Create a custom model with a Modelfile
cat > Modelfile << EOF
FROM llama2
PARAMETER temperature 0.8
PARAMETER num_predict 100
SYSTEM "You are a helpful AI assistant."
EOF

# Build the custom model
ollama create mymodel -f Modelfile
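The new model can then be used like any other. A quick sketch with the Python client, reusing the mymodel name created above:
# Chat with the custom model built above
# (assumption: pip install ollama)
import ollama

reply = ollama.chat(
    model='mymodel',
    messages=[{'role': 'user', 'content': 'Introduce yourself.'}],
)
print(reply['message']['content'])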
Performance Tuning
Parameter        Description                  Default
num_predict      Maximum tokens to generate   128
temperature      Randomness (0.0-2.0)         0.8
top_k            Top-K sampling               40
top_p            Top-P sampling               0.9
repeat_penalty   Repetition penalty           1.1
num_ctx          Context window size          2048
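These parameters can also be overridden per request instead of baking them into a Modelfile; a minimal sketch passing an options dictionary through the Python client (the values are illustrative, not recommendations):
# Override sampling parameters for a single request
# (assumption: pip install ollama; values are illustrative)
import ollama

response = ollama.generate(
    model='llama2',
    prompt='Summarize the benefits of running LLMs locally in two sentences.',
    options={
        'temperature': 0.5,
        'top_p': 0.9,
        'num_predict': 200,
        'num_ctx': 4096,
    },
)
print(response['response'])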
Service Management
# Start the Ollama service manually
ollama serve

# Run as a systemd service (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama

# Check service logs
journalctl -u ollama -f
🔧 Troubleshooting
Common Issues
Connection Refused
Check if Ollama service is running: ps aux | grep ollama
Restart service: ollama serve
Out of Memory
Try smaller models (7B instead of 13B/70B)
Increase system swap space
Close other applications
GPU Not Detected
Install NVIDIA drivers and CUDA toolkit
Check: nvidia-smi
Restart Ollama after driver installation
Diagnostic Commands
# Check Ollama version
ollama --version

# Check system resources
free -h
df -h
nvidia-smi

# Test API connectivity
curl http://localhost:11434/api/tags

# List models currently loaded in memory
ollama ps

# View logs (Linux)
journalctl -u ollama --no-pager

# View logs (Windows, PowerShell)
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50
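The same checks can be scripted against the HTTP API; a small sketch (assuming the requests package) that prints the server version and any models currently loaded in memory:
# Scripted health check (assumption: pip install requests)
import requests

base = 'http://localhost:11434'
print('Version:', requests.get(f'{base}/api/version', timeout=5).json()['version'])

# /api/ps reports models loaded in memory (same information as `ollama ps`)
for m in requests.get(f'{base}/api/ps', timeout=5).json().get('models', []):
    print('Loaded:', m['name'])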
Performance Optimization
💡 Optimization Tips
• Use SSD storage for model files
• Ensure sufficient RAM (8GB+ for 7B models)
• Enable GPU acceleration when available
• Adjust context window size based on use case
• Use model quantization for lower memory usage