BrainHoster
LLM Quickstart Guide
© 2025 Dev Advisory UK Ltd.
This guide provides step-by-step instructions for running Large Language Models (LLMs) on
your BrainHoster VPS using the Ollama framework. Please note that some familiarity with the
Linux command line is assumed.
Connect to your VPS
You can connect from Windows, Linux, or macOS using the ssh command. Open a terminal and run:
ssh username@your_vps_ip
- Replace `username` with your VPS username (e.g., `root`).
- Replace `your_vps_ip` with your VPS’s IP address or hostname.
Tip: Use SSH keys for secure, password-free access. Generate a key pair with `ssh-keygen` and copy it to your VPS using
`ssh-copy-id`, as shown below.
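A minimal example, assuming an Ed25519 key and the same username and IP placeholders as above:
# On your local machine: generate a key pair (accept the default file location)
ssh-keygen -t ed25519
# Copy the public key to the VPS (you will enter your password one last time)
ssh-copy-id username@your_vps_ip
# From now on, log in without a password
ssh username@your_vps_ip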
Run your first model
Once connected, you can run a model using the following command:
ollama run <model_name>:<model_tag>
Example:
ollama run mistral:7b
- First-time setup: When you run a model for the first time, Ollama automatically downloads and stores it locally on your VPS. This may take some time depending on the model size.
- Terminating a chat session: To exit the current chat session with an LLM, type /bye.
- Model tags: Use :latest for the most recent version or specify a variant (e.g., :16b for 16B parameter models).
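Note: Ollama is assumed to come preinstalled on BrainHoster LLM plans; check your plan details. If the ollama command is not found on your VPS, you can install it with the official script from ollama.com:
curl -fsSL https://ollama.com/install.sh | sh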
Running another model
Run the same command for each additional model you would like to use:
ollama run <model_name>:<model_tag>
Examples:
ollama run qwen3:8b
ollama run deepseek-coder-v2:16b
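Tip: to download a model ahead of time without starting a chat session, you can also use ollama pull:
# Fetch the model weights only; run the model later with ollama run
ollama pull deepseek-coder-v2:16b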
To monitor the resources of your VPS, use the following commands:
Check RAM usage:
free -h
Check SSD space:
df -h
Monitor CPU/memory in real time:
top
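On recent Ollama versions you can also check which models are currently loaded into memory, and how much memory they occupy, with:
ollama ps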
To list the installed models and remove any you no longer need, use the following commands:
List models:
ollama list
Remove a model:
ollama rm <model_name>:<model_tag>
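For example, to reclaim disk space by deleting the Mistral model downloaded earlier:
ollama rm mistral:7b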
DeepSeek-Coder-V2:16b
- Description: Open-source Mixture-of-Experts (MoE) model optimized for code generation and reasoning
- Use Cases: Code completion, bug fixing, API integration
- Performance: Reported by DeepSeek to match GPT-4 Turbo in code-specific tasks
- Training Data: 1.5T tokens (code + text)
- Command: ollama run deepseek-coder-v2:16b
DeepSeek-R1:7b
- Description: Open reasoning model with strong performance in natural language understanding and logical deduction
- Use Cases: Conversational AI, data analysis, multi-step reasoning
- Performance: Reported by DeepSeek to approach o3 and Gemini 2.5 Pro in benchmarks (figures refer to the full-size R1; the 7B distilled variant is less capable)
- Training Data: 1T tokens (general text)
- Command: ollama run deepseek-r1:7b
Qwen3:8b
- Description: Latest Qwen series model with dense and MoE variants for versatility
- Use Cases: Multilingual support, enterprise applications, large-scale inference
- Performance: Reported to outperform Qwen2 by roughly 20% on reasoning tasks
- Training Data: 1.5T tokens (global text)
- Command: ollama run qwen3:8b
Llama 3:8b
- Description: Third generation of Meta’s Llama series, offering improved reasoning, coding, and multilingual capabilities
- Use Cases: Advanced AI assistants, research, and large-scale inference
- Performance: Reported to outperform Llama 2 7B by roughly 30% on reasoning tasks
- Training Data: 15T tokens (diverse global datasets)
- Command: ollama run llama3:8b
Mistral:7b
- Description: High-efficiency 7B model outperforming Llama 2 13B on benchmarks
- Use Cases: Chatbots, content creation, multilingual support
- Performance: Reported to be roughly 30% faster than Llama 2 13B on standard tasks
- Training Data: 100B tokens (diverse datasets)
- Command: ollama run mistral:7b
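Any of these models can also be run non-interactively by passing a prompt directly on the command line, which is convenient for quick tests and shell scripts. For example:
# One-shot prompt: prints the response and exits instead of opening a chat session
ollama run mistral:7b "Explain SSH key authentication in one sentence."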
For the full list of models supported by Ollama, visit https://ollama.com/library.
Copyright and credit for the models referenced in this quickstart guide belong to:
- Alibaba Group Holding Limited (Qwen)
- Hangzhou DeepSeek Artificial Intelligence Co., Ltd. (DeepSeek/DeepSeek-Coder)
- Meta Platforms, Inc. (Llama)
- Mistral AI SAS (Mistral)