BrainHoster
LLM Quickstart Guide
© 2025 Dev Advisory UK Ltd.
This guide provides step-by-step instructions for running Large Language Models (LLMs) on
your BrainHoster VPS using the Ollama framework. Please note that some familiarity with the
Linux command line is assumed.
Connect to your VPS
You can connect from Windows, Linux, or macOS using the ssh command. Open a terminal and run:
ssh username@your_vps_ip
- Replace `username` with your VPS username (e.g., `root`).
- Replace `your_vps_ip` with your VPS’s IP address or hostname.
Tip: Use SSH keys for secure, password-free access. Generate a key pair with `ssh-keygen` and copy it to your VPS using
`ssh-copy-id`, as shown below.
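A minimal example, assuming an Ed25519 key and the same username and IP placeholders as above:
# On your local machine: generate a key pair (accept the default file location)
ssh-keygen -t ed25519
# Copy the public key to the VPS (you will enter your password one last time)
ssh-copy-id username@your_vps_ip
# From now on, log in without a password
ssh username@your_vps_ip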
Run your first model
Once connected, you can run a model using the following command:
ollama run <model_name>:<model_tag>
Example:
ollama run mistral:7b
- First-time setup: When you run a model for the first time, Ollama automatically downloads and stores it locally on your VPS. This may take some time depending on the model size.
- Terminating a chat session: To exit the current chat session with an LLM, type /bye.
- Model tags: Use :latest for the most recent version or specify a variant (e.g., :16b for 16B parameter models).
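Note: Ollama is assumed to come preinstalled on BrainHoster LLM plans; check your plan details. If the ollama command is not found on your VPS, you can install it with the official script from ollama.com:
curl -fsSL https://ollama.com/install.sh | sh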
Running another model
Run the same command for each additional model you would like to use:
ollama run <model_name>:<model_tag>
Examples:
ollama run qwen3:8b
ollama run deepseek-coder-v2:16b
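Tip: to download a model ahead of time without starting a chat session, you can also use ollama pull:
# Fetch the model weights only; run the model later with ollama run
ollama pull deepseek-coder-v2:16b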
To monitor the resources of your VPS, use the following commands:
Check RAM usage:
free -h
Check SSD space:
df -h
Monitor CPU/memory in real time:
top
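On recent Ollama versions you can also check which models are currently loaded into memory, and how much memory they occupy, with:
ollama ps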
To list the installed models and remove any you no longer need, use the following commands:
List models:
ollama list
Remove a model:
ollama rm <model_name>:<model_tag>
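For example, to reclaim disk space by deleting the Mistral model downloaded earlier:
ollama rm mistral:7b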
DeepSeek-Coder-V2:16b
- Description: Open-source Mixture-of-Experts (MoE) model optimized for code generation and reasoning
- Use Cases: Code completion, bug fixing, API integration
- Performance: Reported by DeepSeek to match GPT-4 Turbo in code-specific tasks
- Training Data: 1.5T tokens (code + text)
- Command: ollama run deepseek-coder-v2:16b
DeepSeek-R1:7b
- Description: Open reasoning model with strong performance in natural language understanding and logical deduction
- Use Cases: Conversational AI, data analysis, multi-step reasoning
- Performance: Reported by DeepSeek to approach o3 and Gemini 2.5 Pro in benchmarks (figures refer to the full-size R1; the 7B distilled variant is less capable)
- Training Data: 1T tokens (general text)
- Command: ollama run deepseek-r1:7b
Qwen3:8b
- Description: Latest Qwen series model with dense and MoE variants for versatility
- Use Cases: Multilingual support, enterprise applications, large-scale inference
- Performance: Reported to outperform Qwen2 by roughly 20% on reasoning tasks
- Training Data: 1.5T tokens (global text)
- Command: ollama run qwen3:8b
Llama 3:8b
- Description: Third generation of Meta’s Llama series, offering improved reasoning, coding, and multilingual capabilities
- Use Cases: Advanced AI assistants, research, and large-scale inference
- Performance: Reported to outperform Llama 2 7B by roughly 30% on reasoning tasks
- Training Data: 15T tokens (diverse global datasets)
- Command: ollama run llama3:8b
Mistral:7b
- Description: High-efficiency 7B model outperforming Llama 2 13B on benchmarks
- Use Cases: Chatbots, content creation, multilingual support
- Performance: Reported to be roughly 30% faster than Llama 2 13B on standard tasks
- Training Data: 100B tokens (diverse datasets)
- Command: ollama run mistral:7b
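Any of these models can also be run non-interactively by passing a prompt directly on the command line, which is convenient for quick tests and shell scripts. For example:
# One-shot prompt: prints the response and exits instead of opening a chat session
ollama run mistral:7b "Explain SSH key authentication in one sentence."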
For the full list of models supported by Ollama, visit https://ollama.com/library.
Copyright and credit for the models referenced in this quickstart guide belong to:
- Alibaba Group Holding Limited (Qwen)
- Hangzhou DeepSeek Artificial Intelligence Co., Ltd. (DeepSeek/DeepSeek-Coder)
- Meta Platforms, Inc. (Llama)
- Mistral AI SAS (Mistral)