Ollama & Xinference
Ollama documentation
https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server
Commands
ollama show --modelfile qwen   # inspect a model's existing Modelfile
ollama ps                      # list models currently loaded into memory
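The FAQ linked above configures the server through environment variables. A minimal sketch of two commonly tuned ones (variable names are from the Ollama docs; the values are illustrative assumptions):
export OLLAMA_KEEP_ALIVE=24h          # keep a loaded model in memory for 24 hours instead of the default
export OLLAMA_MAX_LOADED_MODELS=2     # allow two models to stay loaded at once
ollama serve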
Create a custom model
touch Modelfile
vim Modelfile
FROM /Users/zailiang/Downloads/qwen1_5-0_5b-chat-q5_k_m.gguf
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
SYSTEM """
If you are asked who you are or who developed you, you must say: I'm an alien from the universe
"""
ollama create example -f Modelfile
ollama list
ollama run example
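A quick check that the custom system prompt actually took effect (assuming the model was created as example above):
ollama show --modelfile example      # the SYSTEM block should appear in the output
ollama run example "Who are you?"    # the reply should follow the alien persona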
Ollama OpenAI compatibility
Blog post: https://ollama.com/blog/openai-compatibility
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama2",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
]
}'
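Besides chat completions, the compatibility layer in newer Ollama versions also exposes model listing and embeddings under /v1; a sketch (nomic-embed-text is an example model and must be pulled first):
curl http://localhost:11434/v1/models
curl http://localhost:11434/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "nomic-embed-text", "input": "Hello!"}'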
Default Ollama model storage locations
macOS: ~/.ollama/models
Linux: /usr/share/ollama/.ollama/models
Windows: C:\Users\%username%\.ollama\models
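The storage directory can be relocated with the OLLAMA_MODELS environment variable (documented in the FAQ linked above); a sketch with an example path:
# systemd install: add to the [Service] section of ollama.service
Environment="OLLAMA_MODELS=/data/ollama/models"
# manual run:
OLLAMA_MODELS=/data/ollama/models ollama serve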
Check Ollama status from a remote host
API reference: https://github.com/ollama/ollama/blob/main/docs/api.md
curl http://192.168.2.42:11434/api/tags
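A few other lightweight checks against the same API (endpoints per the API reference above; the IP is this note's example host):
curl http://192.168.2.42:11434/              # liveness check, returns "Ollama is running"
curl http://192.168.2.42:11434/api/version   # server version
curl http://192.168.2.42:11434/api/ps        # models currently loaded in memory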
Make the Ollama service listen on 0.0.0.0
sudo vim /etc/systemd/system/ollama.service
In the [Service] section, add the following line to set the OLLAMA_HOST environment variable so Ollama listens on the given port on all network interfaces (default port: 11434):
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama
sudo netstat -tulnp | grep 11434
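Without systemd (or for a one-off test), the same variable can be set directly when starting the server:
OLLAMA_HOST=0.0.0.0:11434 ollama serve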
Enable a proxy for Ollama
Add the proxy variables to the same [Service] section of ollama.service:
Environment="http_proxy=http://192.168.2.41:20172"
Environment="https_proxy=http://192.168.2.41:20172"
Run Hugging Face / ModelScope models with Ollama
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M
ollama run modelscope.cn/Qwen/Qwen2.5-3B-Instruct-GGUF:Q3_K_M
Note: hf.co is shorthand for huggingface.co.
https://www.modelscope.cn/docs/models/advanced-usage/ollama-integration
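You can also pull without immediately running, and leave the quantization tag off to let the integration pick a default one (repository names reused from above; the explicit tag is an example):
ollama pull hf.co/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF            # default quantization
ollama pull modelscope.cn/Qwen/Qwen2.5-3B-Instruct-GGUF:Q4_K_M        # explicit quantization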
Disable Ollama's thinking output
Option 1: CLI
Use the --think flag to control the thinking output:
ollama run deepseek-r1:8b --think=false "your question"
Option 2: Interactive session
After starting ollama run deepseek-r1:8b, you can type the following at the prompt:
/set nothink
All subsequent prompts will then skip the thinking process and the model will only output the final answer.
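The same switch also exists at the API level: newer Ollama versions accept a think field on /api/chat and /api/generate (check the API reference linked earlier if your version predates it). A minimal sketch:
curl http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:8b",
"think": false,
"stream": false,
"messages": [{"role": "user", "content": "Hello!"}]
}'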