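Using NEU's TurboVNC
NEU's TurboVNC:
When creating the session, choose the GPU version and pick a suitable number of cores and amount of memory; that needs no further explanation.
On inspection, this VNC runs Rocky Linux 9.3 with Python 3.9.18 and CUDA version 12.3, but almost no deep learning frameworks are preinstalled.
It also has gcc 11.5, wget, curl, and git 2.39.3.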
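The versions listed above can be collected in one pass. The following is a minimal sketch, assuming a POSIX-style shell; `report_tool` is a hypothetical helper written for this post, not something preinstalled on the VNC.

```shell
# Print the first line of a tool's version output, or note that it is missing.
# Usage: report_tool <command> [version-flag]
report_tool() {
    local tool="$1" flag="${2:---version}"
    if command -v "$tool" >/dev/null 2>&1; then
        # Some tools print the version on stderr, so merge both streams.
        printf '%s: %s\n' "$tool" "$("$tool" "$flag" 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$tool"
    fi
}

# OS release information, if the file is readable.
if [ -r /etc/os-release ]; then
    grep '^PRETTY_NAME=' /etc/os-release
fi

for tool in python3 gcc git wget curl; do
    report_tool "$tool"
done

# The nvidia-smi header shows the driver version and the CUDA version it supports.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | head -n 4
else
    echo "nvidia-smi: not found"
fi
```

Missing tools are reported rather than treated as errors, so the script can be run on a fresh session before anything is installed.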
First, install Miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
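If the session is recreated often, the download and install step can be wrapped so that it is safe to rerun. This is only a sketch: `install_miniconda` is a hypothetical function name, and it assumes that an executable `bin/conda` under the prefix means Miniconda is already installed.

```shell
# Install Miniconda into a prefix unless a conda binary is already there.
# Usage: install_miniconda [prefix]   (defaults to ~/miniconda3)
install_miniconda() {
    local prefix="${1:-$HOME/miniconda3}"
    local installer="Miniconda3-latest-Linux-x86_64.sh"
    local url="https://repo.anaconda.com/miniconda/$installer"

    if [ -x "$prefix/bin/conda" ]; then
        echo "conda already installed at $prefix, skipping"
        return 0
    fi

    # Download to a temporary directory so a failed run leaves no stray file behind.
    local tmpdir
    tmpdir="$(mktemp -d)" || return 1
    if wget -q -O "$tmpdir/$installer" "$url" \
        && bash "$tmpdir/$installer" -b -p "$prefix"; then
        rm -rf "$tmpdir"
        echo "conda installed at $prefix"
    else
        rm -rf "$tmpdir"
        echo "Miniconda installation failed" >&2
        return 1
    fi
}
```

Calling `install_miniconda ~/miniconda3` a second time just prints the "already installed" message instead of downloading again.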
Then initialize conda and reload the shell:
# Initialize
~/miniconda3/bin/conda init
# Reload bash
source ~/.bashrc
# Check that the installation succeeded
conda --version
At this point, conda's bundled Python turns out to be 3.13.11; for stability, use 3.11 instead.
However, creating the environment directly fails with the error CondaToSNonInteractiveError.
After looking it up, the cause is that the channels' terms of service must be accepted first:
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/msys2
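The three accept commands differ only in the channel name, so they can be written as a loop. A minimal sketch; `accept_tos` and the `DRY_RUN` switch are hypothetical additions for illustration, while the `conda tos accept` invocation is the one used above.

```shell
# Accept the terms of service for the three default channels.
# With DRY_RUN=1 the commands are printed instead of executed.
accept_tos() {
    local name url
    for name in main r msys2; do
        url="https://repo.anaconda.com/pkgs/$name"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "conda tos accept --override-channels --channel $url"
        else
            conda tos accept --override-channels --channel "$url" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 accept_tos` first shows exactly which commands would be issued; plain `accept_tos` then runs them and stops at the first failure.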
Check the result:
conda config --show channels
Then create the Python 3.11 environment:
conda create -n llm-py311 python=3.11 -y
conda activate llm-py311
# Verify
python --version
which python
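The verification step can also be scripted, so that a later install does not silently land in the base environment's Python 3.13. A small sketch; `python_minor` and `require_python` are hypothetical helper names, and the default interpreter name `python` is an assumption that holds once the environment is activated.

```shell
# Extract "major.minor" from a version string such as "Python 3.11.9".
python_minor() {
    local version="${1#Python }"
    echo "${version%.*}"
}

# Succeed only if the interpreter reports the wanted major.minor version.
# Usage: require_python <major.minor> [interpreter]
require_python() {
    local want="$1" interpreter="${2:-python}"
    local have
    have="$(python_minor "$("$interpreter" --version 2>&1)")"
    if [ "$have" = "$want" ]; then
        echo "ok: $interpreter is Python $have"
    else
        echo "mismatch: wanted Python $want, found '$have'" >&2
        return 1
    fi
}
```

After `conda activate llm-py311`, calling `require_python 3.11` should succeed; in the base environment described above it would report a mismatch.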
Install the required packages:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y
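The machine reports CUDA 12.3 while the PyTorch build targets CUDA 12.1. Assuming the 12.3 figure comes from `nvidia-smi`, it is the newest CUDA runtime the driver supports, so a build for an equal or older version should work. The comparison below is a deliberately conservative sketch; `cuda_driver_supports` is a hypothetical helper, not part of any CUDA or PyTorch tooling.

```shell
# Return success when the driver-supported CUDA version is at least the
# CUDA version a package was built for. Versions are "major.minor" strings.
cuda_driver_supports() {
    local driver="$1" wheel="$2"
    local d_major="${driver%%.*}" d_minor="${driver#*.}"
    local w_major="${wheel%%.*}" w_minor="${wheel#*.}"
    if [ "$d_major" -ne "$w_major" ]; then
        # Different major versions: only a newer driver major is accepted.
        [ "$d_major" -gt "$w_major" ]
        return
    fi
    [ "$d_minor" -ge "$w_minor" ]
}

if cuda_driver_supports 12.3 12.1; then
    echo "a CUDA 12.1 build should run on a driver that reports CUDA 12.3"
fi
```

By the same check, a driver that reports only CUDA 11.8 would be rejected for a 12.1 build.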
Or use pip instead:
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
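Whichever installer is used, it is worth confirming that the resulting build actually sees the GPU. A minimal sketch, assuming the environment's interpreter is on `PATH` as `python` (overridable through the hypothetical `PYTHON` variable); `check_torch` is an illustrative name.

```shell
# Report the installed torch build and whether it can see a GPU.
# PYTHON can be set to use a different interpreter than the one on PATH.
check_torch() {
    "${PYTHON:-python}" - <<'EOF'
try:
    import torch
except ImportError:
    print("torch-check: torch is not installed in this environment")
else:
    print("torch-check: torch", torch.__version__, "built for CUDA", torch.version.cuda)
    print("torch-check: GPU visible:", torch.cuda.is_available())
EOF
}
```

Run `check_torch` inside the activated `llm-py311` environment. If the last line reports that no GPU is visible, check first that the session was created as the GPU version.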
Then install the Hugging Face libraries and related packages:
pip install transformers==4.40.0 datasets==2.18.0 accelerate==0.28.0
pip install evaluate==0.4.1 peft==0.10.0 trl==0.8.6
pip install bitsandbytes==0.43.0 xformers==0.0.26 flash-attn==2.6.3
pip install numpy pandas matplotlib scikit-learn scipy seaborn
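To recreate the environment later, the pinned versions can be kept in a requirements file. The sketch below writes only the pins from the first two commands, to keep it short; the function name `write_requirements` and the default file name `llm-requirements.txt` are assumptions made for illustration.

```shell
# Write the pinned Hugging Face packages from the commands above to a file.
# Usage: write_requirements [path]
write_requirements() {
    local path="${1:-llm-requirements.txt}"
    cat > "$path" <<'EOF'
transformers==4.40.0
datasets==2.18.0
accelerate==0.28.0
evaluate==0.4.1
peft==0.10.0
trl==0.8.6
EOF
    echo "wrote $path"
}
```

A typical use would be `write_requirements && pip install -r llm-requirements.txt`, which installs the same versions as the individual commands.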
Install monitoring and tooling packages:
pip install tensorboard==2.16.0 wandb==0.17.9 tqdm
pip install safetensors==0.4.5 protobuf==5.28.1 huggingface-hub==0.25.2
pip install nltk==3.9.1 sentencepiece==0.2.0 tokenizers==0.19.1
pip install ipython jupyter jupyterlab notebook
pip install black flake8 pylint isort
Configure Jupyter:
python -m ipykernel install --user --name llm --display-name "LLM Training (PyTorch 2.3)"
jupyter notebook --generate-config
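Whether the kernel was registered can be checked against the output of `jupyter kernelspec list`. A small sketch; `kernel_registered` is a hypothetical helper that reads the listing from standard input and looks for the kernel name in the first column.

```shell
# Succeed when a kernel name appears in `jupyter kernelspec list` output.
# The listing is read from stdin, so saved output can be checked as well.
kernel_registered() {
    local name="$1"
    awk -v name="$name" '$1 == name { found = 1 } END { exit !found }'
}
```

For example, `jupyter kernelspec list | kernel_registered llm && echo "kernel llm is registered"` prints the message only when a kernel named `llm` exists.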
2025-12-24