安装LLaMA-Factory微调chatglm3，修改自我认知

这篇具有很好参考价值的文章主要介绍了安装LLaMA-Factory微调chatglm3，修改自我认知。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

安装git clone https://github.com/hiyouga/LLaMA-Factory.git
conda create -n llama_factory python=3.10
conda activate llama_factory
cd LLaMA-Factory
pip install -r requirements.txt

之后运行

单卡训练，

CUDA_VISIBLE_DEVICES=0 python src/train_web.py，按如下配置

export_model llama-factory,python,人工智能,深度学习

demo_tran.sh

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path /data/models/llm/chatglm3-lora/ \
    --do_train \
    --overwrite_output_dir \
    --dataset self_cognition \
    --template chatglm3 \
    --finetuning_type lora \
    --lora_target query_key_value \
    --output_dir export_chatglm3 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-3 \
    --num_train_epochs 10.0 \
    --plot_loss \
    --fp16

export_model.sh

python src/export_model.py \
    --model_name_or_path /data/models/llm/chatglm3-lora/ \
    --template chatglm3 \
    --finetuning_type lora \
    --checkpoint_dir /data/projects/LLaMA-Factory/export_chatglm3 \
    --export_dir lora_merge_chatglm3

cli_demo.sh

python src/cli_demo.py \
    --model_name_or_path /data/models/llm/chatglm3-lora/ \
    --template default \
    --finetuning_type lora

注意合并模型的时候，最后复制chatglm3的tokenizer.model和tokenizer_config.json到合并后模型覆盖之后，要修改

export_model llama-factory,python,人工智能,深度学习

export_model llama-factory,python,人工智能,深度学习不覆盖会有这个错误，

Use DeepSpeed方法

deepspeed --num_gpus 3 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage sft \
    --model_name_or_path /media/cys/65F33762C14D581B/chatglm2-6b \
    --do_train True \
    --finetuning_type lora \
    --template chatglm2 \
    --flash_attn False \
    --shift_attn False \
    --dataset_dir data \
    --dataset self_cognition,sharegpt_zh \
    --cutoff_len 1024 \
    --learning_rate 0.001 \
    --num_train_epochs 10.0 \
    --max_samples 1000 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 10 \
    --save_steps 1000 \
    --warmup_steps 0 \
    --neft_alpha 0 \
    --train_on_prompt False \
    --upcast_layernorm False \
    --lora_rank 8 \
    --lora_dropout 0.1 \
    --lora_target query_key_value \
    --resume_lora_training True \
    --output_dir saves/ChatGLM2-6B-Chat/lora/train_2023-12-12-23-26-49 \
    --fp16 True \
    --plot_loss True

ds_config.json的格式下面的：

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },  
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}

跑成功的效果图：

export_model llama-factory,python,人工智能,深度学习

如果出现下面这个问题，

[E ProcessGroupNCCL.cpp:916] [Rank 3] NCCL watchdog thread terminated with exception: CUDA error: the launch timed out and was terminated CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

可能原因是显卡坏了或者显卡不是同一个型号！文章来源地址https://www.toymoban.com/news/detail-764636.html

到了这里，关于安装LLaMA-Factory微调chatglm3，修改自我认知的文章就介绍完了。如果您还想了解更多内容，请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章，希望大家以后多多支持TOY模板网！