I'm trying to load and train the flan-t5-xxl (11B) model on a machine with 8x48GB GPUs and ~150GB of CPU memory. According to the estimate from `estimate_zero3_model_states_mem_needs_all_live()`, the required memory for ZeRO-3 optimization is as follows.

But when I try to load the model with the following code snippet, I get the `_old_init()` error:

```python
training_args = Seq2SeqTrainingArguments(xxxx)

model = AutoModelForSeq2SeqLM.from_pretrained(
    args.model_id,
    use_cache=False if gradient_checkpointing else True,  # this is needed for gradient checkpointing
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
```

Later I read in the transformers documentation that, when using ZeRO-3 optimization, `zero.Init` is applied automatically if the training arguments are defined before the model is loaded, so I changed the code to define `training_args` before `from_pretrained()` (as in the snippet above). It seems the only valid choice is no offload + `zero_init=1`.
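In case it helps, here is a minimal sketch of what that working no-offload ZeRO-3 setup can look like. The config values, `output_dir`, and the `"auto"` placeholders are my assumptions for illustration, not the exact config from my run:

```python
from transformers import Seq2SeqTrainingArguments

# Minimal ZeRO-3 config with CPU offload disabled ("device": "none").
# "auto" lets the transformers integration fill in matching values.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "none"},
        "offload_param": {"device": "none"},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Defining the training arguments (with the DeepSpeed config) *before*
# calling from_pretrained() is what triggers zero.Init under ZeRO-3.
training_args = Seq2SeqTrainingArguments(
    output_dir="out",  # placeholder
    deepspeed=ds_config,
    gradient_checkpointing=True,
)
```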
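For reference, the memory estimate mentioned at the top can be reproduced with DeepSpeed's documented utility. The model id and the 8-GPU/1-node topology below are assumptions matching my setup:

```python
from transformers import AutoModelForSeq2SeqLM
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

# Load the model once on CPU just to count its parameters.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

# Prints estimated per-GPU and CPU memory needs for ZeRO-3,
# with and without parameter/optimizer offload (8 GPUs, 1 node).
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=8, num_nodes=1)
```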
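One note on the `use_cache` flag in the snippet: gradient checkpointing is incompatible with the decoder's generation cache, so transformers expects the cache to be disabled during training. A sketch of the equivalent explicit calls:

```python
# Equivalent to passing use_cache=False at load time:
model.config.use_cache = False

# Recompute activations in the backward pass to save memory.
model.gradient_checkpointing_enable()
```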