From: Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT
Summary of all parameters used for fine-tuning:

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Training batch size | 32 |
| Checkpoint save frequency | every 500 steps |
| Learning rate | 1e-5 |
| Training steps | 10,000 |
| Warm-up steps | 320 |
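
As a minimal sketch of how these settings map onto a fine-tuning run, the snippet below wires them into the Hugging Face `transformers` Trainer API. The base checkpoint name (`albert-base-v2`), the toy dataset, and the binary classification head are placeholders rather than the paper's actual setup; only the hyperparameter values come from the table above.

```python
import torch
from transformers import (
    AlbertForSequenceClassification,
    AlbertTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Placeholder checkpoint; substitute the domain-specific ALBERT
# weights evaluated in the paper.
MODEL_NAME = "albert-base-v2"

tokenizer = AlbertTokenizerFast.from_pretrained(MODEL_NAME)
model = AlbertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy stand-in for a real biomedical benchmark training set.
texts = ["aspirin reduces inflammation", "the BRCA1 gene was sequenced"] * 16
labels = [0, 1] * 16
encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps tokenizer output so Trainer can iterate over it."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Hyperparameters taken from the table above.
args = TrainingArguments(
    output_dir="finetune-out",
    optim="adamw_torch",             # AdamW optimizer
    per_device_train_batch_size=32,  # training batch size
    learning_rate=1e-5,
    max_steps=10_000,                # 10k training steps
    warmup_steps=320,
    save_steps=500,                  # checkpoint every 500 steps
)

trainer = Trainer(model=model, args=args, train_dataset=ToyDataset(encodings, labels))
trainer.train()
```

With `max_steps` set, training stops after exactly 10,000 optimizer updates regardless of dataset size, matching a step-based (rather than epoch-based) fine-tuning budget.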