CudaGPT2
Hyper-optimized GPT-2 in CUDA.
GitHub link: https://github.com/Autobot37/gpt.cpp
A GPT-2 implementation in C and CUDA.
Tasks done
- Separate kernels for CUDA and CPU.
- All kernels written from scratch except matmul (a sample kernel sketch follows this list).
- Tokenizer implementation.
- Inference implementation.
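
To give a flavor of what a from-scratch kernel here looks like, below is a minimal sketch of an elementwise GELU forward kernel using the tanh approximation GPT-2 employs. The kernel name, signature, and launch configuration are illustrative assumptions, not necessarily the repo's actual code.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Minimal sketch of a from-scratch elementwise kernel: GELU forward,
// tanh approximation. 0.7978845608f is sqrt(2/pi).
__global__ void gelu_forward_kernel(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        float cube = 0.044715f * x * x * x;
        out[i] = 0.5f * x * (1.0f + tanhf(0.7978845608f * (x + cube)));
    }
}

// Example launch: one thread per element.
void gelu_forward(float* out, const float* in, int n) {
    int block = 256;
    int grid = (n + block - 1) / block;
    gelu_forward_kernel<<<grid, block>>>(out, in, n);
}
```

One thread per element is the natural mapping for elementwise ops; a kernel like this is memory-bandwidth bound, so there is little to tune beyond coalesced access.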
Todo
- Optimize the `attention_forward` kernel (a naive baseline sketch follows this list).
- Profile, then optimize further based on the results.
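
For context on what that optimization starts from, here is a minimal sketch of a naive causal attention forward pass: one thread per (batch, head, query) position, with a two-pass online softmax so no (T, T) score matrix is materialized. The (B, NH, T, HS) tensor layout and all names are assumptions for illustration, not the repo's actual code.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Naive causal attention forward: one thread per (batch, head, query)
// position. Layout assumption: q, k, v, out are (B, NH, T, HS), contiguous.
__global__ void attention_forward_kernel(float* out, const float* q,
                                         const float* k, const float* v,
                                         int B, int NH, int T, int HS) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= B * NH * T) return;

    int t = idx % T;           // query position
    int h = (idx / T) % NH;    // head index
    int b = idx / (T * NH);    // batch index

    const float* qrow  = q + ((size_t)(b * NH + h) * T + t) * HS;
    const float* kbase = k + (size_t)(b * NH + h) * T * HS;
    const float* vbase = v + (size_t)(b * NH + h) * T * HS;
    float* orow        = out + ((size_t)(b * NH + h) * T + t) * HS;
    float scale = rsqrtf((float)HS);

    // Pass 1: running max and normalizer (online softmax) over the
    // causal window t2 <= t, so no score buffer is needed.
    float maxval = -INFINITY, sumval = 0.0f;
    for (int t2 = 0; t2 <= t; t2++) {
        float s = 0.0f;
        for (int i = 0; i < HS; i++) s += qrow[i] * kbase[t2 * HS + i];
        s *= scale;
        if (s > maxval) { sumval = sumval * expf(maxval - s) + 1.0f; maxval = s; }
        else            { sumval += expf(s - maxval); }
    }

    // Pass 2: recompute scores, accumulate the softmax-weighted sum of V.
    for (int i = 0; i < HS; i++) orow[i] = 0.0f;
    for (int t2 = 0; t2 <= t; t2++) {
        float s = 0.0f;
        for (int i = 0; i < HS; i++) s += qrow[i] * kbase[t2 * HS + i];
        float w = expf(s * scale - maxval) / sumval;
        for (int i = 0; i < HS; i++) orow[i] += w * vbase[t2 * HS + i];
    }
}

void attention_forward(float* out, const float* q, const float* k,
                       const float* v, int B, int NH, int T, int HS) {
    int total = B * NH * T, block = 256;
    attention_forward_kernel<<<(total + block - 1) / block, block>>>(
        out, q, k, v, B, NH, T, HS);
}
```

A baseline like this rereads K and V from global memory for every query and does no tiling or shared-memory reuse, which is exactly what profiling would be expected to flag first.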
To run
    git clone -b cuda https://github.com/Autobot37/gpt.cpp
    cd gpt.cpp
    python3 pythonscripts/prepare_tokenizer.py
    python3 writestate.py
    make run_cuda
Dependencies
- Python `tiktoken` module.
- NVIDIA CUDA Toolkit.