
Now You Can Train LLMs on a Two-Year-Old Desktop-Grade NVIDIA 3090 GPU [Video]

Recently, researchers from the School of Computer Science & Technology, Soochow University, released a research paper titled ‘MemLong: Memory-Augmented Retrieval for Long Text Modeling’, in which they extended an LLM’s context window from 2k to 80k tokens on a two-year-old desktop-grade NVIDIA 3090 GPU.

This opens new horizons for users with limited hardware who still want to run AI applications locally on their computers.

Another highlight of the study was fine-tuning a 3-billion-parameter version of MemLong on 0.5 billion tokens, which required only eight 3090 GPUs for eight hours, showcasing highly efficient resource use. This makes the approach useful not only for users who want to run AI applications but also for developers who want to train their models on midrange hardware.

Bensen Hsu, the founder of OpenRead, mentioned that the method was designed to enhance long-context language modelling by utilising an external retriever to fetch historical information.
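To illustrate the general idea of memory-augmented retrieval, here is a minimal, hypothetical sketch: past text chunks are cached as embeddings, and the most similar ones are fetched for the current query so they can be fed back into the model's limited context window. The class, the toy character-frequency embedding, and all names are illustrative assumptions, not MemLong's actual implementation.

```python
import numpy as np

class ChunkMemory:
    """Toy memory store: caches embeddings of past text chunks and
    retrieves the most similar ones for the current query.
    Purely illustrative -- not the embedding or retriever MemLong uses."""

    def __init__(self):
        self.embeddings = []  # one vector per stored chunk
        self.chunks = []      # the chunk texts themselves

    def _embed(self, text):
        # Placeholder embedding: normalised character-frequency vector over a-z.
        vec = np.zeros(26)
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def store(self, chunk):
        self.embeddings.append(self._embed(chunk))
        self.chunks.append(chunk)

    def retrieve(self, query, k=2):
        # Rank stored chunks by cosine similarity to the query embedding.
        q = self._embed(query)
        sims = [float(q @ e) for e in self.embeddings]
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        return [self.chunks[i] for i in top]

memory = ChunkMemory()
for chunk in ["the cat sat on the mat",
              "stock prices fell sharply",
              "the dog chased the cat"]:
    memory.store(chunk)

# In a real system, the retrieved chunks would be prepended to the model's
# short context window, effectively extending how far back it can "see".
retrieved = memory.retrieve("where did the cat go", k=1)
```

The key design point, as described in the paper's abstract, is that the memory lookup happens outside the model's fixed attention window, so the hardware cost of the window itself stays small while older information remains reachable.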
