Google introduces Turboquant compression algorithm to reduce AI memory usage and speed up computations

Google has unveiled Turboquant, a new compression algorithm that reportedly reduces memory usage in large language models by up to six times and increases computation speed by up to eight times without degrading response quality, according to Ars Technica. The technology can be applied to existing models without retraining and has been successfully tested. It is expected to enable cheaper AI services and more capable models on the same hardware, with particular potential for mobile AI, where limited memory has been a barrier.
Key Points
- Google's Turboquant algorithm reduces AI memory usage by up to six times and speeds up computations by up to eight times.
- The technology can be applied to existing models without retraining and has been successfully tested.
- It may lead to cheaper AI and more advanced models on the same hardware, especially benefiting mobile AI.
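The article does not describe how Turboquant works internally. As a generic illustration only, and not Google's actual method, the sketch below shows post-training quantization, a common way to shrink a model's weights after training without retraining it: float32 weights are mapped to int8 integers plus a single scale factor, cutting memory for those weights by roughly four times at the cost of a small rounding error. All function names here are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization: float32 weights -> int8 plus a scale.

    Generic illustration of quantization, not Google's Turboquant algorithm.
    """
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 weights."""
    return q.astype(np.float32) * scale

# Example: a 1024x1024 weight matrix, as might appear in one model layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

q, scale = quantize_int8(w)
ratio = w.nbytes / q.nbytes          # float32 -> int8 gives a 4x reduction
error = np.abs(dequantize(q, scale) - w).max()
print(f"compression: {ratio:.0f}x, max abs error: {error:.4f}")
```

Because no gradient updates are involved, this kind of transformation can be applied to an already-trained model, which matches the article's claim that Turboquant requires no retraining; the reported six-times reduction would require a more aggressive scheme than this simple int8 example.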