Google introduces Turboquant compression algorithm to reduce AI memory usage and speed up computations

Google has unveiled Turboquant, a new compression algorithm that reportedly reduces memory usage in large language models by up to six times and increases computation speed by up to eight times without degrading response quality, according to Ars Technica. The technology can be applied to existing models without retraining and has been successfully tested. It is expected to enable cheaper AI services and more capable models on the same hardware, with particular potential for mobile AI, where limited memory has been a barrier.
Key Points
- Google's Turboquant algorithm reduces AI memory usage by up to six times and speeds up computations by up to eight times.
- The technology can be applied to existing models without retraining and has been successfully tested.
- It may lead to cheaper AI and more advanced models on the same hardware, especially benefiting mobile AI.
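The article does not describe how Turboquant works internally. As a generic illustration only, and not Google's actual method, the sketch below shows post-training quantization, a common way to shrink a model's weights after training without retraining it: float32 weights are mapped to int8 integers plus a single scale factor, cutting memory for those weights by roughly four times at the cost of a small rounding error. All function names here are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization: float32 weights -> int8 plus a scale.

    Generic illustration of quantization, not Google's Turboquant algorithm.
    """
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 weights."""
    return q.astype(np.float32) * scale

# Example: a 1024x1024 weight matrix, as might appear in one model layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

q, scale = quantize_int8(w)
ratio = w.nbytes / q.nbytes          # float32 -> int8 gives a 4x reduction
error = np.abs(dequantize(q, scale) - w).max()
print(f"compression: {ratio:.0f}x, max abs error: {error:.4f}")
```

Because no gradient updates are involved, this kind of transformation can be applied to an already-trained model, which matches the article's claim that Turboquant requires no retraining; the reported six-times reduction would require a more aggressive scheme than this simple int8 example.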