We develop highly scalable solutions based on mathematical algorithms. One example: clustering a dataset around centers of "weights" using a recursive branching method, distributing the clustered shards across computational nodes for autonomous AI training, and then merging the resulting model weights into the final result. Current results show minimal training-quality degradation (within the margin of measurement error) while significantly reducing network-bandwidth requirements.
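The cluster-distribute-train-merge pipeline can be sketched as follows. This is a minimal illustration, not the production method: the two-means bisection stands in for the recursive branching clustering, and a least-squares fit stands in for full model training on each node.

```python
import numpy as np

def recursive_bisect(data, depth, seed=0):
    """Recursively split data into up to 2**depth clusters around weight
    centers via two-means bisection (illustrative stand-in only)."""
    if depth == 0 or len(data) < 2:
        return [data]
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), 2, replace=False)]
    for _ in range(10):  # a few Lloyd iterations to settle the two centers
        labels = np.argmin(
            np.linalg.norm(data[:, None] - centers[None], axis=2), axis=1)
        centers = np.stack([
            data[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in (0, 1)])
    left, right = data[labels == 0], data[labels == 1]
    if len(left) == 0 or len(right) == 0:
        return [data]
    return (recursive_bisect(left, depth - 1, seed + 1)
            + recursive_bisect(right, depth - 1, seed + 2))

def train_on_node(shard):
    """Stand-in for autonomous training on one node: a least-squares fit
    producing a weight vector (a real run trains the full model)."""
    X, y = shard[:, :-1], shard[:, -1]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

data = np.random.default_rng(1).normal(size=(256, 4))
shards = recursive_bisect(data, depth=2)            # up to 4 shards, one per node
node_weights = [train_on_node(s) for s in shards]   # autonomous per-node training
final_weights = np.mean(node_weights, axis=0)       # merge into the final result
```

Because each node trains only on its own cluster, the nodes exchange weight vectors rather than training data, which is what cuts the network traffic.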
Practical experiments with LLAMA2-70b on H100 computational nodes have demonstrated a 3.5x acceleration for pretraining and fine-tuning tasks, achieved through more efficient workload distribution among GPUs and a reduced volume of data exchanged over the network.
This became possible through our own heuristic preprocessing, in which we cluster the data. Current model accuracy is reduced only marginally, by no more than 3%, and we plan to eliminate this reduction entirely by early next year; the perplexity metric remains comparable throughout. The applied method yields an approximately 300% increase in hardware efficiency.
For larger models, we are investigating the problem of accuracy loss during quantized training and the dequantization of the model after training.
By using clustered datasets, we see promising indications of improved accuracy in quantized models after dequantization and merging of the weights into the final model.
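To make the quantize/dequantize/merge step concrete, here is a minimal sketch assuming simple symmetric per-tensor int8 quantization; the actual quantization scheme used in our training is not specified here, so this is purely illustrative.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store ints plus one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct float weights from the int8 tensor and its scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
node_models = [rng.normal(size=1000).astype(np.float32) for _ in range(4)]

# Quantize each node's weights, dequantize, then merge by averaging.
restored = [dequantize(*quantize_int8(w)) for w in node_models]
merged = np.mean(restored, axis=0)

# Per-node rounding error vs. error of the merged average.
per_node_err = np.mean([np.abs(r - w).mean()
                        for r, w in zip(restored, node_models)])
merged_err = np.abs(merged - np.mean(node_models, axis=0)).mean()
```

Because the rounding errors on different nodes are independent, averaging the dequantized copies partially cancels them, which is one plausible reason merged quantized models can recover accuracy.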
We have also optimized inference by reducing the amount of transmitted information, leading to a 30% improvement in speed.
To achieve higher training and token generation speeds, we are designing a model that can be deployed on ASICs, with the capability to update weights directly on the chip (potentially only in the last layers), thereby extending the ASIC’s lifecycle.
One major benefit, beyond the significant cost savings for global deployment, is the reduced environmental impact.
With the advent of AI, an additional 340 TWh of energy will be required, equivalent to approximately 46 new nuclear power plants, 43,500 wind turbines, or 305,000 solar panels. By implementing our technology, this requirement can be reduced by a factor of three, meaning roughly 30 fewer nuclear power plants would need to be built.
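Taking the figures above at face value, the arithmetic checks out; the TWh-per-plant equivalence below is derived from the stated numbers, not an independent estimate.

```python
total_twh = 340.0              # projected additional AI demand (from the text)
plants_equiv = 46              # stated nuclear-plant equivalent (from the text)
twh_per_plant = total_twh / plants_equiv     # ~7.4 TWh per plant

reduced_twh = total_twh / 3    # demand after a 3x efficiency gain
saved_twh = total_twh - reduced_twh          # ~226.7 TWh saved
plants_avoided = saved_twh / twh_per_plant   # ~30.7, i.e. roughly 30 plants
```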
Detailed experimental results are available in our reports.