IRONBYTE

About Us

We develop highly scalable solutions based on mathematical algorithms. One example: we cluster a dataset around centers of “weights” using a recursive branching method, distribute the clustered dataset across computational nodes for autonomous AI training, and then combine the resulting model weights into the final model. Current results show minimal training quality degradation (within the margin of measurement error) while significantly reducing network bandwidth requirements.
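As we read it, this pipeline is similar in spirit to data-parallel training with weight averaging (as in federated learning). The sketch below is a minimal illustration under that assumption: the recursive 2-means bisection, the least-squares fit standing in for per-node training, and the plain averaging step are all our own illustrative choices, not IRONBYTE's actual algorithm.

```python
import numpy as np

def bisect(data, depth):
    """Recursively split `data` into up to 2**depth clusters via 2-means."""
    if depth == 0 or len(data) < 2:
        return [data]
    # Initialise two centers from random samples and refine them briefly.
    centers = data[np.random.choice(len(data), 2, replace=False)]
    for _ in range(10):
        labels = np.argmin(np.linalg.norm(data[:, None] - centers, axis=2), axis=1)
        if labels.min() == labels.max():   # one side collapsed; stop splitting here
            return [data]
        centers = np.array([data[labels == k].mean(axis=0) for k in (0, 1)])
    left, right = data[labels == 0], data[labels == 1]
    return bisect(left, depth - 1) + bisect(right, depth - 1)

def train_local(shard):
    """Stand-in for autonomous training on one node: fit y = X @ w by least squares."""
    X, y = shard[:, :-1], shard[:, -1]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

rng = np.random.default_rng(0)
dataset = rng.normal(size=(4096, 9))              # 8 features + 1 target per row
shards = bisect(dataset, depth=2)                 # up to 4 shards -> 4 "nodes"
local_weights = [train_local(s) for s in shards]  # autonomous per-node training
global_weights = np.mean(local_weights, axis=0)   # merge per-node model weights
print(global_weights.shape)                       # -> (8,)
```

Only the shard assignments and the final (small) weight vectors cross the network in this scheme, which is what makes the reduced bandwidth requirement plausible.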

Introduction

We propose the creation of a data processing center with a total capacity of at least 2,000 PFlops. Built on the IRONBYTE architecture for distributed launch and management of AI computing tasks, the data center focuses on high availability of the core system responsible for orchestrating all tasks and data storage.

Learning

LLM (Large Language Model) training and fine-tuning, and general ML (Machine Learning) workloads

Storage

Storage of models, datasets, and AI software libraries

Scaling

Running models and forming a pipeline for model inference and scaling

Data Center Scaling Task

The data center can expand indefinitely by adding nodes of the same type. Once the cluster exceeds 5,000 nodes, the number of master nodes must be increased to keep task orchestration effective. Adding next-generation compute nodes will not require changes to the existing architecture or software frameworks.
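One way such a sizing rule could be expressed is sketched below. The 5,000-node threshold comes from the paragraph above; the linear growth policy, the odd-quorum rounding, and the function name are our assumptions, not a documented IRONBYTE policy.

```python
# Illustrative master-node sizing rule (our assumption, not IRONBYTE's spec).
def master_nodes_required(worker_nodes: int, base_masters: int = 3) -> int:
    extra = max(0, (worker_nodes - 1) // 5000)      # grow only past 5,000 workers
    masters = base_masters + extra
    return masters if masters % 2 else masters + 1  # keep an odd quorum for elections

for n in (1000, 5000, 12000, 25000):
    print(n, "->", master_nodes_required(n))        # 3, 3, 5, 7
```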

Irregular Problems

  • Task Combination Challenges
    Combining tasks on an IRONBYTE RIG is contingent upon the availability of required computing resources.
  • Data Center Modernization Issues
    During planned modernization of a data center, tasks must be designated to specific computational nodes, such as those equipped with accelerators of a new architecture (see the sketch after this list).
  • Legacy Task Deployment
    Running legacy tasks requires a software environment with pinned versions of NVIDIA drivers and libraries, some of which are no longer supported.
  • Long-term Model Operation
    Supporting inference models that must remain operational for extended periods without frequent changes or updates.
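The constraints in this list (node designation, pinned legacy environments, long-lived inference) could be expressed in a task specification along the following lines. The schema, field names, version numbers, and submit_task() are all hypothetical illustrations, not a published IRONBYTE API.

```python
# Hypothetical task spec illustrating the constraints above (all names
# and versions are examples of ours, not IRONBYTE's actual schema).
legacy_task = {
    "name": "legacy-llm-inference",
    "node_selector": {"accelerator_arch": "ampere"},  # pin to a node generation
    "environment": {
        "nvidia_driver": "470.161.03",                # fixed legacy driver version
        "cuda": "11.4",                               # fixed library version
        "image": "registry.example.com/legacy-llm:1.0",
    },
    "restart_policy": "always",  # long-running inference survives node maintenance
}

def submit_task(spec: dict) -> None:
    """Stand-in for the scheduler call that would place this task."""
    print(f"submitting {spec['name']} to nodes matching {spec['node_selector']}")

submit_task(legacy_task)
```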

Comparison with Alternative Solutions

Here you can see why IRONBYTE is more cost-effective than its competitors.

Parameter / Type                                          | IRONBYTE RIG | Nvidia server 8xA100            | Nvidia server 8xH200
TFlops (FP32)                                             | 730          | 156                             | 536
TFlops (FP8)                                              | 6,600        | 4,992                           | 32,000
GPU RAM (GB)                                              | 240          | 640                             | 1,128
Cost                                                      | $40,000      | $160,000                        | $400,000
IRONBYTE RIG efficiency factor (synthetic load)           | 100%         | 1900%                           | 1400%
Cloud 2,000 PFlops (synthetic load)                       | $115,000,000 | $2,153,000,000                  | $1,567,000,000
Practical effectiveness of IRONBYTE RIG (synthetic load)  | n/a          | Over 40%                        | Over 60%
Analog of IRONBYTE RIG, accounting for common LLM tasks   | $115,000,000 | $2,153,000,000 / $1,292,000,000 | $1,567,000,000 / $635,000,000
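The efficiency-factor row appears to follow from the "Cloud 2,000 PFlops" cost row; the quick check below reproduces it from the table's own figures (our reading of the table, not a published derivation).

```python
# Cost of reaching 2,000 PFlops (synthetic load), from the table above, in $.
costs_2000_pflops = {
    "IRONBYTE RIG":  115_000_000,
    "Nvidia 8xA100": 2_153_000_000,
    "Nvidia 8xH200": 1_567_000_000,
}
baseline = costs_2000_pflops["IRONBYTE RIG"]
for platform, cost in costs_2000_pflops.items():
    print(f"{platform}: {cost / baseline:.1f}x (~{cost / baseline * 100:.0f}%)")
# -> 1.0x (~100%), 18.7x (~1872%), 13.6x (~1363%): roughly the
#    100% / 1900% / 1400% efficiency factors reported in the table.
```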

Efficiency summary

With the advent of AI, an additional 340 TWh of energy will be required, equivalent to approximately 46 new nuclear power plants, 43,500 wind turbines, or 305,000 solar panels. Implementing our technology can cut this requirement roughly threefold, meaning about 30 fewer nuclear power plants would need to be built. The results of our experiments can be found in the report.
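A back-of-the-envelope check of that claim, using only the figures stated in it:

```python
# Figures taken directly from the paragraph above.
additional_twh = 340      # extra energy demand attributed to AI
plants_equivalent = 46    # nuclear plants equivalent to that demand
reduction_factor = 3      # claimed threefold reduction

remaining_twh = additional_twh / reduction_factor             # ~113 TWh
plants_saved = plants_equivalent * (1 - 1 / reduction_factor) # ~31 plants
print(f"{remaining_twh:.0f} TWh still required; ~{plants_saved:.0f} plants avoided")
# Consistent with the "about 30 fewer plants" figure in the text.
```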

Test results
  • Optimized for FP32/16 Computation

    The proposed solution is oriented primarily toward FP32/16 computation, which currently offers the greatest practical value for ML/LLM workloads.

  • Memory Constraints Solved

    The limited memory capacity of RIGs (an issue particularly for larger models) has been addressed through a set of mathematical techniques, namely IRONBYTE. This approach has shown minimal impact on training quality, specifically in the pre-training phase.

  • Parallelized Training for Speed

    The slower inter-node links (compared with NVLINK within a node) are compensated for by separating (parallelizing) training and pre-training tasks: the dataset is split between computational nodes by correlation, and the model layers obtained on different nodes are subsequently combined.

  • Unmatched Cost-Performance

    When running alternative-type tasks, efficiency in terms of the “price-quality” ratio exceeds 1000%.

  • Competitive Edge in Specialization

    When running tasks in competitors' areas of specialization, efficiency is over 40% and 60%, respectively (against the 8xA100 and 8xH200 servers).

  • Superior Cost-Quality Ratio

    When comparing data centers identical in all parameters, the “price-quality” advantage is more than twofold.