Next Monday, 28 Apr 2025, we are launching the public permanently free unlimited, no payment, no catch of Purem – a revolutionary CPU-accelerated AI kernel that matches the performance of leading in-house infrastructures like OpenAI, Tesla, and Meta.
After strict recalculation accounting for GPU-to-CPU architecture and memory parallelism, our CPU implementation of Softmax (6500 ops/sec) outperforms the GPU-optimized FlashAttention recalculated for standard CPUs (estimated 800–5000 ops/sec)
Benchmark Highlights:
• Purem achieves ~6500 ops/sec for large-scale softmax operations on Apple M2 CPU core.
Comparable to production-grade benchmarks:
• OpenAI's FlashAttention: Achieves 800–5000 ops/sec on CPU.
• Meta’s Xformers (PyTorch 2.0): Achieves 1300–1500 ops/sec on CPU.
Key Advantages:
• Local Execution: Runs entirely on your MacBook without the need for specialized hardware or cloud services.
• Cost-Efficient: Delivers performance comparable to billion-dollar infrastructures at a fraction of the cost.
• Open and Transparent: Built with transparency in mind, allowing for easy inspection and integration.
Official Comparing References:
• OpenAI FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness: Tri Dao et al., 2022, https://arxiv.org/abs/2205.14135
• FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention: PyTorch Blog, 2024, https://pytorch.org/blog/flexattention/
With Purem, experience the power of industrial-grade AI performance directly on your personal device.
Next Monday, 28 Apr 2025, we are launching the public permanently free unlimited, no payment, no catch of Purem – a revolutionary CPU-accelerated AI kernel that matches the performance of leading in-house infrastructures like OpenAI, Tesla, and Meta.
A major leap for macOS users – explore Purem here: https://worktif.com/
After strict recalculation accounting for GPU-to-CPU architecture and memory parallelism, our CPU implementation of Softmax (6500 ops/sec) outperforms the GPU-optimized FlashAttention recalculated for standard CPUs (estimated 800–5000 ops/sec)
Benchmark Highlights: • Purem achieves ~6500 ops/sec for large-scale softmax operations on Apple M2 CPU core. Comparable to production-grade benchmarks: • OpenAI's FlashAttention: Achieves 800–5000 ops/sec on CPU. • Meta’s Xformers (PyTorch 2.0): Achieves 1300–1500 ops/sec on CPU.
Key Advantages: • Local Execution: Runs entirely on your MacBook without the need for specialized hardware or cloud services. • Cost-Efficient: Delivers performance comparable to billion-dollar infrastructures at a fraction of the cost. • Open and Transparent: Built with transparency in mind, allowing for easy inspection and integration.
Official Comparing References: • OpenAI FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness: Tri Dao et al., 2022, https://arxiv.org/abs/2205.14135 • FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention: PyTorch Blog, 2024, https://pytorch.org/blog/flexattention/
With Purem, experience the power of industrial-grade AI performance directly on your personal device.