400x AI Acceleration | Svelte Hacker News

elegantly 7 hours ago

Next Monday, 28 Apr 2025, we are launching the public release of Purem: permanently free, unlimited, no payment, no catch. Purem is a revolutionary CPU-accelerated AI kernel that matches the performance of leading in-house infrastructure at companies like OpenAI, Tesla, and Meta.

A major leap for macOS users – explore Purem here: https://worktif.com/

After a strict recalculation that accounts for the differences between GPU and CPU architecture and memory parallelism, our CPU implementation of softmax (about 6500 ops/sec) outperforms the GPU-optimized FlashAttention when its throughput is recalculated for standard CPUs (estimated 800–5000 ops/sec).

Benchmark Highlights:
• Purem achieves ~6500 ops/sec for large-scale softmax operations on an Apple M2 CPU core.

Comparable production-grade benchmarks:
• FlashAttention (Dao et al., 2022): 800–5000 ops/sec on CPU.
• Meta's xFormers (PyTorch 2.0): 1300–1500 ops/sec on CPU.
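For anyone who wants to sanity-check an "ops/sec" figure locally, here is a minimal sketch of how softmax throughput could be measured. It is not Purem's API or benchmark harness. It assumes that one "op" means one full softmax call over a fixed-size vector, which the post does not define, and the names `softmax` and `ops_per_sec` and the 10,000-element input are illustrative choices only.

```python
import math
import time


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ops_per_sec(fn, arg, repeats=50):
    """Average number of fn(arg) calls per second over `repeats` calls.

    Assumption: one "op" is one complete call to fn.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return repeats / elapsed


if __name__ == "__main__":
    # Hypothetical input size; the post does not state what was used.
    data = [float(i % 97) for i in range(10_000)]
    print(f"{ops_per_sec(softmax, data):.0f} softmax calls/sec")
```

Numbers from a pure-Python loop like this will be far below those of any optimized kernel. The point is only that the input size and the definition of an "op" have to be fixed before ops/sec figures from different systems can be compared.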

Key Advantages:
• Local Execution: Runs entirely on your MacBook without the need for specialized hardware or cloud services.
• Cost-Efficient: Delivers performance comparable to billion-dollar infrastructures at a fraction of the cost.
• Open and Transparent: Built with transparency in mind, allowing for easy inspection and integration.

Comparison References:
• FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Tri Dao et al., 2022. https://arxiv.org/abs/2205.14135
• FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention. PyTorch Blog, 2024. https://pytorch.org/blog/flexattention/

With Purem, experience the power of industrial-grade AI performance directly on your personal device.