What's New
SK hynix and SanDisk have announced a new memory tier called High Bandwidth Flash (HBF), specifically designed to address the emerging AI inference bottleneck. The two companies have begun global standardization of HBF under the Open Compute Project (OCP) consortium, positioning it as a new memory layer between volatile HBM and traditional SSDs. The technology was awarded "Best of Show, Most Innovative Technology" at FMS 2025, signaling strong industry recognition.
This collaboration combines SK hynix's leadership in High Bandwidth Memory with SanDisk's BiCS NAND expertise, targeting first samples in the second half of 2026 and AI inference devices in early 2027.
Technical Deep Dive
Architecture and Specifications
HBF leverages SanDisk's BiCS NAND technology with CBA (CMOS directly Bonded to Array) architecture, enabling high-density, high-speed, and low-power operation. The first-generation specifications are impressive:
- Read bandwidth: 1.6 TB/s per stack, with Gen 2 targeting >2 TB/s and Gen 3 aiming for >3.2 TB/s
- Capacity: 512 GB per 16-die stack using 256 Gb (32 GB) per die
- Performance: Modeled AI inference performance within 2.2% of an idealized, unlimited-capacity HBM configuration, despite higher latency
- Form factor: Matches HBM4's physical footprint, power profile, and stack height
HBF uses TSV (Through-Silicon Via) die stacking and an interposer connection similar to HBM, enabling the high bandwidth densities required for AI workloads. The technology offers 8-16x the capacity of HBM (for reference, 512 GB per HBF stack versus the 24-48 GB typical of today's HBM3E stacks) while delivering similar bandwidth at comparable cost.
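As a rough check of the capacity math implied by these figures, here is a minimal sketch in Python, using only the die count, per-die density, and HBM3E stack sizes quoted above:

```python
# Back-of-the-envelope capacity comparison using the figures quoted above.
dies_per_stack = 16
die_density_gb = 256 / 8          # 256 Gb per die = 32 GB

hbf_stack_gb = dies_per_stack * die_density_gb   # 16 x 32 GB = 512 GB
print(f"HBF stack capacity: {hbf_stack_gb:.0f} GB")

# Typical HBM3E stack capacities cited in this article.
for hbm3e_gb in (24, 36, 48):
    print(f"vs {hbm3e_gb} GB HBM3E stack: {hbf_stack_gb / hbm3e_gb:.1f}x the capacity")
```

Depending on which HBM3E configuration you compare against, the ratio works out to roughly 10-21x on capacity alone.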
Memory Tier Positioning
HBF fills a critical gap in the memory hierarchy. Current PCIe Gen5 SSDs deliver approximately 14 GB/s of bandwidth, roughly 86x less than HBM3E's 1.2 TB/s. This massive performance gap creates bottlenecks for inference workloads that must access large model parameters quickly. HBF's non-volatile nature also eliminates refresh power requirements, providing thermal stability advantages in dense data center deployments.
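For a quick sanity check of that gap, the following sketch reproduces the arithmetic using the bandwidth figures quoted in this article (exact SSD and HBM3E numbers vary by product):

```python
# Bandwidth gap between a PCIe Gen5 SSD, HBM3E, and Gen 1 HBF, per the figures above.
ssd_gbps = 14        # ~14 GB/s, PCIe Gen5 SSD
hbm3e_gbps = 1200    # ~1.2 TB/s per HBM3E stack
hbf_gbps = 1600      # 1.6 TB/s per Gen 1 HBF stack

print(f"HBM3E vs SSD: {hbm3e_gbps / ssd_gbps:.0f}x the bandwidth")   # ~86x
print(f"HBF vs SSD:   {hbf_gbps / ssd_gbps:.0f}x the bandwidth")     # ~114x
```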
Roadmap
| Generation | Read Bandwidth | Stack Capacity | Power (relative to Gen 1, lower is better) |
|---|---|---|---|
| Gen 1 (2026) | 1.6 TB/s | 512 GB | 1.0x |
| Gen 2 | >2 TB/s | Up to 1 TB | 0.8x |
| Gen 3 | >3.2 TB/s | Up to 1.5 TB | 0.64x |
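Reading the power column as relative power draw (an interpretation, not a published spec) and taking the quoted bandwidth targets as point values, the roadmap implies a steadily improving bandwidth-per-watt trajectory. The sketch below works through the numbers:

```python
# Roadmap figures from the table above. The power column is treated as relative
# power draw (lower is better); that reading is an assumption, not a spec.
roadmap = {
    "Gen 1": {"bw_tbps": 1.6, "rel_power": 1.0},
    "Gen 2": {"bw_tbps": 2.0, "rel_power": 0.8},
    "Gen 3": {"bw_tbps": 3.2, "rel_power": 0.64},
}

baseline = roadmap["Gen 1"]["bw_tbps"] / roadmap["Gen 1"]["rel_power"]
for gen, spec in roadmap.items():
    bw_per_power = spec["bw_tbps"] / spec["rel_power"]
    print(f"{gen}: {bw_per_power:.1f} TB/s per unit of Gen 1 power "
          f"({bw_per_power / baseline:.1f}x Gen 1)")
```

Under those assumptions, Gen 3 would deliver roughly 3x the bandwidth of Gen 1 for the same power envelope.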
Market Impact
The Inference Shift
The timing couldn't be more strategic. By 2030, inference will surpass training to represent more than 50% of AI datacenter compute, according to McKinsey's analysis. This represents a fundamental shift from the training-heavy workloads that drove initial HBM demand. Inference workloads require sustained, real-time access to large model parameters—a use case where HBM's limited capacity becomes a constraint.
Current inference at scale accounts for approximately 34.6% of enterprise AI compute consumption, according to Futurum Group survey data. As models grow larger and deployments scale, that share is expected to climb rapidly. Bain & Company projects AI will represent nearly half of all compute workloads by 2030, driven primarily by inference.
Competitive Landscape
The memory market has been dominated by HBM for training workloads, with SK hynix, Samsung, and Micron competing aggressively. HBM3E stacks currently max out at 36-48 GB, with roughly 1.2 TB/s of bandwidth per stack. While HBM4, standardized in April 2025, aims for 64 GB per stack, it still can't match HBF's capacity scaling.
HBF's positioning as a standardized solution through OCP could drive broader adoption than proprietary alternatives. The standardization effort kicked off February 25, 2026, with both companies committed to open specifications.
Investment Implications
Iron Mountain projects AI inference will grow at a 79% CAGR through 2030, significantly outpacing training's 25% CAGR. This infrastructure shift will require an estimated $5.2 trillion investment in AI-related capacity globally. Memory solutions that address inference bottlenecks stand to capture significant market share.
What It Means
For Hardware Engineers
HBF enables new architecture possibilities for AI inference systems. The combination of HBM-class bandwidth with near-SSD capacity allows model parameters to remain in high-bandwidth memory rather than constantly shuffling between DRAM and storage. This reduces latency and simplifies system design for inference servers, edge deployments, and even handheld AI devices.
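As a rough illustration, the sketch below estimates whether a model's weight footprint fits in HBF and how long a full pass over the weights would take at Gen 1 bandwidth; the model sizes and bytes-per-parameter values are hypothetical examples, not vendor figures:

```python
# Rough sizing: does a model's weight footprint fit in HBF, and how long does a
# full read of the weights take at Gen 1 bandwidth? Model sizes and bytes-per-
# parameter values below are illustrative assumptions, not vendor data.
HBF_STACK_GB = 512        # Gen 1 stack capacity
HBF_STACK_BW_GBPS = 1600  # Gen 1 read bandwidth, 1.6 TB/s

def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    # params_billion * 1e9 params * bytes / 1e9 bytes-per-GB simplifies to:
    return params_billion * bytes_per_param

for params_b, bytes_pp in [(70, 2.0), (405, 2.0), (405, 1.0)]:
    size_gb = weight_footprint_gb(params_b, bytes_pp)
    stacks_needed = int(-(-size_gb // HBF_STACK_GB))       # ceiling division
    full_read_ms = size_gb / HBF_STACK_BW_GBPS * 1000
    print(f"{params_b}B params @ {bytes_pp} B/param: {size_gb:.0f} GB, "
          f"{stacks_needed} HBF stack(s), ~{full_read_ms:.0f} ms per full weight read")
```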
For Data Center Operators
The non-volatile nature of HBF offers operational advantages. Unlike HBM, which requires constant refresh power, HBF retains data without drawing power. This translates to better thermal characteristics and potentially lower operating costs in dense inference deployments. The matching form factor with HBM4 also simplifies integration into existing accelerator designs.
For AI Teams
As model sizes continue growing, particularly for large language models and multimodal AI, inference becomes increasingly memory-constrained. HBF's roadmap capacities of 512 GB to 1.5 TB per stack mean larger models can run without parameter offloading, improving inference latency and throughput. This is critical for real-time applications like conversational AI and autonomous systems.
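To see why capacity and bandwidth must coexist, the following back-of-the-envelope bound compares decode throughput when weights are streamed from different tiers. It is a simplistic roofline-style estimate at batch size 1 that ignores KV cache, compute, and batching, and the 400 GB model is hypothetical:

```python
# Roofline-style upper bound: for a bandwidth-bound decode step at batch size 1,
# tokens/s <= (memory bandwidth) / (bytes of weights read per token).
# The 400 GB weight footprint is a hypothetical dense model, not a real product.
WEIGHTS_GB = 400

tiers_gbps = {
    "PCIe Gen5 SSD": 14,
    "1x HBF Gen 1 stack": 1600,
    "4x HBF Gen 1 stacks": 4 * 1600,
}

for tier, bw in tiers_gbps.items():
    print(f"{tier}: <= {bw / WEIGHTS_GB:.2f} tokens/s (bandwidth-bound ceiling)")
```

Batching, MoE routing, and caching change the real numbers substantially; the point is only that keeping parameters in a high-bandwidth tier removes the SSD-imposed ceiling.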
Strategic Outlook
The collaboration between SK hynix and SanDisk signals industry recognition that the memory hierarchy needs evolution for the inference era. While HBM remains essential for training, HBF addresses the distinct requirements of deployed AI systems. The open standardization approach through OCP should accelerate ecosystem development, with first commercial deployments expected in 2027.
For organizations planning AI infrastructure investments, HBF represents a new tier to evaluate. The technology won't replace HBM for training, but it could become essential for inference-heavy deployments where capacity and bandwidth must coexist.