#### STORAGE DEVELOPER CONFERENCE





BY Developers FOR Developers

# SDXI Internals and its Journey Towards Standardizing Memory to Memory Data Movement

Presented by: William Moyes

AMD Fellow, End-to-End Server Architecture

#### **SNIA Legal Notice**

- This presentation is a project of SNIA, and the material contained in this presentation is copyrighted by the SNIA unless otherwise noted.
- SNIA member companies and individual members may use this material in presentations and literature under the following conditions:
  - Any slide or slides used must be reproduced in their entirety without modification
  - SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations
- Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion, please contact your attorney.
- The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved.
- The author, the presenter, and SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use
  of this information.

NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.



### Agenda

- Compute, IO, Memory Bubble
  - Current Memory to Memory Data Movement Standard
- Use Cases
  - Application Patterns and benefits of Data Movement & Acceleration
- SNIA SDXI TWG
  - Goals and Tenets
  - A brief introduction to the internals of SDXI Specification
  - SDXI Community
  - SDXI Futures
  - References, Links, and Announcements



#### Legacy Compute, Memory, IO Bubbles



### **Emerging Bubbles**



#### **Shared Design constraints**

- Latency
- Bandwidth
- Coherency
- Control

#### **Current Data Movement Standard**

- Software memcpy is the current data movement standard
  - Stable ISA
- However,
  - Takes away from application performance
  - Incurs software overhead to provide context isolation.
  - Offload DMA engines and their interfaces are vendor-specific
  - Not standardized for user-level software.




## Application Pattern 1 (Buffer Copies)

[Figure: application (user) memory, where memcpy() copies a source buffer to a destination buffer.]

- Takes away from application performance
- HW based memory copies can be offloaded without affecting application performance
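The offload pattern on this slide can be sketched as follows. This is only an illustration: a worker thread stands in for a hardware data mover, and `copy_buffer` and `offloaded_copy` are hypothetical names, not part of any SDXI interface. The sketch shows the control flow (submit a copy, keep working, wait for completion) and does not reproduce the performance benefit of real hardware offload.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def copy_buffer(source: bytes, dest: bytearray) -> int:
    """Copy source into the start of dest and return the number of bytes copied."""
    if len(dest) < len(source):
        raise ValueError("destination buffer is too small")
    dest[:len(source)] = source
    return len(source)


def offloaded_copy(executor: ThreadPoolExecutor, source: bytes, dest: bytearray) -> Future:
    """Submit the copy to a worker (stand-in for a data mover) and return at once."""
    return executor.submit(copy_buffer, source, dest)


if __name__ == "__main__":
    source = bytes(range(256)) * 4096
    dest = bytearray(len(source))
    with ThreadPoolExecutor(max_workers=1) as data_mover:
        completion = offloaded_copy(data_mover, source, dest)
        application_work = sum(range(1000))  # the application keeps running
        copied = completion.result()         # block only when the data is needed
    print(copied, bytes(dest) == source)
```

The key design point is that the caller receives a completion handle instead of blocking inside the copy itself.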



#### **Application Pattern 2**





- Reduced buffer copies
- HW based offloaded memory copies


#### **Application Pattern 3**



#### Data in use Memory Expansion



- Memory expansion expands the memory target surface area for accelerators
- Different tiers of memory
- Diversity in accelerator programming methods



#### **Baremetal Stack View**



Direct HW Access, Access Memory Tiers

DRAM PMEM MMIO CXL Memory

Source and Destination Memory Targets for Data transfer in System Physical Address Space



### Scale Baremetal Apps – Multi-Address Space



#### Scale with Compute Virtualization—Multi-VM address space




### SDXI(Smart Data Accelerator Interface)

- Smart Data Accelerator Interface (SDXI) is a proposed standard for a memory-to-memory data movement and acceleration interface that is:
  - Extensible
  - Forward-compatible
  - Independent of I/O interconnect technology
- SNIA SDXI TWG was formed in June 2020 and tasked to work on this proposed standard
  - 28 member companies, 80+ individual members



#### SDXI Memory-to-Memory Data Movement



#### **SDXI** Design Tenets

- Data movement between different address spaces.
  - Includes user address spaces, different virtual machines
- Data movement without mediation by privileged software.
  - Once a connection has been established.
- Allows abstraction or virtualization by privileged software.
- Capability to quiesce, suspend, and resume the architectural state of a per-address-space data mover.
  - Enable "live" workload or virtual machine migration between servers.
- Enables forwards and backwards compatibility across future specification revisions.
  - Interoperability between software and hardware
- Incorporate additional offloads in the future leveraging the architectural interface.
- Concurrent DMA model.



#### Memory Structures(1) – Simplified view



- All states in memory
- One standard descriptor format
- Easy to virtualize
- Architected function setup and control
  - \*layered model for interconnect specific function management
  - SDXI class code registered for PCIe implementations



### Memory Structures(2) – Multiple Contexts



- Multiple Contexts per function
- Ring State directly managed by user space
- One way to log errors
- Per context access to target address spaces(Akey)
- One way to control access to local memory resources from remote functions(Rkey)
- One way to start, stop and administer contexts
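The two-sided access control described above (Akey on the requesting side, Rkey on the target side) can be pictured with a toy model. Everything here is an assumption made for illustration: the table layouts, the field names, and the `access_allowed` helper are hypothetical and much simpler than whatever the specification defines. The sketch only captures the idea that the requester must name a target through its own Akey table and the target must have granted access through its Rkey table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AKeyEntry:
    """Hypothetical requester-side entry: which target to reach and with which rkey."""
    target_function: str
    rkey: int


@dataclass(frozen=True)
class RKeyEntry:
    """Hypothetical target-side entry: which remote function may use this rkey."""
    allowed_source_function: str


def access_allowed(source_function, akey_table, akey, rkey_tables) -> bool:
    """Return True only if both the Akey lookup and the target's Rkey grant match."""
    entry = akey_table.get(akey)
    if entry is None:
        return False  # the context has no access key for this target
    grant = rkey_tables.get(entry.target_function, {}).get(entry.rkey)
    return grant is not None and grant.allowed_source_function == source_function


if __name__ == "__main__":
    akeys = {1: AKeyEntry(target_function="fnB", rkey=7)}
    rkeys = {"fnB": {7: RKeyEntry(allowed_source_function="fnA")}}
    print(access_allowed("fnA", akeys, 1, rkeys))
    print(access_allowed("fnC", akeys, 1, rkeys))
```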



#### **Descriptor Ring**

[Figure: descriptor ring layout]

- N = ds\_ring\_sz (number of entries in the queue).
- Entry addresses run from ds\_ring\_ptr to ds\_ring\_ptr + (N-1) \* 64 and wrap every N \* 64 bytes.
- The first entry holds indices 0, N, 2\*N, ...; the second holds indices 1, N+1, 2\*N+1, ...; the last holds indices N-1, 2\*N-1, 3\*N-1, ...
- Read Index: where the consumer can start reading enqueued entries.
- Write Index: where the producer can start enqueueing more entries.
- Write Index-1: index of the last enqueued entry to be read by the consumer.
- Indices do not wrap; they range from 0 to $2^{64}-1$.

Ring starts at memory location ds\_ring\_ptr
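The mapping from a non-wrapping index to an entry address can be sketched as below. The 64-byte entry size and the names `ds_ring_ptr` and `ds_ring_sz` come from the slide; the `entry_address` helper and the example base address are assumptions for illustration.

```python
DESCRIPTOR_SIZE = 64  # bytes per ring entry, as shown on the slide


def entry_address(ds_ring_ptr: int, ds_ring_sz: int, index: int) -> int:
    """Return the address of the ring entry that holds a non-wrapping index."""
    slot = index % ds_ring_sz  # indices 0, N, 2*N, ... all land on the first entry
    return ds_ring_ptr + slot * DESCRIPTOR_SIZE


if __name__ == "__main__":
    base, n = 0x1000, 4  # hypothetical ring base address and ring size
    for index in (0, 1, 3, 4, 7, 8):
        print(index, hex(entry_address(base, n, index)))
```

Because the indices themselves never wrap, only the address calculation applies the modulo.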

- Descriptors are processed (issued) in order by the function.
  - Executed out-of-order.
  - Completed out-of-order.
  - Read\_Index is incremented by the SDXI function.
- Function may aggressively read valid descriptors...
  - Between Read & Write indices w/o waiting on Doorbells from producers.
  - Doorbell ensures new descriptors are recognized.
- Maximum parallelism of operations.
  - Quiescing & serializing state at well-defined boundaries.
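A minimal software model of the read and write indices may help make the producer and consumer roles concrete. It is a sketch under stated assumptions: the `DescriptorRing` class and its method names are hypothetical, descriptors are plain Python objects rather than 64-byte structures, and doorbells, out-of-order execution, and completion are not modeled. It only shows in-order issue and the rule that the indices grow without wrapping while the slot is chosen by modulo.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptorRing:
    ds_ring_sz: int
    read_index: int = 0   # advanced by the consumer (the function)
    write_index: int = 0  # advanced by the producer (software)
    slots: list = field(init=False)

    def __post_init__(self):
        self.slots = [None] * self.ds_ring_sz

    def pending(self) -> int:
        """Valid entries between the read and write indices."""
        return self.write_index - self.read_index

    def free(self) -> int:
        return self.ds_ring_sz - self.pending()

    def enqueue(self, descriptor) -> int:
        """Producer side: place a descriptor at the write index and advance it."""
        if self.free() == 0:
            raise BufferError("descriptor ring is full")
        index = self.write_index
        self.slots[index % self.ds_ring_sz] = descriptor
        self.write_index += 1
        return index

    def issue(self):
        """Consumer side: take the next valid descriptor in order, or None."""
        if self.pending() == 0:
            return None
        descriptor = self.slots[self.read_index % self.ds_ring_sz]
        self.read_index += 1
        return descriptor
```

For example, with `ds_ring_sz=2`, a third descriptor can only be enqueued after the consumer has issued the first one, and it then reuses the first slot.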



### A Standard Descriptor Format (1)



## A Standard Descriptor Format (2)



### Contexts and SDXI Function Groups



#### Multi-Address Space Data Movement within an SDXI function group (2)





### Active Community of SDXI TWG members

- TWG members have contributed in various ways towards improving the specification since initial contribution by founding members
- The contributions have resulted in improvements in many areas, such as:
  - Architectural behavior of Administrative Start/Stop operations
  - Architectural Context States and Function States
  - Cache Injection mechanisms
  - Deterministic discovery of SDXI Function Groups
  - Rich Completion Status and Error Reporting
  - Improved mechanisms to control local memory access from remote functions
  - Usage models
  - Interconnect agnostic specification language



#### **Active Contributors**

- Curtis Ballard, HPE
- Beau Beachamp, MemVerge
- Richard Brunner, VMware
- Xiangping Chen, Dell
- Don Dutile, IBM
- Paul Hartke, AMD
- Shyam Iyer, Dell Inc
- Travis Hamilton, Arm
- Brian Hirano, Micron
- Frederick Knight, NetApp
- Santosh Kumar, SK Hynix
- James Leighton, Western Digital
- Bill Martin, Samsung

- John Maroney, Micron
- J Metz, AMD
- William Moyes, AMD
- Philip Ng, AMD
- Murali Ravirala, Microsoft
- Dwight Riley, HPE
- Alexandre Romana, Arm
- Glen Sescila, Dell Inc
- Paul Von Stamwitz, Fujitsu
- Jason Wohlgemuth, Microsoft



#### What to expect: SDXI Futures

- Release v1.0
- Plan post v1.0 activities. The current charter includes:
  - New data mover operations for smart acceleration
  - Data mover operations involving persistent memory targets
  - Cache coherency models for data movers
  - Security Features involving data movers
  - Management architecture for data movers(includes connection manager)
- Some additional discussion topics being considered post-v1.0
  - QoS improvements
  - Latency improvements
  - RAS improvements
  - CXL related discussions
  - Heterogeneous environments





#### Links

- SDXI Specification v0.9-rev1 available for Public Review
  - https://www.snia.org/tech\_activities/publicreview
- SNIA SDXI Page
  - https://www.snia.org/sdxi
- PM + CS Summit 2021
  - https://www.snia.org/educational-library/new-path-better-data-movement-within-system-memory-computational-memory-sdxi
- SDC 2020 presentation
  - https://www.youtube.com/watch?app=desktop&v=iv2GUfnxG-A
- In memory compute summit
  - https://www.youtube.com/watch?v=iv2GUfnxG-A
- SDC 2021 panel session
  - https://www.youtube.com/watch?v=PrlQZF2a4YI
- New Subgroup! SDXI + CS Subgroup
  - Membership Criteria:
    - Should be a member of SNIA's SDXI TWG & CS TWG



#### Conclusion

- Data in use performance needs are increasing
- With a growing variety of accelerator and memory technologies, standardizing memory-to-memory data movement and acceleration is gaining ground
- SNIA's SDXI (Smart Data Accelerator Interface) TWG is standardizing memory-to-memory data movement and acceleration
- v0.9-rev1 version of the specification out for public review
  - Various use cases, features enabled
  - https://www.snia.org/tech\_activities/publicreview
- TWG members have discussed their motivation towards contributing to this specification
- Enables persistent memory technologies and computational storage use cases



## Q&A


