CiNii Books - Data parallel C++ : mastering DPC++ for programming of heterogeneous systems using C++ and SYCL

Learn how to accelerate C++ programs using data parallelism. This open access book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables access to parallel resources in a modern heterogeneous system, freeing you from being locked into any particular computing device. Now a single C++ application can use any combination of devices (including GPUs, CPUs, FPGAs, and AI ASICs) that are suitable to the problems at hand.

This book begins by introducing data parallelism and foundational topics for effective use of the SYCL standard from the Khronos Group and Data Parallel C++ (DPC++), the open source compiler used in this book. Later chapters cover advanced topics including error handling, hardware-specific programming, communication and synchronization, and memory model considerations. Data Parallel C++ provides you with everything needed to use SYCL for programming heterogeneous systems.

What You'll Learn
* Accelerate C++ programs using data-parallel programming
* Target multiple device types (e.g. CPU, GPU, FPGA)
* Use SYCL and SYCL compilers
* Connect with computing's heterogeneous future via Intel's oneAPI initiative

Who This Book Is For
Those new to data-parallel programming and computer programmers interested in data-parallel programming using C++.

Table of Contents

Chapter 1: Introduction
Sets the expectation that the book describes SYCL 1.2.1 with Intel extensions, and that most extensions are proof points of features that should end up in a future version of SYCL. Gives an overview of how different accelerator architectures do well on different workloads, and introduces accelerator architectures (without overdoing the topic). Provides an overview and level setting on parallelism and relevant terminology, the language landscape, and SYCL history.
* SYCL key feature overview (single source, C++, multi-accelerator) - intended to draw people in and show simple code
* Language versions and extensions covered by this book
* Mixed-architecture compute and modern architectures
* Classes of parallelism
* Accelerator programming landscape (OpenMP, CUDA, TBB, OpenACC, AMD HCC, Kokkos, RAJA)
* Evolution of SYCL

Chapter 2: Where code executes
Describes which parts of code run natively on the CPU versus on "devices". Differentiates between accelerator devices and the "host device". Shows more code to increase reader familiarity with program structure.
* Single source programming model
* Built-in device selectors
* Writing a custom device selector

Chapter 3: Data management and ordering the uses of data
Gives an overview of the primary ways that data is accessible by both host and device(s): USM and buffers. Introduces command groups as futures for execution, and the concept of dependencies between nodes forming a DAG.
* Intro
* Unified Shared Memory
* Buffers
* DAG mechanism

Chapter 4: Expressing parallelism
The multiple alternative constructs for expressing parallelism are hard to comprehend from the spec, and for anyone without major parallel programming experience. This chapter must position the parallelism mechanisms relative to each other, and leave the reader with a conceptual understanding of each, plus an understanding of how to use the most common forms.
* Parallelism within kernels
* Overview of language features for expressions of parallelism
* Basic data parallel kernels
* Explicit ND-Range kernels
* Hierarchical parallelism kernels
* Choosing a parallelism/coding style

Chapter 5: Error handling
SYCL uses C++-style error handling. This is different from, and more modern than, what people using OpenCL and CUDA are used to. This chapter must frame the differences, and provide samples from which readers can manage exceptions easily in their code.
* Exception-based
* Synchronous and asynchronous exceptions
* Strategies for error management
* Fallback queue mechanism

Chapter 6: USM in detail
USM is a key usability feature when porting code, from C++ for example. When mixed with differing hardware capabilities, the USM landscape isn't trivial to understand. This key chapter must leave the reader with an understanding of USM on different hardware capabilities, what is guaranteed at each level, and how to write code with USM features.
* Usability
* Device capability levels
* Allocating memory
* Use of data in kernels
* Sharing of data between host and devices
* Data ownership and migration
* USM as a usability feature
* USM as a performance feature
* Relation to OpenCL SVM

Chapter 7: Buffers in detail
Buffers will be available on all hardware, and are an important feature for people writing code that doesn't have pointer-based data structures, particularly when implicit dependence management is desired. This chapter must cover the more complex aspects of buffers in an accessible way, including when data movement is triggered, sub-buffer dependencies, and advanced host/buffer synchronization (mutexes).
* Buffer construction
* Access modes (e.g. discard_write) and set_final_data
* Device accessors
* Host accessors
* Sub-buffers for finer grained DAG dependencies
* Explicit data motion
* Advanced buffer data sharing between device and host

Chapter 8: DAG scheduling in detail
Must describe the DAG mechanism from a high level, which the spec does not do. Must describe the in-order simplifications, and common gotchas that people hit with the DAG (e.g. read data before buffer destruction and therefore kernel execution).
* Queues
* Common gotchas with DAGs
* Synchronizing with the host program
* Manual dependency management

Chapter 9: Local memory and work-group barriers
* "Local" memory
* Managing "local" memory
* Work-group barriers

Chapter 10: Defining kernels
* Lambdas
* Functors
* OpenCL interop objects

Chapter 11: Vectors
* Vector data types
* Swizzles
* Mapping to hardware

Chapter 12: Device-specific extension mechanism
* TBD

Chapter 13: Programming for GPUs
* Use of sub-groups
* Device partitioning
* Data movement
* Images and samplers
* TBD

Chapter 14: Programming for CPUs
* Loop vectorization
* Use of sub-groups
* TBD

Chapter 15: Programming for FPGAs
* Pipes
* Memory controls
* Loop controls

Chapter 16: Address spaces and multi_ptr
* Address spaces
* The multi_ptr class
* Interfacing with external code

Chapter 17: Using libraries
* Linking to external code
* Exchanging data with libraries

Chapter 18: Working with OpenCL
* Interoperability
* Program objects
* Build options
* Using SPIR-V kernels

Chapter 19: Memory model and atomics
* The memory model
* Fences
* Buffer atomics
* USM atomics

by "Nielsen BookData"

Available at 1 libraries
