Data parallel C++ : mastering DPC++ for programming of heterogeneous systems using C++ and SYCL
Author(s)
Bibliographic Information
Data parallel C++ : mastering DPC++ for programming of heterogeneous systems using C++ and SYCL
Apress Open, c2021
- : [pbk.]
Available at 1 library
Note
"OneAPI"--P. [4] of cover
Includes index
Description and Table of Contents
Description
Learn how to accelerate C++ programs using data parallelism. This open access book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics.
Data parallelism in C++ enables access to parallel resources in a modern heterogeneous system, freeing you from being locked into any particular computing device. Now a single C++ application can use any combination of devices (including GPUs, CPUs, FPGAs, and AI ASICs) that are suitable to the problems at hand.
This book begins by introducing data parallelism and foundational topics for effective use of the SYCL standard from the Khronos Group and Data Parallel C++ (DPC++), the open source compiler used in this book. Later chapters cover advanced topics including error handling, hardware-specific programming, communication and synchronization, and memory model considerations.
Data Parallel C++ provides you with everything needed to use SYCL for programming heterogeneous systems.
What You'll Learn
Accelerate C++ programs using data-parallel programming
Target multiple device types (e.g. CPU, GPU, FPGA)
Use SYCL and SYCL compilers
Connect with computing's heterogeneous future via Intel's oneAPI initiative
Who This Book Is For
Those new to data-parallel programming and computer programmers interested in data-parallel programming using C++.
Table of Contents
Chapter 1: Introduction
Sets the expectation that the book describes SYCL 1.2.1 with Intel extensions, and that most extensions are proof points of features that should end up in a future version of SYCL. Gives an overview of how different accelerator architectures do well on different workloads, and introduces accelerator architectures (without overdoing the topic). Provides an overview and level setting on parallelism and relevant terminology, the language landscape, and SYCL history.
* SYCL key feature overview (single source, C++, multi-accelerator) - intended to draw people in and show simple code
* Language versions and extensions covered by this book
* Mixed-architecture compute and modern architectures
* Classes of parallelism
* Accelerator programming landscape (OpenMP, CUDA, TBB, OpenACC, AMD HCC, Kokkos, RAJA)
* Evolution of SYCL
Chapter 2: Where code executes
Describes which parts of code run natively on the CPU versus on "devices". Differentiates between accelerator devices and the "host device". Shows more code to increase reader familiarity with program structure.
* Single source programming model
* Built-in device selectors
* Writing a custom device selector
Chapter 3: Data management and ordering the uses of data
Gives an overview of the primary ways that data is accessible by both host and device(s): USM and buffers. Introduces command groups as futures for execution, and the concept of dependencies between nodes forming a DAG.
* Intro
* Unified Shared Memory
* Buffers
* DAG mechanism
Chapter 4: Expressing parallelism
The multiple alternative constructs for expressing parallelism are hard to comprehend from the spec, and for anyone without major parallel programming experience. This chapter must position the parallelism mechanisms relative to each other, and leave the reader with a conceptual understanding of each, plus an understanding of how to use the most common forms.
* Parallelism within kernels
* Overview of language features for expressions of parallelism
* Basic data parallel kernels
* Explicit ND-Range kernels
* Hierarchical parallelism kernels
* Choosing a parallelism/coding style
Chapter 5: Error handling
SYCL uses C++-style error handling. This is different/more modern than people using OpenCL and CUDA are used to. This chapter must frame the differences, and provide samples from which readers can manage exceptions easily in their code.
* Exception-based
* Synchronous and asynchronous exceptions
* Strategies for error management
* Fallback queue mechanism
Chapter 6: USM in detail
USM is a key usability feature when porting code, from C++ for example. When mixed with differing hardware capabilities, the USM landscape isn't trivial to understand. This key chapter must leave the reader with an understanding of USM on different hardware capabilities, what is guaranteed at each level, and how to write code with USM features.
* Usability
* Device capability levels
* Allocating memory
* Use of data in kernels
* Sharing of data between host and devices
* Data ownership and migration
* USM as a usability feature
* USM as a performance feature
* Relation to OpenCL SVM
Chapter 7: Buffers in detail
Buffers will be available on all hardware, and are an important feature for people writing code that doesn't have pointer-based data structures, particularly when implicit dependence management is desired. This chapter must cover the more complex aspects of buffers in an accessible way, including when data movement is triggered, sub-buffer dependencies, and advanced host/buffer synchronization (mutexes).
* Buffer construction
* Access modes (e.g. discard_write) and set_final_data
* Device accessors
* Host accessors
* Sub-buffers for finer grained DAG dependencies
* Explicit data motion
* Advanced buffer data sharing between device and host
Chapter 8: DAG scheduling in detail
Must describe the DAG mechanism from a high level, which the spec does not do. Must describe the in-order simplifications, and common gotchas that people hit with the DAG (e.g. read data before buffer destruction and therefore kernel execution).
* Queues
* Common gotchas with DAGs
* Synchronizing with the host program
* Manual dependency management
Chapter 9: Local memory and work-group barriers
* "Local" memory
* Managing "local" memory
* Work-group barriers
Chapter 10: Defining kernels
* Lambdas
* Functors
* OpenCL interop objects
Chapter 11: Vectors
* Vector data types
* Swizzles
* Mapping to hardware
Chapter 12: Device-specific extension mechanism
* TBD
Chapter 13: Programming for GPUs
* Use of sub-groups
* Device partitioning
* Data movement
* Images and samplers
* TBD
Chapter 14: Programming for CPUs
* Loop vectorization
* Use of sub-groups
* TBD
Chapter 15: Programming for FPGAs
* Pipes
* Memory controls
* Loop controls
Chapter 16: Address spaces and multi_ptr
* Address spaces
* The multi_ptr class
* Interfacing with external code
Chapter 17: Using libraries
* Linking to external code
* Exchanging data with libraries
Chapter 18: Working with OpenCL
* Interoperability
* Program objects
* Build options
* Using SPIR-V kernels
Chapter 19: Memory model and atomics
* The memory model
* Fences
* Buffer atomics
* USM atomics
by "Nielsen BookData"