DPCT1091

Message

The function dpct::segmented_reduce only supports DPC++ native binary operation. Replace "dpct_placeholder" with a DPC++ native binary operation.

Detailed Help

dpct::segmented_reduce supports the following native binary operations:

sycl::plus
sycl::bit_or
sycl::bit_xor
sycl::bit_and
sycl::maximum
sycl::minimum
sycl::multiplies
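
The examples on this page pass an initial_value together with the operation. A common choice, though not a requirement stated here, is the identity value of the chosen operation, so that empty segments and the first element are handled uniformly. The sketch below lists those identities for int and checks them on the host. It is written in standard C++ so it compiles without SYCL headers: the `Minimum` and `Maximum` structs and the `k...Identity` constants are hypothetical stand-ins, not part of dpct or SYCL. In SYCL 2020 code, sycl::known_identity serves a similar purpose.

```cpp
#include <cassert>
#include <functional>
#include <limits>

// Hypothetical host-side stand-ins for sycl::minimum and sycl::maximum.
// They exist only so this sketch compiles without SYCL headers.
struct Minimum {
  template <typename T>
  T operator()(const T &a, const T &b) const { return (b < a) ? b : a; }
};
struct Maximum {
  template <typename T>
  T operator()(const T &a, const T &b) const { return (a < b) ? b : a; }
};

// Identity values for int: op(identity, x) == x for every x.
constexpr int kPlusIdentity = 0;
constexpr int kBitOrIdentity = 0;
constexpr int kBitXorIdentity = 0;
constexpr int kBitAndIdentity = ~0;  // all bits set
constexpr int kMaximumIdentity = std::numeric_limits<int>::lowest();
constexpr int kMinimumIdentity = std::numeric_limits<int>::max();
constexpr int kMultipliesIdentity = 1;

// Returns true when `identity` leaves `x` unchanged on either side of `op`.
template <typename Op>
bool leaves_unchanged(Op op, int identity, int x) {
  return op(identity, x) == x && op(x, identity) == x;
}

// Checks all seven pairings for one sample value.
void check_identities(int x) {
  assert(leaves_unchanged(std::plus<int>(), kPlusIdentity, x));
  assert(leaves_unchanged(std::bit_or<int>(), kBitOrIdentity, x));
  assert(leaves_unchanged(std::bit_xor<int>(), kBitXorIdentity, x));
  assert(leaves_unchanged(std::bit_and<int>(), kBitAndIdentity, x));
  assert(leaves_unchanged(Maximum(), kMaximumIdentity, x));
  assert(leaves_unchanged(Minimum(), kMinimumIdentity, x));
  assert(leaves_unchanged(std::multiplies<int>(), kMultipliesIdentity, x));
}
```

Calling `check_identities(42)` exercises every pairing. A value that is not the identity, such as 0 for multiplication, fails the same check.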

Suggestions to Fix

Review the code and modify it manually.

For example, consider the following original CUDA* code:

  struct UserMin {
    template <typename T>
    __device__ __host__ __forceinline__ T operator()(const T &a,
                                                     const T &b) const {
      return (b < a) ? b : a;
    }
  };

  void foo(int num_segments, int *device_offsets, int *device_in, int *device_out,
           UserMin min_op, int initial_value) {
    size_t temp_storage_size;
    void *temp_storage = nullptr;

    // First call: temp_storage is null, so only temp_storage_size is computed.
    cub::DeviceSegmentedReduce::Reduce(temp_storage, temp_storage_size, device_in,
                                       device_out, num_segments, device_offsets,
                                       device_offsets + 1, min_op, initial_value);

    cudaMalloc(&temp_storage, temp_storage_size);

    // Second call: performs the segmented reduction.
    cub::DeviceSegmentedReduce::Reduce(temp_storage, temp_storage_size, device_in,
                                       device_out, num_segments, device_offsets,
                                       device_offsets + 1, min_op, initial_value);

    cudaDeviceSynchronize();
    cudaFree(temp_storage);
  }
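
To make the expected result of the call above concrete, the following is a minimal host-side reference, assuming the layout used in this example: segment s covers the half-open index range [offsets[s], offsets[s + 1]), so the offsets array holds num_segments + 1 entries. The function name `segmented_reduce_reference` is hypothetical and is not part of CUB or dpct. It can help compare results before and after migration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Same functor as in the CUDA code, without the device qualifiers.
struct UserMin {
  template <typename T>
  T operator()(const T &a, const T &b) const { return (b < a) ? b : a; }
};

// Hypothetical CPU reference: reduces each segment of `in` with `op`,
// starting from `initial_value`. Segment s is [offsets[s], offsets[s + 1]).
template <typename T, typename Op>
std::vector<T> segmented_reduce_reference(const std::vector<T> &in,
                                          const std::vector<int> &offsets,
                                          Op op, T initial_value) {
  assert(!offsets.empty());
  const std::size_t num_segments = offsets.size() - 1;
  std::vector<T> out(num_segments, initial_value);
  for (std::size_t s = 0; s < num_segments; ++s) {
    assert(offsets[s] <= offsets[s + 1]);
    T acc = initial_value;
    for (int i = offsets[s]; i < offsets[s + 1]; ++i) {
      acc = op(acc, in[static_cast<std::size_t>(i)]);
    }
    out[s] = acc;  // an empty segment keeps initial_value
  }
  return out;
}

// Example input: three segments, the middle one empty.
std::vector<int> example_result() {
  const std::vector<int> in = {8, 6, 7, 5, 3, 0, 9};
  const std::vector<int> offsets = {0, 3, 3, 7};
  return segmented_reduce_reference(in, offsets, UserMin(),
                                    std::numeric_limits<int>::max());
}
```

For this input the three segments are {8, 6, 7}, the empty segment, and {5, 3, 0, 9}, so the reference yields 6, the initial value, and 0.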

This code is migrated to the following SYCL* code:

  struct UserMin {
    template <typename T>
    __dpct_inline__ T operator()(const T &a, const T &b) const {
      return (b < a) ? b : a;
    }
  };

  void foo(int num_segments, int *device_offsets, int *device_in, int *device_out,
           UserMin min_op, int initial_value) {
    dpct::device_ext &dev_ct1 = dpct::get_current_device();
    sycl::queue &q_ct1 = dev_ct1.in_order_queue();

    /*
    DPCT1026:0: The call to cub::DeviceSegmentedReduce::Reduce was removed because
    this call is redundant in SYCL.
    */

    /*
    DPCT1092:1: Consider replacing work-group size 128 with different value for
    specific hardware for better performance.
    */
    /*
    DPCT1091:2: The function dpct::segmented_reduce only supports DPC++ native
    binary operation. Replace "dpct_placeholder" with a DPC++ native binary
    operation.
    */
    dpct::device::segmented_reduce<128>(
        q_ct1, device_in, device_out, num_segments, device_offsets,
        device_offsets + 1, dpct_placeholder, initial_value);

    dev_ct1.queues_wait_and_throw();
  }
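
Before replacing dpct_placeholder, it can be worth confirming that the user-defined functor really computes the same result as one of the native operations. The sketch below does this on the host by comparing two binary operations over every pair of sample values. It is standard C++ only: `NativeMinimum` and `NativeMaximum` are hypothetical stand-ins for sycl::minimum and sycl::maximum, and `agree_on` is an illustrative helper, not a dpct or SYCL function. Agreement on samples suggests equivalence but does not prove it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The user functor from the migrated code, without __dpct_inline__.
struct UserMin {
  template <typename T>
  T operator()(const T &a, const T &b) const { return (b < a) ? b : a; }
};

// Hypothetical host-side stand-ins for the native SYCL operations.
struct NativeMinimum {
  template <typename T>
  T operator()(const T &a, const T &b) const { return std::min(a, b); }
};
struct NativeMaximum {
  template <typename T>
  T operator()(const T &a, const T &b) const { return std::max(a, b); }
};

// Returns true when both operations give the same result for every
// ordered pair drawn from `samples`.
template <typename OpA, typename OpB>
bool agree_on(const std::vector<int> &samples, OpA op_a, OpB op_b) {
  assert(!samples.empty());
  for (int x : samples) {
    for (int y : samples) {
      if (op_a(x, y) != op_b(x, y)) return false;
    }
  }
  return true;
}
```

With samples such as {-3, 0, 7, 42}, `UserMin` agrees with the minimum stand-in and disagrees with the maximum stand-in, which points to sycl::minimum as the replacement.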

This code can be rewritten as follows:

  void foo(int num_segments, int *device_offsets, int *device_in, int *device_out,
           UserMin min_op, int initial_value) {
    dpct::device_ext &dev_ct1 = dpct::get_current_device();
    sycl::queue &q_ct1 = dev_ct1.in_order_queue();

    int max_work_group_size = dev_ct1.get_max_work_group_size();
    if (max_work_group_size >= 256)
      dpct::device::segmented_reduce<256>(
          q_ct1, device_in, device_out, num_segments, device_offsets,
          device_offsets + 1, sycl::minimum(), initial_value);
    else
      dpct::device::segmented_reduce<128>(
          q_ct1, device_in, device_out, num_segments, device_offsets,
          device_offsets + 1, sycl::minimum(), initial_value);

    dev_ct1.queues_wait_and_throw();
  }
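
The if/else in the rewritten code is needed because the work-group size is a template argument and must be a compile-time constant, while the device limit is only known at run time. The sketch below isolates that pattern in standard C++. `launch_with_group_size` is a hypothetical stand-in for dpct::device::segmented_reduce<N> that only reports which instantiation was chosen, and `dispatch_group_size` is an illustrative name.

```cpp
#include <cassert>

// Hypothetical stand-in for a launcher whose work-group size is a template
// parameter. It returns the size so the chosen branch can be observed.
template <int GroupSize>
int launch_with_group_size() {
  static_assert(GroupSize > 0, "work-group size must be positive");
  return GroupSize;
}

// Maps a run-time device limit to one of the compile-time instantiations,
// mirroring the two branches of the rewritten code.
int dispatch_group_size(int max_work_group_size) {
  assert(max_work_group_size > 0);
  if (max_work_group_size >= 256) {
    return launch_with_group_size<256>();
  }
  return launch_with_group_size<128>();
}
```

As in the rewritten code, the fallback branch always uses 128. A device whose limit is below 128 would need a further branch with a smaller size.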

Intel® DPC++ Compatibility Tool Developer Guide and Reference
