The argument to a collective call should be a table of contiguous tensors, each located on a different device.
Example: perform an in-place allReduce on a table of tensors:
require 'nccl'
nccl.allReduce(inputs)
where inputs is a table of contiguous tensors of the same size, each located on a different device.
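A table that meets these requirements could be built as in the sketch below. It assumes that cutorch is installed and that at least two CUDA devices are visible. The names nDevices and size, and the fill values, are illustrative only and are not part of the nccl API.

```lua
require 'cutorch'
require 'nccl'

-- Assumption: cutorch sees at least two CUDA devices.
local nDevices = cutorch.getDeviceCount()
local size = 1024  -- illustrative element count, identical on every device

local inputs = {}
for i = 1, nDevices do
   cutorch.setDevice(i)                        -- allocate the next tensor on device i
   inputs[i] = torch.CudaTensor(size):fill(i)  -- a freshly allocated tensor is contiguous
end

nccl.allReduce(inputs)    -- in place: the result overwrites every inputs[i]
cutorch.synchronizeAll()  -- wait for all devices before reading the results

for i = 1, nDevices do
   cutorch.setDevice(i)
   print(i, inputs[i][1])  -- first element of the reduced tensor on device i
end
```

If the reduction is a sum (an assumption here, since the text above does not name the operation), every device should print the same value, 1 + 2 + ... + nDevices.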