module Make:
Parallel functions with communications
val shift : int -> 'a Bsml.par -> 'a Bsml.par
Shifts the values from process to process. The parallel cost
is n*p+l, where n is the average size of the values.
val shift_right : 'a Bsml.par -> 'a Bsml.par
val shift_left : 'a Bsml.par -> 'a Bsml.par
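The data movement performed by a shift can be sketched with a sequential model. The code below represents a parallel vector as a plain OCaml array indexed by process number, which is an assumption made for illustration and not the Bsml.par type. It also assumes that the value held by process i is delivered to process (i+d) mod p, and that shift_right and shift_left are shifts by 1 and -1; the source does not state the direction, and the names shift_model, shift_right_model and shift_left_model are hypothetical.

```ocaml
(* Sequential model: a parallel vector <v0,...,vp-1> is an array of length p. *)
(* Assumption: the value held by process i is delivered to process (i+d) mod p. *)
let shift_model (d : int) (v : 'a array) : 'a array =
  let p = Array.length v in
  (* OCaml's mod can return a negative result, so normalise into 0..p-1. *)
  let natmod x = ((x mod p) + p) mod p in
  Array.init p (fun i -> v.(natmod (i - d)))

(* Assumed specialisations: one step to the right or to the left. *)
let shift_right_model v = shift_model 1 v
let shift_left_model v = shift_model (-1) v
```

Under these assumptions, shifting <10,20,30,40> to the right by one wraps the last value around to process 0.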
val totex : 'a Bsml.par -> (int -> 'a) Bsml.par
totex <v0,...,vp-1> evaluates to <f0,...,fp-1>
such that (fi j)=vj.
total_exchange <v0,...,vp-1> evaluates to <l0,...,lp-1>
such that the jth element of li is vj.
val total_exchange : 'a Bsml.par -> 'a list Bsml.par
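The two results described above differ only in how each process sees the exchanged values: as a function or as a list. A minimal sequential sketch, assuming a parallel vector is modelled as an array indexed by process number (this is not the Bsml.par type, and the names totex_model and total_exchange_model are hypothetical):

```ocaml
(* Sequential model: a parallel vector is an array indexed by process number. *)
let totex_model (v : 'a array) : (int -> 'a) array =
  (* Every process receives the same lookup function: fi j = vj. *)
  Array.map (fun _ -> fun j -> v.(j)) v

let total_exchange_model (v : 'a array) : 'a list array =
  (* Every process receives the list [v0; ...; vp-1]. *)
  Array.map (fun _ -> Array.to_list v) v
```

In this model, the function at any process applied to j returns vj, and the list at any process has vj as its jth element (counting from 0).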
exception Scatter
scatter partition from <v0,...,vp-1> scatters the value vfrom,
which is partitioned by the function partition.
partition v pid indicates the part of v which will be sent
to process pid (it is possible to send nothing by using the
value None). from must be a valid process number,
otherwise Scatter is raised.
val scatter : ('a -> int -> 'b option) -> int -> 'a Bsml.par -> 'b Bsml.par
val scatter_list : int -> 'a list Bsml.par -> 'a list Bsml.par
Specialized versions of scatter for lists, arrays and strings,
respectively.
val scatter_array : int -> 'a array Bsml.par -> 'a array Bsml.par
val scatter_string : int -> string Bsml.par -> string Bsml.par
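The role of the partition function can be illustrated with a sequential model. The sketch below assumes a parallel vector is an array indexed by process number, and it keeps the option returned by partition in the result, because the source does not say what a process holds when it is sent None. The names Scatter_model, scatter_model and nth_part are hypothetical.

```ocaml
(* Stand-in for the Scatter exception of the module. *)
exception Scatter_model

(* Sequential model: process pid receives (partition vfrom pid). *)
(* Assumption: the option is kept, since the source does not specify
   what a process holds after being sent None. *)
let scatter_model (partition : 'a -> int -> 'b option) (from : int)
    (v : 'a array) : 'b option array =
  let p = Array.length v in
  if from < 0 || from >= p then raise Scatter_model;
  Array.init p (fun pid -> partition v.(from) pid)

(* Example partition: process pid gets the element at index pid, if any. *)
let nth_part l pid = List.nth_opt l pid
```

For instance, scattering the list held by process 1 with nth_part sends one element to each process for which an element exists, and None to the others.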
exception Gather
gather dst <v0,...,vp-1> gathers the values v0,...,vp-1 to
process dst. With gather the result is a function f such that
(f i) gives vi, with i being a valid process number. With
gather_list the result is the list [v0;...;vp-1]. gather_list
corresponds to the function gather of BSMLlib 0.1. If dst is
not a valid process, then Gather is raised.
val gather : int -> 'a Bsml.par -> (int -> 'a option) Bsml.par
val gather_list : int -> 'a Bsml.par -> 'a list Bsml.par
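The difference between the two gather variants can be sketched sequentially. The model below represents a parallel vector as an array indexed by process number. What processes other than dst hold after the gather is not stated in the source; the sketch assumes a function that always returns None, and the empty list, respectively. The names Gather_model, gather_model and gather_list_model are hypothetical.

```ocaml
(* Stand-in for the Gather exception of the module. *)
exception Gather_model

(* Assumption: only process dst can look up values; other processes
   hold a function that always returns None. *)
let gather_model (dst : int) (v : 'a array) : (int -> 'a option) array =
  let p = Array.length v in
  if dst < 0 || dst >= p then raise Gather_model;
  Array.init p (fun pid ->
      fun i -> if pid = dst && 0 <= i && i < p then Some v.(i) else None)

(* Assumption: processes other than dst hold the empty list. *)
let gather_list_model (dst : int) (v : 'a array) : 'a list array =
  let p = Array.length v in
  if dst < 0 || dst >= p then raise Gather_model;
  Array.init p (fun pid -> if pid = dst then Array.to_list v else [])
```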
exception Bcast
bcast_direct root <v0,...,vp-1> = <vroot,...,vroot> if root
is a valid process number, otherwise Bcast is raised.
The parallel cost is size*(p-1)*g+l, where size is the
size of the value vroot.
val bcast_direct : int -> 'a Bsml.par -> 'a Bsml.par
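A sequential sketch of the direct broadcast, together with the cost formula stated above evaluated on sample numbers. A parallel vector is modelled as an array indexed by process number; the names Bcast_model, bcast_direct_model and bcast_direct_cost are hypothetical, and the machine parameters in the usage note are made up for illustration.

```ocaml
(* Stand-in for the Bcast exception of the module. *)
exception Bcast_model

(* Sequential model: every process ends up with the value held by root. *)
let bcast_direct_model (root : int) (v : 'a array) : 'a array =
  let p = Array.length v in
  if root < 0 || root >= p then raise Bcast_model;
  Array.make p v.(root)

(* The cost formula from the text: size*(p-1)*g + l. *)
let bcast_direct_cost ~size ~p ~g ~l = size * (p - 1) * g + l
```

With illustrative values size = 100, p = 4, g = 2 and l = 50, the formula gives 100*3*2 + 50 = 650.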
val bcast_totex_gen : ('a -> int -> 'b option) ->
((int -> 'b) -> 'c) -> int -> 'a Bsml.par -> 'c Bsml.par
bcast_totex_gen partition paste root v broadcasts the value
at process root of parallel vector v. The algorithm is the
so-called total exchange broadcast. It proceeds in two
super-steps: first, the value at process root is scattered
using partition; then those parts are totally exchanged and
pasted. For large values this algorithm is faster than
bcast_direct.
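The two super-steps can be sketched with a sequential model, where a parallel vector is an array indexed by process number. The sketch assumes that partition returns Some part for every process, which the source does not require. The block partition and paste for arrays are illustrative choices, and all names (bcast_totex_gen_model, block_partition, block_paste) are hypothetical.

```ocaml
let bcast_totex_gen_model
    (partition : 'a -> int -> 'b option)
    (paste : (int -> 'b) -> 'c)
    (root : int) (v : 'a array) : 'c array =
  let p = Array.length v in
  (* Super-step 1: scatter the root value; process pid holds part pid. *)
  let parts = Array.init p (fun pid -> partition v.(root) pid) in
  (* Super-step 2: total exchange, then every process pastes all parts. *)
  let lookup j =
    match parts.(j) with
    | Some part -> part
    | None -> invalid_arg "bcast_totex_gen_model: missing part"
  in
  Array.map (fun _ -> paste lookup) parts

(* Illustrative partition: cut an array into p contiguous blocks. *)
let block_partition p a pid =
  let n = Array.length a in
  let lo = pid * n / p and hi = (pid + 1) * n / p in
  Some (Array.sub a lo (hi - lo))

(* Illustrative paste: concatenate the p blocks in process order. *)
let block_paste p f = Array.concat (List.init p f)
```

Each process sends and receives only blocks of roughly size/p elements per super-step in this scheme, which is the intuition behind the advantage over bcast_direct for large values.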
val bcast_totex_list : int -> 'a list Bsml.par -> 'a list Bsml.par
Specialized versions for lists, arrays, strings and values of
any type (but this general version implies the marshalling of
values and then the use of bcast_totex_string).
val bcast_totex_array : int -> 'a array Bsml.par -> 'a array Bsml.par
val bcast_totex_string : int -> string Bsml.par -> string Bsml.par
val bcast_totex : int -> 'a Bsml.par -> 'a Bsml.par
val scan_direct : ('a -> 'a -> 'a) -> 'a Bsml.par -> 'a Bsml.par
If op is an associative operation, scan_direct op
<v0,...,vp-1> = <s0,...,sp-1> where si = v0 op v1 op ... op vi.
Communication cost: (p-1)*n*g+l, where
n is the average size of the values vi.
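The result of a scan can be made concrete with a sequential model, assuming an inclusive scan (process i holds the combination of v0 through vi) and a parallel vector represented as an array indexed by process number. The name scan_model is hypothetical, and the sketch says nothing about how the values are communicated.

```ocaml
(* Sequential model of an inclusive scan: s.(i) = v0 op v1 op ... op vi. *)
let scan_model (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array =
  let s = Array.copy v in
  for i = 1 to Array.length v - 1 do
    (* Combine the prefix ending at i-1 with the value of process i. *)
    s.(i) <- op s.(i - 1) v.(i)
  done;
  s
```

With addition over <1,2,3,4> the model yields <1,3,6,10>; with string concatenation the order of operands is preserved, which matters because op is only required to be associative.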
val scan_logp : ('a -> 'a -> 'a) -> 'a Bsml.par -> 'a Bsml.par
Computes the same result as scan_direct but with
communication cost (log p)*2*n*g+l.
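One common way to compute a scan in a logarithmic number of steps is recursive doubling, sketched below on a sequential model where a parallel vector is an array indexed by process number. The source does not say which scheme scan_logp uses, so this is an illustration of the general idea and not a description of the library's implementation. The name scan_doubling_model is hypothetical.

```ocaml
(* Recursive doubling: at each step, process i combines its current value
   with the one held by process i-d, then d doubles.
   Returns the scanned vector and the number of steps taken. *)
let scan_doubling_model (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array * int =
  let p = Array.length v in
  let cur = ref (Array.copy v) in
  let d = ref 1 in
  let steps = ref 0 in
  while !d < p do
    let prev = !cur in
    (* The value coming from the left is the left operand of op. *)
    cur := Array.init p (fun i ->
        if i >= !d then op prev.(i - !d) prev.(i) else prev.(i));
    d := 2 * !d;
    incr steps
  done;
  (!cur, !steps)
```

For five values the loop runs three times (d = 1, 2, 4), matching the ceiling of log2 of 5.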
val scan_wide : (('a -> 'a -> 'a) -> 'a Bsml.par -> 'a Bsml.par) ->
(('a -> 'a -> 'a) -> 'b -> 'b) ->
('b -> 'a) ->
(('a -> 'a) -> 'b -> 'b) -> ('a -> 'a -> 'a) -> 'b Bsml.par -> 'b Bsml.par
scan_wide par_scan seq_scan last_element map op vv is used
to compute a parallel scan over a parallel vector of
collections of values. par_scan is the parallel scan used.
seq_scan is the sequential scan used. last_element is a
function which returns the last element of a collection.
map is a map function over the collection, op is the
operation used for the reduction and vv is the
parallel vector of collections.
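How the five arguments might fit together can be sketched on a sequential model. The decomposition below (local scan, parallel scan of the last elements, then a map that adds the total of the preceding processes) is one plausible reading of the parameter descriptions and is not confirmed by the source. A parallel vector is modelled as an array indexed by process number, collections are non-empty lists, and all names ending in _model or _list are hypothetical.

```ocaml
(* Inclusive scan over an array, standing in for par_scan. *)
let par_scan_model op v =
  let s = Array.copy v in
  for i = 1 to Array.length v - 1 do
    s.(i) <- op s.(i - 1) v.(i)
  done;
  s

(* Inclusive sequential scan over a list, standing in for seq_scan. *)
let seq_scan_list op = function
  | [] -> []
  | x :: rest ->
    let step (prev, acc) y = let s = op prev y in (s, s :: acc) in
    List.rev (snd (List.fold_left step (x, [x]) rest))

(* Last element of a non-empty list, standing in for last_element. *)
let last_list l = List.nth l (List.length l - 1)

let scan_wide_model par_scan seq_scan last_element map op vv =
  (* 1. Each process scans its own collection. *)
  let local = Array.map (seq_scan op) vv in
  (* 2. The last elements are scanned across processes. *)
  let lasts = Array.map last_element local in
  let totals = par_scan op lasts in
  (* 3. Process i > 0 combines the total of processes 0..i-1 into
        each of its local results; process 0 is left unchanged. *)
  Array.mapi
    (fun i c -> if i = 0 then c else map (op totals.(i - 1)) c)
    local
```

Applied to <[1;2], [3;4], [5]> with addition, the model produces <[1;3], [6;10], [15]>, which is the scan of 1,2,3,4,5 distributed over the three processes.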
val scan_wide_direct : (('a -> 'a -> 'a) -> 'b -> 'b) ->
('b -> 'a) ->
(('a -> 'a) -> 'b -> 'b) -> ('a -> 'a -> 'a) -> 'b Bsml.par -> 'b Bsml.par
Specialized version of scan_wide using scan_direct as
parallel scan.
val scan_wide_logp : (('a -> 'a -> 'a) -> 'b -> 'b) ->
('b -> 'a) ->
(('a -> 'a) -> 'b -> 'b) -> ('a -> 'a -> 'a) -> 'b Bsml.par -> 'b Bsml.par
Specialized version of scan_wide using scan_logp as
parallel scan.
val scan_list_direct : ('a -> 'a -> 'a) -> 'a list Bsml.par -> 'a list Bsml.par
val scan_list_logp : ('a -> 'a -> 'a) -> 'a list Bsml.par -> 'a list Bsml.par
val scan_array_direct : ('a -> 'a -> 'a) -> 'a array Bsml.par -> 'a array Bsml.par
val scan_array_logp : ('a -> 'a -> 'a) -> 'a array Bsml.par -> 'a array Bsml.par
Folds. Similar to scans, except that the produced vector
contains the same value everywhere. This value is the one a
scan would produce at the last process (non-wide case), or the
last element of the collection at the last process if a wide
scan were computed.
val fold_direct : ('a -> 'a -> 'a) -> 'a Bsml.par -> 'a Bsml.par
val fold_wide : (('a -> 'a -> 'a) -> 'a Bsml.par -> 'a Bsml.par) ->
(('a -> 'a -> 'a) -> 'b -> 'a) ->
('a -> 'a -> 'a) -> 'b Bsml.par -> 'a Bsml.par
val fold_logp : ('a -> 'a -> 'a) -> 'a Bsml.par -> 'b Bsml.par
val fold_list_direct : ('a -> 'a -> 'a) -> 'a list Bsml.par -> 'a Bsml.par
val fold_list_logp : ('a -> 'a -> 'a) -> 'a list Bsml.par -> 'b Bsml.par
val fold_array_direct : ('a -> 'a -> 'a) -> 'a array Bsml.par -> 'b Bsml.par
val fold_array_logp : ('a -> 'a -> 'a) -> 'a array Bsml.par -> 'b Bsml.par
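The relation between folds and scans described above can be sketched sequentially for the non-wide case. A parallel vector is modelled as an array indexed by process number, and the name fold_model is hypothetical; the sketch only shows the result, not the communication pattern.

```ocaml
(* Sequential model of a fold: every process ends up with
   v0 op v1 op ... op vp-1, the value a scan leaves at the last process. *)
let fold_model (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array =
  let p = Array.length v in
  if p = 0 then [||]
  else
    (* Combine from left to right, starting with the value of process 0. *)
    let total = Array.fold_left op v.(0) (Array.sub v 1 (p - 1)) in
    Array.make p total
```

With addition over <1,2,3,4>, every process holds 10, which is also the last component of the scan <1,3,6,10>.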