With the wide expansion of multiprocessor architectures, the analysis and reasoning for programs under weak memory models has become an important concern. This work presents MMFilter, an original constraint solver for generating program behaviors respecting a particular memory model. It is implemented in Prolog using CHR (Constraint Handling Rules). The CHR formalism provides a convenient generic solution for specifying memory models. It benefits from the existing optimized implementations of CHR and can be easily extended to new models. We present MMFilter design, illustrate the encoding of memory model constraints in CHR and discuss the benefits and limitations of the proposed technique.
SyDPaCC is a set of libraries for the Coq proof assistant. It allows to write naive functional programs (i.e. with high complexity) that are considered as specifications, and to transform them into more efficient versions. These more efficient versions can then be automatically parallelised before being extracted from Coq into source code for the functional language OCaml together with calls to the Bulk Synchronous Parallel ML (BSML) library. In this paper we present a new core version of SyDPaCC for the development of parallel programs correct-by-contruction using the theory of list homomorphisms and algorithmic skeletons implemented and verified in Coq. The framework is illustrated on the the maximum prefix sum problem.
Bulk synchronous parallelism (BSP) offers an abstract and simple model of parallelism yet allows to take realistically into account the communication costs of parallel algorithms. BSP has been used in many application domains. BSPlib and its variants are programming libraries for the C language that support the BSP style. Bulk Synchronous Parallel ML (BSML) is a library for BSP programming with the functional language OCaml. It offers parallel operations on a data structure named parallel vector. BSML provides a global view of programs, i.e. BSML programs can be seen as sequential programs working on a parallel data structure (seq of par) while a BSPlib program is written in the SPMD style and understood as a parallel composition of communicating sequential programs (par of seq). The communication styles of BSML and BSPlib are also quite different. The contribution of this paper is a BSPlib-style communication API implemented on top of BSML. It has been designed without extending BSML, but only using the imperative features of the underlying function language OCaml. Programs implemented using this API are very close to programs implemented using a BSPlib library for the C language. It therefore shows that BSML is universal for the BSP model.
Bulk-Synchronous Parallel (BSP) is a bridging model between abstract execution and concrete parallel architectures that retains enough information to model performance scalability. In order to model BSP program executions, Hains adapted the theory of finite automata to the BSP paradigm by introducing BSP automata. In the initial description of the theory, BSP automata had to be explicitly defined as structured sets of states and transitions. The lack of a clean and efficient algorithm for generating them from regular expressions would have prevented this theory from being used in practice. To alleviate this problem we introduce in this paper an algorithm that generates a BSP automaton recognizing a BSP language defined by a BSP regular expression. The main practical use of BSP automata developed in this paper is the parallelization of finite state automata with an explicit distribution and a performance model, that enable parallel matching of regular expressions. Secondarily, BSP regular expressions provide a convenient structure for automatic code generation of imperative BSP program that is also developed in this paper.
The multi-core trend is widening the gap between programming languages and hardware. Taking parallelism into account in the programs is necessary to improve performance. Unfortunately, current mainstream programming languages fail to provide suitable abstractions to do so. The most common pattern relies on the use of mutexes to ensure mutual exclusion between concurrent accesses to a shared memory. However, this model is error-prone and scales poorly by lack of modularity. Recent research proposes atomic sections as an alternative. The user simply delimits portions of code that should be free from interference. The responsibility for ensuring interference freedom is left either to the compiler or to the run-time system. In order to provide enough modularity, it is necessary that both atomic sections could be nested and threads could be forked inside an atomic section. In this paper we focus on the semantics of programming languages providing these features. More precisely, without being tied to a specific programming language, we consider program traces satisfying some basic well-formedness conditions. Our main contribution is the precise definition of atomicity, well-synchronisation and the proof that the latter implies the strong form of the former. A formalisation of our results in the Coq proof assistant is described.
Bulk Synchronous Parallel ML is a high-level language for programming parallel algorithms. Built upon OCaml and using the BSP model, it provides a safe setting for their implementation, avoiding concurrency related problems (deadlocks, indeterminism). Only a limited set of the features of OCaml can be used in BSML to respect its properties of safety: this paper describes a way to add exception handling to this set by extending and adapting OCaml's exceptions. The behaviour of these new exceptions and the syntactic constructs to handle them, together with their implementation, are described in detail, and results over an example are given.
The BSMLlib library is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a data structure named parallel vector, which is given by intention. In order to have an execution that follows the BSP model, and to have a simple cost model, nesting of parallel vectors is not allowed. The novelty of this paper is a type system which prevents such nesting. This system is correct w.r.t. the dynamic semantics.
We have designed a functional data-parallel language called BSML for programming bulk-synchronous parallel algorithms, a model of computing. Dead-locks and indeterminism are avoided and the execution time can be then estimated. For large scale applications where parallel processing is helpful, more than one parallel machine is needed. One speaks about metacomputing. A major problem in programming parallel application for such platforms is their hierarchical network structures: latency and bandwidth of global ones are orders of magnitude worse than those of local networks. Here we consider how an extention of the well-suited BSP model might be for these kinds of computing and how extended our functional language can be for this.
This paper presents a new functional parallel language: Minimally Synchronous Parallel ML (MSPML). The execution time can be estimated, dead-locks and indeterminism are avoided. Programs are written as usual ML programs but using a small set of additional functions. Provided functions are used to access the parameters of the parallel machine and to create and operate on a parallel data structure. It follows the execution and cost model of the Message Passing Machine model (MPM). It shares with Bulk Synchronous Parallel ML its syntax and high-level semantics but it has a minimally synchronous distributed semantics. Experiments have been run on a cluster of PC using an implementation of the Diffusion algorithmic skeleton.
A framework is presented for designing parallel programming languages whose semantics is functional and where communications are explicit. To this end, Brookes and Geva's generalized concrete data structures are specialized with a notion of explicit data layout to yield a CCC of distributed structures called arrays. Arrays' symmetric replicated structures, suggested by the data-parallel SPMD paradigm, are found to be incompatible with sum types. We then outline a functional language with explicitly-distributed (monomorphic) concrete types, including higher-order, sum and recursive ones. In this language, programs can be as large as the network and can observe communication events in other programs. Such flexibility is missing from current data-parallel languages and amounts to a fusion with their so-called annotations, directives or meta-languages.
The BSlambda_p-calculus is a calculus of functional bulk synchronous parallel (BSP) programs. It is the basis for the design of a bulk synchronous parallel ML language. For data-parallel languages, there are two points of view: the programming model where a program is seen as a sequence of operations on parallel vectors, and the execution model where the program is a parallel composition of programs run on each processor of the parallel machine. BSP algorithms are defined by data-parallel algorithms with explicit (physical) processes in order to allow their parallel execution time to be estimated. We present here a distributed evaluation minimally synchronous for BSP execution (which corresponds to the execution model). This distributed evaluation is correct w.r.t. the call-by-value strategy of the BSlmabda_p-calculus (which corresponds to the programming model).
An extension of the lambda-calculus called BSlambda is introduced as a formal basis for functional languages expressing bulk synchronous parallel algorithms. A confluence result is shown. The application of the calculus is illustrated by examples of program proofs and the associated notion of parallel reduction. The reduction process is interpreted in the BSP cost model.
Selected extended and revised papers from the International Workshop and the ACM SAC track on Practical Aspects of High-Level Parallel Programming
Internet of Things (IoT) applications are becoming increasingly critical and require rigorous formal verification. In this paper we target Contiki, a widely used open-source OS for IoT, and present a verification case study of one of its most critical modules: that of linked lists. Its API and list representation differ from the classical linked list implementations, and are particularly challenging for deductive verification. The proposed verification technique relies on a parallel view of a list through a companion ghost array. This approach makes it possible to perform most proofs automatically using the Frama-C/WP tool, only a small number of auxiliary lemmas being proved interactively in the Coq proof assistant. We present an elegant segment-based reasoning over the companion array developed for the proof. Finally, we validate the proposed specification by proving a few functions manipulating lists.
Internet of Things (IoT) applications are becoming increasingly critical and require formal verification. Our recent work presented formal verification of the linked list module of Contiki, an OS for IoT. It relies on a parallel view of a linked list via a companion ghost array and uses an inductive predicate to link both views. A few interactively proved lemmas allow for an automatic proof of other specifications, expressed in the ACSL specification language and proved with the Frama-C/WP tool. In a broader verification context, especially as long as the whole system is not yet formally verified, it would be very useful to use runtime verification, in particular, to test client modules that use the list module. It is not possible with the current specifications, which include an inductive predicate and axiomatically defined functions. In this early-idea paper we show how to define a provably equivalent non-inductive predicate and a provably equivalent non-axiomatic function that belong to the executable subset of ACSL and can be transformed into executable C code. Finally, we propose an extension of Frama-C to handle both axiomatic specifications for deductive verification and executable specifications for runtime verification.
OCaml is a multi-paradigm (functional, imperative, object-oriented) high level sequential language. Types are statically inferred by the compiler and the type system is expressive and strong. These features make OCaml a very productive language for developing efficient and safe programs. In this tutorial we present three frameworks for using OCaml to program scalable parallel architectures: BSML, Multi-ML and Spoc.
This paper is a tutorial introduction to Frama-C, a framework for the analysis and verification of sequential C programs, and in particular its EVA, WP, and E-ACSL plugins. The examples are drawn from Contiki, a lightweight operating system for the Internet of Things.
Cloud Computing has known in the past few years a fast development, leading to a spike in the number of companies competing on providing the best Cloud services. This makes it harder for potential Cloud customers to chose the adequate provider. Despite its wide adoption, many are still hesitant due to the security issues Cloud Computing poses. In this paper, we propose a brokerage solution that formalizes security properties under the form of inter-VM relations, and gives the possibility of setting these security requirements to its customers from the first steps. This solution uses formal methods and the finite model finder KodKod to verify the consistency of the customer's requirements, and to find a placement for his deployment model.
The BSP model (Bulk Synchronous Parallel) simplifies the construction and evaluation of parallel algorithms, with its simplified synchronization structure and cost model. Nevertheless, imperative BSP programs can suffer from synchronization errors. Programs with textually aligned barriers are free from such errors, and this structure eases program comprehension. We propose a simplified formalization of barrier inference as data flow analysis, which verifies statically whether an imperative BSP program has replicated synchronization, which is a sufficient condition for textual barrier alignment.
This paper presents an extension of a library for the Coq interactive theorem prover that enables the development of correct functional parallel programs based on sequential program transformation and automatic parallelization using an algorithmic skeleton named accumulate. Such an algorithmic skeleton is a pattern of a parallel algorithm that is provided as a high-order function implemented in parallel. The use of this framework is illustrated with the bracket matching problem, including experiments on a parallel machine.
Bulk synchronous parallelism (BSP) oﬀers an abstract and simple model of parallelism yet allows to take realistically into account the communication costs of parallel algorithms. BSP has been used in many application domains. BSPlib and its variants are programming libraries for the C language that support the BSP style. Bulk Synchronous Parallel ML (BSML) is a library for BSP programming with the functional language OCaml. It is based on an extension of the lambda-calculus by parallel operations on a data structure named parallel vector. BSML oﬀers a global view of programs, i.e. BSML programs can be seen as sequential programs working on a parallel data structure (seq of par) while a BSPlib program is written in the SPMD style and understood as a parallel composition of communicating sequential programs (par of seq). The communication styles of BSML and BSPlib are also quite diﬀerent.
Skeletal parallelism offers a good trade-off between programming productivity and execution efficiency. In this style of parallelism, an application is a composition of algorithmic skeletons. An algorithmic skeleton captures a pattern of parallel algorithm on a distributed data structure, and is also often associated with a sequential algorithm on a sequential data structure that is the counterpart of the parallel data structure. The algorithmic skeleton approach has been inspired by functional programming. It is therefore very natural to embed algorithmic skeletons in a functional programming language. In this paper we present a new algorithmic skeleton library for the statically typed functional language OCaml, and illustrate its use on some applications. This functional skeletal parallel programming library is implemented using the Bulk Synchronous Parallel ML parallel programming library for OCaml.
Parallel programs based on the Divide&Conquer paradigm could be successfully defined in a simple way using powerlists. These parallel recursive data structures and their algebraic theories offer both a methodology to design parallel algorithms and parallel programming abstractions to ease the development of parallel applications. The paper presents how programs based on powerlists can be implemented in Java using the framework we developed. The design of this framework is based on powerlists theory, but in the same time follows the object-oriented design principles that provide flexibility and maintainability. Examples are given and performance experiments are conducted. The results emphasise the utility and the efficiency of the framework.
Frama-C is an extensible modular framework for analysis of C programs that offers different analyzers in the form of collaborating plugins. Currently, Frama-C does not support the proof of functional properties of concurrent code. We present conc2seq, a new code transformation based tool realized as a Frama-C plugin and dedicated to the verification of concurrent C programs. Assuming the program under verification respects an interleaving semantics, conc2seq transforms the original concurrent C program into a sequential one in which concurrency is simulated by interleavings. User specifications are automatically reintegrated into the new code without manual intervention. The goal of the proposed code transformation technique is to allow the user to reason about a concurrent program through the interleaving semantics using existing Frama-C analyzers.
Bulk-Synchronous Parallel (BSP) is a bridging model between abstract execution and concrete parallel architecture. In order to model BSP program executions Hains adapted the finite automata theory to the BSP paradigm by introducing BSP automata theory. Benefit provided by BSP automata is twofold: modeling BSP program control and parallelizing finite state automata. The lack of generation algorithm of BSP automata and the illusory lack of application of this theory has been preventing this theory from being used. We propose in this paper an algorithm that generates a BSP automaton recognizing a defined BSP language. In order to demonstrate the usefulness of BSP automata and help to design its use, two applications of the BSP automata theory are provided. The parallel recognition of an expression and debugging of a BSP program.
For the sake of modularity, programming languages with atomic sections should offer nesting and inner parallelism in such sections. The possible escape of a thread created inside a section makes these languages also more expressive. In this paper we address the compilation of such a language towards a language with threads and locks. The design decisions of this compilation pass and of the target language were made with respect to the ultimate goal of a mechanised proof of semantic preservation.
It is difficult for customers to select the adequate cloud providers which fit their needs, as the number of cloud offerings increases rapidly. Many works thus focus on the design of cloud brokers. Unfortunately, most of them do not consider precise security requirements of customers. In this paper, we propose a methodology defined to place services in a multi-provider cloud environment, based on functional and non-functional requirements, including security requirements. To eliminate inner conflicts within customers requirements, and to match the cloud providers offers with these customers requirements, we use a formal analysis tool: Alloy. The broker uses a matching algorithm to place the required services in the adequate cloud providers, in a way that fulfills all customer requirements. We finally present a prototype implementation of the proposed broker.
For over a decade, MapReduce has become a prominent programming model to handle vast amounts of raw data in large scale systems. This model ensures scalability, reliability and availability aspects with reasonable query processing time. However these large scale systems still face some challenges: data skew, task imbalance, high disk I/O and redistribution costs can have disastrous effects on performance. In this paper, we introduce MRFA-Join algorithm: a new frequency adaptive algorithm based on MapReduce programming model and a randomised key redistribution approach for join processing of large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of join computation. These performances have been confirmed by a series of experimentations.
We consider a simple imperative language with fork/join parallelism and lexically scoped nested atomic sections from which threads can escape. In this context, our contribution is the precise definition of atomicity, well-synchronisation and the proof that the latter implies the strong form of the former. A formalisation of our results in the Coq proof assistant is also available.
The integration of the generate-and-test paradigm and semi-rings for the aggregation of results provides a parallel programming framework for large scale data-intensive applications. The so-called GTA framework allows a user to define an inefficient specification of his/her problem as a composition of a generator of all the candidate solutions, a tester of valid solutions, and an aggregator to combine the solutions. Through two calculation theorems a GTA specification is transformed into a divide-and-conquer efficient program that can be implemented in parallel. In this paper we present a verified implementation of this framework in the Coq proof assistant: efficient bulk synchronous parallel functional programs can be extracted from naive GTA specifications. We show how to apply this framework on an example, including performance experiments on parallel machines.
Research on high-level parallel programming approaches systematically evaluate the performance of applications written using these approaches and informally argue that high-level parallel programming languages or libraries increase the productivity of programmers. In this paper we present a methodology that allows to evaluate the trade-off between programming effort and performance of applications developed using different programming models. We apply this methodology on some implementations of a function solving the all nearest smaller values problem. The high-level implementation is based on a new version of the BSP homomorphism algorithmic skeleton.
The latest developments of the computation systems impose using tools and methodologies able to simplify the development process of parallel software, but also to assure a high level of performance and robustness. Powerlists and their variants are data structures that can be successfully used in a simple, provably correct, functional description of parallel programs, which are divide and conquer in nature. They represent one of the high-level algebraic theories which are appropriate to be used as fundamentals for a model of parallel computation that assures correctness proving. The paper presents how programs defined based on powerlists could be transformed to real code in the functional language OCaml plus calls to the parallel functional programming library Bulk Synchronous Parallel ML. BSML functions follow the BSP model requirements, and so its advantages are introduced in OCaml parallel code. The transformations are based on a framework that assures: simple, correct, efficient implementation. Examples are given and concrete experiments for their executions are conducted. The results emphasise the utility and the efficiency of the framework.
The All Nearest Smaller Values (ANSV) problem is an important problem for parallel programming as it can be used to solve several problems and is one of the phases of several other parallel algorithms. We formally develop by construction a functional parallel program for solving the ANSV problem using the theory of Bulk Synchronous Parallel (BSP) homomorphisms within the Coq proof assistant. The performances of the Bulk Synchronous Parallel ML program obtained from Coq is compared to a version derived without software support (pen-and-paper) and implemented using the Orléans Skeleton Library of algorithmic skeletons, and to a (unproved correct) direct implementation of the BSP algorithm of He and Huang.
We consider a simple imperative language with fork/join parallelism and lexically scoped nested atomic sections from which threads can escape. In this context, our contribution is a formal operational semantics of this language that satifies a specification on execution traces designed in a companion paper.
Algorithmic skeletons in conjunction with list homomorphisms play an important role in formal development of parallel algorithms. We have designed a notion of homomorphism dedicated to bulk synchronous parallelism. In this paper we derive two application using this theory: sparse matrix vector multiplication and the all nearest smaller values problem. We implement a support for BSP homomorphism in the Orléans Skeleton Library and experiment it with these two applications.
Structured parallel models such as algorithmic skeletons offer a global view of the parallel program in contrast with the fragmented view of the SPMD style. This makes program easier to write and to read for users, and offer additional opportunities for optimisation done by the libraries, compilers and/or run-time systems. Algorithmic skeletons are or can be seen as patterns or higher-order functions implemented in parallel, often manipulating distributed data structures. Orléans Skeleton Library (OSL) is a library of parallel algorithmic skeletons, written in C++ on top of MPI, which uses meta-programming techniques for optimisation. Often such libraries have no or limited support for arbitrary distributions of the data structures. In this paper we detail the new OSL skeletons used to manage arbitrary distributions of distributed arrays. We present a parallel regular sampling sort application as an example of application that requires such skeletons.
Exception handling is a traditional and natural mechanism to manage errors and events that disrupt the normal flow of program instructions. In most concurrent or parallel systems, exception handling is done locally or sequentially, and cannot guarantee the global coherence of the system after an exception is caught. Working with a structured parallel model is an advantage in this respect. Algorithmic skeletons, that are patterns of parallel algorithms on distributed data structures, offer such a structured model. However very few algorithmic skeleton libraries provide a specific parallel exception mechanism, and no C++-based library. In this paper we propose the design of an exception mechanism for the C++ Orléans Skeleton Library that ensures the global coherence of the system after exceptions are caught. We explain our design choices, experiment on the performance penalty of its use, and we illustrate how to purposefully use this mechanism to extract the results in the course of some algorithms.
For parallel programs, correctness by construction is an essential feature since debugging is almost impossible. Building correct programs by construction is not a simple task, and usually the methodologies used for this purpose are rather theoretical and based on a pen-and-paper style. A better approach could be based on tools and theories that allow a user to develop an efficient parallel application by easily implementing simple programs satisfying conditions, ideally automatically proved. Powerlists theory and its variants represent a good theoretical base for such an approach, and the Coq proof assistant is a tool that could be used for automatic proofs. The goal of this paper is to model the powerlist theory in Coq, and to use this modelling to program and reason about parallel programs in Coq. This represents the first step in building a framework to ease the development of correct and verifiable parallel programs.
To make parallel programming as widespread as parallel architectures, more structured parallel programming paradigms are necessary. One of the possible approaches are Algorithmic skeletons that are abstract parallel patterns. They can be seen as higher order functions implemented in parallel. Algorithmic skeletons offer a simple interface to the programmer without all the details of parallel implementations as they abstract the communications and the synchronisations of parallel activities. To write a parallel program, users have to combine and compose the skeletons. Orléans Skeleton Library (OSL) is an efficient meta-programmed C++ library of algorithmic skeletons that manipulate distributed arrays. A prototype implementation of OSL exists as a library written with the function parallel language Bulk Synchronous Parallel ML. In this paper we are interested in verifying the correctness of a subset of this prototype implementation. To do so, we give a functional specification (i.e. without the parallel details) of a subset of OSL and we prove the correctness of the BSML implementation with respect to this functional specification, using the Coq proof assistant. To illustrate how the user could use these skeletons, we prove the correctness of two applications implemented with them: a heat diffusion simulation and the maximum segment sum problem.
Orléans Skeleton Library (OSL) provides a set of algorithmic skeletons as a C++ library on top of MPI. The parallel programming approach of OSL is a structure one, which eases the reasoning about the performance and correctness of the programs. In this paper we present the verification of a heat diffusion simulation program, written in OSL, using the Coq proof assistant. The specification used in proving the correctness is a discretisation of the heat equation.
As the usage of cloud becomes pervasive in our lives, it is needed to ensure the reliability, safety and security of cloud environments. In this paper we study a usual software stack of a cloud environment from the perspective of formal verification. We argue that most of the layers could be practically formally verified, even if the work to verify all levels is huge.
Orléans Skeleton Library (OSL) is a library of parallel algorithmic skeletons in C++ on top of MPI. It provides a structured approach towards parallel programming. Skeletons in OSL are based over the bulk synchronous parallelism model. Applications can be developed using different combinations and compositions of the skeletons. This paper illustrates the expressivity and performance predictability of OSL with two applications: a two dimensional heat diffusion simulation, and an exact n-body simulation. Experiments using these applications are performed on parallel machines.
Orléans Skeleton Library (OSL) is a library of parallel algorithmic skeletons in C++ on top of MPI. It provides a structured approach towards parallel programming. Skeletons in OSL are based on the bulk synchronous parallelism model. In this paper we present formal semantics of OSL: a programming model and its properties proved with the Coq assistant.
Bulk Synchronous Parallel ML (BSML) is a structured parallel functional programming language. It extends a functional programming language of the ML family with a polymorphic data structure and a very small set of primitives. In this paper we describe a framework for reasoning about BSML programs using the Coq interactive theorem prover and for extracting actual parallel programs from proofs. This framework is illustrated through a simulation application based on heat equation.
With the current generalisation of parallel architectures arises the concern of applying formal methods to parallelism. The complexity of parallel, compared to sequential, programs makes them more error-prone and difficult to verify. Bulk Synchronous Parallelism (BSP) is a model of computation which offers a high degree of abstraction like PRAM models but yet a realistic cost model based on a structured parallelism. We propose a framework for refining a sequential specification toward a functional BSP program, the whole process being done with the help of the Coq proof assistant. To do so we define BH, a new homomorphic skeleton, which captures the essence of BSP computation in an algorithmic level, and also serves as a bridge in mapping from high level specification to low level BSP parallel programs.
Program calculation, being a programming technique that derives programs from specification by means of formula manipulation, is a challenging activity. It requires human insights and creativity, and needs systems to help human to focus on clever parts of the derivation by automating tedious ones and verifying correctness of transformations. Different from many existing systems, we show in this paper that Coq, a popular theorem prover, provides a cheap way to implement a powerful system to support program calculation, which has not been recognized so far. We design and implement a set of tactics for the Coq proof assistant to help the user to derive programs by program calculation and to write proofs in calculational form. The use of these tactics is demonstrated through program calculations in Coq based on the theory of lists.
The existing solutions to program parallel architectures range from parallelizing compilers to distributed concurrent programming. Intermediate approaches propose a more structured parallelism: Algorithmic skeletons are higher-order functions that capture the patterns of parallel algorithms. The user of the library has just to compose some of the skeletons to write her parallel application. When one is designing a parallel program, the parallel performance is important. It is thus very interesting for the programmer to rely on a simple yet realistic parallel performance model such as the Bulk Synchronous Parallel (BSP) model. We present OSL, the Orléans Skeleton Library: it is a library of BSP algorithmic skeletons in C++. It offers data-parallel skeletons on arrays as well as communication oriented skeletons. The performances of OSL are demonstrated with two applications: heat equation and FFT.
Bulk Synchronous Parallel ML is a high-level language for programming parallel algorithms. Built upon OCaml and using the BSP model, it provides a safe setting for their implementation, avoiding concurrency related problems (deadlocks, indeterminism). Only a limited set of the features of OCaml can be used in BSML to respect its properties of safety: this paper describes a way to add exception handling to this set by extending and adapting OCaml's exceptions. After a precise definition of the problems that arise and an informal description of the solutions, an extension of BSML is proposed. Formal semantics define the behaviour in all possible cases, followed by a short description of the implementation.
Bulk Synchronous Parallel ML is a high-level language for programming parallel algorithms. Building upon OCaml and using the BSP model, it provides a safe setting for their implementation, avoiding concurrency related problems (deadlocks, indeterminism). Only a limited set of the features of OCaml can be used in BSML to respect its safety: this paper describes a way to add exception handling to this set by extending and adapting OCaml's exceptions. The behaviour of these new exceptions and the syntactic constructs to handle them, together with their implementation, are described in detail, and results over an example are given.
The Bulk Synchronous Parallel ML (BSML) library is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the lambda-calculus by parallel operations on a parallel data structure named parallel vector. This paper presents the execution model, given as a distributed semantics, of the parallel juxtaposition, a primitive of parallel composition for BSML. It also presents an implementation, which follows the semantics, and experiments using the parallel juxtaposition.
The BSMLlib is a library for parallel programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector. The execution time can be estimated, dead-locks and indeterminism are avoided. Programs are written as usual functional programs (in Objective Caml) but using a small set of additional functions. Provided functions are used to access the parameters of the parallel machine and to create and operate on parallel vectors. It follows the execution and cost model of the Bulk Synchronous Parallel model. The paper presents the lastest implementation of this library and experiments of performance prediction.
Bulk Synchronous Parallel ML is a functional parallel language based on the Bulk Synchronous Parallelism model of computation. Deadlocks are avoided and programs are deterministic. The performance of programs can be accurately predicted. This paper addresses the optimization of BSML programs through the compilation of BSML primitives to Objective Caml code with calls to a low level library rather than their direct implementation as a high level library.
The Bulk Synchronous Parallel ML (BSML) is a functional language for Bulk Synchronous Parallel (BSP) programming, on top of the sequential functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. The Objective Caml language is a functional language but it also offers imperative features. This paper presents formal semantics of BSML with references, assignment and dereferencing.
The BSlambdasigmaUparrow-calculus calculus is a calculus of functional BSP programs with enumerated parallel vectors and explicit substitution. This confluent calculus is defined and proved confluent. These results constitute the core of a formal design for a Bulk Synchronous Parallel dialect of ML (BSML) as well as a framework for proving parallel abstract machine which can evaluate BSML programs.
This paper presents the design of the core of a parallel programming language called CDS*. It is based on explicitly-distributed concrete data structures and features compositional semantics, higher-order functions and explicitly distributed objects. The denotational semantics is outlined, the (equivalent) operational semantics is presented and a new realization of the latter is given as a rewriting system. Simulated execution of examples illustrates the language's flexibility and explicit control of data layout on the parallel network.
Minimally Synchronous Parallel ML is a functional parallel language whose execution time can then be estimated and dead-locks and indeterminism are avoided. Programs are written as usual ML programs but using a small set of additional functions. Provided functions are used to access the parameters of the parallel machine and to create and operate on a parallel data structure. It follows the cost model of the Message Passing Machine model (MPM). In the current implementation, the asynchrony is limited by a parameter called the asynchrony depth. When processes reach this depth a global synchronization occurs. This is necessary to avoid memory leak. In this paper we propose a mechanism to avoid such synchronization barriers. This mechanism relies on a more complex management of the communication environments but with a small and parametrized overhead.
Minimally Synchronous Parallel ML is a functional parallel language whose execution time can then be estimated and dead-locks and indeterminism are avoided. Programs are written as usual ML programs but using a small set of additional functions. Provided functions are used to access the parameters of the parallel machine and to create and operate on a parallel data structure. It follows the cost model of the Message Passing Machine model (MPM). This paper explore the writing of an additional communication function using this small set of primitives. It could also be considered as a primitive rather than a function written using primitives. So we developed a low-level implementation of this function and compared it with the high-level implementation in terms of efficiency.
This paper presents a new functional parallel language: Minimally Synchronous Parallel ML. It shares with Bulk Synchronous Parallel ML its syntax and high-level semantics but it has a minimally synchronous distributed semantics. It follows the cost model of the Message Passing Machine model (MPM).
This paper presents the BSFC++ library for functional bulk synchronous parallel programming in C++. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. This guarantees the determinism and the absence of deadlock. Broadcast algorithms are implemented using the core library.
This paper presents an extension of the functional parallel language BSML (Bulk Synchronous Parallel ML) and of the BSlambda-calculus (a calculus of functional bulk synchronous parallel programs) with pattern matching of parallel values.
We have designed a functional data-parallel language called BSML for programming bulk-synchronous parallel (BSP) algorithms in so-called direct mode. In a direct-mode BSP algorithm, the physical structure of processes is made explicit. The execution time can then be estimated and dead-locks and indeterminism are avoided. The BSMLlib library has been implemented for the Objective Caml language. But there is currently no full implementation of such a language and an abstract machine is needed to validate such an implementation. Our approach is based on a bytecode compilation to a parallel abstract machine performing exchange of data and synchronous requests derived from the ZAM, the efficient abstract machine of the Objective Caml language.
The BSMLlib library is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a data structure named parallel vector, which is given by intention. In order to have an execution that follows the BSP model, and to have a simple cost model, nesting of parallel vectors is not allowed. The novelty of this paper is a type system which prevents such nesting. This system is correct w.r.t. the dynamic semantics which is also presented.
We have designed a functional data-parallel language called BSML for programming bulk-synchronous parallel (BSP) algorithms in so-called direct mode. In a direct-mode BSP algorithm, the physical structure of processes is made explicit. The execution time can then be estimated and dead-locks and indeterminism are avoided. The BSMLlib library has been implemented for the Objective Caml language. But there is currently no full implementation of such a language and an abstract machine is needed to have a certified implementation. Our approach is based on a byte-code compilation to a parallel abstract machine performing exchange of data and synchronous requests derived from the abstract machine of the Caml language.
The BSMLlib library is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector. An attempt to add a parallel composition to this approach led to a non-confluent calculus and to a restricted form of parallel composition. This paper presents a new, simpler and more general semantics for parallel composition, renamed here parallel juxtaposition, as well as an associated cost model derived from the BSP cost model.
The BSMLlib library is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. Those operations are flat and allow BSP programming in direct mode but it is impossible to express directly divide-and-conquer algorithms. This paper presents a new construction for the BSMLlib library which can express divide-and-conquer algorithms. It is called parallel superposition because it can be seen as the parallel composition of two BSP threads which can each use all the processors. An associated cost model derived from the BSP cost model is also given.
BSMLlib is a functional data-parallel library for programming bulk-synchronous parallel (BSP) algorithms in Objective CAML. This article demonstrates the expressivity of BSMLlib operations on elementary algorithms. Other operations, on a type of parallel sets, illustrate how BSP exchange phases are written as small, simple programs. A longer example, inspired by parallel join algorithms for relational databases, demonstrates the advantage of using BSMLlib's explicit processes for dynamic load balancing. Finally, suggestions are made for extending the library and building a complete BSML language with the same operations.
The BSMLlib is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the lambda-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. A first implementation of this library was based on the BSPlib library, which is not longer supported nor updated. Being the basis of a framework for Grid computing, a new implementation of the BSMLlib based on MPI has been designed. Experimental results on a cluster of PCs are presented.
The work described here is part of our research program to investigate new paradigms for declarative parallel programming through special interpretations of standard semantics. Our approach is a purely functional approach to programming of BSP algorithms. An attempt to add a parallel composition to this approach led to a non-confluent calculus. This paper presents a solution for adding the parallel composition to the weak call-by-value strategy of the BSlambda_p and to the BSMLlib library.
A functional data-parallel language called BSML is designed for programming bulk-synchronous parallel (BSP) algorithms in so-called direct mode. Its aim is to combine the generality of languages like NESL with the predictable performance of direct-mode BSP algorithms. The BSML operations are motivated and described. Experiments with a library implementation of BSML show the possibility and limitations of parallel performance prediction in this framework.
The BSlambda_p calculus is a calculus of functional BSP programs on enumerated parallel vectors. This confluent calculus is defined and a parallel cost model is associated with a weak call-by-value strategy. These results constitute the core of a formal design for a BSP dialect of ML.
This paper presents the design for a purely functional parallel language with higher-order functions. Data layout is explicit in the language and arbitrary synchronizations can be specified in the non-standard type system based on concrete data structures. We present its denotational semantics, operational semantics, full abstraction property and program examples to illustrate its key features.
With the current generalization of parallel architectures and increasing requirement of parallel computation arises the concern of applying formal methods, which allow specifications of parallel and distributed programs to be precisely stated and the conformance of an implementation to be verified using mathematical techniques. However, the complexity of parallel programs, compared to sequential ones, makes them more error-prone and difficult to verify. This calls for a strongly structured form of parallelism which should not only be equipped with an abstraction or model that conceals much of the complexity of parallel computation, but also provide a systematic way of developing such parallelism from specifications for practically nontrivial examples. Transformational programming is a methodology that offers some scope for making the construction of efficient programs more mathematical. Program calculation is a kind of program transformation based on the theory of constructive algorithmics. An efficient program is derived step-by-step through a sequence of transformations that preserve the meaning and hence the correctness. With suitable data-structures, program calculation can be used for writing parallel programs. However, once an efficient (and correct with respect to the initial specification) formulation of the program is obtained through transformations, the program is then implemented using a parallel library of algorithmic skeletons most of the time written in C++ with calls to a communication library such as MPI. There is no formal correspondence between the efficient program obtained by transformation and the C++ skeletal program. Moreover the transformation itself is usually a pen-and-paper process that could contain errors. The SyDPaCC system is a set of libraries for the Coq proof assistant that allows to write naive functional programs then to transform them into efficient versions that could be automatically parallelised within the framework before being extracted from Coq to real code in the functional language OCaml plus calls to the parallel functional programming library Bulk Synchronous Parallel ML The tutorial is an introduction to the Coq proof assistant and the SyDPaCC system for the systematic development of correct and verified parallel programs.
Frama-C is a software analysis framework that provides a common infrastructure and a common behavioral specification language to plugins that implement various static and dynamic analyses of C programs. Most plugins do not support concurrency. We have proposed conc2seq, a Frama-C plugin based on program transformation, capable to leverage the existing huge code base of plugins and to handle concurrent C programs. In this paper we formalize and sketch the proof of correctness of the program transformation principle behind conc2seq, and present an effort towards the full mechanization of both the for- malization and proofs with the proof assistant Coq.
With the wide expansion of multiprocessor architectures, the analysis and reasoning for programs under weak memory models has become an important concern. This work presents an original constraint solver for detecting program behaviors respecting a particular memory model. It is implemented in Prolog using CHR (Constraint Handling Rules). The CHR formalism provides a convenient generic solution for specifying memory models, that benefits from the existing optimized implementations of CHR and can be easily extended to new models. We briefly present the solver design, illustrate the encoding of memory model constraints in CHR and discuss the benefits and limitations of the proposal.
Cloud hypervisors are critical software whose formal verification can increase our confidence in the reliability and security of the cloud. This work presents a case study on formal verification of the virtual memory system of the cloud hypervisor Anaxagoros, a microkernel designed for resource isolation and protection. The code under verification is specified and proven in the Frama-C software verification framework, mostly using automatic theorem proving. The remaining properties are interactively proven with the Coq proof assistant. We describe in detail selected aspects of the case study, including parallel execution and counting references to pages, and discuss some lessons learned, benefits and limitations of our approach.
Scientific experiments in many domains generate a huge amount of data whose size is in the range of hundreds of megabytes to petabytes. These data are stored on geographically distributed and heterogeneous resources. Researchers who need to analyze and have a fast access to such data are also located all over the globe. Queries executed by these researchers may require the transfer of huge amount of data over the wide area network in a reasonable time. Due to these emerging needs, the grid infrastructure which connects widely geographically distributed and heterogeneous computing and storage resources was born. In this paper, we are interested in treating join queries on the grid. We propose a new parallel algorithm allowing to reduce communication and disk Input/Output costs to minimum.
BSML, or Bulk Synchronous Parallel ML, is a high-level language based on ML and dedicated to parallel computation. In this paper, an extended type system that guarantees the safety of parallel programs is presented. It prevents non-determinism and deadlocks by ensuring that the invariants needed to preserve the structured parallelism are verified. Imperative extensions (references, exceptions) are included, and the system is designed for compatibility with modules.
Bulk Synchronous Parallel ML or BSML is a high-level language for programming parallel algorithms. Built upon the Objective Caml language, it provides a safe setting for implementing Bulk Synchronous Parallel (BSP) algorithms. It avoids concurrency related problems: deadlocks and non- determinism. BSML is based on a very small core of parallel primitives that extended functional sequential programming to functional BSP programming with a parallel data structure and operations to manipulate it. However, in practice the primitives for writing the parallel non-communicating parts of the program are not so easy to use. Thus we designed a new syntax that makes programs easier to write and read. Revised BSML is presented and its expressiveness and performance are illustrated through an application example.
Minimally Synchronous Parallel ML (MSPML) is a functional parallel programming language. It is based on a small number of primitives on a parallel data structure. MSPML programs are written like usual sequential ML program and use this small set of functions. MSPML is deterministic and deadlock free. The execution time of the programs can be estimated. Divide-and-conquer is a natural way of expressing parallel algorithms. MSPML is a flat language: it is not possible to split the parallel machine in order to implement divide-and-conquer parallel algorithms. This paper presents an extension of MSPML to deal with this kind of algorithms: a parallel composition primitive.
BSPlib is a programming library for C and Fortran which supports bulk synchronous parallelism (BSP). This paper is about a formal semantics for the DRMA programming style of the BSPlib library. The aim is to study the behavior of BSPlib programs and to propose some syntactic characterizations used to provide guarantees on semantic properties. This work is the basis for future tools dedicated to the validation of BSPlib programs.
Bulk Synchronous Parallel ML (BSML) is an extension of the functional language Objective Caml to program Bulk Synchronous Parallel (BSP) algorithms. It is deterministic, deadlock free and performances are good and predictable. Parallelism is expressed with a set of 4 primitives on a parallel data structure called parallel vector. These primitives are pure functional ones: they have no side-effect. It is thus possible, and we did it, to prove the correctness of BSML programs using a proof assistant like Coq. The BSλ-calculus is an extension of the λ-calculus which models the core semantics of BSML. Nevertheless some principles of BSML are not well captured by this calculus. This paper presents a new calculus, with a projection primitive, which provides a better model of the core semantics of BSML.
We have designed a functional data-parallel language called BSML for programming bulk-synchronous parallel algorithms, a model of computing. Dead-locks and indeterminism are avoided and the execution time can be then estimated. For large scale applications where parallel processing is helpful, more than one parallel machine is needed. One speaks about metacomputing. A major problem in programming parallel application for such platforms is their hierarchical network structures: latency and bandwidth of global ones are orders of magnitude worse than those of local networks. Here we consider how an extention of the well-suited BSP model might be for these kinds of computing and how extended our functional language can be for this.
BSMLlib is a functional data-parallel library for programming bulk-synchronous parallel (BSP) algorithms in Objective CAML. The definition of BSMLlib and performance prediction for elementary operations have been discussed in Sfp2000. This article demonstrates the expressivity of operations on elementary algorithms. Other operations, on a type of parallel sets, illustrate the advantage of explicit BSP exchange phases: the possibility of predicting and minimising communication and synchronization costs. A more complex example, inspired by parallel join algorithms for relational databases, demonstrates the advantage of using BSMLlib's explicit processes for dynamic load balancing. Finally, suggestions are made for extending the library and building a complete BSML language with the same operations.
An extension of the lambda-calculus called BSlambda is introduced as a formal basis for functional languages expressing bulk synchronous parallel algorithms. A local confluence result is explained and a call-by-value evaluation strategy is defined. The operational meaning of the calculus is illustrated by two programs defining the binary fold algorithm, one flat and one parallel-recursive.
Bulk Synchronous Parallel ML (BSML) est une extension du langage fonctionnel Objective Caml, fondé sur un modèle structuré de parallélisme, le modèle BSP. Ce modèle assure au programmeur BSML la sûreté d'exécution tout en lui laissant le strict contrôle des processeurs. Le modèle de prévision de performances de BSML est simple et réaliste. Le parallélisme est exprimé en utilisant un ensemble de primitives fonctionnelles pures sur une structure de données parallèle appelée vecteur parallèle. Cependant, les programmes sont souvent difficiles à écrire et leur mise au point peut être fastidieuse. Nous proposons dans cet article une nouvelle syntaxe et une sémantique associée dans le but d'écrire des programmes plus courts et plus lisibles. Nous formalisons la syntaxe et la sémantique classiques ainsi que les nouvelles syntaxe et sémantique, puis nous les modélisons en Coq. Leur confluence est établie.
Program calculation, being a programming technique that derives programs from specification by means of formula manipulation, is a challenging activity. It requires human insights and creativity, and needs systems to help human to focus on clever parts of the derivation by automating tedious ones and verifying correctness of transformations. Different from many existing systems, we show in this paper that Coq, a popular theorem prover, provides a cheap way to implement a powerful system to support program calculation, which has not been recognized so far. We design and implement a set of tactics for the Coq proof assistant to help the user to derive programs by program calculation and to write proofs in calculational form. The use of these tactics is demonstrated through program calculations in Coq based on the theory of lists.
Cet article présente l'ajout, à un langage fonctionnel parallèle appelé Minimally Synchronous Parallel ML (MSPML), d'une primitive pour la composition parallèle, appelée juxtaposition. MSPML est un langage fonctionnel parallèle basé sur un petit nombre de primitives sur une structure de données parallèle. Les programmes sont écrits comme des programmes ML usuels en utilisant ce petit ensemble de fonctions. MSPML est déterministe et sans déblocages. Le temps d'exécution des programmes peut étre estimé. Il a une sémantique asynchrone, c'est-à-dire sans barrière de synchronisation globale. Cette propriété s'avère intéressante pour des programmes non équilibrés à chaque étape. Toutefois il ne permettait pas d'écrire des algorithmes diviser-pour-règner parallèles qui sont courants dans la littérature. La juxtaposition remédie à cette limitation.
The BSMLlib library is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a data structure named parallel vector, which is given by intention. In order to have an execution that follows the BSP model, and to have a simple cost model, nesting of parallel vectors is not allowed. The novelty of this paper is a type system which prevents such nesting. An algorithm for type inference is also presented.
A functional data-parallel language called BSML has been designed for programming bulk-synchronous parallel (BSP) algorithms in so-called direct mode. Its aim is to offer predictable and scalable performance for BSP algorithms written as functional programs. The current implementation of BSML is a library and has not been validated w.r.t the language's formal definition. As a library, it does not allow static tools to support performance management. As a first step in the construction of a complete BSML language, we have defined a parallel environment machine from our language's evaluation semantics and begun testing the scalability and predictability of its performance. This paper presents the machine design and the results of our experiments.
The work described here is part of our research program to investigate new paradigms for declarative parallel programming through special interpretations of standard semantics. Our approach is a purely functional approach to programming of BSP algorithms. An attempt to add a parallel composition to this approach led to a non-confluent calculus. This paper presents a solution for adding the parallel composition to the weak call-by-value strategy of the BSlambda_p and to the BSMLlib library.
BSMLlib is a functional data-parallel library for programming bulk-synchronous parallel (BSP) algorithms in Objective CAML. The definition of BSMLlib and performance prediction for elementary operations have been discussed in Sfp2000. This article demonstrates the expressivity of operations on elementary algorithms. Other operations, on a type of parallel sets, illustrate the advantage of explicit BSP exchange phases: the possibility of predicting and minimising communication and synchronization costs. A more complex example, inspired by parallel join algorithms for relational databases, demonstrates the advantage of using BSMLlib's explicit processes for dynamic load balancing. Finally, suggestions are made for extending the library and building a complete BSML language with the same operations.
Après une présentation du calcul confluent BSλ, extension du λ-calcul par des opérations parallèles permettant l'expression d'algorithmes BSP, nous proposons trois nouvelles opérations parallèles étendant le calcul. Le calcul et cette nouvelle extension sont illustrés par différentes algorithmes de diffusion donnés dans une syntaxe à la ML.
Nous d'efinissons une extension minimale du langage fonctionnel Caml par des op'erations parall`eles BSP, d'eriv'ees d'un mod`ele d'enotationel simple avec processus explicites et statiques. Le langage obtenu, BSML, pr'eserve la s'emantique compositionnelle et permet des pr'evisions pr'ecises de performances gr^ace au mod`ele de co^uts BSP. Nous l'illustrons sur des algorithmes parall`eles de base et montrons comment un simulateur BSML permet,a`l'aide des param`etres BSP des machines, de pr'evoir les performances r'eelles.
This paper presents the design for a purely functional parallel language with higher-order functions. It is based on specialisation of a non-standard model of lambda calculus (concrete data structure): array data structures. We present its denotational semantics, operational semantics, full abstraction property and program examples to illustrate its key features.
Frama-C is a framework that provides a common infrastructure and a common behavioral specification language to plugins that implement various static and dynamic analyses of C programs. Most plugin do not support concurrency. We have proposed conc2seq a plugin that handles concurrent C programs, based on program transformation to leverage the existing huge code base of plugins. In this paper we formalize and sketch the correctness of the program transformation principle behind conc2seq, and present an effort towards the full mechanization of both the formalization and proofs with the proof assistant Coq
We consider a simple imperative language with fork/join parallelism and lexically scoped nested atomic sections from which threads can escape. In this context, our contribution is the precise definition of atomicity, well-synchronisation and the proof that the latter implies the strong form of the former. A formalisation of our results in the Coq proof assistant is also available.
Orleans Skeleton Library (OSL) is a library of parallel algorithmic skeletons in C++ on top of MPI. It provides a structured approach towards parallel programming. Skeletons in OSL are based over the bulk synchronous parallelism model. Applications can be developed using different combinations and compositions of the skeletons. This paper illustrates the expressivity of OSL with two applications: a two dimensional heat diffusion simulation, and an exact n-body simulation. Experiments using these applications are performed on parallel machines.
Orléans Skeleton Library (OSL) is a library of parallel algorithmic skeletons in C++ on top of MPI. It provides a structured approach towards parallel programming. Skeletons in OSL are based on the bulk synchronous parallelism model. In this paper we present formal semantics of OSL: a programming model and its properties proved with the Coq assistant.
Program calculation, being a programming technique that derives programs from specification by means of formula manipulation, is a challenging activity. It requires human insights and creativity, and needs systems to help human to focus on clever parts of the derivation by automating tedious ones and verifying correctness of transformations. Different from many existing systems, we show in this paper that Coq, a popular theorem prover, provides a cheap way to implement a powerful system to support program calculation, which has not been recognized so far. We design and implement a set of tactics for the Coq proof assistant to help the user to derive programs by program calculation and to write proofs in calculational form. The use of these tactics is demonstrated through program calculations in Coq based on the theory of lists.
Skeleton programming enables programmers to build parallel programs easier by providing efficient ready-made parallel algorithms. The diffusion skeleton was proposed (associated with a method for program derivation) to abstract a good combination of primitive skeletons, such as map, parallel reduction and parallel prefix sum (scan). The BSMLlib library whose design is based on formal semantics is a library for the Objective Caml language to support Bulk Synchronous Parallelism. It offers a small set of primitives which permits to write any deterministic BSP algorithm. In this paper we study the implementation of the diffusion parallel skeleton using the BSMLlib library in a tutorial way.
Minimally Synchronous Parallel ML is a functional parallel language whose execution time can then be estimated and dead-locks and indeterminism are avoided. Programs are written as usual ML programs but using a small set of additional functions. Provided functions are used to access the parameters of the parallel machine and to create and operate on a parallel data structure. It follows the cost model of the Message Passing Machine model (MPM). This paper explore the writing of an additional communication function using this small set of primitives. It could also be considered as a primitive rather than a function written using primitives. So we developed a low-level implementation of this function and compared it with the high-level implementation in terms of efficiency.
The Bulk Synchronous Parallel ML (BSML) is a functional language for Bulk Synchronous Parallel (BSP) programming. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. We present the formal proofs of correctness of BSML programs in the Coq proof assistant. Such development demonstrates the usefulness of higher-order logic in the process of software certification and parallel applications. They also show that proof of rather complex parallel algorithms may be made with inductive types by using the certified programs.
We have designed a functional data-parallel language called BSML for programming bulk-synchronous parallel (BSP) algorithms in so-called direct mode. In a direct-mode BSP algorithm, the physical structure of processes is made explicit. The execution time can then be estimated and dead-locks and indeterminism are avoided. The BSMLlib library has been implemented for the Objective Caml language. But there is currently no full implementation of such a language and an abstract machine is needed to validate such an implementation. Our approach is based on a bytecode compilation to a parallel abstract machine performing exchange of data and synchronous requests derived from the ZAM, the efficient abstract machine of the Objective Caml language.
The Bulk Synchronous Parallel ML (BSML) is a functional language for Bulk Synchronous Parallel (BSP) programming. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. We present the formal proofs of correctness of BSML programs in the Coq proof assistant. Such development demonstrates the usefulness of higher-order logic in the process of software certification and parallel applications. They also show that proof of rather complex parallel algorithms may be made with inductive types by using the certified programs.
The BSMLlib library is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. The BSFC++ library is a library for Functional Bulk Synchronous Parallel programming in C++ which is based on the FC++ library. We present those libraries and give experimental results. For comparaison, MPI/C++ versions of the same programs were also ran on our cluster of PCs.
The BSMLlib is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calcul by parallel operations on a parallel data structure named parallel vector, which is given by intention. A first implementation of this library was based on the BSPlib library, which is not longer supported nor updated. Being the basis of a framework for Grid computing, a new implementation of the BSMLlib based on MPI has been designed. Experimental results on a cluster of PCs are presented.
This paper presents the BSFC++ library for functional bulk synchronous parallel programming in C++. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. This guarantees the determinism and the absence of deadlock. Broadcast algorithms are implemented using the core library.
The BSMLlib is a library for Bulk Synchronous Parallel (BSP) programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector, which is given by intention. An attempt to add a parallel composition to this approach led to a non-confluent calculus and to a restricted form of parallel composition. This paper presents a new, simpler and more general semantics for parallel composition, renamed here parallel juxtaposition, as well as an associated cost model derived from the BSP cost model.
We have designed a functional data-parallel language called BSML for programming bulk-synchronous parallel (BSP) algorithms in so-called direct mode. In a direct-mode BSP algorithm, the physical structure of processes is made explicit. The execution time can then be estimated and dead-locks and indeterminism are avoided. The BSMLlib library has been implemented for the Objective Caml language. But there is currently no full implementation of such a language and an abstract machine is needed to validate such an implementation. Our approach is based on a natural semantics and a bytecode compilation to a parallel abstract machine performing exchange of data and synchronous requests derived from the abstract machine of the Caml language.
The BSλ_{p}-calculs an extension of the λ-calculus by bulk synchronous parallel (BSP) operations on a parallel data structure named parallel vector. This paper presents how functional composition is preserved in this framework both from the semantics point of view and from the cost model point of view. Those operations are flat and allow BSP programming in direct mode but it is impossible to express directly divide-and-conquer algorithms. This paper also present a new kind of composition, a new construction for the BSλ_{p}-calculus which can express divide-and-conquer algorithms. It is called parallel superposition. An associated cost model derived from the BSP cost model is also given.
The BSlambda-calculus, a formal basis for functional languages expressing bulk synchronous parallel algorithms, is presented. It is then shown to be confluent.
This paper is an introduction to BSlambda a calculus of parallel-recursive BSP programs. The calculus is presented, the confluence of its flat subset is stated and examples are gieven in an ML-like syntax.
Certains problèmes nécessitent des performances que seules les machines massivement parallèles peuvent offrir. Leur programmation demeure néanmoins difficile. Les travaux étudiant le mélange de la programmation fonctionnelle et du parallélisme se répartissent en deux catégories : les extensions explicitement parallèles des langages fonctionnels -- mais les langages obtenus sont soit indéterministes soit brisent l'aspect fonctionnel pur -- et les implantations parallèles avec sémantique fonctionnelle -- mais les langages obtenus n'expriment pas directement les algorithmes parallèles et ne permettent pas la prévision des temps d'exécution. L'approche des langages à patrons, dans lesquels seulement un ensemble fixé d'opérations sont exécutées en parallèle, est intermédiaire. Leur sémantique fonctionnelle est explicite mais leur sémantique opérationnelle parallèle est implicite. L'ensemble de patrons doit étre le plus complet possible mais cet ensemble s'avère dépendant du domaine d'application. Nous approfondissons cette position intermédiaire avec pour objectif de parvenir à des langages universels dans lesquels le code source permet de déterminer le coét. Cette dernière exigence nécessite que soient explicites dans les programmes les lieux du réseau de processeurs de la machine. Une approche dénotationnelle nous a amené à une étude des limites de l'expressivité des langages fonctionnels parallèles à processus explicites. Une approche opérationnelle nous a amené à définir des λ-calculs BSP (le modèle BSP ajoute une notion de processus explicites au parallélisme de données) confluents et universels pour les algorithmes BSP. Nous avons également analysé les conditions précises d'implantation parallèle de tels calculs, expérimenté le style de programmation associé et constaté qu'ils sont suffisamment expressifs et que la prévision des temps d'exécution parallèle y est possible. Abstract: Some problems require performance that only massively parallel computers offer, but their programming is still difficult. Works on functional programming and parallelism can be divided in two categories: explicit parallel extensions of functional languages -- where resulting languages are either non-deterministic or non functional -- and parallel implementations with functional semantics -- where resulting languages don't express parallel algorithms directly and don't allow the prediction of execution times. Algorithmic skeletons languages, in which only a finite set of operations (the skeletons) are run in parallel, constitutes an intermediate approach. Their functional semantics is explicit but their parallel operational semantics is implicit. The set of algorithmic skeletons has to be as complete as possible but it is often dependent on the domain of application. We explore this intermediate position thoroughly in order to obtain universal parallel languages where source code determines execution cost. This last requirement forces the use of explicit processes corresponding to the parallel machine's processors. A denotational approach leads us to study the expressiveness of functional parallel languages with explicit processes. An operational approach leads us to define BSP lambda-calculi (the BSP model adds a notion of explicit process to data-parallelism) that are confluent and universal for BSP algoritms. We also analyze the conditions for parallel implementation of such calculi. We have experimented the associated programming style and have observed that they are sufficiently expressive and allow the prediction of execution times.
For over a decade, MapReduce has become a prominent programming model to handle vast amounts of raw data in large scale systems. This model ensures scalability, reliability and availability aspects with reasonable query processing time. However these large scale systems still face some challenges: data skew, task imbalance, high disk I/O and redistribution costs can have disastrous effects on performance. In this paper, we introduce MRFA-Join algorithm: a new frequency adaptive algorithm based on MapReduce programming model and a randomised key redistribution approach for join processing of large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of join computation. These performances have been confirmed by a series of experimentations.
With parallel machines being now the norm (from smart-phones to super-computers) and no longer restricted to a niche, concurrent and parallel programming has also to become widespread. To do so, it has to move to structured paradigms, in the same way sequential programming did more than forty years ago. Moreover the concern of applying formal methods arises, which allow specifications of parallel and distributed programs to be precisely stated and the conformance of an implementation to be verified using mathematical techniques. However, the complexity of parallel programs, compared to sequential ones, makes them more error-prone, difficult to verify and almost impossible to debug. Parallel programs correctness by construction is thus an essential feature. To build correct programs by constructions is not a simple task, and usually the methodologies used for this purpose are rather theoretical based on a pen-and-paper style. Such theoretical methodologies are usually based on parallel recursive structures. They are named “parallel” because their recursive nature makes them very suitable for divide-and-conquer and thus parallelism. However when one writes an application using parallel recursive structures, it is not yet implemented in parallel. A realisation of parallel recursive structures and their basic operations should be implemented using more concrete (and actually more “flat” than recursive) parallel programming languages. Moreover, a better approach could be based on software tools and formalised theories that allow a user to develop an efficient parallel application by implementing easily simple programs satisfying conditions, ideally automatically, proved. For this purpose proof assistants make the reasoning checkable by a machine, may provide some automation in reasoning, and actual programs may be extracted from developments within these proof assistants. This talk will present an effort for providing a framework based on the Coq proof assistant for the systematic development of correct programs for parallel computing.
Le projet PaPDAS a pour objectif de fournir un environnement pour le développement systématique de programmes parallèles se basant sur l'algorithmique constructive (théorie des listes) et permettant soit d'exécuter très efficacement les programmes obtenus, soit de les compiler avec un compilateur vérifié gérant le parallélisme. Dans cet exposé je présente la dérivation formelle et l'extraction depuis Coq d'un programme parallèle fonctionnel pour le problème All Nearest Smaller Values (ANSV) et la comparaison avec une implantation directe non vérifiée d'un autre algorithme pour résoudre le même problème.
The All Nearest Smaller Values (ANSV) problem is an important problem for parallel programming as it can be used to solve several problems and is one of the phases of several other parallel algorithms. We formally develop by construction a functional parallel program for solving the ANSV problem using the theory of Bulk Synchronous Parallel (BSP) homomorphisms within the Coq proof assistant. The performances of the Bulk Synchronous Parallel ML program obtained from Coq is compared to a version derived without software support (pen-and-paper) and implemented using the Orléans Skeleton Library of algorithmic skeletons, and to a (unproved correct) direct implementation of the BSP algorithm of He and Huang.
The goal of this work is threefold. First we aim at modelling the algebraic structures used in the GTA framework and prove the lemma used to calculate programs from their GTA from inside the Coq proof assistant. Second we aim at providing some automation inside Coq to ease the definition of applications, with the formal verification that they actually fullfill the requirements of the GTA framework. Third we aim at providing the way to extract automatically actual parallel functional programs and MapReduce programs from these Coq developments. This talk will present the current status of this work-in-progress. This is a joint work with Kento Emoto, Julien Tesson, and Frederic Dabrowski.
Our general goal is to ease the development of correct and verifiable parallel programs with predictable performances using theories and tools to allow a user to develop an efficient application byimplementing simple programs satisfying conditions easily, or ideally automatically, proved. To attain this goal, a framework should be based on (1) high-level algebraic theories of parallel programs, (2) modelling of these theories inside interactive theorem provers (also named proof assistants) such as Coq, to be able to write and reason on programs, (3) axiomatisation of lower level parallel programming primitives and their use to implement the high-level primitives in order to extract actual parallel code from the developments make inside proof assistants, (4) verified compilers to ensure that the compiled code actually preserves the semantics (and yet the correctness) of the extracted source code. This first lecture presents the big picture, an overview of the other lectures and the many open questions related to this goal.
Coq is a proof assistant based on the Curry-Howard correspondence relating terms of a typed Lambda calculus with proof trees of a logical system in natural deduction form. The calculus behind Coq is the calculus of (co)-inductive constructions, an extension of the initial Calculus of Constructions. The main usages of Coq are: (1) program verification, (2) modelling and proof of properties programming languages semantics, (3) formalised mathematical theories. This lecture is a short introduction to Coq for program verification.
Bulk Synchronous Parallel ML (BSML) is a structured parallel functional programming language based on the Bulk Synchronous Parallelism (BSP) which offers a high degree of abstraction like PRAM models but yet a realistic cost model based on a structured parallelism. BSML extends a functional programming language of the MLfamily with a polymorphic data structure and a very small set of primitives. In this lecture we describe a library for reasoning about BSML programs using the Coq interactive theorem prover.
Bulk Synchronous Parallel ML (BSML) is a structured parallel functional programming language based on the Bulk Synchronous Parallelism (BSP) which offers a high degree of abstraction like PRAM models but yet a realistic cost model based on a structured parallelism. BSML extends a functional programming language of the MLfamily with a polymorphic data structure and a very small set of primitives. In this lecture we describe a framework for extracting actual parallel programs from proofs of BSML programs in Coq. This framework is illustrated through several example applications.
Program calculation, being a programming technique that derives programs from specification by means of formula manipulation, is a challenging activity. It requires human insights and creativity, and needs systems to help human to focus on clever parts of the derivation by automating tedious ones and verifying correctness of transformations. Different from many existing systems, we show in this lecture that Coq provides a cheap way to implement a powerful system to support program calculation. We design and implement a set of tactics for the Coq proof assistant to help the user to derive programs by program calculation and to write proofs in calculational form. The use of these tactics is demonstrated through program calculations in Coq mainly based on the theory of lists.
Algorithmic skeletons in conjunction with list homomorphisms play an important role in formal development of parallel algorithms. We have designed a notion of homomorphism dedicated to bulk synchronous parallelism. In this lecture, the BH theory is presented, as well as its formalisation in Coq. An implementation of a BH skeleton is implemented in BSML and verified in Coq. In practice a BH skeleton for the C++ Orléans Skeleton Library C++ could also be used. The framework is illustrated through various applications including sparse matrix vector multiplication and the all nearest smaller values problem.
Emoto, Fisher and Hu have integrated the generate-and-test programing(GTA) paradigm and semirings for aggregation of results, to propose a novel parallel programming framework for MapReduce, and to demonstrate how the framework can be efficiently implemented as a library to support parallel programming on Hadoop. In this lecture, we first present the modelling the algebraic structures used in the GTA framework and prove the lemma used to calculate programs with the Coq proof assistant. Second we show how to provide some automation inside Coq to ease the definition of applications, with the formal verification that they actually fulfill the requirements of the GTA framework. Third we provide the way to extract automatically actual parallel functional programs and MapReduce programs from these Coq developments.
The goal of this work is threefold. First we aim at modelling the algebraic structures used in the GTA framework and prove the lemma used to calculate programs from their GTA from inside the Coq proof assistant. Second we aim at providing some automation inside Coq to ease the definition of applications, with the formal verification that they actually fullfill the requirements of the GTA framework. Third we aim at providing the way to extract automatically actual parallel functional programs and MapReduce programs from these Coq developments. This talk will present the current status of this work-in-progress. This is a joint work with Kento Emoto, Julien Tesson, and Frederic Dabrowski.
Les squelettes algorithmiques sont des motifs abstraits qui apparaissent souvent dans la programmation parallèle. Issus de la programmation fonctionnelle, ce sont des fonctions d'ordre supérieur implantées en parallèle et fournies au programmeur. De manière classique, ils offrent une interface simple à l'utilisateur sans les détails du parallélisme. Nous proposons dans cet article une série de squelettes algorithmiques implantés avec le langage fonctionnel parallèle BSML (Bulk Synchrounous Parallel ML). Ces squelettes permettent de manipuler une structure de données répartie, des tableaux équitablement répartis. La spécification de ces squelettes est données par un type de module de l'assistant de preuve Coq. Une implantation de ce type de module, et donc correcte par rapport à la spécification des squelettes, est réalisée à l'aide de la partie fonctionnelle de Coq augmentée d'une axiomatisation des primitives de BSML. L'extraction de cette implantation en Coq donne du code BSML prouvé correct. Ces squelettes parallèles peuvent alors être compilés par le compilateur BSML et enfin exécutés sur une grande variété de machines parallèles. Nous prouvons également correctes des applications en utilisant la spécification des squelettes et extrayons leurs implantations en BSML.
As the usage of the cloud becomes pervasive in our lives, it is needed to ensure the reliability, safety and security of cloud environments. In this paper we study a usual software stack of a cloud environment from the perspective of formal verification. This software stack ranges from applications to the hypervisor. We argue that most of the layers could be practically formally verified, even if the work to verify all levels is huge.
Bulk Synchronous Parallel ML or BSML is a high-level language for programming parallel algorithms. Built upon the Objective Caml language, it provides a safe setting for implementing Bulk Synchronous Parallel (BSP) algorithms. It avoids concurrency related problems: deadlocks and non- determinism. BSML is based on a very small core of parallel primitives that extended functional sequential programming to functional BSP programming with a parallel data structure and operations to manipulate it. However, in practice the primitives for writing the parallel non-communicating parts of the program are not so easy to use. Thus we designed a new syntax that makes programs easier to write and read. Revised BSML is presented and its expressiveness and performance are illustrated through an application example.
With the current generalisation of parallel architectures arises the concern of applying formal methods to parallelism. The complexity of parallel, compared to sequential, programs makes them more error-prone and difficult to verify. Bulk Synchronous Parallelism (BSP) is a model of computation which offers a high degree of abstraction like PRAM models but yet a realistic cost model based on a structured parallelism. We propose a framework for refining a sequential specification toward a functional BSP program, the whole process being done with the help of the Coq proof assistant. To do so we define BH, a new homomorphic skeleton, which captures the essence of BSP computation in an algorithmic level, and also serves as a bridge in mapping from high level specification to low level BSP parallel programs.
Bulk Synchronous Parallel ML or BSML is a high-level language for programming parallel algorithms. Built upon the Objective Caml language, it provides a safe setting for implementing Bulk Synchronous Parallel (BSP) algorithms. It avoids concurrency related problems: deadlocks and non-determinism. BSML is based on a very small core of parallel primitives that extended functional sequential programming to functional BSP programming with a parallel data structure and operations to manipulate it. However, in practice the primitives for writing the parallel non-communicating parts of the program are not so easy to use. Thus we designed a new syntax that makes programs easier to write and read: Revised BSML. In this talk, Revised BSML is presented and its expressiveness and performance are illustrated through an application examples. We also designed a formal semantics and proved its properties using the Coq proof assistant.
The design of parallel programs and parallel programming languages is a trade-off. On one hand the programs should be efficient. But the efficiency should not come at the price of non portability and unpredictability of performances. The portability of code is needed to allow code reuse on a wide variety of architectures and to allow the existence of legacy code. The predictability of performances is needed to guarantee that the efficiency will always be achieved on any architecture. Another very important characteristic of parallel programs is the complexity of their semantics. Deadlocks and indeterminism often hinder the practical use of parallelism by a large number of users. To avoid these undesirable properties, a trade-off has to be made between the expressiveness of the language and its structure which could decrease the expressiveness. Bulk Synchronous Parallelism (BSP) is a model of computation which offers a high degree of abstraction like PRAM models but yet a realistic cost model based on a structured parallelism: deadlocks are avoided and indeterminism is limited to very specific cases in the BSPlib. BSP programs are portable across many parallel architectures. The Bulk Synchronous Parallel ML language (BSML) is based on a confluent extension of the λ-calculus. Thus it is deadlock free and deterministic. Being a high-level language, programs are easier to write, to reuse and to compose. It is even possible to certify the correctness of BSML programs with the help of the Coq proof assistant. However the correctness of these programs is ensured with respect to the programming model of BSML. BSML, as data-parallel languages, has a programming model that is a formal or informal semantics of the primitive operations it provides as viewed by the programmer. However the realization of the primitives on parallel architecture, or execution model, is quite different. Thus to guarantee the correct execution of these programs in parallel, it is necessary to verify the equivalence of the programming model and the execution model. This talk presents the Bulk Synchronous Parallel language, several formal semantics of BSML from the programming model to the semantics of an abstract parallel machine, and proof of correctness of BSML programs with the Coq proof assistant.
Parallel divide-and-conquer parallel algorithms are very common in the litterature on parallel algorithms. In order to ease their implementation in Minimally Synchronous Parallel ML (MSPML), we add a new primitive to this asynchronous parallel functional language. MSPML is based on a parallel data structure called parallel vectors and operations to manipulate them. This paper presents semantics and an implementation of MSPML with parallel composition.
The goal of this project is to produce a programming environment in which certified parallebgl programs can be written, proved and safely executed. More technically, several subgoals are studied to achieve the overall goal of the project: proofs, using the Coq proof assistant, of programs written with the functional parallel languages we developped; certification of the compilers for these languages, from source code to bytecode for parallel virtual machines and the certification of these virtual machines; proofs of imperative bulk synchronous parallel programs first written with a small dedicated language and then with an extension of C or Java; automatic performance prediction of the programs written with these languages.
Certains problèmes comme la simulation de phénomènes physiques ou chimiques ou la gestion de bases de données de grande taille nécessitent des performances que seules les machines parallèles peuvent offrir. En plus de ces domaines d'application, on utilise les machines parallèles en recherche opérationnelle, en extraction de connaissances, en réalité virtuelle et en vérification. Les machines parallèles servent ainsi à un sous-ensemble important de toutes les applications de l'informatique. Leur programmation demeure néanmoins plus difficile que celle des machines séquentielles et la conception de langages adaptés est un sujet de recherche actif. Cet exposé présentera : - d'une part la machine parallèle du Laboratoire d'Algorithmique, Complexité et Logique (LACL), appelée "bicephale.univ-paris12.fr" qui est de type grappe de PC et les bases de l'utilisation pratique de celle-ci ; - d'autre part le langage de programmation de haut-niveau Bulk Synchronous Parallel ML (BSML) développé au LACL et quelques exemples implémentés avec BSML et qui ont fait l'objet d'expérimentations sur la grappe de PC.
De nombreux problèmes comme la simulation de phénomènes physiques ou la manipulation de bases de données de grandes tailles nécessitent des performances que seuls les ordinateurs massivement parallèles, voire les méta-ordinateurs, peuvent fournir.A partir d'un modèle de parallélisme structuré possèdant un modèle de prévision de performances (Bulk Synchronous Parallel ou BSP) nous avons conéu un ensemble d'opérations universelles qui permet la programmation de n'importe quel algorithme de ce modèle. Une approche opérationnelle nous a amené à définir des extensions du lambda-calcul par des primitives BSP. Une bibliothèque basée sur ces calculs, la bibliothèque BSML a été développée pour le langage fonctionnel Objective Caml. La programmation BSML s'appuie sur une structure de données parallèle polymorphe. Les programmes sont des fonctions Objective Caml usuelles mais manipulent cette structure de données parallèle à l'aide d'un petit ensemble d'opérations dédiées. BSML suit le modèle BSP et est donc garanti sans inter-blocage. La confluence des calculs garantit quant à elle le déterminisme de BSML. Un nouvel axe de recherche, présenté dans cet exposé, est d'avoir un langage pour le metacomputing, c'est-à-dire pour la programmation de grappes de machines parallèles (qui sont elles-mémes souvent des grappes de PC) que nous appelons méta-ordinateurs. Pour obtenir ce langage, appelée Departmental Metacomputing ML, nous avons combiné le langage BSML, pour la programmation de chaque noeud parallèle, et notre nouveau langage MSPML (Minimally Synchronous Parallel ML) pour la coordination de l'ensemble. MSPML diffère de BSML par l'absence de barrières de synchronisation globale.