Parallel Multiverse
Abstract
Loop mechanisms such as for and while have always been a major concern for programmers regarding code performance [Kes10], because these mechanisms are an intrinsic part of the most exhaustive and demanding software implementations. Intel Threading Building Blocks (Intel TBB) provides paradigms that let the programmer parallelize these chunks of code, in some cases increasing overall performance significantly. This master project belongs to a well-known area, the development of Eclipse C++ Development Tools (CDT) plug-ins, through which we intend to lead the programmer to use TBB algorithms. The plug-in provides semi-automatic recognition and transformation of compatible loops and containers into the corresponding high-level TBB parallel programming paradigms. This abstracts the programmer from known difficulties of parallelism, such as race conditions and thread synchronization. It is made possible by analysing the abstract syntax tree (AST) with tree pattern matching algorithms. The plug-in features aided transformations for parallel_for algorithms over flat iteration spaces, as well as for containers such as vectors, FIFO queues and hash maps. It is built upon Pascal Kesseli's Deep-Space 8 project. With further exploration of TBB's capacity for parallelism, many other extensions could be made to this plug-in. Paradigms such as scalable memory allocation, mutual exclusion, atomic operations, timing and task scheduling would add more insight to the program and extend its functionality further. All these extensions are of great interest for possible future development. However, implementing them would require a deeper analysis of the C++ language through semantic analysis, a task not easily performed because of the language's ambiguous semantics (Section 3.2.1).
Contents
1 Management Summary
  1.1 Problem Definition
  1.2 Solution Approach
    1.2.1 Parallel Studio 2011 Approach
    1.2.2 TBB Algorithm Transformations
  1.3 Results
    1.3.1 Time Analysis
  1.4 Conclusion Outlook
2 Introduction
  2.1 Task Description
    2.1.1 Initial Position
    2.1.2 Objectives of Thesis
    2.1.3 Expressive Examples
  2.2 Intel Threading Building Blocks
    2.2.1 Basic Algorithms
    2.2.2 Containers
  2.3 C++
    2.3.1 Lambda expressions in C++11
3 Implementation
  3.1 Overview
  3.2 Architecture
    3.2.1 Analysis
    3.2.2 Transformation
  3.3 Containers
    3.3.1 Analysis
    3.3.2 Transformation
  3.4 Parallel For
    3.4.1 Analysis
    3.4.2 Transformation
  3.5 Namespace directives and includes
  3.6 ASTHelper
  3.7 Eclipse Integration
    3.7.1 Codan
    3.7.2 IMarker Resolution
    3.7.3 Plug-in Unit Tests
4 Conclusion
  4.1 Goal Achievement
5 Appendix
  5.1 Walk-through Project Set-up
  5.2 GCC C++11 compliant Set-up
  5.3 Project Plan
Chapter 1
Management Summary
The present chapter summarizes the general intent of the plug-in we built. The description ranges from the problem definition, through the approach taken (taking Parallel Studio 2011 into account), to a conclusion.
1.1
Problem Definition
Loops have always been responsible for most of the runtime resources a program demands [Kes10]. Moreover, most loop blocks have similar structures and often perform very similar functions. Pattern recognition is useful for finding common features in loops that add, update or delete elements of a container, since these have always been the most common looping procedures. The loop in Listing 1.1 is a typical example.
vector<int> vec;
double sum = 0.0;
for (int k = 0; k != vec.size(); k++) {
    sum += vec[k];
}
Listing 1.1: Loop Example
Multi-core architectures are present in most of today's mainstream computers. Running serial code is therefore a considerable waste of resources and performance wherever parallelism can be applied, as in loops such as this one.
SumFoo sf(a);
parallel_reduce(blocked_range<size_t>(0, n, IdealGrainSize), sf);
Listing 1.2: TBB Transformed Code
C++03 offers no language support for parallelism. Libraries providing such features have spread widely in the last few years, yet there is still a large gap in the aid offered to programmers for spotting where parallelism could be applied. A tool that detects potential parallelism and suggests transformations would therefore be of great interest, and our work addresses this need by pointing out potentially viable solutions in the programmer's code. Programmers may also avoid threads because analysing code manually is time-consuming and generally not worth the effort. Thread structuring and implementation differ from library to library, and getting acquainted with all the specifics can be demanding. In addition, transforming a loop into a parallel paradigm by hand can introduce new errors.
A possible transformation of the example in Listing 1.1 is shown in Listing 1.2. The first line constructs an instance of a newly generated class, SumFoo, which represents the code to be parallelized. More details on this class's methods and data members are given in Section 2.2. The second line calls the TBB function that performs the reduction in parallel over a blocked range, which has the structure blocked_range<T>(begin, end, IdealGrainSize). The last argument of the parallel_reduce call is the SumFoo object constructed before, which provides the methods the parallel_reduce paradigm requires. Many programmers are currently unaware of these benefits, and automating the coding process lets them consider solutions they may have unknowingly discarded. These suggestions and transformations are the subject of the Parallel Multiverse project, which uses the TBB library to perform parallelism through threads.
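The parallel_reduce call in Listing 1.2 only works if the body object provides three members: a splitting constructor, an operator() over a subrange, and a join method. The sketch below illustrates that protocol in standard C++ only, under several assumptions, and is not the code the plug-in generates. Range and split are hypothetical stand-ins for tbb::blocked_range<size_t> and tbb::split. SumFoo is filled in as a plain array sum. reduce_sketch is a hypothetical sequential driver that performs the same splitting and joining that TBB distributes across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for tbb::split and tbb::blocked_range<std::size_t>;
// they model only what this sketch needs and are not the TBB types.
struct split {};

struct Range {
    std::size_t begin;
    std::size_t end;
};

// Assumed shape of SumFoo: the three members parallel_reduce expects
// from a body class in its imperative form.
class SumFoo {
    const double* a_;

public:
    double sum = 0.0;

    explicit SumFoo(const double* a) : a_(a) {}

    // Splitting constructor: a new body over the same input, starting at 0.
    SumFoo(SumFoo& other, split) : a_(other.a_) {}

    // Accumulate one subrange into this body's partial sum.
    void operator()(const Range& r) {
        for (std::size_t i = r.begin; i != r.end; ++i) sum += a_[i];
    }

    // Merge the partial sum of a body that processed the adjacent subrange.
    void join(SumFoo& rhs) { sum += rhs.sum; }
};

// Hypothetical sequential driver: ranges larger than grain are halved, the
// right half goes to a split copy of the body, and the partial results are
// joined. TBB performs the same steps but runs the halves as parallel tasks.
template <typename Body>
void reduce_sketch(const Range& r, Body& body, std::size_t grain) {
    const std::size_t size = r.end - r.begin;
    if (size <= grain || size < 2) {
        body(r);
        return;
    }
    const std::size_t mid = r.begin + size / 2;
    Body right(body, split{});
    reduce_sketch(Range{r.begin, mid}, body, grain);
    reduce_sketch(Range{mid, r.end}, right, grain);
    body.join(right);
}
```

With a grain of 1, a five-element array is split down to single elements, and join() recombines the partial sums into the same total a sequential loop would produce.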
1.2
Solution Approach
Completely automated solutions for analysing and transforming the semantic properties of code are impossible in general, given the infinite number of possibilities [HU06]. Nonetheless, suggesting possible solutions to the user by partially analysing code and using pattern recognition can address the problem. Implementing such detections in a plug-in for the Eclipse IDE makes it possible to pinpoint candidate solutions in certain parts of the code, while letting the user decide whether or not to perform the transformation. Our approach analyses the AST (Abstract Syntax Tree) through code pattern matching and is based on Pascal Kesseli's implementation in the Deep-Space 8 project. An example can be seen in Figure 1.1 [Kes10]. All decisions taken in this implementation seek only to help the user find possible solutions, not to restrain them from exploring other ways of implementing the same code. The approach merely recognizes certain patterns, such as find and for algorithms and some containers like vectors, hash maps and queues. Extending pattern recognition to all possible solutions would clearly be impossible, so only a select few algorithms are analysed.
Figure 1.1: Base Find Algorithm Pattern used in Deep-Space 8 [Kes10].
While investigating possible solutions to our project's subject, we also came across another approach: Intel Parallel Studio (IPS), an add-on for Microsoft Visual Studio. As discussed more thoroughly in the next section, it can be a powerful tool for building parallel programs, since it enhances application performance and improves productivity during development. However, its core intent differs from our project's goals.
1.2.1
Parallel Studio 2011 Approach
Parallel Studio, Intel's plug-in for Microsoft Visual Studio, aims to help C/C++ developers parallelize their code. The intent of this section is to determine whether some of its features would be worth adding to our Eclipse plug-in. Parallel Studio 2011 consists of a set of independent tools: Parallel Amplifier, Parallel Composer, Parallel Inspector and Parallel Advisor [Int11a].
Parallel Amplifier gives information about code performance. Namely, it determines which functions are the most demanding and how CPU usage is distributed across all cores. It does so by recording data from operating system data structures, mainly the call stacks and instruction pointers. This information is displayed as a call tree, and the relevant bits of code are called hotspots [Cep10], as seen in Figure 1.2. The tool can also analyse concurrency by determining how long threads actually run during code execution. The final information is broken down per function.
Figure 1.2: Parallel Amplifier Analysis Overview [Cep10]
Parallel Composer is a tool that integrates the Intel C++ Compiler, TBB, IPP (Integrated Performance Primitives) and the Parallel Debugger Extension. One of Parallel Composer's main features is the ability to express multi-threading through the OpenMP standard, supported by an optimizing compiler and libraries. It includes Intel Integrated Performance Primitives (Intel IPP), an extensive library of multicore-ready, highly optimized software functions for digital media and data-processing applications [Lio10b].
Parallel Inspector is a debugging tool for finding memory and threading errors, used as a preventive tool (see Figure 1.3).
Figure 1.3: Parallel Inspector Error Analysis [Lio10c]
Parallel Advisor is yet another analysis tool. The developer uses annotations to mark the code to which he wants to add parallelism, then implements the parallelism, and the tool determines whether the change has improved performance. This tool has a clear workflow and presupposes an iterative process (see Figure 1.4).
Figure 1.4: Parallel Advisor Performance Analysis [Lio10a]
Intel Parallel Studio (IPS) is a good framework for analysing and measuring code performance. In terms of parallelism, however, it does not provide any semi-automated code transformation such as the one we set out to produce. With IPS, the developer knows where he can improve his code and when a parallel transformation is beneficial, but he is not aided in performing it. With our plug-in, the developer knows how to perform the transformation, but not whether it is beneficial. In conclusion, the two tools have different purposes. Some features of IPS could perhaps be implemented in another Eclipse plug-in for code analysis, but that is outside the scope of this project.
1.2.2
TBB Algorithm Transformations
Intel's Threading Building Blocks makes available a range of paradigms that enable the parallelization of the loops and container routines we encounter. C++ standard containers can be replaced with concurrent containers, which allow multiple threads to access and update items concurrently. Typical C++ STL containers do not permit concurrent updates, and attempts to modify them concurrently can corrupt the container's contents. STL containers can be wrapped in a mutex to make them safe for concurrent access by letting only one thread operate on the container at a time. However, that approach eliminates concurrency, thus restricting parallel speed-up. Concurrent containers instead achieve concurrency through lock-free algorithms and fine-grained locking, in which multiple threads operate on the container while locking only the portions they really need. Highly concurrent containers come at a cost, however. They typically have higher overheads than regular STL containers and should only be used when the speed-up from the additional concurrency they enable outweighs their slower sequential performance (i.e. they should not be used in a serial environment).
The concurrent containers can be summarized as:
concurrent_hash_map<Key, T>
concurrent_vector<T>
concurrent_queue<T, Alloc>
which correspond to the C++ containers:
hash_map<Key, T>
vector<T>
queue<T>
Compared with the C++ STL containers, these concurrent containers show few syntactic differences. They are designed to be used from parallel algorithms such as parallel_reduce or parallel_for, where several threads can access them efficiently. Some of them provide their own methods. For example, concurrent_queue offers push and try_pop operations that are safe to call from several threads without the user adding a mutex or other blocking mechanism (a blocking pop is provided by concurrent_bounded_queue). More details are given in Section 2.2.
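For contrast with the concurrent containers, the coarse-grained approach mentioned above (one mutex around an STL container) can be sketched as follows. LockedQueue is a hypothetical class, not part of TBB; its try_pop and pop only echo the names used by TBB's queues. Every operation serializes on the same mutex, which is why this approach limits parallel speed-up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical coarse-grained wrapper: a single mutex guards the whole
// std::queue, so only one thread can operate on it at a time.
template <typename T>
class LockedQueue {
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable not_empty_;

public:
    void push(const T& value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(value);
        }
        not_empty_.notify_one();  // wake one thread blocked in pop()
    }

    // Non-blocking: returns false immediately if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

    // Blocking: waits until another thread has pushed an element.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        T value = q_.front();
        q_.pop();
        return value;
    }
};
```

The condition variable is what lets pop() wait instead of spinning. The price is that push, try_pop and pop all contend for one lock, however many cores are available.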
1.3
Results
The Eclipse CDT plug-in identifies possible transformations through code pattern matching, and its results appear as markers in the left side bar. Hovering over a marker icon shows a message describing what that piece of code resembles. Clicking on the marker shows a list of all suggested transformations (see Figure 1.5).
Figure 1.5: Marker information icons displayed on left side bar
The information is displayed to the user with the corresponding notification message. Double-click the item in the left panel to trigger the action (see Figure 1.6).
Figure 1.7: Final result displayed after transformations
The final result, after the transformation is performed, can be seen in Figure 1.7. These transformations are offered merely to make the user aware that they are possible. They should never be taken as a necessary step, since a transformation may cripple or alter the user's initial intent for the code.
1.3.1
Time Analysis
Even though we had set out to analyse what the plug-in actually improves, we did not perform any measurements. It therefore remains uncertain whether the plug-in provides improvements on large applications. This was partly because we did not know how to automate code transformation on a large scale, i.e. transform a whole project at once without executing each alteration manually. It was also because no analysis of the loop bodies is made, so some transformations would alter or hinder the code's initial purpose, which makes large-scale transformations non-viable.
1.4
Conclusion Outlook
Although we did not achieve all the objectives defined for the project, we acquired valuable understanding of:
- C++ STL algorithms;
- advanced C++ concepts such as lambda expressions;
- TBB algorithms and data structures;
- parallel computing concepts;
- modern parser internals and code transformation techniques;
- the inner workings of Eclipse CDT, with special regard to Codan and IMarkerResolution;
- JUnit testing of CDT plug-ins;
- continuous integration tools.

The implementation of this project represents only a simplified version of what could be built on the functionality TBB offers. Features such as atomic operations, which are now standard in C++11, along with complex iteration spaces for C++ loops, could be future enhancements of this plug-in. Supporting algorithms other than the for loops used here would also be of great interest.
Chapter 2
Introduction
For decades, it was possible to improve the performance of a CPU by shrinking the area of the integrated circuit, which drove down the cost per device. As manufacturing techniques reach the theoretical limits of miniaturization [Moo65], parallel computing in the form of multi-core processors has increasingly been pursued to improve overall processing performance. This shift was driven by the growing demand for computational resources in all areas of software development. Computer manufacturers have long implemented symmetric multiprocessing (SMP) designs using discrete CPUs, so the issues of implementing multi-core architectures and supporting them in software are well known, at least for scientific and industrial purposes. The adoption of such architectures in personal computers has driven more and more programmers to develop multi-threaded software that takes full advantage of the hardware their programs run on. Such programming methodologies have existed for quite some time, but they are not used as often as they should be, even though threading libraries are available for most well-known programming languages. We have chosen Intel Threading Building Blocks [Int11b] for our implementation, since it provides a high level of abstraction in which the user can safely produce multi-threaded code without having to worry about major concerns such as data races or granularity. One of the main ideas behind this project is captured by the following quotation:
Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly non-deterministic, and the job of the programmer becomes one of pruning that non-determinism. [Lee06]
This chapter describes basic implementation decisions and explains why code analysis tools face certain limitations.
2.1
Task Description
To give readers some context, we start by describing the starting point of this project (Section 2.1.1), move on to the objectives (Section 2.1.2), and finish with some examples of code paradigms that are eligible for transformation (Section 2.1.3).
2.1.1
Initial Position
Although drawbacks can be pointed out in multi-threaded code, we believe the benefits outweigh the downsides. Helping users apply parallelization paradigms as much as possible is therefore a good way to get them acquainted with these concepts. The goal is to provide a tool that suggests solutions users may not yet have come across, retaining the code's original intention while increasing overall performance. This kind of tool is not currently found in most Integrated Development Environments (IDEs), such as Eclipse C++ Development Tooling (CDT) or Visual Studio. These lack any aided transformation of C++ loops and containers into TBB paradigms or those of any other threading library. With all that in mind, we based our approach on the Deep-Space 8 project's implementation [Kes10], complementing its code analysis features with our own and taking advantage of its structure to implement the intended transformations.
2.1.2
Objectives of Thesis
Transformations Considered
We evaluated C++ coding patterns that repeat themselves across many mainstream applications and used them to decide which algorithms would be picked for transformation (see Section 1.3 for more details). Since our analysis tool is based on Pascal Kesseli's project [Kes10], much of the analysis of the chosen algorithms was already partially implemented.
Code Patterns Analysis
After deciding which coding patterns would be used for transformations, we specified patterns that match potentially transformable loops. A match is not to be taken as a determined fact in every detected case, but rather as a suggestion, and the user must always keep this in mind.
Proposed Transformations
The most promising detected patterns undergo a transformation that adapts the current code to a TBB paradigm. This is made available as an Eclipse CDT plug-in.
2.1.3
Expressive Examples
This subsection presents some examples of the transformations that are possible.
Containers
Listings 2.1 and 2.2 illustrate how our plug-in transforms STL containers into TBB concurrent containers.
vector<int> vec;
queue<string> que;
hash_map<string, vector<int>> map;
std::vector<int> qualVec;
std::queue<string> qualQue;
__gnu_cxx::hash_map<string, vector<int>> qualMap;
Listing 2.1: Container C++ original code
concurrent_vector<int> vec;
concurrent_queue<string> que;
concurrent_hash_map<string, vector<int>> map;
tbb::concurrent_vector<int> qualVec;
tbb::concurrent_queue<string> qualQue;
tbb::concurrent_hash_map<string, vector<int>> qualMap;
Listing 2.2: Container C++ transformed code, after plug-in execution
Loops
Loop transformation in our project concerns for statements over one-dimensional iteration spaces, with automatic chunking. Listings 2.3 and 2.4 show an example of such a transformation:
vector<int> vec;
for (int i = 0; i != vec.size(); i++) {
    if (vec[i] == 5) {
        vec[i] += 1;
    }
}
Listing 2.3: Loop C++ original code
tbb::parallel_for(size_t(0), (size_t) vec.size(), [&](size_t i) {
    if (vec[i] == 5) {
        vec[i] += 1;
    }
});
Listing 2.4: Loop C++ transformed code, after plug-in execution
2.2
Intel Threading Building Blocks
Intel Threading Building Blocks (TBB) is a C++ template library developed by Intel Corporation for writing multi-threaded applications [Int11b]. The library abstracts access to multi-core architectures by allowing operations to be treated as tasks, which we do not cover in our analysis. Our usage of TBB is limited to the transformation of basic algorithms and data structures.
2.2.1
Basic Algorithms
The information below covers the TBB basic algorithms we set out to implement.
parallel_for
The template function parallel_for performs parallel iteration over a range of values. A call parallel_for(first, last, step, f) represents parallel execution of the loop:
for (auto i = first; i < last; i += step) f(i);
The index type must be an integral type. The loop must not wrap around. The step value must be positive; if omitted, it is implicitly 1. There is no guarantee that the iterations will run in parallel.
A call parallel_for(range, body, partitioner) provides a more general form of parallel iteration. It represents parallel execution of body over each value in range. The optional partitioner specifies a partitioning strategy. The following listings provide examples for both flat iteration spaces (2.5) and complex iteration spaces (2.6).
parallel_for(blocked_range<int>(a, b), avg);
Listing 2.5: parallel_for in flat iteration spaces
parallel_for(
    ParallelMergeRange<Iterator>(begin1, end1, begin2, end2, out),
    ParallelMergeBody<Iterator>(),
    simple_partitioner()
);
Listing 2.6: parallel_for in complex iteration spaces
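The contract of parallel_for(first, last, step, f) can be made concrete with a minimal standard C++ sketch, under stated assumptions. parallel_for_sketch is a hypothetical function that splits the logical iterations into contiguous blocks and runs each block on its own std::thread. TBB itself schedules tasks through a work-stealing scheduler and chooses the chunking automatically, so this shows the semantics rather than TBB's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch of the semantics of parallel_for(first, last, step, f):
// the iterations of  for (i = first; i < last; i += step) f(i);
// are divided into contiguous blocks, each run by its own std::thread.
template <typename Index, typename Func>
void parallel_for_sketch(Index first, Index last, Index step, const Func& f,
                         unsigned num_threads = std::thread::hardware_concurrency()) {
    // An empty range does nothing; a non-positive step violates the
    // precondition, and this sketch simply ignores such calls.
    if (!(first < last) || !(step > 0)) return;
    if (num_threads == 0) num_threads = 1;  // hardware_concurrency() may return 0

    // Number of iterations the sequential loop would perform.
    const Index n = (last - first + step - 1) / step;
    const Index workers_wanted = std::min<Index>(static_cast<Index>(num_threads), n);

    std::vector<std::thread> workers;
    for (Index w = 0; w < workers_wanted; ++w) {
        // Worker w runs logical iterations [begin_k, end_k).
        const Index begin_k = n * w / workers_wanted;
        const Index end_k = n * (w + 1) / workers_wanted;
        workers.emplace_back([=, &f] {
            for (Index k = begin_k; k < end_k; ++k) f(first + k * step);
        });
    }
    for (std::thread& t : workers) t.join();
}
```

Iterations may run in any order and on any thread. The loop body must therefore not depend on iteration order or write to shared locations without synchronization; in the usage below, each index writes only its own element.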
parallel_reduce
The template function parallel_reduce computes a reduction over a range. The parallel_reduce template has two forms: an imperative form that uses a body class, and a functional form that takes advantage of lambda expressions. The following listing is an example of the functional form of parallel_reduce (2.7).
float ParallelSum(float array[], size_t n) {
    return parallel_reduce(
        blocked_range<float*>(array, array + n),
        0.f,
        [](const blocked_range<float*>& r, float init) -> float {
            for (float* a = r.begin(); a != r.end(); ++a)
                init += *a;
            return init;
        },
        [](float x, float y) -> float {
            return x + y;
        }
    );
}
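The functional form can also be sketched without TBB. parallel_reduce_sketch below is hypothetical. It works on a pointer range instead of blocked_range, splits the range recursively, runs the left half through std::async, and combines partial results with the reduction lambda. ParallelSumSketch mirrors ParallelSum with the same identity value (0.f) and the same two lambdas. The identity must be a neutral element of the reduction, because every leaf of the recursion starts from it.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical sketch of the functional form of parallel_reduce over the
// pointer range [first, last): real_body folds one subrange into a partial
// result starting from identity, and reduction combines two partial results.
template <typename T, typename RealBody, typename Reduction>
T parallel_reduce_sketch(const T* first, const T* last, T identity,
                         const RealBody& real_body, const Reduction& reduction,
                         std::size_t grain = 1024) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= grain || n < 2) return real_body(first, last, identity);
    const T* mid = first + n / 2;
    // Left half on another thread, right half on the current one.
    auto left = std::async(std::launch::async, [&] {
        return parallel_reduce_sketch(first, mid, identity, real_body, reduction, grain);
    });
    T right = parallel_reduce_sketch(mid, last, identity, real_body, reduction, grain);
    return reduction(left.get(), right);
}

// ParallelSum rewritten against the sketch, with raw pointers in place of
// blocked_range<float*>.
float ParallelSumSketch(const float* array, std::size_t n) {
    return parallel_reduce_sketch(
        array, array + n, 0.f,
        [](const float* begin, const float* end, float init) {
            for (const float* a = begin; a != end; ++a) init += *a;
            return init;
        },
        [](float x, float y) { return x + y; });
}
```

Floating-point addition is not associative, so in general a parallel sum can differ slightly from the sequential one. The usage below sidesteps this with values whose sums are exact in float.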
2.2.2
Containers
Hand-written C++ container classes and STL containers are not thread safe. Consider an STL map<int, SomeClass>:
Thread 0: mymap[1] = SomeClass();
Thread 1: mymap[2] = SomeClass();
Although values are being added under two distinct keys, correct behaviour is not guaranteed, and the map may become corrupt. Another simple example of operations that can lead to unexpected behaviour:
Thread 0: mymap[1] = SomeClass();
Thread 1: mymap.erase(1);
This scenario produces a race condition: the outcome depends on which thread performs its operation first. Moreover, such operations may consistently produce the desired output and so evade testing. These time-bombs are commonly defused by wrapping accesses in locks or other synchronization mechanisms. The disadvantage is that such mechanisms are usually complex and hinder both code readability and performance. Intel TBB provides an alternative in the form of thread-safe containers [Int11b]:
concurrent_hash_map
concurrent_queue
concurrent_vector
All of TBB's concurrent containers are implemented using fine-grained locking. Parts of the data structure are locked to preserve thread safety, but the structure as a whole remains accessible to other threads, because only a portion of it is locked at any time.
concurrent_vector
Unlike elements of an STL vector, elements of a TBB concurrent_vector never change their location in memory. Internally, the TBB version stores its elements in a series of separately allocated segments rather than in one contiguous block. In short, concurrent_vector provides random access to its elements and ensures safe concurrent growth. Adding new elements neither invalidates existing iterators nor changes the indices of existing items, as stated in the TBB documentation [TBB12].
concurrent_queue
Queues follow the first-in first-out principle, but when several threads are pushing and popping concurrently, there is no clear "first". The Intel TBB template class concurrent_queue guarantees that if a thread pushes two values and another thread pops them, they are popped in the same order in which they were pushed. Values pushed by different threads may be interleaved.
concurrent_hash_map
A concurrent_hash_map is a container of elements of type std::pair<const Key, T>. Elements are accessed through accessor objects that act as smart pointers to an element and make access to it atomic. A const_accessor provides read access and an accessor provides write access, and each holds a lock on the element while it refers to it.
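The fine-grained locking idea described above can be illustrated with a lock-striped map, a technique in which the table is divided into stripes that each carry their own mutex. StripedMap is a hypothetical standard C++ example, not how TBB implements concurrent_hash_map. It only shows why two threads writing to different keys, as in the mymap example, usually do not block each other: they contend only when their keys hash to the same stripe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical lock-striped map: each stripe owns a mutex and a sub-map,
// so an operation locks only the stripe its key belongs to.
template <typename Key, typename Value, std::size_t Stripes = 16>
class StripedMap {
    struct Stripe {
        std::mutex m;
        std::unordered_map<Key, Value> data;
    };
    std::array<Stripe, Stripes> stripes_;

    // A key always maps to the same stripe, so one lock protects it.
    Stripe& stripe_for(const Key& key) {
        return stripes_[std::hash<Key>{}(key) % Stripes];
    }

public:
    void assign(const Key& key, const Value& value) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> lock(s.m);  // locks this stripe only
        s.data[key] = value;                    // requires default-constructible Value
    }

    bool erase(const Key& key) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> lock(s.m);
        return s.data.erase(key) > 0;
    }

    bool find(const Key& key, Value& out) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> lock(s.m);
        auto it = s.data.find(key);
        if (it == s.data.end()) return false;
        out = it->second;
        return true;
    }
};
```

Even with striping, the erase-versus-insert race from the mymap example remains a logical race: the locks keep the map consistent, but the final outcome still depends on which thread runs first.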
2.3
C++
2.3.1
Lambda expressions in C++11
Lambda expressions are a new feature in C++11 [Com11] (they are generally popular in modern functional programming languages). We were not yet familiar with this feature, but the basic loop transformations performed by our plug-in require lambda expressions, so we had to become acquainted with the concept. Lambda expressions are anonymous functions that can maintain state and access variables from the enclosing scope. Figure 2.1 illustrates the basic syntax of a lambda expression, and the list below refers to that figure.
1. lambda introducer or capture clause
2. lambda parameter list
3. mutable specification
4. exception specification
5. return type
6. lambda body
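The six parts listed above can all appear in a single lambda. lambda_anatomy_demo is a hypothetical example that uses noexcept as the exception specification; the comments map each part of the lambda to the list.

```cpp
#include <cassert>

int lambda_anatomy_demo() {
    int total = 0;
    int factor = 2;

    // [&total, factor]  1. capture clause: total by reference, factor by value
    // (int x)           2. parameter list
    // mutable           3. lets the body modify its own copy of factor
    // noexcept          4. exception specification
    // -> int            5. return type
    // { ... }           6. lambda body
    auto scale_and_add = [&total, factor](int x) mutable noexcept -> int {
        factor += 1;          // changes the lambda's copy, not the outer variable
        total += x * factor;  // writes through the reference to total
        return total;
    };

    scale_and_add(1);       // copy of factor becomes 3, total becomes 3
    scale_and_add(1);       // copy of factor becomes 4, total becomes 7
    return total + factor;  // outer factor is still 2, so the result is 9
}
```

Without mutable, the line factor += 1 would not compile, because a lambda's by-value captures are const inside its body by default.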
Capture clause
Inside the capture clause, variables prefixed with & are accessed by reference, and variables without such a prefix are accessed by value. A default capture mode can also be specified by placing & (all captured variables accessed by reference) or = (all captured variables accessed by value) as the first element of the capture clause. For the kind of transformations performed by our plug-in, all lambda expressions use the default capture mode with variables accessed by reference, and therefore always use &.
Lambda Declarator
The lambda declarator is optional and is composed of the following parts:
Parameter list
A lambda expression's parameter list is like a function's parameter list, except that in C++11 it cannot have default arguments (this restriction was lifted in C++14).
Mutable Specification
When variables captured by value need to have their values changed inside the lambda, the mutable specification is required.
Exception Specification
This specification is used to signal that the lambda expression does not throw any exceptions.
Return Type
Specifies the return type. It can be omitted when the lambda returns no value. In C++11 it can also be omitted when the body consists of a single return statement, in which case the type is deduced.
Lambda Body
The body of a lambda expression can access its parameters, locally declared variables, static variables, and class data members (when the lambda expression is declared inside a class and captures this). As stated above, the body also has access to the variables it captures from the enclosing scope.
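The difference between the two default capture modes matters for the generated code. A [&] lambda sees later changes to captured variables and can modify them, while a [=] lambda works on copies taken when it is created. capture_modes_demo is a hypothetical example whose by-reference lambda has a body similar to Listing 2.4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int capture_modes_demo() {
    std::vector<int> vec{1, 5, 7};
    int threshold = 5;

    // [&]: every variable the body uses (vec, threshold) is captured by
    // reference, the mode used in the plug-in's parallel_for lambdas.
    auto bump_matches = [&](std::size_t i) {
        if (vec[i] == threshold) vec[i] += 1;
    };
    // [=]: threshold is copied when this lambda object is created.
    auto saved_threshold = [=]() { return threshold; };

    threshold = 7;  // seen by bump_matches, not by saved_threshold
    for (std::size_t i = 0; i != vec.size(); ++i) bump_matches(i);

    return vec[2] * 10 + saved_threshold();  // 8 * 10 + 5 = 85
}
```

Capturing by reference is what allows the transformed loop body to update vec in place. It also means that, once the body runs on several threads, any variable written through such a reference is shared among them.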
Chapter 3
Implementation
In this chapter we explain the reasons for the program's structure and the corresponding methodology, along with some implementation details and the program flow. We also describe what has been kept from Pascal Kesseli's work and the modifications we made.
3.1
Overview
Figure 3.1 gives a general overview of the system. The system starts from the code written by the programmer, passes it to the analysis module, and uses the corresponding semantic analyser to perform the analysis.
Figure 3.1: System Overview
These semantic analysers match the written code against C++ containers or loops that can be transformed. Every matching container or loop is assigned an Eclipse resource marker. For each marker, the user can choose which of the available resolutions to apply in that specific situation. Examples of the resolutions offered are shown in Figure 3.2 and Figure 3.3.
Figure 3.3: Loop Resolutions
When the user clicks one of the markers in the blue bar to the left of the code editor, the resolutions shown above pop up. To perform a transformation, the user double-clicks one of them, which triggers the second major module of the system, the transformation module.
3.2
Architecture
Figure 3.4: System Overview
The main goal of this section is to analyse in more depth the overview of the system given in Section 3.1. The analysis presents the implemented classes and interfaces as class diagrams and dependency views. We also describe the main differences between the base implementation, the Deep Space 8 project [Kes10], and what has been newly produced.
3.2.1
Analysis
As in the base implementation, the analysis part of the plug-in is divided into two major sections: tree pattern matching and semantic analysis.
Tree Pattern Matching
Most of the previously existing pattern classes, such as Capture and Reference, which by design of the system are subclasses of the IPattern interface implemented in Deep Space 8 [Kes10], were not used in our work. IPattern nevertheless remains central to pattern recognition and to the subsequent subdivision into the various patterns.
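To make the role of IPattern more concrete, the following is a minimal C++ sketch of the same idea on a toy tree; it is not CDT code. Node, Pattern, IsKind and searchBelow are hypothetical stand-ins for IASTNode, IPattern, IsInstanceOf and the PatternSearchVisitor traversal. A pattern is an object with a satisfies check, and the search collects every descendant of a parent node that satisfies it (in this sketch the parent itself is not tested).

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST node (hypothetical stand-in for IASTNode): a kind name plus children.
struct Node {
    std::string kind;
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(std::string k) : kind(std::move(k)) {}

    // Appends a child of the given kind and returns a pointer to it.
    Node* add(const std::string& k) {
        children.push_back(std::unique_ptr<Node>(new Node(k)));
        return children.back().get();
    }
};

// Hypothetical pattern interface, analogous in spirit to IPattern.
struct Pattern {
    virtual ~Pattern() {}
    virtual bool satisfies(const Node& n) const = 0;
};

// Matches nodes of one kind, analogous in spirit to IsInstanceOf.
struct IsKind : Pattern {
    std::string kind;
    explicit IsKind(std::string k) : kind(std::move(k)) {}
    bool satisfies(const Node& n) const override { return n.kind == kind; }
};

// Depth-first traversal below 'parent' that collects every descendant
// satisfying the pattern; the parent itself is not tested.
inline void searchBelow(const Node& parent, const Pattern& pattern,
                        std::vector<const Node*>& found) {
    for (const auto& child : parent.children) {
        if (pattern.satisfies(*child)) found.push_back(child.get());
        searchBelow(*child, pattern, found);
    }
}
```

For a tree TranslationUnit > FunctionDefinition > {Declaration, CompoundStatement > {Declaration, ForStatement}}, searching below the root with IsKind("Declaration") collects the two Declaration nodes in document order.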
Figure 3.5: IPattern structure
Figure 3.5 shows the two classes we built that implement the IPattern interface. Each has its own satisfies method, which checks whether the node it receives matches the class type to be analysed. We use this to check whether the nodes visited by our analysers are the ones to be transformed later. All other functionality provided, such as logical operators and the Dominates, Dominates immediately and Sibling relations, was left out, as it was deemed unnecessary for the level of complexity of the algorithms and paradigms to implement. Further implementation details are well documented in the Deep Space 8 report [Kes10].
PatternSearchVisitor
The PatternSearchVisitor class, implemented in the Deep Space 8 project [Kes10], detects patterns under an AST node and stores the traversed nodes that match the requested pattern. It is used as follows:
    PatternSearchVisitor visitor = new PatternSearchVisitor(new IsInstanceOf(IASTDeclaration.class));
    parent.accept(visitor);
    Set<IASTNode> offendingNodes = new HashSet<IASTNode>(visitor.getNodes());
Listing 3.1: Usage of PatternSearchVisitor.
After execution of the above code block, the offendingNodes set is populated with the child nodes of parent (which can be any node in the AST) that are instances of IASTDeclaration. Other class types can be passed to IsInstanceOf, depending on the node type to search for below the parent; another typical example is CPPASTForStatement. More advanced patterns can be devised, as long as they implement the IPattern interface and its inherited operations.
Semantic Analysis
Figure 3.6: ISemanticAnalyzer structure
The semantic analysers are the core of the analysis module. Using the algorithms described earlier, they examine a given code segment and report the nodes of the abstract syntax tree that match their rule. Figure 3.6 shows how the classes we built, ContainerSemanticAnalyzer and FindParallelForSemanticAnalyzer, receive an IASTTranslationUnit and analyse it for containers and for loop statements respectively.
Each semantic analyser provides a Set<IASTNode> containing the nodes that match its rules. The AST is processed by running every analyser in the Set<ISemanticAnalyzer> and reporting the problems associated with each of them. We also made some modifications, such as the getIds method, which returns an array of the IDs available for the same analyser checker, so that multiple problems can be reported later. It is also worth mentioning that our plug-in uses the Codan framework to link nodes with their corresponding UI markers.
Containers
The container analysis requires some care. To perform a transformation later, one must first analyse the types the template container receives and its associated namespace. The containers chosen for analysis were:
- Vector
- Queue
- Hash Map
Figure 3.7 highlights the elements that were taken into consideration:
Figure 3.7: Containers Analysis
The chosen hash map belongs to an external library, in the namespace __gnu_cxx, and was selected because of the value types it accepts. All the others are STL containers.
For Loop
Figure 3.8: Containers Analysis
For the analysis of for loops, note that we limited acceptance to loops whose first header element is a statement (the initialization) and whose second and third elements are expressions (the condition and the increment). The analysis accepts both integer literals and function calls as the upper limit of the iteration space. The loop body is not analysed in any way: doing so would require an immense amount of work, and the plug-in is intended merely to offer possible transformations, not to be a fully automated tool with the corresponding guarantees about its analysis. All the fields referred to can be seen in Figure 3.8.
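The acceptance rules above can be summarised as a predicate over the loop header. The sketch below is a simplified model under our own assumptions: ForHeader is a hypothetical struct holding the CDT node class names of the header parts, and the statement/expression distinction is approximated by the suffix of those names. As in the plug-in, the loop body is never inspected.

```cpp
#include <string>

// Hypothetical, simplified view of a for-loop header: the CDT node class
// names of its parts, as described in the text.
struct ForHeader {
    std::string initKind;        // e.g. "CPPASTDeclarationStatement"
    std::string conditionKind;   // e.g. "CPPASTBinaryExpression"
    std::string incrementKind;   // e.g. "CPPASTUnaryExpression"
    std::string upperBoundKind;  // right-hand operand of the condition
};

inline bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Accepts a header whose first part is a statement, whose second and third
// parts are expressions, and whose upper bound is a literal or a function call.
// The loop body is deliberately not inspected.
inline bool acceptLoop(const ForHeader& h) {
    const bool shapeOk = endsWith(h.initKind, "Statement") &&
                         endsWith(h.conditionKind, "Expression") &&
                         endsWith(h.incrementKind, "Expression");
    const bool boundOk = h.upperBoundKind == "CPPASTLiteralExpression" ||
                         h.upperBoundKind == "CPPASTFunctionCallExpression";
    return shapeOk && boundOk;
}
```

Under this model, the header of for (int i = 0; i != vec.size(); i++) is accepted, since vec.size() is a function call, while a bound given by a plain variable (a CPPASTIdExpression) is rejected.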
3.2.2
Transformation
This section explains how the transformation classes fit into the system. Figure 3.9 displays the structure of both classes we built. The classes ReplaceContainerWithConcurrentContainerResolutionAggregator and ReplaceLoopByParallelForParadigmAggregator were also built to maintain structural coherence, following the same approach as in the Deep Space 8 project [Kes10].
Figure 3.9: Transformation Module
The container transformations are based on the analysis discussed in Section 3.2.1. A transformation works by applying PatternSearchVisitors to the nodes to be altered, or to nodes higher in the hierarchy, for the purpose of rewriting the AST. After that, new nodes must be built to express the new paradigms whenever the chosen nodes lack the operations needed to alter their content or behaviour. For this, CPPASTLiteralExpression nodes are required. Their usage is nonetheless kept very limited, so as not to replace several nodes with different meanings by a single CPPASTLiteralExpression, which the AST would see as one node. After all modifications are made, the highest necessary node in the AST hierarchy is exchanged, so that the AST can be rewritten with all modifications at once, without multiple rewrites. The range module implemented in Deep Space 8 [Kes10] was not used, because it checks more details than we needed, and following the same strategy would have required more lines of code. We acknowledge, however, that it could prove useful for future development for the sake of structural coherence. One detail to take into consideration is that an include directive can be introduced multiple times, since AST analysis provides no information about include directives.
Containers
The modifications for the selected marker are implemented in the ReplaceContainerWithConcurrentContainerResolution class, following the process introduced above. Some remarks are still due. Once the required modifications for all containers were implemented in this class, the ASTHelper (a class implemented in Deep Space 8 [Kes10] to which we added some functionality, see Section 3.6) is used to differentiate between the types of concurrent containers. The using and include directives are both inserted by an insertBefore call, and a final replace call applies all the other modifications. Finally, all introduced modifications take effect by calling rewriteAST.perform().
For Loops
For loops, the implemented class is ReplaceLoopByParallelForParadigm. The basic procedure is the same as for containers. Special mention must be made, however, of a type restriction on the limits of the iteration space. The first and second arguments of the generated parallel_for call correspond to the iteration limits, and both are cast to size_t, as required by the implementation of the parallel_for template function. Only nodes of the types CPPASTLiteralExpression and CPPASTFunctionCallExpression are accepted as the upper limit of the iteration space, as discussed in Section 3.2.1. All arguments of the new call are added to an array of IASTInitializerClause elements. The tbb namespace qualification is inserted by default, and the header is placed as described before.
It should also be noted that the CPPASTFunctionCallExpression created to build the new parallel_for call must be wrapped in a CPPASTExpressionStatement, so that it is compatible with the statement node it replaces. Finally, we attempted to implement a more customizable version of the parallel_for algorithm, which would allow the user to set the granularity and the partitioning strategy, but the implementation was deemed unreliable because it did not run correctly. We therefore leave it as a possible development for those who wish to pick up where we left off. All it takes is to add an extra ID to the respective semantic analyser (the line is currently commented out), and the resolution will then show up in the list of resolutions implemented by LoopMarkerResolution. This version of the parallel_for paradigm also takes three arguments, but its first argument is a blocked_range, which is constructed from both limits of the iteration space and, finally, the grainsize. The second argument is the lambda expression, and the last one sets the partitioner. Several partitioners are accepted, such as auto_partitioner and simple_partitioner, which respectively tell the algorithm to choose the chunk size with its own heuristics or to use the grainsize given as the third argument of the blocked_range constructor. For more information on this, and on which values to choose, please consult TBB's documentation [Int11b].
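The role of the grainsize can be illustrated by how a range is split. The following sketch models the rule we understand tbb::blocked_range to use (a range is divisible while its size exceeds the grainsize, and it is split at its midpoint) and recurses until no piece is divisible, which is roughly what simple_partitioner does. It only computes the chunks instead of executing them. splitRange and Chunk are hypothetical names, and TBB's documentation [Int11b] remains the authoritative reference.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A chunk of the iteration space: [first, second).
typedef std::pair<std::size_t, std::size_t> Chunk;

// Recursively splits [begin, end) (begin <= end) the way we understand
// blocked_range to be split: divisible while its size exceeds the grainsize,
// cut at the midpoint. Leaves are appended to 'out' in iteration order.
inline void splitRange(std::size_t begin, std::size_t end, std::size_t grainsize,
                       std::vector<Chunk>& out) {
    if (grainsize == 0) grainsize = 1;  // a grainsize of 0 would never stop splitting
    if (end - begin > grainsize) {
        const std::size_t middle = begin + (end - begin) / 2;
        splitRange(begin, middle, grainsize, out);
        splitRange(middle, end, grainsize, out);
    } else {
        out.push_back(Chunk(begin, end));  // one body invocation would run this chunk
    }
}
```

Splitting [0, 100) with a grainsize of 25 yields four chunks of 25 iterations, while [0, 10) with a grainsize of 4 yields chunks of sizes 2, 3, 2 and 3, all within the grainsize. With auto_partitioner, by contrast, the library itself decides how far to split.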
3.3
Containers
As shown in the Expressive Examples subsection (Section 2.1.3), container transformation in the scope of this project targets three data structures: vector, queue and hash map, which are transformed into the TBB containers concurrent_vector, concurrent_queue and concurrent_hash_map respectively.
3.3.1
Analysis
To make the necessary transformations using CDT's ASTRewrite class, an understanding of the AST's structure was required, especially to obtain information about specific node types.
Figure 3.10: AST representation of an example vector.
Figure 3.10 illustrates the AST representation of a vector declaration. The class names of the nodes are directly related to the kind of entity they represent in C++ code. In the analysis of container syntax, the most important node type is CPPASTQualifiedName. This node holds within its structure the information about the namespace and the type of structure used. In the case of std::vector in Figure 3.10, this information is extracted with the resolveBinding method, available on every node that extends IASTName. Note in particular that the namespace used to detect hash map occurrences is __gnu_cxx, since the STL does not currently include a hash_map structure.
3.3.2
Transformation
Figure 3.11: AST representation of the vector in Figure 3.10 after transformation.
Figure 3.11 illustrates the AST representation of the resulting TBB concurrent_vector declaration. Compared with Figure 3.10, it is clear that a synthetic CPPASTQualifiedName node must be created with the same child structure but with different CPPASTName nodes. The new names of these CPPASTName nodes are, respectively, the namespace and the name of the structure (e.g. tbb and concurrent_vector). The previous CPPASTQualifiedName node is then replaced with the newly created one using the replace method of CDT's ASTRewrite.
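The renaming step can be modelled outside CDT by treating a qualified name as its list of name segments, which is what the children of CPPASTQualifiedName hold. The following sketch (toConcurrent and join are hypothetical helpers, not part of the plug-in) maps the three supported containers to their TBB counterparts and leaves any other name unhandled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A qualified name such as std::vector, modelled as its name segments
// (mirroring the CPPASTName children of a CPPASTQualifiedName).
typedef std::vector<std::string> QualifiedName;

// Returns the TBB replacement for a supported container, or an empty
// name if the container is not one of the three handled ones.
inline QualifiedName toConcurrent(const QualifiedName& original) {
    static const std::map<QualifiedName, QualifiedName> table = {
        {{"std", "vector"}, {"tbb", "concurrent_vector"}},
        {{"std", "queue"}, {"tbb", "concurrent_queue"}},
        {{"__gnu_cxx", "hash_map"}, {"tbb", "concurrent_hash_map"}},
    };
    const auto it = table.find(original);
    return it == table.end() ? QualifiedName() : it->second;
}

// Renders the segments back as source text, e.g. "tbb::concurrent_vector".
inline std::string join(const QualifiedName& name) {
    std::string s;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (i > 0) s += "::";
        s += name[i];
    }
    return s;
}
```

For example, join(toConcurrent({"std", "vector"})) produces "tbb::concurrent_vector", while a std::list is left untouched because it has no entry in the table.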
3.4
Parallel For
The parallel_for transformations performed by our plug-in relate exclusively to for loops over one-dimensional, simple iteration spaces. The following listings provide an additional example of this kind of transformation:
    for (int i = 0; i != vec.size(); i++) {
        (...)
    }
Listing 3.2: Loop C++ original code.
    tbb::parallel_for(size_t(0), (size_t) vec.size(), [&](size_t i) {
        (...)
    });
Listing 3.3: Loop C++ transformed code, after plug-in execution.
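To show what the transformed code in Listing 3.3 does at run time, the sketch below provides parallel_for_sketch, a hypothetical stand-in with the same argument shape as the compact tbb::parallel_for(first, last, f) form. It is not TBB: it simply splits the index range into contiguous blocks and runs each block on its own std::thread, whereas TBB schedules tasks with work stealing. The squares function applies it to a loop like the one in Listing 3.2.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in with the argument shape of tbb::parallel_for(first, last, f).
// NOT TBB: splits [first, last) into contiguous blocks, one std::thread each.
template <typename Index, typename Function>
void parallel_for_sketch(Index first, Index last, const Function& f) {
    if (!(first < last)) return;  // empty range: nothing to do
    const Index n = last - first;
    const unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    const Index workers = static_cast<Index>(hw == 0 ? 2 : hw);
    const Index block = (n + workers - 1) / workers;          // ceiling division, >= 1

    std::vector<std::thread> threads;
    for (Index start = first; start < last; start += block) {
        const Index stop = std::min<Index>(last, start + block);
        threads.emplace_back([&f, start, stop] {
            for (Index i = start; i < stop; ++i) f(i);  // one block of iterations
        });
    }
    for (std::thread& t : threads) t.join();  // wait for all blocks before returning
}

// Listing 3.2's loop in the shape of Listing 3.3: each iteration writes vec[i] only.
inline std::vector<int> squares(std::size_t count) {
    std::vector<int> vec(count);
    parallel_for_sketch(std::size_t(0), (std::size_t) vec.size(), [&](std::size_t i) {
        vec[i] = static_cast<int>(i * i);
    });
    return vec;
}
```

Because every iteration writes only its own element vec[i], the iterations are independent, and the result does not depend on how the range is divided among threads.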
3.4.1
Analysis
The loop body is not analysed, which means that the developer can make invalid transformations, i.e. transform for loops whose semantics are not compatible with a multi-threaded environment. This is mainly because, by design, the plug-in is not intended to guarantee the validity of the transformations it performs, but rather to automate them for a developer who knows what he or she is doing. Figure 3.12 illustrates the AST representation of a for loop statement. As we can see, the first three children of the CPPASTForStatement node are the ones we are interested in for the transformation, as they represent the initialization, condition and increment clauses. In this example they are, respectively, a CPPASTDeclarationStatement, a CPPASTBinaryExpression and a CPPASTUnaryExpression, but in other situations they can be more complex nodes (i.e. have more children) and our implementation still performs the transformation correctly. Once again, detection of the corresponding pattern relies exclusively on the PatternSearchVisitor class.
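Since the body is not analysed, it helps to see what makes a body unsafe. A parallel_for gives no guarantee about the order in which iterations run, so a body whose result depends on that order cannot be transformed safely. The deterministic sketch below (runIterations, prefixLoop and independentLoop are hypothetical names) executes the same loop body forward and in reverse: the loop that depends on the previous iteration gives different results, while the independent one does not. Passing such a check does not prove that a loop is safe (data races are another concern); it only exposes one class of invalid transformations.

```cpp
#include <cstddef>
#include <vector>

// Runs body(i) for i in [first, last), either forward or in reverse order.
template <typename Body>
void runIterations(std::size_t first, std::size_t last, bool reverse, Body body) {
    if (!reverse) {
        for (std::size_t i = first; i < last; ++i) body(i);
    } else {
        for (std::size_t i = last; i-- > first;) body(i);
    }
}

// Loop-carried dependency: iteration i reads what iteration i - 1 wrote.
inline std::vector<int> prefixLoop(std::size_t n, bool reverse) {
    std::vector<int> a(n, 0);
    runIterations(1, n, reverse, [&](std::size_t i) { a[i] = a[i - 1] + 1; });
    return a;
}

// Independent iterations: iteration i touches a[i] only.
inline std::vector<int> independentLoop(std::size_t n, bool reverse) {
    std::vector<int> a(n, 0);
    runIterations(0, n, reverse, [&](std::size_t i) { a[i] = static_cast<int>(2 * i); });
    return a;
}
```

For n = 5, prefixLoop yields {0, 1, 2, 3, 4} in the original order but {0, 1, 1, 1, 1} in reverse, whereas independentLoop yields {0, 2, 4, 6, 8} either way.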
3.4.2
Transformation
As can be seen in Figure 3.13, the structure of a transformed for loop is somewhat more complex than that of the original loop in Figure 3.12: first, because there is no node type that represents parallel_for constructs, and second, because the tbb namespace must be specified. Our implementation treats parallel_for as a normal function call expression (a CPPASTFunctionCallExpression node as the child of a CPPASTExpressionStatement, which marks it as a statement, just as the for statement is in the pre-transformation example) and specifies its name as a qualified name (a CPPASTQualifiedName node with the values tbb and parallel_for).
Since the parallel_for template expects both limits of its range to be of the same index type, and our transformation uses the unsigned type size_t for it, further transformation of the tree is required, changing the type of the range limits to size_t. Additionally, a lambda expression must be created to define the for loop body as an anonymous inline function. The AST represents lambda expressions in the manner illustrated in Figure 3.13. Nodes of the type CPPASTLambdaExpression have some specific methods, such as:
- setCaptureDefault(CaptureDefault value), used to specify the default capture mode, either by reference (&) or by copy (=);
- setDeclarator(ICPPASTFunctionDeclarator dec), used to pass the loop index as the lambda's parameter;
- setBody(IASTCompoundStatement body), used to specify the loop body.
After the synthetic node has been constructed with the structure described, it is written to the AST using the replace method of CDT's ASTRewrite class.
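One concrete reason for the size_t casts, as we understand TBB's compact overload, is that both range limits share a single template parameter: the signature has the shape template<typename Index, typename Function> void parallel_for(Index first, Index last, const Function& f). In the sketch below, for_range is a hypothetical, sequential function with that same shape, used only to demonstrate the deduction rule; mixing the int literal 0 with the size_t returned by vec.size() would make template argument deduction fail, which the casts avoid.

```cpp
#include <cstddef>
#include <vector>

// Sequential, hypothetical stand-in with the same parameter shape as the
// compact parallel_for: a single Index type deduced from both limits.
template <typename Index, typename Function>
void for_range(Index first, Index last, const Function& f) {
    for (Index i = first; i < last; ++i) f(i);
}

inline long long sumElements(const std::vector<int>& vec) {
    long long sum = 0;
    // for_range(0, vec.size(), ...) would not compile: Index would be deduced
    // as int from the first argument and as std::size_t from the second.
    for_range(std::size_t(0), (std::size_t) vec.size(),
              [&](std::size_t i) { sum += vec[i]; });
    return sum;
}
```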
3.5
Using either concurrent containers or parallel_for statements requires specifying the tbb namespace and including the corresponding TBB header. The include directive is added on transformation; Listing 3.4 shows an include directive added after a vector transformation. Note that the directive is inserted before the first node of the AST, typically below earlier include directives.
    #include "tbb/concurrent_vector.h"
    tbb::concurrent_vector<int> vec;
Listing 3.4: Transformation that adds the include directive
The namespace is specified either explicitly (with name qualifiers) or with namespace directives, depending on the situation, as the two listings below illustrate:
    std::vector<int> vec;

    tbb::concurrent_vector<int> vec;
Listing 3.5: Original vector declaration with name qualifiers and corresponding transformation
    using namespace std;
    vector<int> vec;

    using namespace std;
    using namespace tbb;
    concurrent_vector<int> vec;
Listing 3.6: Original vector declaration with namespace directives and corresponding transformation
For explicit namespace specification (e.g. tbb::concurrent_vector), the tree must be altered as seen in Figure 3.10, by manipulating the CPPASTQualifiedName node (each child of this node represents one segment of the qualified name). Namespace directives (e.g. using namespace tbb) and include directives are added as literal nodes at the beginning of the tree using ASTRewrite's createLiteralNode and insertBefore methods:
    ASTRewrite r = ASTRewrite.create(unit);
    IASTNode tbbInclude = r.createLiteralNode("#include \"tbb/" + concurrentContainerName + ".h\"\n");
    r.insertBefore(unit, null, tbbInclude, null);
Since include analysis was not within the scope of this project, and include information is not present in the AST's structure, duplicate includes may appear when performing multiple transformations. Solutions to this problem would be either to check whether the include to be inserted already exists (with a different method of analysis) or to use the Includator plug-in (developed by the IFS), which organizes includes [fS10]. One last mention must be made of the partitioning behaviour of the parallel_for template we generate: it uses auto partitioning by default, which lets the algorithm choose the chunk size to use. More details on a version that allows these parameters to be changed can be found in Section 3.2.2.
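A straightforward way to avoid the duplicate includes mentioned above is to check the source text before inserting the directive, since the AST does not carry that information. The sketch below is our own naive illustration (hasInclude and addIncludeOnce are hypothetical names): it scans lines for an #include of the given header in quoted or angle-bracket form, ignores comments and conditional compilation, and, for simplicity, prepends the directive instead of placing it below existing includes as the plug-in does.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Naive check: does 'source' already contain '#include "header"' or
// '#include <header>'? Comments and #if blocks are not taken into account.
inline bool hasInclude(const std::string& source, const std::string& header) {
    const std::string quoted = "\"" + header + "\"";
    const std::string angled = "<" + header + ">";
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t pos = line.find_first_not_of(" \t");
        if (pos == std::string::npos || line.compare(pos, 8, "#include") != 0) continue;
        const std::string rest = line.substr(pos + 8);
        if (rest.find(quoted) != std::string::npos || rest.find(angled) != std::string::npos)
            return true;
    }
    return false;
}

// Adds the directive only if it is missing. For simplicity it is prepended,
// whereas the plug-in inserts it before the first node of the AST.
inline std::string addIncludeOnce(const std::string& source, const std::string& header) {
    if (hasInclude(source, header)) return source;
    return "#include \"" + header + "\"\n" + source;
}
```

Applying addIncludeOnce twice with the same header leaves a single directive, which is the behaviour the current transformation lacks when several containers in one file are transformed.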
3.6
ASTHelper
This class was implemented by Pascal Kesseli in the Deep Space 8 project [Kes10]. In short, it is responsible for some CDT AST-related operations, mainly type checking and type conversion. We added some functionality to this class, namely a printAST(IASTNode) function for debugging purposes and a namespaceDirectiveExists(IASTNode node, String directive) function, used for container transformations, which checks whether a using namespace directive with a given name exists inside the structure of a given node.
3.7
Eclipse Integration
To develop the project we had to become acquainted with two frameworks used together with Eclipse CDT: Codan and Eclipse's marker resolution mechanism (IMarkerResolution).
3.7.1
Codan
Codan provides an interface for static code analysis plug-ins such as ours. Programmers have direct access to the AST (abstract syntax tree) through a series of interfaces, namely IASTTranslationUnit, which represents the highest node in the tree's hierarchy. Codan also provides means to dynamically assign markers to AST nodes that the static code analysis plug-in has flagged with an issue. Normally the programmer would have to specify the node's position in the file in order to assign a marker; Codan spares the developer the bothersome interaction with the raw IResources, allowing markers to be placed directly on AST nodes. Codan also provides UI integration with Eclipse CDT for configuring newly created static code checkers.
3.7.2
IMarker Resolution
Eclipse provides an interface for resolving markers placed by the process described above. This interface is IMarkerResolution, and resolutions are generated by an IMarkerResolutionGenerator. Our implementation revolves around these interfaces. An IMarkerResolutionGenerator has to be registered in the plugin.xml file and links markers to the appropriate resolutions. Figure 3.9 shows our two implementations of this interface.
3.7.3
The unit tests of our plug-in are stored in external files with the .rts extension, which specify the tests to be performed. The structure of such a file is shown in the example below:
    //! QualifiedVectorDeclaration
    //#ch.hsr.ifs.ds8.tests.ui.marker.ReplaceContainerWithConcurrentContainerResolutionTest
    //@.config
    filename=container.cpp
    line=3
    //@container.cpp
    #include <vector>
    int main() {
    std::vector<int> vec;
    }
    //=
    #include <vector>
    int main() {
    tbb::concurrent_vector<int> vec;
    }
Listing 3.7: Test for qualified vector declaration transformations
In the example above, QualifiedVectorDeclaration is the name of the test and ReplaceContainerWithConcurrentContainerResolutionTest is the test class; line=3 refers to the line on which the marker is located, in other words, the line of the problem we want to test. The .rts file also contains the two blocks of code to be compared: the first is the code before the transformation and the second the code after it, separated by the //= sequence of characters. One .rts file may define several tests. The class referenced by the .rts file (ReplaceContainerWithConcurrentContainerResolutionTest in the example) defines which resolution to apply to the test's marker. In addition, both the .rts file and the test class must be added to a running test suite, which is done in the RefactoringTestSuite class. After the test set-up, the application must be run as an Eclipse plug-in unit test. This launches an Eclipse child application that creates temporary files for each test and executes our plug-in. A screen like the one illustrated in Figure 3.14 then pops up, marking successful tests in green.
Figure 3.13: AST representation of the loop statement in Figure 3.12 after transformation.
Chapter 4
Conclusion
TBB was chosen as an implementation target because it offers a good approach to expressing parallelism in a C++ program. It is a library that helps take advantage of multi-core processor performance without having to be a threading expert. Intel TBB is not just a thread-replacement library: it provides higher-level, task-based parallelism that abstracts platform details and threading mechanisms for scalability and performance. Although we decided to use TBB, we realize that there are other ways to implement multithreaded code, especially since the new C++11 standard provides its own threading facilities and a new memory model. Still, we believe that this high-level approach is a good way to motivate non-experts to become acquainted with these concepts. We chose Eclipse PDE because the basis for our work, the Deep Space 8 project [Kes10], was already provided as a plug-in in this environment. All our functionality was built on its modular structure, which allowed easy integration of new classes and interfaces and the corresponding modification of existing ones. The project was divided into separate modules within different packages to make organizing the code easier. We used test-driven development, based on JUnit plug-in tests that cover the possible transformations.
4.1
Goal Achievement
During the development of this project we encountered many difficulties, dealing with technologies we had not previously worked with: C++11 (lambdas and other advanced C++ concepts), Eclipse CDT, Codan, aided parsing, code transformation and AST concepts, specific Eclipse interfaces such as IMarkerResolution, TBB, parallel computing in general and other Eclipse-specific concepts. As such, contrary to our initial assessment, the project proved to be a difficult challenge. We believe that the support and supervision provided were adequate, and that further progress towards our initial objectives was limited mainly by our own management and knowledge constraints.
Chapter 5
Appendix
5.1 Walk-through Project Set-up
The following guide describes all the steps needed to set up the development environment for this project. It describes the steps on a Windows system, although set-up on other systems follows largely the same process (tested).
Download and Install PDE
Version 3.7.1 of Eclipse PDE (newer versions might also work):
All Platforms - http://download.eclipse.org/eclipse/downloads/drops/R-3.7.1-201109091335/index.php
Windows 32 - http://www.eclipse.org/downloads/download.php?file=/eclipse/downloads/drops/R-3.7.1-201109091335/eclipse-SDK-3.7.1-win32.zip&url=http://mirror.switch.ch/eclipse/eclipse/downloads/drops/R-3.7.1-201109091335/eclipse-SDK-3.7.1-win32.zip&mirror_id=63
Extract the compressed file, run Eclipse and select a workspace.
Install needed plugins
Add each plugin repository with Help > Install New Software... > Add and then install the plugins:
Subclipse SVN - http://subclipse.tigris.org/update_1.4.x
m2eclipse MAVEN - http://m2eclipse.sonatype.org/sites/m2e
FindBugs (optional) - http://findbugs.cs.umd.edu/eclipse
EclEmma (optional) - http://update.eclemma.org
Eclipse Metrics (optional) - http://metrics.sourceforge.net/update
CDT source is maintained in the Git source control system. To install the Git plugin, go to Help > Install New Software..., select Indigo in the Work with field and search for git.
Maven Integration for Eclipse JDK warning resolution
Skip this step if you did not get a warning like the one illustrated in Figure 5.1.
Install the JDK:
All Platforms - http://www.oracle.com/technetwork/java/javase/downloads/jdk7u1-download-513651.html
Windows 32 - http://download.oracle.com/otn-pub/java/jdk/7u1b08/jdk-7u1-windows-i586.exe
Ubuntu 10.04 - http://happy-coding.com/install-sun-java6-jdkon-ubuntu-10-04-lucid/
Edit eclipse.ini in the Eclipse folder and add -vm <path of JDK>. Bear in mind that if the JDK is installed before Eclipse, this eclipse.ini step is not needed.
Checkout the project from SVN
File > Import > SVN > Checkout Projects from SVN. Create a new repository location with the URL https://sinv-56017.edu.hsr.ch/svn/ds8/ and accept the certificate. Inside trunk, select all five projects:
ch.hsr.ifs.ds8
ch.hsr.ifs.ds8.feature
ch.hsr.ifs.ds8.parent
ch.hsr.ifs.ds8.tests
ch.hsr.ifs.ds8.update
Checkout CDT Projects
The easiest way is to select all the projects and then remove the unneeded ones; alternatively, consult the list below for a more selective set of projects (on Windows):
org.eclipse.cdt
org.eclipse.cdt.codan-feature
org.eclipse.cdt.codan.checkers
org.eclipse.cdt.codan.checkers.ui
org.eclipse.cdt.codan.core
org.eclipse.cdt.codan.core.cxx
org.eclipse.cdt.codan.internal.ui.cxx
org.eclipse.cdt.codan.ui
org.eclipse.cdt.codan.ui.cfgview
org.eclipse.cdt.core
org.eclipse.cdt.core.tests
org.eclipse.cdt.core.win32
org.eclipse.cdt.debug.core
org.eclipse.cdt.debug.mi.core
org.eclipse.cdt.debug.mi.ui
org.eclipse.cdt.debug.ui
org.eclipse.cdt.dsf
org.eclipse.cdt.dsf.ui
org.eclipse.cdt.gdb
org.eclipse.cdt.gdb.ui
org.eclipse.cdt.launch
org.eclipse.cdt.make.core
org.eclipse.cdt.make.ui
org.eclipse.cdt.managedbuilder.core
org.eclipse.cdt.managedbuilder.gnu.ui
org.eclipse.cdt.managedbuilder.ui
org.eclipse.cdt.platform-feature
org.eclipse.cdt.ui
org.eclipse.cdt.ui.tests
Window > Open Perspective > Other... > Git Repository Exploring (the Check Home Directory warning can be ignored). Paste the URL git://git.eclipse.org/gitroot/cdt/org.eclipse.cdt.git in the Git Repository column and click Next. In the Branch Selection, select the most recent branches, in this case both cdt_8_0 and master. Choose where to store the projects locally. After the download, right-click on Working Directory and select Import Projects. Leave the default Import Existing Projects selected, finish the wizard by importing all the projects and wait for the workspace to build.
Change API Baseline
Go back to the Java perspective and open the Problems view. If you see "An API baseline has not been set for the current workspace.", continue; if not, skip to Removing Unneeded CDT Projects. Go to Window > Preferences > Plug-In Development > API Baselines > Add Baseline... Choose a name and a location, typically the plugins/ directory inside the Eclipse folder (if CDT is installed on PDE). Perform a full build.
Removing Unneeded CDT Projects
At this stage, the projects that show problems are the ones that are not needed. Delete these projects; on Windows they are:
org.eclipse.cdt.core.linux
org.eclipse.cdt.core.lrparser
org.eclipse.cdt.core.lrparser.tests
org.eclipse.cdt.core.lrparser.xlc
Custom Configuration
Left-click the main project ch.hsr.ifs.ds8 and select Run As > Eclipse Application. A new instance of Eclipse opens; close it. With the main project selected, go to Run > Run Configurations... > Eclipse Application and select the newly created Eclipse Application configuration. Under Arguments > VM arguments, add -XX:MaxPermSize=512M and click Apply.
Child Application
In this new Eclipse instance you must create a C++ project to run the plugin on. Change to the C/C++ perspective, select File > New > C++ Project and choose Hello World C++ Project.
Install MinGW
On Windows, if problems like the one illustrated in Figure 5.2 arise, you might need to install MinGW. In the child Eclipse application, right-click on the project > Properties > C/C++ General > Paths and Symbols > Includes > GNU C++ and add the following include directories (assuming MinGW was installed in C:/mingw/):
c:/mingw/lib/gcc/mingw32/4.5.2/include/c++
c:/mingw/lib/gcc/mingw32/4.5.2/include/c++/mingw32
c:/mingw/lib/gcc/mingw32/4.5.2/include/c++/backward
c:/mingw/include
c:/mingw/lib/gcc/mingw32/4.5.2/include
c:/mingw/lib/gcc/mingw32/4.5.2/include-fixed
Bear in mind that if MinGW is installed before the project is created, adding these include directories is not needed. Finally, rebuild the index for the project.
Figure 5.1: Possible warning when the Java JDK is not correctly configured in Eclipse
5.2
For the development of this project we installed GNU's GCC 4.7 on Ubuntu 10.04.3 LTS. This was required to access new C++11 features such as lambda expressions. This subsection describes the installation steps; listings represent shell commands. Installation of the needed packages:
    sudo apt-get install mpc libmpc-dev libmpfr-dev libppl0.10-dev libcloog-ppl-dev zlib1g zlib1g-dev libc6-dev-i386 m4 flex zlibc zlib1g zlib1g-dev
Next, install GMP. Unfortunately, GMP is not available in the Ubuntu repository, so download the latest version from http://packages.ubuntu.com/pt/source/maverick/amd64/gmp (in this case we used gmp_4.3.2+dfsg.orig.tar.gz). Extract it, enter the folder, run the configure script inside, then compile, check and install:
    tar xvf gmp_4.3.2+dfsg.orig.tar.gz
    cd gmp-4.3.2+dfsg
    ./configure
    make
    make check
    sudo make install
Figure 5.2: Possible warning when MinGW is not installed
Now download the latest GCC snapshot from a GCC mirror, e.g. http://gcc.parentingamerica.com/snapshots/ (in this case we used gcc-4.7-20120114.tar.bz2). Extract the file and create a build folder in that directory for the installation:
    tar xjvf gcc-4.7-20120114.tar.bz2
    mkdir build
Next, run the configure script in the extracted GCC directory with the options below:
    ./configure \
        --disable-checking \
        --enable-languages=c,c++ \
        --enable-multiarch \
        --enable-shared \
        --enable-threads=posix \
        --program-suffix=4.6 \
        --with-gmp=/usr/local/lib \
        --with-mpc=/usr/lib \
        --with-mpfr=/usr/lib \
        --without-included-gettext \
        --with-system-zlib \
        --with-tune=generic
Build and install GCC (this step may take a long time):
    make
    sudo make install
Finally, run your CDT plug-in and, in the child Eclipse instance, open the properties of your current C++ project, go to C/C++ Build > Settings > GCC C++ Compiler > Miscellaneous > Other flags and append -std=c++11 (or -std=gnu++11 to also enable GNU extensions) to the current flags.
5.3 Project Plan
Throughout this project's timeline, we held weekly meetings with our supervisor, Prof. Peter Sommerlad, to check on the current state of the project. Before each meeting we prepared a list of the goals achieved, a ToDo list specifying the features still to be implemented and the corrections needed, an agenda with general topics to discuss and, finally, any findings made during the week. These meetings gave us the chance to discuss possible solutions to existing problems and to adjust our ToDo list. The resulting changes can be seen by comparing Figure 5.3 and Figure 5.4.
Figure 5.4: Meeting schedule of the last week
We started this project with great ambition. However, as time passed, we realized that we had to reduce the scope of the work. The most significant change was dropping the implementation of iteration over multidimensional spaces. We also did not manage to implement the parallel_reduce paradigm listed in the last week's ToDo list; instead, we preferred to improve the parallel_for transformation, a decision agreed within the group after the last meeting. We believe that our division of time between research and implementation was somewhat naive: we should have started implementing code earlier, at least two or three weeks before we did. In addition, we gave too little time to the analysis of Pascal's work, since his implementation was at first rather complicated for us to grasp.
Bibliography
[Cep10] Shannon Cepeda. Intel Parallel Amplifier under the hood, September 2010.
[Com11] C++ Standards Committee. C++11 standard ISO/IEC 14882:2011, September 2011.
[fS10] Institute for Software. Includator. http://includator.com/, 2010.
[HU06] John Hopcroft and Jeffrey Ullman. Introduction to automata theory, languages, and computation, November 2006.
[Int11a] Intel. Intel Parallel Studio. http://software.intel.com/en-us/articles/intel-parallel-studio-home/, 2011.
[Int11b] Intel. Intel Threading Building Blocks reference manual, November 2011.
[Kes10] Pascal Kesseli. Loop analysis and transformation towards STL algorithms. Deep-Space-8 Project, Fall Semester 2010.
[Lee06] Edward A. Lee. The problem with threads, 2006.
[Lio10a] Steve Lionel. Intel Parallel Advisor how to, September 2010.
[Lio10b] Steve Lionel. Intel Parallel Composer overview, September 2010.
[Lio10c] Steve Lionel. Intel Parallel Inspector how to, September 2010.
[Mic12] MSDN Microsoft. Lambda expression syntax, January 2012.
[Moo65] Gordon E. Moore. Cramming more components onto integrated circuits, 1965.
[TBB12] Intel TBB. concurrent_vector class template reference, January 2012.