* Departement of Math and Computer Science, Southern Arkansas University
100 E University, Magnolia, AR 71753, USA.
This research concerns about the performance optimization or improvement of computation running on top of the platform, namely Apache Hadoop in particular. However, the methods and algorithms proposed in this research supposed to be applicable to other platforms as well without loss of generality. The diagram below  shows the computation flow on Hadoop, which primarily consists of one name node and multiple slave nodes.
A MapReduce process  is composed of two primary functions as follows: 1) Map function: takes a set of input/key value pairs and produces a set of intermediate ouput key/value pairs; 2) Reduce function: takes intermediate key2 and a set of values for the key as illustrated in the following. Map (Key1, Value1) -> List (Key2, Value 2) Reduce (Key2, List(Value2)) -> List(Value2)
Thread Management System Calls
A simple yet popular example of MapReduce computation to perform the word count function is shown below  and is used in this research as a base benchmark, for illustration purpose of the flow of a MapReduce computation. An input file is split into blocks and processed at different nodes to reduce overall processing time.
Implementation of Processes in Unix
Threads in Linux
Scheduling in Unix