Data races occur in multi-threaded programs when two or more threads access the same memory location concurrently, at least one of the accesses is a write, and the threads do not use any exclusive locks to control their accesses to that memory. In this tutorial you will learn how to detect and prevent data races using the COEMS technology.
We have recorded a screencast that shows you how to use the COEMS Data Race Checker on the hardware; it has no audio, so please follow the subtitles. Read on for a detailed description of the tool, and please also browse our training material on race checking.
Data race example
For example, let us assume you wrote the following C code.
#include <stdio.h>
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER; /* Dummy mutex required for data race checking */
int x = 0;

void* count(void *arg) {
    for ( int i = 0; i < 10000000; i++ ) {
        x++;
    }
    return NULL;
}

int main() {
    pthread_t p1, p2;
    pthread_create( &p1, NULL, count, NULL );
    pthread_create( &p2, NULL, count, NULL ); //fight!
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf( "Counted %d\n", x );
    return 0;
}
You have two threads calling the function count. Since there is no synchronisation between the threads on the shared variable x, a data race may happen. Let us assume that the current value of x is 5. One thread may read the current value of x, but before this thread increments x, the other thread may also read the same value. Then, both threads store 6 in x. Such a situation can happen whenever both threads read the same value, so the final value of x may vary from run to run.
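To prevent the race, each increment must happen while a lock is held. The following minimal sketch shows how the mutex m that is already declared above could protect the shared counter; the rest of the program stays unchanged.

void* count(void *arg) {
    for ( int i = 0; i < 10000000; i++ ) {
        pthread_mutex_lock(&m);   /* enter the critical section */
        x++;                      /* only one thread at a time updates x */
        pthread_mutex_unlock(&m); /* leave the critical section */
    }
    return NULL;
}

With this change the program always prints 20000000, at the cost of the threads serializing on the lock.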
Usage
Detecting data races in your C program at runtime with TeSSLa requires the following steps:
- Download the COEMS lock instrumentation tool.
- Write a C program.
- Instrument and compile the source code into a binary file using the COEMS lock instrumentation tool and the LLVM compiler.
- Instantiate a TeSSLa specification template for data race detection with the global shared variables and mutexes which you want to observe.
- Run your program to obtain a trace as a side effect of our instrumentation.
- Analyze the output trace with the TeSSLa interpreter and a predefined TeSSLa specification template for detecting data races.
Most of these steps can easily be collected in a Makefile. They can be run either in the COEMS hardware-specific ecosystem (i.e., the ARM box) for online monitoring, or on a personal computer with the COEMS toolkit for offline analysis. In both cases the analysis procedure and results are roughly the same; the offline analysis is intended for those who do not have access to the COEMS hardware. We will refer to the execution traces generated in the first case as “hardware traces”, and to those obtained in the second environment as “software traces”.
The TeSSLa specification for data race detection is based on the lockset algorithm. It is generated from a template that is tailored to each program and must be regenerated every time you recompile the C code.
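The lockset algorithm tracks, for every shared variable, the set of locks that has been held on each access observed so far; if that set ever becomes empty, no single lock consistently protects the variable and a race is possible. The following self-contained C sketch illustrates only the principle, with lock sets modelled as bitmasks and a made-up access sequence; it is not the COEMS implementation.

/* Minimal illustration of the lockset refinement (after Eraser). */
#include <stdio.h>
#include <stdint.h>

#define ALL_LOCKS 0xFFFFFFFFu

static uint32_t candidate = ALL_LOCKS;  /* locks assumed to protect the variable */

/* Called on every access to the shared variable, with a bitmask of the
   locks the accessing thread holds at that moment. */
static void on_access(uint32_t locks_held, int line) {
    candidate &= locks_held;  /* keep only locks held on ALL accesses so far */
    if (candidate == 0)
        printf("possible data race at line %d: no common lock\n", line);
}

int main(void) {
    on_access(0x1, 9);  /* first access while holding lock #0 */
    on_access(0x1, 9);  /* second access also holds lock #0: still consistent */
    on_access(0x0, 9);  /* access with no lock held: lockset empty, warning */
    return 0;
}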
When using hardware-tracing, the COEMS hardware will efficiently pre-process some of the events, and you only need to use the TeSSLa interpreter on the output to check for race reports. In software-tracing, the TeSSLa interpreter must run the entire TeSSLa specification for detecting data races on the generated trace-file instead of the hardware output.
1-Code instrumentation
The first step for producing execution traces that are analyzable by TeSSLa consists of instrumenting and compiling the source code with the COEMS additions. The coems_instrument_and_compile.sh script compiles the source code (e.g., counting.c) into a binary of the same name (counting). By default, the script compiles, instruments and runs the source code for the localhost CPU architecture (i.e., offline trace analysis). The flag -c compiles and instruments but prevents running the binary. Additionally, the hardware instrumentation flag (-a) cross-compiles the source code with ITM support for ARM (i.e., online trace analysis).
// Offline monitoring
$ bash coems_instrument_and_compile.sh -c counting.c
// Online monitoring
$ bash coems_instrument_and_compile.sh -c -a counting.c
2-Instantiate the TeSSLa Spec Template
As software and hardware tracing produce slightly different output events, we provide another script to produce the correct TeSSLa specification depending on your choice. The coems_mkDR script simplifies the interface to the .tessla generators: it selects which template to use, takes the symbolic names of the shared variables and mutexes, and, among other adjustments, automatically extracts the memory addresses using gdb. For software tracing, simply instantiate the data race template with the addresses of the shared variable x and the lock m.
Note that for hardware-based tracing, special steps are required to configure the COEMS hardware. Memory addresses are cast to uint16_t for ARM, so don’t be surprised if the addresses in the specification differ from what gdb would give you. Take a look at the -i flag, which ensures that the TeSSLa specification is adapted for running on the COEMS hardware infrastructure instead of for offline analysis.
// Offline monitoring
$ mkDR.sh ./counting "x" "m" > spec.tessla
// Online monitoring
$ mkDR.sh -i ./counting "x" "m" > spec.tessla
3-Running the Code
The execution of the instrumented binary will automatically produce an output trace file (i.e., traces.log) that will feed the TeSSLa interpreter in the next step.
When running the instrumented binary on the COEMS hardware for online monitoring, we need to split the TeSSLa specification for data race checking (spec.tessla) into two parts: the first part runs on the COEMS hardware, and the remainder post-processes the new events it generates with the TeSSLa software interpreter (csplit is a standard Unix tool that should come as part of your operating system installation). We also need to apply some substitutions; in the example below, you end up with the two files 1sthalf.tessla and 2ndhalf.tessla.
$ mkdir epu
$ csplit -f epu/unsplit spec.tessla '/^-- -----------6 EPU/'
$ mv epu/unsplit00 epu/1sthalf.tessla
$ sed -e s'/--in /in /' -e s'/--out /out /' -e s'/--def on/def on/' epu/unsplit01 | tail -n +3 >epu/2ndhalf.tessla
Compile the first part of the TeSSLa specification for the CEDAR-box using the EPU compiler.
$ java -jar $BASE/bin/isp/epu-compiler.jar -c epu/spec.epu epu/1sthalf.tessla --mapping epu/in.json $BASE/fpga_images/constraints.json
Create a configuration file (e.g. epu/race_config.json) that points to the required files (i.e., counting and spec.epu) according to the tutorial.
$ bash coems_epus.sh epu/in.json
COEMSEPUS=NNNNNNNNNNNN
We need the EPU id/command pairs corresponding to the input streams in the specification; they will be used at runtime on the Enclustra board to emit the ITM events to the right EPUs. We have provided another simple script that uses the jq JSON parser (see Dependencies) to retrieve this information for you.
Warning! Note that the EPU magic number may change on recompilation of your specification, so make sure that you use the correct values for your current spec!
Now for the hardware side, be sure to use multiple terminals, as you need a shell open on the Enclustra board in addition to the shell where you start the monitoring and collect the results. We also redirect the output from the COEMS hardware to a file (hwtrace.log), although you can of course also pipe it directly into the subsequent stages. On the PC side:
$ scp counting root@enclustra:
$ $BASE/bin/accemic/cedar_server &
$ $BASE/bin/accemic/cedar_config.exe epu_cfr -f=race_config.json --show-mapping
$ $BASE/bin/accemic/cedar_info.exe cfr_epu -c -a -s -r -d | tee hwtrace.log
Once cedar_info is running, you can start your binary on the Enclustra board, passing the COEMSEPUS magic number through the shell environment:
# COEMSEPUS=NNNNNNNNNNNN ./counting
4-Analyzing the Execution Traces
Afterwards, the TeSSLa interpreter can check for data races by running the TeSSLa specification on the trace. On a software execution trace, simply run:
// Offline monitoring
$ tessla spec.tessla traces.log
When monitoring online with the COEMS hardware, the TeSSLa interpreter post-processes the execution trace obtained from the hardware using the second half of the TeSSLa specification we split previously.
// Online monitoring
$ java -jar $BASE/bin/isp/epu-output.jar mapper epu/spec.epu epu/hwtrace.log | tessla epu/2ndhalf.tessla
The TeSSLa interpreter will print out the cases in which the variable x has been accessed and possibly corrupted. The output consists of a timestamp, followed by the id of the thread and the line it executes. As we can see, the data race checker raises a warning on line 9, which corresponds to the increment x++ in the parallel counter.
0: holding_1 = -1
0: holding_0 = -1
7676710499244: threadid = 5662848
7676710499244: holding_0 = 5662848
7676710507032: line = 16
7676710507032: threadid = 5662848
7676710550007: line = 17
7676710550007: threadid = 5662848
7676710576330: line = 9
7676710576330: threadid = 140714313598720
7676710576330: thread_accessing_0 = 140714313598720
7676710576330: error_0_in_line = 9
7676710576330: access_after_pc_0 = ()
7676710576330: access_0 = ()
7676710594749: line = 9
...
This information is useful for narrowing down the area of interest when debugging. Line numbers in the trace help you to pinpoint the problematic line of code. However, they are limited in two ways: if the project consists of multiple files, you still need an educated guess as to which file is being referred to; and you do not get call-stack information, so even if you pinpoint the corresponding line of code, you may still need to infer the surrounding context (who called this function?) from the trace.
Dependencies
To run the COEMS Lock Instrumentation tool you have to install the following requirements first:
- LLVM compiler version 6; packages: llvm-6.0 clang-6.0 llvm-6.0-tools (for hardware- and software tracing)
- jq JSON parser (only for hardware tracing)
- gdb debugger in the path, either with cross-platform support or from the Xilinx SDK for the Enclustra ARM 32-bit platform
Limitations
Currently, the data race analysis only works with threads that use pthread_mutex_lock/unlock operations to protect the shared variables. Programs that do not use mutexes but synchronize access in other ways, e.g. through compare-and-swap instructions, are an issue: for these patterns we do not provide templates yet, but you should easily be able to adapt one of the existing templates.
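For example, a lock-free counter such as the following sketch, written here with C11 atomics as one possible stand-in for a compare-and-swap loop, synchronizes its accesses correctly but would not be covered by the current mutex-based templates.

#include <stdatomic.h>
#include <stddef.h>

static atomic_int x;  /* shared counter, updated without any mutex */

void* count(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000000; i++) {
        int old = atomic_load(&x);
        /* retry the compare-and-swap until no other thread interfered;
           on failure, old is reloaded with the current value of x */
        while (!atomic_compare_exchange_weak(&x, &old, old + 1))
            ;
    }
    return NULL;
}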
Another limitation is the restricted value range of ITM messages. Our instrumentation needs to communicate 32-bit pointer values, but only 16 bits are available per ITM message. To avoid splitting an event into multiple messages and dealing with the required synchronization, we cast the pointer and hence lose precision, which may lead to false positives.
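Conceptually, the truncation looks like the following sketch (illustrative only, not the exact code the instrumentation emits): two variables whose addresses agree in the low 16 bits become indistinguishable to the monitor.

#include <stdint.h>
#include <stdio.h>

int x, y;  /* two distinct shared variables */

int main(void) {
    /* Only the low 16 bits of each address fit into an ITM message. */
    uint16_t id_x = (uint16_t)(uintptr_t)&x;
    uint16_t id_y = (uint16_t)(uintptr_t)&y;
    printf("x: %p is reported as %u\n", (void*)&x, (unsigned)id_x);
    printf("y: %p is reported as %u\n", (void*)&y, (unsigned)id_y);
    /* If id_x happens to equal id_y for unrelated variables, their accesses
       are merged in the trace, which can produce false positives. */
    return 0;
}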
Finally, the instrumentation may experience glitching: the ITM/logging code is not atomic with the pthread operations. That means there is a very small time window (i) after taking a lock and (ii) before releasing a lock in which the trace does not accurately reflect the state of the system. However, all of the resulting possible interleavings are equivalent for the purpose of race checking.
Related Work & Further Reading
The detector we have implemented above is adapted from the Eraser algorithm by Savage et al. (“Eraser: A Dynamic Data Race Detector for Multithreaded Programs“, ACM Transactions on Computer Systems, Vol. 15, No. 4, 1997).
You can find more background information on the TeSSLa specification in our open-access article: “Stream-based dynamic data race detection“, NIK 2018.