B-cumulativity of lwsync towards the reads of atomic pairs

This note describes the experiments we performed so as to check that lwsync is B-cumulative towards a read issued by the lwarx of an atomic pair¹.

1 Generating tests

In terms of our Power model, lwsync is not B-cumulative when inserted between two stores. This is proved for instance by the following test X00:

We may write the cycle of X00 as “BCLwSyncdWW DpAddrdRR Fre BCLwSyncdWW DpAddrdRR Fre”, where BCLwSyncdWW stands for the sequence of LwSyncdWW and Rfe (see the diy documentation for the syntax of candidate relaxations). As DpAddrdRR and Fre are safe, we say that the cycle above targets BCLwSyncdWW. Indeed, if the outcome that characterises the cycle (here 0:r1=1 /\ 0:r4=0 /\ 2:r1=1 /\ 2:r4=0) shows up during experiments, then BCLwSyncdWW is observed to be relaxed, and we say that the test witnesses that fact.
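As an illustration of what “the outcome shows up” means, the following minimal Python sketch parses the condition quoted above and checks it against one final state. The helper names `parse_condition` and `witnesses`, and the representation of a final state as a dictionary, are assumptions made for this example; they are not part of diy or litmus.

```python
# Hypothetical helpers; only the condition syntax comes from the text above.
def parse_condition(text):
    """Turn '0:r1=1 /\\ 0:r4=0' into {'0:r1': 1, '0:r4': 0}."""
    wanted = {}
    for atom in text.split("/\\"):          # atoms are joined by the /\ connective
        location, value = atom.strip().split("=")
        wanted[location.strip()] = int(value)
    return wanted


def witnesses(final_state, condition):
    """True when every location named in the condition holds the required value."""
    return all(final_state.get(loc) == val for loc, val in condition.items())


# The outcome that characterises the cycle of X00.
CONDITION = parse_condition(r"0:r1=1 /\ 0:r4=0 /\ 2:r1=1 /\ 2:r4=0")
```

A single run whose final state satisfies `witnesses(state, CONDITION)` is enough for the test to witness BCLwSyncdWW to be relaxed.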

From a test that witnesses BCLwSyncdWW to be relaxed, we easily build a new cycle that targets the B-cumulativity of lwsync towards the read of an atomic pair, expressed by the new, undocumented, candidate relaxation “LwSyncdWW Rf*e”: it suffices to change the sequences “LwSyncdWW Rfe” into “LwSyncdWW Rf*e”. For instance, assuming that X00 witnesses BCLwSyncdWW to be relaxed, we get the new cycle “LwSyncdWW Rf*e DpAddrdRR Fre LwSyncdWW Rf*e DpAddrdRR Fre”, from which diyone builds the new test Y00.
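The rewriting of a cycle described above can be sketched in Python as follows. The function names `expand_bc` and `star` are hypothetical, and the sketch treats a cycle as a plain space-separated string; it does not handle a sequence that wraps around from the last edge of the cycle to the first.

```python
def expand_bc(cycle):
    """Spell out the composite relaxation BCLwSyncdWW as 'LwSyncdWW Rfe'."""
    return " ".join(
        "LwSyncdWW Rfe" if edge == "BCLwSyncdWW" else edge
        for edge in cycle.split()
    )


def star(cycle):
    """Change every sequence 'LwSyncdWW Rfe' into 'LwSyncdWW Rf*e'."""
    edges = cycle.split()
    out = []
    for i, edge in enumerate(edges):
        # Only an Rfe that directly follows LwSyncdWW is rewritten.
        if edge == "Rfe" and i > 0 and edges[i - 1] == "LwSyncdWW":
            out.append("Rf*e")
        else:
            out.append(edge)
    return " ".join(out)


X00 = "BCLwSyncdWW DpAddrdRR Fre BCLwSyncdWW DpAddrdRR Fre"
Y00 = star(expand_bc(X00))
```

Here `Y00` is the cycle string quoted in the text. Applying `star` to each cycle of a list would give the starred list, under whatever file layout that list actually uses.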

Observe that the first load instructions of threads 0 and 2 of X00 are each replaced by a lwarx (load and reserve) followed by a stwcx. (store conditional), which together form a read-modify-write idiom (here a fetch and no-op).

While exploring Power machines, our automated front end dont generated many cycles that target BCLwSyncdWW. Amongst those, we selected the 36 cycles that witnessed BCLwSyncdWW to be relaxed on our 8-core, 4-way SMT Power 7 machine power7, resulting in the list of cycles bcww.txt. We then edited the list, replacing the sequences “LwSyncdWW Rfe” by “LwSyncdWW Rf*e”, resulting in the new list of cycles bcww-star.txt. The tests themselves are built by diyone as follows:

2 Running the tests

We compile the tests with litmus, configured by power7.32.cfg (i.e. we run “litmus -mach power7.32…”). Then, we run the tests several times on two machines, abducens (Power 6, 4 cores, 2-way SMT) and power7 (Power 7, 8 cores, 4-way SMT), resulting in the aggregated log files A.X and P.X.

3 Results

The outcome that characterises each cycle should show up in the ’X’ tests (“Ok” in the tables below), and should not show up in the ’Y’ tests (“No” in the tables below).

	`A.X`	`P.X`
`X00`	Ok, 3.1k/2.4G	Ok, 2.6k/10G
`X01`	Ok, 7.9k/2.4G	Ok, 2.3k/10G
`X02`	Ok, 1.0k/2.4G	Ok, 615/10G
`X03`	Ok, 2.3k/2.4G	Ok, 93k/12G
`X04`	Ok, 4/2.4G	Ok, 1.1k/10G
`X05`	Ok, 92/2.4G	Ok, 2.1k/10G
`X06`	Ok, 84/2.4G	Ok, 715/10G
`X07`	Ok, 1.6k/2.4G	Ok, 114k/12G
`X08`	Ok, 2/2.4G	Ok, 1.5k/10G
`X09`	Ok, 5/2.4G	Ok, 1.7k/10G
`X10`	Ok, 1/2.4G	Ok, 863/10G
`X11`	Ok, 2.4k/2.4G	Ok, 53k/12G
`X12`	Ok, 6/2.4G	Ok, 653/10G
`X13`	Ok, 6/2.4G	Ok, 822/10G
`X14`	Ok, 1/4.0G	Ok, 419/10G
`X15`	Ok, 17k/2.4G	Ok, 2.0k/10G
`X16`	Ok, 4.5k/2.4G	Ok, 92k/12G
`X17`	Ok, 124/2.4G	Ok, 1.2k/10G
`X18`	Ok, 186/2.4G	Ok, 1.7k/10G
`X19`	Ok, 5/2.4G	Ok, 811/10G
`X20`	Ok, 1.8k/2.4G	Ok, 134k/12G
`X21`	Ok, 11/2.4G	Ok, 1.9k/10G
`X22`	Ok, 5/2.4G	Ok, 1.4k/10G
`X23`	Ok, 1/2.4G	Ok, 986/10G
`X24`	Ok, 3.0k/2.4G	Ok, 57k/12G
`X25`	Ok, 7/2.4G	Ok, 777/10G
`X26`	Ok, 21/2.4G	Ok, 771/10G
`X27`	Ok, 118/2.4G	Ok, 641/10G
`X28`	Ok, 1.9k/2.4G	Ok, 3.2k/10G
`X29`	Ok, 393/2.4G	Ok, 3.1k/10G
`X30`	Ok, 1.8k/2.4G	Ok, 746/10G
`X31`	Ok, 26/2.4G	Ok, 689/10G
`X32`	Ok, 1.3k/2.4G	Ok, 4.1k/10G
`X33`	Ok, 1.6k/2.4G	Ok, 3.4k/10G
`X34`	Ok, 1.9k/2.4G	Ok, 2.5k/10G
`X35`	Ok, 1.4k/2.4G	Ok, 2.5k/10G

	`A.X`	`P.X`
`Y00`	No, 0/12G	No, 0/22G
`Y01`	No, 0/12G	No, 0/22G
`Y02`	No, 0/12G	No, 0/22G
`Y03`	No, 0/12G	No, 0/28G
`Y04`	No, 0/12G	No, 0/22G
`Y05`	No, 0/12G	No, 0/22G
`Y06`	No, 0/12G	No, 0/22G
`Y07`	No, 0/12G	No, 0/28G
`Y08`	No, 0/12G	No, 0/22G
`Y09`	No, 0/12G	No, 0/22G
`Y10`	No, 0/12G	No, 0/22G
`Y11`	No, 0/12G	No, 0/28G
`Y12`	No, 0/12G	No, 0/22G
`Y13`	No, 0/12G	No, 0/22G
`Y14`	No, 0/12G	No, 0/22G
`Y15`	No, 0/12G	No, 0/22G
`Y16`	No, 0/12G	No, 0/28G
`Y17`	No, 0/12G	No, 0/22G
`Y18`	No, 0/12G	No, 0/22G
`Y19`	No, 0/12G	No, 0/22G
`Y20`	No, 0/12G	No, 0/28G
`Y21`	No, 0/12G	No, 0/22G
`Y22`	No, 0/12G	No, 0/22G
`Y23`	No, 0/12G	No, 0/22G
`Y24`	No, 0/12G	No, 0/28G
`Y25`	No, 0/12G	No, 0/22G
`Y26`	No, 0/12G	No, 0/22G
`Y27`	No, 0/12G	No, 0/22G
`Y28`	No, 0/12G	No, 0/22G
`Y29`	No, 0/12G	No, 0/22G
`Y30`	No, 0/12G	No, 0/22G
`Y31`	No, 0/12G	No, 0/22G
`Y32`	No, 0/12G	No, 0/22G
`Y33`	No, 0/12G	No, 0/22G
`Y34`	No, 0/12G	No, 0/22G
`Y35`	No, 0/12G	No, 0/22G

The experiment succeeds, since “Ok” appears in every cell of the first table, while “No” appears in every cell of the second table. For each test and machine, the tables also show the number of times the specified outcome was observed, followed by the total number of outcomes collected.
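To make the reading of a table cell concrete, here is a minimal Python sketch that splits a cell such as “Ok, 3.1k/2.4G” into its verdict and its two counts, and then states the success criterion above. It assumes that the suffixes k and G denote 10^3 and 10^9, and the function names are hypothetical; this is not how the log files were actually processed.

```python
# Assumption: k and G are decimal multipliers (10^3 and 10^9).
SUFFIX = {"k": 10**3, "G": 10**9}


def parse_count(text):
    """Convert '3.1k', '615' or '2.4G' into an (approximate) integer."""
    text = text.strip()
    if text[-1] in SUFFIX:
        return round(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def parse_cell(cell):
    """Split 'Ok, 3.1k/2.4G' into (verdict, matching outcomes, total outcomes)."""
    verdict, counts = cell.split(",")
    matching, total = counts.split("/")
    return verdict.strip(), parse_count(matching), parse_count(total)


def experiment_succeeds(x_cells, y_cells):
    """'Ok' in every cell of the X table and 'No' in every cell of the Y table."""
    return (all(parse_cell(c)[0] == "Ok" for c in x_cells)
            and all(parse_cell(c)[0] == "No" for c in y_cells))
```

Because the tables round large counts (for instance 2.4G), the parsed totals are only approximations of the real run counts.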
