B-cumulativity of lwsync towards the reads of atomic pairs

This note describes the experiments we performed so as to check that lwsync is B-cumulative towards a read issued by the lwarx of an atomic pair1.

The complete sources and logs are available as BCWW (archive).

Contents

1  Generating tests

In terms of our Power model, lwsync is not B-cumulative when inserted between two stores. This is proved for instance by the following test X00:

PPC X00
"Rfe DpAddrdR Fre LwSyncdWW Rfe DpAddrdR Fre LwSyncdWW"
{
0:r2=a; 0:r5=x;
1:r2=x; 1:r4=y;
2:r2=y; 2:r5=z;
3:r2=z; 3:r4=a;
}
 P0            | P1           | P2            | P3           ;
 lwz r1,0(r2)  | li r1,1      | lwz r1,0(r2)  | li r1,1      ;
 xor r3,r1,r1  | stw r1,0(r2) | xor r3,r1,r1  | stw r1,0(r2) ;
 lwzx r4,r3,r5 | lwsync       | lwzx r4,r3,r5 | lwsync       ;
               | li r3,1      |               | li r3,1      ;
               | stw r3,0(r4) |               | stw r3,0(r4) ;
exists
(0:r1=1 /\ 0:r4=0 /\ 2:r1=1 /\ 2:r4=0)

This test derives from the following cycle of candidate relaxations:

Which we may write as “BCLwSyncdWW DpAddrdRR Fre BCLwSyncdWW DpAddrdRR Fre”, where BCLwSyncdWW stands for the sequence of LwSyncdWW and Rfe (see diy documentation for the syntax of candidate relaxations). As DpAddrdRR and Fre are safe, we say that the cycle above targets BCLwSyncdWW. Indeed, if the outcome that characterises the cycle (here 0:r1=1 /\ 0:r4=0 /\ 2:r1=1 /\ 2:r4=0) shows up during experiments, BCLwSyncdWW is observed to be relaxed, and we say that the test witnesses that fact.

From a test that witnesses BCLwSyncdWW to be relaxed, we easily build a new cycle that targets the B-cumulativity of lwsync towards the read of an atomic pair, expressed by the new, undocumented, candidate relaxation “LwSyncdWW Rf*e”: it suffices to change the sequences “LwSyncdWW Rfe” into “LwSyncdWW Rf*e”. For instance, assuming that X00 witnesses BCLwSyncdWW to be relaxed, we get the new cycle “LwSyncdWW Rf*e DpAddrdRR Fre LwSyncdWW Rf*e DpAddrdRR Fre”, from which diyone builds the the new test Y00.

PPC Y00
"Rf*e DpAddrdR Fre LwSyncdWW Rf*e DpAddrdR Fre LwSyncdWW"
{
0:r2=a; 0:r5=x;
1:r2=x; 1:r4=y;
2:r2=y; 2:r5=z;
3:r2=z; 3:r4=a;
}
 P0              | P1           | P2              | P3           ;
 DIY00:          | li r1,1      | DIY01:          | li r1,1      ;
 lwarx r1,r0,r2  | stw r1,0(r2) | lwarx r1,r0,r2  | stw r1,0(r2) ;
 xor r3,r1,r1    | lwsync       | xor r3,r1,r1    | lwsync       ;
 lwzx r4,r3,r5   | li r3,1      | lwzx r4,r3,r5   | li r3,1      ;
 stwcx. r1,r0,r2 | stw r3,0(r4) | stwcx. r1,r0,r2 | stw r3,0(r4) ;
 bne  DIY00      |              | bne  DIY01      |              ;
exists
(0:r1=1 /\ 0:r4=0 /\ 2:r1=1 /\ 2:r4=0)

Observe that the first load instructions of threads 0 and 2 of X00 are replaced by a lwarx (load and reserve) followed by a stwcx. (store conditional) and forming a read-modify write idiom (here a fetch and no-op).

While exploring Power machines, our automated front end dont generated many cycles that target BCLwSyncdWW. Amongst those, we select the 36 such cycles that witnessed BCLwSyncdWW to be relaxed on our 8 cores, 4-ways SMT, Power 7 machine power7, resulting in the list of cycles bcww.txt. Then we edit the list, replacing sequences “LwSyncdWW Rfe” by “LwSyncdWW Rf*e”, resulting in the new list of cycles bcww-star.txt. The tests themselves are built by diyone as follows:

% diyone < bcww.txt
% diyone < bcww-star.txt

2  Running the tests

We compile the tests with litmus, configured by power7.32.cfg (i.e. we run “litmus -mach power7.32…”). Then, we run the tests several times on two machines, abducens (Power 6, 4 cores 2-ways SMT) and power7 (Power 7, 8 cores 4-ways SMT), resulting in aggregated log files A.X and P.X.

3  Results

The objective of this test is twofold:

  1. witnessing (again) BCLwSyncdWW to be relaxed;
  2. and not witnessing the sequence “LwSyncdWW Rf*e” to be relaxed.

In other words, the outcome observed should show up in ’X’ tests (“Ok” in the tables below), and should not show up in ’Y’ tests (“No” in the tables below).

 A.XP.X
X00Ok, 3.1k/2.4GOk, 2.6k/10G
X01Ok, 7.9k/2.4GOk, 2.3k/10G
X02Ok, 1.0k/2.4GOk, 615/10G
X03Ok, 2.3k/2.4GOk, 93k/12G
X04Ok, 4/2.4GOk, 1.1k/10G
X05Ok, 92/2.4GOk, 2.1k/10G
X06Ok, 84/2.4GOk, 715/10G
X07Ok, 1.6k/2.4GOk, 114k/12G
X08Ok, 2/2.4GOk, 1.5k/10G
X09Ok, 5/2.4GOk, 1.7k/10G
X10Ok, 1/2.4GOk, 863/10G
X11Ok, 2.4k/2.4GOk, 53k/12G
X12Ok, 6/2.4GOk, 653/10G
X13Ok, 6/2.4GOk, 822/10G
X14Ok, 1/4.0GOk, 419/10G
X15Ok, 17k/2.4GOk, 2.0k/10G
X16Ok, 4.5k/2.4GOk, 92k/12G
X17Ok, 124/2.4GOk, 1.2k/10G
X18Ok, 186/2.4GOk, 1.7k/10G
X19Ok, 5/2.4GOk, 811/10G
X20Ok, 1.8k/2.4GOk, 134k/12G
X21Ok, 11/2.4GOk, 1.9k/10G
X22Ok, 5/2.4GOk, 1.4k/10G
X23Ok, 1/2.4GOk, 986/10G
X24Ok, 3.0k/2.4GOk, 57k/12G
X25Ok, 7/2.4GOk, 777/10G
X26Ok, 21/2.4GOk, 771/10G
X27Ok, 118/2.4GOk, 641/10G
X28Ok, 1.9k/2.4GOk, 3.2k/10G
X29Ok, 393/2.4GOk, 3.1k/10G
X30Ok, 1.8k/2.4GOk, 746/10G
X31Ok, 26/2.4GOk, 689/10G
X32Ok, 1.3k/2.4GOk, 4.1k/10G
X33Ok, 1.6k/2.4GOk, 3.4k/10G
X34Ok, 1.9k/2.4GOk, 2.5k/10G
X35Ok, 1.4k/2.4GOk, 2.5k/10G
    
 A.XP.X
Y00No, 0/12GNo, 0/22G
Y01No, 0/12GNo, 0/22G
Y02No, 0/12GNo, 0/22G
Y03No, 0/12GNo, 0/28G
Y04No, 0/12GNo, 0/22G
Y05No, 0/12GNo, 0/22G
Y06No, 0/12GNo, 0/22G
Y07No, 0/12GNo, 0/28G
Y08No, 0/12GNo, 0/22G
Y09No, 0/12GNo, 0/22G
Y10No, 0/12GNo, 0/22G
Y11No, 0/12GNo, 0/28G
Y12No, 0/12GNo, 0/22G
Y13No, 0/12GNo, 0/22G
Y14No, 0/12GNo, 0/22G
Y15No, 0/12GNo, 0/22G
Y16No, 0/12GNo, 0/28G
Y17No, 0/12GNo, 0/22G
Y18No, 0/12GNo, 0/22G
Y19No, 0/12GNo, 0/22G
Y20No, 0/12GNo, 0/28G
Y21No, 0/12GNo, 0/22G
Y22No, 0/12GNo, 0/22G
Y23No, 0/12GNo, 0/22G
Y24No, 0/12GNo, 0/28G
Y25No, 0/12GNo, 0/22G
Y26No, 0/12GNo, 0/22G
Y27No, 0/12GNo, 0/22G
Y28No, 0/12GNo, 0/22G
Y29No, 0/12GNo, 0/22G
Y30No, 0/12GNo, 0/22G
Y31No, 0/12GNo, 0/22G
Y32No, 0/12GNo, 0/22G
Y33No, 0/12GNo, 0/22G
Y34No, 0/12GNo, 0/22G
Y35No, 0/12GNo, 0/22G

The experiment succeeds, since “Ok” appears in every cell of the first table, while “No” appears in every cell of the second table. For each test and machine, the tables also show the number of specified outcome collected/total number of outcome collected.


1
We use this technical property to assert the correctness of the locking idiom in our Power semantics.

This document was translated from LATEX by HEVEA.