COMPAQ Memory Interleave Comparisons
Utah and UW Ensembles
UW Ensemble Summary Table
Conclusions for Utah and UW Ensembles
Reisner2
Vertical Levels
Guide 3.8 vs 3.9 and Workshop 6.0, 6.0u1, 6.0u2
Older Runs
(see also old benchmarks)
(see also CPU2000 benchmarks at Spec.Org)
July 2014 Comparisons
1 of same
Chips Dom Hr ver best =sec worst CPU factor
E5-2650v2 @ 2.6 GHz specfp2006 wrf=78.4, 481.wrf specspeed=89-104
E5-2637v2 @ 3.5 GHz specfp2006 wrf=89.6 (vs 2620v2 58.7), 481.wrf specspeed=
E5-2620v1 @ 2.0 GHz b1,b2,b3,b4 on bob cluster, 481.wrf specspeed=57.2
E5-2620 = E5-2620 v2 @ 2.1 GHz new a113-a116, $406, 481spec=471, 481.wrf specspeed=65.5
E5645 = E5645 @ 2.4 GHz a109-a112, n18,n19,n20,n21(x12), $551, 481.wrf specrate=210, specspeed=42.3
E5620 = E5620 @ 2.4 GHz n1-n8,n15,n16,n17(x8), $387, 481.wrf specrate=169, specspeed=42.0
E5-2637 $996, 481specrate=504, 481.wrf specspeed=89.6
E5-2650v3 481specrate=684, 481.wrf specspeed=94.8 baseline, 98 peak (Dell)
a113 and a114 are E5-2620 v2 2.1GHz
b3 (on bob) is E5-2620 v1 2.0GHz
a113-RAID 55:47 (2 restart times, 7 wrfouts per domain)
restarts 235.57, wrfouts 115.22, overall writes 350.79
a114-disk 54:36 (2 restart times, 7 wrfouts per domain)
restarts 236.34, wrfouts 151.86, overall writes 388.19
a113Rsplit 51:59 (2 restart times, 7 wrfouts per domain)
restarts 51.29, wrfouts 20.48, overall writes 71.78
a114dsplit 52:02 (2 restart times, 7 wrfouts per domain)
restarts 79.39, wrfouts 57.10, overall writes 136.49
b3 to SSD 55:37 (2 restart times, 7 wrfouts per domain)
restarts 209.21, wrfouts 124.93, overall writes 334.14
b3 to RAID 56:31 (2 restart times, 7 wrfouts per domain)
restarts 241.82, wrfouts 151.17, overall writes 392.99
projected savings with SSD:
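A rough way to size the savings from the write timings above (a minimal
sketch; it assumes the per-run saving is simply the difference in the
measured "overall writes" seconds, all else equal):

    # write-time savings per run, from the "overall writes" numbers above
    writes = {
        "a113-RAID":  350.79, "a113Rsplit": 71.78,
        "a114-disk":  388.19, "a114dsplit": 136.49,
        "b3-SSD":     334.14, "b3-RAID":    392.99,
    }
    print("split output vs RAID :", round(writes["a113-RAID"] - writes["a113Rsplit"], 2), "s/run")
    print("split output vs disk :", round(writes["a114-disk"] - writes["a114dsplit"], 2), "s/run")
    print("SSD vs RAID (b3)     :", round(writes["b3-RAID"] - writes["b3-SSD"], 2), "s/run")
    # -> 279.01, 251.7, and 58.85 seconds saved per run, respectively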
24 x
thruput 1 of same E5645
Chips Dom Hr ver ncpu time ftime spec.org factor factor
/home/disk/sage2/mm5rt/nobackup/runtest/2014071500/d4rerundir
n18 E5645 @ 2.4 GHz d4 1 361 48 20:27 1147 3.82
n18 E5645 @ 2.4 GHz d4 1 361 24 35:54 2060 1.80
n18 E5645 @ 2.4 GHz d4 1 361 12 78:08 4378
a109 E5645 @ 2.4 GHz d4 1 361 48 21:36 1116
a109 E5645 @ 2.4 GHz d4 1 361 24 39:12 2142
a109 E5645 @ 2.4 GHz d4 1 361 12 80:41 4343
a113 e5-2620v2 d4 1 361 48 14:42 741 3.69 5.86
a113 e5-2620v2 d4 1 361 24 24:42 1330 2.06 3.27
a113 e5-2620v2 d4 1 361 12 48:43 2736 1.0 1.59
n51 E5-2650v3 @ 2.3 GHz 1 35 48 16:12 972 -- bad or w/o infiniband
n51 E5-2650v3 @ 2.3 GHz 1 361 48 12:31 627 3.40
n51 E5-2650v3 @ 2.3 GHz 1 361 160 5:56 242 8.81
n51 E5-2650v3 @ 2.3 GHz 1 361 140 6:22 269 7.92
n51 E5-2650v3 @ 2.3 GHz 1 361 80 8:50 415 5.13
n51 E5-2650v3 @ 2.3 GHz 1 361 24 21:02 1133 1.88
n51 E5-2650v3 @ 2.3 GHz 1 361 12 38:06 2131 1.00
n51 E5-2650v3 @ 2.3 GHz 1 361 12.1x20.slot 38:06 2131 1.00
n51 E5-2650v3 @ 2.3 GHz 1 361 48.3x20.slot 12:31 627 3.40
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x20.socket 11:28 572 3.725
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x20.slot 11:23 572 3.725
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x16.socket 10:45 528
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x16.node8 10:44 529 using -map-by ppr:8:socket
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x16.slot 11:22 567
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x20.socket 19:07 1026
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x20.slot 19:12 1029
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x16.socket 17:34 940
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x16.slot 18:53 1011
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x12.socket 16:31 873
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x12.slot 18:53 1011
n58 E5-2650v3 @ 2.3 GHz 1 361 24.3x10.socket 16:07 850
n58 E5-2650v3 @ 2.3 GHz 1 361 24.3x10.slot 19:13 1037
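The factor columns above appear to be each row's ftime seconds relative
to the 12-CPU baseline of the same table; a minimal sketch using the n51
numbers above:

    base = 2131   # n51, 12 CPUs, ver 3.6.1 (ftime seconds)
    for ncpu, sec in [(24, 1133), (48, 627), (80, 415), (140, 269), (160, 242)]:
        print(f"{ncpu:3d} cpus: factor {base / sec:.2f}")
    # -> 1.88, 3.40, 5.13, 7.92, 8.81 -- matching the table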
48-hr d4 runs for 2014111712 case current factor
n21-n1 asstd 48 35 132 7:01:48 25308 1.000
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 160 3:17:19 11839 2.138 1.596 vs 80cpu
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 160 3:11:58 11518 2.197 1.641 vs 80cpu [/work/restrts]
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 140 3:30:49 12649 2.001
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 120 3:55:51 14151 1.788
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 100 4:32:06 16326 1.550
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 80 5:15:04 18904 1.339
notftime (same runs; seconds are raw wall clock, file-output time not subtracted):
n51 E5-2650v3 @ 2.3 GHz 1 361 48 12:31 751 3.04
n51 E5-2650v3 @ 2.3 GHz 1 361 160 5:56 356 6.42
n51 E5-2650v3 @ 2.3 GHz 1 361 80 8:50 530 4.31
n51 E5-2650v3 @ 2.3 GHz 1 361 24 21:02 1262 1.81
n51 E5-2650v3 @ 2.3 GHz 1 361 12 38:06 2286 1.00
n51 E5-2650v3 @ 2.3 GHz 1 35 40 24:01 1441 -- all on 1 machine
n51 E5-2650v3 @ 2.3 GHz 1 35 24 26:20 1580 -- bad or w/o infiniband
n51 E5-2650v3 @ 2.3 GHz 1 35 12 33:36 2016 -- bad or w/o infiniband
n51 E5-2650v3 @ 2.3 GHz 1 35 140 7:32 452
n51 E5-2650v3 @ 2.3 GHz 1 35 160 6:36 396
n51 E5-2650v3 @ 2.3 GHz 1 35 160 7:03 423 using ensm-ssd ftdiff=369
n51 E5-2650v3 @ 2.3 GHz 1 35 80 12:39 759
spec.org 481spec sep-2014
e5-2620v3 481.wrf rate = 562
e5-2650v3 481.wrf rate = 702 = 1.25
e5-2620v2 481.wrf normal = 71.3
e5-2650v2 481.wrf normal = 90.9 = 1.27
e5645 @ 2.4 (Fujitsu?) normal = 114
24 x
thruput 1 of same E5645
Chips Dom Hr ver ncpu best =sec spec.org factor factor
E52620 d4t1 1 3.5 48 12:34 754 380 474 65 1.778 1.662 a113-a116
E52620 d4t1 1 3.5 24 22:30 1350 380 1.000 1.639 a113-a116
E52620 d4 1 3.5 48 12:28 748 380 474 65 1.778 1.662 a113-a116
E52620 d4 1 3.5 24 22:10 1330 380 1.000 1.639 a113-a116
E52620 d4 1 3.5 12 45:01 2701 380 1.000 1.639 a113-a116
E52620v1 1 361 12 63:10 @ 2.0 GHz b1
E52620v1 1 361 12 54:20 @ 2.0 GHz b3
E5645 d4 1 3.5 48 20:43 1243 208 1.750 1.000 a109-a112e
E5645 d4 1 3.5 24 36:20 2180 208 196 40 1.000 1.000 a109-a112e
E5645 d4 1 3.5 12 71:02 4262 208 196 40 1.000 1.000 a109-a112e
E5645 d4 1 3.5 48 24:39 1479 208 196 40 1.000 1.000 n18-n21, using node1 rundir, 4x12
E5620 d4 1 3.5 72 13:52 832 170 195 42 4.362 1.991
E5620 d4 1 3.5 60 14:58 898 170 195 42 4.041 1.991
E5620 d4 1 3.5 48 18:15 1095 170 195 42 3.314 1.991 n1-n6,6x8
E5620 d4 1 3.5 48 18:15 1095 170 195 42 3.314 1.991
E5620 d4 1 3.5 24 32:06 1926 170 1.884 1.132
E5620 d4 1 3.5 12 60:29 3629 170 1.000
COMPAQ Memory Interleave Comparisons
######################################## 6/16/2001
Running Ensemble domain benchmarks with different memory interleaving
C = Compaq (C500 = ES40, C667=DS20, C833=ES-40)
NOTE: executable was NOT recompiled between runs!
1 of same INTERLEAVE
Chips Dom Hr ver best =sec worst CPU factor factor # of runs
ES-40 1-way interleave (May tests on chocolat, .5G and 2G module)
C833x1x4 uw 24 3.3 199:17 11957 12386 0.785 1.000 2x4
C833x1 uw 24 3.3 156:21 9381 9431 1.000 1.000 2
C833x2 uw 24 3.3 90:31 5431 5456 1.727 1.000 2
C833x4 uw 24 3.3 52:52 3172 3284 2.957 1.000 2
ES-40 2-way interleave (June tests on chocolat, 2x2G + 1G module)
C833x1x4 uw 24 3.3 168:12 10092 10323 0.907 1.185 2x4
C833x1 uw 24 3.3 152:32 9152 9227 1.000 1.025 2
C833x2 uw 24 3.3 85:34 5134 5161 1.783 1.058 2
C833x4 uw 24 3.3 46:26 2786 2857 3.285 1.139 2
ES-40 4-way interleave (July tests on chocolat, 4x2G module)
C833x1x4 uw 24 3.3 158:48 9528 9552 0.940 1.255 2x4
C833x1 uw 24 3.3 149:12 8952 8961 1.000 1.048 2
C833x2 uw 24 3.3 82:04 4924 4932 1.818 1.103 2
C833x4 uw 24 3.3 44:12 2652 2676 3.376 1.196 2
Utah and UW Ensemble Benchmarks
######################################## 5/3/2001
Running Ensemble domain benchmarks on various platforms
S = Sun (S750 = 750MHz UltraSPARC-III Sun Blade 1000)
C = Compaq (C500 = ES40, C667=DS20)
A = AMD Athlon (TCP/IP protocol)
V = AMD Athlon with Via network card
I = Intel
A1200 = Tyan Thunder K7 Dual 1.2GHz AthlonMP, 2x256MB simms ($1375!)
1 of same Compaq 667MHz
Chips Dom ver best =sec worst CPU factor factor # of runs
C667x1 utah 3.3 142:55 8575 8982 1.000 1.000 2
C667x2 utah 3.3 80:57 4857 4866 1.765 1.765 2
C500x1 utah 3.3 214:35 12875 ---- 1.000 0.666 1
C500x2 utah 3.3 109:53 6593 6615 1.953 1.301 2
C500x4 utah 3.3 68:16 4096 4334 3.143 2.094 2
A800x1 utah 3.3 318:05 19085 21631 1.000 0.449 2
I1400x1 utah 3.3 270:xx
A1333x1 utah 3.3 (176:51)10611 mpp/1.08 0.808 est.
C600x1 utah 3.3 174:02 10442 10502 1.000 0.821 2 DS-10
C833x1x4 utah 3.3 142:19 8539 8662 0.845 1.004 2x4
C833x1 utah 3.3 120:18 7218 7232 1.000 1.188 2
C833x2 utah 3.3 67:21 4041 4058 1.786 2.122 2
C833x4 utah 3.3 39:13 2353 2356 3.068 3.644 2
S400x12 utah 3.3 51:26 3086 ---- ----- 2.779 1
S400x16 utah 3.3 40:19 2419 ---- ----- 3.545 1
S400x23 utah 3.3 33:40 2020 ---- ----- 4.245 1
| Timings |Same CPU | C667x1 Factors|Number
Chips |Domain| Code | best =sec worst |Factor | mpp | non-mpp |of runs
C667x1 utah 3.3mpp 155:18 9318 9340 1.000 1.000 0.920 2 DS-20E
C667x2 utah 3.3mpp 90:57 5457 5524 1.708 1.708 1.571 3
C500x1 utah 3.3mpp 214:44 12884 ---- 1.000 0.723 0.666 1 ES-40
C500x2 utah 3.3mpp 121:20 7280 7290 1.770 1.280 1.178 2
C500x4 utah 3.3mpp 78:36 4716 ---- 2.732 1.976 1.818 1
A1333x1 utah 3.3mpp 191:21 11481 ---- 1.000 0.812 0.747 ? Beowulf
A1333x2 utah 3.3mpp 118:09 7089 ---- 1.620 1.314 1.210 ?
A1333x4 utah 3.3mpp 78:45 4725 ---- 2.430 1.972 1.815 ?
A1333x8 utah 3.3mpp 51:58 3118 ---- 3.682 2.988 2.750 ?
A1333x12 utah 3.3mpp 46:18 2778 ---- 4.133 3.354 3.087 ?
A1333x16 utah 3.3mpp 42:11 2531 ---- 4.536 3.682 3.388 ?
V950x1 utah 3.3mpp 282:28 16948 ---- 1.000 0.550 0.506 ?
V950x2 utah 3.3mpp 155:03 9303 ---- 1.822 1.002 0.922 ?
V950x4 utah 3.3mpp 88:30 5310 ---- 3.192 1.755 1.615 ?
V950x8 utah 3.3mpp 54:10 3250 ---- 5.215 2.867 2.638 ?
V950x12 utah 3.3mpp 46:28 2788 ---- 6.079 3.342 3.076 ?
V950x16 utah 3.3mpp 40:46 2446 ---- 6.929 3.809 3.506 ?
24-hour UW Ensemble Runs
1 of same Compaq 667MHz
Chips Dom Hr ver best =sec worst CPU factor factor # of runs
DS-20E
C667x1 uw 24 3.3 188:12 11292 ---- 1.000 1.000 1 (97%)
C667x1 singleproc 189:37 11377 ---- x.xxx x.xxx 1 (97%)
C667x2 uw 24 3.3 110:34 6634 7017 1.702 1.702 1 (191%)
DS-10
C600x1 uw 24 3.3 228:30 13710 14058 1.000 0.824 2 DS-10
ES-40
C500x1 uw 24 3.3 251:25 15085 ---- 1.000 0.749 1
C500x2 uw 24 3.3 143:12 8592 8647 1.756 1.314 2
C500x4 uw 24 3.3 88:02 5282 5442 2.856 2.138 2 (389%)
C500x4 uw 48 3.3 174:41 10481 2 (383%)
ES-40
C833x1x4 uw 24 3.3 158:48 9528 9552 0.940 1.185 2x4
C833x1 uw 24 3.3 149:12 8952 8961 1.000 1.261 2
C833x2 uw 24 3.3 82:04 4924 4932 1.818 2.293 2
C833x4 uw 24 3.3 44:12 2652 2676 3.376 4.258 2
AMD Dual-Processor 1.2 GHz Tyan Motherboard, PGF77 compiler
A1200x2 uw 48 3.3 243:59 14639 1.543
FCFLAGS = -I$(LIBINCLUDE) -fast -Mcray=pointer -tp p6 -pc 32 -byteswapio -Mvect=prefetch,cachesize:393216 -mp -Mnosgimp
LDOPTIONS = $(FCFLAGS)
LOCAL_LIBRARIES = -lnsl -lm
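# Flag notes (my reading of the PGI pgf77 options above, for reference):
# -byteswapio   read/write big-endian unformatted data on the little-endian
#               Athlon, so output matches the Suns and Compaqs
# -mp -Mnosgimp enable the OpenMP-style parallel directives, but not the
#               SGI directive flavor
# -Mvect=prefetch,cachesize:393216  vectorize with prefetching, tuned to a
#               384KB cache (393216 bytes)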
IBM PowerPC 375 MHz (-O2 is faster than new compiler with -O3)
P375x1 uw 48 3.3 662:48 39768 39791 1.000 x.xxx 2
P375x2 uw 48 3.3 346:10 20770 20784 1.915 2
P375x4 uw 48 3.3 207:09 12429 12515 3.200 2
S400x27 uw 48 3.3 67:22 4042
48-hour Jun01 real-time 36/12 km with 37 levels 10/1/2001 and 3/21/2002:
rainier-factor
A1200x1 uwrt37 48 3.4 825:52 49552
A1200x2 uwrt37 48 3.4 471:01 28261 0.74
C500x4 uwrt37 48 3.4 346:22 20782 1.00
S400x27 uwrt37 48 3.4 108:36 6516 3.19 3/21/2002 ws7guide40
S400x27 uwrt36 48 3.4 112:00 6720 workshop7,guide3.8 (11% wow!)
S400x27 uwrt37 48 3.4 125:09 7509 2.77 workshop6,Guide3.8
S400x27 uwrt36 48 3.4 142:57 8577 workshop7,noguide (14% slower)
UW Ensemble Domain Summary Table
S = Sun (S750 = 750MHz UltraSPARC-III Sun Blade 1000)
C = Compaq (C500 = ES40, C667=DS20)
A = AMD Athlon (TCP/IP protocol)
A1200 = Tyan Thunder K7 Dual 1.2GHz AthlonMP, 2x256MB simms ($1375!)
V = AMD Athlon with Via network card
I = Intel
P = IBM PowerPC
The implied comparison for MM5 among AMD, Intel, and Sun is 177.mesa in
www.spec.org cfp2000; this doesn't work too well for AMD vs IBM P375,
however, and it underestimates the Compaqs. None of them match too well.
1 of same Compaq 667MHz
Chips Dom Hr ver best =sec worst CPU factor factor # of runs
A1200x1 uw 3 3.3 25:51 1551 ---- 1.000 0.901 1
A1200x2 uw (3) 3.3 15:14 914 ---- ~1.7 1.528 1 1/16 of 48-hr run
I1400x1 uw 3 3.3 36:45 2205 ---- 1.000 0.634 1
C833x1 uw (3) 3.3 18:39 (1119) ** 1/8 of 24-hr ** 1.248 (2)
C833x2 uw (3) 3.3 10:15 (615) 1.818 2.272 (2)
C833x4 uw (3) 3.3 5:31 (331) 3.376 4.221 (2)
C667x1 uw 3 3.3 23:17 1397 1422 1.000 1.000 2
C667x2 uw 3 3.3 13:44 824 827 1.695 1.695 2
C600x1 uw 3 3.3 28:49 1729 1752 1.000 0.808 2
C500x1 uw 3 3.3 31:14 1874 1889 1.000 0.745 2
C500x2 uw 3 3.3 17:46 1066 1077 1.758 1.311 4
C500x4 uw 3 3.3 10:36 636 661 2.947 2.197 2
S400x1 uw 3 3.3 70:52 4252 4260 1.000 0.329 2
S400x2 uw 3 3.3 35:31 2131 2140 1.995 0.656 2
S400x4 uw 3 3.3 18:12 1092 ---- 3.894 1.280 1
S400x8 uw 3 3.3 9:35 575 578 7.395 2.430 2
S400x12 uw 3 3.3 6:46 406 407 10.473 3.441 2
S400x23 uw 3 3.3 4:26 266 288 15.985* 5.252* 2
S750x1 uw 3 3.3 43:32 2612 ---- 1.000 0.535 1
* --> run is too short to get the scaling factors correct; we generally
see a 0.8 scaling factor, so it should be about 18.4 times as fast
(23 CPUs x 0.8 = 18.4) as a single-processor run
Conclusions for Utah and UW Ensemble Benchmarks
1) MPP is about 8% slower than regular code.
2) 1 Athlon 1.3GHz chip for $1000 (+ a pittance for the PG compiler) is
nearly identical to a DS-10 600MHz!!!
3) 4 Athlon 1.3GHz chips run MPP code as fast as ES-40
with 4x500MHz chips ($4K vs $40K?).
4) For ensembles, it makes more sense to run multiple MM5s on single
processors than to run them back to back on all processors (see the
sketch after this list).
5) 16 Athlon 950 MHz chips coupled with Via networking
can run code as fast as 16 Sun E6500 400 MHz chips. For
overall speed in high resolution runs, you need to
make a large cluster to equal the power of a Sun E6500
because scaling is so poor in clusters.
6) The faster the chip, the worse the scaling.
7) To run our current ensembles (36/12 max dimension 101x137) out to
48 hours, we should expect these results for MM5v3.x standard
physics:
Athlon 1333 MHz (7:44) (81% speed of 667 running same code)
DS-10 600 MHz 7:37
?2001 DS-10 667 MHz 6:16 (Do they make this? or just 600 MHz?)
?2002 DS-10 883 MHz 5:13 (assuming ES40 performance of chip)
DS-20E 1x667 MHz 6:16 (run 2 simultaneously in less than
2x3:41 = 7:22)
DS-20E 2x667 MHz 3:41
Jun01 DS-20E 2x883 MHz 2:57 (Due out ...)
ES-40 4x500 MHz 3:01
ES-40 4x667 MHz 2:11 (too bad we never got that promised
free chip upgrade!)
ES-40 4x883 MHz 1:43 (but you shouldn't run it this way, instead
run 4 simultaneously in 6:38 vs this
6:52)
23 of tahoma's CPUs 1:00
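A sketch of the scheduling arithmetic behind points 4 and 7, assuming the
ES-40 4x883 MHz times above (1:43 per member using all 4 CPUs, est. 6:38
per member on one CPU):

    def hms(minutes):
        return f"{int(minutes // 60)}:{int(minutes % 60):02d}"

    per_member_4cpu = 1 * 60 + 43   # 1:43, one member on all 4 CPUs
    per_member_1cpu = 6 * 60 + 38   # 6:38, one member per CPU
    members = 4
    print("4 members back to back on 4 CPUs:", hms(members * per_member_4cpu))  # 6:52
    print("4 members at once, 1 CPU each:   ", hms(per_member_1cpu))            # 6:38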
Reisner2 and Vertical Levels
######################################## 4/7/2001 - present
Comparing v3.4 runtimes of current and new domains on Sun E6500
23x400 MHz
[c,m] = [current,montana] MB
Dom Hr ver best =sec worst f77 Guide stack TAP,BUF,INC # of runs
cd1d2 6 3.3 11:48 708 793 6.0 3.8 dF 384 15,15,4,1 4
cd1d2 6 3.3 11:41 701 785 6.0 3.8 dT 384 15,15,4,1 5
cd1d2 6 3.3 10:18 618 678 6.0 3.8 dF 384 365,1 4
cd1d2 6 3.3 10:19 619 691 6.0 3.8 dT 384 365,1 3
cd1d2 6 3.3Re 18:57 1137 1191 6.0 3.8 dF 384 365,1 3
==> Reisner2 cost is 1.84 for version 3.3!
cd1d2 6 3.4Re 19:08 1148 1198 6.0 3.8 dT 384 15,15,4,1 5
cd1d2 24 3.4 48:47 2927 ---- 6.0 3.8 dT 384 15,15,4,1 1
cd1d2 24 3.4Re 88:33 5313 ---- 6.0 3.8 dT 384 15,15,4,1 1
md1d2 24 3.4Re 113:01 6781 ---- 6.0 3.8 dT 384 15,15,4,1 1
Vertical Levels
[c,m] = [current,montana] MB
Dom Hr lvls best =sec worst f77 Guide stack TAP,BUF,INC # of runs
cd1d2 24 32 48:47 2927 ---- 6.0 3.8 dT 384 15,15,4,1 1
md1d2 24 32 59:46 3586 ---- 6.0 3.8 dT 384 15,15,4,1 1
cd1d2 24 37 59:57 3597 ---- 6.0 3.8 dT 384 15,15,4,1 1
md1d2 24 37 71:57 4317 ---- 6.0 3.8 dT 384 15,15,4,1 1
md3 8 32 81:51 4911 ---- 6.0 3.8 dT 384 60,60,1,1 1
md3 8 37 94:06 5646 ---- 6.0 3.8 dT 384 60,60,1,1 1
cd1d2 60 32 2:02:xx ---- ---- 6.0 3.8 dT 384 15,15,4,1 est.
cd1d2 60 37 2:30:xx ---- ---- 6.0 3.8 dT 384 15,15,4,1 est.
md1d2 60 32 2:29:xx est.
md1d2 60 37 3:00:xx est.
md3 24 32 4:06:xx est.
md3 24 37 4:50:xx est.
--------------------------------
60-hour forecast on tahoma (23 processors):
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d1d2 2.12 32 simp 1:50 7:16 9:06
Cur d1d2 3.4 32 simp 2:02 " 9:18
Cur d1d2 3.4 37 simp 2:30 " 9:46 (+ 22.9%)
Mon d1d2 3.4 32 simp 2:29 " 9:45 (new domain + 22.5%)
Mon d1d2 3.4 37 simp 3:00 " 10:16 (+ 20% for levels)
60-hour forecast on tahoma (29 processor estimates, 10.2% faster than 23):
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d1d2 2.12 32 simp (1:40) 7:16 8:56
Cur d1d2 3.4 32 simp (1:50) " 9:06
Cur d1d2 3.4 37 simp (2:16) " 9:32
Mon d1d2 3.4 32 simp (2:15) " 9:31
Mon d1d2 3.4 37 simp (2:43) " 9:59
Reisner2
--------------------------------
60-hour forecast on tahoma (23 processors):
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d1d2 3.4 32 simp 2:02 7:16 9:18
Cur d1d2 3.4 32 rei2 3:41 " 10:59
Mon d1d2 3.4 32 rei2 4:42 " 11:58
Mon d1d2 3.4 37 rei2 (5:23) " 12:39
36-hour 4km domain
------------------
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d3 2.12 32 simp 3:33 9:18 12:51
Cur d3 3.4 32 simp (3:56) " 1:14
Mon d3 3.4 32 simp 4:06 9:45 1:51
Mon d3 3.4 37 simp 4:42 10:16 2:58 ( + 35.4%)
Guide 3.8 vs 3.9 and Workshop 6.0, 6.0u1, 6.0u2
######################################## 1/31/2001 - 2/12/2001
Comparing Guide 3.8 and 3.9 and Workshop 6.0 and 6.0u1
tahoma, Sun E6500, jul00 d1d2 runs in /tmp:
sched=
Dom Hr ver mm:ss = seconds f77 Guide stack TAP,BUF,INC # of runs
d1d2 3 2.12 4:10 250 6.0 3.8 d 200 15,15,4,1 5
d1d2 3 2.12 4:53 293 6.0u1 3.8 d 200 15,15,4,1 5
MB
Dom Hr ver best =sec worst f77 Guide stack TAP,BUF,INC # of runs
d1d2 6 2.12 10:43 643 676 6.0 3.8 dT 200 15,15,4,1 4
d1d2 6 2.12 10:38 638 662 6.0 3.8 dF 200 15,15,4,1 4
d1d2 6 2.12 10:59 659 694 6.0 3.9 rT 200 15,15,4,1 4
d1d2 6 2.12 11:03 663 681 6.0 3.9 rF 200 15,15,4,1 4
d1d2 6 2.12 14:06 846 864 6.0 3.9 dU 200 15,15,4,1 5
d1d2 6 2.12 11:44 704 739 6.0 3.9 rU 200 15,15,4,1 5
d1d2 6 2.12 11:48 708 718 6.0u1 3.8 dU 200 15,15,4,1 2
d1d2 6 2.12 11:25 685 691 6.0u1np 3.8 dU 200 15,15,4,1 2
d1d2 6 2.12 11:38 698 727 6.0u1 3.9 rU 200 15,15,4,1 2
d1d2 6 2.12 15:19 919 927 6.0u1 3.9 dU 200 15,15,4,1 4
d1d2 6 2.12 10:26 626 697 6.0u1np 3.9 rT 200 15,15,4,1 4
d1d2 6 2.12 10:14 614 ! 834 6.0u1np 3.9 rF 200 15,15,4,1 2
np = -xprefetch=no; r/d (after the Guide version) = -WG,scheduling={r|d};
T/U/F suffix = OMP_DYNAMIC {True | Unset | False}
tahoma, Sun E6500, jul00 d3 runs in /tmp:
MB
Dom Hr ver best =sec worst f77 Guide stack TAP,BUF,INC # of runs
d3 2 2.12 17:22 1042 1167 6.0 3.8 rT 200 60/15 2
d3 2 2.12 17:41 1061 1086 6.0 3.8 rF 200 60/15 2
d3 2 2.12 17:18 1038 1071 6.0 3.9 rT 200 60/15 3
d3 2 2.12 17:21 1041 1074 6.0 3.9 rF 200 60/15 4
d3 2 2.12 17:55 1075 1088 6.0u1 3.9 rT 200 60/15 2
d3 2 2.12 17:55 1075 1086 6.0u1 3.9 rF 200 60/15 2
d3 2 2.12 18:24 1104 1140 6.0u1np 3.9 rT 200 60/15 2
d3 2 2.12 18:18 1098 1098 6.0u1np 3.9 rF 200 60/15 2
d3 2 2.12 19:34 1174 1185 6.0u2 none F 3000 60/15 2
d3 2 2.12 19:30 1170 1204 6.0u2 none T 3000 60/15 3
d3 2 2.12 19:13 1153 1167 6.0u2np none F 3000 60/15 2
d3 2 2.12 19:13 1153 1180 6.0u2np none T 3000 60/15 2
d3 2 2.12 16:49 1009 1032 6.0 3.8 T 3000 360 2
d3 2 2.12 16:52 1012 x 6.0hyd 3.8 F 3000 360 1
d3 2 2.12 16:57 1017 1179 6.0hyd 3.8 T 3000 360 2
d3 2 2.12 17:45 1065 x 6.0u1 3.9 T 3000 360 1
np = -xprefetch=no; r/d (after the Guide version) = -WG,scheduling={r|d};
T/U/F suffix = OMP_DYNAMIC {True | Unset | False}
######################################## 1/31/2001
Older benchmarks
######################################## 11/7/2000
Ensemble 36km domain only, 48 hour runs:
version mm:ss = seconds BDYFRQ # of runs Machine (CPUs)
2.12 51:57 3117 180 2 rainier, COMPAQ ES40 (4)
2.12 69:23 4163 180 2 glacier, COMPAQ DS20 (2)
2.12 20:22 1222 180 2 tahoma, Sun E6500 (23)
########################################################################
######################################## 7/19/2000
tahoma, Sun E6500, jul00 runs in /tmp:
Dom Hr ver mm:ss = seconds f77 Guide stack TAP,BUF,INC # of runs
d1d2 6 2.12 12:23 743 6.0 3.8 200 180 2
d1d2 6 3.3 12:16 736 6.0 3.8 200 180 2
d1d2 6 2.12 12:53 773 6.0 3.8 200 15,15,4,1 2
d1d2 6 3.3 13:47 827 6.0 3.8 200 15,15,4,1 2 *
d1d2 6 3.3 13:56 836 6.0 3.8 200 15,0,4,1 2 *
d1d2 6 3.3 14:28 868 6.0 3.8 200 15,0,4,1 2 * origouttap
d1d2 6 2.12 13:17 797 6.0 3.8 200 15,15,1,1 2 * I/O = 3%
d1d2 6 3.3 15:10 910 6.0 3.8 200 15,0,1,1 2
*these indicate that my mods to outtap.F are not slowing down the model
in any significant way
d1d2 60 2.12 127:27 7647 6.0 3.8 200 15,15,4,1 2000100600
d3 24 2.12 217:58 13,078 6.0 3.8 200 60,60,1,1 2000100600
d1d2 48 2.12 108:33 6513 6.0 3.8 200 15,15,4,1 1
d1d2 48 2.12 104:47 6287 6.0 3.8 200 180 1
d1d2 48 3.3 108:55 6535 6.0 3.8 200 15,15,4,1 1
d1d2 48 3.3 102:35 6155 6.0 3.8 200 15,15,4,1 1
d3 24 2.12 220:45 13,245 6.0 3.8 200 60,60,1,1 1
d3 24 3.3 248:12 14,892 6.0 3.8 200 60,60,1,1 1 12.4%
########################################
########################################
tahoma, Sun E6500, 3-hour jul00 4km domain runs:
version mm:ss = seconds f77 Guide stacksize BDYFRQ # of runs
2.12 28:35 1715 6.0 3.8 200 60 1
3.3 30:37 1837 6.0 3.8 200 60 1 ( 7.1% slower)
3.3 30:30 1830 6.0 3.8 400 60 1 ( 7.1% slower)
2.12 30:38 1838 5.0 3.7 200 60 3
3.3 34:31 2071 5.0 3.7 200 60 3 (12.6% slower)
########################################
########################################
tahoma, Sun E6500, 2-hour jul00 4km runs in /tmp:
version mm:ss = seconds f77 Guide stacksize BDYFRQ # of runs
2.12 16:37 997 6.0 3.8 200 60 2
3.3 19:32 1172 6.0 3.8 200 60 1 (17.5% slower)
2.12 18:32 1112 5.0 3.7 200 60 2
3.3 20:43 1243 5.0 3.7 200 60 2 (11.8% slower)
########################################
########################################
rainier, COMPAQ ES40, 1-hour jul00 4km runs:
where:
Add'l Options = (-tune host -inline speed -pipeline -speculate by_routine)
version mm:ss = seconds BDYFRQ # of runs Add'l Options
2.12 23:14 1394 60 1 yes
3.3 24:24 1464 60 2 yes (5% slower)
2.12 24:13 1453 60 1 no
########################################################################
dec99 (December 1999) real-time MM5 and ensemble typical runtimes
(I/O gtar means Ernie was running tape backups that resulted in heavy
I/O slowdowns for /home/mm5rt rundirs):
----------------------------------------------
s => split CPUs between d1d2/d3 runs
o => old tahoma CPUs (248 MHz)
MM5RUNDIR I/O static
gtar sched CPUs run tahoma rainier
--------- ---- --- --- ---- --------------- ----------------------
/tmp no yes s23 d1d2 1:39:54 (wet day 2000061200) 8:57:35 pm
/tmp no yes s23 d1d2 1:45:18 (wet day 2000060600)
/tmp no yes s23 d3 2:40:18 (wet day 2000060600)
/tmp no yes s23 d3 2:40:18 (wet day 2000060600)
/tmp no yes 23 ENS 0:57:46 (cmcgem 2000060500)
/tmp no yes *24 ENS 0:52:58 (nogaps 2000060200)
mm5rt no no o13 1,2 3:20:23 (best)
mm5rt yes no o13 1,2 3:32:25 (worst)
mm5rt no yes o13 1,2 3:02:23 (best)
mm5rt yes yes o13 1,2 3:29:28 (worst)
/tmp no yes o13 1,2 2:56:22 (fast, dry day 2000012900)
/tmp no yes o13 1,2 3:02:23 (fast, wet day 2000020100)
/tmp yes yes o13 1,2 3:06:28 (one run)
/tmp no yes o13 1,2 3:05:25 (one run)
ensemble runs:
mm5rt no no yes 3:12:24
/tmp no yes yes 2:46:25 (best, dry day 2000020400 NGM)
mm5rt no yes yes 2:48:21 (best)
mm5rt no yes yes 2:53:00 (avg)
mm5rt yes yes yes 3:05:32 (worst)
/tmp no yes 2:45:59 (dry,ngm 2000041100)
/tmp no yes 2:50:09 (cmc 2000032900)
rmm5rt yes yes 3:13:06 (best)
yes 3:18:36 (worst)
yes 3:24:05 (worst, Sunday 2000020700)
d3 simulations:
/tmp (-O4) no 6:11:34 (some pcpn, 032100)
/tmp (-O4) no 6:18:52 (wet day, 031900)
/tmp (-O4) yes no 6:40:33 (convective, 00050200)
/tmp (full memory) 6:37:04 (23,824 sec 4.6%)
rmm5rt no no 6:47:23 (best, dry day 020400)
rmm5rt no no 6:55:30 (avg)
rmm5rt no no 7:02:00 (wet day, 020100)
rmm5rt no no 7:06:28 (worst)
/tmp yes no 7:02:27 (dry day, first backup
using /tmp)
rmm5rt yes no 7:19:32 (worst)
formulas for calculating times of different domains (same physics
packages) on tahoma:
y-grid pts * x-grid pts * levels * 36-km time step factor
  current 36                     45
  101x137x32x1                   81x110x22x(36/45)
  (67 minutes)                   (24 minutes)
+
  current 12                     15
  88x88x32x3                     70x70x22x3x(36/45)
  (113 minutes)                  (39 minutes)
=
  180 minutes                    63 minutes
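The same formula written out as a small script (a sketch; it calibrates
cost per grid-point-step to the current 180-minute total and reproduces
the worked numbers above):

    def cost(ny, nx, levels, step_factor=1.0):
        # y-grid pts * x-grid pts * levels * time step factor
        return ny * nx * levels * step_factor

    cur36 = cost(101, 137, 32, 1)        # current 36-km
    cur12 = cost(88, 88, 32, 3)          # current 12-km (3 steps per 36-km step)
    new45 = cost(81, 110, 22, 36 / 45)   # 45-km runs a longer time step
    new15 = cost(70, 70, 22, 3 * 36 / 45)

    min_per_unit = 180.0 / (cur36 + cur12)
    for name, c in [("current 36", cur36), ("current 12", cur12),
                    ("45 km", new45), ("15 km", new15)]:
        print(f"{name:10s} {c * min_per_unit:4.0f} minutes")
    # -> 67, 113, 24, 39 -- the new pair totals ~63 minutes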
########################################################################
## 1-hour d3 tests (3/28/2000):
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:29 8,849 1.00 1.000 171.000
13 14:13 853 10.37 0.798
853/765 = 11%
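How the speed up / efficiency / balance columns are derived (a sketch
using the dynamic-scheduling row above; 171 j grid points are divided
among the CPUs):

    t1, t13, ncpus, j_pts = 8849, 853, 13, 171
    speedup    = t1 / t13          # 10.37
    efficiency = speedup / ncpus   # 0.798
    balance    = j_pts / ncpus     # 13.154 j points per CPU
    print(f"{speedup:.2f} {efficiency:.3f} {balance:.3f}")
    # and the "853/765 = 11%" note: dynamic vs static overhead
    print(f"dynamic vs static: {853 / 765 - 1:.1%} slower")   # 11.5%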
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:26 8,846 1.00 1.000 171.000
13 12:45 765 11.56 0.889 ws5.0guide3.7
13 12:30 750 11.79 0.907 ws5.0guide3.7nohoard
13 11:44 704 ws6.0guide3.7hoardxvect=no
13 13:03 783 ws6.0guide3.7hoard
13 17:57 1077 ws6.0guide3.7nohoard
Sun E4000 400MHz/8MB$ (tahoma) static loop scheduling
# wall clock time speed up old efficiency balance
of h:mm:ss seconds over tah speed up/ new/old
cpus 1 cpu oma+ #cpus executable other options
1 85:24 5,124 1.00 1.73 1.000 new v8 ws6.0guide3.7hoardxvect=no
1 84:38 5,078* 1.00 1.74 1.000 new v8 ws6.0guide3.8nohoardxvect=no
13 6:46 406 12.51 1.73 0.962 new v8 ws6.0guide3.7hoardxvect=no
13 6:47 407 12.60 0.968 new v8 ws6.0guide3.7nohoardxvect=no
13 6:55 415 12.24 0.941 new v8 ws6.0guide3.8hoardxvect=no
13 6:45 405 12.54 0.964 new v8 ws6.0guide3.8nohoardxvect=no
23 4:07 247 20.56 +2.85 0.894 new v8 ws6.0guide3.7hoardxvect=no
23 4:08 248 20.66 0.898 new v8 ws6.0guide3.7nohoardxvect=no
23 4:08 248 20.66 0.898 new v8 ws6.0guide3.8hoardxvect=no
23 4:04 244 20.81 +2.88 0.905 new v8 ws6.0guide3.8nohoardxvect=no
13 6:57 417 *12.29 1.87 0.945 new v8 ws6.0guide3.7hoard
13 7:28 448 *11.44 -- 0.879 new v8 ws6.0guide3.7nohoard
+ ==> 704/x (704 sec = the old-CPU 13-processor best above; used where no old run exists at that CPU count)
(static and dynamic loop scheduling give same results for COMPAQ)
(OMP_NESTED and OMP_DYNAMIC make no difference (7 tests run 3/31/2000))
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave
# OPT wall clock time speed up efficiency balance Tahomas est. WA|OR
of lvl mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 O5 42:33 2,553 1.00 1.000 171.000 0.300 21:16:30
4 O5 13:25 805 3.17 0.793 42.750 0.950 6:42:30
4 O4 12:17 737 3.17 0.793 42.750 1.038 6:42:30
same as above, just different headings:
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 O5 42:33 2,553 1.00 1.000 171.000 no
4 O5 13:25 805 3.17 0.793 42.750 no
4 O4 12:17 737 3.17 0.793 42.750 yes 1.038 4
same as above, with csh -f turned on for these; comparing other options
# buff wall clock time speed up efficiency mmout cxml Tahoma # of
of io mm:ss seconds over speed up/ interval math Factor runs
cpus 1 cpu #cpus Tt / Tr
4 no 12:17 737 3.17 0.793 42.750 no 1.038 8
4 yes 12:57 737 3.17 0.793 42.750 no 1.038 8
4 no 12:25 yes 3
4 yes 12:28 737 3.17 0.793 42.750 yes 1.038 8
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up little mmout Tahoma # of
of lvl mm:ss seconds over endian interval Factor runs
cpus 1 cpu Tt / Tr
4 O4 12:17 737 3.17 no 15 1.038 8
1 O4 38:26 2,306 -- yes 15 ?.??? 4
4 O4 12:08 728 3.17 yes 15 ?.??? 4
1 O4 38:27 2,307 -- yes 60 1.038 4
4 O4 12:16 736 3.13 yes 60 ?.??? 4
COMPAQ ES40 500MHz/4MB$ (rainier) full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 O5 42:33* 2,553* 1.00 1.000 171.000 xx 21:16:30
4 O5 13:05 785 3.29 0.821 42.750 0.975 6:32:30
(* est. since only 3 runs performed with fastest at 43:00)
** ESTIMATES **
COMPAQ ES40s 650MHz/4MB$ (estimate) each with full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 33:30 2,010 1.00 1.000 171.000 xx 16:45:00
4 10:10 611 3.29 0.821 42.750 1.252 5:05:00
## end of 1-hour d3 tests section
########################################################################
########################################################################
## 3-hour d1d2 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
1 2:00:29 7,229 1.00 1.000 guide3.7 4
1 2:04:15 7,455 15 "f77 -fast" 3
13 10:44 644 11.22 0.863 15 guide3.7
13 10:34 634 11.40 0.877 60 guide3.7
13 10:30 630 11.47 0.883 180 guide3.7
13 20:08 1208 xx.xx x.xxx 15 ws6g3.8-O4 1
Sun E4000 400MHz/8MB$ (tahoma) static loop scheduling
# wall clock time speed-up old efficiency balance
of h:mm:ss seconds over tah speed up/ new/old
cpus 1 cpu omas #cpus executable other options
13 6:40 400 xx.xx x.xxx new v8 15 ws5.0guide3.8nohoard
13 16:49 1009 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoard
13 6:05 365 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoardxvect=no
13 6:33 393 xx.xx x.xxx new v8 15 ws6.0guide3.7nohoardxvect=no
23 4:23 263 xx.xx x.xxx new v8 15 ws5.0guide3.8nohoard
23 23:44 1424 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoard
23 3:59 239 xx.xx x.xxx new v8 15 ws6.0guide3.8hoard
23 4:03 243 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoardxvect=no
23 4:23 263 xx.xx x.xxx new v8 15 ws6.0guide3.7nohoardxvect=no
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
csh -f column indicates if filecommand.csh had "#!/bin/csh -f" as its 1st line
(note: a version of the v2.12 code that did not have our FILECOMMAND
mods ran slightly slower than our version, and both had "csh -f")
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 35:41 2,141 1.00 1.000 15 no 1
1 35:30 2,130 1.00 1.000 15 yes 4
1 35:13 2,113 1.00 1.000 60 yes 4
1 35:13 2,113 1.00 1.000 180 yes 1
4 O5 12:03 723 2.95 0.738 15 no 0.891 4
4 O5 11:05 665 3.21 0.803 15 yes 0.968 4
little-endian speed up:
1 O4 33:02 1,982 1.00 1.075 15 little_end 4
4 O4 10:20 620 3.20 1.039 15 little_end 4
1 O4 32:55 1,975 1.00 1.070 60 little_end 4
4 O4 10:16 616 3.21 1.011 60 little_end 4
4 O4 10:44 644 3.307 0.827 15 yes 1.000 4
4 O4 12:44 644 **no speculate or pipeline** 15 yes 1.000 4
4 O4 11:07 667 NCAR's flags 15 yes 0.966 4
4 O5 10:51 651 3.25 0.811 60 no 0.974 4
4 O5 10:51 651 3.25 0.811 60 yes 0.974 4
4 O4 10:23 623 3.39? 0.xxx 60 yes 1.018 4
4 O4 10:52 652 NCAR's flags 60 yes 0.972 1
4 O5 10:44 644 3.28 0.820 180 no 0.978 4
4 O5 10:42 642 3.29 0.823 180 yes 0.981 3
4 O4 10:14 614 3.44? 0.xxx 180 yes 1.026 4
## end of 3-hour d1d2 simulations section
########################################################################
########################################################################
## 3-hour d3 simulations:
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:32:16 16,336 1.00 1.000 171.000
13 27:30 1,650 9.90 0.762 13.154
1650/1338 = 23%
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
13 22:18 1,338 12.21 0.939 13.154 1.731 7.43
Sun E6500 400MHz/8MB$/80MHz (buddy)
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
(assumed this time for 1 CPU)
13 23:18 1,398 11.69 0.899 13.154 1.657 7.77
19 16:29 989 16.52 0.869 9.000 2.342 5.49
25 12:30 750 21.78 0.871 6.840 3.088 4.17
29 10:56 656 24.90 0.859 5.897 3.530 3.64
original executable compiled on tahoma for 4MB cache for everything below
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 23:02 1,382 11.90 0.915 13.154
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 27:57 1,677 9.80 0.754 13.154
1677/1382 = 21%
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 31:33 1,893 - - 13.154
Sun E4500 336MHz/4MB$ (hayes) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 35:05 2,105 - - 13.154
2105/1893 = 11%
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 8:01:48 28,908 1.00 1.000 171.000
4 x:xx:xx xx,xxx x.xx 0.xxx 42.750
8 x:xx:xx xx,xxx x.xx 0.xxx 21.375
13 38:36 2,316 12.48 0.xxx 13.154
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 7:29:27 26,967 1.00 1.000 171.000
4 2:01:48 7,308 3.69 0.923 42.750
8 1:03:39 3,819 7.06 0.883 21.375
13 42:01 2,521 10.67 0.823 13.154
## end of 3-hour d3 simulations section
########################################################################
## end of 3-hour d1d2 simulations section
########################################################################
########################################################################
## 2-hour d3 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
13 26:14 1574 ?.??? 0.xxx 15 ?.??? 3?
13 26:07 1567 ?.??? 0.xxx 60 ?.??? 2?
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up efficiency mmout Tahoma # of
of lvl mm:ss seconds over speed up/ interval Factor runs
cpus 1 cpu #cpus Tt / Tr
4 O4 25:41 1540 ?.??? 0.xxx 15 ?.??? 4
4 O4 24:44 1484 ?.??? 0.xxx 60 ?.??? 3?
4 O4 26:41 1601 ?.??? 0.xxx 180 ?.??? 2?
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000) from:
## ftp://ftp.ucar.edu/mesouser/MM5V2/MM5/mm5v2.tar.Z
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/mminput_nh_data.tar.gz
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/benchmark_config.tar.gz
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 19:50 1190 -- 1.000 4
4 5:31 331 3.410 0.852 4
8 3:11 191 6.23** 0.779** 4 **see note below
13 2:05 125 9.520** 0.732** 4 **I believe the run is too
short to see our true speed
up which is more like 0.87
efficiency.**
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 14:39 879 1.000 1.000 4
4 4:05 245 3.588 0.897 4
8 2:22 142 6.190** 0.774** 4 **see note above
13 1:35 95 9.253** 0.711** 4 **see note above
COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 4:47 287 1.000 1.000 4
4 NCAR 1:28 88 3.261 0.815 4
COMPAQ DS10 466MHz/4MB$ (EV??) 1-way interleave running in /var/tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 5:19 319 1.000 1.000 5
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000)
## run for twice as long
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
1 40:07 2407 x.xxx 0.xxx 1
4 11:13 673 3.577 0.894 2
8 6:26 386 6.236 0.779 2
13 4:14 234 10.286 0.791 2
COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
1 9:42 582 x.xxx 0.xxx 4
2 5:15 315 1.848 0.924 4
4 2:58 178 3.270 0.817 3