COMPAQ Memory Interleave Comparisons
Utah and UW Ensembles
UW Ensemble Summary Table
Conclusions for Utah and UW Ensembles
Reisner2
Vertical Levels
Guide 3.8 vs 3.9 and Workshop 6.0, 6.0u1, 6.0u2
Older Runs
(see also old benchmarks)
(see also CPU2000 benchmarks at Spec.Org)
July 2014 Comparisons
1 of same
Chips Dom Hr ver best =sec worst CPU factor
E5-2650v2 @ 2.6 GHz specfp2006 wrf=78.4, 481.wrf specspeed=89-104
E5-2637v2 @ 3.5 GHz specfp2006 wrf=89.6 (vs 2620v2 58.7), 481.wrf specspeed=
E5-2620v1 @ 2.0 GHz b1,b2,b3,b4 on bob cluster, 481.wrf specspeed=57.2
E5-2620 = E5-2620 v2 @ 2.1 GHz new a113-a116, $406, 481spec=471, 481.wrf specspeed=65.5
E5645 = E5645 @ 2.4 GHz a109-a112, n18,n19,n20,n21(x12), $551, 481.wrf specrate=210, specspeed=42.3
E5620 = E5620 @ 2.4 GHz n1-n8,n15,n16,n17(x8), $387, 481.wrf specrate=169, specspeed=42.0
E5-2637 $996, 481specrate=504, 481.wrf specspeed=89.6
E5-2650v3 481specrate=684, 481.wrf specspeed=94.8 baseline, 98 peak (Dell)
a113 and a114 are E5-2620 v2 2.1GHz
b3 (on bob) is E5-2620 v1 2.0GHz
a113-RAID 55:47 (2 restart times, 7 wrfouts per domain)
restarts 235.57, wrfouts 115.22, overall writes 350.79
a114-disk 54:36 (2 restart times, 7 wrfouts per domain)
restarts 236.34, wrfouts 151.86, overall writes 388.19
a113Rsplit 51:59 (2 restart times, 7 wrfouts per domain)
restarts 51.29, wrfouts 20.48, overall writes 71.78
a114dsplit 52:02 (2 restart times, 7 wrfouts per domain)
restarts 79.39, wrfouts 57.10, overall writes 136.49
b3 to SSD 55:37 (2 restart times, 7 wrfouts per domain)
restarts 209.21, wrfouts 124.93, overall writes 334.14
b3 to RAID 56:31 (2 restart times, 7 wrfouts per domain)
restarts 241.82, wrfouts 151.17, overall writes 392.99
projected savings with SSD:
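A rough way to size the savings from the write timings above (a minimal
sketch; it assumes the per-run saving is simply the difference in the
measured "overall writes" seconds, all else equal):

    # write-time savings per run, from the "overall writes" numbers above
    writes = {
        "a113-RAID":  350.79, "a113Rsplit": 71.78,
        "a114-disk":  388.19, "a114dsplit": 136.49,
        "b3-SSD":     334.14, "b3-RAID":    392.99,
    }
    print("split output vs RAID :", round(writes["a113-RAID"] - writes["a113Rsplit"], 2), "s/run")
    print("split output vs disk :", round(writes["a114-disk"] - writes["a114dsplit"], 2), "s/run")
    print("SSD vs RAID (b3)     :", round(writes["b3-RAID"] - writes["b3-SSD"], 2), "s/run")
    # -> 279.01, 251.7, and 58.85 seconds saved per run, respectively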
24 x
thruput 1 of same E5645
Chips Dom Hr ver ncpu time ftime spec.org factor factor
/home/disk/sage2/mm5rt/nobackup/runtest/2014071500/d4rerundir
n18 E5645 @ 2.4 GHz d4 1 361 48 20:27 1147 3.82
n18 E5645 @ 2.4 GHz d4 1 361 24 35:54 2060 1.80
n18 E5645 @ 2.4 GHz d4 1 361 12 78:08 4378
a109 E5645 @ 2.4 GHz d4 1 361 48 21:36 1116
a109 E5645 @ 2.4 GHz d4 1 361 24 39:12 2142
a109 E5645 @ 2.4 GHz d4 1 361 12 80:41 4343
a113 e5-2620v2 d4 1 361 48 14:42 741 3.69 5.86
a113 e5-2620v2 d4 1 361 24 24:42 1330 2.06 3.27
a113 e5-2620v2 d4 1 361 12 48:43 2736 1.0 1.59
n51 E5-2650v3 @ 2.3 GHz 1 35 48 16:12 972 -- bad or w/o infiniband
n51 E5-2650v3 @ 2.3 GHz 1 361 48 12:31 627 3.40
n51 E5-2650v3 @ 2.3 GHz 1 361 160 5:56 242 8.81
n51 E5-2650v3 @ 2.3 GHz 1 361 140 6:22 269 7.92
n51 E5-2650v3 @ 2.3 GHz 1 361 80 8:50 415 5.13
n51 E5-2650v3 @ 2.3 GHz 1 361 24 21:02 1133 1.88
n51 E5-2650v3 @ 2.3 GHz 1 361 12 38:06 2131 1.00
n51 E5-2650v3 @ 2.3 GHz 1 361 12.1x20.slot 38:06 2131 1.00
n51 E5-2650v3 @ 2.3 GHz 1 361 48.3x20.slot 12:31 627 3.40
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x20.socket 11:28 572 3.725
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x20.slot 11:23 572 3.725
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x16.socket 10:45 528
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x16.node8 10:44 529 using -map-by ppr:8:socket
n58 E5-2650v3 @ 2.3 GHz 1 361 48.3x16.slot 11:22 567
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x20.socket 19:07 1026
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x20.slot 19:12 1029
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x16.socket 17:34 940
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x16.slot 18:53 1011
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x12.socket 16:31 873
n58 E5-2650v3 @ 2.3 GHz 1 361 24.2x12.slot 18:53 1011
n58 E5-2650v3 @ 2.3 GHz 1 361 24.3x10.socket 16:07 850
n58 E5-2650v3 @ 2.3 GHz 1 361 24.3x10.slot 19:13 1037
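The factor columns above appear to be each row's ftime seconds relative
to the 12-CPU baseline of the same table; a minimal sketch using the n51
numbers above:

    base = 2131   # n51, 12 CPUs, ver 3.6.1 (ftime seconds)
    for ncpu, sec in [(24, 1133), (48, 627), (80, 415), (140, 269), (160, 242)]:
        print(f"{ncpu:3d} cpus: factor {base / sec:.2f}")
    # -> 1.88, 3.40, 5.13, 7.92, 8.81 -- matching the table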
48-hr d4 runs for 2014111712 case current factor
n21-n1 asstd 48 35 132 7:01:48 25308 1.000
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 160 3:17:19 11839 2.138 1.596 vs 80cpu
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 160 3:11:58 11518 2.197 1.641 vs 80cpu [/work/restrts]
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 140 3:30:49 12649 2.001
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 120 3:55:51 14151 1.788
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 100 4:32:06 16326 1.550
n51 E5-2650v3 @ 2.3 GHz 48 361.avx1 80 5:15:04 18904 1.339
notftime (same runs; seconds are raw wall clock, file-output time not subtracted):
n51 E5-2650v3 @ 2.3 GHz 1 361 48 12:31 751 3.04
n51 E5-2650v3 @ 2.3 GHz 1 361 160 5:56 356 6.42
n51 E5-2650v3 @ 2.3 GHz 1 361 80 8:50 530 4.31
n51 E5-2650v3 @ 2.3 GHz 1 361 24 21:02 1262 1.81
n51 E5-2650v3 @ 2.3 GHz 1 361 12 38:06 2286 1.00
n51 E5-2650v3 @ 2.3 GHz 1 35 40 24:01 1441 -- all on 1 machine
n51 E5-2650v3 @ 2.3 GHz 1 35 24 26:20 1580 -- bad or w/o infiniband
n51 E5-2650v3 @ 2.3 GHz 1 35 12 33:36 2016 -- bad or w/o infiniband
n51 E5-2650v3 @ 2.3 GHz 1 35 140 7:32 452
n51 E5-2650v3 @ 2.3 GHz 1 35 160 6:36 396
n51 E5-2650v3 @ 2.3 GHz 1 35 160 7:03 423 using ensm-ssd ftdiff=369
n51 E5-2650v3 @ 2.3 GHz 1 35 80 12:39 759
spec.org 481spec sep-2014
e5-2620v3 481.wrf rate = 562
e5-2650v3 481.wrf rate = 702 = 1.25
e5-2620v2 481.wrf normal = 71.3
e5-2650v2 481.wrf normal = 90.9 = 1.27
e5645 @ 2.4 (Fujitsu?) normal = 114
24 x
thruput 1 of same E5645
Chips Dom Hr ver ncpu best =sec spec.org factor factor
E52620 d4t1 1 3.5 48 12:34 754 380 474 65 1.778 1.662 a113-a116
E52620 d4t1 1 3.5 24 22:30 1350 380 1.000 1.639 a113-a116
E52620 d4 1 3.5 48 12:28 748 380 474 65 1.778 1.662 a113-a116
E52620 d4 1 3.5 24 22:10 1330 380 1.000 1.639 a113-a116
E52620 d4 1 3.5 12 45:01 2701 380 1.000 1.639 a113-a116
E52620v1 1 361 12 63:10 @ 2.0 GHz b1
E52620v1 1 361 12 54:20 @ 2.0 GHz b3
E5645 d4 1 3.5 48 20:43 1243 208 1.750 1.000 a109-a112e
E5645 d4 1 3.5 24 36:20 2180 208 196 40 1.000 1.000 a109-a112e
E5645 d4 1 3.5 12 71:02 4262 208 196 40 1.000 1.000 a109-a112e
E5645 d4 1 3.5 48 24:39 1479 208 196 40 1.000 1.000 n18-n21, using node1 rundir, 4x12
E5620 d4 1 3.5 72 13:52 832 170 195 42 4.362 1.991
E5620 d4 1 3.5 60 14:58 898 170 195 42 4.041 1.991
E5620 d4 1 3.5 48 18:15 1095 170 195 42 3.314 1.991 n1-n6,6x8
E5620 d4 1 3.5 48 18:15 1095 170 195 42 3.314 1.991
E5620 d4 1 3.5 24 32:06 1926 170 1.884 1.132
E5620 d4 1 3.5 12 60:29 3629 170 1.000
COMPAQ Memory Interleave Comparisons
######################################## 6/16/2001
Running Ensemble domain benchmarks with different memory interleaving
C = Compaq (C500 = ES40, C667=DS20, C833=ES-40)
NOTE: executable was NOT recompiled between runs!
1 of same INTERLEAVE
Chips Dom Hr ver best =sec worst CPU factor factor # of runs
ES-40 1-way interleave (May tests on chocolat, .5G and 2G module)
C833x1x4 uw 24 3.3 199:17 11957 12386 0.785 1.000 2x4
C833x1 uw 24 3.3 156:21 9381 9431 1.000 1.000 2
C833x2 uw 24 3.3 90:31 5431 5456 1.727 1.000 2
C833x4 uw 24 3.3 52:52 3172 3284 2.957 1.000 2
ES-40 2-way interleave (June tests on chocolat, 2x2G + 1G module)
C833x1x4 uw 24 3.3 168:12 10092 10323 0.907 1.185 2x4
C833x1 uw 24 3.3 152:32 9152 9227 1.000 1.025 2
C833x2 uw 24 3.3 85:34 5134 5161 1.783 1.058 2
C833x4 uw 24 3.3 46:26 2786 2857 3.285 1.139 2
ES-40 4-way interleave (July tests on chocolat, 4x2G module)
C833x1x4 uw 24 3.3 158:48 9528 9552 0.940 1.255 2x4
C833x1 uw 24 3.3 149:12 8952 8961 1.000 1.048 2
C833x2 uw 24 3.3 82:04 4924 4932 1.818 1.103 2
C833x4 uw 24 3.3 44:12 2652 2676 3.376 1.196 2
Utah and UW Ensemble Benchmarks
######################################## 5/3/2001
Running Ensemble domain benchmarks on various platforms
S = Sun (S750 = 750MHz UltraSPARC-III Sun Blade 1000)
C = Compaq (C500 = ES40, C667=DS20)
A = AMD Athlon (TCP/IP protocol)
V = AMD Athlon with Via network card
I = Intel
A1200 = Tyan Thunder K7 Dual 1.2GHz AthlonMP, 2x256MB simms ($1375!)
1 of same Compaq 667MHz
Chips Dom ver best =sec worst CPU factor factor # of runs
C667x1 utah 3.3 142:55 8575 8982 1.000 1.000 2
C667x2 utah 3.3 80:57 4857 4866 1.765 1.765 2
C500x1 utah 3.3 214:35 12875 ---- 1.000 0.666 1
C500x2 utah 3.3 109:53 6593 6615 1.953 1.301 2
C500x4 utah 3.3 68:16 4096 4334 3.143 2.094 2
A800x1 utah 3.3 318:05 19085 21631 1.000 0.449 2
I1400x1 utah 3.3 270:xx
A1333x1 utah 3.3 (176:51)10611 mpp/1.08 0.808 est.
C600x1 utah 3.3 174:02 10442 10502 1.000 0.821 2 DS-10
C833x1x4 utah 3.3 142:19 8539 8662 0.845 1.004 2x4
C833x1 utah 3.3 120:18 7218 7232 1.000 1.188 2
C833x2 utah 3.3 67:21 4041 4058 1.786 2.122 2
C833x4 utah 3.3 39:13 2353 2356 3.068 3.644 2
S400x12 utah 3.3 51:26 3086 ---- ----- 2.779 1
S400x16 utah 3.3 40:19 2419 ---- ----- 3.545 1
S400x23 utah 3.3 33:40 2020 ---- ----- 4.245 1
| Timings |Same CPU | C667x1 Factors|Number
Chips |Domain| Code | best =sec worst |Factor | mpp | non-mpp |of runs
C667x1 utah 3.3mpp 155:18 9318 9340 1.000 1.000 0.920 2 DS-20E
C667x2 utah 3.3mpp 90:57 5457 5524 1.708 1.708 1.571 3
C500x1 utah 3.3mpp 214:44 12884 ---- 1.000 0.723 0.666 1 ES-40
C500x2 utah 3.3mpp 121:20 7280 7290 1.770 1.280 1.178 2
C500x4 utah 3.3mpp 78:36 4716 ---- 2.732 1.976 1.818 1
A1333x1 utah 3.3mpp 191:21 11481 ---- 1.000 0.812 0.747 ? Beowulf
A1333x2 utah 3.3mpp 118:09 7089 ---- 1.620 1.314 1.210 ?
A1333x4 utah 3.3mpp 78:45 4725 ---- 2.430 1.972 1.815 ?
A1333x8 utah 3.3mpp 51:58 3118 ---- 3.682 2.988 2.750 ?
A1333x12 utah 3.3mpp 46:18 2778 ---- 4.133 3.354 3.087 ?
A1333x16 utah 3.3mpp 42:11 2531 ---- 4.536 3.682 3.388 ?
V950x1 utah 3.3mpp 282:28 16948 ---- 1.000 0.550 0.506 ?
V950x2 utah 3.3mpp 155:03 9303 ---- 1.822 1.002 0.922 ?
V950x4 utah 3.3mpp 88:30 5310 ---- 3.192 1.755 1.615 ?
V950x8 utah 3.3mpp 54:10 3250 ---- 5.215 2.867 2.638 ?
V950x12 utah 3.3mpp 46:28 2788 ---- 6.079 3.342 3.076 ?
V950x16 utah 3.3mpp 40:46 2446 ---- 6.929 3.809 3.506 ?
24-hour UW Ensemble Runs
1 of same Compaq 667MHz
Chips Dom Hr ver best =sec worst CPU factor factor # of runs
DS-20E
C667x1 uw 24 3.3 188:12 11292 ---- 1.000 1.000 1 (97%)
C667x1 singleproc 189:37 11377 ---- x.xxx x.xxx 1 (97%)
C667x2 uw 24 3.3 110:34 6634 7017 1.702 1.702 1 (191%)
DS-10
C600x1 uw 24 3.3 228:30 13710 14058 1.000 0.824 2 DS-10
ES-40
C500x1 uw 24 3.3 251:25 15085 ---- 1.000 0.749 1
C500x2 uw 24 3.3 143:12 8592 8647 1.756 1.314 2
C500x4 uw 24 3.3 88:02 5282 5442 2.856 2.138 2 (389%)
C500x4 uw 48 3.3 174:41 10481 2 (383%)
ES-40
C833x1x4 uw 24 3.3 158:48 9528 9552 0.940 1.185 2x4
C833x1 uw 24 3.3 149:12 8952 8961 1.000 1.261 2
C833x2 uw 24 3.3 82:04 4924 4932 1.818 2.293 2
C833x4 uw 24 3.3 44:12 2652 2676 3.376 4.258 2
AMD Dual-Processor 1.2 GHz Tyan Motherboard, PGF77 compiler
A1200x2 uw 48 3.3 243:59 14639 1.543
FCFLAGS = -I$(LIBINCLUDE) -fast -Mcray=pointer -tp p6 -pc 32 -byteswapio -Mvect=prefetch,cachesize:393216 -mp -Mnosgimp
LDOPTIONS = $(FCFLAGS)
LOCAL_LIBRARIES = -lnsl -lm
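# Flag notes (my reading of the PGI pgf77 options above, for reference):
# -byteswapio   read/write big-endian unformatted data on the little-endian
#               Athlon, so output matches the Suns and Compaqs
# -mp -Mnosgimp enable the OpenMP-style parallel directives, but not the
#               SGI directive flavor
# -Mvect=prefetch,cachesize:393216  vectorize with prefetching, tuned to a
#               384KB cache (393216 bytes)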
IBM PowerPC 375 MHz (-O2 is faster than new compiler with -O3)
P375x1 uw 48 3.3 662:48 39768 39791 1.000 x.xxx 2
P375x2 uw 48 3.3 346:10 20770 20784 1.915 2
P375x4 uw 48 3.3 207:09 12429 12515 3.200 2
S400x27 uw 48 3.3 67:22 4042
48-hour Jun01 real-time 36/12 km with 37 levels 10/1/2001 and 3/21/2002:
rainier-factor
A1200x1 uwrt37 48 3.4 825:52 49552
A1200x2 uwrt37 48 3.4 471:01 28261 0.74
C500x4 uwrt37 48 3.4 346:22 20782 1.00
S400x27 uwrt37 48 3.4 108:36 6516 3.19 3/21/2002 ws7guide40
S400x27 uwrt36 48 3.4 112:00 6720 workshop7,guide3.8 (11% wow!)
S400x27 uwrt37 48 3.4 125:09 7509 2.77 workshop6,Guide3.8
S400x27 uwrt36 48 3.4 142:57 8577 workshop7,noguide (14% slower)
UW Ensemble Domain Summary Table
S = Sun (S750 = 750MHz UltraSPARC-III Sun Blade 1000)
C = Compaq (C500 = ES40, C667=DS20)
A = AMD Athlon (TCP/IP protocol)
A1200 = Tyan Thunder K7 Dual 1.2GHz AthlonMP, 2x256MB simms ($1375!)
V = AMD Athlon with Via network card
I = Intel
P = IBM PowerPC
The implied comparison for MM5 among AMD, Intel, and Sun is 177.mesa in
www.spec.org cfp2000; this doesn't work too well for AMD vs IBM P375,
however, and it underestimates the Compaqs. None of them match too well.
1 of same Compaq 667MHz
Chips Dom Hr ver best =sec worst CPU factor factor # of runs
A1200x1 uw 3 3.3 25:51 1551 ---- 1.000 0.901 1
A1200x2 uw (3) 3.3 15:14 914 ---- ~1.7 1.528 1 1/16 of 48-hr run
I1400x1 uw 3 3.3 36:45 2205 ---- 1.000 0.634 1
C833x1 uw (3) 3.3 18:39 (1119) ** 1/8 of 24-hr ** 1.248 (2)
C833x2 uw (3) 3.3 10:15 (615) 1.818 2.272 (2)
C833x4 uw (3) 3.3 5:31 (331) 3.376 4.221 (2)
C667x1 uw 3 3.3 23:17 1397 1422 1.000 1.000 2
C667x2 uw 3 3.3 13:44 824 827 1.695 1.695 2
C600x1 uw 3 3.3 28:49 1729 1752 1.000 0.808 2
C500x1 uw 3 3.3 31:14 1874 1889 1.000 0.745 2
C500x2 uw 3 3.3 17:46 1066 1077 1.758 1.311 4
C500x4 uw 3 3.3 10:36 636 661 2.947 2.197 2
S400x1 uw 3 3.3 70:52 4252 4260 1.000 0.329 2
S400x2 uw 3 3.3 35:31 2131 2140 1.995 0.656 2
S400x4 uw 3 3.3 18:12 1092 ---- 3.894 1.280 1
S400x8 uw 3 3.3 9:35 575 578 7.395 2.430 2
S400x12 uw 3 3.3 6:46 406 407 10.473 3.441 2
S400x23 uw 3 3.3 4:26 266 288 15.985* 5.252* 2
S750x1 uw 3 3.3 43:32 2612 ---- 1.000 0.535 1
* --> run is too short to get the scaling factors correct; we generally
see a 0.8 scaling factor, so it should be about 18.4 times as fast
(23 CPUs x 0.8 = 18.4) as a single-processor run
Conclusions for Utah and UW Ensemble Benchmarks
1) MPP is about 8% slower than regular code.
2) 1 Athlon 1.3GHz chip for $1000 (+ a pittance for the PG compiler) is
nearly identical to a DS-10 600MHz!!!
3) 4 Athlon 1.3GHz chips run MPP code as fast as ES-40
with 4x500MHz chips ($4K vs $40K?).
4) For ensembles, it makes more sense to run multiple MM5s on single
processors than to run them back to back on all processors (see the
sketch after this list).
5) 16 Athlon 950 MHz chips coupled with Via networking
can run code as fast as 16 Sun E6500 400 MHz chips. For
overall speed in high resolution runs, you need to
make a large cluster to equal the power of a Sun E6500
because scaling is so poor in clusters.
6) The faster the chip, the worse the scaling.
7) To run our current ensembles (36/12 max dimension 101x137) out to
48 hours, we should expect these results for MM5v3.x standard
physics:
Athlon 1333 MHz (7:44) (81% speed of 667 running same code)
DS-10 600 MHz 7:37
?2001 DS-10 667 MHz 6:16 (Do they make this? or just 600 MHz?)
?2002 DS-10 883 MHz 5:13 (assuming ES40 performance of chip)
DS-20E 1x667 MHz 6:16 (run 2 simultaneously in less than
2x3:41 = 7:22)
DS-20E 2x667 MHz 3:41
Jun01 DS-20E 2x883 MHz 2:57 (Due out ...)
ES-40 4x500 MHz 3:01
ES-40 4x667 MHz 2:11 (too bad we never got that promised
free chip upgrade!)
ES-40 4x883 MHz 1:43 (but you shouldn't run it this way, instead
run 4 simultaneously in 6:38 vs this
6:52)
23 of tahoma's CPUs 1:00
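A sketch of the scheduling arithmetic behind points 4 and 7, assuming the
ES-40 4x883 MHz times above (1:43 per member using all 4 CPUs, est. 6:38
per member on one CPU):

    def hms(minutes):
        return f"{int(minutes // 60)}:{int(minutes % 60):02d}"

    per_member_4cpu = 1 * 60 + 43   # 1:43, one member on all 4 CPUs
    per_member_1cpu = 6 * 60 + 38   # 6:38, one member per CPU
    members = 4
    print("4 members back to back on 4 CPUs:", hms(members * per_member_4cpu))  # 6:52
    print("4 members at once, 1 CPU each:   ", hms(per_member_1cpu))            # 6:38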
Reisner2 and Vertical Levels
######################################## 4/7/2001 - present
Comparing v3.4 runtimes of current and new domains on Sun E6500
23x400 MHz
[c,m] = [current,montana] MB
Dom Hr ver best =sec worst f77 Guide stack TAP,BUF,INC # of runs
cd1d2 6 3.3 11:48 708 793 6.0 3.8 dF 384 15,15,4,1 4
cd1d2 6 3.3 11:41 701 785 6.0 3.8 dT 384 15,15,4,1 5
cd1d2 6 3.3 10:18 618 678 6.0 3.8 dF 384 365,1 4
cd1d2 6 3.3 10:19 619 691 6.0 3.8 dT 384 365,1 3
cd1d2 6 3.3Re 18:57 1137 1191 6.0 3.8 dF 384 365,1 3
==> Reisner2 cost is 1.84 for version 3.3!
cd1d2 6 3.4Re 19:08 1148 1198 6.0 3.8 dT 384 15,15,4,1 5
cd1d2 24 3.4 48:47 2927 ---- 6.0 3.8 dT 384 15,15,4,1 1
cd1d2 24 3.4Re 88:33 5313 ---- 6.0 3.8 dT 384 15,15,4,1 1
md1d2 24 3.4Re 113:01 6781 ---- 6.0 3.8 dT 384 15,15,4,1 1
Vertical Levels
[c,m] = [current,montana] MB
Dom Hr lvls best =sec worst f77 Guide stack TAP,BUF,INC # of runs
cd1d2 24 32 48:47 2927 ---- 6.0 3.8 dT 384 15,15,4,1 1
md1d2 24 32 59:46 3586 ---- 6.0 3.8 dT 384 15,15,4,1 1
cd1d2 24 37 59:57 3597 ---- 6.0 3.8 dT 384 15,15,4,1 1
md1d2 24 37 71:57 4317 ---- 6.0 3.8 dT 384 15,15,4,1 1
md3 8 32 81:51 4911 ---- 6.0 3.8 dT 384 60,60,1,1 1
md3 8 37 94:06 5646 ---- 6.0 3.8 dT 384 60,60,1,1 1
cd1d2 60 32 2:02:xx ---- ---- 6.0 3.8 dT 384 15,15,4,1 est.
cd1d2 60 37 2:30:xx ---- ---- 6.0 3.8 dT 384 15,15,4,1 est.
md1d2 60 32 2:29:xx est.
md1d2 60 37 3:00:xx est.
md3 24 32 4:06:xx est.
md3 24 37 4:50:xx est.
--------------------------------
60-hour forecast on tahoma (23 processors):
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d1d2 2.12 32 simp 1:50 7:16 9:06
Cur d1d2 3.4 32 simp 2:02 " 9:18
Cur d1d2 3.4 37 simp 2:30 " 9:46 (+ 22.9%)
Mon d1d2 3.4 32 simp 2:29 " 9:45 (new domain + 22.5%)
Mon d1d2 3.4 37 simp 3:00 " 10:16 (+ 20% for levels)
60-hour forecast on tahoma (29 processor estimates, 10.2% faster than 23):
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d1d2 2.12 32 simp (1:40) 7:16 8:56
Cur d1d2 3.4 32 simp (1:50) " 9:06
Cur d1d2 3.4 37 simp (2:16) " 9:32
Mon d1d2 3.4 32 simp (2:15) " 9:31
Mon d1d2 3.4 37 simp (2:43) " 9:59
Reisner2
--------------------------------
60-hour forecast on tahoma (23 processors):
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d1d2 3.4 32 simp 2:02 7:16 9:18
Cur d1d2 3.4 32 rei2 3:41 " 10:59
Mon d1d2 3.4 32 rei2 4:42 " 11:58
Mon d1d2 3.4 37 rei2 (5:23) " 12:39
36-hour 4km domain
------------------
Times (hh:mm)
Domain Code Lvl Phys Clock Start End (am/pm)
------- ---- -- --- ------ ----- -----------
Cur d3 2.12 32 simp 3:33 9:18 12:51
Cur d3 3.4 32 simp (3:56) " 1:14
Mon d3 3.4 32 simp 4:06 9:45 1:51
Mon d3 3.4 37 simp 4:42 10:16 2:58 ( + 35.4%)
Guide 3.8 vs 3.9 and Workshop 6.0, 6.0u1, 6.0u2
######################################## 1/31/2001 - 2/12/2001
Comparing Guide 3.8 and 3.9 and Workshop 6.0 and 6.0u1
tahoma, Sun E6500, jul00 d1d2 runs in /tmp:
sched=
Dom Hr ver mm:ss = seconds f77 Guide stack TAP,BUF,INC # of runs
d1d2 3 2.12 4:10 250 6.0 3.8 d 200 15,15,4,1 5
d1d2 3 2.12 4:53 293 6.0u1 3.8 d 200 15,15,4,1 5
MB
Dom Hr ver best =sec worst f77 Guide stack TAP,BUF,INC # of runs
d1d2 6 2.12 10:43 643 676 6.0 3.8 dT 200 15,15,4,1 4
d1d2 6 2.12 10:38 638 662 6.0 3.8 dF 200 15,15,4,1 4
d1d2 6 2.12 10:59 659 694 6.0 3.9 rT 200 15,15,4,1 4
d1d2 6 2.12 11:03 663 681 6.0 3.9 rF 200 15,15,4,1 4
d1d2 6 2.12 14:06 846 864 6.0 3.9 dU 200 15,15,4,1 5
d1d2 6 2.12 11:44 704 739 6.0 3.9 rU 200 15,15,4,1 5
d1d2 6 2.12 11:48 708 718 6.0u1 3.8 dU 200 15,15,4,1 2
d1d2 6 2.12 11:25 685 691 6.0u1np 3.8 dU 200 15,15,4,1 2
d1d2 6 2.12 11:38 698 727 6.0u1 3.9 rU 200 15,15,4,1 2
d1d2 6 2.12 15:19 919 927 6.0u1 3.9 dU 200 15,15,4,1 4
d1d2 6 2.12 10:26 626 697 6.0u1np 3.9 rT 200 15,15,4,1 4
d1d2 6 2.12 10:14 614 ! 834 6.0u1np 3.9 rF 200 15,15,4,1 2
np = -xprefetch=no; r/d (after the Guide version) = -WG,scheduling={r|d};
T/U/F suffix = OMP_DYNAMIC {True | Unset | False}
tahoma, Sun E6500, jul00 d3 runs in /tmp:
MB
Dom Hr ver best =sec worst f77 Guide stack TAP,BUF,INC # of runs
d3 2 2.12 17:22 1042 1167 6.0 3.8 rT 200 60/15 2
d3 2 2.12 17:41 1061 1086 6.0 3.8 rF 200 60/15 2
d3 2 2.12 17:18 1038 1071 6.0 3.9 rT 200 60/15 3
d3 2 2.12 17:21 1041 1074 6.0 3.9 rF 200 60/15 4
d3 2 2.12 17:55 1075 1088 6.0u1 3.9 rT 200 60/15 2
d3 2 2.12 17:55 1075 1086 6.0u1 3.9 rF 200 60/15 2
d3 2 2.12 18:24 1104 1140 6.0u1np 3.9 rT 200 60/15 2
d3 2 2.12 18:18 1098 1098 6.0u1np 3.9 rF 200 60/15 2
d3 2 2.12 19:34 1174 1185 6.0u2 none F 3000 60/15 2
d3 2 2.12 19:30 1170 1204 6.0u2 none T 3000 60/15 3
d3 2 2.12 19:13 1153 1167 6.0u2np none F 3000 60/15 2
d3 2 2.12 19:13 1153 1180 6.0u2np none T 3000 60/15 2
d3 2 2.12 16:49 1009 1032 6.0 3.8 T 3000 360 2
d3 2 2.12 16:52 1012 x 6.0hyd 3.8 F 3000 360 1
d3 2 2.12 16:57 1017 1179 6.0hyd 3.8 T 3000 360 2
d3 2 2.12 17:45 1065 x 6.0u1 3.9 T 3000 360 1
np = -xprefetch=no; r/d (after the Guide version) = -WG,scheduling={r|d};
T/U/F suffix = OMP_DYNAMIC {True | Unset | False}
######################################## 1/31/2001
Older benchmarks
######################################## 11/7/2000
Ensemble 36km domain only, 48 hour runs:
version mm:ss = seconds BDYFRQ # of runs Machine (CPUs)
2.12 51:57 3117 180 2 rainier, COMPAQ ES40 (4)
2.12 69:23 4163 180 2 glacier, COMPAQ DS20 (2)
2.12 20:22 1222 180 2 tahoma, Sun E6500 (23)
########################################################################
######################################## 7/19/2000
tahoma, Sun E6500, jul00 runs in /tmp:
Dom Hr ver mm:ss = seconds f77 Guide stack TAP,BUF,INC # of runs
d1d2 6 2.12 12:23 743 6.0 3.8 200 180 2
d1d2 6 3.3 12:16 736 6.0 3.8 200 180 2
d1d2 6 2.12 12:53 773 6.0 3.8 200 15,15,4,1 2
d1d2 6 3.3 13:47 827 6.0 3.8 200 15,15,4,1 2 *
d1d2 6 3.3 13:56 836 6.0 3.8 200 15,0,4,1 2 *
d1d2 6 3.3 14:28 868 6.0 3.8 200 15,0,4,1 2 * origouttap
d1d2 6 2.12 13:17 797 6.0 3.8 200 15,15,1,1 2 * I/O = 3%
d1d2 6 3.3 15:10 910 6.0 3.8 200 15,0,1,1 2
*these indicate that my mods to outtap.F are not slowing down the model
in any significant way
d1d2 60 2.12 127:27 7647 6.0 3.8 200 15,15,4,1 2000100600
d3 24 2.12 217:58 13,078 6.0 3.8 200 60,60,1,1 2000100600
d1d2 48 2.12 108:33 6513 6.0 3.8 200 15,15,4,1 1
d1d2 48 2.12 104:47 6287 6.0 3.8 200 180 1
d1d2 48 3.3 108:55 6535 6.0 3.8 200 15,15,4,1 1
d1d2 48 3.3 102:35 6155 6.0 3.8 200 15,15,4,1 1
d3 24 2.12 220:45 13,245 6.0 3.8 200 60,60,1,1 1
d3 24 3.3 248:12 14,892 6.0 3.8 200 60,60,1,1 1 12.4%
########################################
########################################
tahoma, Sun E6500, 3-hour jul00 4km domain runs:
version mm:ss = seconds f77 Guide stacksize BDYFRQ # of runs
2.12 28:35 1715 6.0 3.8 200 60 1
3.3 30:37 1837 6.0 3.8 200 60 1 ( 7.1% slower)
3.3 30:30 1830 6.0 3.8 400 60 1 ( 7.1% slower)
2.12 30:38 1838 5.0 3.7 200 60 3
3.3 34:31 2071 5.0 3.7 200 60 3 (12.6% slower)
########################################
########################################
tahoma, Sun E6500, 2-hour jul00 4km runs in /tmp:
version mm:ss = seconds f77 Guide stacksize BDYFRQ # of runs
2.12 16:37 997 6.0 3.8 200 60 2
3.3 19:32 1172 6.0 3.8 200 60 1 (17.5% slower)
2.12 18:32 1112 5.0 3.7 200 60 2
3.3 20:43 1243 5.0 3.7 200 60 2 (11.8% slower)
########################################
########################################
rainier, COMPAQ ES40, 1-hour jul00 4km runs:
where:
Add'l Options = (-tune host -inline speed -pipeline -speculate by_routine)
version mm:ss = seconds BDYFRQ # of runs Add'l Options
2.12 23:14 1394 60 1 yes
3.3 24:24 1464 60 2 yes (5% slower)
2.12 24:13 1453 60 1 no
########################################################################
dec99 (December 1999) real-time MM5 and ensemble typical runtimes
(I/O gtar means Ernie was running tape backups that resulted in heavy
I/O slowdowns for /home/mm5rt rundirs):
----------------------------------------------
s => split CPUs between d1d2/d3 runs
o => old tahoma CPUs (248 MHz)
MM5RUNDIR I/O static
gtar sched CPUs run tahoma rainier
--------- ---- --- --- ---- --------------- ----------------------
/tmp no yes s23 d1d2 1:39:54 (wet day 2000061200) 8:57:35 pm
/tmp no yes s23 d1d2 1:45:18 (wet day 2000060600)
/tmp no yes s23 d3 2:40:18 (wet day 2000060600)
/tmp no yes s23 d3 2:40:18 (wet day 2000060600)
/tmp no yes 23 ENS 0:57:46 (cmcgem 2000060500)
/tmp no yes *24 ENS 0:52:58 (nogaps 2000060200)
mm5rt no no o13 1,2 3:20:23 (best)
mm5rt yes no o13 1,2 3:32:25 (worst)
mm5rt no yes o13 1,2 3:02:23 (best)
mm5rt yes yes o13 1,2 3:29:28 (worst)
/tmp no yes o13 1,2 2:56:22 (fast, dry day 2000012900)
/tmp no yes o13 1,2 3:02:23 (fast, wet day 2000020100)
/tmp yes yes o13 1,2 3:06:28 (one run)
/tmp no yes o13 1,2 3:05:25 (one run)
ensemble runs:
mm5rt no no yes 3:12:24
/tmp no yes yes 2:46:25 (best, dry day 2000020400 NGM)
mm5rt no yes yes 2:48:21 (best)
mm5rt no yes yes 2:53:00 (avg)
mm5rt yes yes yes 3:05:32 (worst)
/tmp no yes 2:45:59 (dry,ngm 2000041100)
/tmp no yes 2:50:09 (cmc 2000032900)
rmm5rt yes yes 3:13:06 (best)
yes 3:18:36 (worst)
yes 3:24:05 (worst, Sunday 2000020700)
d3 simulations:
/tmp (-O4) no 6:11:34 (some pcpn, 032100)
/tmp (-O4) no 6:18:52 (wet day, 031900)
/tmp (-O4) yes no 6:40:33 (convective, 00050200)
/tmp (full memory) 6:37:04 (23,824 sec 4.6%)
rmm5rt no no 6:47:23 (best, dry day 020400)
rmm5rt no no 6:55:30 (avg)
rmm5rt no no 7:02:00 (wet day, 020100)
rmm5rt no no 7:06:28 (worst)
/tmp yes no 7:02:27 (dry day, first backup
using /tmp)
rmm5rt yes no 7:19:32 (worst)
formulas for calculating times of different domains (same physics
packages) on tahoma:
y-grid pts * x-grid pts * levels * 36-km time step factor
  current 36                     45
  101x137x32x1                   81x110x22x(36/45)
  (67 minutes)                   (24 minutes)
+
  current 12                     15
  88x88x32x3                     70x70x22x3x(36/45)
  (113 minutes)                  (39 minutes)
=
  180 minutes                    63 minutes
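The same formula written out as a small script (a sketch; it calibrates
cost per grid-point-step to the current 180-minute total and reproduces
the worked numbers above):

    def cost(ny, nx, levels, step_factor=1.0):
        # y-grid pts * x-grid pts * levels * time step factor
        return ny * nx * levels * step_factor

    cur36 = cost(101, 137, 32, 1)        # current 36-km
    cur12 = cost(88, 88, 32, 3)          # current 12-km (3 steps per 36-km step)
    new45 = cost(81, 110, 22, 36 / 45)   # 45-km runs a longer time step
    new15 = cost(70, 70, 22, 3 * 36 / 45)

    min_per_unit = 180.0 / (cur36 + cur12)
    for name, c in [("current 36", cur36), ("current 12", cur12),
                    ("45 km", new45), ("15 km", new15)]:
        print(f"{name:10s} {c * min_per_unit:4.0f} minutes")
    # -> 67, 113, 24, 39 -- the new pair totals ~63 minutes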
########################################################################
## 1-hour d3 tests (3/28/2000):
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:29 8,849 1.00 1.000 171.000
13 14:13 853 10.37 0.798
853/765 = 11%
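How the speed up / efficiency / balance columns are derived (a sketch
using the dynamic-scheduling row above; 171 j grid points are divided
among the CPUs):

    t1, t13, ncpus, j_pts = 8849, 853, 13, 171
    speedup    = t1 / t13          # 10.37
    efficiency = speedup / ncpus   # 0.798
    balance    = j_pts / ncpus     # 13.154 j points per CPU
    print(f"{speedup:.2f} {efficiency:.3f} {balance:.3f}")
    # and the "853/765 = 11%" note: dynamic vs static overhead
    print(f"dynamic vs static: {853 / 765 - 1:.1%} slower")   # 11.5%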
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:26 8,846 1.00 1.000 171.000
13 12:45 765 11.56 0.889 ws5.0guide3.7
13 12:30 750 11.79 0.907 ws5.0guide3.7nohoard
13 11:44 704 ws6.0guide3.7hoardxvect=no
13 13:03 783 ws6.0guide3.7hoard
13 17:57 1077 ws6.0guide3.7nohoard
Sun E4000 400MHz/8MB$ (tahoma) static loop scheduling
# wall clock time speed up old efficiency balance
of h:mm:ss seconds over tah speed up/ new/old
cpus 1 cpu oma+ #cpus executable other options
1 85:24 5,124 1.00 1.73 1.000 new v8 ws6.0guide3.7hoardxvect=no
1 84:38 5,078* 1.00 1.74 1.000 new v8 ws6.0guide3.8nohoardxvect=no
13 6:46 406 12.51 1.73 0.962 new v8 ws6.0guide3.7hoardxvect=no
13 6:47 407 12.60 0.968 new v8 ws6.0guide3.7nohoardxvect=no
13 6:55 415 12.24 0.941 new v8 ws6.0guide3.8hoardxvect=no
13 6:45 405 12.54 0.964 new v8 ws6.0guide3.8nohoardxvect=no
23 4:07 247 20.56 +2.85 0.894 new v8 ws6.0guide3.7hoardxvect=no
23 4:08 248 20.66 0.898 new v8 ws6.0guide3.7nohoardxvect=no
23 4:08 248 20.66 0.898 new v8 ws6.0guide3.8hoardxvect=no
23 4:04 244 20.81 +2.88 0.905 new v8 ws6.0guide3.8nohoardxvect=no
13 6:57 417 *12.29 1.87 0.945 new v8 ws6.0guide3.7hoard
13 7:28 448 *11.44 -- 0.879 new v8 ws6.0guide3.7nohoard
+ ==> 704/x (704 sec = the old-CPU 13-processor best above; used where no old run exists at that CPU count)
(static and dynamic loop scheduling give same results for COMPAQ)
(OMP_NESTED and OMP_DYNAMIC make no difference (7 tests run 3/31/2000))
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave
# OPT wall clock time speed up efficiency balance Tahomas est. WA|OR
of lvl mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 O5 42:33 2,553 1.00 1.000 171.000 0.300 21:16:30
4 O5 13:25 805 3.17 0.793 42.750 0.950 6:42:30
4 O4 12:17 737 3.17 0.793 42.750 1.038 6:42:30
same as above, just different headings:
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 O5 42:33 2,553 1.00 1.000 171.000 no
4 O5 13:25 805 3.17 0.793 42.750 no
4 O4 12:17 737 3.17 0.793 42.750 yes 1.038 4
same as above, with csh -f turned on for these; comparing other options
# buff wall clock time speed up efficiency mmout cxml Tahoma # of
of io mm:ss seconds over speed up/ interval math Factor runs
cpus 1 cpu #cpus Tt / Tr
4 no 12:17 737 3.17 0.793 42.750 no 1.038 8
4 yes 12:57 737 3.17 0.793 42.750 no 1.038 8
4 no 12:25 yes 3
4 yes 12:28 737 3.17 0.793 42.750 yes 1.038 8
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up little mmout Tahoma # of
of lvl mm:ss seconds over endian interval Factor runs
cpus 1 cpu Tt / Tr
4 O4 12:17 737 3.17 no 15 1.038 8
1 O4 38:26 2,306 -- yes 15 ?.??? 4
4 O4 12:08 728 3.17 yes 15 ?.??? 4
1 O4 38:27 2,307 -- yes 60 1.038 4
4 O4 12:16 736 3.13 yes 60 ?.??? 4
COMPAQ ES40 500MHz/4MB$ (rainier) full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 O5 42:33* 2,553* 1.00 1.000 171.000 xx 21:16:30
4 O5 13:05 785 3.29 0.821 42.750 0.975 6:32:30
(* est. since only 3 runs performed with fastest at 43:00)
** ESTIMATES **
COMPAQ ES40s 650MHz/4MB$ (estimate) each with full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 33:30 2,010 1.00 1.000 171.000 xx 16:45:00
4 10:10 611 3.29 0.821 42.750 1.252 5:05:00
## end of 1-hour d3 tests section
########################################################################
########################################################################
## 3-hour d1d2 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
1 2:00:29 7,229 1.00 1.000 guide3.7 4
1 2:04:15 7,455 15 "f77 -fast" 3
13 10:44 644 11.22 0.863 15 guide3.7
13 10:34 634 11.40 0.877 60 guide3.7
13 10:30 630 11.47 0.883 180 guide3.7
13 20:08 1208 xx.xx x.xxx 15 ws6g3.8-O4 1
Sun E4000 400MHz/8MB$ (tahoma) static loop scheduling
# wall clock time speed-up old efficiency balance
of h:mm:ss seconds over tah speed up/ new/old
cpus 1 cpu omas #cpus executable other options
13 6:40 400 xx.xx x.xxx new v8 15 ws5.0guide3.8nohoard
13 16:49 1009 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoard
13 6:05 365 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoardxvect=no
13 6:33 393 xx.xx x.xxx new v8 15 ws6.0guide3.7nohoardxvect=no
23 4:23 263 xx.xx x.xxx new v8 15 ws5.0guide3.8nohoard
23 23:44 1424 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoard
23 3:59 239 xx.xx x.xxx new v8 15 ws6.0guide3.8hoard
23 4:03 243 xx.xx x.xxx new v8 15 ws6.0guide3.8nohoardxvect=no
23 4:23 263 xx.xx x.xxx new v8 15 ws6.0guide3.7nohoardxvect=no
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
csh -f column indicates if filecommand.csh had "#!/bin/csh -f" as its 1st line
(note: a version of the v2.12 code that did not have our FILECOMMAND
mods ran slightly slower than our version, and both had "csh -f")
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 35:41 2,141 1.00 1.000 15 no 1
1 35:30 2,130 1.00 1.000 15 yes 4
1 35:13 2,113 1.00 1.000 60 yes 4
1 35:13 2,113 1.00 1.000 180 yes 1
4 O5 12:03 723 2.95 0.738 15 no 0.891 4
4 O5 11:05 665 3.21 0.803 15 yes 0.968 4
little-endian speed up:
1 O4 33:02 1,982 1.00 1.075 15 little_end 4
4 O4 10:20 620 3.20 1.039 15 little_end 4
1 O4 32:55 1,975 1.00 1.070 60 little_end 4
4 O4 10:16 616 3.21 1.011 60 little_end 4
4 O4 10:44 644 3.307 0.827 15 yes 1.000 4
4 O4 12:44 644 **no speculate or pipeline** 15 yes 1.000 4
4 O4 11:07 667 NCAR's flags 15 yes 0.966 4
4 O5 10:51 651 3.25 0.811 60 no 0.974 4
4 O5 10:51 651 3.25 0.811 60 yes 0.974 4
4 O4 10:23 623 3.39? 0.xxx 60 yes 1.018 4
4 O4 10:52 652 NCAR's flags 60 yes 0.972 1
4 O5 10:44 644 3.28 0.820 180 no 0.978 4
4 O5 10:42 642 3.29 0.823 180 yes 0.981 3
4 O4 10:14 614 3.44? 0.xxx 180 yes 1.026 4
## end of 3-hour d1d2 simulations section
########################################################################
########################################################################
## 3-hour d3 simulations:
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:32:16 16,336 1.00 1.000 171.000
13 27:30 1,650 9.90 0.762 13.154
1650/1338 = 23%
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
13 22:18 1,338 12.21 0.939 13.154 1.731 7.43
Sun E6500 400MHz/8MB$/80MHz (buddy)
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
(assumed this time for 1 CPU)
13 23:18 1,398 11.69 0.899 13.154 1.657 7.77
19 16:29 989 16.52 0.869 9.000 2.342 5.49
25 12:30 750 21.78 0.871 6.840 3.088 4.17
29 10:56 656 24.90 0.859 5.897 3.530 3.64
original executable compiled on tahoma for 4MB cache for everything below
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 23:02 1,382 11.90 0.915 13.154
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 27:57 1,677 9.80 0.754 13.154
1677/1382 = 21%
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 31:33 1,893 - - 13.154
Sun E4500 336MHz/4MB$ (hayes) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 35:05 2,105 - - 13.154
2105/1893 = 11%
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 8:01:48 28,908 1.00 1.000 171.000
4 x:xx:xx xx,xxx x.xx 0.xxx 42.750
8 x:xx:xx xx,xxx x.xx 0.xxx 21.375
13 38:36 2,316 12.48 0.xxx 13.154
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 7:29:27 26,967 1.00 1.000 171.000
4 2:01:48 7,308 3.69 0.923 42.750
8 1:03:39 3,819 7.06 0.883 21.375
13 42:01 2,521 10.67 0.823 13.154
## end of 3-hour d3 simulations section
########################################################################
## end of 3-hour d1d2 simulations section
########################################################################
########################################################################
## 2-hour d3 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
13 26:14 1574 ?.??? 0.xxx 15 ?.??? 3?
13 26:07 1567 ?.??? 0.xxx 60 ?.??? 2?
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up efficiency mmout Tahoma # of
of lvl mm:ss seconds over speed up/ interval Factor runs
cpus 1 cpu #cpus Tt / Tr
4 O4 25:41 1540 ?.??? 0.xxx 15 ?.??? 4
4 O4 24:44 1484 ?.??? 0.xxx 60 ?.??? 3?
4 O4 26:41 1601 ?.??? 0.xxx 180 ?.??? 2?
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000) from:
## ftp://ftp.ucar.edu/mesouser/MM5V2/MM5/mm5v2.tar.Z
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/mminput_nh_data.tar.gz
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/benchmark_config.tar.gz
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 19:50 1190 -- 1.000 4
4 5:31 331 3.410 0.852 4
8 3:11 191 6.23** 0.779** 4 **see note below
13 2:05 125 9.520** 0.732** 4 **I believe the run is too
short to see our true speed
up which is more like 0.87
efficiency.**
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 14:39 879 1.000 1.000 4
4 4:05 245 3.588 0.897 4
8 2:22 142 6.190** 0.774** 4 **see note above
13 1:35 95 9.253** 0.711** 4 **see note above
COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 4:47 287 1.000 1.000 4
4 NCAR 1:28 88 3.261 0.815 4
COMPAQ DS10 466MHz/4MB$ (EV??) 1-way interleave running in /var/tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 5:19 319 1.000 1.000 5
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000)
## run for twice as long
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
1 40:07 2407 x.xxx 0.xxx 1
4 11:13 673 3.577 0.894 2
8 6:26 386 6.236 0.779 2
13 4:14 234 10.286 0.791 2
COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
1 9:42 582 x.xxx 0.xxx 4
2 5:15 315 1.848 0.924 4
4 2:58 178 3.270 0.817 3