University of Washington MM5 Benchmarks

(See also the older benchmarks below and the CPU2000 benchmarks at spec.org.)

July 2014 Comparisons

Chip notes (spec.org references, node assignments, prices):
  E5-2650v2 @ 2.6 GHz   specfp@2006 wrf=78.4, 481.wrf specspeed=89-104
  E5-2637v2 @ 3.5 GHz   specfp@2006 wrf=89.6 (vs 2620v2 58.7), 481.wrf specspeed=
  E5-2620v1 @ 2.0 GHz   b1,b2,b3,b4 on bob cluster, 481.wrf specspeed=57.2
  E5-2620 = E5-2620 v2 @ 2.1 GHz   new a113-a116, $406, 481spec=471, 481.wrf specspeed=65.5
  E5645   = E5645 @ 2.4 GHz   a109-a112, n18,n19,n20,n21 (x12), $551, 481.wrf specrate=210, specspeed=42.3
  E5620   = E5620 @ 2.4 GHz   n1-n8,n15,n16,n17 (x8), $387, 481.wrf specrate=169, specspeed=42.0
  E5-2637               $996, 481specrate=504, 481.wrf specspeed=89.6
  E5-2650v3             481specrate=684, 481.wrf specspeed=94.8 baseline, 98 peak (Dell)

I/O write tests (a113 and a114 are E5-2620 v2 2.1 GHz; b3, on bob, is E5-2620 v1 2.0 GHz).
Each run writes 2 restart times and 7 wrfouts per domain; write times in seconds:
  a113-RAID    55:47   restarts 235.57, wrfouts 115.22, overall writes 350.79
  a114-disk    54:36   restarts 236.34, wrfouts 151.86, overall writes 388.19
  a113Rsplit   51:59   restarts  51.29, wrfouts  20.48, overall writes  71.78
  a114dsplit   52:02   restarts  79.39, wrfouts  57.10, overall writes 136.49
  b3 to SSD    55:37   restarts 209.21, wrfouts 124.93, overall writes 334.14
  b3 to RAID   56:31   restarts 241.82, wrfouts 151.17, overall writes 392.99
  projected savings with SSD:

d4 rerun tests (/home/disk/sage2/mm5rt/nobackup/runtest/2014071500/d4rerundir):
                                                             24 x    thruput
                                                           1 of same  E5645
  Chips                    Dom Hr ver  ncpu            time   ftime  spec.org factor factor
  n18  E5645 @ 2.4 GHz      d4  1  361   48           20:27    1147   3.82
  n18  E5645 @ 2.4 GHz      d4  1  361   24           35:54    2060   1.80
  n18  E5645 @ 2.4 GHz      d4  1  361   12           78:08    4378
  a109 E5645 @ 2.4 GHz      d4  1  361   48           21:36    1116
  a109 E5645 @ 2.4 GHz      d4  1  361   24           39:12    2142
  a109 E5645 @ 2.4 GHz      d4  1  361   12           80:41    4343
  a113 e5-2620v2                1  361   48           14:42     741   3.69  5.86
  a113 e5-2620v2            d4  1  361   24           24:42    1330   2.06  3.27
  a113 e5-2620v2            d4  1  361   12           48:43    2736   1.0   1.59
  n51  E5-2650v3 @ 2.3 GHz      1   35   48           16:12     972   -- bad or w/o infiniband
  n51  E5-2650v3 @ 2.3 GHz      1  361   48           12:31     627   3.40
  n51  E5-2650v3 @ 2.3 GHz      1  361  160            5:56     242   8.81
  n51  E5-2650v3 @ 2.3 GHz      1  361  140            6:22     269   7.92
  n51  E5-2650v3 @ 2.3 GHz      1  361   80            8:50     415   5.13
  n51  E5-2650v3 @ 2.3 GHz      1  361   24           21:02    1133   1.88
  n51  E5-2650v3 @ 2.3 GHz      1  361   12           38:06    2131   1.00
  n51  E5-2650v3 @ 2.3 GHz      1  361   12.1x20.slot    38:06  2131  1.00
  n51  E5-2650v3 @ 2.3 GHz      1  361   48.3x20.slot    12:31   627  3.40
  n58  E5-2650v3 @ 2.3 GHz      1  361   48.3x20.socket  11:28   572  3.725
  n58  E5-2650v3 @ 2.3 GHz      1  361   48.3x20.slot    11:23   572  3.725
  n58  E5-2650v3 @ 2.3 GHz      1  361   48.3x16.socket  10:45   528
  n58  E5-2650v3 @ 2.3 GHz      1  361   48.3x16.node8   10:44   529  using -map-by ppr:8:socket
  n58  E5-2650v3 @ 2.3 GHz      1  361   48.3x16.slot    11:22   567
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.2x20.socket  19:07  1026
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.2x20.slot    19:12  1029
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.2x16.socket  17:34   940
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.2x16.slot    18:53  1011
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.2x12.socket  16:31   873
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.2x12.slot    18:53  1011
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.3x10.socket  16:07   850
  n58  E5-2650v3 @ 2.3 GHz      1  361   24.3x10.slot    19:13  1037

48-hr d4 runs for the 2014111712 case (factor is vs the current n21-n1 configuration):
  Chips                     Hr  ver       ncpu     time    seconds  factor
  n21-n1 asstd (current)    48   35        132   7:01:48    25308   1.000
  n51  E5-2650v3 @ 2.3 GHz  48  361.avx1   160   3:17:19    11839   2.138   1.596 vs 80 cpu
  n51  E5-2650v3 @ 2.3 GHz  48  361.avx1   160   3:11:58    11518   2.197   1.641 vs 80 cpu [/work/restrts]
  n51  E5-2650v3 @ 2.3 GHz  48  361.avx1   140   3:30:49    12649   2.001
  n51  E5-2650v3 @ 2.3 GHz  48  361.avx1   120   3:55:51    14151   1.788
  n51  E5-2650v3 @ 2.3 GHz  48  361.avx1   100   4:32:06    16326   1.550
  n51  E5-2650v3 @ 2.3 GHz  48  361.avx1    80   5:15:04    18904   1.339

notftime (same runs; the seconds column is raw wall clock rather than ftime):
  n51  E5-2650v3 @ 2.3 GHz      1  361   48   12:31    751   3.04
  n51  E5-2650v3 @ 2.3 GHz      1  361  160    5:56    356   6.42
  n51  E5-2650v3 @ 2.3 GHz      1  361   80    8:50    530   4.31
  n51  E5-2650v3 @ 2.3 GHz      1  361   24   21:02   1262   1.81
  n51  E5-2650v3 @ 2.3 GHz      1  361   12   38:06   2286   1.00
  n51  E5-2650v3 @ 2.3 GHz      1   35   40   24:01   1441   -- all on 1 machine
  n51  E5-2650v3 @ 2.3 GHz      1   35   24   26:20   1580   -- bad or w/o infiniband
  n51  E5-2650v3 @ 2.3 GHz      1   35   12   33:36   2016   -- bad or w/o infiniband
  n51  E5-2650v3 @ 2.3 GHz      1   35  140    7:32    452
  n51  E5-2650v3 @ 2.3 GHz      1   35  160    6:36    396
  n51  E5-2650v3 @ 2.3 GHz      1   35  160    7:03    423   using ensm-ssd ftdiff=369
  n51  E5-2650v3 @ 2.3 GHz      1   35   80   12:39    759

spec.org 481.wrf, Sep 2014:
  e5-2620v3   481.wrf rate   = 562
  e5-2650v3   481.wrf rate   = 702  = 1.25
  e5-2620v2   481.wrf normal = 71.3
  e5-2650v2   481.wrf normal = 90.9 = 1.27
  e5645 @ 2.4 normal = 114

                                       24 x    thruput
                                     1 of same  E5645
  Chips     Dom   Hr  ver  ncpu   best  =sec   spec.org     factor  factor
  E52620    d4t1   1  3.5    48  12:34   754   380 474 65   1.778   1.662   a113-a116
  E52620    d4t1   1  3.5    24  22:30  1350   380          1.000   1.639   a113-a116
  E52620    d4     1  3.5    48  12:28   748   380 474 65   1.778   1.662   a113-a116
  E52620    d4     1  3.5    24  22:10  1330   380          1.000   1.639   a113-a116
  E52620    d4     1  3.5    12  45:01  2701   380          1.000   1.639   a113-a116
  E52620v1         1  361    12  63:10                @ 2.0 GHz  b1
  E52620v1         1  361    12  54:20                @ 2.0 GHz  b3
  E5645     d4     1  3.5    48  20:43  1243   208          1.750   1.000   a109-a112e
  E5645     d4     1  3.5    24  36:20  2180   208 196 40   1.000   1.000   a109-a112e
  E5645     d4     1  3.5    12  71:02  4262   208 196 40   1.000   1.000   a109-a112e
  E5645     d4     1  3.5    48  24:39  1479   208 196 40   1.000   1.000   n18-n21, using node1 rundir, 4x12
  E5620     d4     1  3.5    72  13:52   832   170 195 42   4.362   1.991
  E5620     d4     1  3.5    60  14:58   898   170 195 42   4.041   1.991
  E5620     d4     1  3.5    48  18:15  1095   170 195 42   3.314   1.991   n1-n6, 6x8
  E5620     d4     1  3.5    48  18:15  1095   170 195 42   3.314   1.991
  E5620     d4     1  3.5    24  32:06  1926   170          1.884   1.132
  E5620     d4     1  3.5    12  60:29  3629   170          1.000
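The factor columns in the d4 rerun table are not labeled very explicitly. A minimal Python sketch of how they appear to be derived (the helper name and the particular rows chosen are mine; the numbers are taken from the table above):

    # Rough sketch, assuming the first factor is the 12-CPU time of the same
    # chip divided by this run's time, and the second factor uses the 12-CPU
    # E5645 run (a109) as its baseline.
    def factor(baseline_seconds, run_seconds):
        return baseline_seconds / run_seconds

    n51_12cpu  = 2131   # n51 E5-2650v3, 12 CPUs (ftime column)
    a109_12cpu = 4343   # a109 E5645, 12 CPUs
    a113_48cpu = 741    # a113 e5-2620v2, 48 CPUs
    a113_12cpu = 2736   # a113 e5-2620v2, 12 CPUs

    print(round(factor(a113_12cpu, a113_48cpu), 2))   # 3.69  (first factor, same-chip 12-CPU baseline)
    print(round(factor(a109_12cpu, a113_48cpu), 2))   # 5.86  (second factor, vs 12-CPU E5645)
    print(round(factor(n51_12cpu, 242), 2))           # 8.81  (n51 at 160 CPUs)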

COMPAQ Memory Interleave Comparisons

########################################
6/16/2001  Running Ensemble domain benchmarks with different memory interleaving
C = Compaq (C500 = ES40, C667 = DS20, C833 = ES-40)
NOTE: executable was NOT recompiled between runs!

                                                   1 of same   INTERLEAVE
              Dom  Hr  ver    best   =sec  worst   CPU factor  factor      # of runs
ES-40 1-way interleave (May tests on chocolat, .5G and 2G module)
  C833x1x4    uw   24  3.3  199:17  11957  12386     0.785      1.000      2x4
  C833x1      uw   24  3.3  156:21   9381   9431     1.000      1.000      2
  C833x2      uw   24  3.3   90:31   5431   5456     1.727      1.000      2
  C833x4      uw   24  3.3   52:52   3172   3284     2.957      1.000      2
ES-40 2-way interleave (June tests on chocolat, 2x2G + 1G module)
  C833x1x4    uw   24  3.3  168:12  10092  10323     0.907      1.185      2x4
  C833x1      uw   24  3.3  152:32   9152   9227     1.000      1.025      2
  C833x2      uw   24  3.3   85:34   5134   5161     1.783      1.058      2
  C833x4      uw   24  3.3   46:26   2786   2857     3.285      1.139      2
ES-40 4-way interleave (July tests on chocolat, 4x2G module)
  C833x1x4    uw   24  3.3  158:48   9528   9552     0.940      1.255      2x4
  C833x1      uw   24  3.3  149:12   8952   8961     1.000      1.048      2
  C833x2      uw   24  3.3   82:04   4924   4932     1.818      1.103      2
  C833x4      uw   24  3.3   44:12   2652   2676     3.376      1.185      2

Utah and UW Ensemble Benchmarks

########################################
5/3/2001  Running Ensemble domain benchmarks on various platforms
S = Sun (S750 = 750MHz UltraSPARC-III Sun Blade 1000)
C = Compaq (C500 = ES40, C667 = DS20)
A = AMD Athlon (TCP/IP protocol)
V = AMD Athlon with Via network card
I = Intel
A1200 = Tyan Thunder K7 Dual 1.2GHz AthlonMP, 2x256MB simms ($1375!)

                                                  1 of same   Compaq 667MHz
              Dom   ver      best   =sec  worst   CPU factor  factor   # of runs
  C667x1      utah  3.3    142:55   8575   8982     1.000      1.000   2
  C667x2      utah  3.3     80:57   4857   4866     1.765      1.765   2
  C500x1      utah  3.3    214:35  12875   ----     1.000      0.666   1
  C500x2      utah  3.3    109:53   6593   6615     1.953      1.301   2
  C500x4      utah  3.3     68:16   4096   4334     3.143      2.094   2
  A800x1      utah  3.3    318:05  19085  21631     1.000      0.449   2
  I1400x1     utah  3.3    270:xx
  A1333x1     utah  3.3   (176:51) 10611  mpp/1.08             0.808   est.
  C600x1      utah  3.3    174:02  10442  10502     1.000      0.821   2   DS-10
  C833x1x4    utah  3.3    142:19   8539   8662     0.845      1.004   2x4
  C833x1      utah  3.3    120:18   7218   7232     1.000      1.188   2
  C833x2      utah  3.3     67:21   4041   4058     1.786      2.122   2
  C833x4      utah  3.3     39:13   2353   2356     3.068      3.644   2
  S400x12     utah  3.3     51:26   3086   ----     -----      2.779   1
  S400x16     utah  3.3     40:19   2419   ----     -----      3.545   1
  S400x23     utah  3.3     33:40   2020   ----     -----      4.245   1

                          |      Timings         | Same CPU |  C667x1 Factors  | Number
  Chips       |Domain| Code   |  best  =sec  worst | Factor  |  mpp   | non-mpp | of runs
  C667x1      utah  3.3mpp   155:18   9318   9340    1.000     1.000    0.920    2   DS-20E
  C667x2      utah  3.3mpp    90:57   5457   5524    1.708     1.708    1.571    3
  C500x1      utah  3.3mpp   214:44  12884   ----    1.000     0.723    0.666    1   ES-40
  C500x2      utah  3.3mpp   121:20   7280   7290    1.770     1.280    1.178    2
  C500x4      utah  3.3mpp    78:36   4716   ----    2.732     1.976    1.818    1
  A1333x1     utah  3.3mpp   191:21  11481   ----    1.000     0.812    0.747    ?   Beowulf
  A1333x2     utah  3.3mpp   118:09   7089   ----    1.620     1.314    1.210    ?
  A1333x4     utah  3.3mpp    78:45   4725   ----    2.430     1.972    1.815    ?
  A1333x8     utah  3.3mpp    51:58   3118   ----    3.682     2.988    2.750    ?
  A1333x12    utah  3.3mpp    46:18   2778   ----    4.133     3.354    3.087    ?
  A1333x16    utah  3.3mpp    42:11   2531   ----    4.536     3.682    3.388    ?
  V950x1      utah  3.3mpp   282:28  16948   ----    1.000     0.550    0.506    ?
  V950x2      utah  3.3mpp   155:03   9303   ----    1.822     1.002    0.922    ?
  V950x4      utah  3.3mpp    88:30   5310   ----    3.192     1.755    1.615    ?
  V950x8      utah  3.3mpp    54:10   3250   ----    5.215     2.867    2.638    ?
  V950x12     utah  3.3mpp    46:28   2788   ----    6.079     3.342    3.076    ?
  V950x16     utah  3.3mpp    40:46   2446   ----    6.929     3.809    3.506    ?

24-hour UW Ensemble Runs

                                                  1 of same   Compaq 667MHz
              Dom  Hr  ver     best   =sec  worst  CPU factor  factor   # of runs
DS-20E
  C667x1      uw   24  3.3   188:12  11292   ----    1.000      1.000   1 (97%)
  C667x1 singleproc          189:37  11377   ----    x.xxx      x.xxx   1 (97%)
  C667x2      uw   24  3.3   110:34   6634   7017    1.702      1.702   1 (191%)
DS-10
  C600x1      uw   24  3.3   228:30  13710  14058    1.000      0.824   2   DS-10
ES-40
  C500x1      uw   24  3.3   251:25  15085   ----    1.000      0.749   1
  C500x2      uw   24  3.3   143:12   8592   8647    1.756      1.314   2
  C500x4      uw   24  3.3    88:02   5282   5442    2.856      2.138   2 (389%)
  C500x4      uw   48  3.3   174:41  10481                              2 (383%)
ES-40
  C833x1x4    uw   24  3.3   158:48   9528   9552    0.940      1.185   2x4
  C833x1      uw   24  3.3   149:12   8952   8961    1.000      1.261   2
  C833x2      uw   24  3.3    82:04   4924   4932    1.818      2.293   2
  C833x4      uw   24  3.3    44:12   2652   2676    3.376      4.258   2

AMD Dual-Processor 1.2 GHz Tyan Motherboard, PGF77 compiler
  A1200x2     uw   48  3.3   243:59  14639                      1.543
  FCFLAGS = -I$(LIBINCLUDE) -fast -Mcray=pointer -tp p6 -pc 32 -byteswapio -Mvect=prefetch,cachesize:393216 -mp -Mnosgimp
  LDOPTIONS = $(FCFLAGS)
  LOCAL_LIBRARIES = -lnsl -lm

IBM PowerPC 375 MHz (-O2 is faster than new compiler with -O3)
  P375x1      uw   48  3.3   662:48  39768  39791    1.000      x.xxx   2
  P375x2      uw   48  3.3   346:10  20770  20784    1.915              2
  P375x4      uw   48  3.3   207:09  12429  12515    3.200              2
  S400x27     uw   48  3.3    67:22   4042

48-hour Jun01 real-time 36/12 km with 37 levels, 10/1/2001 and 3/21/2002:
                                                         rainier-factor
  A1200x1     uwrt37  48  3.4   825:52  49552
  A1200x2     uwrt37  48  3.4   471:01  28261                 0.74
  C500x4      uwrt37  48  3.4   346:22  20782                 1.00
  S400x27     uwrt37  48  3.4   108:36   6516                 3.19
3/21/2002 ws7guide40
  S400x27     uwrt36  48  3.4   112:00   6720                        workshop7,guide3.8 (11% wow!)
  S400x27     uwrt37  48  3.4   125:09   7509                 2.77   workshop6,Guide3.8
  S400x27     uwrt36  48  3.4   142:57   8577                        workshop7,noguide (14% slower)

UW Ensemble Domain Summary Table

S = Sun (S750 = 750MHz UltraSPARC-III Sun Blade 1000)
C = Compaq (C500 = ES40, C667 = DS20)
A = AMD Athlon (TCP/IP protocol)
A1200 = Tyan Thunder K7 Dual 1.2GHz AthlonMP, 2x256MB simms ($1375!)
V = AMD Athlon with Via network card
I = Intel
P = IBM PowerPC

The implied MM5 comparison among AMD, Intel, and Sun is 177.mesa in the www.spec.org CFP2000
suite; this doesn't work well for AMD vs the IBM P375, however, and it underestimates the
Compaqs. None of them match very well.

                                               1 of same   Compaq 667MHz
              Dom  Hr   ver   best  =sec  worst  CPU factor  factor   # of runs
  A1200x1     uw    3   3.3  25:51  1551   ----    1.000      0.901   1
  A1200x2     uw   (3)  3.3  15:14   914   ----    ~1.7       1.528   1   1/16 of 48-hr run
  I1400x1     uw    3   3.3  36:45  2205   ----    1.000      0.634   1
  C833x1      uw   (3)  3.3  18:39 (1119)  ** 1/8 of 24-hr ** 1.248   (2)
  C833x2      uw   (3)  3.3  10:15  (615)          1.818      2.272   (2)
  C833x4      uw   (3)  3.3   5:31  (331)          3.376      4.221   (2)
  C667x1      uw    3   3.3  23:17  1397   1422    1.000      1.000   2
  C667x2      uw    3   3.3  13:44   824    827    1.695      1.695   2
  C600x1      uw    3   3.3  28:49  1729   1752    1.000      0.808   2
  C500x1      uw    3   3.3  31:14  1874   1889    1.000      0.745   2
  C500x2      uw    3   3.3  17:46  1066   1077    1.758      1.311   4
  C500x4      uw    3   3.3  10:36   636    661    2.947      2.197   2
  S400x1      uw    3   3.3  70:52  4252   4260    1.000      0.329   2
  S400x2      uw    3   3.3  35:31  2131   2140    1.995      0.656   2
  S400x4      uw    3   3.3  18:12  1092   ----    3.894      1.280   1
  S400x8      uw    3   3.3   9:35   575    578    7.395      2.430   2
  S400x12     uw    3   3.3   6:46   406    407   10.473      3.441   2
  S400x23     uw    3   3.3   4:26   266    288   15.985*     5.252*  2
  S750x1      uw    3   3.3  43:32  2612   ----    1.000      0.535   1

* --> this run is too short to get the scaling factors right; we generally see a 0.8 scaling
  factor, so it should be about 18.4 times as fast as a single-processor run.
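A short sketch of the footnote's arithmetic (the 0.8 parallel efficiency is the assumption stated above; the timings are the S400 rows in the table):

    cpus, assumed_efficiency = 23, 0.8
    expected_speedup = cpus * assumed_efficiency   # 18.4x, as the footnote says
    measured_speedup = 4252 / 266                  # ~16.0x from this (too short) run
    print(expected_speedup, round(measured_speedup, 3))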

Conclusions for Utah and UW Ensemble Benchmarks

1) MPP is about 8% slower than the regular code.
2) One Athlon 1.3GHz chip for $1000 (+ a pittance for the PG compiler) is nearly identical to a DS-10 600MHz!!!
3) Four Athlon 1.3GHz chips run the MPP code as fast as an ES-40 with 4x500MHz chips ($4K vs $40K?).
4) For ensembles, it makes more sense to run multiple MM5s on single processors than to run them back to back on all processors (see the sketch after this list).
5) 16 Athlon 950 MHz chips coupled with Via networking can run the code as fast as 16 Sun E6500 400 MHz chips. For overall speed in high-resolution runs, you need a large cluster to equal the power of a Sun E6500 because scaling is so poor in clusters.
6) The faster the chip, the worse the scaling.
7) To run our current ensembles (36/12, max dimension 101x137) out to 48 hours, we should expect these results for MM5v3.x standard physics:
     Athlon 1333 MHz        (7:44)  (81% of the speed of the 667 running the same code)
     DS-10  600 MHz          7:37
     ?2001  DS-10 667 MHz    6:16   (Do they make this? or just 600 MHz?)
     ?2002  DS-10 833 MHz    5:13   (assuming ES40 performance of the chip)
     DS-20E 1x667 MHz        6:16   (run 2 simultaneously in less than 2x3:41 = 7:22)
     DS-20E 2x667 MHz        3:41   Jun01
     DS-20E 2x833 MHz        2:57   (due out ...)
     ES-40  4x500 MHz        3:01
     ES-40  4x667 MHz        2:11   (too bad we never got that promised free chip upgrade!)
     ES-40  4x833 MHz        1:43   (but you shouldn't run it this way; instead run 4 simultaneously in 6:38 vs this 6:52)
     23 of tahoma's CPUs     1:00
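A small sketch of the arithmetic behind points 4 and 7 (the 1:43 and 6:38 figures are the ES-40 4x833 MHz entries above; the variable names are mine):

    # Four ensemble members on an ES-40: back-to-back on all 4 CPUs vs
    # one member per CPU, all run simultaneously.
    members = 4
    per_member_4cpu = 1 * 3600 + 43 * 60      # 1:43 using all four CPUs
    per_member_1cpu = 6 * 3600 + 38 * 60      # 6:38 on a single CPU

    back_to_back = members * per_member_4cpu  # 24720 s = 6:52 until the last member finishes
    simultaneous = per_member_1cpu            # 23880 s = 6:38 -- all members finish together
    print(back_to_back, simultaneous)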

Reisner2 and Vertical Levels

########################################
4/7/2001 - present   Comparing v3.4 runtimes of current and new domains on Sun E6500 23x400 MHz
[c,m] = [current,montana]

              Hr  ver      best  =sec  worst  f77  Guide  stack (MB)  TAP,BUF,INC  # of runs
  cd1d2        6  3.3     11:48   708   793   6.0   3.8    dF 384     15,15,4,1    4
  cd1d2        6  3.3     11:41   701   785   6.0   3.8    dT 384     15,15,4,1    5
  cd1d2        6  3.3     10:18   618   678   6.0   3.8    dF 384     365,1        4
  cd1d2        6  3.3     10:19   619   691   6.0   3.8    dT 384     365,1        3
  cd1d2        6  3.3Re   18:57  1137  1191   6.0   3.8    dF 384     365,1        3   ==> Reisner2 cost is 1.84 for version 3.3!
  cd1d2        6  3.4Re   19:08  1148  1198   6.0   3.8    dT 384     15,15,4,1    5
  cd1d2       24  3.4     48:47  2927  ----   6.0   3.8    dT 384     15,15,4,1    1
  cd1d2       24  3.4Re   88:33  5313  ----   6.0   3.8    dT 384     15,15,4,1    1
  md1d2       24  3.4Re  113:01  6781  ----   6.0   3.8    dT 384     15,15,4,1    1

[c,m] = [current,montana]
              Hr  lvls     best  =sec  worst  f77  Guide  stack (MB)  TAP,BUF,INC  # of runs
  cd1d2       24   32     48:47  2927  ----   6.0   3.8    dT 384     15,15,4,1    1
  md1d2       24   32     59:46  3586  ----   6.0   3.8    dT 384     15,15,4,1    1
  cd1d2       24   37     59:57  3597  ----   6.0   3.8    dT 384     15,15,4,1    1
  md1d2       24   37     71:57  4317  ----   6.0   3.8    dT 384     15,15,4,1    1
  md3          8   32     81:51  4911  ----   6.0   3.8    dT 384     60,60,1,1    1
  md3          8   37     94:06  5646  ----   6.0   3.8    dT 384     60,60,1,1    1
  cd1d2       60   32   2:02:xx  ----  ----   6.0   3.8    dT 384     15,15,4,1    est.
  cd1d2       60   37   2:30:xx  ----  ----   6.0   3.8    dT 384     15,15,4,1    est.
  md1d2       60   32   2:29:xx   est.
  md1d2       60   37   3:00:xx   est.
  md3         24   32   4:06:xx   est.
  md3         24   37   4:50:xx   est.

--------------------------------
60-hour forecast on tahoma (23 processors):
                             Times (hh:mm)
  Domain    Code  Lvl  Phys  Clock   Start  End (am/pm)
  -------   ----  ---  ----  -----   -----  -----------
  Cur d1d2  2.12   32  simp   1:50    7:16    9:06
  Cur d1d2  3.4    32  simp   2:02      "     9:18
  Cur d1d2  3.4    37  simp   2:30      "     9:46   (+ 22.9%)
  Mon d1d2  3.4    32  simp   2:29      "     9:45   (new domain + 22.5%)
  Mon d1d2  3.4    37  simp   3:00      "    10:16   (+ 20% for levels)

60-hour forecast on tahoma (29-processor estimates, 10.2% faster than 23):
                             Times (hh:mm)
  Domain    Code  Lvl  Phys  Clock   Start  End (am/pm)
  -------   ----  ---  ----  -----   -----  -----------
  Cur d1d2  2.12   32  simp  (1:40)   7:16    8:56
  Cur d1d2  3.4    32  simp  (1:50)     "     9:06
  Cur d1d2  3.4    37  simp  (2:16)     "     9:32
  Mon d1d2  3.4    32  simp  (2:15)     "     9:31
  Mon d1d2  3.4    37  simp  (2:43)     "     9:59

Reisner2
--------------------------------
60-hour forecast on tahoma (23 processors):
                             Times (hh:mm)
  Domain    Code  Lvl  Phys  Clock   Start  End (am/pm)
  -------   ----  ---  ----  -----   -----  -----------
  Cur d1d2  3.4    32  simp   2:02    7:16    9:18
  Cur d1d2  3.4    32  rei2   3:41      "    10:59
  Mon d1d2  3.4    32  rei2   4:42      "    11:58
  Mon d1d2  3.4    37  rei2  (5:23)     "    12:39

36-hour 4km domain
------------------
                             Times (hh:mm)
  Domain    Code  Lvl  Phys  Clock   Start  End (am/pm)
  -------   ----  ---  ----  -----   -----  -----------
  Cur d3    2.12   32  simp   3:33    9:18   12:51
  Cur d3    3.4    32  simp  (3:56)     "     1:14
  Mon d3    3.4    32  simp   4:06    9:45    1:51
  Mon d3    3.4    37  simp   4:42   10:16    2:58   (+ 35.4%)
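The 60-hour "est." clock times above appear to be simple linear extrapolations of the measured 24-hour runs; a minimal sketch of that arithmetic (variable names are mine, the seconds come from the table):

    def extend(seconds_24h, hours=60):
        # Scale a measured 24-hour runtime linearly to a longer forecast.
        return seconds_24h * hours / 24.0

    cd1d2_32lvl = extend(2927)   # ~7318 s = 2:02, matching the cd1d2 60-hr, 32-level estimate
    md1d2_32lvl = extend(3586)   # ~8965 s = 2:29, matching the md1d2 60-hr, 32-level estimate
    print(cd1d2_32lvl, md1d2_32lvl)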

Guide 3.8 vs 3.9 and Workshop 6.0, 6.0u1, 6.0u2

########################################
1/31/2001 - 2/12/2001   Comparing Guide 3.8 and 3.9 and Workshop 6.0 and 6.0u1

Legend: "np" on the f77 version means -xprefetch=no; the first letter in the stack column
({r | d}) is the -WG,scheduling= setting; the second letter ({T | U | F}) is OMP_DYNAMIC
= {True | Unset | False}.

tahoma, Sun E6500, jul00 d1d2 runs in /tmp:
  Dom    Hr  ver    mm:ss = seconds  f77      Guide  stack   TAP,BUF,INC  # of runs
  d1d2    3  2.12    4:10     250    6.0       3.8   d  200  15,15,4,1    5
  d1d2    3  2.12    4:53     293    6.0u1     3.8   d  200  15,15,4,1    5

  Dom    Hr  ver     best  =sec  worst  f77      Guide  stack   TAP,BUF,INC  # of runs
  d1d2    6  2.12   10:43   643   676   6.0       3.8   dT 200  15,15,4,1    4
  d1d2    6  2.12   10:38   638   662   6.0       3.8   dF 200  15,15,4,1    4
  d1d2    6  2.12   10:59   659   694   6.0       3.9   rT 200  15,15,4,1    4
  d1d2    6  2.12   11:03   663   681   6.0       3.9   rF 200  15,15,4,1    4
  d1d2    6  2.12   14:06   846   864   6.0       3.9   dU 200  15,15,4,1    5
  d1d2    6  2.12   11:44   704   739   6.0       3.9   rU 200  15,15,4,1    5
  d1d2    6  2.12   11:48   708   718   6.0u1     3.8   dU 200  15,15,4,1    2
  d1d2    6  2.12   11:25   685   691   6.0u1np   3.8   dU 200  15,15,4,1    2
  d1d2    6  2.12   11:38   698   727   6.0u1     3.9   rU 200  15,15,4,1    2
  d1d2    6  2.12   15:19   919   927   6.0u1     3.9   dU 200  15,15,4,1    4
  d1d2    6  2.12   10:26   626   697   6.0u1np   3.9   rT 200  15,15,4,1    4
  d1d2    6  2.12   10:14   614 ! 834   6.0u1np   3.9   rF 200  15,15,4,1    2

tahoma, Sun E6500, jul00 d3 runs in /tmp:
  Dom    Hr  ver     best  =sec  worst  f77      Guide  stack    TAP,BUF,INC  # of runs
  d3      2  2.12   17:22  1042  1167   6.0       3.8   rT 200   60/15   2
  d3      2  2.12   17:41  1061  1086   6.0       3.8   rF 200   60/15   2
  d3      2  2.12   17:18  1038  1071   6.0       3.9   rT 200   60/15   3
  d3      2  2.12   17:21  1041  1074   6.0       3.9   rF 200   60/15   4
  d3      2  2.12   17:55  1075  1088   6.0u1     3.9   rT 200   60/15   2
  d3      2  2.12   17:55  1075  1086   6.0u1     3.9   rF 200   60/15   2
  d3      2  2.12   18:24  1104  1140   6.0u1np   3.9   rT 200   60/15   2
  d3      2  2.12   18:18  1098  1098   6.0u1np   3.9   rF 200   60/15   2
  d3      2  2.12   19:34  1174  1185   6.0u2     none   F 3000  60/15   2
  d3      2  2.12   19:30  1170  1204   6.0u2     none   T 3000  60/15   3
  d3      2  2.12   19:13  1153  1167   6.0u2np   none   F 3000  60/15   2
  d3      2  2.12   19:13  1153  1180   6.0u2np   none   T 3000  60/15   2
  d3      2  2.12   16:49  1009  1032   6.0       3.8    T 3000  360     2
  d3      2  2.12   16:52  1012     x   6.0hyd    3.8    F 3000  360     1
  d3      2  2.12   16:57  1017  1179   6.0hyd    3.8    T 3000  360     2
  d3      2  2.12   17:45  1065     x   6.0u1     3.9    T 3000  360     1

Older benchmarks

########################################
11/7/2000  Ensemble 36km domain only, 48-hour runs:
  version  mm:ss = seconds  BDYFRQ  # of runs  Machine (CPUs)
  2.12     51:57     3117    180     2         rainier, COMPAQ ES40 (4)
  2.12     69:23     4163    180     2         glacier, COMPAQ DS20 (2)
  2.12     20:22     1222    180     2         tahoma,  Sun E6500  (23)

########################################################################
########################################
7/19/2000  tahoma, Sun E6500, jul00 runs in /tmp:
  Dom    Hr  ver    mm:ss = seconds  f77  Guide  stack  TAP,BUF,INC  # of runs
  d1d2    6  2.12   12:23     743    6.0   3.8    200   180          2
  d1d2    6  2.12   12:23     743    6.0   3.8    200   180          2
  d1d2    6  3.3    12:16     736    6.0   3.8    200   180          2
  d1d2    6  2.12   12:53     773    6.0   3.8    200   15,15,4,1    2
  d1d2    6  3.3    13:47     827    6.0   3.8    200   15,15,4,1    2  *
  d1d2    6  3.3    13:56     836    6.0   3.8    200   15,0,4,1     2  *
  d1d2    6  3.3    14:28     868    6.0   3.8    200   15,0,4,1     2  *  origouttap
  d1d2    6  2.12   13:17     797    6.0   3.8    200   15,15,1,1    2  *  I/O = 3%
  d1d2    6  3.3    15:10     910    6.0   3.8    200   15,0,1,1     2
  * these indicate that my mods to outtap.F are not slowing down the model in any significant way
  d1d2   60  2.12  127:27    7647    6.0   3.8    200   15,15,4,1    2000100600
  d3     24  2.12  217:58  13,078    6.0   3.8    200   60,60,1,1    2000100600
  d1d2   48  2.12  108:33    6513    6.0   3.8    200   15,15,4,1    1
  d1d2   48  2.12  104:47    6287    6.0   3.8    200   180          1
  d1d2   48  3.3   108:55    6535    6.0   3.8    200   15,15,4,1    1
  d1d2   48  3.3   102:35    6155    6.0   3.8    200   15,15,4,1    1
  d3     24  2.12  220:45  13,245    6.0   3.8    200   60,60,1,1    1
  d3     24  3.3   248:12  14,892    6.0   3.8    200   60,60,1,1    1   12.4%

########################################
tahoma, Sun E6500, 3-hour jul00 4km domain runs:
  version  mm:ss = seconds  f77  Guide  stacksize  BDYFRQ  # of runs
  2.12     28:35     1715   6.0   3.8     200        60     1
  3.3      30:37     1837   6.0   3.8     200        60     1   ( 7.1% slower)
  3.3      30:30     1830   6.0   3.8     400        60     1   ( 7.1% slower)
  2.12     30:38     1838   5.0   3.7     200        60     3
  3.3      34:31     2071   5.0   3.7     200        60     3   (12.6% slower)

########################################
tahoma, Sun E6500, 2-hour jul00 4km runs in /tmp:
  version  mm:ss = seconds  f77  Guide  stacksize  BDYFRQ  # of runs
  2.12     16:37      997   6.0   3.8     200        60     2
  3.3      19:32     1172   6.0   3.8     200        60     1   (17.5% slower)
  2.12     18:32     1112   5.0   3.7     200        60     2
  3.3      20:43     1243   5.0   3.7     200        60     2   (11.8% slower)

########################################
rainier, COMPAQ ES40, 1-hour jul00 4km runs:
where: Add'l Options = (-tune host -inline speed -pipeline -speculate by_routine)
  version  mm:ss = seconds  BDYFRQ  # of runs  Add'l Options
  2.12     23:14     1394    60      1         yes
  3.3      24:24     1464    60      2         yes  (5% slower)
  2.12     24:13     1453    60      1         no

########################################################################
dec99 (December 1999) real-time MM5 and ensemble typical runtimes
(I/O gtar means Ernie was running tape backups that resulted in heavy I/O slowdowns
for /home/mm5rt rundirs):
----------------------------------------------
s => split CPUs between d1d2/d3 runs
o => old tahoma CPUs (248 MHz)

  MM5RUNDIR  I/O gtar  static sched  CPUs  run   tahoma                            rainier
  ---------  --------  ------------  ----  ----  --------------------------------  -----------
  /tmp       no        yes           s23   d1d2  1:39:54 (wet day 2000061200)      8:57:35 pm
  /tmp       no        yes           s23   d1d2  1:45:18 (wet day 2000060600)
  /tmp       no        yes           s23   d3    2:40:18 (wet day 2000060600)
  /tmp       no        yes           s23   d3    2:40:18 (wet day 2000060600)
  /tmp       no        yes           23    ENS   0:57:46 (cmcgem 2000060500)
  /tmp       no        yes           *24   ENS   0:52:58 (nogaps 2000060200)
  mm5rt      no        no            o13   1,2   3:20:23 (best)
  mm5rt      yes       no            o13   1,2   3:32:25 (worst)
  mm5rt      no        yes           o13   1,2   3:02:23 (best)
  mm5rt      yes       yes           o13   1,2   3:29:28 (worst)
  /tmp       no        yes           o13   1,2   2:56:22 (fast, dry day 2000012900)
  /tmp       no        yes           o13   1,2   3:02:23 (fast, wet day 2000020100)
  /tmp       yes       yes           o13   1,2   3:06:28 (one run)
  /tmp       no        yes           o13   1,2   3:05:25 (one run)

  ensemble runs:
  mm5rt      no   no   yes   3:12:24
  /tmp       no   yes  yes   2:46:25 (best, dry day 2000020400 NGM)
  mm5rt      no   yes  yes   2:48:21 (best)
  mm5rt      no   yes  yes   2:53:00 (avg)
  mm5rt      yes  yes  yes   3:05:32 (worst)
  /tmp       no   yes        2:45:59 (dry, ngm 2000041100)
  /tmp       no   yes        2:50:09 (cmc 2000032900)
  rmm5rt     yes  yes        3:13:06 (best)
             yes            3:18:36 (worst)
             yes            3:24:05 (worst, Sunday 2000020700)

  d3 simulations:
  /tmp (-O4)        no        6:11:34 (some pcpn, 032100)
  /tmp (-O4)        no        6:18:52 (wet day, 031900)
  /tmp (-O4)        yes  no   6:40:33 (convective, 00050200)
  /tmp (full memory)          6:37:04 (23,824 sec  4.6%)
  rmm5rt            no   no   6:47:23 (best, dry day 020400)
  rmm5rt            no   no   6:55:30 (avg)
  rmm5rt            no   no   7:02:00 (wet day, 020100)
  rmm5rt            no   no   7:06:28 (worst)
  /tmp              yes  no   7:02:27 (dry day, first backup using /tmp)
  rmm5rt            yes  no   7:19:32 (worst)

formulas for calculating times of different domains (same physics packages) on tahoma:
  time ~ y-grid pts * x-grid pts * levels * 36-km time step factor

    current 36 45    101x137x32x1        (67 minutes)      81x110x22x(36/45)     (24 minutes)
  + current 12 15    88x88x32x3          (113 minutes)     70x70x22x3x(36/45)    (39 minutes)
  =                  180 minutes                           63 minutes
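A hedged sketch of the scaling formula above: runtime is treated as proportional to y points * x points * levels * the 36-km time-step factor, so a new domain's minutes can be scaled from a measured reference (the 67-minute figure and grid sizes are the ones quoted; the helper function is mine):

    def scaled_minutes(ref_minutes, ref_cost, new_cost):
        # Scale a measured runtime by the ratio of the relative-cost products.
        return ref_minutes * new_cost / ref_cost

    cost_current_36 = 101 * 137 * 32 * 1          # current 36-km domain -> 67 minutes measured
    cost_new_36     = 81 * 110 * 22 * (36 / 45)   # smaller domain, 36/45 time-step factor

    print(round(scaled_minutes(67, cost_current_36, cost_new_36)))   # ~24 minutes, as in the table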
########################################################################
## 1-hour d3 tests (3/28/2000):

Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
  # of   wall clock time      speed up     efficiency        balance
  cpus   h:mm:ss   seconds   over 1 cpu   speed up/#cpus    171 j pts/#cpus
   1     2:27:29     8,849      1.00        1.000            171.000
  13       14:13       853     10.37        0.798            853/765 = 11%

Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
  (columns as above)
   1     2:27:26     8,846      1.00        1.000            171.000
  13       12:45       765     11.56        0.889            ws5.0guide3.7
  13       12:30       750     11.79        0.907            ws5.0guide3.7nohoard
  13       11:44       704                                   ws6.0guide3.7hoardxvect=no
  13       13:03       783                                   ws6.0guide3.7hoard
  13       17:57      1077                                   ws6.0guide3.7nohoard

Sun E4000 400MHz/8MB$ (tahoma) static loop scheduling
  # of   wall clock time     speed up     old       efficiency       new/old
  cpus   h:mm:ss  seconds   over 1 cpu   tahoma+   speed up/#cpus   executable  other options
   1       85:24    5,124      1.00       1.73       1.000          new v8      ws6.0guide3.7hoardxvect=no
   1       84:38    5,078*     1.00       1.74       1.000          new v8      ws6.0guide3.8nohoardxvect=no
  13        6:46      406     12.51       1.73       0.962          new v8      ws6.0guide3.7hoardxvect=no
  13        6:47      407     12.60                  0.968          new v8      ws6.0guide3.7nohoardxvect=no
  13        6:55      415     12.24                  0.941          new v8      ws6.0guide3.8hoardxvect=no
  13        6:45      405     12.54                  0.964          new v8      ws6.0guide3.8nohoardxvect=no
  23        4:07      247     20.56      +2.85       0.894          new v8      ws6.0guide3.7hoardxvect=no
  23        4:08      248     20.66                  0.898          new v8      ws6.0guide3.7nohoardxvect=no
  23        4:08      248     20.66                  0.898          new v8      ws6.0guide3.8hoardxvect=no
  23        4:04      244     20.81      +2.88       0.905          new v8      ws6.0guide3.8nohoardxvect=no
  13        6:57      417    *12.29       1.87       0.945          new v8      ws6.0guide3.7hoard
  13        7:28      448    *11.44       --         0.879          new v8      ws6.0guide3.7nohoard
  + ==> 704/x
  (static and dynamic loop scheduling give the same results for COMPAQ)
  (OMP_NESTED and OMP_DYNAMIC make no difference (7 tests run 3/31/2000))

COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave
  # of  OPT   wall clock time     speed up     efficiency        balance           Tahomas   est. WA|OR
  cpus  lvl   mm:ss    seconds   over 1 cpu   speed up/#cpus    171 j pts/#cpus              time*10*3 hours
   1    O5    42:33      2,553      1.00        1.000            171.000           0.300     21:16:30
   4    O5    13:25        805      3.17        0.793             42.750           0.950      6:42:30
   4    O4    12:17        737      3.17        0.793             42.750           1.038      6:42:30

same as above, just different headings:
  # of  OPT   wall clock time     speed up     efficiency        mmout      csh -f  Tahoma Factor  # of
  cpus  lvl   mm:ss    seconds   over 1 cpu   speed up/#cpus    interval            Tt / Tr        runs
   1    O5    42:33      2,553      1.00        1.000            171.000    no
   4    O5    13:25        805      3.17        0.793             42.750    no
   4    O4    12:17        737      3.17        0.793             42.750    yes     1.038          4

csh -f turned on for these, comparing other opts:
  # of  buff  wall clock time     speed up     efficiency        mmout      cxml    Tahoma Factor  # of
  cpus  io    mm:ss    seconds   over 1 cpu   speed up/#cpus    interval    math    Tt / Tr        runs
   4    no    12:17        737      3.17        0.793             42.750    no      1.038          8
   4    yes   12:57        737      3.17        0.793             42.750    no      1.038          8
   4    no    12:25                                                         yes                    3
   4    yes   12:28        737      3.17        0.793             42.750    yes     1.038          8

COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
  # of  OPT   wall clock time     speed up     little    mmout      Tahoma Factor  # of
  cpus  lvl   mm:ss    seconds   over 1 cpu   endian    interval    Tt / Tr        runs
   4    O4    12:17        737      3.17       no         15         1.038          8
   1    O4    38:26      2,306      --         yes        15         ?.???          4
   4    O4    12:08        728      3.17       yes        15         ?.???          4
   1    O4    38:27      2,307      --         yes        60         1.038          4
   4    O4    12:16        736      3.13       yes        60         ?.???          4

COMPAQ ES40 500MHz/4MB$ (rainier) full memory
  # of  OPT   wall clock time     speed up     efficiency        balance           Tahomas   est. WA|OR
  cpus  lvl   h:mm:ss  seconds   over 1 cpu   speed up/#cpus    171 j pts/#cpus              time*10*3 hours
   1    O5    42:33*    2,553*      1.00        1.000            171.000           xx        21:16:30
   4    O5    13:05       785       3.29        0.821             42.750           0.975      6:32:30
  (* est., since only 3 runs were performed, with the fastest at 43:00)

** ESTIMATES **  COMPAQ ES40s 650MHz/4MB$ (estimate), each with full memory
  # of   wall clock time     speed up     efficiency        balance           Tahomas   est. WA|OR
  cpus   h:mm:ss  seconds   over 1 cpu   speed up/#cpus    171 j pts/#cpus              time*10*3 hours
   1     33:30      2,010      1.00        1.000            171.000           xx        16:45:00
   4     10:10        611      3.29        0.821             42.750           1.252      5:05:00

## end of 1-hour d3 tests section
########################################################################

########################################################################
## 3-hour d1d2 simulations

Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
  # of   wall clock time     speed up     efficiency        mmout      compiler     # of
  cpus   h:mm:ss  seconds   over 1 cpu   speed up/#cpus    interval    flags        runs
   1     2:00:29    7,229      1.00        1.000                       guide3.7     4
   1     2:04:15    7,455                                    15        "f77 -fast"  3
  13       10:44      644     11.22        0.863             15        guide3.7
  13       10:34      634     11.40        0.877             60        guide3.7
  13       10:30      630     11.47        0.883            180        guide3.7
  13       20:08     1208     xx.xx        x.xxx             15        ws6g3.8-O4   1

Sun E4000 400MHz/8MB$ (tahoma) static loop scheduling
  # of   wall clock time    speed-up     old       efficiency       new/old
  cpus   h:mm:ss  seconds  over 1 cpu   tahomas   speed up/#cpus   executable  other options
  13       6:40      400    xx.xx                  x.xxx           new v8      15  ws5.0guide3.8nohoard
  13      16:49     1009    xx.xx                  x.xxx           new v8      15  ws6.0guide3.8nohoard
  13       6:05      365    xx.xx                  x.xxx           new v8      15  ws6.0guide3.8nohoardxvect=no
  13       6:33      393    xx.xx                  x.xxx           new v8      15  ws6.0guide3.7nohoardxvect=no
  23       4:23      263    xx.xx                  x.xxx           new v8      15  ws5.0guide3.8nohoard
  23      23:44     1424    xx.xx                  x.xxx           new v8      15  ws6.0guide3.8nohoard
  23       3:59      239    xx.xx                  x.xxx           new v8      15  ws6.0guide3.8hoard
  23       4:03      243    xx.xx                  x.xxx           new v8      15  ws6.0guide3.8nohoardxvect=no
  23       4:23      263    xx.xx                  x.xxx           new v8      15  ws6.0guide3.7nohoardxvect=no

COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
The csh -f column indicates whether filecommand.csh had "#!/bin/csh -f" as its 1st line.
(Note: a version of the v2.12 code that did not have our FILECOMMAND mods ran slightly
slower than our version, and both had "csh -f".)
  # of  OPT   wall clock time    speed up     efficiency        mmout      csh -f      Tahoma Factor  # of
  cpus  lvl   mm:ss   seconds   over 1 cpu   speed up/#cpus    interval                Tt / Tr        runs
   1          35:41     2,141      1.00        1.000             15        no                         1
   1          35:30     2,130      1.00        1.000             15        yes                        4
   1          35:13     2,113      1.00        1.000             60        yes                        4
   1          35:13     2,113      1.00        1.000            180        yes                        1
   4    O5    12:03       723      2.95        0.738             15        no          0.891          4
   4    O5    11:05       665      3.21        0.803             15        yes         0.968          4
  le speed up:
   1    O4    33:02     1,982      1.00        1.075             15        little_end                 4
   4    O4    10:20       620      3.20        1.039             15        little_end                 4
   1    O4    32:55     1,975      1.00        1.070             60        little_end                 4
   4    O4    10:16       616      3.21        1.011             60        little_end                 4
   4    O4    10:44       644      3.307       0.827             15        yes         1.000          4
   4    O4    12:44       644   **no speculate or pipeline**      15        yes         1.000          4
   4    O4    11:07       667      NCAR's flags                   15        yes         0.966          4
   4    O5    10:51       651      3.25        0.811              60        no          0.974          4
   4    O5    10:51       651      3.25        0.811              60        yes         0.974          4
   4    O4    10:23       623      3.39?       0.xxx              60        yes         1.018          4
   4    O4    10:52       652      NCAR's flags                   60        yes         0.972          1
   4    O5    10:44       644      3.28        0.820             180        no          0.978          4
   4    O5    10:42       642      3.29        0.823             180        yes         0.981          3
   4    O4    10:14       614      3.44?       0.xxx             180        yes         1.026          4

## end of 3-hour d1d2 simulations section
########################################################################

########################################################################
## 3-hour d3 simulations:

Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
  # of   wall clock time      speed up     efficiency        balance
  cpus   h:mm:ss   seconds   over 1 cpu   speed up/#cpus    171 j pts/cpus
   1     4:32:16    16,336      1.00        1.000            171.000
  13       27:30     1,650      9.90        0.762             13.154    1650/1338 = 23%

Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
  # of   wall clock time      speed up     efficiency        balance           Tahomas   est. WA+OR
  cpus   h:mm:ss   seconds   over 1 cpu   speed up/#cpus    171 j pts/cpus               time*10*2 hours
   1     4:32:16    16,336      1.00        1.000            171.000           0.142     90.75
  13       22:18     1,338     12.21        0.939             13.154           1.731      7.43

Sun E6500 400MHz/8MB$/80MHz (buddy)
  (columns as above)
   1     4:32:16    16,336      1.00        1.000            171.000           0.142     90.75   (assumed this time for 1 CPU)
  13       23:18     1,398     11.69        0.899             13.154           1.657      7.77
  19       16:29       989     15.52        0.869              9.000           2.342      5.49
  25       12:30       750     21.78        0.871              6.840           3.088      4.17
  29       10:56       656     24.90        0.859              5.897           3.530      3.64

original executable compiled on tahoma for 4MB cache for everything below

Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
  # of   wall clock time      speed up     efficiency        balance
  cpus   h:mm:ss   seconds   over 1 cpu   speed up/#cpus    171 j pts/cpus
   1     4:34:02    16,442      1.00        1.000            171.000
  13       23:02     1,382     11.90        0.915             13.154

Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
  (columns as above)
   1     4:34:02    16,442      1.00        1.000            171.000
  13       27:57     1,677      9.80        0.754             13.154    1677/1382 = 21%

Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
  13       31:33     1,893       -           -                13.154

Sun E4500 336MHz/4MB$ (hayes) dynamic loop scheduling
  13       35:05     2,105       -           -                13.154    2105/1893 = 11%

Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
   1     8:01:48    28,908      1.00        1.000            171.000
   4     x:xx:xx    xx,xxx      x.xx        0.xxx             42.750
   8     x:xx:xx    xx,xxx      x.xx        0.xxx             21.375
  13       38:36     2,316     12.48        0.xxx             13.154

Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
   1     7:29:27    26,967      1.00        1.000            171.000
   4     2:01:48     7,308      3.69        0.923             42.750
   8     1:03:39     3,819      7.06        0.883             21.375
  13       42:01     2,521     10.67        0.823             13.154

## end of 3-hour d3 simulations section
########################################################################

########################################################################
## 2-hour d3 simulations

Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
  # of   wall clock time     speed up     efficiency        mmout      compiler  # of
  cpus   h:mm:ss  seconds   over 1 cpu   speed up/#cpus    interval    flags     runs
  13       26:14     1574     ?.???        0.xxx             15         ?.???     3?
  13       26:07     1567     ?.???        0.xxx             60         ?.???     2?

COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
  # of  OPT   wall clock time    speed up     efficiency        mmout      Tahoma Factor  # of
  cpus  lvl   mm:ss   seconds   over 1 cpu   speed up/#cpus    interval    Tt / Tr        runs
   4    O4    25:41      1540     ?.???        0.xxx             15         ?.???          4
   4    O4    24:44      1484     ?.???        0.xxx             60         ?.???          3?
   4    O4    26:41      1601     ?.???        0.xxx            180         ?.???          2?

########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000) from:
##   ftp://ftp.ucar.edu/mesouser/MM5V2/MM5/mm5v2.tar.Z
##   ftp://ftp.ucar.edu/mesouser/Data/SESAME/mminput_nh_data.tar.gz
##   ftp://ftp.ucar.edu/mesouser/Data/SESAME/benchmark_config.tar.gz

Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp,
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77:
  # of   wall clock time     speed up     efficiency        # of   comments
  cpus   h:mm:ss  seconds   over 1 cpu   speed up/#cpus     runs
   1       19:50     1190      --          1.000             4
   4        5:31      331      3.410       0.852             4
   8        3:11      191      6.23**      0.779**           4     **see note below
  13        2:05      125      9.520**     0.732**           4     **I believe the run is too short to see our
                                                                     true speed up, which is more like 0.87 efficiency.**

Sun E4500 336MHz/4MB$ (hayes) static loop scheduling,
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77:
  (columns as above)
   1       14:39      879      1.000       1.000             4
   4        4:05      245      3.588       0.897             4
   8        2:22      142      6.190**     0.774**           4     **see note above
  13        1:35       95      9.253**     0.711**           4     **see note above

COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
  # of   OPT    wall clock time     speed up     efficiency        # of   comments
  cpus   lvl    mm:ss   seconds   over 1 cpu   speed up/#cpus      runs
   1     NCAR    4:47       287      1.000       1.000              4
   4     NCAR    1:28        88      3.261       0.815              4

COMPAQ DS10 466MHz/4MB$ (EV??) 1-way interleave running in /var/tmp
   1     NCAR    5:19       319      1.000       1.000              5

########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000)
## run for twice as long

Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
   1       40:07     2407      x.xxx       0.xxx             1
   4       11:13      673      3.577       0.894             2
   8        6:26      386      6.236       0.779             2
  13        4:14      234     10.286       0.791             2

COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
   1        9:42      582      x.xxx       0.xxx             4
   2        5:15      315      1.848       0.924             4
   4        2:58      178      3.270       0.817             3
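For reference, a minimal sketch of the speed-up, efficiency, and Tahoma-factor columns used throughout these tables (function names are mine; the example numbers are the 13-CPU static tahoma run and the 4-CPU O4 rainier run from the 1-hour d3 tests above):

    def speedup(one_cpu_seconds, n_cpu_seconds):
        # speed up over 1 cpu = single-CPU wall clock / n-CPU wall clock
        return one_cpu_seconds / n_cpu_seconds

    def efficiency(one_cpu_seconds, n_cpu_seconds, n):
        # efficiency = speed up / #cpus
        return speedup(one_cpu_seconds, n_cpu_seconds) / n

    print(round(speedup(8846, 765), 2))           # ~11.56
    print(round(efficiency(8846, 765, 13), 3))    # ~0.889
    print(round(765 / 737, 3))                    # ~1.038, the "Tahoma Factor" (Tt / Tr)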