11/29/1999

6-hour benchmarks with MPHYSTBL=1

tahoma (benchmark 2.12 setup, 13 locked processors)
    15033.0u 18.0s 19:33 1282% 0+0k 0+0io 0pf+0w
    15134.0u 17.0s 19:41 1282% 0+0k 0+0io 0pf+0w
    15224.0u 18.0s 19:50 1280% 0+0k 0+0io 0pf+0w
    15250.0u 17.0s 19:53 1279% 0+0k 0+0io 0pf+0w
    = 15160.25u, 1184.25 seconds

rainier (writing to tahoma)
    4045.39u 13.28s 18:56 357% 611+935k 999+57412io 0pf+0w
    4036.51u 13.32s 18:51 357% 621+938k 546+57413io 0pf+0w
    4202.00u 16.91s 20:06 349% 620+938k 575+57410io 0pf+0w
    4082.65u 13.38s 19:01 358% 626+946k 554+57415io 0pf+0w
    = 1153.5 seconds = 2.6% faster than tahoma

rainier (local disks)
    4155.78u 12.59s 18:02 385% 628+944k 4665+57714io 1pf+0w
    4042.11u 12.52s 17:29 386% 619+935k 10+57740io 0pf+0w
    4164.75u 12.42s 18:02 386% 621+939k 3002+57737io 0pf+0w
    4217.06u 13.16s 18:24 382% 626+946k 0+57763io 0pf+0w
    = 4144.925u, 1079.25 seconds

==> Wallclock speed is 8.86% faster than tahoma
==> single CPU =~ 3.66 * tahoma CPU

11/30/1999

4 km simulation for 1999112912 run from fcst hour 12 to 36:

tahoma
    111308.0u 272.0s 2:20:36 1322% 0+0k 0+0io 0pf+0w
    = 8436 seconds

rainier
    30524.667u 45.927s 2:13:05.07 -155.0% 782+479k 45092+74431io 0pf+0w
    30416.59u 46.25s 2:12:32 383% 803+492k 45+74346io 0pf+0w
    = 30470u, 7968.5 seconds

==> Wallclock speed is 5.5% faster
==> single CPU =~
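The percentages above are plain arithmetic on the csh "time" output: average
the elapsed fields, then compare machines; the per-CPU figure compares the
averaged user-CPU ("u") totals instead. A minimal Python sketch of that
arithmetic (parse_elapsed and the variable names are mine, not from our run
scripts):

    def parse_elapsed(field: str) -> float:
        """Convert a csh time elapsed field like '19:33' or '2:20:36' to seconds."""
        secs = 0.0
        for part in field.split(":"):
            secs = secs * 60 + float(part)
        return secs

    tahoma        = ["19:33", "19:41", "19:50", "19:53"]
    rainier_local = ["18:02", "17:29", "18:02", "18:24"]

    t_avg = sum(map(parse_elapsed, tahoma)) / len(tahoma)                # 1184.25 s
    r_avg = sum(map(parse_elapsed, rainier_local)) / len(rainier_local)  # 1079.25 s

    print(f"wallclock: {100 * (1 - r_avg / t_avg):.2f}% faster")    # ~8.87 (8.86 above)
    print(f"single CPU =~ {15160.25 / 4144.925:.2f} * tahoma CPU")  # 3.66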
8/25/1999

6-hour tahoma 13-processor locked benchmarks:

    KMP_BLOCKTIME    "24h"
    KMP_LIBRARY      "turnaround"
    KMP_PARALLEL     "13"
    KMP_STACKSIZE    "8000000"
    OMP_NUM_THREADS  "13"
    KMP_SCHEDULING   "static"   -- set only when "Static Sched" is yes below

     Static
Dom  Sched   Guide  MM5   Time                        Date
---  ------  -----  ----  --------------------------  ---------
1,2  no      3.7    2.12  15253.0u 19.0s 19:51 1281%  benchmark
                          15764.0u 20.0s 20:33 1279%  benchmark
1,2  yes     3.7    2.12  15204.0u 20.0s 19:48 1280%  benchmark
                          14793.0u 20.0s 19:18 1278%  benchmark

Domain  Guide version  MM5    Time                             Date
------  -------------  -----  -------------------------------  ----------
1,2     3.6            v2.12  149431.0u 848.0s 3:06:54 1340%   1999062300
1,2     3.6            v2.12  3:03                             1999062212
1,2     3.6            v2.12  3:01                             1999062200
1,2     3.6            v2.12  2:59                             1999062112
1,2     3.6            v2.12  2:54                             1999062100
1,2     3.6            v2.12  1:34:44                          1999062300 24hr
1,2     3.6            v2.12  67035.0u 206.0s 1:26:10 1300%    1999062300 24hr
1,2     3.6            v2.12  race condition? even with KMP_BLOCKTIME "24h"
1,2     3.7            v2.12  68986.0u 205.0s 1:28:29 1303%    1999062300 24hr
1,2     3.7            v2.12  69299.0u 208.0s 1:29:02 1301%    1999062300 24hr
1,2     3.7            v2.12  151778.0u 922.0s 3:08:03 1353%   1999062312
1,2     3.7            v2.12  150913.0u 963.0s 3:05:23 1365%   1999062400
1,2     3.7            v2.12  140783.0u 1087.0s 2:53:03 1366%  1999072212 **
1,2     3.7            v2.12  157825.0u 1065.0s 3:15:00 1357%  1999071600 **
1,2     3.7            v2.12  144987.0u 1087.0s 2:57:59 1367%  1999071100
1,2     3.7            v2.12  148516.0u 1068.0s 3:02:51 1363%  1999071112
1,2     3.7            v2.12  147185.0u 961.0s 3:02:10 1355%   1999071200

18-hour benchmarks:
1,2     3.7            v2.12  45958.0u 62.0s   59:27 1289%     1999071600
                              47752.0u 63.0s 1:01:51 1288%
                              44914.0u 59.0s   58:09 1288%
                              44369.0u 62.0s   57:26 1288%
1,2     3.6            v2.12  43726.0u 62.0s   56:33 1290%     1999071600
                              43876.0u 61.0s   56:47 1289%
                              44556.0u 62.0s   57:43 1288%
                              46938.0u 62.0s 1:00:49 1287%

2-domain Guide 3.7 AVERAGES:  45748.3u 61.5s   59:13 1288%
2-domain Guide 3.6 AVERAGES:  44773.3u 61.8s   57:58 1289%
2-domain Guide 3.7 slowdown:  2.17% (cpu)      2.16% (wallclock)

3       3.7            v2.12  48050.0u 28.0s 1:01:57 1293%     1999071600
                              44070.0u 27.0s   56:45 1294%
                              45988.0u 30.0s   59:26 1290%
                              44365.0u 26.0s   57:17 1291%
3       3.6            v2.12  44789.0u 26.0s   57:44 1293%     1999071600
                              45030.0u 28.0s   58:00 1294%
                              50943.0u 29.0s 1:05:37 1294%
                              45528.0u 27.0s   58:50 1290%

1-domain Guide 3.7 AVERAGES:  45618.3u 28.0s   58:51 1292%
1-domain Guide 3.6 AVERAGES:  46572.5u 27.5s   60:02 1293%
1-domain Guide 3.7 speedup:   2.1% (cpu)       1.9% (wallclock)

3       3.6            v2.12  119179.0u 260.0s 2:34:04 1291%   1999062300
3       3.6            v2.12  2:37                             1999062212
3       3.6            v2.12  2:26                             1999062200
3       3.7            v2.12  114603.0u 243.0s 2:24:54 1320%   1999062312
3       3.7            v2.12  109980.0u 210.0s 2:19:15 1318%   1999071100
3       3.7            v2.12  113441.0u 215.0s 2:23:44 1317%   1999071112
3       3.7            v2.12  109878.0u 213.0s 2:19:11 1318%   1999071200
3       3.7            v2.12  108313.0u 219.0s 2:16:59 1320%   1999071300 **

----------------------------------------------------------------------------
MM5 2-Domain 6-hour benchmarks on tahoma with 13 of the 248 MHz chips locked
The 2.12 benchmarks here were all MPHYSTBL = 0 (unless otherwise noted) and
the 2.7 benchmarks here were all MPHYSTBL = 1
----------------------------------------------------------------------------
      Special Compilation
      for addrx1c          Guide    Fortran   Env.
Code  and addrx1n          Version  Compiler  KMP_LIBRARY  Time(s)  Comments
----  -------------------  -------  --------  -----------  -------  --------
2.7   no                   3.0      4.2       turnaround   NA       Bombs due to f77
                                                                    compiler error
2.7   yes                  3.0      4.2       turnaround   20:38,
                                                           20:41
2.7   yes                  3.6      4.2       throughput   20:36    Guide 3.6 is
2.7   yes                  3.6      4.2       turnaround   20:20    just as fast as
                                                                    Guide 3.0
2.12  no                   3.7      5.0       turnaround   (MPHYSTBL=1)
                                              15796.0u 20.0s 20:34 1281%,
                                              15454.0u 21.0s 20:13 1275%,
                                              15542.0u 17.0s 20:16 1279%

=======  2.12 all had MPHYSTBL = 0, while 2.7 had MPHYSTBL = 1  =======
=======  THIS WAS THE DIFFERENCE!!!!!!!                         =======

2.12  no                   3.6      4.2       throughput   NA       Bombs due to f77
                                                                    compiler error
2.12  yes                  3.6      4.2       throughput   22:04,
                                                           22:29,
                                                           23:49
2.12  yes                  3.6      4.2       turnaround   22:11,
                                                           24:36
2.12  yes                  3.6      5.0       throughput   23:06,
                                                           23:37
2.12  yes                  3.6      5.0       turnaround   22:26,   Fortran 5.0
                                                           22:32,   is no faster
                                                           22:47    than 4.2;
2.12  no                   3.6      5.0       throughput   22:46,   however, the
                                                           23:12    compiler bug
2.12  no                   3.6      5.0       turnaround   22:50    is fixed.
2.12* yes                  3.6      4.2       throughput   22:33,   *Solve3 closer
                                                           22:39    in form to the
                                                                    2.7 version

---------------------- single processor jobs --------------------------------
Code       F77  Machine   Times
2.7   no   5.0  toniwha   7151.90u  9.70s 2:05:01.03  95.4%
2.12  no   5.0  toniwha   7151.51u  8.29s 1:59:48.51  99.6%  was it MPHYSTBL = 1?
2.7   no   5.0  blizzard  11866.26u 17.99s 3:22:54.09  97.6%
2.12  no   5.0  blizzard  12067.21u 18.41s 3:24:39.71  98.4%
2.12  no   5.0  tahoma13  15528.33u 22.84s   21:03.50  1230.8%
2.7   no   5.0  tahoma    15343.71u 22.18s   20:49.93  1229.3%

--------------- Mar98 vs May99 ---------------
2-Domain 3-hour benchmarks on hayes:

Number  2.7 TP                                       Locked  Inc.
Code    Disk    FQ  Times                            Procs   Bdy.  Simultaneous Jobs
----    ------  --  -------------------------------  ------  ----  -----------------
may99   hayes2  15  7105.48u 54.66s 9:44.71 1224.5%  0       yes   --
mar98   hayes2  15  7510.51u 42.31s 9:47.42 1285.7%  0       no    --
may99   /tmp    15  6771.69u 47.23s 9:15.50 1227.5%  13      yes   preprocess
may99   hayes2  15  6613.15u 38.57s 9:18.86 1190.2%  13      yes   preprocess
mar98   hayes2  15  7498.85u 51.96s 9:33.08 1317.5%  13      no    preprocess
may99   hayes2  60  6479.51u 26.39s 8:38.62 1254.4%  13      yes   preprocess
mar98   hayes2  60  6625.63u 10.87s 8:43.58 1267.5%  13      no    no postprocess
may99   hayes2  15  6613.36u 37.88s 9:04.43 1221.6%  13      yes   --
mar98   hayes2  15  7265.40u 38.65s 9:21.38 1301.0%  13      no    --
may99   hayes2  60  6609.38u 37.82s 8:34.26 1292.5%  13      yes   --
mar98   hayes2  60  6946.99u 37.96s 8:44.47 1331.8%  13      no    --
may99   /tmp*   15  7010.68u 52.17s 9:39.42 1218.9%  13      yes   *RUNDIR only
may99   /tmp    15  6607.30u 34.39s 8:46.11 1262.4%  13      yes   --
may99   /tmp    15  6850.34u 49.36s 9:21.38 1229.0%  13      yes   --
may99   /tmp    15  6647.71u 35.93s 9:03.35 1230.0%  13      yes   --

consistent_coloring turned on for the following:
may99   /tmp    15  6973.94u 47.95s 9:08.28 1280.7%  13      yes   --
may99   /tmp    15  6842.98u 47.23s 8:56.32 1284.7%  13      yes   --
may99   /tmp    15  6887.60u 46.26s 8:57.81 1289.2%  13      yes   --
may99   hayes2  15  7022.90u 51.84s 9:37.53 1224.9%  13      yes   --

------------------------------------------------------------------------
Setup  MMOUT Files  Average finish times (local time, not elapsed time)
-----  -----------  ---------------------------------------------------
mar98  d3 f12        9:00
may99  d3 f12       10:45
mar98  d3 f24       22:40 to 11:10
may99  d3 f24       12:20
mar98  d3 f36       12:50
may99  d3 f36       13:10  --> 20 minutes later, BUT the complete forecast
                               is 30+ minutes faster.
mar98  d2 f24       11:00
may99  d2 f24        9:10  --> 1 hour 50 minutes faster
mar98  d2 f36       13:00
may99  d2 f36       10:00  --> 3 hours faster
mar98  d2 f48        1:40 to 14:02
may99  d2 f48       10:45  --> 3 hours faster !!

Domains       Forecast  Approximate Finish Time  Time Change
1,2           24-hr      9:10 am/pm              -1 hr 50 min (earlier)
1,2           36-hr     10:00 am/pm              -3 hrs (earlier)
1,2           48-hr     10:45 am/pm              -3 hrs (earlier)
3             12-hr     10:45 am/pm              +1 hr 45 min (later)
3             24-hr     12:00 am/pm              +1 hr 20 min (later)
3             36-hr      1:10 am/pm              +20 min (later)
All complete             1:10 am/pm              -30 min (earlier)

------------------------------------------------------------------------
Benchmark of new 336 MHz chips (Hayes) vs 247 MHz chips (Tahoma):

              Tahoma                 Hayes
Outfiles      finish (:elapsed min)  finish (:elapsed min)
--------      ---------------------  ---------------------
mmout_d1.f0    7:45                  13:06
mmout_d1.f9    8:15 (:30)            13:31 (:25)
mmout_d2.f11   8:22 (:37)            13:36 (:30)
--
mmout_d1.f12   9:16                  14:00
mmout_d3.f18  10:19 (:63)            14:48 (:48)
mmout_d3.f22  11:02 (:106)           15:21 (:81)
mmout_d1.f36  13:32 (:256)           17:16 (:196)
mmout_d1.f48  14:15 (:299)           17:51 (:231)

Summary: the full 36/12/4 run with 13 processors is approximately 29% faster
on hayes' new chips.
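One way to get the 29%: the ratio of the f48 elapsed columns above. A quick
Python check (variable names are mine):

    tahoma_f48 = 299  # elapsed minutes to mmout_d1.f48 on tahoma
    hayes_f48  = 231  # elapsed minutes to mmout_d1.f48 on hayes

    print(f"{tahoma_f48 / hayes_f48:.2f}x")                              # 1.29x
    print(f"~{100 * (tahoma_f48 - hayes_f48) / hayes_f48:.0f}% faster")  # ~29%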
dec99 real-time and ensemble typical runtimes
("I/O gtar" means Ernie was running tape backups that resulted in heavy I/O
slowdowns for /home/mm5rt rundirs):
----------------------------------------------
MM5RUNDIR  I/O   static
           gtar  sched   ensm  tahoma / rainier
---------  ----  ------  ----  --------------------------------------
mm5rt      no    no      no    3:20:23 (best)
mm5rt      yes   no      no    3:32:25 (worst)
mm5rt      no    yes     no    3:02:23 (best)
mm5rt      yes   yes     no    3:29:28 (worst)
/tmp       no    yes     no    2:56:22 (fast, dry day 2000012900)
/tmp       no    yes     no    3:02:23 (fast, wet day 2000020100)
/tmp       yes   yes     no    3:06:28 (one run)
/tmp       no    yes     no    3:05:25 (one run)

ensemble runs:
mm5rt      no    no      yes   3:12:24
/tmp       no    yes     yes   2:46:25 (best, dry day 2000020400 NGM)
mm5rt      no    yes     yes   2:48:21 (best)
mm5rt      no    yes     yes   2:53:00 (avg)
mm5rt      yes   yes     yes   3:05:32 (worst)
/tmp       no    yes           2:50:09 (cmc 2000032900)
rmm5rt     yes   yes           3:13:06 (best)
                 yes           3:18:36 (worst)
                 yes           3:24:05 (worst, Sunday 2000020700)

d3 simulations:
/tmp (-O4)               no    6:11:34 (some pcpn, 032100)
/tmp (-O4)               no    6:18:52 (wet day, 031900)
/tmp (full memory)             6:37:04 (23,824 sec, 4.6%)
rmm5rt     no            no    6:47:23 (best, dry day 020400)
rmm5rt     no            no    6:55:30 (avg)
rmm5rt     no            no    7:02:00 (wet day, 020100)
rmm5rt     no            no    7:06:28 (worst)
/tmp       yes           no    7:02:27 (dry day, first backup using /tmp)
rmm5rt     yes           no    7:19:32 (worst)

Formula for calculating times of different domains (same physics packages)
on tahoma:  time ~ y-grid pts * x-grid pts * levels * 36-km time-step factor

            current 36 km              45 km
            101 x 137 x 32 x 1         81 x 110 x 22 x (36/45)
            (67 minutes)               (24 minutes)
          + current 12 km              15 km
            88 x 88 x 32 x 3           70 x 70 x 22 x 3 x (36/45)
            (113 minutes)              (39 minutes)
            -------------              -----------
          = 180 minutes                63 minutes
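A Python sketch of that scaling arithmetic (the function names are mine); it
reproduces the 24- and 39-minute estimates from the measured 67- and
113-minute runs:

    # Runtime is assumed proportional to y-points * x-points * levels * step factor.
    def cost(ny: int, nx: int, nz: int, step_factor: float) -> float:
        return ny * nx * nz * step_factor

    def est_minutes(measured_min: float, current, proposed) -> float:
        """Scale a measured runtime by the ratio of grid costs."""
        return measured_min * cost(*proposed) / cost(*current)

    # 36-km domain (67 min measured) -> 45-km alternative
    print(round(est_minutes(67, (101, 137, 32, 1), (81, 110, 22, 36/45))))    # 24

    # 12-km domain (113 min measured) -> 15-km alternative
    print(round(est_minutes(113, (88, 88, 32, 3), (70, 70, 22, 3 * 36/45))))  # 39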
Best of 3 runs in each case, except tahoma, which was 2 (3/4 - 3/7/2000).
(The derived columns, speed up / efficiency / balance / Tahomas / est. WA+OR,
are worked through in the sketch after these tables.)

recompiled to inform compiler of 8MB cache
Sun E4500 400MHz/8MB$ (hydra)  static loop scheduling
 #    wall clock time    speed up  efficiency  balance     Tahomas  est. WA+OR
 of   h:mm:ss  seconds   over      speed up/   171 j pts/           time*10*2
cpus             1 cpu   #cpus     #cpus                            hours
  1   4:32:16   16,336    1.00     1.000       171.000     0.142    90.75
 13     22:18    1,338   12.21     0.939        13.154     1.731     7.43

Sun E6500 400MHz/8MB$/80MHz (buddy)
 #    wall clock time    speed up  efficiency  balance     Tahomas  est. WA+OR
 of   h:mm:ss  seconds   over      speed up/   171 j pts/           time*10*2
cpus             1 cpu   #cpus     #cpus                            hours
  1   4:32:16   16,336    1.00     1.000       171.000     0.142    90.75
      (assumed this time for 1 CPU)
 13     23:18    1,398   11.69     0.899        13.154     1.657     7.77
 19     16:29      989   16.52     0.869         9.000     2.342     5.49
 25     12:30      750   21.78     0.871         6.840     3.088     4.17
 29     10:56      656   24.90     0.859         5.897     3.530     3.64

recompiled to inform compiler of 8MB cache
Sun E4500 400MHz/8MB$ (hydra)  dynamic loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
  1   4:32:16   16,336    1.00     1.000       171.000
 13     27:30    1,650    9.90     0.762        13.154

1650/1338 = dynamic scheduling is 23% slower than static

original executable compiled on tahoma for 4MB cache for everything below

Sun E4500 400MHz/8MB$ (hydra)  static loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
  1   4:34:02   16,442    1.00     1.000       171.000
 13     23:02    1,382   11.90     0.915        13.154

Sun E4500 400MHz/8MB$ (hydra)  dynamic loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
  1   4:34:02   16,442    1.00     1.000       171.000
 13     27:57    1,677    9.80     0.754        13.154

1677/1382 = dynamic scheduling is 21% slower than static

Sun E4500 336MHz/4MB$ (hayes)  static loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
 13     31:33    1,893    -        -            13.154

Sun E4500 336MHz/4MB$ (hayes)  dynamic loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
 13     35:05    2,105    -        -            13.154

2105/1893 = dynamic scheduling is 11% slower than static

------
Sun E4000 250MHz/4MB$ (tahoma)  static loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
  1   8:01:48   28,908    1.00     1.000       171.000
  4   x:xx:xx   xx,xxx    x.xx     0.xxx        42.750
  8   x:xx:xx   xx,xxx    x.xx     0.xxx        21.375
 13     38:36    2,316   12.48     0.xxx        13.154

Sun E4000 250MHz/4MB$ (tahoma)  dynamic loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
  1   7:29:27   26,967    1.00     1.000       171.000
  4   2:01:48    7,308    3.69     0.923        42.750
  8   1:03:39    3,819    7.06     0.883        21.375
 13     42:01    2,521   10.67     0.823        13.154

tahoma 1-hour simulations of d3 (fastest times for 3 or more runs):

Sun E4000 250MHz/4MB$ (tahoma)  dynamic loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
  1   2:27:29    8,849    1.00     1.000       171.000
 13     14:13      853   10.37     0.798

853/765 = dynamic scheduling is 11% slower than static (below)

Sun E4000 250MHz/4MB$ (tahoma)  static loop scheduling
 #    wall clock time    speed up  efficiency  balance
 of   h:mm:ss  seconds   over      speed up/   171 j pts/
cpus             1 cpu   #cpus     #cpus
  1   2:27:26    8,846    1.00     1.000       171.000
 13     12:45      765   11.56     0.889

(static and dynamic loop scheduling give the same results for COMPAQ)
(OMP_NESTED and OMP_DYNAMIC make no difference (7 tests run 3/31/2000))
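All the derived columns in the scaling tables above come from the same few
ratios; here is a minimal Python sketch for the 13-cpu hydra row (variable
names are mine, and reading the "Tahomas" column as "matching 13-cpu tahoma
time / this time" is my inference from the numbers):

    t1, tn, ncpus = 16336, 1338, 13   # hydra 1-cpu and 13-cpu wallclock, seconds

    speedup    = t1 / tn              # 12.21   "speed up over 1 cpu"
    efficiency = speedup / ncpus      # 0.939   "speed up / #cpus"
    balance    = 171 / ncpus          # 13.154  "171 j pts / #cpus"

    # "Tahomas": tahoma's matching 13-cpu run took 2,316 s (static table above);
    # the rainier d3 tables below use tahoma's 765 s d3 time the same way.
    tahomas    = 2316 / tn            # 1.731

    # "est. WA+OR time*10*2 hours": wallclock * 10 * 2 converted to hours
    # (the rainier tables use time*10*3).
    wa_or_est  = tn * 10 * 2 / 3600   # 7.43

    print(speedup, efficiency, balance, tahomas, wa_or_est)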
COMPAQ ES40 500MHz/4MB$ (rainier)  2-way interleave
 #   OPT  wall clock time   speed up  efficiency  balance     Tahomas  est. WA|OR
 of  lvl  mm:ss   seconds   over      speed up/   171 j pts/           time*10*3
cpus                1 cpu   #cpus     #cpus                            hours
  1  O5   42:33    2,553     1.00     1.000       171.000     0.300    21:16:30
  4  O5   13:25      805     3.17     0.793        42.750     0.950     6:42:30
  4  O4   12:17      737     3.17     0.793        42.750     1.038     6:42:30

same as above, just different headings:
 #   OPT  wall clock time   speed up  efficiency  mmout     csh  Tahoma   # of
 of  lvl  mm:ss   seconds   over      speed up/   interval  -f   Factor   runs
cpus                1 cpu   #cpus                                Tt / Tr
  1  O5   42:33    2,553     1.00     1.000       171.000   no
  4  O5   13:25      805     3.17     0.793        42.750   no
  4  O4   12:17      737     3.17     0.793        42.750   yes  1.038    4

same, csh -f turned on for these, comparing other opts:
 #   buff  wall clock time   speed up  efficiency  mmout     cxml  Tahoma   # of
 of  io    mm:ss   seconds   over      speed up/   interval  math  Factor   runs
cpus                 1 cpu   #cpus                                 Tt / Tr
  4  no    12:17     737      3.17     0.793       42.750    no    1.038    8
  4  yes   12:57     737      3.17     0.793       42.750    no    1.038    8
  4  no    12:25                                             yes            3
  4  yes   12:28     737      3.17     0.793       42.750    yes   1.038    8

more 1-hour d3 tests (3/28/2000):

COMPAQ ES40 500MHz/4MB$ (rainier)  2-way interleave, running in /tmp
 #   OPT  wall clock time   speed up  little  mmout     Tahoma   # of
 of  lvl  mm:ss   seconds   over      endian  interval  Factor   runs
cpus                1 cpu                               Tt / Tr
  4  O4   12:17      737     3.17     no      15        1.038    8
  1  O4   38:26    2,306     --       yes     15        ?.???    4
  4  O4   12:08      728     3.17     yes     15        ?.???    4
  1  O4   38:27    2,307     --       yes     60        1.038    4
  4  O4   12:16      736     3.13     yes     60        ?.???    4

== static and dynamic loop scheduling give the same results

COMPAQ ES40 500MHz/4MB$ (rainier)  full memory
 #    wall clock time    speed up  efficiency  balance     Tahomas  est. WA|OR
 of   h:mm:ss  seconds   over      speed up/   171 j pts/           time*10*3
cpus             1 cpu   #cpus     #cpus                            hours
  1     42:33*   2,553*   1.00     1.000       171.000     xx       21:16:30
  4     13:05      785    3.29     0.821        42.750     0.975     6:32:30
(* est., since only 3 runs were performed, with the fastest at 43:00)
** ESTIMATES **
COMPAQ ES40s 650MHz/4MB$ (estimate), each with full memory
 #    wall clock time    speed up  efficiency  balance     Tahomas  est. WA|OR
 of   h:mm:ss  seconds   over      speed up/   171 j pts/           time*10*3
cpus             1 cpu   #cpus     #cpus                            hours
  1     33:30    2,010    1.00     1.000       171.000     xx       16:45:00
  4     10:10      611    3.29     0.821        42.750     1.252     5:05:00

########################################################################
## 3-hour d1d2 simulations

Sun E4000 250MHz/4MB$ (tahoma)  static loop scheduling, running in /tmp
 #    wall clock time    speed up  efficiency  mmout     compiler     # of
 of   h:mm:ss  seconds   over      speed up/   interval  flags        runs
cpus             1 cpu   #cpus
  1   2:00:29    7,229    1.00     1.000                 guide3.7     4
  1   2:04:15    7,455                         15        "f77 -fast"  3
 13     10:44      644   11.22     0.863       15        guide3.7
 13     10:34      634   11.40     0.877       60        guide3.7
 13     10:30      630   11.47     0.883       180       guide3.7

COMPAQ ES40 500MHz/4MB$ (rainier)  2-way interleave, running in /tmp
The csh -f column indicates whether filecommand.csh had "#!/bin/csh -f" as
its 1st line.
(note: a version of the v2.12 code that did not have our FILECOMMAND mods ran
slightly slower than our version, and both had "csh -f")

 #   OPT  wall clock time   speed up  efficiency  mmout     csh  Tahoma   # of
 of  lvl  mm:ss   seconds   over      speed up/   interval  -f   Factor   runs
cpus                1 cpu   #cpus                                Tt / Tr
  1       35:41    2,141     1.00     1.000       15        no            1
  1       35:30    2,130     1.00     1.000       15        yes           4
  1       35:13    2,113     1.00     1.000       60        yes           4
  1       35:13    2,113     1.00     1.000       180       yes           1
  4  O5   12:03      723     2.95     0.738       15        no   0.891    4
  4  O5   11:05      665     3.21     0.803       15        yes  0.968    4

                                      le speed up
  1  O4   33:02    1,982     1.00     1.075       15   little_end         4
  4  O4   10:20      620     3.20     1.039       15   little_end         4
  1  O4   32:55    1,975     1.00     1.070       60   little_end         4
  4  O4   10:16      616     3.21     1.011       60   little_end         4

  4  O4   10:44      644     3.307    0.827       15        yes  1.000    4
  4  O4   12:44      644**   **no speculate or pipeline**  15  yes  1.000 4
  4  O4   11:07      667     (NCAR's flags)       15        yes  0.966    4
  4  O5   10:51      651     3.25     0.811       60        no   0.974    4
  4  O5   10:51      651     3.25     0.811       60        yes  0.974    4
  4  O4   10:23      623     3.39?    0.xxx       60        yes  1.018    4
  4  O4   10:52      652     (NCAR's flags)       60        yes  0.972    1
  4  O5   10:44      644     3.28     0.820       180       no   0.978    4
  4  O5   10:42      642     3.29     0.823       180       yes  0.981    3
  4  O4   10:14      614     3.44?    0.xxx       180       yes  1.026    4

########################################################################
## 2-hour d3 simulations

Sun E4000 250MHz/4MB$ (tahoma)  static loop scheduling, running in /tmp
 #    wall clock time    speed up  efficiency  mmout     compiler  # of
 of   h:mm:ss  seconds   over      speed up/   interval  flags     runs
cpus             1 cpu   #cpus
 13     26:14    1574     ?.???    0.xxx       15        ?.???     3?
 13     26:07    1567     ?.???    0.xxx       60        ?.???     2?

COMPAQ ES40 500MHz/4MB$ (rainier)  2-way interleave, running in /tmp
 #   OPT  wall clock time   speed up  efficiency  mmout     Tahoma   # of
 of  lvl  mm:ss   seconds   over      speed up/   interval  Factor   runs
cpus                1 cpu   #cpus                           Tt / Tr
  4  O4   25:41    1540      ?.???    0.xxx       15        ?.???    4
  4  O4   24:44    1484      ?.???    0.xxx       60        ?.???    3?
  4  O4   26:41    1601      ?.???    0.xxx       180       ?.???    2?

########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000) from:
## ftp://ftp.ucar.edu/mesouser/MM5V2/MM5/mm5v2.tar.Z
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/mminput_nh_data.tar.gz
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/benchmark_config.tar.gz

Sun E4000 250MHz/4MB$ (tahoma)  static loop scheduling, running in /tmp,
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77
 #    wall clock time    speed up  efficiency  # of  comments
 of   h:mm:ss  seconds   over      speed up/   runs
cpus             1 cpu   #cpus
  1     19:50    1190    --        1.000       4
  4      5:31     331    3.410     0.852       4
  8      3:11     191    6.23**    0.779**     4     **see note below
 13      2:05     125    9.520**   0.732**     4     **I believe the run is too
                                                     short to show our true
                                                     speed up, which is more
                                                     like 0.87 efficiency.**

Sun E4500 336MHz/4MB$ (hayes)  static loop scheduling,
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77
 #    wall clock time    speed up  efficiency  # of  comments
 of   h:mm:ss  seconds   over      speed up/   runs
cpus             1 cpu   #cpus
  1     14:39     879    1.000     1.000       4
  4      4:05     245    3.588     0.897       4
  8      2:22     142    6.190**   0.774**     4     **see note above
 13      1:35      95    9.253**   0.711**     4     **see note above

COMPAQ ES40 500MHz/4MB$ (EV6 chip)  2-way memory interleave, running in /tmp
 #   OPT   wall clock time   speed up  efficiency  # of  comments
 of  lvl   mm:ss   seconds   over      speed up/   runs
cpus                 1 cpu   #cpus
  1  NCAR  4:47      287     1.000     1.000       4
  4  NCAR  1:28       88     3.261     0.815       4

COMPAQ DS10 466MHz/4MB$ (EV??)  1-way interleave, running in /var/tmp
 #   OPT   wall clock time   speed up  efficiency  # of  comments
 of  lvl   mm:ss   seconds   over      speed up/   runs
cpus                 1 cpu   #cpus
  1  NCAR  5:19      319     1.000     1.000       5

########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000)
## run for twice as long

Sun E4000 250MHz/4MB$ (tahoma)  static loop scheduling, running in /tmp
 #    wall clock time    speed up  efficiency  # of
 of   h:mm:ss  seconds   over      speed up/   runs
cpus             1 cpu   #cpus
  1     40:07    2407     x.xxx    0.xxx       1
  4     11:13     673     3.577    0.894       2
  8      6:26     386     6.236    0.779       2
 13      4:14     234    10.286    0.791       2