11/29/1999 6-hour benchmarks with MPHYSTBL=1
tahoma (benchmark 2.12 setup, 13 locked processors)
15033.0u 18.0s 19:33 1282% 0+0k 0+0io 0pf+0w
15134.0u 17.0s 19:41 1282% 0+0k 0+0io 0pf+0w
15224.0u 18.0s 19:50 1280% 0+0k 0+0io 0pf+0w
15250.0u 17.0s 19:53 1279% 0+0k 0+0io 0pf+0w
= 15160.25u 1184.25 seconds
rainier (writing to tahoma)
4045.39u 13.28s 18:56 357% 611+935k 999+57412io 0pf+0w
4036.51u 13.32s 18:51 357% 621+938k 546+57413io 0pf+0w
4202.00u 16.91s 20:06 349% 620+938k 575+57410io 0pf+0w
4082.65u 13.38s 19:01 358% 626+946k 554+57415io 0pf+0w
= 1153.5 seconds = 2.6% faster than tahoma
rainier (local disks)
4155.78u 12.59s 18:02 385% 628+944k 4665+57714io 1pf+0w
4042.11u 12.52s 17:29 386% 619+935k 10+57740io 0pf+0w
4164.75u 12.42s 18:02 386% 621+939k 3002+57737io 0pf+0w
4217.06u 13.16s 18:24 382% 626+946k 0+57763io 0pf+0w
= 4144.925u 1079.25 seconds
==> Wallclock speed is 8.86% faster than tahoma
==> single CPU =~ 3.66 * tahoma CPU
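(How these summary figures are derived: each run line is csh time(1) output -- user CPU seconds "u", system seconds "s", wall clock, CPU%, memory, I/O, page faults. A minimal Python sketch, not part of any MM5 tooling, of the averaging and percent-faster arithmetic; the wall-clock lists are copied from this entry:)

def wall_seconds(clock):
    # "19:33" or "2:20:36" -> seconds
    parts = [float(p) for p in clock.split(":")]
    sec = 0.0
    for p in parts:
        sec = sec * 60 + p
    return sec

tahoma  = ["19:33", "19:41", "19:50", "19:53"]      # tahoma wall clocks above
rainier = ["18:02", "17:29", "18:02", "18:24"]      # rainier local-disk runs

t = sum(map(wall_seconds, tahoma)) / len(tahoma)    # 1184.25 s
r = sum(map(wall_seconds, rainier)) / len(rainier)  # 1079.25 s
print("%.2f%% faster" % (100 * (t - r) / t))        # prints 8.87 (quoted above as 8.86%)
print(15160.25 / 4144.925)                          # ~3.66, the CPU-time ratio above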
11/30/1999
4 km simulation for 1999112912 run from fcst hour 12 to 36:
tahoma
111308.0u 272.0s 2:20:36 1322% 0+0k 0+0io 0pf+0w
= 8436 seconds
rainier
30524.667u 45.927s 2:13:05.07 -155.0% 782+479k 45092+74431io 0pf+0w
30416.59u 46.25s 2:12:32 383% 803+492k 45+74346io 0pf+0w
= 30470u 7968.5 seconds
==> Wallclock speed is 5.5% faster
==> single CPU =~ 3.65 * tahoma CPU (111308u / 30470u)
8/25/1999 6-hour tahoma benchmarks with 13 processors locked:
KMP_BLOCKTIME "24h"
KMP_LIBRARY "turnaround"
KMP_PARALLEL "13"
KMP_STACKSIZE "8000000"
OMP_NUM_THREADS "13"
KMP_SCHEDULING "static" -- set only when the Static Sched column below is "yes"
Dom   Static
      Sched    Guide   MM5    Time                              Date
---   ------   -----   ----   -------------------------------   ---------
1,2 no 3.7 2.12 15253.0u 19.0s 19:51 1281% benchmark
15764.0u 20.0s 20:33 1279% benchmark
1,2 yes 3.7 2.12 15204.0u 20.0s 19:48 1280% benchmark
14793.0u 20.0s 19:18 1278% benchmark
Domain   Guide   MM5     Time                                    Date
1,2 3.6 v2.12 149431.0u 848.0s 3:06:54 1340% 1999062300
1,2 3.6 v2.12 3:03 1999062212
1,2 3.6 v2.12 3:01 1999062200
1,2 3.6 v2.12 2:59 1999062112
1,2 3.6 v2.12 2:54 1999062100
1,2 3.6 v2.12 1:34:44 1999062300 24hr
1,2 3.6 v2.12 67035.0u 206.0s 1:26:10 1300% 1999062300 24hr
1,2 3.6 v2.12 race condition? even with KMP_BLOCKTIME "24h"
1,2 3.7 v2.12 68986.0u 205.0s 1:28:29 1303% 1999062300 24hr
1,2 3.7 v2.12 69299.0u 208.0s 1:29:02 1301% 1999062300 24hr
1,2 3.7 v2.12 151778.0u 922.0s 3:08:03 1353% 1999062312
1,2 3.7 v2.12 150913.0u 963.0s 3:05:23 1365% 1999062400
1,2 3.7 v2.12 140783.0u 1087.0s 2:53:03 1366% 1999072212 **
1,2 3.7 v2.12 157825.0u 1065.0s 3:15:00 1357% 1999071600 **
1,2 3.7 v2.12 144987.0u 1087.0s 2:57:59 1367% 1999071100
1,2 3.7 v2.12 148516.0u 1068.0s 3:02:51 1363% 1999071112
1,2 3.7 v2.12 147185.0u 961.0s 3:02:10 1355% 1999071200
18-hour benchmarks:
1,2 3.7 v2.12 45958.0u 62.0s 59:27 1289% 1999071600
47752.0u 63.0s 1:01:51 1288%
44914.0u 59.0s 58:09 1288%
44369.0u 62.0s 57:26 1288%
1,2 3.6 v2.12 43726.0u 62.0s 56:33 1290% 1999071600
43876.0u 61.0s 56:47 1289%
44556.0u 62.0s 57:43 1288%
46938.0u 62.0s 1:00:49 1287%
2-domain Guide 3.7 AVERAGES: 45748.3u 61.5s 59:13 1288%
2-domain Guide 3.6 AVERAGES: 44773.3u 61.8s 57:58 1289%
2-domain Guide 3.7 slowdown: 2.17% 2.16%
3 3.7 v2.12 48050.0u 28.0s 1:01:57 1293% 1999071600
44070.0u 27.0s 56:45 1294%
45988.0u 30.0s 59:26 1290%
44365.0u 26.0s 57:17 1291%
3 3.6 v2.12 44789.0u 26.0s 57:44 1293% 1999071600
45030.0u 28.0s 58:00 1294%
50943.0u 29.0s 1:05:37 1294%
45528.0u 27.0s 58:50 1290%
1-domain Guide 3.7 AVERAGES: 45618.3u 28.0s 58:51 1292%
1-domain Guide 3.6 AVERAGES: 46572.5u 27.5s 60:02 1293%
1-domain Guide 3.7 speedup: 2.1% 1.9%
3 3.6 v2.12 119179.0u 260.0s 2:34:04 1291% 1999062300
3 3.6 v2.12 2:37 1999062212
3 3.6 v2.12 2:26 1999062200
3 3.7 v2.12 114603.0u 243.0s 2:24:54 1320% 1999062312
3 3.7 v2.12 109980.0u 210.0s 2:19:15 1318% 1999071100
3 3.7 v2.12 113441.0u 215.0s 2:23:44 1317% 1999071112
3 3.7 v2.12 109878.0u 213.0s 2:19:11 1318% 1999071200
3 3.7 v2.12 108313.0u 219.0s 2:16:59 1320% 1999071300 **
----------------------------------------------------------------------------
MM5 2-Domain 6-hour benchmarks on tahoma with 13 of the 248 MHz chips locked
The 2.12 benchmarks here all used MPHYSTBL = 0 (unless otherwise noted),
and the 2.7 benchmarks all used MPHYSTBL = 1.
----------------------------------------------------------------------------
            Special
            Compilation
            for addrx1c   Guide     Fortran
     Code   and addrx1n   Version   Compiler   KMP_LIBRARY   Time(s)   Comments
     ----   -----------   -------   --------   -----------   -------   --------
2.7 no 3.0 4.2 turnaround NA Bombs due to
f77 compiler
error
2.7 yes 3.0 4.2 turnaround 20:38,
20:41
2.7 yes 3.6 4.2 throughput 20:36 Guide 3.6 is
2.7 yes 3.6 4.2 turnaround 20:20 just as fast
as Guide 3.0
2.12 no 3.7 5.0 turnaround MPHYSTBL=1
15796.0u 20.0s 20:34 1281%,
15454.0u 21.0s 20:13 1275%,
15542.0u 17.0s 20:16 1279%
======= 2.12 all had MPHYSTBL = 0, while 2.7 had MPHYSTBL = 1 =====
======= THIS WAS THE DIFFERENCE!!!!!!! ===============
2.12 no 3.6 4.2 throughput NA Bombs due to
f77 compiler
error
2.12 yes 3.6 4.2 throughput 22:04,
22:29,
23:49
2.12 yes 3.6 4.2 turnaround 22:11,
24:36
2.12 yes 3.6 5.0 throughput 23:06,
23:37
2.12 yes 3.6 5.0 turnaround 22:26, Fortran 5.0
22:32, is no faster
22:47 than 4.2,
2.12 no 3.6 5.0 throughput 22:46, however, the
23:12 compiler bug
2.12 no 3.6 5.0 turnaround 22:50 is fixed.
2.12* yes 3.6 4.2 throughput 22:33, Solve3 closer
22:39 in form to
2.7 version
---------------------- single processor jobs --------------------------------
Code   addrx1 mods   F77   Machine    Times
2.7 no 5.0 toniwha 7151.90u 9.70s 2:05:01.03 95.4%
2.12 no 5.0 toniwha 7151.51u 8.29s 1:59:48.51 99.6% was it MPHYSTBL = 1?
2.7 no 5.0 blizzard 11866.26u 17.99s 3:22:54.09 97.6%
2.12 no 5.0 blizzard 12067.21u 18.41s 3:24:39.71 98.4%
2.12 no 5.0 tahoma13 15528.33u 22.84s 21:03.50 1230.8%
2.7 no 5.0 tahoma 15343.71u 22.18s 20:49.93 1229.3%
---------------
Mar98 vs May99
---------------
2-Domain 3-hour benchmarks on hayes:
                                                      Number
       2.7      TP                                    Locked   Inc.
Code   Disk     FQ   Times                            Procs    Bdy.   Simultaneous Jobs
----   ------   --   ------------------------------   ------   ----   -----------------
may99 hayes2 15 7105.48u 54.66s 9:44.71 1224.5% 0 yes --
mar98 hayes2 15 7510.51u 42.31s 9:47.42 1285.7% 0 no --
may99 /tmp 15 6771.69u 47.23s 9:15.50 1227.5% 13 yes preprocess
may99 hayes2 15 6613.15u 38.57s 9:18.86 1190.2% 13 yes preprocess
mar98 hayes2 15 7498.85u 51.96s 9:33.08 1317.5% 13 no preprocess
may99 hayes2 60 6479.51u 26.39s 8:38.62 1254.4% 13 yes preprocess
mar98 hayes2 60 6625.63u 10.87s 8:43.58 1267.5% 13 no no postprocess
may99 hayes2 15 6613.36u 37.88s 9:04.43 1221.6% 13 yes --
mar98 hayes2 15 7265.40u 38.65s 9:21.38 1301.0% 13 no --
may99 hayes2 60 6609.38u 37.82s 8:34.26 1292.5% 13 yes --
mar98 hayes2 60 6946.99u 37.96s 8:44.47 1331.8% 13 no --
may99 /tmp* 15 7010.68u 52.17s 9:39.42 1218.9% 13 yes *RUNDIR only
may99 /tmp 15 6607.30u 34.39s 8:46.11 1262.4% 13 yes --
may99 /tmp 15 6850.34u 49.36s 9:21.38 1229.0% 13 yes --
may99 /tmp 15 6647.71u 35.93s 9:03.35 1230.0% 13 yes --
consistent_coloring turned on for the following:
may99 /tmp 15 6973.94u 47.95s 9:08.28 1280.7% 13 yes --
may99 /tmp 15 6842.98u 47.23s 8:56.32 1284.7% 13 yes --
may99 /tmp 15 6887.60u 46.26s 8:57.81 1289.2% 13 yes --
may99 hayes2 15 7022.90u 51.84s 9:37.53 1224.9% 13 yes --
------------------------------------------------------------------------
Setup   MMOUT Files   Average finish times
                      (local time, not elapsed time)
-----   -----------   --------------------
mar98 d3 f12 9:00
may99 d3 f12 10:45
mar98 d3 f24 22:40 to 11:10
may99 d3 f24 12:20
mar98 d3 f36 12:50
may99 d3 f36 13:10 --> 20 minutes later, BUT the
complete forecast is
30+ minutes faster.
mar98 d2 f24 11:00
may99 d2 f24 9:10 --> 1 hour 50 minutes faster
mar98 d2 f36 13:00
may99 d2 f36 10:00 --> 3 hours faster
mar98 d2 f48 1:40 to 14:02
may99 d2 f48 10:45 --> 3 hours faster !!
Domains Forecast Approximate Finish Time Time Change
1,2 24-hr 9:10 am/pm -1 hr 50 min (earlier)
1,2 36-hr 10:00 am/pm -3 hrs (earlier)
1,2 48-hr 10:45 am/pm -3 hrs (earlier)
3 12-hr 10:45 am/pm +1 hr 45 min (later)
3 24-hr 12:00 am/pm +1 hr 20 min (later)
3 36-hr 1:10 am/pm +20 min (later)
All complete 1:10 am/pm -30 min (earlier)
------------------------------------------------------------------------
Benchmark of new 336 MHz chips (Hayes) vs 247 MHz chips (Tahoma):
Outfiles       Tahoma                    Hayes
               finish (:elapsed time)    finish (:elapsed time)
--------       ----------------------    ----------------------
mmout_d1.f0 7:45 13:06
mmout_d1.f9 8:15 (:30) 13:31 (:25)
mmout_d2.f11 8:22 (:37) 13:36 (:30)
--
mmout_d1.f12 9:16 14:00
mmout_d3.f18 10:19 (:63) 14:48 (:48)
mmout_d3.f22 11:02 (:106) 15:21 (:81)
mmout_d1.f36 13:32 (:256) 17:16 (:196)
mmout_d1.f48 14:15 (:299) 17:51 (:231)
Summary: full 36/12/4 run with 13 processors is approximately 29% faster
on hayes' new chips.
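(Where the 29% comes from, as a Python one-liner using the elapsed minutes to the last output file in the table above:)

# elapsed minutes to mmout_d1.f48: tahoma 299, hayes 231 (table above)
print(100 * (299.0 / 231.0 - 1))   # ~29.4 -> "approximately 29% faster" on hayes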
dec99 real-time and ensemble typical runtimes
(I/O gtar means Ernie was running tape backups
that resulted in heavy I/O slowdowns for
/home/mm5rt rundirs):
----------------------------------------------
            I/O    static
MM5RUNDIR   gtar   sched    ensm   tahoma            rainier
---------   ----   ------   ----   ---------------   ----------------------
mm5rt no no no 3:20:23 (best)
mm5rt yes no no 3:32:25 (worst)
mm5rt no yes no 3:02:23 (best)
mm5rt yes yes no 3:29:28 (worst)
/tmp no yes no 2:56:22 (fast, dry day 2000012900)
/tmp no yes no 3:02:23 (fast, wet day 2000020100)
/tmp yes yes no 3:06:28 (one run)
/tmp no yes no 3:05:25 (one run)
ensemble runs:
mm5rt no no yes 3:12:24
/tmp no yes yes 2:46:25 (best, dry day 2000020400 NGM)
mm5rt no yes yes 2:48:21 (best)
mm5rt no yes yes 2:53:00 (avg)
mm5rt yes yes yes 3:05:32 (worst)
/tmp no yes 2:50:09 (cmc 2000032900)
rmm5rt yes yes 3:13:06 (best)
yes 3:18:36 (worst)
yes 3:24:05 (worst, Sunday 2000020700)
d3 simulations:
/tmp (-O4) no 6:11:34 (some pcpn, 032100)
/tmp (-O4) no 6:18:52 (wet day, 031900)
/tmp (full memory) 6:37:04 (23,824 sec 4.6%)
rmm5rt no no 6:47:23 (best, dry day 020400)
rmm5rt no no 6:55:30 (avg)
rmm5rt no no 7:02:00 (wet day, 020100)
rmm5rt no no 7:06:28 (worst)
/tmp yes no 7:02:27 (dry day, first backup
using /tmp)
rmm5rt yes no 7:19:32 (worst)
formulas for calculating times of different domains (same physics
packages) on tahoma:
y-grid pts * x-grid pts * levels * 36-km time step factor
    current 36 km           45 km
    101x137x32x1            81x110x22x(36/45)
    (67 minutes)            (24 minutes)
  +
    current 12 km           15 km
    88x88x32x3              70x70x22x3x(36/45)
    (113 minutes)           (39 minutes)
  =
    180 minutes             63 minutes
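(The same arithmetic as a small Python sketch -- assuming, as the x3 suggests, that the 12/15-km domains take three steps per coarse-domain step; cost() is an illustrative name:)

def cost(ny, nx, nz, passes=1, dt_factor=1.0):
    # relative cost: y-grid pts * x-grid pts * levels * passes * time-step factor
    return ny * nx * nz * passes * dt_factor

cur36 = cost(101, 137, 32)                               # current 36-km domain
new45 = cost(81, 110, 22, dt_factor=36.0 / 45)           # 45-km domain
cur12 = cost(88, 88, 32, passes=3)                       # current 12-km domain
new15 = cost(70, 70, 22, passes=3, dt_factor=36.0 / 45)  # 15-km domain

print(round(67 * new45 / cur36))                         # ~24 minutes
print(round(113 * new15 / cur12))                        # ~39 minutes
print(67 + 113, round(67 * new45 / cur36 + 113 * new15 / cur12))  # 180 vs ~63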
Best of 3 runs each case, except tahoma which was 2 (3/4 - 3/7/2000).
recompiled to inform compiler of 8MB cache
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
13 22:18 1,338 12.21 0.939 13.154 1.731 7.43
Sun E6500 400MHz/8MB$/80MHz (buddy)
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
(assumed this time for 1 CPU)
13 23:18 1,398 11.69 0.899 13.154 1.657 7.77
19 16:29 989 16.52 0.869 9.000 2.342 5.49
25 12:30 750 21.78 0.871 6.840 3.088 4.17
29 10:56 656 24.90 0.859 5.897 3.530 3.64
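(A Python sketch of how the derived columns in these scaling tables are computed. 171 is the number of j grid points being divided among the CPUs, 2,316 s is tahoma's 13-CPU static time from the tahoma table below, and the est. multiplier -- 10*2 here, 10*3 in the later rainier d3 tables -- is kept as an opaque factor:)

J_POINTS = 171           # j grid points divided among CPUs ("balance" column)
TAHOMA_13CPU = 2316.0    # tahoma 13-CPU static wall clock, seconds (table below)

def row(ncpus, wall_sec, t1_sec, est_mult=10 * 2):
    speedup    = t1_sec / wall_sec             # "speed up over 1 cpu"
    efficiency = speedup / ncpus               # "speed up / #cpus"
    balance    = J_POINTS / float(ncpus)       # "j pts / #cpus"
    tahomas    = TAHOMA_13CPU / wall_sec       # "Tahomas" (Tt / Tr in later tables)
    est_hours  = wall_sec * est_mult / 3600.0  # "est. WA+OR time*10*2 hours"
    return speedup, efficiency, balance, tahomas, est_hours

print(row(13, 1338.0, 16336.0))   # hydra: 12.21, 0.939, 13.154, 1.73, 7.43
print(row(19,  989.0, 16336.0))   # buddy: 16.52, 0.869, 9.000, 2.34, 5.49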
recompiled to inform compiler of 8MB cache
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:32:16 16,336 1.00 1.000 171.000
13 27:30 1,650 9.90 0.762 13.154
1650/1338 ==> dynamic scheduling ~23% slower than static
original executable compiled on tahoma for 4MB cache for everything below
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 23:02 1,382 11.90 0.915 13.154
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 27:57 1,677 9.80 0.754 13.154
1677/1382 ==> dynamic scheduling ~21% slower than static
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 31:33 1,893 - - 13.154
Sun E4500 336MHz/4MB$ (hayes) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 35:05 2,105 - - 13.154
2105/1893 ==> dynamic scheduling ~11% slower than static
------
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 8:01:48 28,908 1.00 1.000 171.000
4 x:xx:xx xx,xxx x.xx 0.xxx 42.750
8 x:xx:xx xx,xxx x.xx 0.xxx 21.375
13 38:36 2,316 12.48 0.960 13.154
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 7:29:27 26,967 1.00 1.000 171.000
4 2:01:48 7,308 3.69 0.923 42.750
8 1:03:39 3,819 7.06 0.883 21.375
13 42:01 2,521 10.70 0.823 13.154
tahoma 1-hour simulations of d3 (fastest times for 3 or more runs):
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:29 8,849 1.00 1.000 171.000
13 14:13 853 10.37 0.798
853/765 ==> dynamic scheduling ~11% slower than static
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:26 8,846 1.00 1.000 171.000
13 12:45 765 11.56 0.889
(static and dynamic loop scheduling give same results for COMPAQ)
(OMP_NESTED and OMP_DYNAMIC make no difference (7 tests run 3/31/2000))
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave
# OPT wall clock time speed up efficiency balance Tahomas est. WA|OR
of lvl mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 O5 42:33 2,553 1.00 1.000 171.000 0.300 21:16:30
4 O5 13:25 805 3.17 0.793 42.750 0.950 6:42:30
4 O4 12:17 737 3.17 0.793 42.750 1.038 6:42:30
same as above, just different headings:
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 O5 42:33 2,553 1.00 1.000 171.000 no
4 O5 13:25 805 3.17 0.793 42.750 no
4 O4 12:17 737 3.17 0.793 42.750 yes 1.038 4
csh -f turned on for all of these; comparing other options:
# buff wall clock time speed up efficiency mmout cxml Tahoma # of
of io mm:ss seconds over speed up/ interval math Factor runs
cpus 1 cpu #cpus Tt / Tr
4 no 12:17 737 3.17 0.793 42.750 no 1.038 8
4 yes 12:57 737 3.17 0.793 42.750 no 1.038 8
4 no 12:25 yes 3
4 yes 12:28 737 3.17 0.793 42.750 yes 1.038 8
more 1-hour d3 tests (3/28/2000):
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up little mmout Tahoma # of
of lvl mm:ss seconds over endian interval Factor runs
cpus 1 cpu Tt / Tr
4 O4 12:17 737 3.17 no 15 1.038 8
1 O4 38:26 2,306 -- yes 15 ?.??? 4
4 O4 12:08 728 3.17 yes 15 ?.??? 4
1 O4 38:27 2,307 -- yes 60 1.038 4
4 O4 12:16 736 3.13 yes 60 ?.??? 4
==
static and dynamic loop scheduling give same results
COMPAQ ES40 500MHz/4MB$ (rainier) full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 42:33* 2,553* 1.00 1.000 171.000 xx 21:16:30
4 13:05 785 3.29 0.821 42.750 0.975 6:32:30
(* est. since only 3 runs performed with fastest at 43:00)
** ESTIMATES **
COMPAQ ES40s 650MHz/4MB$ (estimate) each with full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 33:30 2,010 1.00 1.000 171.000 xx 16:45:00
4 10:10 611 3.29 0.821 42.750 1.252 5:05:00
########################################################################
## 3-hour d1d2 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
1 2:00:29 7,229 1.00 1.000 guide3.7 4
1 2:04:15 7,455 15 "f77 -fast" 3
13 10:44 644 11.22 0.863 15 guide3.7
13 10:34 634 11.40 0.877 60 guide3.7
13 10:30 630 11.47 0.883 180 guide3.7
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
csh -f column indicates if filecommand.csh had "#!/bin/csh -f" as 1st line
(note: a version of the v2.12 code without our FILECOMMAND mods
ran slightly slower than our version; both had "csh -f")
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 35:41 2,141 1.00 1.000 15 no 1
1 35:30 2,130 1.00 1.000 15 yes 4
1 35:13 2,113 1.00 1.000 60 yes 4
1 35:13 2,113 1.00 1.000 180 yes 1
4 O5 12:03 723 2.95 0.738 15 no 0.891 4
4 O5 11:05 665 3.21 0.803 15 yes 0.968 4
little-endian (le) speed up: in the little_end rows below, the efficiency
column is big-endian time / little-endian time
1 O4 33:02 1,982 1.00 1.075 15 little_end 4
4 O4 10:20 620 3.20 1.039 15 little_end 4
1 O4 32:55 1,975 1.00 1.070 60 little_end 4
4 O4 10:16 616 3.21 1.011 60 little_end 4
4 O4 10:44 644 3.307 0.827 15 yes 1.000 4
4 O4 12:44 644 **no speculate or pipeline** 15 yes 1.000 4
4 O4 11:07 667 NCAR's flags 15 yes 0.966 4
4 O5 10:51 651 3.25 0.811 60 no 0.974 4
4 O5 10:51 651 3.25 0.811 60 yes 0.974 4
4 O4 10:23 623 3.39? 0.xxx 60 yes 1.018 4
4 O4 10:52 652 NCAR's flags 60 yes 0.972 1
4 O5 10:44 644 3.28 0.820 180 no 0.978 4
4 O5 10:42 642 3.29 0.823 180 yes 0.981 3
4 O4 10:14 614 3.44? 0.xxx 180 yes 1.026 4
########################################################################
## 2-hour d3 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
13 26:14 1574 ?.??? 0.xxx 15 ?.??? 3?
13 26:07 1567 ?.??? 0.xxx 60 ?.??? 2?
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up efficiency mmout Tahoma # of
of lvl mm:ss seconds over speed up/ interval Factor runs
cpus 1 cpu #cpus Tt / Tr
4 O4 25:41 1540 ?.??? 0.xxx 15 ?.??? 4
4 O4 24:44 1484 ?.??? 0.xxx 60 ?.??? 3?
4 O4 26:41 1601 ?.??? 0.xxx 180 ?.??? 2?
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000) from:
## ftp://ftp.ucar.edu/mesouser/MM5V2/MM5/mm5v2.tar.Z
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/mminput_nh_data.tar.gz
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/benchmark_config.tar.gz
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 19:50 1190 -- 1.000 4
4 5:31 331 3.410 0.852 4
8 3:11 191 6.23** 0.779** 4 **see note below
13 2:05 125 9.520** 0.732** 4 **I believe the run is too
short to see our true speed
up which is more like 0.87
efficiency.**
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 14:39 879 1.000 1.000 4
4 4:05 245 3.588 0.897 4
8 2:22 142 6.190** 0.774** 4 **see note above
13 1:35 95 9.253** 0.711** 4 **see note above
COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 4:47 287 1.000 1.000 4
4 NCAR 1:28 88 3.261 0.815 4
COMPAQ DS10 466MHz/4MB$ (EV??) 1-way interleave running in /var/tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 5:19 319 1.000 1.000 5
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000)
## run for twice as long
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
1 40:07 2407 1.000 1.000 1
4 11:13 673 3.577 0.894 2
8 6:26 386 6.236 0.779 2
13 4:14 234 10.286 0.791 2