11/29/1999 6-hour benchmarks with MPHYSTBL=1
tahoma (benchmark 2.12 setup, 13 locked processors)
15033.0u 18.0s 19:33 1282% 0+0k 0+0io 0pf+0w
15134.0u 17.0s 19:41 1282% 0+0k 0+0io 0pf+0w
15224.0u 18.0s 19:50 1280% 0+0k 0+0io 0pf+0w
15250.0u 17.0s 19:53 1279% 0+0k 0+0io 0pf+0w
= 15160.25u 1184.25 seconds
rainier (writing to tahoma)
4045.39u 13.28s 18:56 357% 611+935k 999+57412io 0pf+0w
4036.51u 13.32s 18:51 357% 621+938k 546+57413io 0pf+0w
4202.00u 16.91s 20:06 349% 620+938k 575+57410io 0pf+0w
4082.65u 13.38s 19:01 358% 626+946k 554+57415io 0pf+0w
= 1153.5 seconds = 2.6% faster than tahoma
rainier (local disks)
4155.78u 12.59s 18:02 385% 628+944k 4665+57714io 1pf+0w
4042.11u 12.52s 17:29 386% 619+935k 10+57740io 0pf+0w
4164.75u 12.42s 18:02 386% 621+939k 3002+57737io 0pf+0w
4217.06u 13.16s 18:24 382% 626+946k 0+57763io 0pf+0w
= 4144.925u 1079.25 seconds
==> Wallclock speed is 8.86% faster than tahoma
==> single CPU =~ 3.66 * tahoma CPU
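(How these summary figures are derived: each run line is csh time(1) output -- user CPU seconds "u", system seconds "s", wall clock, CPU%, memory, I/O, page faults. A minimal Python sketch, not part of any MM5 tooling, of the averaging and percent-faster arithmetic; the wall-clock lists are copied from this entry:)

def wall_seconds(clock):
    # "19:33" or "2:20:36" -> seconds
    parts = [float(p) for p in clock.split(":")]
    sec = 0.0
    for p in parts:
        sec = sec * 60 + p
    return sec

tahoma  = ["19:33", "19:41", "19:50", "19:53"]      # tahoma wall clocks above
rainier = ["18:02", "17:29", "18:02", "18:24"]      # rainier local-disk runs

t = sum(map(wall_seconds, tahoma)) / len(tahoma)    # 1184.25 s
r = sum(map(wall_seconds, rainier)) / len(rainier)  # 1079.25 s
print("%.2f%% faster" % (100 * (t - r) / t))        # prints 8.87 (quoted above as 8.86%)
print(15160.25 / 4144.925)                          # ~3.66, the CPU-time ratio above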
11/30/1999
4 km simulation for 1999112912 run from fcst hour 12 to 36:
tahoma
111308.0u 272.0s 2:20:36 1322% 0+0k 0+0io 0pf+0w
= 8436 seconds
rainier
30524.667u 45.927s 2:13:05.07 -155.0% 782+479k 45092+74431io 0pf+0w
30416.59u 46.25s 2:12:32 383% 803+492k 45+74346io 0pf+0w
= 30470u 7968.5 seconds
==> Wallclock speed is 5.5% faster
==> single CPU =~ 3.65 * tahoma CPU (111308u / 30470u)
8/25/1999 6-hour tahoma benchmarks with 13 processors locked:
KMP_BLOCKTIME "24h"
KMP_LIBRARY "turnaround"
KMP_PARALLEL "13"
KMP_STACKSIZE "8000000"
OMP_NUM_THREADS "13"
KMP_SCHEDULING "static" -- set only when the Static Sched column below is "yes"
Dom   Static
      Sched    Guide   MM5    Time                              Date
---   ------   -----   ----   -------------------------------   ---------
1,2 no 3.7 2.12 15253.0u 19.0s 19:51 1281% benchmark
15764.0u 20.0s 20:33 1279% benchmark
1,2 yes 3.7 2.12 15204.0u 20.0s 19:48 1280% benchmark
14793.0u 20.0s 19:18 1278% benchmark
Domain   Guide   MM5     Time                                    Date
1,2 3.6 v2.12 149431.0u 848.0s 3:06:54 1340% 1999062300
1,2 3.6 v2.12 3:03 1999062212
1,2 3.6 v2.12 3:01 1999062200
1,2 3.6 v2.12 2:59 1999062112
1,2 3.6 v2.12 2:54 1999062100
1,2 3.6 v2.12 1:34:44 1999062300 24hr
1,2 3.6 v2.12 67035.0u 206.0s 1:26:10 1300% 1999062300 24hr
1,2 3.6 v2.12 race condition? even with KMP_BLOCKTIME "24h"
1,2 3.7 v2.12 68986.0u 205.0s 1:28:29 1303% 1999062300 24hr
1,2 3.7 v2.12 69299.0u 208.0s 1:29:02 1301% 1999062300 24hr
1,2 3.7 v2.12 151778.0u 922.0s 3:08:03 1353% 1999062312
1,2 3.7 v2.12 150913.0u 963.0s 3:05:23 1365% 1999062400
1,2 3.7 v2.12 140783.0u 1087.0s 2:53:03 1366% 1999072212 **
1,2 3.7 v2.12 157825.0u 1065.0s 3:15:00 1357% 1999071600 **
1,2 3.7 v2.12 144987.0u 1087.0s 2:57:59 1367% 1999071100
1,2 3.7 v2.12 148516.0u 1068.0s 3:02:51 1363% 1999071112
1,2 3.7 v2.12 147185.0u 961.0s 3:02:10 1355% 1999071200
18-hour benchmarks:
1,2 3.7 v2.12 45958.0u 62.0s 59:27 1289% 1999071600
47752.0u 63.0s 1:01:51 1288%
44914.0u 59.0s 58:09 1288%
44369.0u 62.0s 57:26 1288%
1,2 3.6 v2.12 43726.0u 62.0s 56:33 1290% 1999071600
43876.0u 61.0s 56:47 1289%
44556.0u 62.0s 57:43 1288%
46938.0u 62.0s 1:00:49 1287%
2-domain Guide 3.7 AVERAGES: 45748.3u 61.5s 59:13 1288%
2-domain Guide 3.6 AVERAGES: 44773.3u 61.8s 57:58 1289%
2-domain Guide 3.7 slowdown: 2.17% 2.16%
3 3.7 v2.12 48050.0u 28.0s 1:01:57 1293% 1999071600
44070.0u 27.0s 56:45 1294%
45988.0u 30.0s 59:26 1290%
44365.0u 26.0s 57:17 1291%
3 3.6 v2.12 44789.0u 26.0s 57:44 1293% 1999071600
45030.0u 28.0s 58:00 1294%
50943.0u 29.0s 1:05:37 1294%
45528.0u 27.0s 58:50 1290%
1-domain Guide 3.7 AVERAGES: 45618.3u 28.0s 58:51 1292%
1-domain Guide 3.6 AVERAGES: 46572.5u 27.5s 60:02 1293%
1-domain Guide 3.7 speedup: 2.1% 1.9%
3 3.6 v2.12 119179.0u 260.0s 2:34:04 1291% 1999062300
3 3.6 v2.12 2:37 1999062212
3 3.6 v2.12 2:26 1999062200
3 3.7 v2.12 114603.0u 243.0s 2:24:54 1320% 1999062312
3 3.7 v2.12 109980.0u 210.0s 2:19:15 1318% 1999071100
3 3.7 v2.12 113441.0u 215.0s 2:23:44 1317% 1999071112
3 3.7 v2.12 109878.0u 213.0s 2:19:11 1318% 1999071200
3 3.7 v2.12 108313.0u 219.0s 2:16:59 1320% 1999071300 **
----------------------------------------------------------------------------
MM5 2-Domain 6-hour benchmarks on tahoma with 13 of the 248 MHz chips locked
The 2.12 benchmarks here all used MPHYSTBL = 0 (unless otherwise noted),
and the 2.7 benchmarks all used MPHYSTBL = 1.
----------------------------------------------------------------------------
            Special
            Compilation
            for addrx1c   Guide     Fortran
     Code   and addrx1n   Version   Compiler   KMP_LIBRARY   Time(s)   Comments
     ----   -----------   -------   --------   -----------   -------   --------
2.7 no 3.0 4.2 turnaround NA Bombs due to
f77 compiler
error
2.7 yes 3.0 4.2 turnaround 20:38,
20:41
2.7 yes 3.6 4.2 throughput 20:36 Guide 3.6 is
2.7 yes 3.6 4.2 turnaround 20:20 just as fast
as Guide 3.0
2.12 no 3.7 5.0 turnaround MPHYSTBL=1
15796.0u 20.0s 20:34 1281%,
15454.0u 21.0s 20:13 1275%,
15542.0u 17.0s 20:16 1279%
======= 2.12 all had MPHYSTBL = 0, while 2.7 had MPHYSTBL = 1 =====
======= THIS WAS THE DIFFERENCE!!!!!!! ===============
2.12 no 3.6 4.2 throughput NA Bombs due to
f77 compiler
error
2.12 yes 3.6 4.2 throughput 22:04,
22:29,
23:49
2.12 yes 3.6 4.2 turnaround 22:11,
24:36
2.12 yes 3.6 5.0 throughput 23:06,
23:37
2.12 yes 3.6 5.0 turnaround 22:26, Fortran 5.0
22:32, is no faster
22:47 than 4.2,
2.12 no 3.6 5.0 throughput 22:46, however, the
23:12 compiler bug
2.12 no 3.6 5.0 turnaround 22:50 is fixed.
2.12* yes 3.6 4.2 throughput 22:33, Solve3 closer
22:39 in form to
2.7 version
---------------------- single processor jobs --------------------------------
Code   addrx1 mods   F77   Machine    Times
2.7 no 5.0 toniwha 7151.90u 9.70s 2:05:01.03 95.4%
2.12 no 5.0 toniwha 7151.51u 8.29s 1:59:48.51 99.6% was it MPHYSTBL = 1?
2.7 no 5.0 blizzard 11866.26u 17.99s 3:22:54.09 97.6%
2.12 no 5.0 blizzard 12067.21u 18.41s 3:24:39.71 98.4%
2.12 no 5.0 tahoma13 15528.33u 22.84s 21:03.50 1230.8%
2.7 no 5.0 tahoma 15343.71u 22.18s 20:49.93 1229.3%
---------------
Mar98 vs May99
---------------
2-Domain 3-hour benchmarks on hayes:
                                                      Number
       2.7      TP                                    Locked   Inc.
Code   Disk     FQ   Times                            Procs    Bdy.   Simultaneous Jobs
----   ------   --   ------------------------------   ------   ----   -----------------
may99 hayes2 15 7105.48u 54.66s 9:44.71 1224.5% 0 yes --
mar98 hayes2 15 7510.51u 42.31s 9:47.42 1285.7% 0 no --
may99 /tmp 15 6771.69u 47.23s 9:15.50 1227.5% 13 yes preprocess
may99 hayes2 15 6613.15u 38.57s 9:18.86 1190.2% 13 yes preprocess
mar98 hayes2 15 7498.85u 51.96s 9:33.08 1317.5% 13 no preprocess
may99 hayes2 60 6479.51u 26.39s 8:38.62 1254.4% 13 yes preprocess
mar98 hayes2 60 6625.63u 10.87s 8:43.58 1267.5% 13 no no postprocess
may99 hayes2 15 6613.36u 37.88s 9:04.43 1221.6% 13 yes --
mar98 hayes2 15 7265.40u 38.65s 9:21.38 1301.0% 13 no --
may99 hayes2 60 6609.38u 37.82s 8:34.26 1292.5% 13 yes --
mar98 hayes2 60 6946.99u 37.96s 8:44.47 1331.8% 13 no --
may99 /tmp* 15 7010.68u 52.17s 9:39.42 1218.9% 13 yes *RUNDIR only
may99 /tmp 15 6607.30u 34.39s 8:46.11 1262.4% 13 yes --
may99 /tmp 15 6850.34u 49.36s 9:21.38 1229.0% 13 yes --
may99 /tmp 15 6647.71u 35.93s 9:03.35 1230.0% 13 yes --
consistent_coloring turned on for the following:
may99 /tmp 15 6973.94u 47.95s 9:08.28 1280.7% 13 yes --
may99 /tmp 15 6842.98u 47.23s 8:56.32 1284.7% 13 yes --
may99 /tmp 15 6887.60u 46.26s 8:57.81 1289.2% 13 yes --
may99 hayes2 15 7022.90u 51.84s 9:37.53 1224.9% 13 yes --
------------------------------------------------------------------------
Setup   MMOUT Files   Average finish times
                      (local time, not elapsed time)
-----   -----------   --------------------
mar98 d3 f12 9:00
may99 d3 f12 10:45
mar98 d3 f24 22:40 to 11:10
may99 d3 f24 12:20
mar98 d3 f36 12:50
may99 d3 f36 13:10 --> 20 minutes later, BUT the
complete forecast is
30+ minutes faster.
mar98 d2 f24 11:00
may99 d2 f24 9:10 --> 1 hour 50 minutes faster
mar98 d2 f36 13:00
may99 d2 f36 10:00 --> 3 hours faster
mar98 d2 f48 1:40 to 14:02
may99 d2 f48 10:45 --> 3 hours faster !!
Domains Forecast Approximate Finish Time Time Change
1,2 24-hr 9:10 am/pm -1 hr 50 min (earlier)
1,2 36-hr 10:00 am/pm -3 hrs (earlier)
1,2 48-hr 10:45 am/pm -3 hrs (earlier)
3 12-hr 10:45 am/pm +1 hr 45 min (later)
3 24-hr 12:00 am/pm +1 hr 20 min (later)
3 36-hr 1:10 am/pm +20 min (later)
All complete 1:10 am/pm -30 min (earlier)
------------------------------------------------------------------------
Benchmark of new 336 MHz chips (Hayes) vs 247 MHz chips (Tahoma):
Outfiles       Tahoma                    Hayes
               finish (:elapsed time)    finish (:elapsed time)
--------       ----------------------    ----------------------
mmout_d1.f0 7:45 13:06
mmout_d1.f9 8:15 (:30) 13:31 (:25)
mmout_d2.f11 8:22 (:37) 13:36 (:30)
--
mmout_d1.f12 9:16 14:00
mmout_d3.f18 10:19 (:63) 14:48 (:48)
mmout_d3.f22 11:02 (:106) 15:21 (:81)
mmout_d1.f36 13:32 (:256) 17:16 (:196)
mmout_d1.f48 14:15 (:299) 17:51 (:231)
Summary: full 36/12/4 run with 13 processors is approximately 29% faster
on hayes' new chips.
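(Where the 29% comes from, as a Python one-liner using the elapsed minutes to the last output file in the table above:)

# elapsed minutes to mmout_d1.f48: tahoma 299, hayes 231 (table above)
print(100 * (299.0 / 231.0 - 1))   # ~29.4 -> "approximately 29% faster" on hayes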
dec99 real-time and ensemble typical runtimes
(I/O gtar means Ernie was running tape backups
that resulted in heavy I/O slowdowns for
/home/mm5rt rundirs):
----------------------------------------------
            I/O    static
MM5RUNDIR   gtar   sched    ensm   tahoma            rainier
---------   ----   ------   ----   ---------------   ----------------------
mm5rt no no no 3:20:23 (best)
mm5rt yes no no 3:32:25 (worst)
mm5rt no yes no 3:02:23 (best)
mm5rt yes yes no 3:29:28 (worst)
/tmp no yes no 2:56:22 (fast, dry day 2000012900)
/tmp no yes no 3:02:23 (fast, wet day 2000020100)
/tmp yes yes no 3:06:28 (one run)
/tmp no yes no 3:05:25 (one run)
ensemble runs:
mm5rt no no yes 3:12:24
/tmp no yes yes 2:46:25 (best, dry day 2000020400 NGM)
mm5rt no yes yes 2:48:21 (best)
mm5rt no yes yes 2:53:00 (avg)
mm5rt yes yes yes 3:05:32 (worst)
/tmp no yes 2:50:09 (cmc 2000032900)
rmm5rt yes yes 3:13:06 (best)
yes 3:18:36 (worst)
yes 3:24:05 (worst, Sunday 2000020700)
d3 simulations:
/tmp (-O4) no 6:11:34 (some pcpn, 032100)
/tmp (-O4) no 6:18:52 (wet day, 031900)
/tmp (full memory) 6:37:04 (23,824 sec 4.6%)
rmm5rt no no 6:47:23 (best, dry day 020400)
rmm5rt no no 6:55:30 (avg)
rmm5rt no no 7:02:00 (wet day, 020100)
rmm5rt no no 7:06:28 (worst)
/tmp yes no 7:02:27 (dry day, first backup
using /tmp)
rmm5rt yes no 7:19:32 (worst)
formulas for calculating times of different domains (same physics
packages) on tahoma:
y-grid pts * x-grid pts * levels * 36-km time step factor
    current 36 km           45 km
    101x137x32x1            81x110x22x(36/45)
    (67 minutes)            (24 minutes)
  +
    current 12 km           15 km
    88x88x32x3              70x70x22x3x(36/45)
    (113 minutes)           (39 minutes)
  =
    180 minutes             63 minutes
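(The same arithmetic as a small Python sketch -- assuming, as the x3 suggests, that the 12/15-km domains take three steps per coarse-domain step; cost() is an illustrative name:)

def cost(ny, nx, nz, passes=1, dt_factor=1.0):
    # relative cost: y-grid pts * x-grid pts * levels * passes * time-step factor
    return ny * nx * nz * passes * dt_factor

cur36 = cost(101, 137, 32)                               # current 36-km domain
new45 = cost(81, 110, 22, dt_factor=36.0 / 45)           # 45-km domain
cur12 = cost(88, 88, 32, passes=3)                       # current 12-km domain
new15 = cost(70, 70, 22, passes=3, dt_factor=36.0 / 45)  # 15-km domain

print(round(67 * new45 / cur36))                         # ~24 minutes
print(round(113 * new15 / cur12))                        # ~39 minutes
print(67 + 113, round(67 * new45 / cur36 + 113 * new15 / cur12))  # 180 vs ~63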
Best of 3 runs each case, except tahoma which was 2 (3/4 - 3/7/2000).
recompiled to inform compiler of 8MB cache
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
13 22:18 1,338 12.21 0.939 13.154 1.731 7.43
Sun E6500 400MHz/8MB$/80MHz (buddy)
# wall clock time speed up efficiency balance Tahomas est. WA+OR
of h:mm:ss seconds over speed up/ 171 time*10*2
cpus 1 cpu #cpus #cpus hours
1 4:32:16 16,336 1.00 1.000 171.000 0.142 90.75
(assumed this time for 1 CPU)
13 23:18 1,398 11.69 0.899 13.154 1.657 7.77
19 16:29 989 16.52 0.869 9.000 2.342 5.49
25 12:30 750 21.78 0.871 6.840 3.088 4.17
29 10:56 656 24.90 0.859 5.897 3.530 3.64
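(A Python sketch of how the derived columns in these scaling tables are computed. 171 is the number of j grid points being divided among the CPUs, 2,316 s is tahoma's 13-CPU static time from the tahoma table below, and the est. multiplier -- 10*2 here, 10*3 in the later rainier d3 tables -- is kept as an opaque factor:)

J_POINTS = 171           # j grid points divided among CPUs ("balance" column)
TAHOMA_13CPU = 2316.0    # tahoma 13-CPU static wall clock, seconds (table below)

def row(ncpus, wall_sec, t1_sec, est_mult=10 * 2):
    speedup    = t1_sec / wall_sec             # "speed up over 1 cpu"
    efficiency = speedup / ncpus               # "speed up / #cpus"
    balance    = J_POINTS / float(ncpus)       # "j pts / #cpus"
    tahomas    = TAHOMA_13CPU / wall_sec       # "Tahomas" (Tt / Tr in later tables)
    est_hours  = wall_sec * est_mult / 3600.0  # "est. WA+OR time*10*2 hours"
    return speedup, efficiency, balance, tahomas, est_hours

print(row(13, 1338.0, 16336.0))   # hydra: 12.21, 0.939, 13.154, 1.73, 7.43
print(row(19,  989.0, 16336.0))   # buddy: 16.52, 0.869, 9.000, 2.34, 5.49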
recompiled to inform compiler of 8MB cache
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:32:16 16,336 1.00 1.000 171.000
13 27:30 1,650 9.90 0.762 13.154
1650/1338 ==> dynamic scheduling ~23% slower than static
original executable compiled on tahoma for 4MB cache for everything below
Sun E4500 400MHz/8MB$ (hydra) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 23:02 1,382 11.90 0.915 13.154
Sun E4500 400MHz/8MB$ (hydra) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 4:34:02 16,442 1.00 1.000 171.000
13 27:57 1,677 9.80 0.754 13.154
1677/1382 ==> dynamic scheduling ~21% slower than static
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 31:33 1,893 - - 13.154
Sun E4500 336MHz/4MB$ (hayes) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
13 35:05 2,105 - - 13.154
2105/1893 ==> dynamic scheduling ~11% slower than static
------
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 8:01:48 28,908 1.00 1.000 171.000
4 x:xx:xx xx,xxx x.xx 0.xxx 42.750
8 x:xx:xx xx,xxx x.xx 0.xxx 21.375
13 38:36 2,316 12.48 0.960 13.154
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 7:29:27 26,967 1.00 1.000 171.000
4 2:01:48 7,308 3.69 0.923 42.750
8 1:03:39 3,819 7.06 0.883 21.375
13 42:01 2,521 10.70 0.823 13.154
tahoma 1-hour simulations of d3 (fastest times for 3 or more runs):
Sun E4000 250MHz/4MB$ (tahoma) dynamic loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:29 8,849 1.00 1.000 171.000
13 14:13 853 10.37 0.798
853/765 ==> dynamic scheduling ~11% slower than static
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling
# wall clock time speed up efficiency balance
of h:mm:ss seconds over speed up/ 171 j pts/
cpus 1 cpu #cpus #cpus
1 2:27:26 8,846 1.00 1.000 171.000
13 12:45 765 11.56 0.889
(static and dynamic loop scheduling give same results for COMPAQ)
(OMP_NESTED and OMP_DYNAMIC make no difference (7 tests run 3/31/2000))
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave
# OPT wall clock time speed up efficiency balance Tahomas est. WA|OR
of lvl mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 O5 42:33 2,553 1.00 1.000 171.000 0.300 21:16:30
4 O5 13:25 805 3.17 0.793 42.750 0.950 6:42:30
4 O4 12:17 737 3.17 0.793 42.750 1.038 6:42:30
same as above, just different headings:
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 O5 42:33 2,553 1.00 1.000 171.000 no
4 O5 13:25 805 3.17 0.793 42.750 no
4 O4 12:17 737 3.17 0.793 42.750 yes 1.038 4
csh -f turned on for all of these; comparing other options:
# buff wall clock time speed up efficiency mmout cxml Tahoma # of
of io mm:ss seconds over speed up/ interval math Factor runs
cpus 1 cpu #cpus Tt / Tr
4 no 12:17 737 3.17 0.793 42.750 no 1.038 8
4 yes 12:57 737 3.17 0.793 42.750 no 1.038 8
4 no 12:25 yes 3
4 yes 12:28 737 3.17 0.793 42.750 yes 1.038 8
more 1-hour d3 tests (3/28/2000):
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up little mmout Tahoma # of
of lvl mm:ss seconds over endian interval Factor runs
cpus 1 cpu Tt / Tr
4 O4 12:17 737 3.17 no 15 1.038 8
1 O4 38:26 2,306 -- yes 15 ?.??? 4
4 O4 12:08 728 3.17 yes 15 ?.??? 4
1 O4 38:27 2,307 -- yes 60 1.038 4
4 O4 12:16 736 3.13 yes 60 ?.??? 4
==
static and dynamic loop scheduling give same results
COMPAQ ES40 500MHz/4MB$ (rainier) full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 42:33* 2,553* 1.00 1.000 171.000 xx 21:16:30
4 13:05 785 3.29 0.821 42.750 0.975 6:32:30
(* est. since only 3 runs performed with fastest at 43:00)
** ESTIMATES **
COMPAQ ES40s 650MHz/4MB$ (estimate) each with full memory
# wall clock time speed up efficiency balance Tahomas est. WA|OR
of h:mm:ss seconds over speed up/ 171 j pts/ time*10*3
cpus 1 cpu #cpus #cpus hours
1 33:30 2,010 1.00 1.000 171.000 xx 16:45:00
4 10:10 611 3.29 0.821 42.750 1.252 5:05:00
########################################################################
## 3-hour d1d2 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
1 2:00:29 7,229 1.00 1.000 guide3.7 4
1 2:04:15 7,455 15 "f77 -fast" 3
13 10:44 644 11.22 0.863 15 guide3.7
13 10:34 634 11.40 0.877 60 guide3.7
13 10:30 630 11.47 0.883 180 guide3.7
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
csh -f column indicates if filecommand.csh had "#!/bin/csh -f" as 1st line
(note: a version of the v2.12 code without our FILECOMMAND mods
ran slightly slower than our version; both had "csh -f")
# OPT wall clock time speed up efficiency mmout csh Tahoma # of
of lvl mm:ss seconds over speed up/ interval -f Factor runs
cpus 1 cpu #cpus Tt / Tr
1 35:41 2,141 1.00 1.000 15 no 1
1 35:30 2,130 1.00 1.000 15 yes 4
1 35:13 2,113 1.00 1.000 60 yes 4
1 35:13 2,113 1.00 1.000 180 yes 1
4 O5 12:03 723 2.95 0.738 15 no 0.891 4
4 O5 11:05 665 3.21 0.803 15 yes 0.968 4
little-endian (le) speed up: in the little_end rows below, the efficiency
column is big-endian time / little-endian time
1 O4 33:02 1,982 1.00 1.075 15 little_end 4
4 O4 10:20 620 3.20 1.039 15 little_end 4
1 O4 32:55 1,975 1.00 1.070 60 little_end 4
4 O4 10:16 616 3.21 1.011 60 little_end 4
4 O4 10:44 644 3.307 0.827 15 yes 1.000 4
4 O4 12:44 644 **no speculate or pipeline** 15 yes 1.000 4
4 O4 11:07 667 NCAR's flags 15 yes 0.966 4
4 O5 10:51 651 3.25 0.811 60 no 0.974 4
4 O5 10:51 651 3.25 0.811 60 yes 0.974 4
4 O4 10:23 623 3.39? 0.xxx 60 yes 1.018 4
4 O4 10:52 652 NCAR's flags 60 yes 0.972 1
4 O5 10:44 644 3.28 0.820 180 no 0.978 4
4 O5 10:42 642 3.29 0.823 180 yes 0.981 3
4 O4 10:14 614 3.44? 0.xxx 180 yes 1.026 4
########################################################################
## 2-hour d3 simulations
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
# wall clock time speed up efficiency mmout compiler # of
of h:mm:ss seconds over speed up/ interval flags runs
cpus 1 cpu #cpus
13 26:14 1574 ?.??? 0.xxx 15 ?.??? 3?
13 26:07 1567 ?.??? 0.xxx 60 ?.??? 2?
COMPAQ ES40 500MHz/4MB$ (rainier) 2-way interleave running in /tmp
# OPT wall clock time speed up efficiency mmout Tahoma # of
of lvl mm:ss seconds over speed up/ interval Factor runs
cpus 1 cpu #cpus Tt / Tr
4 O4 25:41 1540 ?.??? 0.xxx 15 ?.??? 4
4 O4 24:44 1484 ?.??? 0.xxx 60 ?.??? 3?
4 O4 26:41 1601 ?.??? 0.xxx 180 ?.??? 2?
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000) from:
## ftp://ftp.ucar.edu/mesouser/MM5V2/MM5/mm5v2.tar.Z
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/mminput_nh_data.tar.gz
## ftp://ftp.ucar.edu/mesouser/Data/SESAME/benchmark_config.tar.gz
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 19:50 1190 -- 1.000 4
4 5:31 331 3.410 0.852 4
8 3:11 191 6.23** 0.779** 4 **see note below
13 2:05 125 9.520** 0.732** 4 **I believe the run is too
short to see our true speed
up which is more like 0.87
efficiency.**
Sun E4500 336MHz/4MB$ (hayes) static loop scheduling
compiled using Guide 3.7 f77 and Sun Workshop 5.0 f77.
# wall clock time speed up efficiency # of comments
of h:mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 14:39 879 1.000 1.000 4
4 4:05 245 3.588 0.897 4
8 2:22 142 6.190** 0.774** 4 **see note above
13 1:35 95 9.253** 0.711** 4 **see note above
COMPAQ ES40 500MHz/4MB$ (EV6 chip) 2-way memory interleave running in /tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 4:47 287 1.000 1.000 4
4 NCAR 1:28 88 3.261 0.815 4
COMPAQ DS10 466MHz/4MB$ (EV??) 1-way interleave running in /var/tmp
# OPT wall clock time speed up efficiency # of comments
of lvl mm:ss seconds over speed up/ runs
cpus 1 cpu #cpus
1 NCAR 5:19 319 1.000 1.000 5
########################################################################
## NCAR MM5 v2.12 benchmarks (4/6/2000)
## run for twice as long
Sun E4000 250MHz/4MB$ (tahoma) static loop scheduling running in /tmp
1 40:07 2407 1.000 1.000 1
4 11:13 673 3.577 0.894 2
8 6:26 386 6.236 0.779 2
13 4:14 234 10.286 0.791 2