Data Structures and Data Formats

OUTLINE

Lutz Rastätter, CCMC                                              Tuesday, April 9, 2002



Numerical Models (Introduction)

Magnetosphere (+coupled ionosphere):

  BATSRUS – Block Adaptive Tree Solarwind Roe Upwind Scheme, T. Gombosi, U. Michigan

  UCLA-GGCM – UCLA Geospace General Circulation Model, J. Raeder, UCLA

  LFSM – Lyon-Fedder-Slinker-Mobarry model, coupled magnetosphere-ionosphere model (not run at CCMC)

Ionosphere:

  CTIP – Coupled Thermosphere Ionosphere Plasmasphere model, T. Fuller-Rowell, NCAR

  SAMI2 – 2D version of low-latitude ionospheric composition model, J. Huba, NRL

Inner magnetosphere:

  Fok ring current model, M.C. Fok, GSFC (uses LFSM grid)



Data Formats Used in Models (1)

F77-unformatted: BATSRUS 3D, CTIP, SAMI2:

+ fast-loading

+ portable among SUN, SP2 (IEEE Big Endian) and
   Beowulfs (Little Endian, can be swapped with compiler option)

+ can be read with C/C++ programs as well

-- not readily portable from CRAY (64-bit vs. 32-bit floats and integers)

-- compiler flags affect the format, e.g. an f77 option that Sun translates to
"-xtypemap integer:mixed" yields 4-byte integers padded to 8 bytes with arbitrary data (different from true 8-byte integers)

Remarks:

Grid size information usually supplied only in separate output files or hard-coded into source code or include files.
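
The record framing behind these portability notes can be illustrated with a minimal Python sketch. It assumes sequential records framed by a 4-byte length marker before and after the payload, which is common but compiler-dependent. The helper names `read_fortran_record` and `guess_byte_order` are hypothetical and not part of any model's library.

```python
import struct

def read_fortran_record(buf, offset=0, byte_order=">"):
    """Return (payload, next_offset) for one sequential unformatted record.

    Assumption: 4-byte record-length markers precede and follow the payload.
    """
    (n_head,) = struct.unpack_from(byte_order + "i", buf, offset)
    start = offset + 4
    payload = buf[start:start + n_head]
    (n_tail,) = struct.unpack_from(byte_order + "i", buf, start + n_head)
    if n_head != n_tail:
        raise ValueError("record markers disagree: wrong byte order or marker size?")
    return payload, start + n_head + 4

def guess_byte_order(buf):
    """Pick the byte order for which the first record marker fits in the buffer."""
    for order in (">", "<"):
        (n,) = struct.unpack_from(order + "i", buf, 0)
        if 0 <= n <= len(buf) - 8:
            return order
    raise ValueError("could not determine byte order")

# Build a tiny big-endian record holding three 4-byte reals, then read it back.
values = (1.0, 2.5, -3.0)
body = struct.pack(">3f", *values)
record = struct.pack(">i", len(body)) + body + struct.pack(">i", len(body))

order = guess_byte_order(record)
payload, next_offset = read_fortran_record(record, 0, order)
decoded = struct.unpack(order + "3f", payload)
```

Because the grid size is usually not stored in the file, the caller still has to know how many values each record is supposed to hold.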



Data Formats Used in Models (2)

Formatted - compressed ASCII (UCLA-GGCM):

+ compact, portable over platforms

+ header information gives variable name and grid dimensions

-- slow-loading for large datasets compared to binary, because of decoding

-- library needed to read

     The library is provided with the model and is usable from programs written in Fortran, C, C++, and IDL;
     variable names and grid dimensions (from a separate file) are obtained during the read process.



Data Formats Used in Models (3)

Formatted ASCII:

+ readable in Fortran, C, C++

+ BATSRUS ionosphere: contains grid information,
   variable names and units

-- slow to read

Unformatted binary:  e.g. direct dump of arrays (C, C++)

+ easy to read back with the same program executable

-- limited accessibility for Fortran programs, because the binary data are not framed by long-integer record-length markers (as Fortran-unformatted records are).
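
The difference between a direct array dump and a Fortran-unformatted record can be sketched as follows. This is an illustration only: it assumes 4-byte reals in native byte order and 4-byte record-length markers, and the helper names `dump_raw` and `frame_as_fortran_record` are hypothetical.

```python
import struct

def dump_raw(values):
    """C-style dump: only the 4-byte floats in native byte order, no framing."""
    return struct.pack(f"={len(values)}f", *values)

def frame_as_fortran_record(payload):
    """Wrap a payload in leading and trailing 4-byte record-length markers."""
    marker = struct.pack("=i", len(payload))
    return marker + payload + marker

raw = dump_raw([1.0, 2.0, 3.0, 4.0])
framed = frame_as_fortran_record(raw)

# The raw dump carries no length information; the framed record does.
(record_length,) = struct.unpack("=i", framed[:4])
```

A reader of the raw dump must know the array size in advance, whereas the framed record states its own length.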



Data Structures - Grids  (1)

3D Cartesian grids (magnetosphere data):

   - block structure: BATSRUS

   - regular, spacing varying: UCLA-GGCM

3D spherical grids (magnetosphere, plasmasphere, ionosphere):

   - spherical, radius depending on theta, phi: LFSM, Fok; CTIP 3D data, esp. plasmasphere data



Data Structures - Grids  (2)

2D spherical:

grid of two angles: (co-)latitude, longitude

 - equidistant, covering entire globe (BATSRUS, UCLA-CTIM, CTIP: height-integrated data)

grid of angle and height:

 - vertical slice: geographic latitude, height

 - non-regular (height depends on latitude): SAMI2 ionospheric model, CTIP plasmasphere (in meridional cut)



Data Structures - Special Grids (1)

Unstructured grid output:

List of point locations and variables:  x,y,z, r,theta,phi, P,V_x,...
- often found in ASCII outputs (good for inspection).

Data arrays:  x(N),y(N),z(N),P(N), V_x(N)...

   - may be converted to Cartesian grid or other grid types (requires "extra knowledge").

Logically Cartesian grid output:

   grid coordinate arrays X, Y, Z sized NX, NY, NZ, respectively; data arrays sized N=NX*NY*NZ, i.e. "(NX,NY,NZ)".

   arrays: x(NX),y(NY),z(NZ), Rho(N), P(N), ...

   - regular Cartesian, polar (cylindrical), spherical coordinates.
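
How a data array of size N=NX*NY*NZ maps onto the grid indices can be sketched in Python. The sketch assumes Fortran storage order (the x index varies fastest) and 0-based indices. The function names and the grid sizes are hypothetical.

```python
def flat_index(i, j, k, nx, ny, nz):
    """0-based flat index with i (x) varying fastest, as in a Fortran array (NX,NY,NZ)."""
    if not (0 <= i < nx and 0 <= j < ny and 0 <= k < nz):
        raise IndexError("grid index out of range")
    return i + nx * (j + ny * k)

def grid_indices(n, nx, ny, nz):
    """Inverse of flat_index: recover (i, j, k) from the flat index n."""
    if not 0 <= n < nx * ny * nz:
        raise IndexError("flat index out of range")
    k, rem = divmod(n, nx * ny)
    j, i = divmod(rem, nx)
    return i, j, k

# Hypothetical small grid: coordinate arrays x(NX), y(NY), z(NZ) and one data array Rho(N).
nx, ny, nz = 4, 3, 2
x = [-10.0, -5.0, 0.0, 5.0]
y = [-1.0, 0.0, 1.0]
z = [0.0, 2.0]
rho = [float(n) for n in range(nx * ny * nz)]

n = flat_index(2, 1, 1, nx, ny, nz)
location = (x[2], y[1], z[1])   # position of the grid point that rho[n] belongs to
```

If a file was written in C order instead (z varying fastest), the index formula changes, which is one reason the storage order should be stated along with the data.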



Data Structures - Special Grids (2)

Block-based grid: BATSRUS

     x(NX,Nblocks), y(NY,Nblocks), z(NZ,Nblocks)

     data arrays P(NX,NY,NZ,Nblocks), Vx(NX,NY,NZ,Nblocks), ...

Distorted spherical grid: Fok, (LFSM)

     r(N1,N2), theta(N2), phi(N3), Bx(N1,N2,N3), By(N1,N2,N3), ...

Partially unstructured grid: SAMI2, CTIP plasmasphere

     glat(N), h(N), lon(NLon), N_H(N,NLon), ...

   No regular grid description with glat(NLat), h(NZ) is possible; glat and h derive from the coordinates s, l (N=NS*NL):

    "dipole field line arc length" s(NS), "fieldline number" l(NL)

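A block-based layout with per-block coordinate arrays x(NX,Nblocks), y(NY,Nblocks), z(NZ,Nblocks) can be searched as in the following Python sketch. It assumes that the coordinates are uniformly spaced cell centers within each block and that blocks do not overlap. The function names and the two example blocks are hypothetical and do not reproduce the BATSRUS implementation.

```python
def block_bounds(coords):
    """Extent of one block along one axis, assuming uniformly spaced cell centers."""
    half = 0.5 * (coords[1] - coords[0])
    return coords[0] - half, coords[-1] + half

def find_block(point, x_blocks, y_blocks, z_blocks):
    """Return the index of the block containing point (x, y, z), or None."""
    for b in range(len(x_blocks)):
        inside = True
        for value, coords in zip(point, (x_blocks[b], y_blocks[b], z_blocks[b])):
            lo, hi = block_bounds(coords)
            if not lo <= value < hi:
                inside = False
                break
        if inside:
            return b
    return None

# Two hypothetical blocks with 4 cells per axis: a coarse one and a finer neighbor.
x_blocks = [[0.5, 1.5, 2.5, 3.5], [4.25, 4.75, 5.25, 5.75]]
y_blocks = [[0.5, 1.5, 2.5, 3.5], [0.25, 0.75, 1.25, 1.75]]
z_blocks = [[0.5, 1.5, 2.5, 3.5], [0.25, 0.75, 1.25, 1.75]]

coarse = find_block((1.0, 1.0, 1.0), x_blocks, y_blocks, z_blocks)
fine = find_block((5.0, 1.0, 1.0), x_blocks, y_blocks, z_blocks)
outside = find_block((5.0, 3.0, 1.0), x_blocks, y_blocks, z_blocks)
```

The linear search is only meant to show what information the block index carries; a production reader would typically use the tree structure of the blocks.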


Data Formats – Required Information (1)

Eliminate guesswork – Specify grid basics along with data:

1) Identify grid type as Cartesian, Cylindrical, Spherical, etc.,
    and specify names of coordinates:  "x","y","z","block",
    or   "r","theta","phi",

2) Include grid dimensions: 1D: N1, ... , 4D: N1,N2,N3,N4

3) Define special parameters such as
    Earth radius, year, month, day, time, …,

4) Identify coordinate system: x,y,z in GSE, GSM, SM, or GEI,
    as defined in terms of the model coordinates,

5) Specify number of data arrays and names of variables,

6) List variable units (SI or similar),

7) Add data arrays.



Data Formats – Required Information (2)

1) List grid type and coordinate names in first record and global grid dimensions in next record ("," is Fortran record separator):

    BATSRUS:    "Cartesian X Y Z Block", NX NY NZ NB,

2) Add number of dimensions, name and unit, and dimension indices for each coordinate:

    UCLA-GGCM:  "Cartesian X Y Z", NX NY NZ,
                 "1D X R_E", 1, "1D Y R_E", 2, "1D Z R_E", 3,

    Fok, (LFSM), with r(N1,N2),theta(N2),phi(N3):
                 "Spherical r theta phi", N1 N2 N3,
                 "2D r R_E", 1 2, "1D theta rad", 2, "1D phi deg", 3,

    BATSRUS:    "Cartesian X Y Z Block", NX NY NZ NB,
                 "2D x R_E", 1 4, "2D y R_E", 2 4, "2D z R_E", 3 4,

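How a reader might turn one coordinate descriptor and its dimension indices into an array shape can be sketched in Python. The sketch assumes that the descriptor holds three blank-separated tokens (rank, name, unit) and that the indices are 1-based positions in the global grid dimensions. The function name `coordinate_shape` and the grid sizes are hypothetical.

```python
def coordinate_shape(descriptor, axes, grid_dims):
    """Resolve a descriptor such as "2D r R_E" with axes (1, 2) into (name, unit, shape)."""
    rank_token, name, unit = descriptor.split()
    rank = int(rank_token.rstrip("D"))
    if rank != len(axes):
        raise ValueError("rank in descriptor does not match number of dimension indices")
    shape = tuple(grid_dims[a - 1] for a in axes)
    return name, unit, shape

# Distorted spherical grid with hypothetical sizes N1, N2, N3.
spherical_dims = (30, 40, 50)
r_info = coordinate_shape("2D r R_E", (1, 2), spherical_dims)
theta_info = coordinate_shape("1D theta rad", (2,), spherical_dims)

# Block-based grid with hypothetical sizes NX, NY, NZ, NB.
block_dims = (8, 8, 8, 100)
x_info = coordinate_shape("2D x R_E", (1, 4), block_dims)
```

With this information a generic reader can allocate r(N1,N2) or x(NX,NB) without model-specific code.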


Data Formats – Required Information (3)

3) Define special parameters:
       "R_E gamma missing Year Month Day Hour Min",
       R_E g missing_data Year Month Day hour minute,

4) Identify formula to convert grid coordinates to X, Y, and Z in SM, GSE, GSM, or ..., coordinates:

   UCLA-GGCM: "x_GSE=-x", "y_GSE=-y", "z_GSE=z",

   Fok (LFSM), assuming theta is the colatitude:
               "x_SM = r*sin(theta)*cos(phi)",
               "y_SM = r*sin(theta)*sin(phi)",
               "z_SM = r*cos(theta)",

   BATSRUS:   "x_GSM=x", "y_GSM=y", "z_GSM=z",
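
The conversion step can be sketched in Python. The UCLA-GGCM sign flips are taken from the example above. The spherical case assumes the standard convention with theta as colatitude and phi as longitude, both in radians, so that x² + y² + z² = r². The function names are hypothetical.

```python
import math

def ggcm_to_gse(x, y, z):
    """UCLA-GGCM grid coordinates to GSE: x and y change sign, z is unchanged."""
    return -x, -y, z

def spherical_to_sm(r, theta, phi):
    """Spherical (r, colatitude theta, longitude phi) in radians to Cartesian SM."""
    s = math.sin(theta)
    return r * s * math.cos(phi), r * s * math.sin(phi), r * math.cos(theta)

gse = ggcm_to_gse(10.0, 2.0, 3.0)
equator = spherical_to_sm(2.0, math.pi / 2, 0.0)   # point on the x axis at r = 2
generic = spherical_to_sm(3.0, 0.7, 1.9)
radius = math.sqrt(sum(c * c for c in generic))    # should recover r = 3
```

If a model stores latitude rather than colatitude, or angles in degrees, the formula strings in the file have to say so, which is the point of item 4.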



Data Formats – Required Information (4)

5) List data: names should show whether vector components are given
in the curvilinear grid coordinates or in the Cartesian coordinates identified
in step (4) (names: variable+"_"+coordinate-name).

   BATSRUS:    "N B_x B_y B_z V_x V_y V_z T"

   Fok (LFSM) for example: "B_r B_theta B_phi"

6) List units of data (SI or similar):
    "cm^-3 nT nT nT km/s km/s km/s K"

7) Add data arrays: den,bx,by,bz,vx,vy,vz,temp
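
The naming convention of items 5 and 6 can be checked mechanically, as in this Python sketch. It assumes that names and units are blank-separated lists of equal length and that a vector component is written as variable+"_"+coordinate-name. The function name `describe_variables` is hypothetical.

```python
def describe_variables(names, units):
    """Pair each variable name with its unit and split off the component suffix."""
    name_list = names.split()
    unit_list = units.split()
    if len(name_list) != len(unit_list):
        raise ValueError("each variable needs exactly one unit")
    table = []
    for name, unit in zip(name_list, unit_list):
        variable, sep, component = name.partition("_")
        table.append((variable, component if sep else None, unit))
    return table

table = describe_variables(
    "N B_x B_y B_z V_x V_y V_z T",
    "cm^-3 nT nT nT km/s km/s km/s K",
)
# Components found for the velocity vector.
v_components = [component for variable, component, _ in table if variable == "V"]
```

A reader can use the component suffixes to decide whether a vector needs rotating into Cartesian coordinates before it is displayed.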



Required Information - Conclusions

Suggested format elements can be read sequentially in Fortran-90 (Fortran-77), C/C++ (with OpenDX), or IDL (Interactive Data Language, RSI).

Data types used: "char"-arrays ("strings"), 4-byte integers, and 4-byte "real" variables and arrays.

HDF (Hierarchical Data Format) or CDF (Common Data Format) can store additional information and data types (binary formats).

  - HDF5 has been used at CCMC with BATSRUS and UCLA-GGCM output data for faster access (with OpenDX).

    -- Format not finalized

    -- Expansion of conversion to all types of data/models planned.

  - CDF is used by the NSSDC for a large variety of data.

    -- Has not been tested at CCMC.

  - Data access is possible via shared libraries from all of the languages named above.



Outlook

What additional items of information are deemed essential for a self-contained output data file?

How can we minimize rescaling and reorganizing of data for exchange between models and ingestion into visualization?