Data Structures and Data Formats
OUTLINE
Numerical Models (Introduction)
Data Formats Used in Models
Data Structures - Grids
Data Structures - Special Grids
Data Formats - Required Information
Required Information - Conclusions
Outlook
Lutz Rastätter, CCMC, Tuesday, April 9, 2002
Numerical Models (Introduction)
Magnetosphere (+ coupled ionosphere):
BATSRUS: Block Adaptive Tree Solarwind Roe Upwind Scheme, T. Gombosi, U. Michigan
UCLA-GGCM: UCLA Geospace General Circulation Model, J. Raeder, UCLA
LFSM: Lyon-Fedder-Slinker-Mobarry model, coupled magnetosphere-ionosphere model (not run at CCMC)
Ionosphere:
CTIP: Comprehensive Thermosphere Ionosphere Plasmasphere model, T. Fuller-Rowell, NCAR
SAMI2: 2D version of low-latitude ionospheric composition model, J. Huba, NRL
Inner magnetosphere:
Fok: ring current model, M.C. Fok, GSFC (uses LFSM grid)
Data Formats Used in Models (1)
F77-unformatted: BATSRUS 3D, CTIP, SAMI2:
+ fast-loading
+ portable among SUN, SP2 (IEEE big-endian) and Beowulfs (little-endian; byte order can be swapped with a compiler option)
+ can be read with C/C++ programs as well
-- not readily portable from CRAY (64-bit vs. 32-bit floats and integers)
-- compiler flags affect the format, e.g. the f77 option translated to Sun as "-xtypemap integer:mixed" yields 4-byte integers padded to 8 bytes with arbitrary data (different from 8-byte integers)
Remarks:
Grid size information is usually supplied only in separate output files or hard-coded into source code or include files.
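The record framing and byte-order issue above can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes each Fortran-unformatted record is framed by a 4-byte length marker before and after the payload (common, but compiler-dependent), and the helper names write_record and read_record are hypothetical.

```python
import struct

def write_record(payload: bytes, byteorder: str = "<") -> bytes:
    """Frame a payload as: 4-byte length, payload, 4-byte length.
    Assumption: 4-byte markers; some compilers use other sizes."""
    marker = struct.pack(byteorder + "i", len(payload))
    return marker + payload + marker

def read_record(buf: bytes, offset: int = 0):
    """Return (payload, byteorder, next_offset) for the record at offset.

    The byte order is guessed by checking which interpretation of the
    leading marker is confirmed by a matching trailing marker."""
    for byteorder in ("<", ">"):
        (n,) = struct.unpack_from(byteorder + "i", buf, offset)
        end = offset + 4 + n
        if n < 0 or end + 4 > len(buf):
            continue  # length is implausible in this byte order
        (tail,) = struct.unpack_from(byteorder + "i", buf, end)
        if tail == n:
            return buf[offset + 4:end], byteorder, end + 4
    raise ValueError("no consistent record markers found")

# Example: a big-endian stream with one record of grid dimensions
# followed by one record of 4-byte reals.
dims = struct.pack(">3i", 4, 3, 2)
values = struct.pack(">6f", *[0.5 * i for i in range(6)])
stream = write_record(dims, ">") + write_record(values, ">")

payload, order, pos = read_record(stream)
nx, ny, nz = struct.unpack(order + "3i", payload)
payload2, _, pos = read_record(stream, pos)
data = struct.unpack(order + f"{len(payload2) // 4}f", payload2)
```

The same reader works unchanged on a little-endian stream, since the byte order is inferred per record from the markers rather than fixed in advance.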
Data Formats Used in Models (2)
Formatted - compressed ASCII (UCLA-GGCM):
+ compact, portable across platforms
+ header information gives variable name and grid dimensions
-- slow-loading for large datasets compared to binary, because of decoding
-- library needed to read
The library is provided with the model and is usable from programs written in Fortran, C, C++, and IDL; variable names and grid dimensions (from a separate file) are obtained during the read process.
Data Formats Used in Models (3)
Formatted ASCII:
+ readable in Fortran, C, C++
+ BATSRUS ionosphere: contains grid information, variable names and units
-- slow to read
Unformatted binary: e.g. direct dump of arrays (C, C++)
+ easy to read for the same program executable
-- limited accessibility for Fortran programs, as binary data need not be framed by the long-integer data record length (as in Fortran-unformatted).
Data Structures - Grids (1)
3D Cartesian grids (magnetosphere data):
- block structure: BATSRUS
- regular, spacing varying: UCLA-GGCM
3D spherical grids (magnetosphere, plasmasphere, ionosphere):
- spherical, radius depending on theta, phi: LFSM, Fok
CTIP 3D data, esp. plasmasphere data
Data Structures - Grids (2)
2D spherical:
Grid of two angles: (co-)latitude, longitude
- equidistant, covering entire globe: BATSRUS, UCLA-CTIM, CTIP (height-integrated data)
Grid of angle and height:
- vertical slice: geographic latitude, height
- non-regular (height depends on latitude): SAMI2 ionospheric model, CTIP plasmasphere (in meridional cut)
Data Structures - Special Grids (1)
Unstructured grid output:
List of point locations and variables: x,y,z, r,theta,phi, P,V_x,...
- often found in ASCII outputs (good for inspection).
Data arrays: x(N),y(N),z(N),P(N),V_x(N),...
- may be converted to Cartesian grid or other grid types (requires "extra knowledge").
Logically Cartesian grid output:
Grid coordinate arrays sized NX, NY, NZ (for X, Y, Z, respectively); data arrays sized N=NX*NY*NZ, i.e. "(NX,NY,NZ)".
Arrays: x(NX),y(NY),z(NZ), Rho(N), P(N), ...
- regular Cartesian, polar (cylindrical), spherical coordinates.
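The "extra knowledge" needed for a logically Cartesian output is mainly the storage order of the flat data arrays. A minimal sketch, assuming Fortran ordering (first index varies fastest) and 0-based indices; the function names flat_index and unflatten are hypothetical:

```python
def flat_index(i, j, k, nx, ny, nz):
    """Position of grid point (i, j, k) in a flat array of size nx*ny*nz.
    Assumption: Fortran storage order, i.e. i varies fastest, then j, then k."""
    if not (0 <= i < nx and 0 <= j < ny and 0 <= k < nz):
        raise IndexError("grid index out of range")
    return i + nx * (j + ny * k)

def unflatten(n, nx, ny, nz):
    """Inverse of flat_index: recover (i, j, k) from the flat position n."""
    if not 0 <= n < nx * ny * nz:
        raise IndexError("flat index out of range")
    return n % nx, (n // nx) % ny, n // (nx * ny)

# Example: coordinate arrays x(NX), y(NY), z(NZ) and a flat array Rho(N).
nx, ny, nz = 4, 3, 2
x = [-1.0, 0.0, 1.0, 2.0]
y = [-1.0, 0.0, 1.0]
z = [0.0, 1.0]
rho = [float(n) for n in range(nx * ny * nz)]  # placeholder values

n = flat_index(2, 1, 1, nx, ny, nz)
point = (x[2], y[1], z[1])  # location that rho[n] belongs to
```

If the file were written in C order instead, the roles of i and k in the formula would be exchanged, which is exactly the kind of detail that has to travel with the data.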
Data Structures - Special Grids (2)
Block-based grid: BATSRUS
x(NX,Nblocks), y(NY,Nblocks), z(NZ,Nblocks)
data arrays P(NX,NY,NZ,Nblocks), Vx(NX,NY,NZ,Nblocks), ...
Distorted spherical grid: Fok, (LFSM)
r(N1,N2), theta(N2), phi(N3), Bx(N1,N2,N3), By(N1,N2,N3), ...
Partially unstructured grid: SAMI2, CTIP plasmasphere
glat(N), h(N), lon(NLon), N_H(N,NLon), ...
No regular grid description with glat(NLat), h(NZ) is possible: glat and h derive from coordinates s, l (N=NS*NL):
"dipole field line arc length" s(NS), "fieldline number" l(Nlines)
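To illustrate how a block-based layout is used, the sketch below looks up the value nearest to a point. It is not the BATSRUS access code: the nested-list layout x[b][i], P[b][i][j][k] (block index first) and the names find_block, nearest and sample are assumptions made for this example, and blocks are treated as covering only the range of their stored coordinates.

```python
def find_block(px, py, pz, x, y, z):
    """Index of the first block whose coordinate ranges contain the point,
    or None if no block does."""
    for b in range(len(x)):
        if (x[b][0] <= px <= x[b][-1]
                and y[b][0] <= py <= y[b][-1]
                and z[b][0] <= pz <= z[b][-1]):
            return b
    return None

def nearest(coords, p):
    """Index of the coordinate value closest to p."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - p))

def sample(px, py, pz, x, y, z, data):
    """Nearest-neighbor value of data at the point, or None outside all blocks."""
    b = find_block(px, py, pz, x, y, z)
    if b is None:
        return None
    i, j, k = nearest(x[b], px), nearest(y[b], py), nearest(z[b], pz)
    return data[b][i][j][k]

# Two blocks of 2x2x2 cells with different resolution (made-up numbers).
x = [[0.0, 1.0], [2.0, 2.5]]
y = [[0.0, 1.0], [0.0, 0.5]]
z = [[0.0, 1.0], [0.0, 0.5]]
P = [[[[100 * b + 4 * i + 2 * j + k for k in range(2)]
       for j in range(2)] for i in range(2)] for b in range(2)]

value_fine = sample(2.4, 0.1, 0.4, x, y, z, P)
value_coarse = sample(0.2, 0.9, 0.0, x, y, z, P)
value_outside = sample(9.0, 9.0, 9.0, x, y, z, P)
```

A linear scan over blocks is only reasonable for small block counts; a production reader would use the tree structure of the grid instead.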
Data Formats - Required Information (1)
Eliminate guesswork: specify grid basics along with data:
1) Identify grid type as Cartesian, Cylindrical, Spherical, etc. and specify names of coordinates: "x","y","z","block", or "r","theta","phi", ...
2) Include grid dimensions: 1D: N1, ..., 4D: N1,N2,N3,N4
3) Define special parameters such as Earth radius, year, month, day, time, ...
4) Identify coordinate system: x,y,z in GSE, GSM, SM, or GEI, as defined in model coordinates,
5) Specify number of data arrays and names of variables,
6) List variable units in SI,
7) Add data arrays.
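The seven items above map naturally onto a small header structure. The following is a sketch only: the class name GridHeader, its field names and the example values are hypothetical and not part of any CCMC format.

```python
from dataclasses import dataclass
from math import prod

@dataclass
class GridHeader:
    grid_type: str     # item 1: "Cartesian", "Spherical", ...
    coord_names: list  # item 1: e.g. ["x", "y", "z", "block"]
    dims: list         # item 2: one size per coordinate
    parameters: dict   # item 3: special parameters (date, time, ...)
    coord_system: str  # item 4: "GSE", "GSM", "SM", "GEI", ...
    var_names: list    # item 5: one name per data array
    var_units: list    # item 6: one unit per data array

    def validate(self):
        """Return a list of consistency problems (empty if none found)."""
        problems = []
        if len(self.coord_names) != len(self.dims):
            problems.append("need one dimension per coordinate name")
        if any(n < 1 for n in self.dims):
            problems.append("grid dimensions must be positive")
        if len(self.var_names) != len(self.var_units):
            problems.append("need one unit per variable")
        return problems

    def n_points(self):
        """Number of values expected in each data array (item 7)."""
        return prod(self.dims)

# Example with made-up sizes for a block-based Cartesian grid.
header = GridHeader(
    grid_type="Cartesian",
    coord_names=["x", "y", "z", "block"],
    dims=[4, 4, 4, 10],
    parameters={"year": 2002, "month": 4, "day": 9},
    coord_system="GSM",
    var_names=["N", "B_x"],
    var_units=["cm^-3", "nT"],
)
```

Checking the header before reading the data arrays catches the most common mismatch, a variable list and a unit list of different lengths, before any large array is loaded.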
Data Formats - Required Information (2)
1) List grid type and coordinate names in the first record and global grid dimensions in the next record ("," is the Fortran record separator):
BATSRUS: "Cartesian X Y Z Block", NX NY NZ NB,
2) Add number of dimensions, name and unit, and the grid dimensions it spans, for each coordinate:
UCLA-GGCM: "Cartesian X Y Z", NX NY NZ,
"1D X R_E", 1, "1D Y R_E", 2, "1D Z R_E", 3,
Fok, (LFSM), with r(N1,N2),theta(N2),phi(N3):
"Spherical r theta phi", N1 N2 N3,
"2D r R_E", 1 2, "1D theta rad", 2, "1D phi deg", 3,
BATSRUS: "Cartesian X Y Z Block", NX NY NZ NB,
"2D x R_E", 1 4, "2D y R_E", 2 4, "2D z R_E", 3 4,
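A reader can turn these records into array shapes directly. The sketch below assumes the records have already been split into a string and a list of integers; the function names parse_grid_record and parse_coordinate are hypothetical, and the grid sizes in the example are made up.

```python
def parse_grid_record(text, dims):
    """Split e.g. "Spherical r theta phi" plus [N1, N2, N3] into
    (grid_type, coordinate_names, dims)."""
    grid_type, *coords = text.split()
    if len(coords) != len(dims):
        raise ValueError("need one dimension per coordinate name")
    return grid_type, coords, tuple(dims)

def parse_coordinate(descriptor, dim_indices, grid_dims):
    """Interpret e.g. "2D r R_E" with indices [1, 2] as
    (name, unit, shape). Indices are 1-based, as in the records above."""
    ndim_tag, name, unit = descriptor.split()
    ndim = int(ndim_tag.rstrip("Dd"))
    if ndim != len(dim_indices):
        raise ValueError("number of indices does not match " + ndim_tag)
    shape = tuple(grid_dims[i - 1] for i in dim_indices)
    return name, unit, shape

# Distorted spherical grid: r(N1,N2), theta(N2), phi(N3)
grid_type, coords, dims = parse_grid_record("Spherical r theta phi", [5, 6, 7])
r_info = parse_coordinate("2D r R_E", [1, 2], dims)
theta_info = parse_coordinate("1D theta rad", [2], dims)

# Block-based grid: x(NX,Nblocks)
_, _, block_dims = parse_grid_record("Cartesian X Y Z Block", [4, 4, 4, 10])
x_info = parse_coordinate("2D x R_E", [1, 4], block_dims)
```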
Data Formats - Required Information (3)
3) Define special parameters:
"R_E gamma missing Year Month Day Hour Min",
R_E g missing_data Year Month Day hour minute,
4) Identify formula to convert grid coordinates to X, Y, and Z in SM, GSE, GSM, or ... coordinates:
UCLA-GGCM: "x_GSE=-x", "y_GSE=-y", "z_GSE=z",
Fok (LFSM): "x_SM = r*cos(theta)",
"y_SM = r*sin(theta)*sin(phi)",
"z_SM = r*sin(theta)*cos(phi)",
:
BATSRUS: "x_GSM=x", "y_GSM=y", "z_GSM=z",
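Storing the conversion as formula strings lets a reader apply it without model-specific code. A minimal sketch, assuming the file comes from a trusted source (it uses Python's eval with built-ins disabled, which is not safe for arbitrary input); the function name apply_formulas and the set of allowed functions are assumptions of this example.

```python
import math

# Functions a formula string may use (an assumption of this sketch).
ALLOWED = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}

def apply_formulas(formulas, **coords):
    """Evaluate strings such as "x_GSE=-x" for the given grid coordinates
    and return a dictionary {target_name: value}."""
    result = {}
    for formula in formulas:
        target, expression = (part.strip() for part in formula.split("=", 1))
        # Only the allowed functions and the supplied coordinates are visible.
        namespace = {**ALLOWED, **coords}
        result[target] = eval(expression, {"__builtins__": {}}, namespace)
    return result

# UCLA-GGCM: model x and y point opposite to GSE x and y.
gse = apply_formulas(["x_GSE=-x", "y_GSE=-y", "z_GSE=z"], x=10.0, y=-2.0, z=3.0)

# BATSRUS: model coordinates are already GSM.
gsm = apply_formulas(["x_GSM=x", "y_GSM=y", "z_GSM=z"], x=10.0, y=-2.0, z=3.0)
```

A production reader would more likely parse the expressions with a small grammar than rely on eval, but the stored strings would be the same.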
Data Formats - Required Information (4)
5) List data: names should show whether vector components are in the curvilinear grid coordinates or in Cartesian coordinates, as identified by step (4) (names: variable+"_"+coordinate-name).
BATSRUS: "N B_x B_y B_z V_x V_y V_z T"
Fok (LFSM), for example: "B_r B_theta B_phi"
6) List units of data (SI or similar):
"cm^-3 nT nT nT km/s km/s km/s K"
7) Add data arrays: den,bx,by,bz,vx,vy,vz,temp
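The naming rule and the name/unit pairing of steps 5 and 6 can be sketched as follows; the function names component_names and describe are hypothetical.

```python
def component_names(variable, coord_names):
    """Build component names as variable + "_" + coordinate name."""
    return [f"{variable}_{c}" for c in coord_names]

def describe(names_record, units_record):
    """Pair the space-separated variable names with their units."""
    names = names_record.split()
    units = units_record.split()
    if len(names) != len(units):
        raise ValueError("need exactly one unit per variable")
    return dict(zip(names, units))

# Vector components in curvilinear and in Cartesian coordinates.
b_spherical = component_names("B", ["r", "theta", "phi"])
b_cartesian = component_names("B", ["x", "y", "z"])

# The BATSRUS example records from steps 5 and 6.
units = describe("N B_x B_y B_z V_x V_y V_z T",
                 "cm^-3 nT nT nT km/s km/s km/s K")
```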
Required Information - Conclusions
Suggested format elements can be read sequentially in Fortran-90 (Fortran-77), C/C++ (with OpenDx) or in IDL (Interactive Data Language, RSI).
Data types used: "char" arrays ("strings"), 4-byte integers, and 4-byte "real" variables and arrays.
HDF (Hierarchical Data Format) or CDF (Common Data Format) can store additional information and data types (binary formats).
- HDF5 has been used at CCMC with BATSRUS and UCLA-GGCM output data for faster access (with OpenDx).
-- Format not finalized
-- Expansion of conversion to all types of data/models planned.
- CDF is used by the NSSDC for a large variety of data.
-- Has not been tested at CCMC.
- Data access via shared libraries with all languages.
Outlook
What additional items of information are deemed essential for a self-contained output data file?
How can we minimize rescaling and reorganizing of data for exchange between models and ingestion into visualization?