The Australian National Groundwater Data Transfer Standard

2.3 Components of the Australian National Groundwater Data Transfer Standard

The development of data structures for the transfer of core groundwater data has involved work across a spectrum of data management issues. The resulting data standards have several components, including a data model, attribute domains, data conventions, standard units of measurement, data quality indicators and references to the data source.

2.3.1 The Entity-Relationship Model

A data model describes data and how it is organised. The key reason for focussing on the design of a data model is leverage: a small change to the data model may have major ramifications for the overall information system (Simsion, 1994). The data model shapes the structure of the programs that deal with the data (such as storage, update, query, reporting, analysis and display functions). Hence, a well-designed data model has the practical benefit of making data management simpler and cheaper.

A data model may be evaluated by several criteria:

  1. Completeness, as the model needs to support all of the necessary data.
  2. Nonredundancy, so that data is only recorded in one location within the data model. Data redundancy increases storage costs and can create data inconsistencies.
  3. Enforcement of Business Rules, as the model needs to reflect the rules governing how data is collected (eg. a chemical analysis belongs to only one groundwater sample).
  4. Data Reusability, as the model needs to allow data to be used for purposes beyond those for which it was initially collected.
  5. Stability and Flexibility, as the model has to be generic and flexible enough to cope with change. A model is stable if it does not need to be modified in the event of a change in requirements. A model is flexible if it can be readily extended to accommodate new requirements.
  6. Simplicity and Elegance, in that the data model provides a reasonably natural classification of data. Elegant models are inherently simple, consistent and easily described and summarised.
  7. Communication Effectiveness, because the model has to be an effective communication tool for data management (Simsion, 1994).

These criteria can conflict, so the objective is to construct a data model that strikes a good compromise among them.

The data model for basic groundwater data has been developed using Entity Relationship (ER) modelling techniques. These techniques are widely used to design databases for the relational database management systems (RDBMS) most commonly used today. ER models are based on three concepts:

  1. Entities, the concrete or abstract things that exist, did exist or might exist (eg. a person, object, event, idea, process). The universe of discourse comprises all of the entities that are of interest in a particular context (eg. groundwater studies).
  2. Attributes, the named properties or the information that we want to know about entities.
  3. Relationships, the perceived associations among entities (Simsion, 1994; ISO/IEC 2382-17, 1996).

The conceptual data model is the representation of the characteristics of the universe of discourse by means of entities, their attributes and the relationships between them. Hence, the definition of a standard data model for core groundwater data involves the definition of fundamental entities and the relationships between these entities.
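The three ER concepts can be sketched as plain Python data structures, as below. This is a hypothetical illustration only: the entity names groundwater_feature, status and site and the primary key feature_identifier come from the text, while the primary keys given for status and site, and the helper undefined_entities, are assumptions made for the example. The helper performs one simple consistency check on a conceptual model: every relationship must join entities that the model actually defines.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    primary_key: tuple[str, ...]  # attribute(s) that uniquely identify an instance
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    one_side: str     # entity at the "one" end of the line
    other_side: str   # entity at the other end
    cardinality: str  # "1:1" or "1:N"


def undefined_entities(entities, relationships):
    """List relationship ends that refer to entities missing from the model."""
    names = {e.name for e in entities}
    missing = []
    for r in relationships:
        for end in (r.one_side, r.other_side):
            if end not in names:
                missing.append(end)
    return missing


# Hypothetical fragment of the groundwater model.
ENTITIES = [
    Entity("groundwater_feature", ("feature_identifier",)),
    Entity("status", ("feature_identifier", "status_code")),  # key is assumed
    Entity("site", ("feature_identifier",)),                   # key is assumed
]
RELATIONSHIPS = [
    Relationship("groundwater_feature", "status", "1:N"),
    Relationship("groundwater_feature", "site", "1:1"),
]
```

Calling undefined_entities(ENTITIES, RELATIONSHIPS) returns an empty list. Adding a relationship to an entity that has not been defined, such as sample, would cause that name to be reported.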

Figure 2.1 is a graphical representation (an ER diagram) of these entities and their relationships. The rounded boxes are the fundamental entities defined for the standard data structure, and they broadly correspond to tables in a database. Some boxes are enclosed by larger boxes, as is the case for the sample entity. This indicates that, for data management reasons, the entity is described using more than one table (eg. sample and sample_property).

The data model has been simplified by generalising entities as much as possible. So, rather than having separate entities (tables) to describe each type of groundwater measuring point, such as bores, springs, drains and dug wells, these are all described as groundwater_features. Likewise, the generic construction_element is used to describe components of a bore such as casing, screens, gravel pack and grout seals.

Within each box is the name of the entity in bold, with the primary key for the entity underneath in lower case. The primary key is the attribute or combination of attributes used to uniquely identify each instance or occurrence of the entity (ie. each record in a table). Hence the feature_identifier attribute is used to uniquely identify each groundwater_feature (ie. a unique borehole number). Note that other attributes have been defined to describe properties of these entities but, for brevity, are not shown on the ER diagram.
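Generalisation and primary keys can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module. One table holds every kind of measuring point, and a type column records whether each one is a bore, spring, drain or dug well. Only feature_identifier comes from the standard. The feature_type and name columns, the sample identifiers and the insert_feature helper are hypothetical. The database rejects a second record with an existing feature_identifier, because the primary key must be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE groundwater_feature (
        -- NOT NULL added explicitly: SQLite otherwise allows NULL in
        -- non-integer primary key columns
        feature_identifier TEXT PRIMARY KEY NOT NULL,
        feature_type       TEXT NOT NULL,  -- hypothetical: 'bore', 'spring', 'drain', 'dug well'
        name               TEXT            -- hypothetical descriptive name
    )
""")
conn.executemany(
    "INSERT INTO groundwater_feature VALUES (?, ?, ?)",
    [("B001", "bore", "Homestead bore"),
     ("S001", "spring", "Red Rock spring")],
)


def insert_feature(conn, fid, ftype, name=None):
    """Insert a feature; return False if fid is already used as a primary key."""
    try:
        conn.execute("INSERT INTO groundwater_feature VALUES (?, ?, ?)",
                     (fid, ftype, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

The generalised design means that a new kind of measuring point needs only a new feature_type value, not a new table.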

Chapter 3 describes these entities and their attributes in greater detail.

The lines between the boxes in Figure 2.1 represent the relationships between the entities, so that a record in one table can be related to the relevant record(s) in another. The line endings represent the nature, or cardinality, of the relationship: a single-line ending represents a one-type relationship, while a multiple-line ending (or crow's foot) represents a many-type relationship. Take the line linking the groundwater_feature and status entities as an example. It shows that each groundwater_feature can have many statuses (eg. purpose is a type of status, and a borehole can be used for irrigation, livestock supply and drinking water), while each record in the status table must relate to only one groundwater_feature. Some entities, such as groundwater_feature, have a line connecting back to themselves, which represents a self-referencing relationship. For instance, a bore may be constructed to replace a previous one that has been abandoned, and a relationship can be established to link the information about the new bore with the information about the old one.
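In a relational database, the one-to-many and self-referencing relationships described above are typically implemented with foreign keys. The sketch below uses sqlite3 to show both. The status table has a foreign key to groundwater_feature, and groundwater_feature has a nullable foreign key to itself that records which feature a bore replaced. The replaces column, the status codes IRR, STK and DRK, and the helper functions are hypothetical and not taken from the standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE groundwater_feature (
        feature_identifier TEXT PRIMARY KEY NOT NULL,
        -- self-referencing relationship: the feature this one replaced, if any
        replaces TEXT REFERENCES groundwater_feature(feature_identifier)
    );
    -- one groundwater_feature : many status records
    CREATE TABLE status (
        feature_identifier TEXT NOT NULL
            REFERENCES groundwater_feature(feature_identifier),
        status_code TEXT NOT NULL,
        PRIMARY KEY (feature_identifier, status_code)
    );
""")
conn.execute("INSERT INTO groundwater_feature VALUES ('B001', NULL)")    # abandoned bore
conn.execute("INSERT INTO groundwater_feature VALUES ('B002', 'B001')")  # its replacement
conn.executemany("INSERT INTO status VALUES ('B002', ?)",
                 [("IRR",), ("STK",), ("DRK",)])  # hypothetical purpose codes


def statuses(fid):
    """All status codes recorded for one feature (the 'many' side)."""
    rows = conn.execute(
        "SELECT status_code FROM status WHERE feature_identifier = ? "
        "ORDER BY status_code", (fid,))
    return [r[0] for r in rows]


def replaced_by(fid):
    """Follow the self-referencing link: which feature replaced fid?"""
    row = conn.execute(
        "SELECT feature_identifier FROM groundwater_feature WHERE replaces = ?",
        (fid,)).fetchone()
    return row[0] if row else None


def add_status(fid, code):
    """Return False if the insert breaks a constraint (unknown feature, duplicate)."""
    try:
        conn.execute("INSERT INTO status VALUES (?, ?)", (fid, code))
        return True
    except sqlite3.IntegrityError:
        return False
```

A status for a non-existent feature is rejected, which enforces the rule that each status record must relate to exactly one groundwater_feature.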

There is also notation to describe the optionality of a relationship. A circle on the line represents an optional (may) relationship, while a dash across the line represents a mandatory (must) relationship. Using the relationship between groundwater_feature and site as an example, a groundwater_feature may have one (and only one) site, and a site must have one groundwater_feature assigned to it.

The combination of entities and their relationships is an efficient way of enforcing the business rules surrounding the collection of groundwater data. The one-to-one relationship between groundwater_feature and site requires that a bore be represented as only one point on a map, not two or three. Likewise, a sample of groundwater is derived from only one groundwater_source within the feature, and a bore can have many statuses.
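One common way to enforce the optional one-to-one rule between groundwater_feature and site is a foreign key that is both NOT NULL and UNIQUE. NOT NULL reflects the mandatory side, because a site must belong to a feature. UNIQUE reflects the "only one" side, because a feature can have at most one site. This is a sqlite3 sketch under assumed column names. The standard's actual site attributes are described in Chapter 3. The site_id, latitude and longitude columns and the sample coordinates are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE groundwater_feature (
        feature_identifier TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE site (
        site_id INTEGER PRIMARY KEY,  -- hypothetical surrogate key
        -- NOT NULL: a site must have a feature (mandatory side)
        -- UNIQUE:   a feature has at most one site (one-to-one)
        feature_identifier TEXT NOT NULL UNIQUE
            REFERENCES groundwater_feature(feature_identifier),
        latitude  REAL,
        longitude REAL
    );
""")
conn.execute("INSERT INTO groundwater_feature VALUES ('B001')")
conn.execute("INSERT INTO groundwater_feature VALUES ('B002')")  # may have no site yet
conn.execute("INSERT INTO site (feature_identifier, latitude, longitude) "
             "VALUES ('B001', -35.3, 149.1)")


def try_insert_site(fid, lat, lon):
    """Return False if the site would break the one-to-one or mandatory rule."""
    try:
        conn.execute("INSERT INTO site (feature_identifier, latitude, longitude) "
                     "VALUES (?, ?, ?)", (fid, lat, lon))
        return True
    except sqlite3.IntegrityError:
        return False
```

A second point for B001, a site with no feature, or a site for an unknown feature are all rejected. B002 can still receive its single site.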

2.3.2 Attribute Domains

Attributes are the named properties or the information that we want to know about entities and can have specific values (eg. ‘blue’ is a value for the attribute ‘colour’). The set of all possible values of an attribute is called a domain (eg. a list of colours). In the data model, attributes are classified into broad data types such as character or alphanumeric fields, floating point numbers, integers, dates and codes. Code-type attributes can only be populated using a domain of codes specified within the data model. The use of codes is one mechanism to enable the consistent and efficient transfer of groundwater data, and the generation of domains of allowable codes and their definitions has been a significant part of the development of the data model. The standard has comprehensive lists of codes for parameters such as chemical analytes, lithologies, minerals, colour and construction material.
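The rule that a code-type attribute may only take values from its domain can be enforced in the database itself. The sketch below uses sqlite3 with a simplified code table keyed on (attribute, code), following the first columns of Table 2.3, and a hypothetical lithology_log table. The composite foreign key in lithology_log only allows codes defined for the lithology attribute. The table lithology_log, its columns and the helper log_interval are assumptions made for illustration. Only ALUV and SED are taken from the standard's examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Simplified code table: only three of its columns are shown
    CREATE TABLE code (
        attribute TEXT NOT NULL,  -- e.g. 'lithology'
        code      TEXT NOT NULL,  -- e.g. 'ALUV'
        category  TEXT NOT NULL,  -- e.g. 'alluvium'
        PRIMARY KEY (attribute, code)
    );
    -- Hypothetical table holding a code-type attribute
    CREATE TABLE lithology_log (
        feature_identifier TEXT NOT NULL,
        from_depth_m REAL NOT NULL,
        to_depth_m   REAL NOT NULL,
        attribute TEXT NOT NULL DEFAULT 'lithology' CHECK (attribute = 'lithology'),
        lithology TEXT NOT NULL,
        FOREIGN KEY (attribute, lithology) REFERENCES code (attribute, code)
    );
""")
conn.executemany("INSERT INTO code VALUES (?, ?, ?)", [
    ("lithology", "SED", "sediment"),
    ("lithology", "ALUV", "alluvium"),
])


def log_interval(fid, top_m, bottom_m, lithology_code):
    """Insert a lithology interval; return False if the code is not in the domain."""
    try:
        conn.execute(
            "INSERT INTO lithology_log "
            "(feature_identifier, from_depth_m, to_depth_m, lithology) "
            "VALUES (?, ?, ?, ?)", (fid, top_m, bottom_m, lithology_code))
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, log_interval("B001", 0.0, 4.5, "ALUV") succeeds. Passing the free-text term "alluvium" instead of its code fails. This is the consistency that coded domains are intended to provide.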

These domains of codes are stored in a table structure and are essentially treated as another entity in the data model (refer Figure 2.1). Hence, the code entity has the common attributes defined in Table 2.3.

Table 2.3 Attributes of the code entity

Name           Data Type   Description
attribute      character   attribute of the groundwater feature, or component, that the code describes, eg. lithology
code           character   code used to define a value of the attribute, eg. ALUV for alluvium
parent         character   code of the parent of the attribute value being described, eg. SED for sediment
category       character   particular value of the attribute that is defined by the code, eg. alluvium
synonyms       character   synonyms used for the attribute value, eg. alluvial
description    character   description of the attribute value
qualifier(s)   character   additional properties relating to the attribute value

A code, typically a 3 to 5 letter mnemonic alphabetic string, is defined for each category of an attribute. For example, the string ‘ALUV’ is the code for the ‘alluvium’ category of the ‘lithology’ attribute. Many of the coding structures are hierarchical, so a code may have a parent, which is the broader category to which the code belongs. For example, ‘sediment’ (SED) is the parent of ‘alluvium’. Synonyms are terms used to describe essentially the same category as the one defined by the code. Thus ‘alluvial’ is a synonym for ‘alluvium’. A description of the category is also provided, and additional properties of the category can be defined as qualifiers. For codes that describe grain size, the qualifiers define the rock type (eg. igneous, sedimentary) and the minimum and maximum grain size in millimetres that apply to each grain size category.
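The parent, synonyms and qualifiers columns support a few simple lookups, shown here in a small Python sketch. The lookups walk up the hierarchy, resolve a free-text term or synonym to its code, and choose a grain-size code from its minimum and maximum limits. The lithology rows follow the examples in the text. The grain-size codes FSAND and MSAND are hypothetical, and their limits are the Wentworth-scale fine and medium sand classes, not values taken from the standard. The functions themselves are also illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Code:
    attribute: str
    code: str
    category: str
    parent: Optional[str] = None
    synonyms: list[str] = field(default_factory=list)
    qualifiers: dict = field(default_factory=dict)


CODES = [
    Code("lithology", "SED", "sediment"),
    Code("lithology", "ALUV", "alluvium", parent="SED", synonyms=["alluvial"]),
    # Hypothetical grain-size codes with Wentworth-style limits in millimetres
    Code("grain_size", "FSAND", "fine sand",
         qualifiers={"rock_type": "sedimentary", "min_mm": 0.125, "max_mm": 0.25}),
    Code("grain_size", "MSAND", "medium sand",
         qualifiers={"rock_type": "sedimentary", "min_mm": 0.25, "max_mm": 0.5}),
]
INDEX = {(c.attribute, c.code): c for c in CODES}


def ancestors(attribute, code):
    """Return the code followed by its parents, up to the top of the hierarchy."""
    chain = []
    current = code
    while current is not None:
        chain.append(current)
        current = INDEX[(attribute, current)].parent
    return chain


def code_for_term(attribute, term):
    """Resolve a category name or synonym (case-insensitive) to its code."""
    term = term.lower()
    for c in CODES:
        if c.attribute == attribute and (c.category == term or term in c.synonyms):
            return c.code
    return None


def grain_size_code(size_mm):
    """Pick the grain-size code whose [min_mm, max_mm) range contains size_mm."""
    for c in CODES:
        q = c.qualifiers
        if c.attribute == "grain_size" and q["min_mm"] <= size_mm < q["max_mm"]:
            return c.code
    return None
```

Because ALUV points to SED as its parent, a query for all sediments can also pick up records coded as alluvium. This is one practical benefit of storing the hierarchy in the code table.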

The code domains are available in various digital formats, and include a list of the references that were used in their compilation. These domains and other relevant information can be found at:

The Australian National Groundwater Data Transfer Standard Web Site

Figure 2.1 Entity Relationship (ER) Diagram for the standard groundwater data model

2.3.3 Data Conventions

Conventions have been established for how certain data is represented, to avoid potential confusion and misuse. For example, standing water levels above the measuring point, as can occur under artesian conditions, are to be recorded as negative values, while water levels below the measuring point are to be recorded as positive values.
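The sign convention for standing water levels can be captured in a pair of small functions. The first converts a measured distance and a direction into the signed value to be recorded. The second shows why the convention is convenient: if the measuring point's elevation is known on the same vertical datum, for example AHD, then the water surface elevation is simply the measuring point elevation minus the signed level. This applies to both artesian and non-artesian readings. The function names and the assumption that both values share a datum are illustrative.

```python
def signed_water_level(distance_m: float, above_measuring_point: bool) -> float:
    """Apply the convention: below the measuring point -> positive, above -> negative."""
    if distance_m < 0:
        raise ValueError("distance_m should be a non-negative magnitude")
    return -distance_m if above_measuring_point else distance_m


def water_level_elevation(measuring_point_elevation_m: float,
                          signed_level_m: float) -> float:
    """Water surface elevation, assuming the measuring point elevation uses the
    same vertical datum (e.g. AHD) as the result."""
    return measuring_point_elevation_m - signed_level_m
```

For example, a reading of 12.5 m below a measuring point at 100.0 m AHD gives 87.5 m AHD. An artesian reading of 1.2 m above the same point is recorded as -1.2 and gives about 101.2 m AHD.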

The conventions defined for groundwater data are presented in Chapter 4.

2.3.4 Units of Measurement

Many attributes, such as depth (metres), temperature (degrees Celsius) and hydraulic conductivity (metres/day), have been assigned a standard unit of measurement. Multiplication factors have been compiled so that data recorded in other units of measurement are converted consistently. The Geocentric Datum of Australia (GDA94) has been adopted as the standard coordinate system for describing location in the horizontal plane, with the Australian Height Datum (AHD) used as the reference system in the vertical plane.
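Conversion with multiplication factors can be sketched as a lookup table keyed by attribute and source unit. The table below is hypothetical and is not the standard's published list of factors. The values themselves are standard physical conversions: 1 ft = 0.3048 m exactly, and a day has 86 400 seconds, so 1 m/s = 86 400 m/day and 1 cm/s = 864 m/day.

```python
# Hypothetical factor table: value_in_standard_unit = value * factor.
TO_STANDARD_UNIT = {
    "depth": ("m", {"m": 1.0, "ft": 0.3048}),
    "hydraulic_conductivity": ("m/day", {
        "m/day": 1.0,
        "ft/day": 0.3048,
        "m/s": 86_400.0,  # seconds per day
        "cm/s": 864.0,    # 0.01 m per cm x 86 400 s per day
    }),
}


def to_standard(attribute: str, value: float, unit: str) -> tuple[float, str]:
    """Convert value to the standard unit for attribute using a multiplication factor."""
    standard_unit, factors = TO_STANDARD_UNIT[attribute]
    if unit not in factors:
        raise ValueError(f"no factor for {unit!r} -> {standard_unit!r} ({attribute})")
    return value * factors[unit], standard_unit
```

Conversions such as Fahrenheit to Celsius need an offset as well as a factor, so they would not fit a pure multiplication table like this one.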

2.3.5 Indicators of Data Quality

Feedback from data users confirms the importance of embedding indicators of data quality in the dataset, so that users can judge its appropriateness and how it should be analysed. The data model allows basic field measurements to be accompanied by information on aspects such as how they were measured, the nominal error margins, any correction factors and the ambient conditions.
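A field measurement that carries its own quality information might look like the following dataclass. The field names are hypothetical and simply mirror the aspects listed above: method, nominal error margin, correction factor and ambient conditions. The correction is assumed to be multiplicative. In this sketch the error margin is applied to the corrected value to give a plausible range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldMeasurement:
    """A field value stored together with hypothetical data-quality fields."""
    value: float
    unit: str
    method: str                           # how the value was measured
    error_margin: Optional[float] = None  # nominal +/- error, same unit as value
    correction_factor: float = 1.0        # multiplicative correction; 1.0 = none
    ambient_conditions: str = ""          # free-text notes, e.g. weather

    def corrected_value(self) -> float:
        return self.value * self.correction_factor

    def bounds(self) -> Optional[tuple[float, float]]:
        """Range implied by the nominal error margin, or None if it is unknown."""
        if self.error_margin is None:
            return None
        v = self.corrected_value()
        return (v - self.error_margin, v + self.error_margin)


reading = FieldMeasurement(
    value=12.0, unit="m", method="electric water level meter",
    error_margin=0.01, ambient_conditions="after heavy rain")
```

A user who receives this reading can see that it is good to about a centimetre. A value with no recorded error margin returns None from bounds(), which signals that its precision is unknown.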

Modern database systems typically require the timing of events to be stored in a complete and specific date format (ie. day/month/year), even when only the month, or only the year, is known. Hence, a reliability attribute accompanies each date attribute in the data model to indicate how the date should be treated.
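A date and its reliability attribute might work together as follows. A partial date is padded to a complete date for storage, and the reliability code records which parts are genuine, so the date can be displayed or analysed only to the precision actually known. The reliability codes D, M and Y, the padding with the first of the month or year, and the function names are all hypothetical choices made for this sketch.

```python
from datetime import date


def store_partial_date(year, month=None, day=None):
    """Return (complete date, reliability code) for a strict date column.

    Unknown parts are filled with 1. Hypothetical reliability codes:
    'D' = day known, 'M' = only month and year known, 'Y' = only year known.
    """
    if day is not None and month is None:
        raise ValueError("a day requires a month")
    reliability = "D" if day is not None else ("M" if month is not None else "Y")
    return date(year, month or 1, day or 1), reliability


def display_date(stored, reliability):
    """Render only the parts of the date that the reliability code says are known."""
    if reliability == "D":
        return stored.strftime("%d/%m/%Y")
    if reliability == "M":
        return stored.strftime("%m/%Y")
    if reliability == "Y":
        return str(stored.year)
    raise ValueError(f"unknown reliability code {reliability!r}")
```

Without the reliability code, a bore drilled "sometime in 1987" would be indistinguishable from one drilled on 1 January 1987.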

2.3.6 Data Source

Many entities have attributes that help define the source of the data, so that a user can investigate the data further. For example, the geographical coordinates for a site can be accompanied by information such as the person responsible for determining the coordinates, their organisation, when the coordinates were determined and, if the location was derived from an identifiable source such as a map, report or air photo, the relevant bibliographic reference.
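Source information for a set of coordinates could be grouped into a small record, as in this sketch. The class, its field names, the example person and the organisation are all hypothetical. The fields follow the items listed above, and the reference is optional because not every location is derived from a published source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CoordinateSource:
    """Hypothetical provenance record for a site's coordinates."""
    determined_by: str
    organisation: str
    determined_on: date
    reference: Optional[str] = None  # map, report or air photo, if any

    def summary(self) -> str:
        text = (f"Determined by {self.determined_by} ({self.organisation}) "
                f"on {self.determined_on.isoformat()}")
        if self.reference:
            text += f"; derived from {self.reference}"
        return text


source = CoordinateSource("J. Smith", "Example Water Agency",
                          date(1998, 3, 14), "1:100 000 topographic map")
```

Calling source.summary() gives a one-line statement of who fixed the location, when, and from what. A user needs this to judge whether the coordinates are accurate enough for their purpose.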



This site is maintained by the Bureau of Rural Sciences

Copyright © 1999 Commonwealth of Australia
Last updated 1 July 1999 contact brs-webmaster