The Australian National Groundwater Data Transfer Standard
2.3 Components of the Australian National Groundwater Data Transfer Standard

The development of data structures for the transfer of core groundwater data has involved work across a spectrum of data management issues. The resulting data standards have a number of components: a data model, attribute domains, data conventions, standard units of measurement, data quality indicators and references to the data source.

2.3.1 The Entity-Relationship Model

A data model describes data and how it is organised. The key reason for focussing on the design of a data model is leverage: a small change to the data model may have major ramifications for the overall information system (Simsion, 1994). The data model influences the structure of the programs that deal with the data (such as storage, update, query, reporting, analysis and display functions). Hence, a well-designed data model has the practical consequence of making data management simpler and cheaper. A data model may be evaluated by several criteria:
These criteria can conflict, so the objective is to construct a data model that strikes a good compromise between them. The data model for basic groundwater data has been developed using Entity Relationship (ER) modelling techniques. These underlie the relational database management systems (RDBMS) most commonly used today. ER models are based on three concepts:
The conceptual data model represents the characteristics of the universe of discourse by means of entities, the relationships between entities, and attributes. Hence, defining a standard data model for core groundwater data involves defining the fundamental entities and the relationships between them. Figure 2.1 is a graphical representation (an ER diagram) of these entities and their relationships.

The rounded boxes are the fundamental entities defined for the standard data structure, and correspond broadly to tables in a database. Some boxes are enclosed by larger boxes, as is the case for the sample entity. This indicates that, for data management reasons, the entity is described using more than one table (eg. sample and sample_property).

The data model has been simplified by generalising entities as much as possible. Rather than having separate entities (tables) to describe individual types of groundwater measuring point such as bores, springs, drains and dug wells, these are all described as groundwater_features. Likewise, the generic construction_element is used to describe components of a bore such as casing, screens, gravel pack and grout seals.

Within each box is the name of the entity in bold, with the primary key for the entity underneath in lower case. The primary key is the attribute, or combination of attributes, that uniquely identifies each instance or occurrence of the entity (ie. each record in a table). Hence the feature_identifier attribute uniquely identifies each groundwater_feature entity (eg. a unique borehole number). Other attributes have been defined to describe properties of these entities but, for brevity, are not shown on the ER diagram. Chapter 3 describes these entities and their attributes in greater detail.

The lines between the boxes in Figure 2.1 represent the relationships between the entities.
In this way, a record of information in one table can be related to the relevant record(s) of information in another. The endings of the lines represent the nature, or cardinality, of the relationship: a single line ending represents a one-type relationship, while a multiple line ending (or crow's foot) represents a many-type relationship. Take the line linking the groundwater_feature and status entities as an example. This line shows that each groundwater_feature can have many statuses (eg. purpose is a type of status, and a borehole can be used for irrigation, livestock supply and drinking water), while each record in the status table must relate to exactly one groundwater_feature.

Some entities, such as groundwater_feature, have a line connecting back to themselves, which represents a self-referencing relationship. For instance, a bore may be constructed to replace a previous one that had been abandoned. A relationship can be established to link the information about the new bore with the information about the old one.

There is also notation to describe the optionality of a relationship. A circle on the line represents an optional (may) relationship, while a dash across the line represents a mandatory (must) relationship. Using the relationship between groundwater_feature and site as an example, a groundwater_feature may have one (and only one) site, and a site must have one groundwater_feature assigned to it.

The combination of entities and their relationships is an efficient way of enforcing the business rules surrounding the collection of groundwater data. The one-to-one relationship between groundwater_feature and site means that a bore can be represented as only one point on a map, not two or three. A sample of groundwater is derived from only one groundwater_source within the feature. A bore can have many statuses.
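The cardinality and optionality rules described for groundwater_feature, site and status can be illustrated with a small relational schema. The following is only a sketch, using Python's built-in sqlite3 module. The table names and feature_identifier come from the text above; every other column name (feature_type, replaces_feature, easting, northing, status_type, status_value) and all data values are assumptions made for illustration and are not taken from the standard.

```python
import sqlite3

# Simplified, hypothetical schema. Only the table names and the
# feature_identifier key come from the ER diagram description; the
# remaining columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE groundwater_feature (
    feature_identifier TEXT NOT NULL PRIMARY KEY,
    feature_type       TEXT NOT NULL,  -- bore, spring, drain, dug well
    -- self-referencing relationship: a bore may replace an earlier one
    replaces_feature   TEXT REFERENCES groundwater_feature(feature_identifier)
);
-- One-to-one: the primary key is also the foreign key, so a feature has
-- at most one site and every site must belong to exactly one feature.
CREATE TABLE site (
    feature_identifier TEXT NOT NULL PRIMARY KEY
        REFERENCES groundwater_feature(feature_identifier),
    easting  REAL,
    northing REAL
);
-- One-to-many: a feature can have many statuses, but each status
-- record must relate to exactly one feature.
CREATE TABLE status (
    feature_identifier TEXT NOT NULL
        REFERENCES groundwater_feature(feature_identifier),
    status_type  TEXT NOT NULL,
    status_value TEXT NOT NULL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding two made-up bores."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO groundwater_feature VALUES ('B1', 'bore', NULL)")
    conn.execute("INSERT INTO groundwater_feature VALUES ('B2', 'bore', 'B1')")
    conn.execute("INSERT INTO site VALUES ('B2', 500000.0, 6000000.0)")
    conn.executemany(
        "INSERT INTO status VALUES ('B2', 'purpose', ?)",
        [("irrigation",), ("livestock supply",), ("drinking water",)],
    )
    conn.commit()
    return conn


def violates(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement breaks a rule encoded in the schema."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

With this schema, a second site for bore B2 or a status that points at a non-existent feature is rejected by the database itself, which is the sense in which the relationships enforce the business rules.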
2.3.2 Attribute Domains

Attributes are the named properties of entities, that is, the information that we want to know about them, and they can take specific values (eg. ‘blue’ is a value for the attribute ‘colour’). The set of all possible values of an attribute is called a domain (eg. a list of colours). In the data model, attributes are classified into broad data types such as character or alphanumeric fields, floating point numbers, integers, dates and codes. Code-type attributes can only be populated from a domain of codes specified within the data model.

The use of codes is one mechanism for enabling the consistent and efficient transfer of groundwater data, and generating the domains of allowable codes and their definitions has been a significant part of the development of the data model. The standard has comprehensive lists of codes for parameters such as chemical analytes, lithologies, minerals, colour and construction material. These domains of codes are stored in a table structure and are essentially treated as another entity in the data model (refer to Figure 2.1). Hence, the code entity has the common attributes defined in Table 2.3.

Table 2.3 Attributes of the code entity
A code, typically a mnemonic string of 3 to 5 letters, is defined for each category of an attribute. For example, the string ‘ALUV’ is the code for the ‘alluvium’ category of the ‘lithology’ attribute. Many of the coding structures are hierarchical in nature, and a code may have a parent. This is the broader category that the code belongs to; for example, ‘sediment’ (SED) is the parent of ‘alluvium’. Synonyms are terms used to describe essentially the same category as the one defined by the code; for example, ‘alluvial’ is a synonym for ‘alluvium’. A description of the category is also provided, and additional properties of the category can be defined as qualifiers. For codes that describe grain size, the qualifiers define the rock type (eg. igneous, sedimentary) and the minimum and maximum grain size, in millimetres, that apply to each grain size category.

The code domains are available in various digital formats, and include a list of the references that were used in their compilation. These domains and other relevant information can be found at: The Australian National Groundwater Data Transfer Standard Web Site
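The structure of a code entry described above (code, parent, synonyms, description, qualifiers) can be sketched as a small lookup structure. This is an illustrative sketch only: the field names are assumptions based on the prose rather than on Table 2.3, the `name` and `domain` fields are hypothetical additions, and the two entries shown are the only ones mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Code:
    """One entry in a code domain (field names are assumptions)."""
    code: str                      # mnemonic, typically 3 to 5 letters
    domain: str                    # attribute the code belongs to, eg. 'lithology'
    name: str                      # category name (hypothetical field)
    description: str = ""
    parent: Optional[str] = None   # code of the broader category, if any
    synonyms: List[str] = field(default_factory=list)
    qualifiers: Dict[str, object] = field(default_factory=dict)


# Only the two categories named in the text; a real domain is far larger.
CODES: Dict[str, Code] = {
    "SED": Code("SED", "lithology", "sediment"),
    "ALUV": Code("ALUV", "lithology", "alluvium",
                 parent="SED", synonyms=["alluvial"]),
}


def ancestors(code: str, codes: Dict[str, Code] = CODES) -> List[str]:
    """Walk up the hierarchy and return the parent codes, nearest first."""
    chain: List[str] = []
    current = codes[code].parent
    while current is not None:
        chain.append(current)
        current = codes[current].parent
    return chain


def resolve(term: str, domain: str,
            codes: Dict[str, Code] = CODES) -> Optional[str]:
    """Map a code, category name or synonym to its code, or None if unknown."""
    wanted = term.strip().lower()
    for entry in codes.values():
        if entry.domain != domain:
            continue
        names = [entry.code, entry.name, *entry.synonyms]
        if wanted in (n.lower() for n in names):
            return entry.code
    return None
```

For example, `resolve("Alluvial", "lithology")` yields `"ALUV"`, and `ancestors("ALUV")` yields `["SED"]`. A term outside the domain resolves to `None`, which is how a code-type attribute would reject a value that is not in its domain.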
Figure 2.1 Entity Relationship (ER) Diagram for the standard groundwater data model

2.3.3 Data Conventions

Conventions have been established for how certain data is represented, so as to avoid potential confusion and misuse. For example, standing water levels above the measuring point, as in artesian conditions relative to ground surface, are to be recorded as negative values. Water levels below the measuring point are to be recorded as positive values. The conventions defined for groundwater data are presented in Chapter 4.

2.3.4 Units of Measurement

Many attributes, such as depth (metres), temperature (degrees Celsius) and hydraulic conductivity (metres/day), have been assigned a standard unit of measurement. Multiplication factors have been compiled so that data is converted from other units of measurement consistently. The Geocentric Datum of Australia (GDA94) has been adopted as the standard coordinate system for describing location in the horizontal plane, with the Australian Height Datum (AHD) as the reference system in the vertical plane.

2.3.5 Indicators of Data Quality

Feedback from data users confirms the importance of embedding indicators of data quality in the dataset, so that users can judge its appropriateness and how it should be analysed. The data model allows basic field measurements to be accompanied by information on aspects such as how they were measured, the nominal error margins, any correction factors and the ambient conditions. Modern database systems typically require the timing of events to be stored in a complete and specific date format (ie. day/month/year), even when only the month, or only the year, is known. Hence, a reliability attribute accompanies each date attribute in the data model, to indicate how the date should be treated.

2.3.6 Data Source

Many entities have attributes that help define the source of the data, so that a user can investigate the data further.
For example, the geographical coordinates for a site can be accompanied by information such as the person who was responsible for determining the coordinates, their organisation, when the coordinates were determined and, if the location was derived from an identifiable source such as a map, report or air photo, the relevant bibliographic reference.
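Returning to Sections 2.3.3 to 2.3.5, the water level sign convention, the use of multiplication factors and the pairing of each date with a reliability attribute can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the standard's own implementation: the function names, the reliability codes 'D', 'M' and 'Y', and the choice to pad unknown days or months with 1 are all hypothetical. The conversion factors shown are ordinary unit definitions, not the standard's compiled list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Multiplication factors to metres (standard unit definitions; the
# standard's own compiled list of factors is not reproduced here).
TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048}


def to_metres(value: float, unit: str) -> float:
    """Convert a length to metres using a fixed multiplication factor."""
    return value * TO_METRES[unit]


def signed_water_level(distance_m: float, above_measuring_point: bool) -> float:
    """Apply the sign convention: negative above the measuring point
    (eg. artesian conditions), positive below it."""
    magnitude = abs(distance_m)
    return -magnitude if above_measuring_point else magnitude


@dataclass(frozen=True)
class ReliableDate:
    value: date        # complete date, as the database requires
    reliability: str   # hypothetical codes: 'D' day, 'M' month, 'Y' year


def make_date(year: int, month: Optional[int] = None,
              day: Optional[int] = None) -> ReliableDate:
    """Store a partly known date as a full date plus a reliability code.
    Unknown parts are padded with 1 (an assumption for this sketch)."""
    if month is None:
        return ReliableDate(date(year, 1, 1), "Y")
    if day is None:
        return ReliableDate(date(year, month, 1), "M")
    return ReliableDate(date(year, month, day), "D")


def display(d: ReliableDate) -> str:
    """Show only the parts of the date that are actually known."""
    if d.reliability == "Y":
        return f"{d.value.year}"
    if d.reliability == "M":
        return f"{d.value.month:02d}/{d.value.year}"
    return f"{d.value.day:02d}/{d.value.month:02d}/{d.value.year}"
```

Under these assumptions, a level measured 1.5 m above the measuring point is stored as -1.5, and a date known only to June 1987 is stored as 1 June 1987 with reliability 'M', so that it is treated and displayed as 06/1987 rather than as a precise day.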
Copyright © 1999 Commonwealth of Australia
Last updated 1 July 1999