The Australian National Groundwater Data Transfer Standard
2.1 Data Integration Issues

Many data integration issues are encountered during this process of turning basic data into tertiary data (information), namely:

Data Formats: different digital formats waste time in processing data received from various sources. Different data models, codes and structures mean greater effort in creating a consistent database, and increase the chance of data being misused and misinterpreted. This is accentuated by the multidisciplinary nature of hydrogeological investigations, where data on aspects such as land use, climate, surface hydrology and vegetation may need to be incorporated.

Data Conventions: different data suppliers use different conventions, which can easily confuse data users. For example, some databases express standing water levels above the ground surface (i.e. artesian) as positive values, while other databases use negative values. Also, many different numbers or characters are used to denote a null value.

Data Spatial Extent: different data providers may use different criteria to define their data collection boundaries. For example, topography may be obtained on a 1:250,000-scale mapsheet basis, but land use and vegetation may have been mapped only on a sub-catchment basis. This can lead to gaps in your data coverage, or the need to stitch together data from different sources to obtain the coverage you require. Data can also be aggregated in different ways, which makes comparisons difficult. For example, water use data may be available only as summaries for each postcode area, while licence allocations are available for each cadastral block.

Data Projection: projection is a major issue in spatial analysis, as data have to be co-registered using the same projection. Critical projection details (such as the central meridian and standard parallels) may not be recorded with some historical data, seriously reducing their usefulness. Even though data may be stored in geographical coordinates (latitude, longitude), the datum used, such as AGD 66 or WGS 84, may not be known. The projection used should also be appropriate for the analysis; for instance, areal analysis requires an equal-area projection.

Data Spatial Accuracy and Scale: spatial accuracy defines how well the stated location matches reality, and therefore how the data should be used. Where the availability of basic data is low and the level of interpretation is high, the error margins on interpreted hydrogeological boundaries and contours may be measured in kilometres. However, these arcs may well be stored in the GIS at sub-metre precision. This is a problem, as a GIS usually imposes no technological constraints to prevent data being viewed and analysed at a scale well beyond the capacity of the mapping, and the error margins are typically not stored as an attribute of the arcs.

Data Quality: users of groundwater data need indicators of quality embedded in the dataset so they can judge its appropriateness and how it should be analysed. Basic field measurements need to be accompanied by information on how they were measured, the nominal error margins and the ambient conditions. In many cases the precision at which data are recorded is not representative of the accuracy of the original measurement. For example, a water level in a well estimated in feet by observation from the ground surface may undergo metric conversion and be stored in the database at millimetre precision. Similarly, many databases require the timing of events (e.g. drilling completion, time of observation) to be stored in a complete and specific date format (i.e. day/month/year), even when only the month, or only the year, is known.

Data Lineage: lineage is important given the level of interpretation required in hydrogeological compilations, and becomes increasingly critical as data develop from basic to tertiary. A user needs a history of the sources and processing steps used to produce the dataset. For example, potentiometric contours may have been generated using a kriging algorithm and then edited, filtered and smoothed. This may well produce a completely different picture of the potentiometric surface from one derived by manually interpolating between bore measurements.

Data Access: if the cost of the data is beyond the resources of the study, then for all intents and purposes the data do not exist. The same applies if the data are not deemed to be in the public domain, or even if it is difficult to find out whom to obtain the data from.

Data Currency: currency defines the time interval to which the data apply. This is critical if the phenomenon being mapped is dynamic, such as the potentiometric surface in areas with a rapidly rising watertable. For time-variant data, you need to ensure that you are comparing data that relate to the same period of time.

Data Completeness: completeness is an issue when data collection is sporadic or irregular over the time period, or when important supporting data have not been collected or interpreted. This adds complexity to data analysis or limits the utility of the dataset. Common examples include:
Certainly, the availability of national data standards would assist in resolving some of these data integration issues. In addition, initiatives such as the ANZLIC metadata guidelines (ANZLIC, 1996) help data users understand the data they receive, covering aspects such as data quality, lineage, currency and completeness.
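To illustrate the kind of conventions that a common standard or metadata record would pin down, the following Python sketch normalises a single standing water level from a supplier into one assumed convention: metres below ground surface, with artesian levels negative. It maps supplier null codes to a true null, converts feet to metres while rounding only to the precision that the stated error margin supports, keeps observation dates as partial dates ('YYYY', 'YYYY-MM' or 'YYYY-MM-DD') rather than forcing a full day/month/year, and records each processing step as a simple lineage trail. The null codes, field names and function names (`NULL_CODES`, `WaterLevel`, `normalise_level`) are hypothetical illustrations; they are not taken from the Australian National Groundwater Data Transfer Standard or from the ANZLIC metadata guidelines.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Hypothetical null codes seen from different suppliers (illustrative only).
NULL_CODES = {"", "-9999", "9999", "-999", "NA", "N/A", "NULL", "-"}

FEET_TO_METRES = 0.3048  # the international foot is defined as exactly 0.3048 m

# Accepted partial-date forms: year, year-month, or full date (ISO 8601 style).
DATE_FORMATS = {4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d"}


@dataclass
class WaterLevel:
    """Standing water level in an assumed receiving convention:
    metres below ground surface, so artesian (above-ground) levels are negative."""
    depth_m: Optional[float]   # None when the supplier recorded a null
    error_m: Optional[float]   # nominal error margin, kept alongside the value
    observed: str              # partial date, only as precise as is known
    lineage: List[str] = field(default_factory=list)


def check_partial_date(text: str) -> str:
    """Accept 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' instead of forcing a full date."""
    fmt = DATE_FORMATS.get(len(text))
    if fmt is None:
        raise ValueError(f"unrecognised date: {text!r}")
    datetime.strptime(text, fmt)  # raises ValueError if any part is invalid
    return text


def decimals_for(error_m: float) -> int:
    """Number of decimal places justified by a given error margin."""
    return max(0, -math.floor(math.log10(error_m)))


def normalise_level(raw: str, units: str, artesian_positive: bool,
                    observed: str, error: float, source: str) -> WaterLevel:
    """Convert one supplier value to the assumed common convention (a sketch)."""
    lineage = [f"received from {source}: {raw!r} {units}"]
    observed = check_partial_date(observed)
    if raw.strip().upper() in NULL_CODES:
        lineage.append("supplier null code mapped to None")
        return WaterLevel(None, None, observed, lineage)
    if units not in ("m", "ft"):
        raise ValueError(f"unsupported units: {units!r}")
    if error <= 0:
        raise ValueError("error margin must be positive")

    factor = FEET_TO_METRES if units == "ft" else 1.0
    value, error_m = float(raw) * factor, error * factor
    if units == "ft":
        lineage.append("converted feet to metres")
    if artesian_positive:
        # Supplier measures upwards from the ground; flip to depth below ground.
        value = -value
        lineage.append("sign flipped to depth-below-ground convention")

    # Round to the precision the measurement supports, not to millimetres.
    value = round(value, decimals_for(error_m))
    lineage.append(f"rounded to {decimals_for(error_m)} decimal place(s)")
    return WaterLevel(value, error_m, observed, lineage)


if __name__ == "__main__":
    # A level estimated to the nearest foot (error +/- 0.5 ft), observed in March 1972.
    level = normalise_level("12", "ft", artesian_positive=False,
                            observed="1972-03", error=0.5, source="Supplier A")
    print(level.depth_m, level.observed)  # 3.7 1972-03
```

The 12 ft reading converts to 3.6576 m, but with an error margin of about 0.15 m it is stored as 3.7 m, and the month-only date is kept as '1972-03' instead of being padded to an invented day. Carrying the error margin and lineage with each value is one way of embedding the quality indicators the section describes; a real transfer format would define these fields formally.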
Copyright © 1999 Commonwealth of Australia
Last updated 1 July 1999
Contact: brs-webmaster