Help: availability v.1

Description

The irisws-availability web service returns detailed time span information about which time series data is available in the DMC archive.

There are two service query methods:

/extent

Produces lists of available time extents (earliest to latest) for selected channels (network, station, location, channel and quality) and time ranges.

/query

Produces lists of contiguous time spans for selected channels (network, station, location, channel and quality) and time ranges.

Sample queries

/extent Sample queries

Extent information for all network IU, station ANMO channels in text format (default)
http://service.iris.edu/irisws/availability/1/extent?network=IU&station=ANMO

Extent information for all network IU, station ANMO channels in text format (default) in a given time range
http://service.iris.edu/irisws/availability/1/extent?network=IU&station=ANMO&start=2010-02-22T12:33:12&end=2010-02-22T12:40:02

Extent information for all network IU, station ANMO channels in JSON format
http://service.iris.edu/irisws/availability/1/extent?network=IU&station=ANMO&format=json

Extent information for all network IU, sorted by number of time-spans descending, limited to 100 rows
http://service.iris.edu/irisws/availability/1/extent?network=IU&orderby=timespancount_desc&rowlimit=100

Channels with more than 1,000,000 timespans cannot be processed by the /query method. This query reveals which channels exceed that limit:
http://service.iris.edu/irisws/availability/1/extent?network=*&orderby=timespancount_desc&rowlimit=500

Extent information for all network IU, sorted by update-date, limited to 100 rows
http://service.iris.edu/irisws/availability/1/extent?network=IU&orderby=latestupdate&rowlimit=100

Extent information for all network IU between two dates with qualities and sample rates merged.
http://service.iris.edu/irisws/availability/1/extent?network=IU&start=2011-03-11&end=2011-03-12&mergequality=true&mergesamplerate=true

Restriction information for all network AF in text format. Only non-restricted data displayed, so all are OPEN.
http://service.iris.edu/irisws/availability/1/extent?network=AF&show=restriction

Same as previous, but includes restricted data
http://service.iris.edu/irisws/availability/1/extent?network=AF&show=restriction&includerestricted=true

/query Sample queries

Demonstrations of wildcard and multiple selections via CSV (comma separated values)

All BH channels for a station
http://service.iris.edu/irisws/availability/1/query?start=2010-02-23&end=2011-02-23&network=IU&station=ANMO&channel=BH?

Network IU, stations ANMO and BILL, location 00, channels BH1 and BHE
http://service.iris.edu/irisws/availability/1/query?start=2010-02-23&end=2010-04-20&network=IU&location=00&station=ANMO,BILL&channel=BH1,BHE

Note: the , (comma) and ? (question mark) characters may be displayed as %2C and %3F after you click on the previous two links.
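The encoding described in the note above can be reproduced with a short Python sketch. It uses only the standard library; the helper name build_url is hypothetical and not part of the service.

```python
from urllib.parse import urlencode

BASE = "http://service.iris.edu/irisws/availability/1"


def build_url(method, **params):
    """Return a GET URL for the given method ('extent' or 'query')."""
    # urlencode percent-encodes reserved characters such as ',' and '?'.
    return f"{BASE}/{method}?{urlencode(params)}"


url = build_url("query", network="IU", station="ANMO,BILL", channel="BH?")
print(url)
```

The printed URL contains %2C in place of the comma and %3F in place of the question mark, which the service treats the same as the unencoded characters.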

Demonstrations of merging

Channel with changing sample rates
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1

Same as previous with sample rates merged
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1&mergesamplerate=true

Same as previous with overlaps merged
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1&mergesamplerate=true&mergeoverlap=true

Same as previous with gaps of one day or less merged
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1&mergesamplerate=true&mergeoverlap=true&mergetolerance=86400.0

Demonstration of memory limitation behavior

Two queries demonstrating behavior when too many timespans are present for processing. (See Memory Limitations for more information.)

This query reports no data available because the selected station contains too many timespans:
http://service.iris.edu/irisws/availability/1/query?network=GB&station=DYA&nodata=404

Identical query, but with excludetoolarge=true explicitly set.
http://service.iris.edu/irisws/availability/1/query?network=GB&station=DYA&nodata=404&excludetoolarge=true

This query reports Error 413 (request too large) because of excludetoolarge=false
http://service.iris.edu/irisws/availability/1/query?network=GB&station=DYA&nodata=404&excludetoolarge=false
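A client might interpret the status codes from the three queries above roughly as follows. This is a minimal sketch: the function name and the wording of the messages are illustrative, and only the codes mentioned in this help page (plus 200 for success) are handled.

```python
def describe_status(code):
    """Map an HTTP status code from /query to a short explanation."""
    if code == 200:
        return "timespans returned"
    if code == 404:
        # Reported as 404 because the query set nodata=404.
        return "no data available for the selection"
    if code == 413:
        # Only returned when excludetoolarge=false is specified.
        return "request too large: a selected channel has too many timespans"
    return "status not covered by this sketch"


for status in (200, 404, 413):
    print(status, describe_status(status))
```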

HTTP POST queries

/extent and /query methods can be accessed via HTTP POST. All of the parameters that can be submitted with the GET method are allowed in POST.

The general form of a POST is parameter=value pairs, one per line, followed by an arbitrary number of channel and, optionally, time window selection lines:

parameter=<value>
parameter=<value>
parameter=<value>
Net Sta Loc Chan [StartTime EndTime]
Net Sta Loc Chan [StartTime EndTime]
...

Start time and end times can be specified globally, such as:

...
start=2011-03-11T00:05:46
end=2011-03-11T00:06:46
IU ANMO 00 BHZ
IU ANMO 00 BH1
...

or per line:

...
IU ANMO 00 BHZ 2011-03-11T00:05:46 2011-03-11T00:06:46
IU ANMO 00 BH1 2011-03-11T00:05:46 2011-03-11T00:06:46
...

If not given, the start and end times default to the fully available time range. Additionally, global time ranges can be mixed with individual time ranges.

Using individual time ranges per line allows for multiple time window selection. For example:

...
IU ANMO 00 BHZ 2004-12-26T00:00:58 2004-12-26T00:01:58
IU ANMO 00 BHZ 2011-03-11T00:05:46 2011-03-11T00:06:46
...

Example POST body:

$ cat availability.request
mergequality=true
mergesamplerate=true
format=text
TA A25A -- BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO * BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO 10 HHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
II KURK 00 BH? 2010-03-25T00:00:00 2010-04-01T00:00:00

This example contains parameters common to both /extent and /query methods.
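A request body in this layout can also be assembled programmatically. The sketch below is one possible approach; build_post_body and its argument names are hypothetical helpers, not part of the service.

```python
def build_post_body(params, selections):
    """Assemble a POST body: parameter=value lines, then selection lines.

    params     -- dict of service parameters
    selections -- iterable of (net, sta, loc, chan, start, end) tuples;
                  start and end may both be None to omit the time window
    """
    lines = [f"{key}={value}" for key, value in params.items()]
    for net, sta, loc, chan, start, end in selections:
        fields = [net, sta, loc, chan]
        if start and end:
            fields += [start, end]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


body = build_post_body(
    {"mergequality": "true", "mergesamplerate": "true", "format": "text"},
    [
        ("TA", "A25A", "--", "BH?", "2010-03-25T00:00:00", "2010-04-01T00:00:00"),
        ("IU", "ANMO", "10", "HHZ", None, None),
    ],
)
print(body, end="")
```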

Submitting POST request files via wget and curl

Requests can be made with a selection file using either the wget or curl Unix command line utilities. The commands below POST the selection file to the server and save the results in a text file:

$ wget --post-file=availability.request -O availability.txt http://service.iris.edu/irisws/availability/1/query
$ curl -L --data-binary @availability.request -o availability.txt http://service.iris.edu/irisws/availability/1/query
$ wget --post-file=availability.request -O extents.txt http://service.iris.edu/irisws/availability/1/extent
$ curl -L --data-binary @availability.request -o extents.txt http://service.iris.edu/irisws/availability/1/extent

We recommend always using the -L option to allow curl to follow HTTP redirections specified by our systems. The DMC uses HTTP redirection during maintenance to keep servicing requests.

When using curl, you may wish to use the -f option. This will cause curl to return an exit code of 22 if data is not found or the request is improperly formatted. See http://curl.haxx.se/docs/manpage.html for more information.
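For scripted use, the same POST can be sketched with Python's standard library. The function names below are hypothetical. Redirect handling for POST requests in urllib differs from curl -L and is not addressed here.

```python
import urllib.request

QUERY_URL = "http://service.iris.edu/irisws/availability/1/query"


def make_post_request(url, body_text):
    """Build (but do not send) a POST request carrying the selection text."""
    # Supplying data makes urllib use the POST method.
    return urllib.request.Request(url, data=body_text.encode("utf-8"))


def fetch(request, timeout=60):
    """Send the request and return the response body as text."""
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = make_post_request(QUERY_URL, "format=text\nIU ANMO 00 BHZ\n")
print(request.get_method(), request.full_url)
# text = fetch(request)  # uncomment to contact the service
```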

Virtual Network Support

The irisws-availability service supports the selection of virtual networks. The list of current virtual networks can be viewed with the IRIS DMC MetaData Aggregator. This information can also be queried with the Virtualnetwork web service.

Virtual networks contain groupings of stations from different networks. Virtual network names start with the underscore character (_) and, unlike regular network names, are not limited to two characters. For example: _GSN.

In addition to logically grouping stations, virtual networks also impose implicit time ranges on stations. These time ranges can vary between stations.

When virtual networks are specified in queries to the irisws-availability service, the implicit station level time windows are applied to availability information.

It is generally not a good idea to mix queries from different virtual networks or virtual networks and regular networks as the application of the implicit station level time windows can become quite confusing!

Example Query:
The following query shows extent information for the entire _GSN virtual network for all BHZ channels with merged qualities and sample rates: http://service.iris.edu/irisws/availability/1/extent?net=_GSN&cha=BHZ&mergequality=true&mergesamplerate=true
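A client could guard against the mixing discouraged above with a small check on the requested network codes. This is an illustrative sketch; both function names are hypothetical, and _US-ALL is used only as a sample name for a second virtual network.

```python
def is_virtual_network(code):
    """Virtual network names start with an underscore, e.g. '_GSN'."""
    return code.startswith("_")


def should_warn(codes):
    """True if the selection mixes virtual networks with each other
    or with regular networks."""
    virtual = [c for c in codes if is_virtual_network(c)]
    regular = [c for c in codes if not is_virtual_network(c)]
    return len(virtual) > 1 or (bool(virtual) and bool(regular))


print(should_warn(["_GSN"]))        # single virtual network
print(should_warn(["_GSN", "IU"]))  # virtual mixed with regular
```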

Restricted Data Support

A small percentage of time series data held at the IRIS DMC is restricted. Accessing restricted time series data requires email and password credentials (see for example Accessing restricted data).

A confusing aspect of data restriction is that it only restricts the access of time series data and not the access of meta-data such as the information returned from the irisws-availability service. The irisws-availability service can reveal the availability of any time series data whether authentication is used or not. The service can be used to determine which data requires authentication and additionally allows for determining what data is available when authenticated with a valid email and password.

Intervals of time series data can be considered to be in one of three states:

  • OPEN No intervals require authentication to access
  • RESTRICTED All intervals require authentication to access
  • PARTIAL Some, but not all, intervals require authentication to access.
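The three states can be expressed as a small classification over per-interval flags. The sketch below assumes a hypothetical input: one boolean per interval indicating whether that interval requires authentication. The service itself reports the state directly.

```python
def restriction_state(requires_auth):
    """Classify a set of intervals from per-interval authentication flags."""
    flags = list(requires_auth)
    if not flags:
        raise ValueError("at least one interval is required")
    if all(flags):
        return "RESTRICTED"
    if not any(flags):
        return "OPEN"
    return "PARTIAL"


print(restriction_state([False, False]))
print(restriction_state([True, False]))
print(restriction_state([True, True]))
```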

By default, the irisws-availability service only returns information about data which is OPEN. The includerestricted=<true|false> parameter, which is common to both /query and /extent methods, controls whether availability information about restricted data is also returned. The default value is false, meaning only information about OPEN data is returned. If includerestricted=true is specified, both restricted and non-restricted information will be returned.

The /extent method accepts a show=restriction parameter/value. If specified, the restriction status of data is returned (OPEN,RESTRICTED,PARTIAL). Note that by default, only OPEN data is returned. includerestricted=true must be specified to see data that is PARTIAL or RESTRICTED.

Example query
http://service.iris.edu/irisws/availability/1/extent?network=AF&cha=BHZ&show=restriction&includerestricted=true

The /query method accepts includerestricted but does not support show=restriction. By default only OPEN timespans are returned. With includerestricted=true specified OPEN, RESTRICTED and PARTIAL time spans are returned.

Authenticated Access: /extentauth /queryauth

The irisws-availability service also supports /extentauth and /queryauth methods. These behave identically to the /extent and /query methods except that they require HTTP digest access authentication. The information returned by these methods reflects what the given credentials give access to.

For testing and software development purposes, the authentication credentials: {email=nobody@iris.edu, password=anonymous} may be used. Using these credentials, information returned by the /extentauth and /queryauth methods will be identical to the non-authenticated methods /extent and /query.
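HTTP digest authentication can be set up with Python's standard library as sketched below. The helper name make_digest_opener is hypothetical, the /queryauth URL is built from the method names given above, and the credentials are the test credentials listed in this section.

```python
import urllib.request

AUTH_URL = "http://service.iris.edu/irisws/availability/1/queryauth"


def make_digest_opener(url, email, password):
    """Return an opener that answers HTTP digest challenges for url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means "use these credentials for any realm".
    manager.add_password(None, url, email, password)
    handler = urllib.request.HTTPDigestAuthHandler(manager)
    return urllib.request.build_opener(handler)


opener = make_digest_opener(AUTH_URL, "nobody@iris.edu", "anonymous")
# opener.open(AUTH_URL + "?network=IU&station=ANMO") would send the request.
```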

Chaining requests with /extent?...format=request...

The output from the /extent method, when format=request is specified, is compatible with the POST request input to the fdsnws-dataselect web service.

This makes it useful for chaining requests from the irisws-availability to the fdsnws-dataselect service.

When format=request is specified, the request parameters mergesamplerate=true and mergequality=true are automatically applied and there will be one row in the response per [network,station,location,channel,timerange] tuple.

The following simple examples show how to fetch miniSEED data for all IU/BHZ channels for the time interval 2004-12-26T00:00:58 to 2004-12-26T00:01:58 using the wget command.

Step 1 Get the availability list and save to the file IU-BHZ.request:

$ wget -O IU-BHZ.request "http://service.iris.edu/irisws/availability/1/extent?net=IU&cha=BHZ&start=2004-12-26T00:00:58&end=2004-12-26T00:01:58&format=request" -nv
2017-11-09 15:28:29 URL:http://service.iris.edu/irisws/availability/1/extent?net=IU&cha=BHZ&start=2004-12-26T00:00:58&end=2004-12-26T00:01:58&format=request [7133/7133] -> "IU-BHZ.request" [1]

Inspect the first few lines of the response:

$ head IU-BHZ.request
IU ADK 00 BHZ 2004-12-26T00:00:58.000000Z 2004-12-26T00:01:58.000000Z
IU AFI 00 BHZ 2004-12-26T00:00:58.000000Z 2004-12-26T00:01:58.000000Z
IU AFI 10 BHZ 2004-12-26T00:00:58.000000Z 2004-12-26T00:01:58.000000Z
IU ANMO 00 BHZ 2004-12-26T00:00:58.000000Z 2004-12-26T00:01:58.000000Z
IU ANMO 10 BHZ 2004-12-26T00:00:58.000000Z 2004-12-26T00:01:58.000000Z
IU ANTO 00 BHZ 2004-12-26T00:00:58.000000Z 2004-12-26T00:01:58.000000Z

Step 2 Retrieve miniSEED from fdsnws-dataselect

$ wget -O IU-BHZ.miniseed --post-file=IU-BHZ.request http://service.iris.edu/fdsnws/dataselect/1/query -nv
2017-11-09 15:39:40 URL:http://service.iris.edu/fdsnws/dataselect/1/query [617984] -> "IU-BHZ.miniseed" [1]

The file IU-BHZ.miniseed contains the miniSEED data.
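Between the two steps, the request file can be inspected or filtered in a script. The following sketch parses one line of the format=request output shown above. The function name is hypothetical, and it assumes every line has exactly six whitespace-separated fields with times in the format shown in the sample.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_request_line(line):
    """Split one request line into codes and datetime objects."""
    net, sta, loc, chan, start, end = line.split()
    return (
        net,
        sta,
        loc,
        chan,
        datetime.strptime(start, TIME_FORMAT),
        datetime.strptime(end, TIME_FORMAT),
    )


row = parse_request_line(
    "IU ADK 00 BHZ 2004-12-26T00:00:58.000000Z 2004-12-26T00:01:58.000000Z"
)
print(row[:4], (row[5] - row[4]).total_seconds())
```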

/extent Row Sorting

The orderby parameter is useful for quickly identifying channels with large numbers of timespans and channels that have recently been updated.

Warning on webservice performance when using /extent row sorting

WARNING: For general availability queries, users are advised to use the default orderby=nslc_time_quality_samplerate row sorting. The other sorting options can significantly reduce webservice performance and in some circumstances result in queries that time out.

Row sorting examples

The top 100 entries in the IU network sorted by timespan count.
http://service.iris.edu/irisws/availability/1/extent?network=IU&orderby=timespancount_desc&rowlimit=100

The top 100 entries in the IU network sorted by update time. This shows the most recently updated rows.
http://service.iris.edu/irisws/availability/1/extent?network=IU&orderby=latestupdate_desc&rowlimit=100

Default Sorting: orderby=nslc_time_quality_samplerate

Sorting priority order is:

  • network code
  • station code
  • location code
  • channel code
  • earliest time
  • latest time
  • quality code (if mergequality=false, default)
  • sample rate (if mergesamplerate=false, default)

With this sorting option there is, by default, no limit to the number of rows returned.
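The default priority order corresponds to sorting on a compound key. The sketch below illustrates this on rows held as dictionaries; the field names are hypothetical and do not reflect the service's actual output columns.

```python
def default_sort_key(row):
    """Key matching the nslc_time_quality_samplerate priority order."""
    return (
        row["network"],
        row["station"],
        row["location"],
        row["channel"],
        row["earliest"],
        row["latest"],
        row["quality"],
        row["samplerate"],
    )


rows = [
    {"network": "IU", "station": "ANMO", "location": "10", "channel": "BHZ",
     "earliest": "2002-01-01", "latest": "2010-01-01", "quality": "M", "samplerate": 40.0},
    {"network": "IU", "station": "ANMO", "location": "00", "channel": "BHZ",
     "earliest": "1998-01-01", "latest": "2010-01-01", "quality": "M", "samplerate": 20.0},
    {"network": "II", "station": "KURK", "location": "00", "channel": "BHZ",
     "earliest": "1995-01-01", "latest": "2010-01-01", "quality": "M", "samplerate": 20.0},
]
ordered = sorted(rows, key=default_sort_key)
print([(r["network"], r["station"], r["location"]) for r in ordered])
```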

Sorting by timespan count: orderby=timespancount, orderby=timespancount_desc

Sorting priority order is:

  • number of timespans (small to large: timespancount) or (large to small: timespancount_desc)
  • network code
  • station code
  • location code
  • channel code
  • earliest time
  • latest time
  • quality code (if mergequality=false, default)
  • sample rate (if mergesamplerate=false, default)

Defaults: With these sorting options, the row limit defaults to 1000 (rowlimit=1000), and the timespan-count field is shown (show=timespancount)
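A count-first ordering with a row limit can be sketched as follows. The field names are hypothetical, and for brevity the tie-breaking key stops at the channel code rather than continuing through times, quality and sample rate.

```python
def timespancount_key(row, descending=False):
    """Sort key: timespan count first, then network/station/location/channel."""
    count = -row["timespancount"] if descending else row["timespancount"]
    return (count, row["network"], row["station"], row["location"], row["channel"])


def sort_by_timespancount(rows, descending=False, rowlimit=1000):
    """Return at most rowlimit rows ordered by timespan count."""
    ordered = sorted(rows, key=lambda r: timespancount_key(r, descending))
    return ordered[:rowlimit]


rows = [
    {"network": "IU", "station": "ANMO", "location": "00", "channel": "BHZ", "timespancount": 120},
    {"network": "IU", "station": "ADK", "location": "00", "channel": "BHZ", "timespancount": 900},
    {"network": "IU", "station": "AFI", "location": "00", "channel": "BHZ", "timespancount": 120},
]
top = sort_by_timespancount(rows, descending=True, rowlimit=2)
print([(r["station"], r["timespancount"]) for r in top])
```

Negating the count keeps the tie-breaking fields in ascending order even when the count itself is sorted from large to small.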

Sorting by updated time: orderby=latestupdate, orderby=latestupdate_desc

Sorting priority order is:

  • latest data update time (past to recent: latestupdate) or (recent to past: latestupdate_desc)
  • network code
  • station code
  • location code
  • channel code
  • earliest time
  • latest time
  • quality code (if mergequality=false, default)
  • sample rate (if mergesamplerate=false, default)

Defaults: With these sorting options, the row limit defaults to 1000 (rowlimit=1000), and the updated field is shown (show=latestupdate)

Timespan Merging Logic

When the cache of assembled timespans is compiled, timespans from identical network, station, location, channel, sample-rate and quality tuples are merged together where possible.

As illustrated in the following figure, timespan A can be merged with timespan B if the start of B is in the window of time shown: End-of-A + 1/2-sample-period to End-of-A + 3/2-sample-period:


By default, timespans from identical network, station, location, channel and sample-rate tuples but different qualities are merged together using the logic shown above. This can be disabled by setting mergequality=false. If mergesamplerate=true is chosen, the same logic shown above will be applied, with the sample-period taken from Timespan B.
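The merge window described above can be written as a simple predicate. This is a sketch under stated assumptions: times are expressed in seconds, the function name is hypothetical, and both window boundaries are treated as inclusive, which the text does not specify.

```python
def can_merge(end_a, start_b, sample_period):
    """True if timespan B starts within the merge window after timespan A.

    The window runs from end_a + 1/2 sample period
    to end_a + 3/2 sample periods.
    """
    earliest = end_a + 0.5 * sample_period
    latest = end_a + 1.5 * sample_period
    return earliest <= start_b <= latest


# With a 1 second sample period, B is expected about 1 second after A ends.
print(can_merge(100.0, 101.0, 1.0))
print(can_merge(100.0, 102.0, 1.0))
```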

In general, the distribution of timespans for a network, station, location, channel, sample-rate and quality tuple can be quite complicated. This is illustrated in this figure:

If mergeoverlap=true is selected, timespans that overlap in time will be merged together. Timespans that are separated by less than 1/2 sample-period will also be merged.

The mergetolerance=<seconds> option will suture together timespans that are separated by no more than the given time. It can only be used with mergeoverlap=true.
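The combined effect of overlap merging and a merge tolerance resembles a standard interval merge. The sketch below is not the service's implementation: it assumes times in seconds, measures the gap from the end of one timespan to the start of the next, and uses a hypothetical function name.

```python
def merge_spans(spans, sample_period, tolerance=0.0):
    """Merge (start, end) spans that overlap or are separated by small gaps.

    A span is joined to the previous one when the gap between them is
    less than half a sample period or no more than tolerance seconds.
    """
    merged = []
    for start, end in sorted(spans):
        if merged:
            gap = start - merged[-1][1]
            if gap < 0.5 * sample_period or gap <= tolerance:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                continue
        merged.append((start, end))
    return merged


spans = [(0, 10), (5, 20), (30, 40)]
print(merge_spans(spans, sample_period=1.0))
print(merge_spans(spans, sample_period=1.0, tolerance=10))
```

A tolerance of 86400.0, as used in the sample query for this option, would join timespans separated by gaps of one day or less.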

Limitations

Cache Latency

To perform well, the irisws-availability service uses a cache of assembled timespan information. The cache is derived from an internal database which tracks miniSEED data in the DMC archive. The cache used by the irisws-availability service sutures together time segment information recorded in the database. The cache takes over an hour to assemble and is refreshed several times per day.

The vast majority of the data contained in the miniSEED archive does not change between cache refreshes. However, there will always be a certain amount of disagreement between the cache used by the webservice and the archive.

Realtime Data

The irisws-availability service only catalogs data in the archive and not data in the realtime system (BUD). Consequently, it is generally not useful for querying data availability close to realtime. It usually takes between 4 and 26+ hours for data to be copied from the BUD into the DMC archive. Data is archived in 24 hour segments by GMT day. Consequently, data from just before the end of a GMT day is placed into the archive sooner than data from just after the start of a GMT day.

Memory Limitations

/query method

A small number of channels cannot be processed by the service’s /query method because they have too many timespans to load into memory. The maximum processable limit is currently set to 1,000,000 timespans. Any channel with more timespans than this value cannot be processed by the service. The query http://service.iris.edu/irisws/availability/1/extent?orderby=timespancount_desc&rowlimit=500 shows the top 500 channels sorted by number of timespans. As can be seen, only a comparatively small number of channels (less than 100) cannot be processed.

By default, channels with too many timespans will be ignored by the /query method. Using the excludetoolarge=false option will cause an HTTP code 413 (request too large) to be returned if any of these channels are selected.

/extent method

Queries to the /extent method that contain no time constraints are not subject to timespan memory limitations. For time-constrained queries on channels with over 500,000 time spans, calendar day availability information is used rather than detailed timespan information when calculating availability extents. The returned earliest and latest times will be based on which days data was available rather than on the detailed timespans. Displayed timespan counts for such channels will be reported as -1.

For example, for a channel with > 500,000 time spans, if a query of ...start=2015-02-01T12:34:56&end=2015-02-04T10:00:00... is given, and the channel has some data on days 2015-02-01 and 2015-02-04, then the returned earliest and latest times will exactly match the query times: 2015-02-01T12:34:56 to 2015-02-04T10:00:00. However, if the channel does not have data during day 2015-02-01, but does have data during the next day, then the returned earliest and latest extent times would be: 2015-02-02T00:00:00 to 2015-02-04T10:00:00.
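The day-based calculation in this example can be approximated in a few lines. This sketch is an interpretation rather than the service's code: the function name is hypothetical, and it assumes that a day with data counts as available from midnight up to the following midnight.

```python
from datetime import date, datetime, time, timedelta


def day_based_extent(days_with_data, query_start, query_end):
    """Approximate earliest/latest times from calendar-day availability.

    Returns None when no day in the query range has data.
    """
    days = sorted(
        d for d in days_with_data
        if query_start.date() <= d <= query_end.date()
    )
    if not days:
        return None
    first_midnight = datetime.combine(days[0], time.min)
    # Assumption: a day with data is available up to the next midnight.
    last_midnight = datetime.combine(days[-1], time.min) + timedelta(days=1)
    return max(query_start, first_midnight), min(query_end, last_midnight)


start = datetime(2015, 2, 1, 12, 34, 56)
end = datetime(2015, 2, 4, 10, 0, 0)
print(day_based_extent({date(2015, 2, 1), date(2015, 2, 4)}, start, end))
print(day_based_extent({date(2015, 2, 2), date(2015, 2, 4)}, start, end))
```

The first call reproduces the query times unchanged, and the second moves the earliest time to the start of 2015-02-02, matching the two cases described above.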

Be aware that if a virtual network is selected in the query, implicit timespans are attached to stations. Thus, even if no time constraints are present in the query, time constraints may be applied.

Missing Metadata

A small amount of timeseries miniSEED data in the archive lacks metadata. Currently, the irisws-availability service will show this data as available. Attempting to request this data using a tool such as fdsnws-dataselect may not work because the extraction logic looks for the metadata before doing the extraction.

Latest Update-Date Inaccuracies

In some circumstances the reported latest update-dates returned from the /query method may be later (but never earlier) than their actual values.

This is a result of how the irisws-availability service catalogs these dates in its internal cache. In the cache, latest update-dates are stored per GMT calendar day per network, station, location, channel, quality, sample-rate tuple. Because of the way in which most data is archived, this method of caching results in accurate update-dates being reported. However, if the requested time segment does not cover the part of a day that was most recently loaded, the reported time may be later than its actual value.

For the majority of the data this is not an issue. It is worth emphasizing that this behavior should never result in update-dates being reported as earlier than their actual values, only later.

/extent Timespan Count Inaccuracies

For performance reasons, when mergequality=true and/or mergesamplerate=true are selected, a simplification is used when calculating timespan counts; the timespan counts from the different qualities and/or sample rates are simply added together. If there is no overlap between the different qualities and sample rates, the returned values will be accurate, but if they do overlap the values might be higher than they should be. As an extreme example, if two different qualities were selected, and the qualities have identical data, the returned timespan count would be double the actual value.
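The extreme example can be checked with simple arithmetic. In the sketch below the quality codes M and Q and the timespans are illustrative values, and the distinct count only handles the case where timespans are exactly identical across qualities.

```python
def summed_count(spans_by_quality):
    """Simplified count: per-quality timespan counts are added together."""
    return sum(len(spans) for spans in spans_by_quality.values())


def distinct_count(spans_by_quality):
    """Count timespans after removing exact duplicates across qualities."""
    return len(set().union(*spans_by_quality.values()))


spans_by_quality = {
    "M": [(0, 10), (20, 30)],
    "Q": [(0, 10), (20, 30)],
}
print(summed_count(spans_by_quality), distinct_count(spans_by_quality))
```

Here the summed count is twice the distinct count, as in the case of two qualities holding identical data.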
