Discussion:
[CF-metadata] Missing data bins in histograms
m***@stfc.ac.uk
2016-10-11 12:41:21 UTC
Hello,

the CF standard name list has two "histogram_..." entries, and in the CMIP6 data request we may need to add a third, histogram_of_cloud_top_height. Besides the standard name, we also need, for this new variable, a method of encoding the "missing data" bin in the histogram. That is, the histogram should record frequency in 16 data bins and one additional bin for the frequency of missing data.

Can we define a "missing_data_index" attribute for histogram variables, and use this to indicate that the first bin in the array has this special purpose? It might be more pythonic to put the _FillValue in the coordinate value for the missing data bin, but I suspect that this would cause substantial problems for many software packages.

regards,
Martin
Jonathan Gregory
2016-10-11 17:17:06 UTC
Dear Martin

I feel there would be an advantage in flexibility, by not requiring the missing
data count to be the first bin necessarily. The new attribute could indicate
the index of the bin which contains the missing data count. I suggest that
this would be an attribute of the coordinate variable of the histogram for the
quantity which is binned (cloud top height), since the index refers to that
dimension specifically.

I agree it would be neat if it were possible instead to put _FillValue in the
coordinate variable. Actually _FillValue is not allowed in coordinate vars by
CF, so as far as CF is concerned it would not be a problem to adopt this as a
new convention. But maybe software would have problems with it. If we need the
new attribute, I'd suggest missing_value_index, to make it more similar to
missing_value and _FillValue. What would you put in the coordinate and bounds
for the missing data bin?

In any case, this needs a new convention to be proposed as a trac ticket.

Best wishes

Jonathan
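
The index-attribute idea discussed above can be sketched in plain Python. This is only an illustration, not an agreed CF convention: the attribute name missing_value_index is the one suggested in this message, while the dictionary layout, the variable name, the function split_histogram, the counts and the 0-based indexing are all assumptions made up for the example.

```python
# Hypothetical in-memory stand-in for the coordinate variable of a histogram.
# "missing_value_index" is the attribute name suggested in this thread; it is
# not part of the CF conventions. A 0-based index is assumed here.
N_DATA_BINS = 16

coord = {
    "name": "cloud_top_height_bin",        # made-up variable name
    "attrs": {"missing_value_index": 5},   # any bin, not necessarily the first
}


def split_histogram(counts, coord):
    """Separate the count in the special bin from the ordinary data bins."""
    idx = coord["attrs"]["missing_value_index"]
    if not 0 <= idx < len(counts):
        raise ValueError("missing_value_index is outside the histogram")
    missing = counts[idx]
    data = counts[:idx] + counts[idx + 1:]
    return data, missing


# 16 data bins plus one special bin: 17 values in total (made-up counts).
counts = list(range(1, N_DATA_BINS + 2))
data_counts, missing_count = split_histogram(counts, coord)
print(len(data_counts), missing_count)
```

Because the index is stored as an attribute rather than fixed by position, software reading the file would not need to assume that the special bin comes first.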
_______________________________________________
CF-metadata mailing list
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Karl Taylor
2016-10-11 17:24:08 UTC
Hello,

The histogram records frequencies of a single characteristic of a
variable (in this case for cloud top height). I think that information
about whether or not a cloud exists should not be formally a part of the
histogram. We could adopt the convention for this variable that in the
absence of clouds, the cloud is considered to be "under ground" so the
upper bound of the height of a missing cloud would be 0. [This is
akin to Lorenz's definition of the potential temperature isotherms as
coinciding with the ground in his discussion of available potential energy.]

By the way, I couldn't find this variable in the current release of the
CMIP6 data request. Is it there? If not, could you say a bit more
about how the bins are defined? Are they height or pressure bins?

thanks,
Karl
Jim Biard
2016-10-11 18:39:56 UTC
Hi.

Another approach could be to use flag_values and flag_meanings on the
coordinate variable to indicate one or more special coordinate values
that correspond to any number of "missing data" or "out of bounds" bins.
These attributes aren't forbidden by CF, and everything should be fine
as long as the coordinate variable remains monotonic.

Grace and peace,

Jim
--
CICS-NC <http://www.cicsnc.org/> Visit us on
Facebook <http://www.facebook.com/cicsnc> *Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA’s National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: ***@cicsnc.org <mailto:***@cicsnc.org>
o: +1 828 271 4900

/Connect with us on Facebook for climate
<https://www.facebook.com/NOAANCEIclimate> and ocean and geophysics
<https://www.facebook.com/NOAANCEIoceangeo> information, and follow us
on Twitter at @NOAANCEIclimate <https://twitter.com/NOAANCEIclimate> and
@NOAANCEIocngeo <https://twitter.com/NOAANCEIocngeo>. /
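
A rough sketch of the flag-based approach from this message, in plain Python. The attribute names flag_values and flag_meanings are the existing CF attributes; everything else (the dictionary layout, the coordinate values, the meanings and the helper names) is hypothetical, and whether CF permits these attributes on coordinate variables is debated later in the thread.

```python
# Hypothetical sketch: flag attributes attached to a coordinate variable.
# Coordinate values and flag meanings are invented for illustration only.
coord = {
    "values": [-2.0, -1.0, 250.0, 750.0, 1250.0],   # made-up values, metres
    "attrs": {
        "flag_values": [-2.0, -1.0],
        # flag_meanings is a single space-separated string in CF
        "flag_meanings": "out_of_bounds missing_data",
    },
}


def is_strictly_monotonic(values):
    """True if the values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def special_bins(coord):
    """Map bin index -> flag meaning for every flagged coordinate value."""
    meanings = coord["attrs"]["flag_meanings"].split()
    lookup = dict(zip(coord["attrs"]["flag_values"], meanings))
    return {i: lookup[v] for i, v in enumerate(coord["values"]) if v in lookup}


print(is_strictly_monotonic(coord["values"]))
print(special_bins(coord))
```

In this sketch several bins can be flagged at once, and the special values are chosen below the real range so that the coordinate stays monotonic.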
m***@stfc.ac.uk
2016-10-12 17:14:38 UTC
Dear Karl, Jonathan, Jim,

thanks for those comments.

The CMIP6 variable in question is clmisr (http://clipc-services.ceda.ac.uk/dreq/u/59151ed6-9e49-11e5-803c-0d0b866b59f3.html) with a coordinate of 16 altitude bins (http://clipc-services.ceda.ac.uk/dreq/u/dim:alt16.html).

I'd be happy with Jim's proposed solution, which does not need any change to the convention, though it may be a bit cryptic: all the examples in the convention are for cases in which all array values are intended to match one of the flag_values. Having an array which is a mixture of flags and "normal" values would be a new usage. We could, perhaps, introduce a consistency problem: ticket 151 (http://cf-trac.llnl.gov/trac/ticket/151) explains how, for variables with standard_name "area_type", flag_values and flag_meanings can be used to encode the data, in which case it is the "flag_meanings" which match the requirements of the standard name. Here, on the other hand, we want the special bin to be the exception which is not described by the standard name (altitude). So perhaps it is simpler to introduce a new attribute name?

Concerning Jonathan and Karl's comments, calling it a "missing_value" was a mistake on my part: the special bin actually refers to locations where cloud is detected but the height of the cloud cannot be retrieved.

The current proposal is to have a value of 0.0 in the coordinate and (-99000.0,0.0) in the bounds of the special value "bin". I imagine these need to be present, but I think their values are not going to mean anything.

It is certainly possible to do as Karl suggests and place an explanation in the variable description. Having the special status of the first bin explicitly flagged in a way which can be easily picked up by software brings added value.

regards,
Martin
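
As an illustration of the coordinate and bounds layout proposed in this message, here is a minimal Python sketch. The special coordinate value 0.0 and bounds (-99000.0, 0.0) are taken from the message; the bin edges are invented for the example and are not the alt16 values, and the function name build_axis is hypothetical.

```python
# Values for the special bin as proposed in this message.
SPECIAL_COORD = 0.0
SPECIAL_BOUNDS = (-99000.0, 0.0)


def build_axis(edges):
    """Prepend the special bin to ordinary bins defined by their edges."""
    ordinary = list(zip(edges, edges[1:]))
    bounds = [SPECIAL_BOUNDS] + ordinary
    coord = [SPECIAL_COORD] + [(lo + hi) / 2 for lo, hi in ordinary]
    return coord, bounds


# Made-up bin edges in metres, for illustration only.
edges = [0.0, 500.0, 1000.0, 1500.0]
coord, bounds = build_axis(edges)

# The coordinate stays strictly increasing and the cells stay contiguous.
increasing = all(a < b for a, b in zip(coord, coord[1:]))
contiguous = all(b0[1] == b1[0] for b0, b1 in zip(bounds, bounds[1:]))
print(coord, increasing, contiguous)
```

With these values the special bin sits entirely below the ground, so the axis remains monotonic even though the numbers in that bin carry no physical meaning.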
Jonathan Gregory
2016-10-12 17:30:03 UTC
Dear Jim

That is an ingenious idea. I don't think the flag atts are currently allowed
for coord variables, but they could be, I agree.

Best wishes

Jonathan
Jim Biard
2016-10-12 18:58:11 UTC
Jonathan,

Missing/fill values are not allowed, but I don't see any language
prohibiting flags. I'd appreciate it if you could expand on your
thoughts about why they aren't allowed.

Grace and peace,

Jim
Jonathan Gregory
2016-10-12 17:42:21 UTC
Dear Martin

I'm still uneasy about it having to be the first bin, in particular, or are
you not set on that? If it can be identified from the coordinate value by
flags, it could be any bin.

I believe that a change to convention would be needed to allow flag values
to be used with coordinates, unless we've already agreed that in some ticket.

Best wishes

Jonathan
m***@stfc.ac.uk
2016-10-13 08:20:56 UTC
Dear Jonathan,

I'm sorry I didn't respond on the point about it being the first bin: I had not intended the special value to be restricted to the first bin, so I guess there is something ambiguous in my initial formulation which is giving this impression. I agree that we should formulate any extension so that it can apply to any bin, and I also think it should be possible to label multiple bins in this way.

regards,
Martin
Jonathan Gregory
2016-10-13 09:35:33 UTC
Dear Martin

Ah, OK, thanks. I must have misunderstood.

Best wishes

Jonathan
Jonathan Gregory
2016-10-13 09:42:47 UTC
Dear Jim

In Appendix A it does not say that the flag attributes are allowed for
coordinate variables - it has just "D" in the "Use" column. This is not an
argument why they shouldn't be if there is a need, but they weren't introduced
with that in mind. The use which you suggested for Martin's case is a good
idea, but I think it would need a change to the convention.

Best wishes

Jonathan