This repository was archived by the owner on Aug 1, 2020. It is now read-only.

query for newest vintage of each variable  #2

@TomGoBravo

Thank you for gathering and sharing this data in an easy-to-use form.

Taking a look at:

import cmdc

c = cmdc.Client()
c.covid()
df = c.fetch()
# count rows for each (fips, vintage) pair
df.groupby(['fips', 'vintage']).size()

I see that different fips have different vintages of data available. For example, in the data I fetched yesterday, the state fips and the CA/06 counties have 4 vintages (2020-05-29 to 2020-06-01), but the AK/02 counties have only 2020-05-30. I get that it is handy to keep past versions of the data for debugging, but I suspect most clients only want the latest version of each timeseries.

I'm guessing you expect data consumers to group by ['fips', 'dt'], sort by 'vintage', then keep only the newest row. How does df.sort_values('vintage').groupby(['fips', 'dt']).last() look to you? Also, could you make fetching old vintages optional?
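
To make the question concrete, here is a minimal sketch of the "keep only the newest vintage" step on a small made-up frame. The column names fips, dt and vintage come from the snippet above; the cases column, the sample values and the latest_vintage helper are assumptions for illustration, not part of cmdc. It uses drop_duplicates rather than groupby(...).last() because, as far as I know, GroupBy.last() picks the last non-null value per column, so it could mix values from different vintages if a newer row has a NaN.

```python
import pandas as pd


def latest_vintage(df, keys=("fips", "dt")):
    """Keep one row per key combination: the one with the newest vintage.

    Hypothetical helper, not part of cmdc.
    """
    # A stable sort keeps the original order among rows with equal vintage.
    ordered = df.sort_values("vintage", kind="stable")
    # keep="last" retains the whole newest row for each (fips, dt) pair.
    deduped = ordered.drop_duplicates(subset=list(keys), keep="last")
    return deduped.reset_index(drop=True)


# Made-up sample data: fips 6 has two vintages for 2020-05-28, fips 2 has one.
sample = pd.DataFrame(
    {
        "fips": [6, 6, 6, 2],
        "dt": pd.to_datetime(
            ["2020-05-28", "2020-05-28", "2020-05-29", "2020-05-28"]
        ),
        "vintage": pd.to_datetime(
            ["2020-05-29", "2020-06-01", "2020-06-01", "2020-05-30"]
        ),
        "cases": [100, 105, 120, 7],
    }
)

latest = latest_vintage(sample)
print(latest)
```

If the frame also carries a column that identifies the variable, that column would presumably need to be added to keys so each variable keeps its own newest vintage.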
