query for newest vintage of each variable 

Thank you for gathering and sharing this data in an easy-to-use form.

Taking a look at:
```
c = cmdc.Client()
c.covid()
df = c.fetch()
df.groupby(['fips', 'vintage']).size()
```

I see that different fips have different vintages of data available. For example in the data I fetched yesterday the state fips and CA/06 counties have 4 vintages 2020-05-29 to 2020-06-01 but AK/02 counties have only 2020-05-30. I get that it is handy to keep past versions of the data for debugging but suspect most clients only want the latest and greatest version of each timeseries.

I'm guessing you expect data consumers to group by ['fips', 'dt'], sort by 'vintage' then keep only the newest row. How does `df.sort_values('vintage').groupby(['fips', 'dt']).last()` look to you? Can you make fetching old vintages optional?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

query for newest vintage of each variable #2

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

query for newest vintage of each variable #2

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions