This repository was archived by the owner on Aug 1, 2020. It is now read-only.
Thank you for gathering and sharing this data in an easy-to-use form.
Taking a look at:
import cmdc

c = cmdc.Client()
c.covid()
df = c.fetch()
# Count rows per (fips, vintage) to see which vintages each fips has
df.groupby(['fips', 'vintage']).size()
I see that different fips have different vintages of data available. For example, in the data I fetched yesterday, the state fips and the CA/06 counties have 4 vintages (2020-05-29 to 2020-06-01), but the AK/02 counties have only 2020-05-30. I get that keeping past versions of the data is handy for debugging, but I suspect most clients only want the latest version of each time series.
I'm guessing you expect data consumers to group by ['fips', 'dt'], sort by 'vintage', then keep only the newest row. How does df.sort_values('vintage').groupby(['fips', 'dt']).last() look to you? Could you make fetching old vintages optional?
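To make the question concrete, here is a minimal sketch of the "keep only the newest vintage" step on a small made-up frame. The 'fips', 'dt' and 'vintage' columns are the ones discussed above; the 'value' column, the sample rows, and the helper name latest_vintage are hypothetical and only for illustration. It uses drop_duplicates rather than groupby(...).last() because, as far as I know, GroupBy.last() picks the last non-null entry per column, so a group containing NaNs could end up mixing values from different vintages, whereas drop_duplicates keeps whole rows.

```python
import pandas as pd


def latest_vintage(df: pd.DataFrame) -> pd.DataFrame:
    """Keep, for each (fips, dt) pair, only the row with the newest vintage."""
    return (
        df.sort_values("vintage")
        # After sorting, the last duplicate of each (fips, dt) is the newest vintage.
        .drop_duplicates(subset=["fips", "dt"], keep="last")
        .sort_values(["fips", "dt"])
        .reset_index(drop=True)
    )


# Made-up sample: fips 6 has two vintages for 2020-05-28, fips 2 has one vintage.
sample = pd.DataFrame(
    {
        "fips": [6, 6, 6, 2],
        "dt": pd.to_datetime(
            ["2020-05-28", "2020-05-28", "2020-05-29", "2020-05-29"]
        ),
        "vintage": pd.to_datetime(
            ["2020-05-29", "2020-05-30", "2020-05-30", "2020-05-30"]
        ),
        "value": [10.0, 12.0, 15.0, 3.0],  # hypothetical measurement column
    }
)

latest = latest_vintage(sample)
print(latest)
```

If the real frame has more key columns than 'fips' and 'dt' (for example a variable name), they would need to be added to the subset list; I have not checked that against the actual schema.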