
Daniel Rossi • about 8 years ago
Data Sets
Hi I had a look at the datasets and it seems indeed there is a massive problem in terms of how the government expresses their data. They really need to open up data apis.
They are nearly all XLS / ODS yeah some are open office spreadsheets. There seems to be python packages available to extract this out to JSON so either in command line or via django perhaps.
OR process the datasets into a local db which might help with caching , which can be scheduled to update.
A command line script using python could be used to suck the whole thing up into a DB and scheduled via cron.
Comments are closed.
9 comments
Frank Arrigo Manager • about 8 years ago
great suggestion daniel - are you signing up to make it so??
Daniel Rossi • about 8 years ago
The structures are very rigid so not common fields so alot of hard coding required but it's possible into one rest api. That is what's needed. Probably not in a few days. You wouldn't be able to put this off in one day without wasting time doing so.
This is standard of government.
I actually thought it was last weekend. Im not promising I can make it again but I can try and have a go.
I had setup a temporary VPS for this project for others to login to and use if needed. I don't have a group as such.
https://techfugees.electroteque.org
Daniel Rossi • about 8 years ago
I grabbed one of them and am able to now split out the different sections per worksheet it requires a bit of format detection but most of it can be dynamic rather than static.
TOTAL PROJECT POPULATION
ANNUAL GROWTH RATE (%)
TOTAL POPULATION GROWTH INDEX (2011 populations = 100)
NSW SUMMARY POPULATION ACCOUNTS
New South Wales population by five year age groups
New South Wales population by five year age groups
New South Wales population by five year age groups
TOTAL PROJECT POPULATION
ANNUAL GROWTH RATE (%)
Sydney Metropolitan population by five year age groups
Sydney Metropolitan population by five year age groups
Sydney Metropolitan population by five year age groups
Lower Hunter and Central Coast population by five year age groups
Lower Hunter and Central Coast population by five year age groups
Lower Hunter and Central Coast population by five year age groups
Illawarra population by five year age groups
Illawarra population by five year age groups
Illawarra population by five year age groups
Regional NSW population by five year age groups
Regional NSW population by five year age groups
Regional NSW population by five year age groups
LGA
Sydney Metropolitan LGAs
Lower Hunter and Central Coast
Illawarra
Regional NSW LGAs
NEW SOUTH WALES
Sydney Metropolitan
LGA
Lower Hunter and Central Coast
LGA
Illawarra
LGA
Regional NSW:
LGA
Small LGAs - Regional NSW:
LGA
Easiest to get these into json files first before looking at a way getting this into a REST service.
Daniel Rossi • about 8 years ago
Apologies won't be able to make it. It seems doing it dynamically is too hard it has to be scripted statically. the dimensions of the data is too inconsistent. This is data just from one worksheet as an example. At least it makes the data usable. It could be done differently but a start. I might try and generate json files from all the available XLS but might not make it for tommorow haha.
{
"title": "New South Wales State and Local Government Area Population Projections: 2014 Final",
"sections": {
"total_project_population": {
"title": "TOTAL PROJECT POPULATION",
"years": {
"2011": 7218550.0,
"2016": 7708850.0,
"2021": 8230400.0,
"2026": 8739950.0,
"2031": 9228350.0
}
},
"annual_growth_rate": {
"title": "ANNUAL GROWTH RATE (%)",
"years_range": {
"2011": {
"start": "2011",
"end": "2016",
"rate": 1.3143264505152674
},
"2016": {
"start": "2016",
"end": "2021",
"rate": 1.3093691794821598
},
"2021": {
"start": "2021",
"end": "2026",
"rate": 1.2013578998379273
},
"2026": {
"start": "2026",
"end": "2031",
"rate": 1.0875727542634235
},
"2031": {
"start": "2031",
"end": "2036",
"rate": 0.9982007171722904
}
}
},
"summary_population_accounts": {
"title": "NSW SUMMARY POPULATION ACCOUNTS",
"data": [
{
"name": "Start-of-period population",
"years_range": {
"2011": {
"start": "2011",
"end": "2016",
"total": 7218550.0
},
"2016": {
"start": "2016",
"end": "2021",
"total": 7708850.0
},
"2021": {
"start": "2021",
"end": "2026",
"total": 8230400.0
},
"2026": {
"start": "2026",
"end": "2031",
"total": 8739950.0
},
"2031": {
"start": "2031",
"end": "2036",
"total": 9228350.0
}
}
},
{
"name": "Births",
"years_range": {
"2011": {
"start": "2011",
"end": "2016",
"total": 507550.0
},
"2016": {
"start": "2016",
"end": "2021",
"total": 538150.0
},
"2021": {
"start": "2021",
"end": "2026",
"total": 558350.0
},
"2026": {
"start": "2026",
"end": "2031",
"total": 568000.0
},
"2031": {
"start": "2031",
"end": "2036",
"total": 580350.0
}
}
},
{
"name": "Deaths",
"years_range": {
"2011": {
"start": "2011",
"end": "2016",
"total": 250950.0
},
"2016": {
"start": "2016",
"end": "2021",
"total": 267000.0
},
"2021": {
"start": "2021",
"end": "2026",
"total": 284600.0
},
"2026": {
"start": "2026",
"end": "2031",
"total": 306600.0
},
"2031": {
"start": "2031",
"end": "2036",
"total": 335100.0
}
}
},
{
"name": "Net interstate migration",
"years_range": {
"2011": {
"start": "2011",
"end": "2016",
"total": -100000.0
},
"2016": {
"start": "2016",
"end": "2021",
"total": -100000.0
},
"2021": {
"start": "2021",
"end": "2026",
"total": -100000.0
},
"2026": {
"start": "2026",
"end": "2031",
"total": -100000.0
},
"2031": {
"start": "2031",
"end": "2036",
"total": -100000.0
}
}
},
{
"name": "Net overseas migration",
"years_range": {
"2011": {
"start": "2011",
"end": "2016",
"total": 333700.0
},
"2016": {
"start": "2016",
"end": "2021",
"total": 350450.0
},
"2021": {
"start": "2021",
"end": "2026",
"total": 335800.0
},
"2026": {
"start": "2026",
"end": "2031",
"total": 327000.0
},
"2031": {
"start": "2031",
"end": "2036",
"total": 327000.0
}
}
}
]
}
}
}
Frank Arrigo Manager • about 8 years ago
daniel - thx for giving it a go. appreciate you looking at making the data more accessible
Daniel Rossi • about 8 years ago
This could take manually a whole week not stop to make sense of it into proper keys and dimensions. Sorry I have to let it go right now but if it's needed it is possible. I only managed to get 4 worksheets done of one workbook. It can't be processed dynamically, and the listings because it will be static json output rather than searchable in a db, the keys need to be setup right. Excel like access is no good.
You have to specify rows and columns manually , and then double check the data is right, the indexes always get messed up lol.
I could keep going tomorrow and update here with static json files. I won't be able to make it all the way in to the cbd.
Anne-Marie Elias Manager • about 8 years ago
Daniel the hack is in Liverpool and I'm sure we can access Better data sets https://www.govhack.org/2015-data/ And http://data.nsw.gov.au
Daniel Rossi • about 8 years ago
They are just links back to pdf's, Excel files etc.
They even have a data repository here, it has JSON links but that is JSON representation of links to pdf files.
http://data.nsw.gov.au/data/dataset/818bf625-5e69-437d-9839-1913a8890190
There is no central REST interface for applications to consume, with API keys etc. that is a real shame.
That is why I was attempting to transform them into JSON format. You should even be able to run filters to return only some of the data in such rest interface.
Anne-Marie Elias Manager • about 8 years ago
I'm sorry about that Daniel, they are the sets used for GovHack. We will have data mentors around over the weekend. And thanks so much for your feedback.