•   about 8 years ago

Data Sets

Hi I had a look at the datasets and it seems indeed there is a massive problem in terms of how the government expresses their data. They really need to open up data apis.

They are nearly all XLS / ODS yeah some are open office spreadsheets. There seems to be python packages available to extract this out to JSON so either in command line or via django perhaps.

OR process the datasets into a local db which might help with caching , which can be scheduled to update.

A command line script using python could be used to suck the whole thing up into a DB and scheduled via cron.

  • 9 comments

  • Manager   •   about 8 years ago

    great suggestion daniel - are you signing up to make it so??

  •   •   about 8 years ago

    The structures are very rigid so not common fields so alot of hard coding required but it's possible into one rest api. That is what's needed. Probably not in a few days. You wouldn't be able to put this off in one day without wasting time doing so.

    This is standard of government.

    I actually thought it was last weekend. Im not promising I can make it again but I can try and have a go.

    I had setup a temporary VPS for this project for others to login to and use if needed. I don't have a group as such.

    https://techfugees.electroteque.org

  •   •   about 8 years ago

    I grabbed one of them and am able to now split out the different sections per worksheet it requires a bit of format detection but most of it can be dynamic rather than static.

    TOTAL PROJECT POPULATION
    ANNUAL GROWTH RATE (%)
    TOTAL POPULATION GROWTH INDEX (2011 populations = 100)
    NSW SUMMARY POPULATION ACCOUNTS
    New South Wales population by five year age groups
    New South Wales population by five year age groups
    New South Wales population by five year age groups
    TOTAL PROJECT POPULATION
    ANNUAL GROWTH RATE (%)
    Sydney Metropolitan population by five year age groups
    Sydney Metropolitan population by five year age groups
    Sydney Metropolitan population by five year age groups
    Lower Hunter and Central Coast population by five year age groups
    Lower Hunter and Central Coast population by five year age groups
    Lower Hunter and Central Coast population by five year age groups
    Illawarra population by five year age groups
    Illawarra population by five year age groups
    Illawarra population by five year age groups
    Regional NSW population by five year age groups
    Regional NSW population by five year age groups
    Regional NSW population by five year age groups
    LGA
    Sydney Metropolitan LGAs
    Lower Hunter and Central Coast
    Illawarra
    Regional NSW LGAs
    NEW SOUTH WALES
    Sydney Metropolitan
    LGA
    Lower Hunter and Central Coast
    LGA
    Illawarra
    LGA
    Regional NSW:

    LGA
    Small LGAs - Regional NSW:
    LGA

    Easiest to get these into json files first before looking at a way getting this into a REST service.

  •   •   about 8 years ago

    Apologies won't be able to make it. It seems doing it dynamically is too hard it has to be scripted statically. the dimensions of the data is too inconsistent. This is data just from one worksheet as an example. At least it makes the data usable. It could be done differently but a start. I might try and generate json files from all the available XLS but might not make it for tommorow haha.

    {
    "title": "New South Wales State and Local Government Area Population Projections: 2014 Final",
    "sections": {
    "total_project_population": {
    "title": "TOTAL PROJECT POPULATION",
    "years": {
    "2011": 7218550.0,
    "2016": 7708850.0,
    "2021": 8230400.0,
    "2026": 8739950.0,
    "2031": 9228350.0
    }
    },
    "annual_growth_rate": {
    "title": "ANNUAL GROWTH RATE (%)",
    "years_range": {
    "2011": {
    "start": "2011",
    "end": "2016",
    "rate": 1.3143264505152674
    },
    "2016": {
    "start": "2016",
    "end": "2021",
    "rate": 1.3093691794821598
    },
    "2021": {
    "start": "2021",
    "end": "2026",
    "rate": 1.2013578998379273
    },
    "2026": {
    "start": "2026",
    "end": "2031",
    "rate": 1.0875727542634235
    },
    "2031": {
    "start": "2031",
    "end": "2036",
    "rate": 0.9982007171722904
    }
    }
    },
    "summary_population_accounts": {
    "title": "NSW SUMMARY POPULATION ACCOUNTS",
    "data": [
    {
    "name": "Start-of-period population",
    "years_range": {
    "2011": {
    "start": "2011",
    "end": "2016",
    "total": 7218550.0
    },
    "2016": {
    "start": "2016",
    "end": "2021",
    "total": 7708850.0
    },
    "2021": {
    "start": "2021",
    "end": "2026",
    "total": 8230400.0
    },
    "2026": {
    "start": "2026",
    "end": "2031",
    "total": 8739950.0
    },
    "2031": {
    "start": "2031",
    "end": "2036",
    "total": 9228350.0
    }
    }
    },
    {
    "name": "Births",
    "years_range": {
    "2011": {
    "start": "2011",
    "end": "2016",
    "total": 507550.0
    },
    "2016": {
    "start": "2016",
    "end": "2021",
    "total": 538150.0
    },
    "2021": {
    "start": "2021",
    "end": "2026",
    "total": 558350.0
    },
    "2026": {
    "start": "2026",
    "end": "2031",
    "total": 568000.0
    },
    "2031": {
    "start": "2031",
    "end": "2036",
    "total": 580350.0
    }
    }
    },
    {
    "name": "Deaths",
    "years_range": {
    "2011": {
    "start": "2011",
    "end": "2016",
    "total": 250950.0
    },
    "2016": {
    "start": "2016",
    "end": "2021",
    "total": 267000.0
    },
    "2021": {
    "start": "2021",
    "end": "2026",
    "total": 284600.0
    },
    "2026": {
    "start": "2026",
    "end": "2031",
    "total": 306600.0
    },
    "2031": {
    "start": "2031",
    "end": "2036",
    "total": 335100.0
    }
    }
    },
    {
    "name": "Net interstate migration",
    "years_range": {
    "2011": {
    "start": "2011",
    "end": "2016",
    "total": -100000.0
    },
    "2016": {
    "start": "2016",
    "end": "2021",
    "total": -100000.0
    },
    "2021": {
    "start": "2021",
    "end": "2026",
    "total": -100000.0
    },
    "2026": {
    "start": "2026",
    "end": "2031",
    "total": -100000.0
    },
    "2031": {
    "start": "2031",
    "end": "2036",
    "total": -100000.0
    }
    }
    },
    {
    "name": "Net overseas migration",
    "years_range": {
    "2011": {
    "start": "2011",
    "end": "2016",
    "total": 333700.0
    },
    "2016": {
    "start": "2016",
    "end": "2021",
    "total": 350450.0
    },
    "2021": {
    "start": "2021",
    "end": "2026",
    "total": 335800.0
    },
    "2026": {
    "start": "2026",
    "end": "2031",
    "total": 327000.0
    },
    "2031": {
    "start": "2031",
    "end": "2036",
    "total": 327000.0
    }
    }
    }
    ]
    }
    }
    }

  • Manager   •   about 8 years ago

    daniel - thx for giving it a go. appreciate you looking at making the data more accessible

  •   •   about 8 years ago

    This could take manually a whole week not stop to make sense of it into proper keys and dimensions. Sorry I have to let it go right now but if it's needed it is possible. I only managed to get 4 worksheets done of one workbook. It can't be processed dynamically, and the listings because it will be static json output rather than searchable in a db, the keys need to be setup right. Excel like access is no good.

    You have to specify rows and columns manually , and then double check the data is right, the indexes always get messed up lol.

    I could keep going tomorrow and update here with static json files. I won't be able to make it all the way in to the cbd.

  • Manager   •   about 8 years ago

    Daniel the hack is in Liverpool and I'm sure we can access Better data sets https://www.govhack.org/2015-data/ And http://data.nsw.gov.au

  •   •   about 8 years ago

    They are just links back to pdf's, Excel files etc.

    They even have a data repository here, it has JSON links but that is JSON representation of links to pdf files.

    http://data.nsw.gov.au/data/dataset/818bf625-5e69-437d-9839-1913a8890190

    There is no central REST interface for applications to consume, with API keys etc. that is a real shame.

    That is why I was attempting to transform them into JSON format. You should even be able to run filters to return only some of the data in such rest interface.

  • Manager   •   about 8 years ago

    I'm sorry about that Daniel, they are the sets used for GovHack. We will have data mentors around over the weekend. And thanks so much for your feedback.

Comments are closed.