Problem executing DigitalConnector recipe #545

Open
thomiko opened this issue Mar 17, 2018 · 15 comments
Comments

@thomiko

thomiko commented Mar 17, 2018

{
"dataset": {
"subjects": [
{
// The output subjects are all LSOAs
"provider": "uk.gov.ons",
"subjectType": "lsoa",
"matchRule": {
"attribute": "name",
"pattern": "London%"
}
}
],
"datasources": [
{
"importerClass": "uk.org.tombolo.importer.dft.AccessibilityImporter",
"datasourceId": "acs0507"
},
{
// Importer for LSOA geographies
"importerClass": "uk.org.tombolo.importer.ons.OaImporter",
"datasourceId": "lsoa"
}
],
"fields": [
{
// Area of LSOA
"fieldClass": "uk.org.tombolo.field.value.LatestValueField",
"label": "component:Travel time",
"attribute": {
"provider": "uk.gov.dft",
"label": "SUPO008"
}
}
]
},
"exporter": "uk.org.tombolo.exporter.GeoJsonExporter"
}

@borkurdotnet
Contributor

The LSOA names in London start with the name of the borough, and the LSOA labels do not follow a pattern. However, all boroughs in London have a label starting with the string E090. Hence the way to get all LSOAs in London is:
{
"subjectType": "lsoa",
"provider": "uk.gov.ons",
"geoMatchRule": {
"geoRelation": "within",
"subjects": [
{
"subjectType": "localAuthority",
"provider": "uk.gov.ons",
"matchRule": {
"attribute": "label",
"pattern": "E090%"
}
}
]
}
}

@thomiko
Author

thomiko commented Mar 17, 2018

After having downloaded the external Excel file, the gradle export fails with a Java OutOfMemoryError:

Downloading external resource: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/357469/acs0507.xls
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.poi.hssf.usermodel.HSSFRow.createCellFromRecord(HSSFRow.java:223)

My laptop has 8GB of RAM, so physical memory is probably not the limiting factor. Is there a way to give the gradle runExport process more memory, for example through a command-line parameter or an ini file?

For example in an R script you can do:
options(java.parameters = "-Xmx4g" )

This often avoids Java outOfMemory problems because by default an R script only receives 1GB of RAM.

@borkurdotnet
Contributor

You could try changing the maximum heap size value for the runExport process in build.gradle.
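To check whether a larger heap setting actually reaches the JVM, a small stand-alone class can print the maximum heap the process is allowed to use. This is a minimal sketch and not part of the Digital Connector; the class name `HeapCheck` is made up for illustration, and how build.gradle passes the setting to runExport is not shown in this thread.

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once as `java HeapCheck` and once as `java -Xmx4g HeapCheck` should show the difference the standard `-Xmx` flag makes on your machine.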

@thomiko
Author

thomiko commented Mar 17, 2018

If I want to retrieve the green areas info for London similar to the 'greenspace-hertfordshire.json' example recipe, what do I have to put in here?

[Screenshot: green-areas-recipe]

If I do it this way, I get the following error message:

-----> TASK FAILED: http://download.geofabrik.de/europe/great-britain/england/london-latest.osm.pbf<-----

java.io.FileNotFoundException: http://download.geofabrik.de/europe/great-britain/england/london-latest.osm.pbf

@borkurdotnet
Contributor

The OSM file for London is called: europe/great-britain/england/greater-london

For a list of the areas you can use, see:

https://download.geofabrik.de/europe/great-britain/england.html
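Judging only from the URL in the error message above, the download address appears to be the scope path with `-latest.osm.pbf` appended. The sketch below rebuilds that pattern so a scope can be checked in a browser before running the export. It is an assumption inferred from the log, not the importer's actual code, and `GeofabrikUrl` is a hypothetical helper name.

```java
public class GeofabrikUrl {
    // Base address as it appears in the failing request above.
    static final String BASE = "http://download.geofabrik.de/";

    // Assumed pattern: <base><geographyScope>-latest.osm.pbf
    static String pbfUrl(String geographyScope) {
        return BASE + geographyScope + "-latest.osm.pbf";
    }

    public static void main(String[] args) {
        // The scope that failed, followed by the scope suggested above.
        System.out.println(pbfUrl("europe/great-britain/england/london"));
        System.out.println(pbfUrl("europe/great-britain/england/greater-london"));
    }
}
```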

@thomiko
Author

thomiko commented Mar 17, 2018

What does this targetCRSCode refer to? Is it a reference to the location being queried? It's from the green areas example recipe.

[Screenshot: green-areas-recipe2]

@thomiko
Author

thomiko commented Mar 17, 2018

Back to the Java OutOfMemory error: I set all the MaxHeapSize values to the maximum value (i.e. the RAM available)

[Screenshots: maxheapsize_2, maxheapsize_3]

With these settings the export process ran much longer (~ 40 mins) than before but eventually failed again with an OutOfMemoryError:

2018-03-17 18:10:10.160 [main] INFO u.org.tombolo.importer.DownloadUtils - Fetching local file: C:\tmp\TomboloData\uk.gov.dft\767f0676-d56b-3d2f-ab29-754898185b8e.xls
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.poi.hssf.usermodel.HSSFRow.createCellFromRecord(HSSFRow.java:223)

@borkurdotnet
Contributor

In this case the target code determines the unit used for the area. With code 4326 (WGS 84) the unit is degrees, which gives hard-to-interpret numbers for area. Code 27700 (British National Grid), however, uses metres, so the results are easier to interpret.
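To illustrate why the unit matters, here is a minimal sketch that applies the same planar (shoelace) area formula to coordinates given in degrees and to coordinates given in metres. It does no reprojection and is not the AreaField implementation; the two squares are made-up coordinates near London, not conversions of each other, and `AreaUnitsDemo` is a hypothetical name.

```java
public class AreaUnitsDemo {
    // Hypothetical square of 0.01 x 0.01 degrees (longitude, latitude).
    static final double[][] SQUARE_DEGREES = {
        {-0.10, 51.50}, {-0.09, 51.50}, {-0.09, 51.51}, {-0.10, 51.51}
    };

    // Hypothetical square of 1000 x 1000 metres (easting, northing).
    static final double[][] SQUARE_METRES = {
        {530000, 180000}, {531000, 180000}, {531000, 181000}, {530000, 181000}
    };

    // Shoelace formula: planar area of a ring given as {x, y} pairs.
    // The result is in squared units of whatever the coordinates use.
    static double planarArea(double[][] ring) {
        double twiceArea = 0.0;
        for (int i = 0; i < ring.length; i++) {
            double[] a = ring[i];
            double[] b = ring[(i + 1) % ring.length];
            twiceArea += a[0] * b[1] - b[0] * a[1];
        }
        return Math.abs(twiceArea) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println("Square degrees: " + planarArea(SQUARE_DEGREES));
        System.out.println("Square metres:  " + planarArea(SQUARE_METRES));
    }
}
```

The first number is a tiny fraction of a "square degree" with no direct physical meaning, while the second is one square kilometre expressed in square metres.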

@thomiko
Author

thomiko commented Mar 17, 2018

So for London I should use the 27700 code as well?

@borkurdotnet
Contributor

Yes

@borkurdotnet
Contributor

Regarding the out of memory ... did you change the value for runExport as well?

But more generally, I think that we can conclude that we need to look at this importer after the weekend and find a more scalable solution.

@thomiko
Author

thomiko commented Mar 17, 2018

When I 'runExport' the following recipe to get the green areas for London, the build succeeds in 10 seconds but the output file is almost empty:

{
"dataset": {
"subjects": [
{
// The output subjects are all LSOAs
"provider": "uk.gov.ons",
"subjectType": "lsoa",
"matchRule": {
"attribute": "name",
"pattern": "E090%"
}
}
],
"datasources": [
{
// Importer for LSOA geographies
"importerClass": "uk.org.tombolo.importer.ons.OaImporter",
"datasourceId": "lsoa"
},
{
// Green space data for the entire UK
"importerClass": "uk.org.tombolo.importer.osm.OSMImporter",
"datasourceId": "OSMGreenspace",
"geographyScope": ["europe/great-britain/england/greater-london"]
}
],
"fields": [
{
//Proportion of green space
"fieldClass": "uk.org.tombolo.field.transformation.ArithmeticField",
"label": "index:GreenspaceFraction",
"operation": "div",
"field1": {
// Sum of green space areas
"fieldClass": "uk.org.tombolo.field.aggregation.GeographicAggregationField",
"label": "GreenspaceSum",
"subject": {
"provider": "org.openstreetmap",
"subjectType": "OSMEntity"
},
"function": "sum",
"field": {
"fieldClass": "uk.org.tombolo.field.assertion.OSMBuiltInAttributeMatcherField",
"label": "AreaGreenspace",
"attributes": [
{
"provider": "org.openstreetmap",
"label": "built-in-greenspace"
}
],
"field": {
// Area of LSOA
"fieldClass": "uk.org.tombolo.field.transformation.AreaField",
"label": "AreaLSOA",
"targetCRSCode": 27700
}
}
},
"field2": {
// Area of LSOA
"fieldClass": "uk.org.tombolo.field.transformation.AreaField",
"label": "AreaLSOA",
"targetCRSCode": 27700
}
},
{
// Sum of green space areas
"fieldClass": "uk.org.tombolo.field.aggregation.GeographicAggregationField",
"label": "component:GreenspaceSum",
"subject": {
"provider": "org.openstreetmap",
"subjectType": "OSMEntity"
},
"function": "sum",
"field": {
"fieldClass": "uk.org.tombolo.field.assertion.OSMBuiltInAttributeMatcherField",
"label": "AreaGreenspace",
"attributes": [
{
"provider": "org.openstreetmap",
"label": "built-in-greenspace"
}
],
"field": {
// Area of LSOA
"fieldClass": "uk.org.tombolo.field.transformation.AreaField",
"label": "AreaLSOA",
"targetCRSCode": 27700
}
}
},
{
// Area of LSOA
"fieldClass": "uk.org.tombolo.field.transformation.AreaField",
"label": "component:AreaLSOA",
"targetCRSCode": 27700
}
]
},
"exporter": "uk.org.tombolo.exporter.GeoJsonExporter"
}

What's wrong with it?

Output:
{"type":"FeatureCollection","features":[]}

@borkurdotnet
Contributor

The subject specification for LSOAs in London is:

{
"subjectType": "lsoa",
"provider": "uk.gov.ons",
"geoMatchRule": {
"geoRelation": "within",
"subjects": [
{
"subjectType": "localAuthority",
"provider": "uk.gov.ons",
"matchRule": {
"attribute": "label",
"pattern": "E090%"
}
}
]
}
}

instead of

{
// The output subjects are all LSOAs
"provider": "uk.gov.ons",
"subjectType": "lsoa",
"matchRule": {
"attribute": "name",
"pattern": "E090%"
}
}

@thomiko
Author

thomiko commented Mar 17, 2018

Unfortunately still no full success:

-----> TASK FAILED: Could not compute Field component:AreaLSOA for Subject E01000001(2480), reason: For input string: "590983,03"<-----
Caused by null

java.lang.IllegalArgumentException: Could not compute Field component:AreaLSOA for Subject E01000001(2480), reason: For input string: "590983,03"
at uk.org.tombolo.exporter.GeoJsonExporter.lambda$getPropertiesForSubject$0(GeoJsonExporter.java:71)

@borkurdotnet
Contributor

Interesting ... It could be that the Digital Connector is not German-proof :/ (more debugging is needed to be sure)

I.e. it could be that one part of the system writes numbers using your localised environment (commas for decimals) while another part reads them without the localised settings (expecting dots for decimals).

Thanks for hanging in there and trying ... sorry for things not working well.
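The suspected locale mismatch can be reproduced in plain Java. This is a minimal sketch of the general mechanism, not a diagnosis of where it happens in the Digital Connector; the class `LocaleDecimalDemo` and its helper methods are made up for illustration.

```java
import java.util.Locale;

public class LocaleDecimalDemo {
    // Format a number with two decimals using the given locale's decimal separator.
    static String formatLocalised(double value, Locale locale) {
        return String.format(locale, "%.2f", value);
    }

    // Double.parseDouble is locale-independent and only accepts a dot as separator.
    static boolean parses(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String german = formatLocalised(590983.03, Locale.GERMANY);
        String neutral = formatLocalised(590983.03, Locale.ROOT);
        System.out.println(german + " parses: " + parses(german));
        System.out.println(neutral + " parses: " + parses(neutral));
    }
}
```

If this is indeed the cause, formatting with a fixed locale such as `Locale.ROOT` wherever numbers are written out as strings, or parsing with a locale-aware `NumberFormat`, would be the usual ways to keep both sides consistent.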
