It seems that empty URLs are in some cases resolved to the path of the enclosing RDF file, which leads to incorrect and (at least for Münster) malformed entries in the Turtle files.
Example from Münster_(Westfalen).rdf:
<https://opendata.stadt-muenster.de/dataset/sporthallen-und-sportst%C3%A4tten-standorte/resource/96e271af-7e05-4c2e-9406-17a3535e88a2>
a cat:Distribution ;
dcterms:description "" ;
dcterms:format "wms" ;
dcterms:issued "2019-07-01T17:33:24+02:00"^^xsd:date ;
dcterms:modified "2019-07-18T11:21:22+02:00"^^xsd:date ;
dcterms:title "Sporthallen und Sportstätten - Standorte - WMS-GetMap" ;
cat:accessURL <https://opendata.stadt-muenster.de/dataset/sporthallen-und-sportst%C3%A4tten-standorte/resource/96e271af-7e05-4c2e-9406-17a3535e88a2> ;
cat:byteSize "" ;
cat:downloadURL <file:///home/lisa/repos/crawling/target/Münster_(Westfalen).rdf> ;
cat:mediaType "" ;
foaf:page "https://opendata.stadt-muenster.de/dataset/sporthallen-und-sportst%C3%A4tten-standorte/resource/96e271af-7e05-4c2e-9406-17a3535e88a2" .
The value cat:downloadURL <file:///home/lisa/repos/crawling/target/Münster_(Westfalen).rdf> is incorrect and malformed (note the 'ü').
According to the catalog entry on the website, it should be empty: <dcat:downloadURL rdf:resource=""/>
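A possible explanation, sketched under assumptions: under RFC 3986 reference resolution, an empty relative reference resolves to the base URI, so a parser that uses the local file path as base would turn rdf:resource="" into that path. The snippet below reproduces the effect with Python's standard library only. The base path and the helper names resolve and resolve_or_none are hypothetical and not taken from the crawler's code.

```python
from urllib.parse import urljoin

# Hypothetical base URI: the local path a crawler might have saved the catalog to.
BASE = "file:///home/lisa/repos/crawling/target/catalog.rdf"


def resolve(reference: str, base_uri: str = BASE) -> str:
    """Resolve a (possibly relative) reference against the base URI."""
    return urljoin(base_uri, reference)


def resolve_or_none(reference: str, base_uri: str = BASE):
    """Same as resolve, but treat an empty reference as 'no value'."""
    if not reference.strip():
        return None
    return urljoin(base_uri, reference)


# An empty reference resolves to the base itself, i.e. the local file path.
print(resolve(""))
# Skipping empty references avoids emitting the local path at all.
print(resolve_or_none(""))
```

If this is indeed the cause, filtering empty attribute values before resolution (as in resolve_or_none) or setting an explicit base URI when parsing would be two ways to avoid the file:// entries.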
Grepping for 'home/lisa' in catalog/toLoad yields 2948 results across various fields (at least dcat:accessURL, dcat:downloadURL, and vcard:hasURL). I did not check whether the cause is always an empty URL in the original data.
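To break the grep results down by field, a small script along these lines might help. It is only a sketch: the regular expression is a rough heuristic for prefixed predicates followed by a file:// IRI, not a Turtle parser, and the function name count_file_iris is made up for illustration.

```python
import re
from collections import Counter

# Heuristic: a prefixed name (e.g. cat:downloadURL) followed by a <file://...> IRI.
PATTERN = re.compile(r"([A-Za-z][\w-]*:[\w-]+)\s+<(file://[^>]*)>")


def count_file_iris(turtle_text: str) -> Counter:
    """Count file:// IRIs per predicate in a chunk of Turtle text."""
    return Counter(pred for pred, _iri in PATTERN.findall(turtle_text))


sample = (
    "cat:accessURL <https://example.org/a> ;\n"
    "cat:downloadURL <file:///home/lisa/repos/crawling/target/x.rdf> ;\n"
    "vcard:hasURL <file:///home/lisa/repos/crawling/target/y.rdf> .\n"
)
for predicate, n in sorted(count_file_iris(sample).items()):
    print(predicate, n)
```

Running it over each file in catalog/toLoad and summing the counters would show which predicates are affected and how often.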