entur/geocoder

Geocoder

Geocoding service consisting of a Photon backend search engine and a Proxy frontend.

Deployment

All deployment runs from the main branch.

The daily import uses the prod-approved tag. Move the tag whenever a new commit has been approved for production:

git tag -f prod-approved [sha]
git push origin prod-approved --force

Proxy

Automatic

  • Push to main → builds and deploys to dev. Further deploys to tst and prd need approval. All builds run acceptance tests after deployment.

Manual

  • proxy.yml also supports manual dispatch (target: dev only | dev → tst → prd | tst → prd).
  • proxy-deploy.yml — Deploy an existing image tag

Photon

Scheduled

  • Daily at 06:27 UTC: full data import + build + deploy to tst → prd (no approval gates). Checks out the prod-approved tag to avoid using untested commits, and updates the latest-prod.txt pointer.

Manual

  • photon.yml — Import data, build Photon image, deploy (target: dev only | dev → tst → prd | tst → prd; optional config, default converter-prod.json)
  • photon-deploy.yml — Deploy an existing Photon image tag

Sweden (dev only)

Scheduled & monitoring

  • cache-data-sources.yml — Daily at 03:00 UTC: downloads the third-party source files (matrikkel, stedsnavn, custom POIs from poiman) plus PostHog popular-stops, verifies size, and uploads them to gs://ent-geocoder-prd/data-sources/. The nightly Photon import reads from this cache rather than hitting upstream directly.
  • monitor-photon-data.yml — Daily at 08:22 UTC: checks photonImportDate from the prod /v2/info endpoint and alerts Slack if the data is older than 50h.
  • api-docs.yml — Lints the OpenAPI specs (v2 openapi.yml + v3 openapi3.yml) on every push/PR touching proxy/docs/** or the specs; on push to main publishes both API specs and the docs to the developer portal.
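
The 50h staleness rule used by monitor-photon-data.yml can be sketched in shell as below. This is only an illustration, not the workflow's actual code: it assumes GNU date and that photonImportDate is an ISO 8601 timestamp, and the function name is_stale is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a staleness check (hypothetical helper, not taken from the workflow).
# Assumes GNU date and an ISO 8601 photonImportDate value.

MAX_AGE_HOURS=50

# is_stale <import-date> [now-epoch]
# Returns 0 when the import date is more than MAX_AGE_HOURS old,
# 1 when it is fresh enough, and 2 when the date cannot be parsed.
is_stale() {
  local import_date="$1"
  local now="${2:-$(date -u +%s)}"
  local imported
  imported=$(date -u -d "$import_date" +%s) || return 2
  [ $(( now - imported )) -gt $(( MAX_AGE_HOURS * 3600 )) ]
}
```

In a monitoring job, the first argument would come from the prod /v2/info response (for example via jq, assuming photonImportDate is a top-level field), and a zero exit status would trigger the Slack alert.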

Most workflows post a Slack notification on failure. The reusable _generate-tag.yml and _deploy-and-test.yml workflows back the build/deploy jobs; shared build steps live as composite actions under .github/actions/.

Photon data artifacts (GCS)

Built artifacts live in the public bucket gs://ent-geocoder-prd/:

Prefix              Contents
nominatim-data/     nominatim.ndjson.gz per build (+ .sha256)
nominatim-data-se/  Sweden variant
photon-data/        photon_data.tar.gz per build (+ .sha256)
photon-data-se/     Sweden variant
data-sources/       Daily-refreshed source files (written by cache-data-sources.yml)

Each build writes to <prefix>/<tag>/<filename>. The <tag> is generated once and shared between the docker image and the GCS upload, so geocoder-photon:<tag> always pairs with gs://.../photon-data/<tag>/photon_data.tar.gz. Pointer files at the prefix root track recent builds:

  • latest.txt — most recent build from any branch
  • latest-prod.txt — most recent build deployed to prod (written by photon-scheduled.yml)
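
Because the tag is shared, the data URL for any image tag follows from the <prefix>/<tag>/<filename> layout. A minimal sketch, assuming the bucket's public HTTPS endpoint used in the rollback example; the helper name photon_data_url is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: derive the photon_data.tar.gz URL that pairs with an image tag.
# photon_data_url is an illustrative helper, not part of the repository.

BUCKET_URL="https://storage.googleapis.com/ent-geocoder-prd"

# photon_data_url <tag> [prefix]
# prefix defaults to photon-data; pass photon-data-se for the Sweden variant.
photon_data_url() {
  local tag="$1"
  local prefix="${2:-photon-data}"
  printf '%s/%s/%s/photon_data.tar.gz\n' "$BUCKET_URL" "$prefix" "$tag"
}
```

If the pointer files contain a bare tag (an assumption, their format is not described here), the output of `curl -s "$BUCKET_URL/photon-data/latest-prod.txt"` could be passed straight to photon_data_url.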

The photon container fetches photon_data.tar.gz from $PHOTON_DATA_URL on startup, verifies its .sha256 sidecar, and writes a photon_data/.ready sentinel after extraction so in-place container restarts skip the download. CI derives the URL from the image tag in _deploy-and-test.yml and injects it into helm values; templates/photon-data-validation.yaml fails the helm render if it's missing.
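
The startup sequence described above can be sketched roughly as follows. This is not the container's actual entrypoint: it assumes the sidecar lives at "$PHOTON_DATA_URL.sha256", that its first whitespace-separated field is the hex digest, and that the archive unpacks into photon_data/. The function name fetch_photon_data is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the fetch / verify / sentinel flow (illustrative only).
# Assumptions: sidecar at <url>.sha256 with the digest as first field,
# and an archive that unpacks into photon_data/.

# fetch_photon_data <url> [workdir]
fetch_photon_data() {
  local url="$1" workdir="${2:-.}"
  local sentinel="$workdir/photon_data/.ready"

  # An in-place restart finds the sentinel and skips the download.
  if [ -f "$sentinel" ]; then
    echo "photon_data already present, skipping download"
    return 0
  fi

  local archive="$workdir/photon_data.tar.gz"
  curl -fsSL -o "$archive" "$url" || return 1

  # Compare the downloaded archive against the published digest.
  local expected actual
  expected=$(curl -fsSL "$url.sha256" | awk '{print $1}')
  actual=$(sha256sum "$archive" | awk '{print $1}')
  if [ -z "$expected" ] || [ "$expected" != "$actual" ]; then
    echo "checksum mismatch for $archive" >&2
    rm -f "$archive"
    return 1
  fi

  # Extract, then mark the data as ready only after extraction succeeded.
  rm -rf "$workdir/photon_data"
  tar -xzf "$archive" -C "$workdir" || return 1
  rm -f "$archive"
  mkdir -p "$workdir/photon_data"
  touch "$sentinel"
}
```

Writing the sentinel last means a container killed mid-download or mid-extraction starts over on the next boot instead of serving a partial index.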

Rolling back to a previous build:

# Read the latest-prod.txt pointer (most recent build deployed to prod)
curl -s https://storage.googleapis.com/ent-geocoder-prd/photon-data/latest-prod.txt

# Re-deploy a known-good image (the data is paired automatically)
gh workflow run photon-deploy.yml -f target='tst → prd' -f image_tag=<previous-tag>

90-day lifecycle rule (apply once per bucket):

{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {
          "age": 90,
          "matchesPrefix": ["nominatim-data", "photon-data"],
          "matchesSuffix": [".gz", ".sha256"]
        }
      }
    ]
  }
}

The matchesSuffix filter spares the latest*.txt pointer files.
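
To see which objects the rule touches, the prefix and suffix conditions can be mimicked locally. The sketch below is a hypothetical helper, not part of the repository; it treats matchesPrefix as a plain string prefix on the object name, which means "photon-data" and "nominatim-data" also cover the -se variants.

```shell
#!/usr/bin/env bash
# Sketch: would the 90-day lifecycle rule apply to this object name?
# lifecycle_matches is an illustrative helper; age is not considered.

# lifecycle_matches <object-name>
# Returns 0 when both the prefix and the suffix condition match.
lifecycle_matches() {
  local name="$1"
  # matchesPrefix: nominatim-data, photon-data (plain string prefixes)
  case "$name" in
    nominatim-data*|photon-data*) ;;
    *) return 1 ;;
  esac
  # matchesSuffix: .gz, .sha256
  case "$name" in
    *.gz|*.sha256) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, photon-data/<tag>/photon_data.tar.gz matches, while photon-data/latest-prod.txt and everything under data-sources/ do not.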

Usage

Running locally

# Build geocoder
./gradlew build

# Download a photon jar
cd photon
./import/download-photon-jar.sh

# EITHER download source data, convert to nominatim.ndjson (downloads nominatim-converter binary automatically)
./import/create-nominatim-data.sh import/config/converter-prod.json -z

# OR download the latest nominatim.ndjson built by GitHub Actions
./download-latest-nominatim-data.sh

# Create the photon index
./import/create-photon-data.sh nominatim.ndjson.gz

# OR just download the latest Photon search index built by GitHub Actions
rm -rf photon_data
./download-latest-photon-data.sh

# Run Photon
./photon-start.sh

# Switch to a different terminal and start the proxy (or just run `no.entur.geocoder.proxy.AppKt` from your IDE)
cd ../proxy
java -jar build/libs/proxy-all.jar

Now try some example requests:

curl -s 'http://localhost:8080/v2/autocomplete?text=sk%C3%B8yen%20stasjon&size=20'
curl -s 'http://localhost:8080/v2/reverse?point.lat=59.92&point.lon=10.67&boundary.circle.radius=1&size=10&layers=address%2Clocality'

Adding &debug=true will also reveal native Photon results with importance (input weight) and score (calculated weight).

You can also access Photon directly:

curl -s 'http://localhost:2322/api?q=Berglyveien&include=layer.stopplace'

Or use the OpenSearch endpoint:

curl -s 'http://localhost:9201/photon/_mapping' | jq .       # Available fields
curl -s 'http://localhost:9201/photon/_doc/719158973' | jq . # Get document by ID

Debugging data in k8s / GKE

Accessing OpenSearch in k8s:

$ kubectl --context dev port-forward geocoder-photon-85994c94dd-6lqhv -n geocoder 9201
$ curl -s 'https://geocoder-photon.dev.entur.io/api?q=ullerud' | jq '.features[].properties.osm_id' | head -1
200127208213
$ curl -s 'http://localhost:9201/photon/_doc/200127208213' | jq -c "[._source.importance, ._source.name.default]"
[0.23010299956639815,"Ullerud terrasse"]

Verifying score and importance

We set the importance field in the Nominatim data, while score is calculated by Photon.

$ curl -s 'http://localhost:8080/v2/autocomplete?text=Oslo&debug=true&size=1' \
  | jq -c '.geocoding.debug.raw_data[] | [.localeTags.name.default, .infos.importance, .score]'
["Oslo",1.0,51.235104]
["Oslo lufthavn",0.347712,26.492702]
["Oslo S",0.330103,25.840235]
["Oslo bussterminal",0.330103,24.307642]

(The debug output contains three more results than requested; see PhotonAutocompleteRequest.RESULT_PRUNING_HEADROOM.)

Using a patched Photon version

Build and release patched Photon

  • Fetch Photon from source (https://github.com/komoot/photon) and make your changes
  • Build with ./gradlew build
  • Create a tag and push it (git push --tags entur) to Entur's fork (https://github.com/entur/photon)
  • Draft a new release at https://github.com/entur/photon/releases/new
  • Click "Select tag" and choose the tag name
  • Fill in release title and description
  • Add photon-<tag>.jar from Photon's target/ folder as a binary asset
  • Check "Set as a pre-release"
  • Publish the release
  • On the release page, right-click the photon-<tag>.jar asset and copy the link address

Update geocoder to use the patched Photon

Links

Grafana dashboards

Internal references

External references
