Topology Preserving Simplification

04 Mar 2014 Comments

This is a reprint that was originally published as a Boundless blog post.

OpenLayers 3 aims to push the boundaries when it comes to rendering vector data in a browser. Whether you’re editing a stream network, exploring congressional district boundaries, or just viewing point markers on a world map, we want you to be able to see and interact with beautifully rendered vector data even when dealing with large, complex datasets. To achieve this, the library needs to be very efficient at simplifying geometries without sacrificing the visual integrity of the data.

When a geometry has many vertices per device pixel, it is best not to waste a user’s time by trying to draw every one of them. Developers typically try to solve this problem by simplifying polygons with an algorithm based on Douglas–Peucker (although Visvalingam has not completely been overlooked). However, these line simplification algorithms are not suitable for polygon simplification when topology needs to be preserved — meaning when geometry rules like “must not overlap” and “must not have gaps” need to be enforced.

Considering Douglas-Peucker

If, for example, you are rendering polygons that represent administrative boundaries, and your audience is going to be surprised to see gaps between adjacent boundaries, you’ll need to consider a simplification algorithm that preserves topology. The graphic below shows national borders simplified with a Douglas-Peucker–based algorithm at a few different tolerance levels.

Douglas-Peucker

I’ll use units of device pixels when talking about simplification tolerance. While these are translated to map units before simplifying, it is easier to reason about pixels when discussing rendering. As shown above, simplification using Douglas-Peucker results in a visual loss of topology at the 2px tolerance level. At a 4px tolerance, the method is clearly unsuitable for simplification before rendering. Faced with these results, you might conclude that the method should just use a 1px tolerance or below and then move on.

An aside on frame budget

When animating things like map panning or zooming, having a constant frame rate prevents the viewer from experiencing stuttering map movements. On capable devices, sixty frames per second is a good goal, but it leaves a frame budget of only 16.7 milliseconds. Within that budget, you will need to access all features for rendering, calculate all symbolizers to be used, do any geometry simplification, and then do the actual drawing. The Douglas-Peucker simplification above takes about 35ms to run on a moderately “low resolution” set of features with a capable browser. Since this completely blows the target frame budget, simplification during animation is out of the question.

The answer, of course, is to cache the simplified geometries (or the draw instructions themselves) and use the cached results during animation. However, this presents a new problem. During animated zooming, simplified geometries that looked okay at the original resolution may show gaps and overlaps at a 2x or 4x higher resolution — this is sort of like what you see going from the 1px simplification to 4px in the graphic above.

Quantization-based simplification

So, to avoid extra work rendering visually insignificant vertices without artifacts produced by Douglas-Peucker, you’ll need to consider an algorithm that preserves topology.

Quantization is basically the process of taking a continuous (or large) set of values and mapping them to a discrete (or smaller) set of values. Rounding to the nearest integer is an example of quantization. The reason to consider quantization as the basis for geometry simplification is that this repeatable mapping of continuous to discrete values preserves topology - two geometries that share coordinate values before simplification will still share values after. As mentioned earlier, I’m concerned primarily with “must not overlap” and “must not have gaps” type topology rules. Quantization will change the topology in other ways (rings may no longer have area), but it will respect topology in this aspect: if adjacent geometries don’t have gaps before, they won’t after. Same goes for overlaps.

Simple Quantized

While quantization at the 4px tolerance level above generates undesirable results, the 1px and 2px levels look good. Quantization itself is a form of simplification. But we can take it a bit farther without doing much work. The simplification steps are basically these:

quantize coordinate values given some tolerance
drop duplicate and collinear points (when more than 2)

This quantization based simplification can be performed in a single pass over the original coordinate values. For similar data sets and tolerance, the method is considerably faster than Douglas-Peucker ^[1]. It is also clearly less aggressive at simplifying ^[2]. However, it accomplishes our goals. Quantization based simplification avoids rendering vertices that don’t have a significant visual impact while preserving the kind of topology we care about.

Quantization based simplification is itself rather simplistic. While it is possible to maintain topology with methods like Visvalingam’s line simplification, this requires that you have a known or fixed set of vertices before simplifying. This doesn’t work if you are progressively loading and rendering data or otherwise need the simplification process to be repeatable for a single geometry regardless of which other geometries it is rendered with.

See for yourself!

Take a look at the OpenLayers 3 vector rendering examples to see this topology preserving simplification in action — and let us know if you find any gaps or overlaps in adjacent polygons!

[1] Quantization-based simplification performed 2–7x faster than Douglas-Peucker on the same data sets with the same tolerance in unscientific benchmarks performed while writing this post.

[2] With a tolerance of 80,000m, the Douglas-Peucker method reduced the original number of vertices in the sample dataset by 70% while the quantization based method reduced the number by 27%. The 80,000m tolerance corresponds to about 2 pixels at zoom level 2 in the common XYZ tiling scheme.

Mocking the file system

17 Feb 2014 Comments

When writing tests, it’s common to run up against code that relies on file system access. People often end up including a bunch of test fixtures along with their code. Tests might create some temporary space, copy fixtures in place, run code, make assertions about what was read or written, and then clean up. The result can be fragile when working across multiple operating systems and leads to long test run times. In cases like this, a mock file system provides an alternative.

The `mock-fs` module

In Node, the fs module provides file system access. The mock-fs module allows the built-in fs module to be temporarily backed by an in-memory file system during testing. The code below shows how a mock file system can be set up with a few directories and files:

var mock = require('mock-fs');

mock({
  'path/to/dir': {
    'file1': 'file contents here',
    'file2': new Buffer([8, 6, 7, 5, 3, 0, 9])
  }
});

With the mock file system in place, any code that uses the fs module will have access your fake files.

var assert = require('assert');
var fs = require('fs');

fs.readFile('path/to/dir/file1', function(err, data) {
  assert.equal(String(data), 'file contents here');
});

And when you’re done working with mocks, you can restore access to the real file system.

mock.restore();

The mock() function allows you to create directories, regular files, or symlinks with additional configuration options. For example, you can set the file modification time, user or group id, permissions, and more. On POSIX-compliant systems (where process.getuid() and process.getgid() are available), file access is restricted based on the mode setting. See the readme for more detail on usage.

How it works

The methods exported by Node’s fs module delegate to methods provided by the native bindings. For example, fs.open() and fs.openSync() call binding.open(). The mock-fs package implements the binding methods with an in-memory file system.

When you require('mock-fs'), an internal copy of the fs module with a settable binding is loaded. Methods from this modified module are copied to the fs module’s exports, making it so that any other modules that require('fs') get methods that are backed by a configurable binding. From then on, calls to mock() set the binding to the in-memory backed one, and calls to mock.restore() set the binding back to the original, native implementation.

AngularJS Whitespace Guide

04 Dec 2013 Comments

By choice or otherwise, you may someday find yourself a slave to a strict linter. This can lead to headaches with AngularJS applications in particular because they tend to have many chained calls and functions with many arguments. While there are other guides on structuring AngularJS applications that address things like directory structure, I thought the web was lacking a guide on satisfying strict whitespace linters in AngularJS apps.

So here’s my take on formatting your AngularJS modules in a way that satisfies both JSHint and gjslint.py.

Why Pull Requests?

11 Sep 2013 Comments

I work on a number of GitHub hosted projects where all changes to the default branch (typically master) are accompanied by a pull request (e.g. ol3). Typically, the pull request serves as a place where changes get reviewed and eventually approved by other contributors. But we also use pull requests for “trivial” changes that don’t typically get reviewed. A conversation came up at work about this topic, and I thought I’d share my thoughts. This is specific to GitHub’s handling of pull requests, but may be relevant to other hosted services as well.

GitHub provides a way for people to get notification about pull requests. The same is not true for merges or other commits. So opening pull requests, even if it means merging without review, allows people to keep up without reading all the commit logs.

Similarly, if you never merge into the default branch without a pull request, the GitHub API provides a fairly easy way to generate a simple change log. I say fairly easy because while the API doesn’t let you filter merged pull requests by date, you can filter closed issues by date. The full git log often provides more detail than you want, but the list of closed issues can be an effective summary of activity.

On top of that, a pull request provides a single place where all comments associated with any commits in a merge are collected. You don’t have to comment directly on the pull request, just on individual commits. If a set of changes ever needs to be revisited, the pull request is a good place to return to for a transcript of the discussion.

Older Posts

Sublime Text 2 and OpenLayers 3	15 Jan 2013	Comments
CommonJS Modules with Rhino	17 May 2011	Comments
Asynconsistency	08 Apr 2011	Comments
Canvas Hit Detection	31 Mar 2011	Comments
Light Cycle (Redux)	29 Dec 2010	Comments
Cr-48 Chrome Notebook (Charging)	14 Dec 2010	Comments
Is This Thing On?	08 Apr 2010	Comments

tschaub@work

Topics

Recent Posts

Topology Preserving Simplification

Considering Douglas-Peucker

An aside on frame budget

Quantization-based simplification

See for yourself!

Mocking the file system

The `mock-fs` module

How it works

AngularJS Whitespace Guide

Why Pull Requests?

Older Posts

Topics

Recent Posts

Considering Douglas-Peucker

An aside on frame budget

Quantization-based simplification

See for yourself!

The mock-fs module

How it works

Older Posts

The `mock-fs` module