I like to keep my GPS pre-loaded with as many geocaches as possible so if I’m out and about with a few minutes to kill, I have info on what’s nearby. The phone application seems like it would be ideal for this, since it can query on-the-fly, but the implementation has a lot of problems, the subject of another blog entry. I’ve run into some technical issues keeping my caches up to date. Put simply, I want to avoid looking for a geocache that’s known to not be there. I also want to keep updated on new caches that pop up.
Queries radiate from a specified point — home, another geocache, or coordinates. They can be filtered by difficulty, terrain, state/country, cache type, container type, how old it is and whether I’ve already done them. If the cache owner has set attributes of their cache (and often they do not), there is a rich set of attributes to filter on. Filtering is the data geek’s playpen.
Wait – there’s a cache that requires maracas?
One of the first things I tried was creating a bunch of radius queries based in different areas I expected to visit:
[Map images: North Seattle and South Seattle]
Because it gets dense in some places, I had to fiddle around to cover the region where I do most (95%) of my obsessive Tupperware hunting. It takes 8-10 PQs to cover this area:
A Ninja would do it in two points before silently killing you.
Once I got this much down, three more problems remained:
GC.com lacks a convenient way to maintain alternate coordinates for a cache. My most pressing need for this is maintaining solved puzzle coordinates. (It should be no surprise to anyone who knows me that I am even more obsessive about solving puzzles. Puzzle (?) caches currently make up an abnormally high percentage of my finds, and I have another few hundred solved-but-unfound.) The only practical alternative is to maintain a separate, offline database, and use bookmarks (or the “notes” field) as a backup copy of updated coordinates in case the database has issues. The database itself isn’t a problem, since it’s a lot faster to work on locally. Ideally, though, I would just make the entry in one place.
While I can directly query for caches that are inactive, I cannot search for caches that have been archived and permanently removed from play. (Caches are archived when maintenance issues aren’t being addressed (the owner’s inactive, over-committed, etc.), when the owner wants to free up a spot, when there are placement issues (with new caches, most commonly placement on private property without permission), or when the owner commits geocide.) The only mechanical solution is induction: because archived caches are excluded from pocket queries, a cache that hasn’t shown up in an update for a week is likely down for some reason, and can simply be deleted from the offline database. An alternative approach is to set up email alerts, but these only work per cache type and for a specific radius.
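The induction step can be sketched in a few lines. This is a hypothetical schema, not how my actual database is laid out: assume the offline database tracks, per GC code, the date the cache last appeared in a pocket query result, and anything older than a week gets flagged for deletion.

```python
from datetime import date, timedelta

def prune_stale(last_seen, today, max_age_days=7):
    """Return GC codes that haven't appeared in a pocket query for
    more than max_age_days. By induction, these have likely been
    archived and can be dropped from the offline database.

    last_seen: dict mapping GC code -> date the cache last showed
    up in query results (hypothetical schema for illustration).
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(gc for gc, seen in last_seen.items() if seen < cutoff)

# GC1AAAA was last seen 10 days ago, so it gets flagged;
# GC2BBBB was seen 2 days ago, so it survives.
db = {
    "GC1AAAA": date(2011, 4, 1),
    "GC2BBBB": date(2011, 4, 9),
}
print(prune_stale(db, today=date(2011, 4, 11)))  # ['GC1AAAA']
```

The one-week threshold is a judgment call: it has to be longer than the full refresh cycle of the query set, or active caches would be pruned between updates.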
Caching outside the polygon. Longer trips with friends, like the one to Bellingham two weeks ago, get planned in advance. More and more often, though, the collective wants to hit, say, a cluster of caches south of Auburn, just beyond my polygon. To address this, I’ve been looking at ways of extending the polygon and eliminating the overlap among cache circles, without adding more queries or reducing how often the entire database gets refreshed.
An idea I got from another cacher is to do range-based queries, filtered by placement date. This is a great idea because it completely eliminates overlap: a cache has one and only one placement date. Furthermore, I don’t have to guess a good centerpoint; I just use my house. I only need to tune the date ranges to keep each query’s results below 1000 caches. The older queries’ counts will only decrease over time, since placement dates never change and caches only leave play, so they rarely need re-tuning.
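The range-splitting itself is mechanical. A minimal sketch, assuming you can get a sorted list of placement dates for all caches in the radius (in practice I arrived at the ranges below by trial and error against the PQ preview counts):

```python
def date_buckets(placed_dates, max_per_query=999):
    """Split a sorted sequence of cache placement dates into
    consecutive (first, last) date ranges, each covering at most
    max_per_query caches. Since every cache has exactly one
    placement date, the resulting queries never overlap.

    Sketch only: a real implementation would nudge a boundary
    forward when several caches share the boundary date, so a
    date never straddles two queries.
    """
    buckets = []
    for i in range(0, len(placed_dates), max_per_query):
        chunk = placed_dates[i:i + max_per_query]
        buckets.append((chunk[0], chunk[-1]))
    return buckets

# Stand-in ordinal "dates": 25 caches, at most 10 per query.
print(date_buckets(list(range(1, 26)), max_per_query=10))
# [(1, 10), (11, 20), (21, 25)]
```

Note that the last bucket is always the smallest, which is convenient: it is the one worth re-running most often to pick up newly placed caches.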
I experimented with three different radii, shifting dates as necessary to minimize the number of queries while keeping each one below 1000 caches (so I could be certain I had everything).
Using a radius of 25 miles from Issaquah (the blue circle, aka “baby bear”) I can get everything (~3500 caches) with four queries:
| Date range | # caches |
| --- | --- |
| 05/01/00 - 04/30/07 | 971 |
| 05/01/07 - 06/30/09 | 972 |
| 07/01/09 - 08/31/10 | 959 |
| 09/01/10 - present | 575 |
This is doable in a day – and good for a quick update of the most local caches. The downside is I have whacked my radius of interest on two sides.
For fun, I looked at an 80 mile radius (the red line, aka “papa bear”). This covers a huge swath of Puget Sound, some ~11,400 caches. It can be done in twelve queries:
| Date range | # caches |
| --- | --- |
| 01/01/00 - 10/31/04 | 983 |
| 11/01/04 - 02/28/06 | 986 |
| 03/01/06 - 02/25/07 | 999 |
| 02/26/07 - 12/31/07 | 993 |
| 01/01/08 - 08/31/08 | 999 |
| 09/01/08 - 04/15/09 | 992 |
| 04/16/09 - 09/25/09 | 995 |
| 09/26/09 - 02/24/10 | 997 |
| 02/25/10 - 06/29/10 | 999 |
| 06/30/10 - 10/09/10 | 991 |
| 10/10/10 - 03/20/11 | 991 |
| 03/21/11 - present | 471 |
Fun, but it’s a lot of wasted querying, especially since it’s fetching stuff on the peninsula. With a 50 mile radius (the green line, aka “mama bear”), I use nine queries for ~8,300 caches:
| Date range | # caches |
| --- | --- |
| 05/01/00 - 07/31/05 | 984 |
| 08/01/05 - 03/30/07 | 997 |
| 03/31/07 - 01/31/08 | 994 |
| 02/01/08 - 01/31/09 | 969 |
| 02/01/09 - 08/31/09 | 996 |
| 09/01/09 - 03/17/10 | 989 |
| 03/18/10 - 09/03/10 | 988 |
| 09/04/10 - 03/27/11 | 990 |
| 03/28/11 - present | 362 |
This seems like a good compromise, since it’s >99% of my cache radius and can be completed in less than two days, especially if I rerun the last query (3/28/11 – present) every day to pick up the most recent entries. Since I can manually filter out the peninsula, my polygon now starts to reach near Tacoma while still staying near 4000 caches in the GPS.