I spotted the SEOmoz blog entry about finding supplemental pages from a site and turned it into a tool that works across datacenters:
http://oy-oy.eu/google/supplemental/
It takes your domain, passes it as a "site:domain.com ***" query to Google, and extracts the total "approximate" count as well as the effective real count (if the total is less than 1000, it goes to the last page of the results and checks).
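As a rough illustration of that step, here is a minimal sketch in Python. It is not the tool's actual code: the function names, the `host` parameter (standing in for a single datacenter address) and the assumed wording of the result summary line ("Results 991 - 998 of about 1,230") are all my own assumptions for the example, and the real page may look different. It only builds strings and parses text; it makes no requests.

```python
import re
from urllib.parse import quote_plus


def supplemental_query(domain):
    """Build the 'site:domain.com ***' query string for a domain."""
    return f"site:{domain} ***"


def search_url(domain, host="www.google.com"):
    """Hypothetical URL builder; `host` could be one datacenter's address."""
    return f"http://{host}/search?q={quote_plus(supplemental_query(domain))}"


# Assumed shape of the result summary line; the real markup may differ.
SUMMARY = re.compile(
    r"Results\s+([\d,]+)\s*-\s*([\d,]+)\s+of\s+(?:about\s+)?([\d,]+)"
)


def parse_summary(text):
    """Return (last_result_shown, approximate_total), or None if not found."""
    match = SUMMARY.search(text)
    if match is None:
        return None
    _first, last, approx = (int(g.replace(",", "")) for g in match.groups())
    return last, approx


print(search_url("example.com"))
print(parse_summary("Results 991 - 998 of about 1,230"))
```

On the last page of results, the first number returned would be the effective real count and the second the "about" count.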
What's also interesting in this regard is taking the same query and running it through the Google API (combined with my PR display): http://oy-oy.eu/google/searchpr/go.aspx -- some of those pages still show a toolbar PR value. To be honest, I have no idea what that would mean: is the displayed PR just outdated (should it be 0)? Or can a page in the supplemental index have a PR number? Or is the displayed PR just estimated from other existing pages (probably)? Ah, the joy of tools whose output you can't understand :-)
The main problem with a tool like this is that Google turns on a block when it sees lots of queries from the same server -- the tool more or less goes against the Google terms of service, which is not really such a good idea, but is the only way to get this data. Personally, I feel that I'm not doing automated queries since I do it in real time with the user triggering them - but that argument is probably moot since Google only sees a flood of queries from the server. I could trick them a bit more (perhaps spread the requests over several IPs or go through proxy servers) and possibly get a few more queries out of them before they "recognize" it again, but then again if they need / want to block mass queries then I'll just take what I can get until then.
What surprised me was that the fluctuations in supplemental URLs across the datacenters were much higher than the fluctuations in indexed URLs. To me that sounds like a sign that they are playing with the supplemental index, with perhaps vastly different settings. Also, the sometimes large difference between the actual count and the "about" count (my "bad data push correction") seems much higher with supplementals than with indexed pages (or links). That would make sense, however, since the supplemental index is probably not something they would prioritize when it comes to generating better approximations.
The "Flux-Factor" is a rough and dirty value I calculate to determine how large the spread in numbers is. A high flux-factor could signify that things are in movement and that no stable equilibrium has been reached. It could also signify that Google is using / testing different settings (as I think is the case with the supplementals). To be exact, the flux-factor is the percentage of displayed datacenters that return values more than 15% away from the average. Example: if the average is "100", +/- 15% would be 85 - 115. The flux-factor would be the percentage of datacenters queried that return a value either below 85 or above 115. I played around with other measures such as variance, etc., but this one seemed to return the "best" results based on the numbers I have seen so far (and it's easy to calculate in javascript on the client-side).
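To make the definition concrete, here is a small sketch of the calculation in Python (the tool itself does this in javascript on the client side; the function name `flux_factor`, the `tolerance` parameter and the sample counts below are made up for the example).

```python
def flux_factor(counts, tolerance=0.15):
    """Percentage of counts lying more than `tolerance` away from the average."""
    if not counts:
        raise ValueError("need at least one datacenter count")
    average = sum(counts) / len(counts)
    low = average * (1 - tolerance)   # e.g. 85 for an average of 100
    high = average * (1 + tolerance)  # e.g. 115 for an average of 100
    outside = sum(1 for c in counts if c < low or c > high)
    return 100.0 * outside / len(counts)


# Five hypothetical datacenter counts with an average of 100:
# 60 and 140 fall outside 85 - 115, so two out of five are "in flux".
print(flux_factor([100, 90, 110, 60, 140]))  # 40.0
```

Values sitting exactly on the 85 or 115 boundary are counted as inside the band here; that choice is an assumption, since the description above does not say either way.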
Further discussion of this tool is at the best forum out on the web: cre8asiteforums.