Scaling out for better performance
07 June 2020
If all went well, you shouldn’t have noticed anything of the big migration. What you should notice, of course, is a more stable and faster user interface!
Up until now, most performance problems could be solved by simply moving to a bigger server: add more memory and more CPU, and the problems went away. I always knew this would not last forever and that at some point we’d be scaling out instead of up. Fortunately, things were prepared for it.
Web and DB servers are now split, so if the backend gets busy, the frontend will still respond fast. Deep down, most database clusters are just fancy ways of serialising writes to one DB node and spreading reads over the other DB nodes. ShadowTrackr now handles this at the application level for even better performance: for every DB query, both the frontend and the backend specify whether it is a write query (done on the master DB node) or a read query (done on a slave DB node). This freed up a lot of CPU. The many small servers now perform way better than the few big servers we had before scaling out.
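A minimal sketch of this kind of application-level routing is shown below. `QueryRouter`, `connection()` and the `write` flag are hypothetical names, not ShadowTrackr’s code, and plain strings stand in for real database connections so the example runs on its own. Reads are spread round-robin over the slave nodes, which is only one possible policy.

```python
import itertools
from typing import Any, Sequence


class QueryRouter:
    """Hypothetical router: writes go to the master node, reads are
    spread round-robin over the slave nodes."""

    def __init__(self, master: Any, slaves: Sequence[Any]) -> None:
        self.master = master
        # Fall back to the master for reads if no slaves are configured.
        self._read_nodes = itertools.cycle(list(slaves) or [master])

    def connection(self, write: bool) -> Any:
        # Every caller states explicitly whether its query writes data.
        return self.master if write else next(self._read_nodes)


router = QueryRouter("master", ["slave-1", "slave-2"])
print(router.connection(write=True))   # master
print(router.connection(write=False))  # slave-1
print(router.connection(write=False))  # slave-2
```

Because the caller decides, a read that must see data it has just written (before replication catches up) can still be sent to the master by passing `write=True`.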
On top of this, the backend nodes spread around the world now have a shared cache. This reduces lookups to the central databases, and also reduces the number of queries the nodes send out to external APIs.
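A shared cache like this typically follows the cache-aside pattern sketched below: check the cache first, and only on a miss (or an expired entry) query the central database or the external API, then store the result for the other nodes. This is an illustrative assumption, not ShadowTrackr’s implementation. In a real multi-node setup the store would be a networked cache; the local dict and the `SharedCache` and `get_or_fetch` names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class SharedCache:
    """Cache-aside with a time-to-live. A local dict stands in for the
    networked cache that the backend nodes would actually share."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no database or external API call
        value = fetch()      # miss or expired: do the expensive lookup once
        self._store[key] = (now + self.ttl, value)
        return value
```

The injectable `clock` is only there to make expiry easy to test.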
So, lots of improvements. Next up is the ShadowTrackr API: we’ll be adding functionality to add assets and query scan results.
Adding and removing assets through the API
03 May 2020
I love automating stuff. If you do it properly from the start, you can get so much more work done in so much less time. Really, any task you do more than twice should be automated if possible.
ShadowTrackr power users who want to automate things can now add and remove assets through the API, in bulk. Just throw a mixed list of URLs, IPs and subnets at it and it will validate, deduplicate and add them for you. Check out the details in the API documentation.
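As an illustration of what validating and deduplicating a mixed list can involve, here is a Python sketch using the standard `ipaddress` and `urllib.parse` modules. The function names, the output tuples and the rule that only `http`/`https` URLs are accepted are assumptions for this example; the API’s actual rules are in its documentation.

```python
import ipaddress
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

Asset = Tuple[str, str]  # (kind, normalised value)


def classify(item: str) -> Optional[Asset]:
    """Return a normalised (kind, value) pair, or None if the item is invalid."""
    text = item.strip()
    try:
        return ("ip", str(ipaddress.ip_address(text)))
    except ValueError:
        pass
    try:
        # strict=False accepts host bits, e.g. 192.0.2.7/24 -> 192.0.2.0/24.
        return ("subnet", str(ipaddress.ip_network(text, strict=False)))
    except ValueError:
        pass
    try:
        parts = urlsplit(text)
        hostname = parts.hostname
    except ValueError:  # e.g. unbalanced brackets in an IPv6 URL
        return None
    if parts.scheme in ("http", "https") and hostname:
        # Lower-case the host and use "/" for an empty path so trivially
        # different spellings of the same URL deduplicate.
        return ("url", parts._replace(netloc=parts.netloc.lower(),
                                      path=parts.path or "/").geturl())
    return None


def normalise_assets(raw: List[str]) -> Tuple[List[Asset], List[str]]:
    """Validate and deduplicate a mixed list, keeping first-seen order."""
    accepted: List[Asset] = []
    rejected: List[str] = []
    seen = set()
    for item in raw:
        asset = classify(item)
        if asset is None:
            rejected.append(item)
        elif asset not in seen:
            seen.add(asset)
            accepted.append(asset)
    return accepted, rejected
```

Normalising before deduplicating is what makes `2001:DB8::1` and `2001:db8::1`, or `https://Example.com` and `https://example.com/`, count as a single asset.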
And if you have any cool API ideas, I’m always happy to hear them. Have fun!
Website scanning in-depth
19 April 2020
Scanning a website seems easy, and it is if you just do a single, one-off scan of a URL.
Things get more interesting when you host your website on multiple servers (for better performance or reliability). You probably also have both IPv4 and IPv6 addresses available. Your website runs on HTTPS, but you also want visitors to find you without typing the protocol, so you have HTTP configured too. That’s two protocols (HTTP and HTTPS), over two IP versions (IPv4 and IPv6), on possibly multiple hosts.
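To make the combinatorics concrete, the sketch below lists every (protocol, address) pair a scanner would have to check for one website. It is hypothetical: the `scan_targets` name and the output fields are made up, and the addresses are passed in rather than resolved from DNS (A and AAAA records) so the example runs offline.

```python
import ipaddress
from itertools import product
from typing import Dict, Iterable, List, Sequence


def scan_targets(hostname: str, addresses: Iterable[str],
                 schemes: Sequence[str] = ("http", "https")) -> List[Dict]:
    """One target per (scheme, address) pair for a single website."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    targets = []
    for scheme, ip in product(schemes, ips):
        # IPv6 literals must be wrapped in brackets inside a URL.
        host = f"[{ip}]" if ip.version == 6 else str(ip)
        targets.append({
            "hostname": hostname,  # still sent as Host header / TLS SNI
            "scheme": scheme,
            "ip": str(ip),
            "ip_version": ip.version,
            "port": 443 if scheme == "https" else 80,
            "url": f"{scheme}://{host}/",
        })
    return targets
```

Two hosts with one IPv4 and one IPv6 address each already give eight targets. Each request still has to carry the site’s hostname (in the HTTP `Host` header and, for HTTPS, the TLS SNI), otherwise a server hosting several sites may answer with a different one.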
Some websites run in the cloud. You can pin such a website to a specific cloud in a specific country (which most governments do with their websites), but you can also let the cloud provider figure out the best location. If you do this with Azure, your website will be served from the nearest Azure location. ShadowTrackr has nodes all over the world, so we’ll detect your website in multiple cloud locations. That’s on purpose, of course, but it does complicate things.
Then there are CDNs like Cloudflare and Akamai. You host your website on a server the CDN can reach, and the CDN handles all your visitor requests. Of course, you’ll need a trick to point your visitors at the CDN, and this is where it gets ugly for scanners. There are multiple ways of doing this, and they can be mixed and matched. On top of this, some CDNs hire subcontractors that are really hard to attribute, so you might end up detecting Vodafone instead of Akamai.
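One naive building block for attribution is mapping each IP address to the owner of the most specific network prefix that contains it (longest-prefix match), sketched below. The prefixes are IETF documentation ranges and the owner names are placeholders, not real Cloudflare, Akamai or Vodafone ranges. The example also shows the pitfall described above: an edge server that a CDN runs inside a subcontractor’s network gets attributed to that network’s operator.

```python
import ipaddress
from typing import Optional

# Placeholder prefix table built from documentation ranges (RFC 5737 and
# RFC 3849); these are NOT real CDN, cloud or ISP ranges.
PREFIX_OWNERS = {
    "192.0.2.0/24": "ExampleCDN",
    "2001:db8::/32": "ExampleCDN",
    "198.51.100.0/24": "ExampleISP",
    "198.51.100.64/26": "ExampleCloud",  # more specific block inside ExampleISP's range
}
NETWORKS = [(ipaddress.ip_network(p), owner) for p, owner in PREFIX_OWNERS.items()]


def attribute(ip: str) -> Optional[str]:
    """Owner of the most specific known prefix containing ip, else None."""
    addr = ipaddress.ip_address(ip)
    # `in` is simply False when the IP versions differ.
    matches = [(net, owner) for net, owner in NETWORKS if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]  # longest prefix wins


# An ExampleCDN edge server hosted inside ExampleISP's network at
# 198.51.100.10 is attributed to the ISP, not to the CDN.
print(attribute("198.51.100.10"))  # ExampleISP
```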
It seemed so easy to scan a website, but in practice it can get really complex. The goal has always been for ShadowTrackr to detect all your website instances on all internet-reachable hosts, including clouds and CDNs. I had underestimated how complex this is and did not achieve that goal from the start. After getting it wrong a couple of times, I’ve shipped a much improved algorithm in this week’s update. This might result in a storm of newly found websites on your timeline. I’m on it and will keep cleaning things up until they are all properly ingested and monitored.
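Conceptually, part of the problem is merging what nodes in different regions see into one set of instances per website, since each node may be served by a different cloud location or CDN edge. The sketch below shows only that grouping step; the `Observation` fields, node names and output shape are assumptions for illustration, not ShadowTrackr’s actual algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Observation:
    node: str  # scanning node that saw the website (hypothetical names)
    url: str   # URL that was requested, e.g. "https://www.example.com/"
    ip: str    # address the node actually connected to


Instances = Dict[Tuple[str, str], Set[str]]  # (scheme, ip) -> nodes that saw it


def merge_observations(observations: Iterable[Observation]) -> Dict[str, Instances]:
    """Group per-node observations into distinct instances per website."""
    websites: Dict[str, Instances] = defaultdict(lambda: defaultdict(set))
    for obs in observations:
        parts = urlsplit(obs.url)
        websites[parts.hostname][(parts.scheme, obs.ip)].add(obs.node)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {host: dict(instances) for host, instances in websites.items()}
```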
If you do find irregularities, or have any other questions, drop me a line.