24-11-2016 | Remy van Elst | Text only version of this article
I read an article on The Register (via) regarding the Investigatory Powers Bill. The part where ISPs are forced to save their customers' browsing history for a year is the most horrifying part, as is the whole bill. Let's hope the political process and organizations like the Open Rights Group and the EFF have enough lobbying power to change people's minds. If that fails, then we can all try to overflow the logging. Just as some people put keywords in their mail signatures to trigger automatic filters and generate noise, we should all generate as much data and noise as possible. This way the information they do gather will not be useful; it will take too much time, storage and effort to process, and thus the project will fail. 2 years ago I wrote a small Python script which browses the web for you, all the time. Running that on one or two Raspberry Pis or other small low-power computers 24/7 will generate a lot of noise in the logging and filtering.
The article on The Register is very well written and has a lot of information on the bill. I want to quickly recap why mass surveillance is a bad idea, both in practice and in principle.
The Dutch organization Bits of Freedom has a few good articles on European data retention laws, on the Amsterdam Municipal Register during WWII (having nothing to hide) and more examples of big data usage. Bruce Schneier has also written on espionage vs. mass surveillance. The blog of Bits of Freedom (English) contains a lot of great articles.
I myself try not to get involved with the political side of issues like this, since other people, like the EFF, BoF and the ORG, are much better on that front. I am, however, technical enough to try to work against this bill.
Just as in the early days, when people put the word "bomb" in their email signatures to trigger the filters and generate noise, we use a script to simulate real browsing 24/7. If you have a Raspberry Pi or another small, low-power Linux computer, it's perfect for this purpose.
3 years ago I wrote a script, browsa.py. It is a small Python script which uses BeautifulSoup4 to browse websites. It saves all the links on a page and chooses a random link to visit next. If there are no links, one of the previously saved links is selected. There is also a fallback list which includes news.google.com and similar sites. Between every page there is a small delay to simulate a real user, and the user agent is set to a Firefox version.
Since the URL it chooses is random, you will never know where you end up, thus generating a lot of random data.
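The link-picking logic described above can be sketched roughly like this. This is a simplified illustration, not the actual browsa.py code: the function names and the regex-based link extraction are my own (the real script uses BeautifulSoup4 and does the actual HTTP fetching, delays and user-agent spoofing around this core):

```python
import random
import re

# Hardcoded last-resort start pages, as described above.
FALLBACK_URLS = ["https://news.google.com"]

def extract_links(html):
    """Naive link extraction for illustration; browsa.py uses BeautifulSoup4."""
    return re.findall(r'href="(https?://[^"]+)"', html)

def pick_next_url(html, seen_links):
    """Save all links found on the page, then pick a random one to visit next.

    If the current page has no links, fall back to a previously saved link;
    if nothing has been saved yet, use the hardcoded fallback list.
    """
    links = extract_links(html)
    seen_links.extend(links)
    if links:
        return random.choice(links)
    if seen_links:
        return random.choice(seen_links)
    return random.choice(FALLBACK_URLS)
```

A real browsing loop would fetch the chosen URL (with a Firefox user-agent header), sleep a few seconds to look like a human, and feed the response body back into `pick_next_url`.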
Make sure you have Git, Screen and Pip installed:
apt-get install python2 python2-pip git screen
Clone the repository:
git clone https://github.com/RaymiiOrg/browsa.git
cd browsa
Run the script with a starting URL:
python2 ./browsa.py https://news.google.nl
# Info: Count 1
# Info: Downloading URL: https://news.google.nl
# Info: URL's on page: 189
# Info: New URL: http://www.rtlnieuws.nl/gezondheid/ziekenhuizen-limburg-overvol-operaties-afgezegd
# Info: Count 2
# Info: Downloading URL: http://www.rtlnieuws.nl/gezondheid/ziekenhuizen-limburg-overvol-operaties-afgezegd
# Info: URL's on page: 47
# Info: New URL: http://www.rtlnieuws.nl/kleding-presentatoren
# Info: Count 3
# Info: Downloading URL: http://www.rtlnieuws.nl/kleding-presentatoren
# Info: URL's on page: 46
Sometimes the script hangs or crashes. You can start a screen session and run the script in a loop that kills it after a set amount of time (here, 5 minutes) and restarts it automatically:
screen
while true; do timeout 300 python2 ./browsa.py https://news.google.nl; done
If you run this 24/7, the logs on you will be so large that it will be very hard to get anything useful out of them. Plus, if enough people do this, there will be so much useless data that the ISPs, who have to save it, will complain and demand money from the government for the storage (or they will raise their prices for their customers...).