pseudomonas: "pseudomonas" in London Underground roundel (Default)
O2, in common with just about all mobile companies, blocks content. Unlike most, they helpfully provide a URL checker at http://urlchecker.o2.co.uk/ where anyone can check whether a URL is blocked. Update: that page has been taken down "to ensure it's fit for purpose and provides transparent info to [O2's] customers".

There are three levels of blocking:

Open Access - what people who've asked for "no filtering at all" see.
Default Safety - what people who've signed up without expressing preferences see.
Parental Control - what people who've actively asked for a child-friendly device see.

Now, O2's Parental Control is a funny old thing. It allows http://www.mcdonalds.com but blocks http://www.childline.org. To be honest, it blocks most of the internet apart from a tiny number of mostly corporate sites. It allows amazon.co.uk but blocks amazon.com. We may never know why - this is all done by their unspecified third-party partner (rumour has it that it's Symantec).

Wikipedia seems to be an interesting case - it's allowed, but certain pages are blacklisted. This is all done very shoddily, if the URL checker is to be believed. So https://en.wikipedia.org/wiki/Penis is blocked but https://en.wikipedia.org/wiki/Penises is allowed, even though they both go to the same damn page.¹ Also, they block Penis but fail to block Clitoris.²

The choice of which pages to block on Wikipedia is interesting. A bit of playing around revealed that there wasn't much consistency; it looked like, rather than applying a classifier to every page, someone had made a list of a few pages with titles that seemed dodgy to them, and had called it a day. This seemed an ideal opportunity to find out what the spirit was behind the blocking, especially since they kindly tell us what the category of nastiness is.

Wikipedia has a nice list of the 5000 most visited pages. I ran them through the checker³ and made a list of the aberrations, sorted by category. A page in more than one category appears once per category; if it's blocked in one category and not in another, it's still blocked for the user.
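The checker itself is gone, so this can't be rerun as-is, but the bookkeeping is simple enough to sketch. This is a minimal Python sketch under stated assumptions: `check` is a hypothetical stand-in for the O2 checker that returns a {category: blocked} mapping for a URL, and titles are turned into URLs by replacing spaces with underscores and percent-encoding everything else as UTF-8, which is the step footnote 3 says went wrong for titles with brackets, apostrophes and diacritics.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org/wiki/"

# Hypothetical checker signature: URL -> {category name: is_blocked}.
Checker = Callable[[str], Dict[str, bool]]


def wiki_url(title: str) -> str:
    """Spaces become underscores; everything outside the URL-unreserved set
    (brackets, apostrophes, diacritics, ...) is percent-encoded as UTF-8."""
    return WIKI_BASE + quote(title.replace(" ", "_"), safe="")


def blocked_by_category(titles: Iterable[str], check: Checker) -> Dict[str, List[str]]:
    """Group blocked titles by category; a title may appear under several."""
    result: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        for category, blocked in check(wiki_url(title)).items():
            if blocked:
                result[category].append(title)
    return dict(result)


def blocked_for_user(by_category: Dict[str, List[str]]) -> Set[str]:
    """A page blocked in any one of its categories is blocked for the user."""
    return {title for titles in by_category.values() for title in titles}


def fake_check(url: str) -> Dict[str, bool]:
    # Toy stand-in for the real checker: blocks exactly one URL under a made-up category.
    return {"example-category": url.endswith("/Penis")}


print(wiki_url("Mercury (planet)"))
print(blocked_by_category(["Penis", "Penises"], fake_check))
```

Doing the encoding in one place means every title goes through the same path, instead of silently skipping the awkward ones.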

[cut for longish table]

Notice that their "lifestyles" category has only three items within the top 5000 Wikipedia pages.⁴ What these have in common is left as an exercise for the reader. Whether that falls foul of the Equality Act 2010 is left as an exercise for the reader who knows about English law.

Notice also that the list does not include, for instance, the following top-5000 pages: Asexuality, Celebrity_sex_tape, Child_pornography, Homosexuality, Human_sexuality, List_of_female_porn_stars, List_of_Masters_of_Sex_episodes, List_of_pornographic_actresses_by_decade, Masters_of_Sex, Pansexuality, Pornhub, Pornographic_film_actor, Pornographic_film, Pornography, Revenge_porn, Same-sex_marriage, Same-sex_marriage_in_the_United_States, Sex, Unsimulated_sex, YouPorn ... and that's just the top 5000 out of 4 million. Anyone who thinks the filter is effective is going to be very disappointed. And those are just some of the sex-related pages - they make no attempt to block pages about war, death, torture, or other potentially distressing subjects. Again, speculation about the mindset behind this is left to the reader.

This all reflects very badly on O2; but I think we should assume that the other ISPs are every bit as incompetent, until they present us with evidence to the contrary.

If you have a line with TalkTalk, BT or Sky, are willing to give me SSH access to something at your end (it probably helps with that bit if you're a wee bit tech-y), are prepared to turn the dreaded filters on for a bit, and would like to help me with a similar but more extensive project for that ISP, please let me know in the comments.

I've been ranting about this at more length on [twitter.com profile] pseudomonas.



1 Note, as an aside, that the URL checker claims to be able to tell the difference between the two HTTPS URLs. That would be very worrying if true, since with HTTPS the path part of the URL is encrypted and a network filter should normally see only the hostname. My suspicion is that it's not true, and that the URL checker is just shoddily written and treats them as plain HTTP.
2 Perhaps because they had problems finding it.
3 Actually, I misinterpreted how the URL checker handled encoding, so pages whose titles contain brackets, punctuation, apostrophes or diacritics got skipped. Sorry.
4 Since you asked and to save you a click or two, "Bisexual" isn't in the top-5000 list of pages, but that Wikipedia page is indeed classified as "lifestyle".
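Footnote 1's suspicion rests on what a network-level filter can actually see. Here's a rough Python sketch of that, assuming no TLS interception and ignoring newer measures such as Encrypted Client Hello: for plain http the host and path travel in the clear, but for https normally only the hostname leaks (through DNS and the TLS SNI field), so the two Wikipedia URLs look identical. `visible_to_filter` is an illustrative helper, not anything O2 publishes.

```python
from urllib.parse import urlsplit


def visible_to_filter(url: str) -> str:
    """Roughly what an on-path filter can read, assuming no TLS interception."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Plain HTTP: the request line, including path and query, is unencrypted.
        seen = parts.netloc + parts.path
        if parts.query:
            seen += "?" + parts.query
        return seen
    if parts.scheme == "https":
        # HTTPS: path and query are inside the encrypted connection; the
        # hostname usually still leaks through DNS lookups and the TLS SNI field.
        return parts.hostname or ""
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


for u in ("http://en.wikipedia.org/wiki/Penis",
          "https://en.wikipedia.org/wiki/Penis",
          "https://en.wikipedia.org/wiki/Penises"):
    print(u, "->", visible_to_filter(u))
```

Both HTTPS URLs reduce to `en.wikipedia.org`, which is why a checker that distinguishes them is probably just treating them as HTTP.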
pseudomonas: "pseudomonas" in London Underground roundel (Default)
Up-Goer Five stuff: if I used the first 1000 distinct entries of the COCA list at http://www.wordfrequency.info/top5000.asp instead, we'd lose the following 364 words:

[cut: lost words]

and we'd gain the following 399:
[cut: gained words]

The lists aren't the same length because the two sources treat words with apostrophes in them differently, and I've removed apostrophe-containing terms from both lists. Note also that the COCA list categorises words by part of speech, so this is a bit approximate: it might exclude lexemes that score pretty highly as several parts of speech but aren't in the top 1000 as any single one. If I have time I'll find a way to combine them.
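The comparison itself is just set arithmetic. A minimal Python sketch, assuming the ranked lists are already loaded (the sample rows below are made up, not real COCA data) and that each COCA row carries a frequency figure. `combine_parts_of_speech` shows one possible way of doing the combining, by summing a word's frequencies across its parts of speech; it is an assumption, not the only reasonable choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def first_distinct(words: Iterable[str], n: int) -> List[str]:
    """First n distinct words in rank order (case-insensitive)."""
    seen: Set[str] = set()
    out: List[str] = []
    for word in words:
        w = word.lower()
        if w not in seen:
            seen.add(w)
            out.append(w)
            if len(out) == n:
                break
    return out


def without_apostrophes(words: Iterable[str]) -> List[str]:
    # Dropped after truncating to n, so the resulting lists can differ in length.
    return [w for w in words if "'" not in w and "\u2019" not in w]


def combine_parts_of_speech(rows: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Sum each word's frequency over all its parts of speech, then re-rank."""
    totals: Dict[str, int] = defaultdict(int)
    for word, _pos, freq in rows:
        totals[word.lower()] += freq
    return sorted(totals, key=lambda w: (-totals[w], w))


def lost_and_gained(old: Iterable[str], new: Iterable[str]) -> Tuple[List[str], List[str]]:
    old_set, new_set = set(old), set(new)
    return sorted(old_set - new_set), sorted(new_set - old_set)


# Made-up sample rows: (word, part of speech, frequency).
coca_rows = [("the", "a", 100), ("be", "v", 90), ("don't", "v", 80),
             ("run", "v", 50), ("run", "n", 45), ("cat", "n", 60)]
per_pos = without_apostrophes(first_distinct((w for w, _, _ in coca_rows), 3))
combined = without_apostrophes(first_distinct(combine_parts_of_speech(coca_rows), 3))
print(per_pos, combined)
print(lost_and_gained(["the", "be", "cat"], combined))
```

In the toy data "run" misses the cut as either verb or noun alone, but makes it once its two entries are added together, which is exactly the kind of lexeme the per-part-of-speech list can drop.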

pseudomonas: "pseudomonas" in London Underground roundel (Default)
Is there a simple way in Linux to (preferably as a non-root user) get either the MAC address or SSID of the access point I'm currently connected to? (I want to set up some communication jobs so that they only run when I'm connected to my home network; these aren't security-related, so I don't need to worry about people spoofing stuff.)
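One possible answer, as a minimal Python sketch: assuming NetworkManager is running, `nmcli -t -f active,ssid dev wifi` lists visible networks with the connected one marked `yes` and works as an ordinary user; asking for the `bssid` field instead gives the access point's MAC address. The fallback, `iwgetid -r` from the older wireless-tools package, prints just the current SSID. `HOME_SSID` and `on_home_network` are hypothetical names for illustration.

```python
import re
import shutil
import subprocess
from typing import Optional

HOME_SSID = "my-home-network"  # hypothetical: put your own network's SSID here


def parse_active(nmcli_output: str) -> Optional[str]:
    """Return the value on the line nmcli marks as active ("yes:..."), or None.

    In terse (-t) mode nmcli backslash-escapes ':' and '\\' inside values,
    which matters for BSSIDs such as AA\\:BB\\:CC\\:DD\\:EE\\:FF.
    """
    for line in nmcli_output.splitlines():
        if line.startswith("yes:"):
            return re.sub(r"\\(.)", r"\1", line[len("yes:"):])
    return None


def current_wifi(field: str = "ssid") -> Optional[str]:
    """SSID (field="ssid") or access point MAC (field="bssid"); no root needed."""
    if shutil.which("nmcli"):
        result = subprocess.run(
            ["nmcli", "-t", "-f", f"active,{field}", "dev", "wifi"],
            capture_output=True, text=True, check=False)
        return parse_active(result.stdout)
    if field == "ssid" and shutil.which("iwgetid"):
        result = subprocess.run(["iwgetid", "-r"],
                                capture_output=True, text=True, check=False)
        return result.stdout.strip() or None
    return None


def on_home_network() -> bool:
    return current_wifi("ssid") == HOME_SSID
```

A job wrapper can call `on_home_network()` and simply return early when it's False; since spoofing isn't a concern, matching on the SSID is enough.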
pseudomonas: "pseudomonas" in London Underground roundel (Default)
OK, I know it's really just [personal profile] hatam_soferet who still needs convincing, but anyway:



From here, via [twitter.com profile] fanf. Available here if the video above doesn't like your location.
pseudomonas: "pseudomonas" in London Underground roundel (Default)
I think it'd be fun to build a web page where one could put in text in a language where there's a consistent mapping between orthography and pronunciation, and get out some IPA. Has this been done already?
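For a language with a consistent spelling-to-sound mapping, the core of such a page could be a greedy longest-match lookup: at each position, try the longest spelling in a rule table first, then shorter ones. A minimal Python sketch; the `RULES` table is a hypothetical, deliberately incomplete Spanish-like example (with Latin American-style ce/ci), not a real orthography specification. A real table would need more context-sensitive rules (gue/gui, stress marking and so on).

```python
from typing import Dict, List

# Hypothetical toy rules: spelling -> IPA. Letters without a rule pass through.
RULES: Dict[str, str] = {
    "ch": "tʃ", "ll": "ʎ", "rr": "r", "qu": "k",
    "ce": "se", "ci": "si", "ge": "xe", "gi": "xi",
    "ñ": "ɲ", "j": "x", "h": "", "c": "k", "r": "ɾ", "v": "b", "z": "s",
}
MAX_LEN = max(len(k) for k in RULES)


def to_ipa(text: str) -> str:
    """Greedy longest-match transcription using RULES."""
    text = text.lower()
    out: List[str] = []
    i = 0
    while i < len(text):
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            # No rule matched: assume the letter is already its own IPA symbol.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_ipa("llama queso"))
print(to_ipa("pero perro"))
```

Trying the longest spelling first is what stops "ch" being read as "c" + "h", or "rr" as two taps.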
pseudomonas: My rat is confused by technology (technology)
LiveJournal has announced that it's going to purge deleted, suspended and "inactive"* journals and resell their account names.

ETA: I misread that. They're already purging deleted accounts, as before; what's new is that they'll also purge suspended and "inactive" ones, and will purge deleted ones 30 days after deletion rather than 60.

As I understand it, if you have an LJ and one of these accounts is on your friends list, you don't need to do anything: it will be removed from your friends list automatically.

But on other sites, such as Dreamwidth, if you have granted access via OpenID to an account whose name is then resold, whoever buys it will gain access to your locked posts.

If you don't want that to happen, the only way to prevent it (short of the ideal of getting people not to delete their accounts even if they stop using them) is to remove access from the OpenIDs of such journals. Note that if you used the Dreamwidth importer, you might have granted OpenID access to a large number of people - you can manage the details here.

Please feel free to copy/link this around the place.

Note also that if you buy an account name whose previous owner has gone round getting its OpenID banned in lots of places, you're stuck with that too.

*An inactive LJ journal is apparently one with only one post that hasn't been logged into for 24 months. If you have any placeholder accounts on LJ, you may want to check that this doesn't apply to them.
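Taking that rule at face value, checking a placeholder account comes down to a post count and some month arithmetic. A minimal Python sketch; it assumes that "only one post" means at most one (so an empty placeholder counts too) and that "24 months" means calendar months since the last login. Both are assumptions about how LJ applies the rule, and `looks_inactive` is a hypothetical helper.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. 29 Feb 2012 + 24 months -> 28 Feb 2014)."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def looks_inactive(post_count: int, last_login: date, today: date, months: int = 24) -> bool:
    # Assumption: "only one post" is read as "at most one post".
    return post_count <= 1 and today >= add_months(last_login, months)


print(looks_inactive(1, date(2012, 1, 15), date(2014, 1, 15)))
```

If a placeholder trips this check, logging into it (or posting to it) should move it out of the rule as stated.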


ETA: LJ is taking steps to disable OpenID on resold names as an interim solution (thanks [personal profile] andrewducker). This is a big improvement (unless you've bought one of these names, in which case it's a PITA), but I'll be interested to see how they deal with this long-term.

Page generated Jun. 29th, 2025 02:30 am
Powered by Dreamwidth Studios
