pseudomonas

O2, in common with just about all mobile companies, blocks some websites. Unlike most, they helpfully provide a URL checker at http://urlchecker.o2.co.uk/ where anyone can check whether a URL is blocked. Update: that page has been taken down "to ensure it's fit for purpose and provides transparent info to [O2's] customers".

There are three levels of blocking:

Open Access - what people who've asked for "no filtering at all" see.
Default Safety - what people who've signed up without expressing preferences see.
Parental Control - what people who've actively asked for a child-friendly device see.

Now, O2's Parental Control is a funny old thing. It allows http://www.mcdonalds.com but blocks http://www.childline.org. To be honest, it blocks most of the internet apart from a tiny number of mostly corporate sites. It allows amazon.co.uk but blocks amazon.com. We may never know why - this is all done by their unspecified third-party partner (rumour has it that this is probably Symantec).

Wikipedia seems to be an interesting case - it's allowed, but certain pages are blacklisted. This is all done very shoddily, if the URL checker is to be believed. So https://en.wikipedia.org/wiki/Penis is blocked but https://en.wikipedia.org/wiki/Penises is allowed, even though they both go to the same damn page¹. Also, they block Penis but fail to block Clitoris².

The choice of which pages to block on Wikipedia is interesting. A bit of playing around revealed that there wasn't much consistency; it looked like, rather than applying a classifier to every page, someone had made a list of a few pages with titles that seemed dodgy to them, and had called it a day. This seemed an ideal opportunity to find out what the spirit was behind the blocking, especially since they kindly tell us what the category of nastiness is.

Wikipedia has a nice list of the 5000 most visited pages. I ran them through the checker³ and made a list of the aberrations, sorted by category. Pages in more than one category will appear twice; if they're blocked in one and not in the other, they're still blocked to the user.

( cut for longish table )
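For anyone wanting to repeat this sort of survey, here is a minimal Python sketch of the bookkeeping, not the script I actually ran. It percent-encodes each title before building the URL (the step that tripped me up, see footnote 3) and groups blocked titles by category. The `check` argument is a hypothetical stand-in for whatever queries the checker; nothing here assumes how O2's checker really worked or which encoding it expected.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List
from urllib.parse import quote


def wikipedia_url(title: str) -> str:
    """Build an English Wikipedia URL with the title percent-encoded.

    Spaces become underscores; brackets, apostrophes and diacritics are
    percent-encoded (as UTF-8) by urllib.parse.quote.
    """
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def blocked_by_category(
    titles: Iterable[str],
    check: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Group titles by the categories they are blocked under.

    `check` is a hypothetical callable: given a URL, it returns the names of
    the categories that block it (empty if the URL is allowed). A title in
    several categories is listed under each of them.
    """
    result: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        for category in check(wikipedia_url(title)):
            result[category].append(title)
    return dict(result)
```

Passing a fake `check` that returns canned categories is enough to exercise the grouping without touching any network service.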

Notice that their "lifestyles" category has only three items within the top 5000 Wikipedia pages⁴. What these have in common is left as an exercise for the reader. Whether that falls foul of the Equality Act 2010 is left as an exercise for the reader who knows about English law.

Notice also that for instance the list does not include the following top-5000 pages: Asexuality, Celebrity_sex_tape, Child_pornography, Homosexuality, Human_sexuality, List_of_female_porn_stars, List_of_Masters_of_Sex_episodes, List_of_pornographic_actresses_by_decade, Masters_of_Sex, Pansexuality, Pornhub, Pornographic_film_actor, Pornographic_film, Pornography, Revenge_porn, Same-sex_marriage, Same-sex_marriage_in_the_United_States, Sex, Unsimulated_sex, YouPorn ... and that's just in the top 5000 out of 4 million. Anyone who thinks the filter is effective is going to be very disappointed. And those are just some of the sex-related pages - they make no attempt to block pages about war, death, torture, or other potentially distressing subjects. Again, speculation about the mindset behind this is left to the reader.

This all reflects very badly on O2; but I think we should assume that the other ISPs are every bit as incompetent, until they present us with evidence to the contrary.

If you'd like to help me with a similar but more extensive project for TalkTalk, BT or Sky, and you have a line with one of those ISPs, are willing to give me SSH access to something at your end (it probably helps with that bit if you're a wee bit tech-y), and are prepared to turn the dreaded filters on for a bit, please let me know in the comments.

I've been ranting about this at more length at pseudomonas

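¹ Note as an aside that the URL checker claims to be able to tell the difference between the two httpS URLs. This is very worrying if it's true, since a filter sitting on the network shouldn't be able to see which page of an HTTPS site is being requested, only the hostname. My suspicion is that it's not true, and that the URL checker is just shoddily written and assuming they're plain http.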
² Perhaps because they had problems finding it.
³ Actually, I misinterpreted how the URL checker dealt with encoding, so titles with brackets, punctuation, apostrophes, and diacritics got skipped. Sorry.
⁴ Since you asked and to save you a click or two, "Bisexual" isn't in the top-5000 list of pages, but that Wikipedia page is indeed classified as "lifestyle".

Current Mood: angry

Upgoer-five stuff - if I used the COCA first 1000 distinct entries on http://www.wordfrequency.info/top5000.asp instead we'd lose the following 364 words:

( lost )

and we'd gain the following 399:
( gained )

The lists aren't the same length because of the difference in the way they treat words with apostrophes in - I've removed apostrophe-containing terms from both lists. Note also that the COCA list categorises words by their part of speech, so this is a bit approximate, and might exclude lexemes that score pretty highly in lots of parts of speech, but are not in the top 1000 in any single one. If I have time I'll find a way to combine them.
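The comparison itself is just two set differences. Here's a minimal Python sketch of it (my illustration, not the code I used): `first_distinct` takes frequency-ordered entries and keeps the first n distinct word forms, which is one way of flattening a list that repeats a word once per part of speech, and `compare_wordlists` drops apostrophe-containing terms from both sides before diffing. The function names are made up for this example.

```python
from typing import Iterable, List, Tuple


def first_distinct(words: Iterable[str], n: int) -> List[str]:
    """Return the first n distinct (lower-cased) words, preserving order."""
    seen = set()
    ordered: List[str] = []
    for word in words:
        key = word.lower()
        if key not in seen:
            seen.add(key)
            ordered.append(key)
            if len(ordered) == n:
                break
    return ordered


def compare_wordlists(
    current: Iterable[str], candidate: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (lost, gained) when switching from `current` to `candidate`.

    Terms containing an apostrophe (straight or curly) are removed from both
    lists first, so the two results need not be the same length.
    """
    def clean(words: Iterable[str]) -> set:
        return {w.lower() for w in words if "'" not in w and "\u2019" not in w}

    cur, cand = clean(current), clean(candidate)
    return sorted(cur - cand), sorted(cand - cur)
```

Because the apostrophe filter can remove different numbers of words from each list, the lost and gained lists come out at different lengths, which is the effect described above.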

perl -le 'print "$_ + 1 = ", $_ + 1 for qw/milli micro nano pico/'

For the benefit of bassoonists, cellists, etc.

I did some fiddling with some perl and abcm2ps:

A Bass-clef version of Playford's Dancing Master (Treble) and A Bass-clef version of the Fiddler's Tune Book (Treble)

A Bass-clef version of Playford's Dancing Master and A Bass-clef version of the Fiddler's Tune Book (I don't think it automatically becomes the 'cellist's tune book) adjusted for 'cello, with pieces containing notes below a bottom C raised by an octave^*.

All based on Chris Partington's ABC versions

NEW!
Playford's Dancing Master for Viola and Fiddler's Tune Book for viola

Big Round Band sets for Viola ; 'Cello ; Bassoon etc.

^*perl -pe 'BEGIN{$/="\n\r\n"};$x=/[A-G]\,/?"D bass":"d bass+8";s/^(K:\w+)(\s*?[%\n])/$1 middle=$x$2/msg' plyfrd1.ABC | abcm2ps - -N1 --footer 'Based on Chris Partington'\''s transcription\nhttp://www.cpartington.plus.com' -O - | ps2pdf - Playford_dancing_master_bass_clef_cello.pdf
This is not guaranteed to work for all ABC, but it works for the subset found in this file.
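For anyone who finds the one-liner hard to read, here is roughly the same logic spelled out as a Python sketch. It's an illustration under the same assumptions as the Perl: tunes are separated by blank lines, any capital note letter followed by a comma marks a tune as having low notes, and the K: line is a bare key followed by a newline or a % comment. Like the original, it makes no attempt to tell notes apart from similar-looking text in header fields.

```python
import re

# A capital note letter followed by a comma, i.e. a note in a lower octave.
LOW_NOTE = re.compile(r"[A-G],")
# A key line such as "K:G" or "K:Dmaj", followed by a newline or a % comment.
KEY_LINE = re.compile(r"^(K:\w+)(\s*?[%\n])", re.MULTILINE)
# Blank line between tunes, tolerating CRLF line endings.
TUNE_BREAK = re.compile(r"(\n\r?\n)")


def add_bass_clef(abc_text: str) -> str:
    """Append a bass-clef `middle=` setting to the K: line of each tune."""
    pieces = TUNE_BREAK.split(abc_text)
    out = []
    for piece in pieces:
        if not piece.strip():
            # Separator (or empty fragment): pass through unchanged.
            out.append(piece)
            continue
        middle = "D bass" if LOW_NOTE.search(piece) else "d bass+8"
        out.append(
            KEY_LINE.sub(
                lambda m: f"{m.group(1)} middle={middle}{m.group(2)}", piece
            )
        )
    return "".join(out)
```

The output of `add_bass_clef` would then be fed to abcm2ps in the same way as the output of the Perl above.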

perl -ne 'print if /^(\w*)[AEIOU](\w*)\1[AEIOU]\2$/i' sowpods.txt
( singsong flimflam mishmash )

perl -ne 'print if /^[^AEIOU]+[AEIOU](\w{3,})[^AEIOU]+[AEIOU]\1$/i' sowpods.txt is left as an exercise for the reader.
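In case the first pattern looks like line noise: it finds words made of the same chunk twice, where the only difference allowed between the two halves is a single vowel in the same position. A rough Python equivalent, as a sketch (the `is_vowel_swap` name is invented here, and the word list is whatever you have to hand):

```python
import re

# (prefix) vowel (suffix), then the same prefix, any vowel, the same suffix.
VOWEL_SWAP = re.compile(r"^(\w*)[AEIOU](\w*)\1[AEIOU]\2$", re.IGNORECASE)


def is_vowel_swap(word: str) -> bool:
    """True for words like 'singsong': two halves differing by at most one vowel."""
    return VOWEL_SWAP.match(word.strip()) is not None


def vowel_swaps(words):
    """Filter an iterable of words down to the ones that match."""
    return [w.strip() for w in words if is_vowel_swap(w)]
```

Note that nothing forces the two vowels to differ, so plain doubled words get through too.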

So, draft 1 is up at http://www.chiark.greenend.org.uk/~adamb/dw-thataway.pl.txt .

It runs through DW posts, finding LJ entries that have already been cross-posted or imported to DW. For some subset of them (chosen on things like their visibility, date, length, and whether they contain polls), it edits the LJ entry so that its text is commented out, possibly leaving a snippet of the original post, and adds a note saying that the post has been moved to such-and-such a DW post, which is thataway.
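To make the idea concrete, here is a minimal Python sketch of the two decisions involved: whether an entry qualifies, and what its replacement body looks like. This is not the script's actual code (that's Perl, at the URL above). The `Entry` class, the function names, the particular thresholds, the `security` values, and the check for `<lj-poll` are all assumptions made for illustration.

```python
import html
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    text: str       # raw entry body
    posted: date    # date the entry was posted
    security: str   # assumed values: "public", "private", "usemask"


def should_redirect(entry: Entry, *, before: date, max_length: int) -> bool:
    """Decide whether an entry is in the subset to be edited.

    Illustrative policy only: public, older than `before`, no longer than
    `max_length` characters, and not containing a poll tag.
    """
    has_poll = "<lj-poll" in entry.text.lower()
    return (
        entry.security == "public"
        and entry.posted < before
        and len(entry.text) <= max_length
        and not has_poll
    )


def redirected_body(entry: Entry, dw_url: str, snippet_chars: int = 0) -> str:
    """Build the new body: a 'moved' note, an optional snippet, and the
    original text wrapped in an HTML comment."""
    link = html.escape(dw_url)
    note = f'<p>This post has moved to <a href="{link}">{link}</a>.</p>'
    snippet = ""
    if snippet_chars > 0:
        snippet = f"<p>{html.escape(entry.text[:snippet_chars])}...</p>"
    # Stop any "-->" in the original text from closing the comment early.
    hidden = entry.text.replace("-->", "--&gt;")
    return f"{note}\n{snippet}\n<!-- {hidden} -->"
```

Keeping the original text inside a comment rather than deleting it means the edit can be reversed by hand if something goes wrong.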

I've tested it on my test account, and it seems to do what it's supposed to, but more testing is needed; if you're brave enough to be a guinea pig on this, bug reports would be most welcome.

Also, feel free to mine the script for snippets of code that work with the XMLRPC API. I haven't specified a license yet, but basically do what you like, and it'd be nice but not obligatory to credit me.

Does LJ::Simple work reliably with Dreamwidth? If not, is there a comparable module that does?

Also, is there a way that given a DW entry that's been either imported from, or crossposted to, LJ, I can get (programmatically) the ID or URL of the LJ entry, without resorting to comparing timestamps and post contents?


A stamped addressed antelope

Entries tagged with perlish

O2 vs Wikipedia - a quick look inside the minds of the folks who build the blockers.

up goer five wordlist noodling

eeeeeenteresting.

Bass-clef versions of tunebooks (ETA: Now with added alto!)

(no subject)

DW-thataway

DW Perl module client
