2011-07-17 Archive Fever: A Crowd-Sourced Investigation into Library Catalogue Classification of WikiLeaks as an "Extremist Web Site"

NB: See end of story for updates, now including official replies by the National Library of Australia and the US Library of Congress. Additional contact details for Library and Archives Canada as well as the US Library of Congress have been added.

Submitted by @nyxpersephone.

This article would not have been possible without the help of many Twitter users, most notably @Asher_Wolf, @carwinb, @CassPF, @dexter_doggie, @issylvia, @jaraparilla, @JLLLOW, @m_cetera, @NOH8ER.

A Cataloguing-in-Publication (CiP) record is what you usually see on the second page of a book, right after the title page. It is similar to the catalogue record of a book in a library and contains basic information on the book, such as the author's name and the title of the book. It also contains "keywords" ("subject headers") that may be used by librarians and other information professionals to classify the book in their collections.

CiP records are usually provided upon request by national libraries and/or national bibliographic databases, such as the National Library of Australia (NLA) and the Library of Congress (LOC). So the CiP is a useful thing, albeit rather boring and usually only of interest to librarians and information professionals.

However, in the case of WikiLeaks and Andrew Fowler's book "The Most Dangerous Man in the World", the CiP makes for an interesting story. Examining the subject headers in the CiP record of Fowler's book, one cannot help but notice the last one: "Extremist web sites".

Twitter user @nyxpersephone realised this and posted a screen grab of the Google Books view of Fowler's book to yfrog and Twitter. The tweet was picked up and retweeted by other users, using the #wlcat hashtag. A lively discussion on the reasons for and implications of this subject heading followed (discussion archived at Chirpstory and Pastebin).

Following the initial findings, confusion reigned on the Twittersphere as it was hard to understand why the keywords "extremist web sites" would be used in relation to WikiLeaks.

Twitter user @CassPF suggested contacting the National Library of Australia to enquire about the process of cataloguing and their choice of keywords. Another Twitter user, @jaraparilla, asked whether the David Leigh and Luke Harding book "WikiLeaks: Inside Julian Assange's War on Secrecy" was similarly classified. Several Twitter users, most notably @asher_wolf and @issylvia, then confirmed that this was not the case: the keywords used for the Leigh/Harding book included "Official Secrets" and "National Security".

Shortly afterwards, @CassPF, a digital archivist, pointed out that the National Library of Australia apparently uses the US Library of Congress' subject headers/keywords to catalogue their CiP records. Taking a closer look at the US Library of Congress' Subject Headings website for the entry "Extremist Web sites", one can see that the references given point to web sites inciting hate crimes, such as neo-fascist and jihadist web sites, radicalising immigrants and citizens in Western countries.

Based on @jaraparilla's question above, other books relating to WikiLeaks were checked to see whether the subject headers in their CiP records also included "Extremist web sites". The results were quite surprising, even disturbing. Daniel Domscheit-Berg's "Inside WikiLeaks" was listed twice in the NLA catalogue, as a US edition and an Australian edition. The US edition did not have the "extremist web sites" keyword in its catalogue entry, while the Australian edition did.

A subject search for "Extremist web sites" yielded four results in the catalogue of the NLA, all of which were related to WikiLeaks:

- the two books by Fowler and Domscheit-Berg,
- a book on WikiLeaks by an Indonesian author, and
- Julian Assange's as-yet unpublished autobiography, with the working title "WikiLeaks Versus the World: My Story".

Meanwhile, a similar catalogue search for "Extremist web sites" in the US Library of Congress only returned two hits: "Inside WikiLeaks" by Domscheit-Berg, and the records of a US Congress hearing on the strategy for countering jihadist websites (thanks to @issylvia for that search).

In the catalogue of Library and Archives Canada, 17 items were found when searching for WikiLeaks, four of which have the "Extremist web sites" subject header. These four are all copies of Domscheit-Berg's "Inside WikiLeaks".

The British Library, Deutsche Nationalbibliothek and Nationalbibliographie as well as Bibliothèque Nationale de France apparently do not use the US Library of Congress' subject headers, so this particular subject heading cannot be found in the catalogues of any of those institutions.

Surprisingly enough, there are other books related to WikiLeaks that are not classified with the "Extremist web sites" tag. Among them are Marcel Rosenbach's & Holger Stark's "Staatsfeind WikiLeaks", Alexander Star's & Bill Keller's "Open secrets", and Micah Sifry's "WikiLeaks and the Age of Transparency".

"Underground: Tales of Hacking, Madness and Obsession on the Electronic Frontier" by Suelette Dreyfus & Julian Assange has not been catalogued as "Extremist web sites" by the NLA or Library of Congress either.

These findings left those engaged in the discussion with mixed feelings. The following questions were raised:

1.) Who decides on and approves of the Subject Headers for the Cataloguing-in-Publication records?

2.) How objective and politically neutral is the choice of Subject Headers for Cataloguing, considering that the LOC is a US government institution that has blocked access to WikiLeaks?

3.) Why have the National Library of Australia and Library and Archives Canada chosen to adopt the US Library of Congress' Cataloguing-in-Publication scheme, whereas the British Library hasn't?

4.) What are the potential implications of a book being labelled "extremist"?

5.) Will the subject headers for a CiP record be revised at one point, given that the first edition of Fowler's book is already out on the market and has the "Extremist web sites" subject heading in its CiP record?

It is important to note a few issues arising from these discussions:

1. The print and electronic editions of Andrew Fowler's book "The Most Dangerous Man in the World" as published by Melbourne University Press, have been permanently tagged with the label "Extremist web sites" (as @CassPF remarked).

Whereas it may be rather simple for the publisher to update the ebook version, the hardcopy edition cannot be easily revised and most certainly will not be pulled and revised due to a "simple" controversial subject heading.

This permanent record of a subject heading chosen by the NLA and the US Library of Congress might be of historical value decades from today, as it reflects the mind-set of the NLA's/LOC's cataloguing-in-publication units at that specific point in time (remember that William S. Burroughs's "Naked Lunch" was banned for obscenity some 50 years ago but has been freely available in book stores for quite a while).

2. It is also a legitimate and important question to ask what the potential implications of this issue might be, as has been suggested above. @Asher_wolf wondered whether a book tagged with the "Extremist web sites" heading would be considered for purchase by high school librarians, while @CassPF pointed out that in the case of WikiLeaks, it is important to "preserve the meta-story accompanying it's core material; without prejudice or misinformation".

Furthermore, considering that a secret Grand Jury investigation on WikiLeaks and Julian Assange is currently underway in Alexandria, Va., the "extremist web sites" tag for a book associated with Julian Assange and WikiLeaks may also prove quite damaging, as @Asher_wolf stated.

3. Lastly, this choice of a subject heading does not appear to be politically neutral, which should ideally be the case in classification schemes in libraries and archives, as @nyxpersephone pointed out.

The search results for "extremist web sites" in NLA's and Library and Archives Canada's catalogues yielded four results each, all of them connected to books on WikiLeaks, while only two catalogue entries (one unrelated to WikiLeaks) were obtained for a similar search in the US Library of Congress catalogue (the largest library in the world, in numbers of items held). Based on these facts, one cannot help but wonder whether political neutrality of those library collections is still ensured.

Especially in the case of the NLA, with four different books being marked with the "Extremist Web sites" tag, an enquiry should be made with the person(s) in charge of cataloguing, asking them to check whether it is possible to reconsider their keyword/subject header attributions.

Andrew J Fowler, the author of "The Most Dangerous Man in the World", was contacted by several Twitter users on 16 & 17 July to seek his opinion on this matter. Mr Fowler replied that "this is news to him and that he will investigate" the issue.

Additionally, the publisher of Mr Fowler's book, Melbourne University Press, an imprint of Melbourne University Publishing, has been contacted via email by the author of this article to seek comment and clarification on the CiP record.

The National Library of Australia has also been contacted for comment and clarification on their process of choosing subject headers for Cataloguing-in-Publication records and for their own catalogue. Comments and statements have also been sought from the US Library of Congress and Library and Archives Canada. At the time of publication, no replies had been received from any of these institutions (the story will be amended/updated appropriately as soon as we hear back from them).

Meanwhile, feel free to contact the NLA, LOC and Library and Archives Canada as well as Melbourne University Publishing to make your own enquiries using the contact details below.

The National Library of Australia's Cataloguing-in-Publication division can be contacted for enquiries on existing CiP records via an online form and via postal mail at:

Cataloguing in Publication Unit
National Library of Australia
Canberra ACT 2600

Telephone: +61-(0)2-6262-1458

Fax: +61-(0)2-6273-4492

NOTE: The ISBNs of Fowler's book are 978-0-522-858-662 (paperback) and 978-0-522-860-528 (ebook); an ISBN is mandatory for contact with the NLA.

The US Library of Congress' Cataloguing-in-Publication unit can be contacted via postal mail at the following address:

Library of Congress
US & Publisher Liaison Division
Cataloging in Publication Program
101 Independence Avenue, S.E.
Washington, D.C. 20540-4283
United States of America

Additionally, the Cataloguing and Acquisition division of the Library of Congress can also be contacted.
Ms Barbara Tillett, chief of the Policy and Standards division, can be reached via email at policy@loc.gov.

General enquiries about and comments on the Library of Congress' cataloguing may be submitted using this online form.

Library and Archives Canada can be contacted via an online form. The Cataloguing and Metadata division of LAC can be contacted via email at standards@bac-lac.gc.ca and the CiP Coordinator office at the following email address cip@bac-lac.gc.ca.
The postal mail address of LAC is:

Library and Archives Canada
395 Wellington Street
Ottawa, ON K1A 0N4

Telephone: +1-819-994-6881
Toll free number in Canada: 1-866-578-7777
Fax: +1-819-997-7517
Website: www.collectionscanada.gc.ca/cip

Melbourne University Publishing contact details are as follows:

Melbourne University Publishing Ltd
187 Grattan Street
Carlton VIC 3053
Telephone: +61-(0)3-9342-0300
Fax: +61-(0)3-9342-0399

UPDATE 1: The National Library of Australia (@nlagovau on Twitter) has responded: "Categorisation of Wikileaks in our catalogue records has been updated. We thank everyone for their comments."

CassPF confirms: "To see listing of @nlagovau subject areas associated with all #Wikileaks books in catalogue see: http://t.co/aObKKUX Thank you NLA."

In a reply to the enquiry of the author of this article, Ms C Foster, Director of Australian Collections Management and Preservation Branch at the National Library of Australia stated:

"When creating the Catalogue in Publication record for The most dangerous man in the world by Andrew Fowler we referred to catalogue records for similar titles. The subject term "Extremist web sites" was one of the terms used in a record from the Library of Congress for a similar title and the term was reused in the NLA catalogue record for The most dangerous man in the world.
The inappropriateness of this term was not identified at the time. The record has been corrected this morning and the amended version is now reflected in our catalogue."

UPDATE 2: As of Tuesday, 19 July 2011, the US Library of Congress has removed all "Extremist web sites" subject headers in catalogue entries for books relating to WikiLeaks and its (former) staff.
This has been confirmed by LoC via their official Twitter account, @librarycongress. LoC stated that they "reviewed this matter and [had] taken steps to remove the "Extremist web sites" categorization from the record in question".

UPDATE 3: On Thursday, 21 July 2011, a spokesman for US Library of Congress, Mr J Sayers, stated in an article at CNET that Library of Congress "was not responsible for categorizing a WikiLeaks-related book as "extremist" and had decided to remove that label". Mr Sayers further said that "LoC adopted this classification in its catalog automatically after another major library system [...] had applied it to a recent book about the document-leaking Web site."
Sharing cataloguing notes is common practice among libraries and e.g. serves to avoid double entries in catalogues. This technique is known as copy-cataloguing.

As this statement from LoC somewhat contradicts the statement made by the National Library of Australia, both libraries were contacted on Friday, 22 July 2011 for further clarification on this issue.

The purpose of 6XX subject headings in LoC CiP data

The first job of any public, school, research, or depository library is to make information available to patrons. For catalogers, that means providing accurate records of the library holdings patrons can search to find the information they are looking for. Catalogers rely on specific sources to do this work, citing rules for accuracy.

Libraries either operate their own cataloging departments, or outsource to companies that produce library databases to implement within their library. Smaller libraries rely more on these outsourcing companies to save money. Larger national libraries, university libraries and so on usually have cataloging departments.

With this in mind, libraries don't create databases of their holdings from whole cloth. This is where OCLC comes in. This database provides billions of records that libraries and cataloging companies import to their databases, then make modifications to suit the needs of their library holdings. OCLC provides records for monographs, serials, video, audio, toys, documents, microfilm, artifacts etc. The big boy on the block is the Library of Congress, which has the three character code of DLC on their holding records in OCLC.

In the four years I worked as a cataloging assistant for a large university/depository library, my work revolved around searching for records on OCLC. My job was to find the best quality record available in the database and import it into my library's database for use in cataloging media. Specifically, I always looked for a DLC record first because those were considered the highest quality records in the system. Records imported from other libraries would be used only if a DLC record was not available. If no record could be found, then our library would produce a record for the media and import it to OCLC. For the majority of libraries, this is not a common practice.
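The selection order described above (DLC record first, then any other library's record, then an original record only if nothing is found) can be sketched roughly as follows. This is only an illustration of the decision logic, not OCLC's actual interface: the `CandidateRecord` class, the `quality` score and the `pick_record` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateRecord:
    holding_code: str  # three-character library code, e.g. "DLC" for Library of Congress
    title: str
    quality: int       # hypothetical completeness score; higher is better


def pick_record(candidates: List[CandidateRecord]) -> Optional[CandidateRecord]:
    """Prefer a DLC record; otherwise fall back to the best record from any library.

    Returns None when no record exists, in which case the library would have
    to create an original record itself.
    """
    dlc_records = [r for r in candidates if r.holding_code == "DLC"]
    pool = dlc_records or candidates
    if not pool:
        return None
    return max(pool, key=lambda r: r.quality)
```

Note that under this rule a DLC record wins even when another library's record scores higher, which is also why anything contained in a DLC record tends to be copied into many other catalogues.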

Now, for the subject of subject headings. Subject headings are created by another part of the cataloging department called authorities or authority control. This is another database that the library holdings database references when locating a piece of media searched for by the patron. There are authority control records for authors, titles, publishers, subjects and so on. Everything necessary to connect the searcher with the right media. Authority departments create subject headings based on subjects covered in the media. If an author includes a chapter on "extremist websites" in his book, one of the subject headings in the library database cataloging record will be "extremist websites." Once again, the whole point of this is to help the patron find what they are looking for.

If books about Wikileaks contain this subject heading, does that particular term come up in the book? If so, then other books about Wikileaks or written by members of Wikileaks will probably have that subject heading associated with their media. For example: If I search for books written by historian Norman Davies, I will also get hits from other historians who write about the same historical period he does, or if the book cites the work of Norman Davies in their own work. I hope this makes sense.
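To make the search behaviour above concrete, here is a minimal sketch of a subject index: every record that carries a given heading comes back when a patron searches for that heading. The record layout, the placeholder titles and both function names are assumptions made for illustration only; a real catalogue stores topical subject headings in MARC 6XX fields (such as 650) and validates them against authority records.

```python
from collections import defaultdict
from typing import Dict, List


def build_subject_index(records: List[dict]) -> Dict[str, List[str]]:
    """Map each subject heading (case-insensitively) to the titles that carry it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        for heading in record["subjects"]:
            index[heading.casefold()].append(record["title"])
    return index


def search_by_subject(index: Dict[str, List[str]], heading: str) -> List[str]:
    """Return all titles filed under the given heading, or an empty list."""
    return index.get(heading.casefold(), [])


# Placeholder records, not real catalogue data.
records = [
    {"title": "Book A", "subjects": ["Whistle blowing", "Extremist Web sites"]},
    {"title": "Book B", "subjects": ["Extremist Web sites"]},
    {"title": "Book C", "subjects": ["Official secrets"]},
]
index = build_subject_index(records)
hits = search_by_subject(index, "extremist web sites")  # Book A and Book B
```

A heading copied into one record therefore groups that book with every other item filed under the same term, whatever those items are about.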

As for OCLC and their leadership team. I remember reading the bio of the current president's predecessor, and it mentioned him coming to OCLC after retiring from the Pentagon. I don't recall his work for RAND being mentioned. I thought that was curious too when I read it, but didn't dwell on it at the time.

I hope this explanation helps. If you have questions feel free to ask.


Thanks, very informative.


Taxonomy and information control

Further digging during the #wlcat crowdsourced investigation on Twitter revealed that OCLC is responsible for defining catalogue classifications used by LOC.

The definition as set by OCLC equates the concept of "extremist website" with terrorist, hate and neo-Nazi websites.

This label is then erroneously applied to Wikileaks - recipient of many human rights and peace prizes.

Further investigation reveals all Presidents of the OCLC - past and present - have a military, intelligence or RAND Corporation career history.



We need to ask: why are our libraries being overseen by men whose histories reveal links with war-mongers?

Library of Congress

I've contacted Crown Publishers, who are the US publishers for "Inside WikiLeaks." I've also contacted Daniel Domscheit-Berg to see if he had heard of this or had any insight into the situation.

Presently, it's unclear whether the US Library of Congress is responsible for the improper classification, or if it was an error of the publisher.

