Mailing List Archives

Note: This document features as Chapter 11 in my Learning, Teaching and Researching on the Internet, published by Addison Wesley Longman, November 1998

Introduction: Mailing List Archives as a Resource

Mailing lists are a useful medium for listening in on communications on particular topics, keeping abreast of developments in particular fields and exchanging views with individuals who have particular interests or areas of expertise. Such considerations do not exhaust the potential usefulness of mailing lists to the social scientist. Past communications on many lists are archived. These can be searched for information on a wide range of matters, factual, conceptual and affective. In addition, mailing lists themselves are interesting subjects of social research respecting their patterns of interaction, online identities, narrative attributes, intensity of communications and characteristics of their participants, to mention a few dimensions.

Mailing lists potentially have three important advantages over more conventional typographical sources of information on many topics. Books and journal articles are frequently published one or more years after they were originally written. This invariably is at some considerable distance from when the original data collection was completed. The interval between completion of writing and publishing on mailing lists is more or less immediate. Commentary on contemporary issues is frequently engaged more or less immediately after they assume prominence on agendas. By the time some of the matters discussed on mailing lists are the subject of printed publications or conference presentations subscribers to mailing lists have frequently long since discussed them, sometimes in considerable detail.

The second important difference is that mailing list communications do not purport to deal with the `big idea', the major paradigm shift or the lengthy exegesis on the philosophy of X, Y or Z. If they do, they do so in small chunks. There is, of course, much useful and usable information, and many interesting hypotheses and conclusions, that never see the printing presses because they are insufficiently bounded by additional material. The third advantage is that mailing lists afford a reasonably efficient and rapid means of contacting individuals with expertise in particular areas. If you are looking for information on a particular subject, pointers to the research literature or an amplification of views expressed in other contexts, mailing lists can be a very good resource.

Mailing list archives have uses other than the mining of data. Many lists function as organising and mobilising platforms for special interest, pressure, support and shared orientation groups. There are a number of lists that focus on progressive social and human rights issues (PSN, REVS, ACTIV-L) whose members are active, partly through these lists, in bringing pressure to bear on governments, organisations or individuals with a view to securing particular ends. Amnesty International works some lists to galvanise support in furtherance of human rights objectives. During the active period of the uprising in Chiapas, Mexico, during 1995 and 1996, there was a lot of activity on some lists aimed at securing support for its leaders and constraining the activities of the Mexican authorities. Subscribers to lists have also been quite effective in campaigning for the rights of illegal migrants in California in response to the passing of Proposition 187 in 1994, and others have been campaigning for the lifting of sanctions against Iraq due to alleged consequences for the civilian population, particularly children. Subscribers have also successfully exerted pressure against alleged miscarriages of justice, as in the Mumia Abu-Jamal case in Pennsylvania. Mobilisation of opinion opposed to the Communications Decency Act was effectively channelled through many mailing lists.

There are other lists that function as support forums for individuals with shared interests, characteristics or preferences. Although their members may infrequently exchange communications that might be classified as `scholarly' they are very relevant to social scientists who are interested in understanding, explaining, and researching issues relating to the dispositions, interests or characteristics of members of such groups. They are also a rich source of `research' and `information' leads relating to individuals who do not appear on any lists that can be used for sampling purposes, or where the compilation of such lists can be expensive in both time and resources.

TRNSPLNT (@wuvmd.wustl.edu), for example, is a mailing list for those who have received transplanted organs. The exchanges on this list are archived. The addresses of those who are active participants can be read from the headers of their communications. Those of other subscribers can frequently be downloaded, as they are available to other subscribers. This is potentially useful information for anyone interested in conducting research on the experiences, medical histories, views, and transformations of identity of those who have had organ transplants. At the very least there are bound to be some subscribers who can provide leads to other sources of information and individuals. There are literally hundreds of other mailing lists focusing on medical dysfunctions, sexual dispositions, political identities, leisure pursuits, etc. All are potentially rich sources of varied categories of information for the social scientist. Moreover, their membership is ordinarily drawn from a much wider social and demographic base than the subjects who have been included in more traditional social scientific studies.

Mailing List Archive Database Commands

(1) Indexes and Files

Unlike newsgroups there is no search engine that can be employed to trawl either current or past postings to all or even a few mailing lists. To locate lists that are likely to deal with the issues that you are interested in that also archive their communications, use one of the mailing list database search facilities reviewed in a previous document in this series. You should also establish whether they maintain Web based archives. Catalist <http://www.lsoft.com/lists/list_q.html> provides information on whether Listserv lists have Web archives, although its information is not always accurate on this detail. Generally, Web based archive search engines cannot filter queries as powerfully as the database commands described in the remainder of this document.

Having located one or more lists with archives that you think may hold information on issues that you are interested in, the first thing that you need to do is to obtain an index of their files. This is accomplished by sending the command below to the administrative address of the list in the body of the message.

INDEX <listname>, e.g. [INDEX POWR-L]

(POWR-L is the Psychology of Women Resource List, aimed at sharing information on all matters relating to the psychology of women.) The mailer will respond with a message that will take the form of that illustrated in Figures 1 and 2. The returned document lists the files that are archived and provides various details concerning each entry.

My objective in what follows is to enable readers to download relevant files and construct queries so that they will be able to download pertinent information. I will not, therefore, elaborate on the complete set of Listserv database functions. A document explaining the latter for "general users" (release 1.5n) takes up 45 A4 pages, and can be obtained by sending the command [Info DATABASE] to the administrative address of any Listserv. You can obtain addresses by accessing Catalist.


Figure 1

Taking the first file entry in figure 1 for illustrative purposes, the list name (POWR-L) appears on the extreme left, followed by the file name, which is LOG9502. The period that it applies to is read from the right. In this instance the log is for the second month of 1995. At the extreme right of this file entry it is indicated that the log was started on the 7th of February. The log for a mailing list includes information on all the messages that were sent for the period indicated. This file will include information on all messages sent between 7th February and 1st March.
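The naming convention just described can be sketched in a few lines of Python. This is only an illustration: it assumes that monthly logs are always named LOG followed by a two-digit year and a two-digit month, as in LOG9502, and that the years fall in the 1900s. The function name parse_log_name is my own and is not part of Listserv.

```python
import re

def parse_log_name(filename, century=1900):
    """Split a monthly log name such as LOG9502 into (year, month).

    Assumes the LOGyymm pattern seen in Figure 1; other naming
    schemes are rejected rather than guessed at.
    """
    match = re.fullmatch(r"LOG(\d{2})(\d{2})", filename.upper())
    if match is None:
        raise ValueError(f"not a monthly log name: {filename!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {filename!r}")
    return century + year, month

print(parse_log_name("LOG9502"))
```

A list that keeps its logs on some other basis would need a different pattern, so check the index returned by the server before relying on this.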

The headers for the third and fourth columns in Figure 1 are GET and PUT. The row cells under the headers include the abbreviations PRV and OWN respectively. The GET signifies who is entitled to download (get) the file. The PUT specifies who is entitled to make files available on the server for downloading. On this particular list PRV designates list subscribers, as explained elsewhere in the document to which Figure 1 relates. Sometimes you will come across the category ANY instead of PRV. That indicates that those files are available to anyone, irrespective of whether they are subscribers to the list.

If you want to receive any of the files indicated you send the command:

GET <Listname> <Filename>, e.g. [GET POWR-L LOG9505 ]

It is essential to remember to include the listname in the command as the mailer is likely to handle a large number of different lists. You can request as many files as you want by including a space between filenames.
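If you send such requests regularly, the command lines can be assembled by a small helper. The sketch below follows the INDEX and GET formats given above; the function names are hypothetical, and the resulting text still has to be pasted into the body of a message addressed to the list's administrative address.

```python
def index_command(listname):
    """Return the command that asks for the index of a list's files."""
    return f"INDEX {listname}"

def get_command(listname, *filenames):
    """Return a GET command for one or more files of the same list.

    The list name comes first and the filenames are separated by
    single spaces, as described in the text.
    """
    if not filenames:
        raise ValueError("at least one filename is required")
    return " ".join(["GET", listname, *filenames])

print(index_command("POWR-L"))
print(get_command("POWR-L", "LOG9502", "LOG9505"))
```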


Figure 2

Figure 2 shows another portion of the same document illustrated in Figure 1. The same interpretative code applies. The second column, filetype, describes what the file is. This could be a syllabus, welcome message, text, bibliography or other category. To download a file you need to send a command to the administrative address including the filename, which is indicated in the extreme left column, and the listname.

GET POWR-L FREUD or GET POWR-L SELFHELP

In Figure 2 the first file that is indicated is the welcome message sent to new subscribers, and named POWR-L. The file named FREUD contains various messages that make some reference to Freudian paradigms, interpretation or practice.


(2) Batch Database Commands

This section will probably strike many readers as being the most difficult in the book. I suggest that you read through it first and then go over it subsequently as a practical exercise. Locate a number of lists that cover the topics that you are interested in, subscribe to them and then send them some of the batch commands illustrated below, substituting for the terms that I have included some that suit the lists you have subscribed to and your interests. I am afraid that at this Rubicon there is no alternative but to move from theoretical to practical activity. You should probably subscribe a day before you undertake this exercise. On some moderated lists it may take a few days before you find that you receive acknowledgement of your membership. This does not apply, of course, to lists that do not require subscription to access their files.

For the most part it is likely that you will want to search through the database of archived messages in order to locate specific information, rather than scrutinise files or logs. On a high volume list even searching the logs for a week can be very time consuming. Database searches can be accomplished by using what is known as a batch command. When you use a batch command you send a database job to a server via email. A database job is a sequence of commands that the server can process in the context of executing a database operation. After the job has been executed the results will be transmitted to you via email.

The syntax for a database job may strike you as being somewhat obscure. It is not necessary, however, to be overly concerned with the overall purpose of specific lines of syntax other than those that you need to compose yourself. The rest of the syntax, the obscure component, you just need to copy accurately. There is a standard format to a database job that you need to use, modifying it to include your particular search query. The basic structure of a search request is detailed in the Batch Command Template illustrated in Figure 3.


Figure 3

In this database job the only variable components are the line entries designated command 1 and command 2. In other words, when writing out a query you copy all the details of the Batch Command Template exactly as they are, varying only the entries for command 1 and command 2, as discussed below. These latter instruct the listserv to implement your search and do something with the results. The meaning of the rest is not something that you need be overly concerned about, although it is transparent that //JOB signifies the start of the batch command and //EOJ its ending. If you intend to search mailing list archives type out and save this template and then copy and paste it into your mail message when required, filling in only the details for command 1 and command 2 in accordance with your search requirements.

All this may be somewhat confusing so before proceeding to discuss how to structure commands 1 and 2 in the above template I will illustrate the procedure with an example. Below is a batch command, which I sent to the list ACTIV-L in order to retrieve details about its holdings of information on human rights in Colombia.

// JOB Echo=No

Database Search DD=Rules

//Rules DD *

Search "human rights" and Colombia in ACTIV-L

Index

...

/*

// EOJ

If you compare this with the batch command template in Figure 3 above, you will observe that the two are identical, except that Search "human rights" and Colombia in ACTIV-L has been substituted for command 1, and the word Index for command 2. Search "human rights" and Colombia in ACTIV-L instructs the listserv to search through its database and extract details on the location of documents including information on both human rights and Colombia. This is the Search command. The Index command instructs the listserv to provide an index of the findings resulting from the execution of command 1. Command 2, therefore, instructs the database to do something with the output of the search command, namely, to index it. The product of this database job will be a file whose details are similar to those illustrated in Figures 1 and 2.
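Since only the two command lines ever change, the template lends itself to being filled in mechanically. The following Python sketch wraps a pair of commands in the fixed lines copied from the ACTIV-L example above. The function name build_job is my own, and the line of dots that appears in the printed example is left out here on the assumption that it merely stands for further optional commands.

```python
# Fixed lines copied from the worked example; only the commands vary.
JOB_HEADER = ["// JOB Echo=No", "Database Search DD=Rules", "//Rules DD *"]
JOB_FOOTER = ["/*", "// EOJ"]

def build_job(command_1, command_2):
    """Return the body of a mail message containing one database job."""
    for command in (command_1, command_2):
        if not command.strip():
            raise ValueError("both commands must be non-empty")
    return "\n".join(JOB_HEADER + [command_1, command_2] + JOB_FOOTER)

print(build_job('Search "human rights" and Colombia in ACTIV-L', "Index"))
```

The printed text can be pasted directly into the body of a message to the administrative address of the list.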

There is, unfortunately, an additional minor complication. There are different releases of the LISTSERV mail management package. The batch database job commands referred to below are used for versions 1.8b or earlier. For these you need to include the above template, varying only the command lines 1 and 2 to meet your requirements when you send a database query, as illustrated in Figure 3. For releases 1.8c and later you only need to send command 1 in the body of your message. In other words, the surrounding syntax (e.g., // JOB Echo=No) above and below the command lines does not need to be included. Similarly, after you have received information relating to your query in the form of a listing of files that match it and you want to request that some of those files be sent to you, which is the substance of command 2, you do not need to include the syntax above and below it. To establish which version a list is using, send a message with the command RELEASE in the body of the message to the administrative address of the list, leaving the Subject line empty.
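The release rule can be expressed as a simple comparison. The sketch below assumes that you have already read the release identifier out of the reply to the RELEASE command by hand, and that it has the form of a number, a full stop, a number and an optional letter (1.8b, 1.8c, 1.5n). Both function names are hypothetical.

```python
import re

def release_key(release):
    """Turn a release string such as '1.8b' into a sortable tuple."""
    match = re.fullmatch(r"(\d+)\.(\d+)([a-z]?)", release.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)

def needs_job_template(release):
    """True for release 1.8b or earlier, which needs the full template."""
    return release_key(release) <= release_key("1.8b")

for release in ("1.5n", "1.8b", "1.8c"):
    print(release, needs_job_template(release))
```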

(2.1) The Search Command

The Search command is the first one that you use when you begin a database search and is positioned where command 1 appears in the batch command template illustrated above. All the other commands that can be issued are executed in relation to or upon the findings of a search command.

The search command has two components, which are referred to as the search rules and the optional rules. The term search rules is basically another term for search syntax. The term optional rules is applied to the syntax linking variables you can employ to constrain the search that you are conducting. In other words, the optional rules are appended to the search query and are employed to make it more focused. Before expanding on this, and to help in clarifying the distinction, look at the search commands below, all of which would be included as command 1 in the batch command template above.

 

Search "cocaine poisoning" in ADDICT-L

Search "cocaine poisoning" in ADDICT-L where subject contains -

(treatment or death)

Search "cocaine poisoning" in ADDICT-L where subject contains -

(treatment or death) FROM July 96 to Aug 97

In the first query the command requests a search for the term "cocaine poisoning" in the database of the list ADDICT-L. Search "cocaine poisoning" constitutes the search rules component. ADDICT-L is the database optional rules component. In the second query, the search rules content remains the same but I have added the constraint where subject contains (treatment or death). This requests that the search select out those records where the phrase "cocaine poisoning" occurs, but only where either of the terms treatment or death appears in the Subject: line entry of the message header as well. This appendage to the search rules, where subject contains (treatment or death) falls under the heading of keyword optional rules. The third query confines the search to a time band. This is referred to as the date optional rules component of the search command.

The simplest search string consists of a single word: GATT, Paris, Rousseau, punishment, etc. Frequently you will want to narrow the search further. If you are searching a list that focuses on psychoanalytic thought, the search string Reich might land a large number of files dealing with varying aspects of his work and life. Your specific interest might, however, be with his notion of the social psychology of authoritarian regimes. To locate such documents you need to employ search syntax in the search rules that is similar to that described earlier in connection with advanced searches using the Alta Vista search engine (Review pp. ). This could take the form

Search "Wilhelm Reich" AND ("race theory" OR "organi*ed mysticism") -

in PSYCHOANALYTIC-STUDIES

You can employ the logical operators AND, OR, and NOT, double inverted commas for phrases, and parentheses, as illustrated in the example below.

Search oceans AND ("toxic waste" OR nuclear) NOT -

"uranium isotope" in ENVINF-L

You can make the query as complex as you like to narrow the focus of your search. When working out how to formulate your query remember that operations in parentheses are performed prior to those outside of them, as in arithmetic and algebraic operations.

The default rule for a string of words not included in double inverted commas is Boolean AND. In other words if you include the query repressed memory syndrome, documents including all of these words will usually be identified. They will not, however, necessarily be sequentially proximate in the documents retrieved. If you want to retrieve documents in which the words appear in the same order and juxtaposed, you need to bound them with double inverted commas, as in "repressed memory syndrome".

Unless a search string is bounded by double inverted commas, a query including it will extract records that include it irrespective of case. If the search string is BSE, the records retrieved will include, if available, not only those that include BSE, but also bse, Bse, bSe, and other variants. In addition, the LISTSERV database functions do not require that query terms be surrounded by blanks. Thus, records that include absent, absent-minded, absentee, and similar words, all of which contain the letters bse, will also be retrieved. If you want to restrict your query to a specific search string then it should be included in double inverted commas to cut down on the number of records returned.
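The effect is easy to reproduce. The short Python sketch below is a rough model of the matching behaviour just described for unquoted terms, not the code that LISTSERV itself runs: it ignores case and does not require the term to stand alone as a word. The sample records are invented for the illustration.

```python
def matches(term, record):
    """Case-insensitive substring test, with no word boundaries."""
    return term.lower() in record.lower()

records = [
    "New BSE figures released",
    "Several members were absent from the meeting",
    "Notes on the absentee ballot",
    "Beef export statistics",
]

for record in records:
    print(matches("BSE", record), record)
```

Only the last record is rejected, because it is the only one in which the letters b, s and e never occur consecutively.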

(2.2) The Optional Rules

The optional rules component enables researchers to constrain their search queries by date, keywords, and a database list, consisting of one or more named databases (mailing lists).

(2.2.a) Date Rules

Dates can be specified using alternative date rules:

SINCE (e.g. SINCE JULY 1995)

FROM (e.g. FROM JULY 1996 TO APRIL 1997)

UNTIL (e.g. UNTIL MAY 1993)

A query might, therefore, take the following forms:

Search "potato famine" in H-ALBION SINCE FEB 97

Search "frustration aggression" AND (hypothesis OR theory) in AGGRESS FROM JAN 1996 TO AUG 1997

Dates may be specified in a number of alternative formats:

TODAY

yy (96)

mm (04)

<dd><->month name<-><yy> (17-06-97)

mm/yy (04/95)

yy/mm/dd (97/04/17)

yy-mm-dd (97-04-17)

 

By default, if you specify the month without a date (e.g. July) the records retrieved will include all those between 00:59:59, June 30, to 00:59:59, July 31.

(2.2.b) Keyword Rules

The second component of optional rules is keyword rules. The term has its origins in the fact that all messages to mailing lists have some common parameters. They all have a list name, subject and sender information and a message header and body. They all were sent at a specific time on a particular date, were entered into the database at a particular time, etc. These parameters can be utilised in the context of database organisation. The words designating these attributes are referred to as keywords. The format of keyword rules is:

WHERE/WITH keyword-expression

The keyword expressions that are likely to be of primary interest to most users are SUBJECT and SENDER. The former refers to the entry in the subject line of the message header and the latter refers to the author of the message. FROM can be employed as a substitute for SENDER. The format of search commands that include keyword rules is illustrated in the examples in Figure 4.

Figure 4

The terms IS and CONTAINS are referred to as comparison operators for WHERE/WITH clauses. The complete list of comparison operators is given in Figure 5.


Figure 5

The operators that are included on the same line are synonymous. The operator IS indicates identity (as in SENDER IS [identical to] "j-crow@ucla.edu"). Conversely, IS NOT signifies absence of congruence. The mathematical symbols only apply in expressions relating to numerical entities. CONTAINS/DOES NOT CONTAIN are homologous with includes/excludes. The last two operators SOUNDS LIKE/DOES NOT SOUND LIKE are employed for database searches in which you are uncertain of the precise spelling of the search or optional rule variables.

The syntax that can be employed in the formulation of the search rules component of the query can be used as well in the optional rules segment of the search command. That is, you can employ Boolean operators, parentheses, double inverted commas and a number of keywords. You could, for instance, formulate a search command like the following:

Search "A Monetary History of the United States" in EKONLIST -

WHERE Subject CONTAINS "review of" and SENDER IS -

((Smith or Jones) NOT Hutton)

If you were uncertain of the exact name of a sender, but thought that it sounded similar to something else, you could try:

Search Fournier in MARXISM WITH SENDER SOUNDS LIKE `Johns'

This should retrieve records with Johns, Jones, Johnston, Johnson, and similar, if they appear in the database.

(2.2.c) Database Lists

At last we have arrived back at something relatively simple. Having composed the search query and constrained it with the optional rules, it is, of course, necessary to specify the databases (mailing lists) to be searched. You can specify more than one database to be searched. Remember, however, that both databases must have the same address. That is, the same LISTSERV or other mail management package must manage them both on the same server. You can establish which lists are managed at a particular address by sending the command: List to the administrative address. Catalist <http://www.lsoft.com/lists/list_q.html> provides a link which informs you of all the lists that are included on the listserv of any list you select.

Having established the databases available you can compose the query to include all those you consider likely to be relevant. A query could take the following form:

Search <query> in <mailing list name 1> <mailing list name 2>

<other optional rules>

Having got this far you will no doubt appreciate that although the search procedure appears somewhat involved, it is also quite powerful in that you can refine your search to focus closely on a combination of topics/authors/time periods that interest you.
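The pieces discussed in this section can be put together programmatically. The following Python sketch joins a query, one or more list names, and optional date and keyword rules into a single Search command. The function name and its parameters are my own invention; the order in which the optional rules are emitted (date rule first, then the WHERE clause) is simply one of the orders used in the examples in this chapter.

```python
def build_search(query, lists, date_rule=None, keyword_rule=None):
    """Compose a Search command from its component parts.

    query        -- the search rules, e.g. '"potato famine"'
    lists        -- names of lists held on the same server
    date_rule    -- e.g. 'SINCE FEB 97' (optional)
    keyword_rule -- e.g. 'SUBJECT CONTAINS treatment' (optional)
    """
    if not lists:
        raise ValueError("at least one list name is required")
    parts = ["Search", query, "in", " ".join(lists)]
    if date_rule:
        parts.append(date_rule)
    if keyword_rule:
        parts.append("WHERE " + keyword_rule)
    return " ".join(parts)

print(build_search('"potato famine"', ["H-ALBION"], date_rule="SINCE FEB 97"))
```

The result is a single long line, which is how the server expects to receive it unless continuation dashes are used.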

 

 

Summary of Search Query Procedures

(Command 1 in Batch Query Template)

 

(1) Copy the batch command template reproduced below to the body of the message. (2) Compile your search query, taking into consideration the various factors discussed in this chapter. (3) Enter the name of the list you want to search and decide on the other variables forming part of the optional rules that you want to employ. When you have completed all that, the database job should have the following form:

 

 

// JOB Echo=No

Database Search DD=Rules

// Rules DD *

Search Query, e.g.

[SEARCH "police brutality" AND Los Angeles IN RIOT-L -

SINCE 1995 WHERE SUBJECT CONTAINS `Clinton']

<command 2>

...

/*

// EOJ

Figure 6

Final Note: The whole of the search command must appear on the same line. If you start a new line before the end of the query you must insert a - before pressing the Enter/Return key. In Figure 6 there is a dash after RIOT-L. It is best, therefore, to let the text word wrap and to press the Return key only when you want to begin a new line of search syntax.
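If your mail program insists on breaking long lines, the continuation dashes can be added for you. The Python sketch below splits a long command at blanks and ends every line but the last with a dash. It assumes, cautiously, that a phrase in double inverted commas should never be broken across lines, and the width of 70 characters is an arbitrary choice of mine rather than a Listserv limit.

```python
import re

def wrap_command(command, width=70):
    """Break a long search command into lines ending in ' -'."""
    # A token is a run of non-blank characters; a phrase in double
    # inverted commas is kept whole even if it contains blanks.
    tokens = re.findall(r'(?:"[^"]*"|\S)+', command)
    lines, current = [], ""
    for token in tokens:
        candidate = token if not current else current + " " + token
        if current and len(candidate) + 2 > width:
            lines.append(current + " -")   # mark the line as continued
            current = token
        else:
            current = candidate
    lines.append(current)
    return "\n".join(lines)

query = ('SEARCH "police brutality" AND Los Angeles IN RIOT-L '
         "SINCE 1995 WHERE SUBJECT CONTAINS 'Clinton'")
print(wrap_command(query, width=50))
```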

 

(3) The Output Commands

As you can see in the example given in Figure 6, the standard batch database job includes two commands, the first being the search command just discussed. The second command relates to what you want done with the results obtained from submitting the search command. There are two options here, INDEX and PRINT/GETPOST.

Ordinarily, when you send your first batch database job you will want to have returned a list of all the files that include references to your query. If the mailing list is being managed by Listserv release 1.8c or above, you will automatically be sent a list of the files that correspond with the search command submitted without having to do anything further. If the Listserv is release 1.8b or earlier, you substitute INDEX for command 2 in the batch database job template. To establish which version a list is using, send a message with the command RELEASE in the body of the message to the administrative address of the list, leaving the Subject line empty. The batch database job will now take the following form:

// JOB Echo=No

Database Search DD=Rules

// Rules DD *

Search "cocaine addiction" in Addict-L FROM FEB 1997

Index

...

/*

// EOJ

The file returned in response to the above database job is illustrated in Figure 7.


Figure 7

In addition to details relating to the file, further on in the document there will be a few lines from the start of each message, which may give you some idea of its relevance to your query. You now need to select which of the postings you want to look at. This is done using the PRINT or other command as may be specified in the document that is returned. In the document illustrated in Figure 7 it is specified that to retrieve a file GETPOST and filename should be used, and an example of the syntax is provided.

In this particular instance the command would be of the form:

GETPOST ADDICT-L 016997 017017

If some of the file numbers are consecutive and you wish to see all those that fall within a range, you can use the following syntax:

GETPOST ADDICT-L 016997-017009
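When the index lists many postings, the numbers can be collapsed into ranges automatically. The Python sketch below does this for the two forms of GETPOST shown above. It makes two assumptions that you should check against the instructions returned by your own server: that item numbers are written with six digits, as in the example, and that single numbers and ranges may be mixed in one command. The function name is hypothetical.

```python
def getpost_command(listname, numbers):
    """Build a GETPOST command, merging consecutive item numbers."""
    nums = sorted(set(numbers))
    if not nums:
        raise ValueError("at least one item number is required")
    ranges = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:          # still inside a consecutive run
            prev = n
            continue
        ranges.append((start, prev))
        start = prev = n
    ranges.append((start, prev))
    parts = [f"{a:06d}" if a == b else f"{a:06d}-{b:06d}" for a, b in ranges]
    return f"GETPOST {listname} " + " ".join(parts)

print(getpost_command("ADDICT-L", [16997, 17017]))
print(getpost_command("ADDICT-L", range(16997, 17010)))
```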

If the file returned does not specify how to request particular documents, and the GETPOST command results in an error message being returned, then you should use the PRINT command, which is inserted as command 2 in the batch database job template. Taking the above example again, the syntax would take the following form:

// JOB Echo=No

Database Search DD=Rules

// Rules DD *

Search "cocaine addiction" in Addict-L FROM FEB 1997

PRINT 016997 017017

...

/*

// EOJ

Summary and Conclusion

The database search functions of mailing list management packages are extremely powerful. It is true that the more intricate your search requirements, the more complicated the composition of an appropriate search command is likely to be. It is not necessary, however, to jump in initially at the deep end. Frequently the information required can probably be retrieved with a relatively simple query to an appropriate list, involving a word or a phrase, perhaps with date and/or subject constraints. A limited amount of practice with simple to middling complexity search commands will provide the necessary experience to experiment with more elaborate searches when required.

It is, in any event, a mistake to accept uncritically some of the hype about the Web, namely, that it is just a question of click and go. No social scientist with experience in library based or field research should expect that online research using databases running to hundreds of millions of megabytes of data will be either simple or problem free. Mailing list database features, on the other hand, do have the advantage of being extremely rapid in returning results once you acquire the requisite skills. The time required to acquire these is not extensive.

Those who do persevere should consult some of the online documents on database functions that are available from servers using LISTSERV and other mailing packages. My discussion above has not been exhaustive of the commands that are available.

Finally, it is worth noting that there are an increasing number of lists that maintain Web based archives. This means that instead of having to master rather arcane syntax, you access the URL of the mailing list archive and then use the search engine provided to select out messages that match your query. Simplicity, alas, is not everything. First, only a small proportion of mailing lists currently maintain Web based archives (12 out of 134 history, 16/151 women, 1/35 economics listserv lists, 25 February 1998). Secondly, the search facilities are frequently not as sophisticated as those of archives that employ the electronic mail database functions. Thirdly, you frequently cannot establish whether the list has a Web based archive before subscribing. Although CataList claims to indicate for LISTSERV based mailing lists whether the archives are Web based or not, its data is not always accurate.



Document compiled by Dr S D Stein
Last update 05/10/98
Stuart.Stein@uwe.ac.uk
