Introduction: Mailing List Archives as a Resource
Mailing lists are a useful medium for listening in on communications on
particular topics, keeping abreast of developments in particular fields and exchanging
views with individuals who have particular interests or areas of expertise. Such
considerations do not exhaust the potential usefulness of mailing lists to the social
scientist. Past communications on many lists are archived. These can be searched for
information on a wide range of matters, factual, conceptual and affective. In addition,
mailing lists themselves are interesting subjects of social research respecting their
patterns of interaction, online identities, narrative attributes, intensity of
communications and characteristics of their participants, to mention a few dimensions.
Mailing lists potentially have three important advantages over more conventional
typographical sources of information on many topics. Books and journal articles are
frequently published one or more years after they were written, and invariably at a
considerable distance from when the original data collection was completed. On mailing
lists, by contrast, there is virtually no interval between writing and publication.
Commentary on contemporary issues frequently begins almost as soon as they assume
prominence on agendas. By the time some of the matters discussed on mailing lists become
the subject of printed publications or conference presentations, subscribers have
frequently long since discussed them, sometimes in considerable detail.
The second important difference is that mailing list communications do not purport to
deal with the `big idea', the major paradigm shift or the lengthy exegesis on the
philosophy of X, Y or Z. If they do, they do so in small chunks. There is, of course, much
useful and usable information, many interesting hypotheses and conclusions that never see
the printed presses because they are insufficiently bounded by additional material.
Mailing lists also afford a reasonably efficient and rapid means of contacting individuals
with expertise in particular areas. If you are looking for information on a particular
subject, pointers to the research literature or an amplification of views expressed in
other contexts, mailing lists can be a very good resource.
Mailing list archives have uses other than the mining of data. Many lists function as
organising and mobilising platforms for special interest, pressure, support and shared
orientation groups. There are a number of lists that focus on progressive social and human
rights issues (PSN, REVS, ACTIV-L) whose members are active, partly through these lists,
in bringing pressure to bear on governments, organisations or individuals with a view to
securing particular ends. Amnesty International works some lists to galvanise support in
furtherance of human rights objectives. During the active period of the uprising in
Chiapas, Mexico, during 1995 and 1996, there was a lot of activity on some lists aimed at
securing support for its leaders and constraining the activities of the Mexican
authorities. Subscribers to lists have also been quite effective in campaigning for the
rights of illegal migrants in California in response to the passing of Proposition 187 in
1994 and others have been campaigning for the lifting of sanctions against Iraq due to
alleged consequences for the civilian population, particularly children. Subscribers have
also successfully exerted pressure against alleged miscarriages of justice, as in the
Mumia Abu-Jamal case in Pennsylvania. Mobilisation of opinion opposed to the Communications
Decency Act was effectively channelled through many mailing lists.
There are other lists that function as support forums for individuals with shared
interests, characteristics or preferences. Although their members may infrequently
exchange communications that might be classified as `scholarly' they are very relevant to
social scientists who are interested in understanding, explaining, and researching issues
relating to the dispositions, interests or characteristics of members of such groups. They
are also a rich source of `research' and `information' leads relating to individuals who
do not appear on any lists that can be used for sampling purposes, or where the
compilation of such lists can be expensive in both time and resources.
TRNSPLNT (@wuvmd.wustl.edu), for example, is a mailing list for those who have received
transplanted organs. The exchanges on this list are archived. The addresses of those who
are active participants can be read from the headers of their communications. Those of
other subscribers can frequently be downloaded, as they are available to other
subscribers. This is potentially useful information for anyone interested in conducting
research on the experiences, medical histories, views, and transformations of identity of
those who have had organ transplants. At the very least there are bound to be some
subscribers who can provide leads to other sources of information and individuals. There
are literally hundreds of other mailing lists focusing on medical dysfunctions, sexual
dispositions, political identities, leisure pursuits, etc. All are potentially rich
sources of varied categories of information for the social scientist. Moreover, their
membership is ordinarily drawn from a much wider social and demographic base than the
subjects who have been included in more traditional social scientific studies.
Mailing List Archive Database Commands
(1) Indexes and Files
Unlike newsgroups there is no search engine that can be employed to trawl either
current or past postings to all or even a few mailing lists. To locate lists that are
likely to deal with the issues that you are interested in that also archive their
communications, use one of the mailing list database search facilities reviewed in a previous document in this series. You should also
establish whether they maintain Web based archives. Catalist
<http://www.lsoft.com/lists/list_q.html> provides information on whether Listserv lists
have Web archives, although its information is not always accurate on this detail.
Generally, Web based archive search engines cannot filter queries as powerfully as the
database commands described in the remainder of this document.
Having located one or more lists with archives that you think may hold information on
issues that you are interested in, the first thing that you need to do is to obtain an
index of their files. This is accomplished by sending the command below to the
administrative address of the list in the body of the message.
INDEX <listname>, e.g. [INDEX POWR-L]
(POWR-L is the Psychology of Women Resource List, aimed at sharing information on all
matters relating to the psychology of women.) The mailer will respond with a message that
will take the form illustrated in Figures 1 and 2. The returned document lists the
files that are archived and provides various details concerning each entry.
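For readers who prefer to script such requests, the step can be sketched in Python. This is only an illustration: the addresses are hypothetical placeholders, the function name is invented for the sketch, and the message is built but not sent.

```python
from email.message import EmailMessage


def command_message(sender, admin_address, command):
    """Build a plain-text message whose body is a single server command."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = admin_address
    # No Subject header is set: the command travels in the body of the message.
    msg.set_content(command)
    return msg


# Hypothetical addresses; substitute your own and the list's administrative address.
msg = command_message("reader@example.org", "listserv@example.org", "INDEX POWR-L")
print(msg.get_content().strip())  # INDEX POWR-L
```

The resulting object could be handed to any mail-sending routine, or its body simply pasted into a mail client.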
My objective in what follows is to enable readers to download relevant files and
construct queries so that they will be able to download pertinent information. I will not,
therefore, elaborate on the complete set of Listserv database functions. A document
explaining the latter for "general users" (release 1.5n) takes up 45 A4 pages
and can be obtained by sending the command [Info DATABASE] to the administrative address
of any Listserv. You can obtain addresses by accessing Catalist.

Figure 1
Taking the first file entry in Figure 1 for illustrative purposes, the
list name (POWR-L) appears on the extreme left, followed by the file name, which is
LOG9502. The period that it applies to is read from the right. In this instance the log is
for the second month of 1995. At the extreme right of this file entry it is indicated that
the log was started on the 7th of February. The log for a mailing list includes
information on all the messages that were sent for the period indicated. This file will
include information on all messages sent between 7th February and 1st
March.
The headers for the third and fourth columns in Figure 1 are GET and PUT. The row cells
under the headers include the abbreviations PRV and OWN respectively. The GET signifies
who is entitled to download (get) the file. The PUT specifies who is entitled to make
files available on the server for downloading. On this particular list PRV designates list
subscribers, as explained elsewhere in the document to which Figure 1 relates. Sometimes
you will come across the category ANY instead of PRV. That indicates that those files are
available to anyone, irrespective of whether they are subscribers to the list.
If you want to receive any of the files indicated you send the command:
GET <Listname> <Filename>, e.g. [GET POWR-L LOG9505]
It is essential to remember to include the listname in the command as the mailer is
likely to handle a large number of different lists. You can request as many files as you
want by including a space between filenames.
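The naming pattern and the GET syntax just described lend themselves to a small helper. The Python sketch below assumes that monthly logs are always named LOG followed by a two-digit year and two-digit month, as in LOG9502; both function names are hypothetical.

```python
def log_filename(year, month):
    """Return a monthly log name such as LOG9502 (February 1995)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"LOG{year % 100:02d}{month:02d}"


def get_command(listname, *filenames):
    """Compose a GET command; filenames are separated by single spaces."""
    if not filenames:
        raise ValueError("at least one filename is required")
    return " ".join(["GET", listname, *filenames])


print(get_command("POWR-L", log_filename(1995, 2), log_filename(1995, 5)))
# GET POWR-L LOG9502 LOG9505
```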
Figure 2
Figure 2 shows another portion of the same document illustrated in Figure 1. The same
interpretative code applies. The second column, filetype, describes what the file is. This
could be a syllabus, welcome message, text, bibliography or other category. To download a
file you need to send a command to the administrative address including the filename,
which is indicated in the extreme left column, and the listname.
GET POWR-L FREUD or GET POWR-L SELFHELP
In Figure 2 the first file that is indicated is the welcome message sent to new
subscribers, and named POWR-L. The file named FREUD contains various messages that make
some reference to Freudian paradigms, interpretation or practice.

(2) Batch Database Commands
This section will probably strike many readers as being the most difficult in
the book. I suggest that you read through it first and then go over it subsequently as a
practical exercise. Locate a number of lists that cover the topics that you are interested
in, subscribe to them and then send them some of the batch commands illustrated below,
substituting for the terms that I have included some that suit the lists you have
subscribed to and your interests. I am afraid that at this Rubicon there is no alternative
but to move from theoretical to practical activity. You should probably subscribe a day
before you undertake this exercise. On some moderated lists it may take a few days before
you find that you receive acknowledgement of your membership. This does not apply, of
course, to lists that do not require subscription to access their files.
For the most part it is likely that you will want to search through the database of
archived messages in order to locate specific information, rather than scrutinise files or
logs. On a high volume list even searching the logs for a week can be very time consuming.
Database searches can be accomplished by using what is known as a batch command. When you
use a batch command you send a database job to a server via email. A database job is a
sequence of commands that the server can process in the context of executing a database
operation. After the job has been executed the results will be transmitted to you via
email.
The syntax for a database job may strike you as being somewhat abstruse. It is not
necessary, however, to be overly concerned with the purpose of specific lines of
syntax other than those that you need to compose yourself. The rest of the syntax, the
abstruse component, you just need to copy accurately. There is a standard format to a
database job that you need to use, modifying it to include your particular search query.
The basic structure of a search request is detailed in the Batch Command Template
illustrated in Figure 3.
Figure 3
In this database job the only variable components are the line entries designated
command 1 and command 2. In other words, when writing out a query you copy all the details
of the Batch Command Template exactly as they are, varying only the entries for command 1
and command 2, as discussed below. These latter instruct the listserv to implement your
search and do something with the results. The meaning of the rest is not something that
you need be overly concerned about, although it is transparent that // JOB signifies the
start of the batch command and // EOJ its ending. If you intend to search mailing list
archives type out and save this template and then copy and paste it into your mail message
when required, filling in only the details for command 1 and command 2 in accordance with
your search requirements.
All this may be somewhat confusing so before proceeding to discuss how to structure
commands 1 and 2 in the above template I will illustrate the procedure with an example.
Below is a batch command, which I sent to the list ACTIV-L in order to retrieve details
about its holdings of information on human rights in Colombia.
// JOB Echo=No
Database Search DD=Rules
//Rules DD *
Search "human rights" and Colombia in ACTIV-L
Index
...
/*
// EOJ
If you compare this with the batch command template, Figure 3 above, you will observe
that it is identical with it, excepting that for line command 1 there has been substituted
Search "human rights" and Colombia in ACTIV-L, and for command line 2 the word
Index. Search "human rights" and Colombia in ACTIV-L instructs the listserv to
search through its database and extract details on the location of documents including
information on both human rights and Colombia. This is the Search command. The command to
Index instructs the listserv to provide an index of the findings resulting from the
execution of command 1. Command 2, therefore, instructs the database to do something with
the output of the search command, namely, to index it. The product of this database job
will be a file whose details are similar to those illustrated in Figures 1 and 2.
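The template can also be filled in programmatically. The Python sketch below copies the fixed lines from the example job above and slots the commands between them. The function name is hypothetical, and the `...` placeholder line is left out on the assumption that it merely stands for further optional commands.

```python
def batch_job(*commands):
    """Wrap one or more database commands in the fixed batch job lines."""
    if not commands:
        raise ValueError("at least one command is required")
    header = ["// JOB Echo=No", "Database Search DD=Rules", "//Rules DD *"]
    footer = ["/*", "// EOJ"]
    return "\n".join(header + list(commands) + footer)


job = batch_job('Search "human rights" and Colombia in ACTIV-L', "Index")
print(job)
```

Only the arguments change from one query to the next; the header and footer lines stay as they are.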
There is, unfortunately, an additional minor complication. There are different releases
of the LISTSERV mail management package. The batch database job commands referred to below
are used for versions 1.8b or earlier. For these you need to include the above template,
varying only the command lines 1 and 2 to meet your requirements when you send a database
query, as illustrated in Figure 3. For releases 1.8c and later you only need to send
command 1 in the body of your message. In other words the abstruse syntax (e.g., // JOB
Echo=No) above and below the command lines does not need to be included. Similarly, after
you have received information relating to your query in the form of a listing of files
that match it and you want to request that some of those files be sent to you, which is
the substance of command 2, you do not need to include the syntax above and below it. To
establish which version a list is using send a message with the command RELEASE in the
body of the message to the administrative address of the list, leaving the Subject line
empty.
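The version rule can be expressed as a simple comparison. The sketch below, in Python, assumes that you have read the release identifier (for example 1.8b) from the server's reply by hand and that identifiers have the shape digits, dot, digits, optional letter. It does not attempt to parse the reply itself, whose layout is not described here, and the function name is hypothetical.

```python
import re


def needs_batch_template(release):
    """Return True for releases 1.8b or earlier, False for 1.8c and later."""
    match = re.fullmatch(r"(\d+)\.(\d+)([a-z]?)", release.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised release identifier: {release!r}")
    key = (int(match.group(1)), int(match.group(2)), match.group(3))
    # Tuples compare element by element: major, then minor, then letter.
    return key <= (1, 8, "b")


print(needs_batch_template("1.8b"))  # True
print(needs_batch_template("1.8c"))  # False
```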
(2.1) The Search Command
The Search command is the first one that you use when you begin a
database search and is positioned where command 1 appears in the batch command template
illustrated above. All the other commands that can be issued are executed in relation to
or upon the findings of a search command.
The search command has two components, which are referred to as the
search rules and the optional rules. The term search rules is basically another term for
search syntax. The term optional rules is applied to the syntax linking variables you can
employ to constrain the search that you are conducting. In other words, the optional rules
are appended to the search query and are employed to make it more focused. Before
expanding on this, and to help in clarifying the distinction, look at the search commands
below, all of which would be included as command 1 in the batch command template above.
Search "cocaine poisoning" in ADDICT-L
Search "cocaine poisoning" in ADDICT-L where subject contains (treatment or death)
Search "cocaine poisoning" in ADDICT-L where subject contains (treatment or death) FROM July 96 to Aug 97
In the first query the command requests a search for the term
"cocaine poisoning" in the database of the list ADDICT-L. Search "cocaine
poisoning" constitutes the search rules component. ADDICT-L is the database optional
rules component. In the second query, the search rules content remains the same but I have
added the constraint where subject contains (treatment or death). This requests that the
search select out those records where the phrase "cocaine poisoning" occurs, but
only where either of the terms treatment or death appears in the Subject: line entry of
the message header as well. This appendage to the search rules, where subject contains
(treatment or death) falls under the heading of keyword optional rules. The third query
confines the search to a time band. This is referred to as the date optional rules
component of the search command.
The simplest search string consists of a single word: GATT, Paris,
Rousseau, punishment, etc. Frequently you will want to narrow the search further. If you
are searching a list that focuses on psychoanalytic thought, the search string Reich might
land a large number of files dealing with varying aspects of his work and life. Your
specific interest might, however, be with his notion of the social psychology of
authoritarian regimes. To locate such documents you need to employ search syntax in the
search rules that is similar to that described earlier in connection with advanced
searches using the Alta Vista search engine (Review pp. ). This could take the form
Search "Wilhelm Reich" AND ("race theory" OR "organi*ed mysticism") in PSYCHOANALYTIC-STUDIES
You can employ the logical operators AND, OR, and NOT, double inverted
commas for phrases, and parentheses, as illustrated in the example below.
Search oceans AND ("toxic waste" OR nuclear) NOT -
"uranium isotope" in ENVINF-L
You can make the query as complex as you like to narrow the focus of
your search. When working out how to formulate your query remember that operations in
parentheses are performed prior to those outside of them, as in arithmetic and algebraic
operations.
The default rule for a string of words not included in double inverted
commas is Boolean AND. In other words if you include the query repressed memory syndrome,
documents including all of these words will usually be identified. They will not, however,
necessarily be sequentially proximate in the documents retrieved. If you want to retrieve
documents in which the words appear in the same order and juxtaposed, you need to bound
them with double inverted commas, as in "repressed memory syndrome".
Unless a search string is bounded by double inverted commas, a query
including it will extract records that include it irrespective of case. If the search
string is BSE, the records retrieved will include, if available, not only those that
include BSE, but also bse, Bse, bSe, and other variants. In addition, the LISTSERV
database functions do not require that query terms be surrounded by blanks. Thus, a search for
absent will also retrieve records that include absent-minded, absentee, and similar. If you
want to restrict your query to a specific search string then it should be included in
double inverted commas to cut down on the numbers of records returned.
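The matching behaviour just described can be mimicked in a few lines of Python. This is an illustration of the rules as stated in the text, not of how LISTSERV is actually implemented; the function name is hypothetical.

```python
def record_matches(term, text):
    """Mimic the described rules: unquoted terms ignore case, quoted terms do not.

    In both cases the term may occur inside a longer word, since terms
    need not be surrounded by blanks.
    """
    if len(term) >= 2 and term[0] == '"' and term[-1] == '"':
        return term[1:-1] in text          # quoted: case is respected
    return term.lower() in text.lower()    # unquoted: case is ignored


print(record_matches("bse", "New BSE findings"))          # True
print(record_matches('"BSE"', "a bse report"))            # False
print(record_matches("absent", "the absentee landlord"))  # True
```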
(2.2) The Optional rules
The optional rules component enables researchers to constrain their
search queries by date, keywords, and a database list, consisting of one or more named
databases (mailing lists).
(2.2.a) Date Rules
Dates can be specified using alternative date rules:
SINCE (e.g. SINCE JULY 1995)
FROM (e.g. FROM JULY 1996 TO APRIL 1997)
UNTIL (e.g. UNTIL MAY 1993)
A query might, therefore, take the following forms:
Search "potato famine" in H-ALBION SINCE FEB 97
Search "frustration aggression" AND (hypothesis OR theory) in AGGRESS FROM JAN 1996 TO AUG 1997
Dates may be specified in a number of alternative formats:
TODAY
yy (96)
mm (04)
<dd><->month name<-><yy> (17-06-97)
mm/yy (04/95)
yy/mm/dd (97/04/17)
yy-mm-dd (97-04-17)
By default, if you specify the month without a date (e.g. July) the
records retrieved will include all those from 00:59:59, June 30, to 00:59:59, July 31.
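Choosing between the three date rules is mechanical and can be sketched as follows in Python. The function name is hypothetical, and the date strings are passed through unchanged, so it is up to you to supply them in one of the formats listed above.

```python
def date_rule(since=None, until=None):
    """Return a SINCE, UNTIL or FROM ... TO clause, or an empty string."""
    if since and until:
        return f"FROM {since} TO {until}"
    if since:
        return f"SINCE {since}"
    if until:
        return f"UNTIL {until}"
    return ""


print(date_rule(since="JULY 1995"))                   # SINCE JULY 1995
print(date_rule(since="JAN 1996", until="AUG 1997"))  # FROM JAN 1996 TO AUG 1997
print(date_rule(until="MAY 1993"))                    # UNTIL MAY 1993
```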
(2.2.b) Keyword Rules
The second component of optional rules is keyword rules. The term has
its origins in the fact that all messages to mailing lists have some common parameters.
They all have a list name, subject and sender information and a message header and body.
They all were sent at a specific time on a particular date, are entered into the data base
at a particular time, etc. These parameters can be utilised in the context of database
organisation. The words designating these attributes are referred to as keywords. The
format of keyword rules is:
WHERE/WITH keyword-expression
The keyword expressions that are likely to be of primary interest to most
users are SUBJECT and SENDER. The former refers to the entry in the subject line of the
message header and the latter to the author of the message. FROM can be employed as a
substitute for SENDER. The format of search commands that include keyword rules is
illustrated in the examples in Figure 4.
The terms IS and CONTAINS are referred to as comparison operators for
WHERE/WITH clauses. The complete list of comparison operators is given in Figure 5.
Figure 5
The operators that are included on the same line are synonymous. The
operator IS indicates identity (as in SENDER IS [identical to]
"j-crow@ucla.edu"). Conversely, IS NOT signifies absence of congruence. The
mathematical symbols only apply in expressions relating to numerical entities.
CONTAINS/DOES NOT CONTAIN are homologous with includes/excludes. The last two operators
SOUNDS LIKE/DOES NOT SOUND LIKE are employed for database searches in which you are
uncertain of the precise spelling of the search or optional rule variables.
The syntax that can be employed in the formulation of the search rules
component of the query can be used as well in the optional rules segment of the search
command. That is, you can employ Boolean operators, parentheses, double inverted commas
and a number of keywords. You could, for instance, formulate a search command like the
following:
Search "A Monetary History of the United States" in EKONLIST -
WHERE Subject CONTAINS "review of" and SENDER IS -
((Smith or Jones) NOT Hutton)
If you were uncertain of the exact name of a sender, but thought that
it sounded similar to something else, you could try:
Search Fournier in MARXISM WITH SENDER SOUNDS LIKE `Johns'
This should retrieve records with Johns, Jones, Johnston, Johnson, and
similar, if they appear in the database.
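A keyword rule has a regular shape (WHERE, keyword, comparison operator, value) and can be assembled by a small function. The Python sketch below recognises only the operators named in the text, since the complete list belongs to Figure 5; the function name and the operator set are assumptions of the sketch.

```python
# Only the operators mentioned in the text; the complete list may be longer.
OPERATORS = {
    "IS", "IS NOT",
    "CONTAINS", "DOES NOT CONTAIN",
    "SOUNDS LIKE", "DOES NOT SOUND LIKE",
}


def keyword_rule(keyword, operator, value):
    """Compose a WHERE clause such as: WHERE SUBJECT CONTAINS (a or b)."""
    op = operator.upper()
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"WHERE {keyword.upper()} {op} {value}"


print(keyword_rule("subject", "contains", "(treatment or death)"))
# WHERE SUBJECT CONTAINS (treatment or death)
```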
(2.2.c) Database Lists
At last we have arrived back at something relatively simple. Having
composed the search query and constrained it with the optional rules, it is, of course,
necessary to specify the databases (mailing lists) to be searched. You can specify more
than one database to be searched. Remember, however, that both databases must have the
same address. That is, the same LISTSERV or other mail management package must manage them
both on the same server. You can establish which lists are managed at a particular address
by sending the command: List to the administrative address. Catalist
<http://www.lsoft.com/lists/list_q.html> provides a link which informs you of all
the lists that are included on the listserv of any list you select.
Having established the databases available you can compose the query to
include all those you consider likely to be relevant. A query could take the following
form:
Search <query> in <mailing list name 1> <mailing list name 2> <other optional rules>
Having got this far you will no doubt appreciate that although the
search procedure appears somewhat involved, it is also quite powerful in that you can
refine your search to focus closely on a combination of topics/authors/time periods that
interest you.
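Putting the pieces together, a complete Search command is the search rules, the word "in", one or more list names, and any optional rules, in that order. A minimal Python sketch follows. The function name is hypothetical, and LIST-A and LIST-B are made-up names standing for two lists held at the same address.

```python
def search_command(query, lists, *optional_rules):
    """Compose: Search <query> in <list 1> <list 2> ... <optional rules>."""
    if not lists:
        raise ValueError("at least one list name is required")
    return " ".join(["Search", query, "in", *lists, *optional_rules])


print(search_command('"potato famine"', ["H-ALBION"], "SINCE FEB 97"))
# Search "potato famine" in H-ALBION SINCE FEB 97

print(search_command("Reich", ["LIST-A", "LIST-B"]))
# Search Reich in LIST-A LIST-B
```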
Summary of Search Query Procedures
(Command 1 in Batch Query Template)
(1) Copy the batch command template reproduced below to the body of the
message. (2) Compile your search query, taking into consideration the various factors
discussed in this chapter. (3) Enter the name of the list you want to search and decide on
the other variables forming part of the optional rules that you want to employ. When
you have completed all that, the database job should have the following form:
// JOB Echo=No
Database Search DD=Rules
// Rules DD *
Search Query, e.g.
[SEARCH "police brutality" AND Los Angeles IN RIOT-L -
SINCE 1995 WHERE SUBJECT CONTAINS `Clinton']
<command 2>
...
/*
// EOJ
Figure 6
Final Note: The whole of the search command must appear on the same
line. If you start a new line before the end of the query you must insert a - before
pressing the Enter/Return key. In Figure 6 there is a dash after RIOT-L. It is best,
therefore, to let the text word wrap and to only press the Return key when you want to
begin a new line of search syntax.
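If your mail program cannot be trusted to leave long lines alone, the continuation rule can be applied automatically. The Python sketch below breaks a command only between words, keeps phrases in double inverted commas on one line, and ends every line except the last with a dash. The function name and the default width are assumptions, and phrases quoted in other ways are not protected.

```python
import re


def wrap_command(command, width=72):
    """Split a long command into lines, marking each continuation with ' -'."""
    # Treat a double-quoted phrase as a single unbreakable token.
    tokens = re.findall(r'"[^"]*"|\S+', command)
    lines, current = [], ""
    for token in tokens:
        candidate = f"{current} {token}".strip()
        # Reserve two characters on each line for the trailing ' -'.
        if current and len(candidate) > width - 2:
            lines.append(current + " -")
            current = token
        else:
            current = candidate
    lines.append(current)
    return "\n".join(lines)


cmd = 'Search "police brutality" AND Los Angeles IN RIOT-L SINCE 1995'
print(wrap_command(cmd, width=40))
# Search "police brutality" AND Los -
# Angeles IN RIOT-L SINCE 1995
```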
(3) The Output Commands
As you can see in the example given in Figure 6, the standard batch database job
includes two commands, the first being the search command just discussed. The second
command relates to what you want done with the results obtained from submitting the search
command. There are two options here, INDEX and PRINT/GETPOST.
Ordinarily, when you send your first batch database job you will want to have returned
a list of all the files that include references to your query. If the mailing list is
being managed by Listserv release 1.8c or above, you will automatically be sent a list of
the files that correspond with the search command submitted without having to do anything
further. If the Listserv is release 1.8b or earlier, you substitute INDEX for command 2 in
the batch database job template. To establish which version a list is using send a
message with the command RELEASE in the body of the message to the administrative address
of the list, leaving the Subject line empty. The batch database job
will now take the following form:
// JOB Echo=No
Database Search DD=Rules
// Rules DD *
Search "cocaine addiction" in Addict-L FROM FEB 1997
Index
...
/*
// EOJ
The file returned in response to the above database job is illustrated in Figure 7.
Figure 7
In addition to details relating to the file, further on in the document there will be a
few lines from the start of each message, which may give you some idea of its relevance to
your query. You now need to select which of the postings you want to look at. This is done
using the PRINT or other command as may be specified in the document that is returned. In
the document illustrated in Figure 7 it is specified that to retrieve a file GETPOST and
filename should be used, and an example of the syntax is provided.
In this particular instance the command would be of the form:
GETPOST ADDICT-L 016997 017017
If some of the file numbers are consecutive and you wish to see all those that fall
within a range, you can use the following syntax:
GETPOST ADDICT-L 016997-017009
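Collapsing runs of consecutive numbers into ranges is easy to automate. The Python sketch below pads numbers to six digits, following the examples above. Whether single numbers and ranges may be mixed in one GETPOST command is an assumption of the sketch, as is the function name.

```python
def getpost_command(listname, numbers, pad=6):
    """Compose a GETPOST command, collapsing consecutive numbers into ranges."""
    nums = sorted(set(numbers))
    if not nums:
        raise ValueError("at least one posting number is required")
    parts = []
    start = prev = nums[0]
    # The trailing None flushes the final run.
    for n in nums[1:] + [None]:
        if n is not None and n == prev + 1:
            prev = n
            continue
        if start == prev:
            parts.append(f"{start:0{pad}d}")
        else:
            parts.append(f"{start:0{pad}d}-{prev:0{pad}d}")
        start = prev = n
    return " ".join(["GETPOST", listname, *parts])


print(getpost_command("ADDICT-L", [16997, 17017]))
# GETPOST ADDICT-L 016997 017017
print(getpost_command("ADDICT-L", [16997, 16998, 16999, 17017]))
# GETPOST ADDICT-L 016997-016999 017017
```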
If the file returned does not specify how to request particular documents, and the
GETPOST command results in an error message being returned, then you should use the PRINT
command, which is inserted as command 2 in the batch database job template. Taking the
above example again, the syntax would take the following form:
// JOB Echo=No
Database Search DD=Rules
// Rules DD *
Search "cocaine addiction" in Addict-L FROM FEB 1997
PRINT 016997 017017
...
/*
// EOJ
Summary and Conclusion
The database search functions of mailing list management packages are extremely
powerful. It is true that the more intricate your search requirements, the more
complicated the composition of an appropriate search command is likely to be. It is not
necessary, however, to jump in initially at the deep end. Frequently the information
required can be retrieved with a relatively simple query to an appropriate list,
involving a word or a phrase, perhaps with date and/or subject constraints. A limited
amount of practice with simple to middling complexity search commands will provide the
necessary experience to experiment with more elaborate searches when required.
It is, in any event, a mistake to accept uncritically some of the hype about the Web,
namely, that it is just a question of click and go. No social scientist with experience in
library based or field research should expect that online research using databases running
to hundreds of millions of megabytes of data will be either simple or problem free.
Mailing list database features, on the other hand, do have the advantage of being
extremely rapid in returning results once you acquire the requisite skills. The time
required to acquire these is not extensive.
Those who do persevere should consult some of the online documents on database
functions that are available from servers using LISTSERV and other mailing packages. My
discussion above has not been exhaustive of the commands that are available.
Finally, it is worth noting that there are an increasing number of lists that maintain
Web based archives. This means that instead of having to master rather arcane syntax, you
access the URL of the mailing list archive and then use the search engine provided to
select out messages that match your query. Simplicity, alas, is not everything. First,
only a small proportion of mailing lists currently maintain Web based archives (12 out of
134 history, 16/151 women, 1/35 economics listserv lists, 25 February 1998). Secondly, the
search facilities are frequently not as sophisticated as those of archives that employ the
electronic mail database functions. Thirdly, you frequently cannot establish whether the
list has a Web based archive before subscribing. Although CataList claims to indicate for
LISTSERV based mailing lists whether the archives are Web based or not, its data is not
always accurate.