Help

What is this?

Welcome! You are using a new version of EEBO that makes it easier to search early modern texts.

This new functionality is the result of an ongoing research project by Professor Martin Mueller of Northwestern University to provide orthographic standardisation to a large archive of texts from Chadwyck-Healey and the Text Creation Partnership (TCP), including EEBO. The CIC CLI Virtual Modernisation Project is an initiative of the Center for Library Initiatives (CLI) of the Committee on Institutional Cooperation (CIC). It is being supported by ProQuest and the 13 member institutions of the CIC.

Virtual orthographic standardisation is now available to all users of EEBO and Literature Online. We welcome feedback and suggestions on the new functionality.

How does the Variant Spellings feature work?

The Variant spellings box appears on the Basic Search, Advanced Search and Periodicals Search screens. It is checked by default.

If you type a search term in the Keyword(s) box and the Variant spellings box is checked when you submit your search, you will automatically retrieve all instances of your search term and its early modern variant forms in EEBO. For example, if the box for Variant spellings is checked and you type the word murder in the Keyword(s) field, when you submit your search you will retrieve all occurrences of the word murder and its early modern variants murther, murdre, murdir and mvrder.

This will also work if you check the box and type search terms in other fields like Title keyword(s) and Imprint.

If you type a phrase in the Keyword(s) search field (for example Keyword(s): "so foul and fair a day") and the Variant spellings box is checked when you submit your search, your results will include instances of the phrase where the spelling varies (such as so foule and faire a day). Similarly, if you type a series of terms connected by Boolean or proximity operators in this field (for example jealous and green-eyed), your search will include all available spelling and typographic variants of each term (such as iealous, greene-eyed and greene eyd).
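The phrase matching described above can be pictured as a word-by-word comparison in which each position accepts any known spelling of the word you typed. The following is a minimal Python sketch under that assumption, not EEBO's actual implementation. The names VARIANTS, word_forms and phrase_matches are hypothetical, and the table holds only the two variants mentioned in the example.

```python
# Hypothetical variant table containing only the forms quoted in the text.
VARIANTS = {
    "foul": {"foule"},
    "fair": {"faire"},
}


def word_forms(word: str) -> set:
    """Return the typed word together with its known variant spellings."""
    return {word} | VARIANTS.get(word, set())


def phrase_matches(phrase: str, text: str) -> bool:
    """True if `text` contains `phrase`, allowing a variant at each position."""
    pattern = [word_forms(w) for w in phrase.lower().split()]
    tokens = text.lower().split()  # naive whitespace tokenisation
    n = len(pattern)
    return any(
        all(tok in forms for tok, forms in zip(tokens[i:i + n], pattern))
        for i in range(len(tokens) - n + 1)
    )


print(phrase_matches("so foul and fair a day",
                     "So foule and faire a day I have not seene"))  # True
```

The sketch ignores punctuation and typographic substitutions; it is meant only to show that each word of the phrase is expanded independently.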

When typing a search expression that includes truncation or wildcard operators (e.g. Keyword(s): je?lo?s*), you should uncheck the Variant spellings and Variant forms boxes.
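One way to picture a wildcard search is as a pattern compared directly against the spellings in the text, rather than a word expanded through a variant list. The sketch below uses Python's standard fnmatch module as a stand-in, assuming that ? stands for exactly one character and * for any run of characters. EEBO's own operator rules may differ, and the word list is made up for illustration.

```python
from fnmatch import fnmatchcase

# Assumed semantics: "?" = exactly one character, "*" = zero or more.
pattern = "je?lo?s*"

# Illustrative word list, not taken from EEBO.
words = ["jealous", "jelous", "jealousy", "iealous", "jealouse"]

matches = [w for w in words if fnmatchcase(w, pattern)]
print(matches)  # ['jealous', 'jealousy', 'jealouse']
```

Under these assumptions the pattern does not match iealous, because the literal j in the pattern is compared exactly as typed.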

How does the Variant Forms feature work?

The Variant forms checkbox appears on the Basic Search, Advanced Search and Periodicals Search screens. Unlike the Variant spellings checkbox, it is not checked by default. To activate the Variant forms functionality, you must click the Variant forms checkbox.

If you type a search term in the Keyword(s) box and the Variant forms box is checked when you submit your search, your search will be expanded to include inflected forms of your search term present in EEBO. For instance, if the box for Variant forms is checked and you type the word murder in the Keyword(s) field, when you submit your search you will retrieve all instances of the word murder together with its inflected forms murdered, murdering, murders etc.

Since the Variant forms feature is always used in conjunction with the Variant spellings feature, if you search with the Variant forms box checked your search will also automatically retrieve instances of early modern spelling variants of all the various inflected forms of your original search term, for instance murthred, murthrest, murdreth, murdring, murtherynge and murthers.

It is not possible to include Variant forms without also including Variant spellings. If you check the Variant forms box and uncheck the Variant spellings box, the Variant forms box will automatically uncheck itself.
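The relationship between the two checkboxes can be summarised in a few lines of Python. This is only a sketch of the behaviour described above: the class and method names are hypothetical, and the handling of a click on Variant forms while Variant spellings is unchecked is an assumption, since only the uncheck case is described here.

```python
from dataclasses import dataclass


@dataclass
class SearchOptions:
    variant_spellings: bool = True   # checked by default
    variant_forms: bool = False      # not checked by default

    def set_variant_spellings(self, checked: bool) -> None:
        self.variant_spellings = checked
        if not checked:
            # Variant forms cannot stay on without Variant spellings.
            self.variant_forms = False

    def set_variant_forms(self, checked: bool) -> None:
        # Assumption: the box stays unchecked if Variant spellings is off.
        self.variant_forms = checked and self.variant_spellings


opts = SearchOptions()
opts.set_variant_forms(True)        # both boxes are now checked
opts.set_variant_spellings(False)   # Variant forms unchecks itself
print(opts)
```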

The Variant forms feature will also work if you check the box and type search terms in other fields like Title keyword(s) and Imprint.

This process of expanding a search to include inflected forms of your original term is known as lemmatisation.

Check for variants

The Check for variants link appears to the right of the Keyword(s) search box on the Basic Search and Advanced Search screens. It is not available on the Periodicals Search screen.

The Check for variants display allows you to view early modern variant forms of your search term. It also allows you to lemmatise your term, showing you the inflected forms of a word or group of words together with all the early modern variant forms of those inflected forms. From this screen you can select specific variants of your search term to copy back into your search, allowing you to build a search query focussed on particular forms of your original term found in EEBO.

Figure: Example of spelling variants

It is important to note that there is a limit to how many manually selected search terms the EEBO search engine can process. If you have selected a large number of search terms using the Check for variants screen, your search is likely to be slow and may fail altogether. Where this happens it is advisable to try the search again using fewer search terms or to perform the search using the Variant spellings and Variant forms checkboxes on the Search page.

Are typographical variants included when I search using Variant spellings?

Early modern typographical conventions mean that in pre-1700 texts certain characters are often used interchangeably. For instance, the characters j and i are often exchanged, with the word juniper occasionally appearing as iuniper, and the word Ireland as Jreland. Similarly, u often appears as v, and vice versa, such that the word love often appears as loue, whilst usurper sometimes appears as vsurper. The letter w is occasionally represented by vv or uu, with worth appearing as vvorth or uuorth.

When you search with the Variant spellings box checked, you will automatically retrieve instances of your search term(s) in which any of these simple substitutions (i for j and vice versa, u for v and vice versa, and uu and vv for w) have taken place. Thus a search for the term woman will retrieve forms of this word featuring variant typography such as vvoman and uuoman (along with other old spellings of woman such as womanne and vvoeman).
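The simple substitutions listed above can be illustrated with a short Python sketch that generates every form of a term reachable by swapping i/j, u/v and w/vv/uu. This is an illustration only, not a description of how EEBO implements the feature; the names SUBSTITUTIONS and typographic_variants are hypothetical.

```python
from itertools import product

# Interchangeable characters described in the text above.
SUBSTITUTIONS = {
    "i": ("i", "j"),
    "j": ("j", "i"),
    "u": ("u", "v"),
    "v": ("v", "u"),
    "w": ("w", "vv", "uu"),
}


def typographic_variants(term: str) -> set:
    """Return every form of `term` produced by the simple substitutions."""
    choices = [SUBSTITUTIONS.get(ch, (ch,)) for ch in term.lower()]
    return {"".join(combo) for combo in product(*choices)}


print(sorted(typographic_variants("woman")))  # ['uuoman', 'vvoman', 'woman']
print(sorted(typographic_variants("love")))   # ['loue', 'love']
```

A generator of this kind produces many forms that never occur in print (juniper yields eight candidates, of which iuniper is one), so in practice the candidates would only matter where they match a word actually present in the texts.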

Note that some purely typographic variants of your search terms may not be listed on the Check for variants display even though they are present in EEBO. This is because the word lists on this screen only include early modern spelling and typographic variants that are present in the corpus of 13,000 keyed texts produced by the Text Creation Partnership. Typographic variants that occur only in the 100,000 bibliographic records in EEBO, and not in the Text Creation Partnership collection, are not displayed. However, when you search with the Variant spellings box checked, you will automatically retrieve instances of your search term(s) in which any of the typographic substitutions described above have taken place, regardless of whether these variants appear on the Check for variants lists.

Note that the Select from a list feature allows you to search and browse an alphabetical list containing every word found in the database, including all old-spelling forms and typographical variants.

More information about the CIC CLI Virtual Modernisation Project

From the time EEBO was first released in 1998, users and librarians have been concerned that the inconsistent spellings that occur in early modern English texts would cause users to miss many texts relevant to their research and thus limit their ability to use such resources to their full potential. Building on research being conducted by Professor Martin Mueller at Northwestern University, the mission of the Virtual Orthographic Standardisation Project is to develop a tool that allows both expert and non-expert users to search databases such as EEBO using modern English spellings and automatically retrieve instances of extant early modern spelling variants.

A 'standardised' spelling is typically, but not always, a 'modern' word form. Thus louynge and loues map to loving and loves respectively, but loueth maps to loveth, the standard spelling in which this archaic form appears in, say, the King James Bible.

Another key part of the Virtual Orthographic Standardisation project is the creation of lemmatisation data, which takes the process of standardisation one step further. Lemmatisation is the linguist's term for the practice of bundling the different forms of a word under the form in which the word is likely to appear in a dictionary. Thus loves, loved, and loving are forms of the lemma love. Lemmatisation allows users to look for all variant spellings of the standard spelling love or search for the lemma love (retrieving all variant spellings of the standard spellings love, loves, loveth, loving, and loved).
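The two levels of expansion described above (variant spellings of one standard spelling, and all standard spellings of one lemma) can be sketched as a pair of lookup tables. The tables below are hypothetical and contain only the forms of love quoted on this page; the project's real data is far larger, and the function names are illustrative.

```python
# Lemma -> standard spellings (sample data from the text above).
LEMMA_FORMS = {
    "love": ["love", "loves", "loveth", "loving", "loved"],
}

# Standard spelling -> early modern variant spellings (sample data).
SPELLING_VARIANTS = {
    "love": ["loue"],
    "loves": ["loues"],
    "loveth": ["loueth"],
    "loving": ["louynge"],
}


def expand_spelling(standard: str) -> list:
    """Variant spellings search: one standard spelling plus its variants."""
    return [standard] + SPELLING_VARIANTS.get(standard, [])


def expand_lemma(lemma: str) -> list:
    """Variant forms search: every standard form of the lemma, each expanded."""
    expanded = []
    for form in LEMMA_FORMS.get(lemma, [lemma]):
        expanded.extend(expand_spelling(form))
    return expanded


print(expand_spelling("love"))  # ['love', 'loue']
print(expand_lemma("love"))
```

In this sketch a lemma search always includes the spelling variants of each inflected form.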

Work on the project began in the summer of 2005 with a group of Northwestern undergraduates and graduate students working under the direction of Professor Mueller. Work has now moved into a more formal phase and is being carried on as a collaborative project between Professor Mueller and staff of the Academic Technologies group at Northwestern University.

The project has also extended its scope to include part-of-speech tagging. Part-of-speech tagging is necessary to resolve ambiguities (bee, doe, etc.), but its benefits extend far beyond this practical application.

When completed, the project will offer virtual orthographic standardisation and part-of-speech tagging for approximately a billion words of written English from the late fifteenth through the nineteenth century, including the Early English Books Online texts of the Text Creation Partnership (TCP) and the Chadwyck-Healey full-text collections of English Poetry, English Drama (including the Folio text of Shakespeare), Early English Prose Fiction, the King James Bible, Eighteenth-Century Fiction, Nineteenth-Century Fiction, and Literary Theory.

There are roughly three million distinct spellings in this collection of texts, including approximately 500,000 foreign words (mostly Latin and French) and approximately 250,000 names. It is estimated that 750,000 spellings account for at least 99% of all word occurrences. The current version of the functionality available to EEBO users focuses on mapping the spelling of English words to their standard forms. No effort has yet been made to map the spellings of names to standard forms, a task that presents problems of its own.

Reports on work-in-progress are available as PDF files from http://panini.northwestern.edu/mmueller/vospos.pdf.

CIC Member Institutions

The following institutions are members of the Virtual Modernisation Project, which has supported the development of the functionality now available to all users of EEBO:

  • Indiana University
  • Michigan State University
  • Northwestern University
  • Ohio State University
  • Penn State University
  • Purdue University
  • University of Chicago
  • University of Illinois at Urbana-Champaign
  • University of Iowa
  • University of Michigan
  • University of Minnesota
  • University of Victoria
  • University of Wisconsin at Madison
  • Columbia University

Feedback and Questions

This new version of EEBO is now available to all users of the service. Your feedback is important and will help us to further develop and refine variant spelling functionality in EEBO.

Please contact the EEBO Webmaster if you have any questions or suggestions.

Acknowledgements

We are grateful for the efforts of the following individuals who have worked on the Virtual Modernisation Project and who have made possible the resulting enhancements to EEBO:

Martin Mueller, Professor of English & Comparative Literature, Northwestern University
Jeffrey Garrett, Assistant University Librarian for Collection Management, Northwestern University
Phil Burns, Academic Technologies, Northwestern University
Jeff Cousens, Academic Technologies, Northwestern University
John Norstad, Academic Technologies, Northwestern University