Tuesday, July 28, 2009

Perseus under PhiloLogic

The Perseus Project at Tufts University is the foremost Digital Library for the classical world, if not for the Humanities in general. In its collection of Greek and Roman materials, readers will find many of the canonical texts read today. The Greek collection approaches 8 million words and the Latin collection currently has 5.5 million. In addition, many English language dictionaries, other reference works, translations, and commentaries are included, so that anyone with an internet connection has access to the equivalent of a respectable College Classics library. The Perseus site is further enriched by intricate linking mechanisms among texts (resulting in more than 30 million links).

Here you will find the same texts, but with a different mechanism for browsing and searching them: PhiloLogic, a system developed especially for large textual databases by the ARTFL Project at the University of Chicago.

You can help us improve this site: If you encounter a problem, please use the "Report a Problem" link that you will find on the Results pages. In addition, we hope you will select the correct parses when you use the parse window. You will see your selection turn yellow; it will also be stored in the database.

The User Manual gives a general introduction to searching under PhiloLogic. This particular collection has its own special features, however. For a few quick hints to get you started, check out the Info and Help section on the full search forms.

What is PhiloLogic?

PhiloLogic™ is the primary full-text search, retrieval and analysis tool developed by the ARTFL Project and the Digital Library Development Center (DLDC) at the University of Chicago. This is a Free Software implementation of PhiloLogic for large TEI-Lite document collections. The wide array of XML data specifications and the recent deployment of basic XML processing tools provide an important opportunity for the collaborative development of higher-level, interoperable tools for Humanities Computing applications. The sophistication and power of the TEI-XML encoding specification support extremely rich representations of textual data, which encourage, if not require, sets of tools that exploit features of the encoded text to perform particular tasks. It may be that one general tool will never fit all possible uses of encoded documents, and that a set of more specialized, interoperable tools will instead provide a cost-effective way to deploy end-user applications.

As the ARTFL Project's contribution to the collaborative development of these tools, PhiloLogic has been enhanced to support a wide variety of TEI-Lite (XML and SGML) encoded documents, optionally using the Unicode character specification. We feel that Humanities Computing applications are particularly well suited to open source development by a community with wide-ranging technical abilities that is not well supported by the commercial sector. Our goal is to provide as many features as possible without requiring significant administrative or development work to use them effectively.

Originally implemented to support large databases of French literature, PhiloLogic has been extended to support a wide variety of textual and hypermedia databases in collaboration with numerous academic institutions and, more recently, commercial organizations. PhiloLogic is a modular system in which a textbase is treated as a set of coordinated or related databases. These typically include an object database (objects being units of text such as a letter, scene, document, etc.), a word forms database, a word concordance index mapped to textual objects, and an object manager mapping text objects to byte offsets in data files. Each of these databases is stored and managed using its own subsystem.
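To make the four components a little more concrete, here is a toy sketch in Python. It is not PhiloLogic's code and does not reflect its actual storage formats; the names `TextObject`, `build_textbase`, and `fetch_object` are hypothetical, and everything is held in memory purely for illustration. It only shows how an object list, a word forms table, a concordance index, and byte offsets into a data file might relate to one another.

```python
import re
from dataclasses import dataclass


@dataclass
class TextObject:
    """One unit of text (a letter, scene, document, ...). Hypothetical type."""
    object_id: int
    label: str
    start: int  # byte offset of the object's first byte in the data file
    end: int    # byte offset one past the object's last byte


def build_textbase(units):
    """Build four toy components from (label, text) pairs.

    Returns (data, objects, word_forms, concordance):
      data        - bytes standing in for the data file
      objects     - list of TextObject, standing in for the object database
      word_forms  - dict mapping each word form to its total frequency
      concordance - dict mapping each word form to (object_id, byte_offset) hits
    """
    data = bytearray()
    objects = []
    word_forms = {}
    concordance = {}
    for object_id, (label, text) in enumerate(units):
        start = len(data)
        encoded = text.encode("utf-8")
        data.extend(encoded)
        data.extend(b"\n")  # separator between objects in the toy data file
        objects.append(TextObject(object_id, label, start, start + len(encoded)))
        for match in re.finditer(r"\w+", text):
            form = match.group().lower()
            word_forms[form] = word_forms.get(form, 0) + 1
            # Convert the character position to a byte offset, since
            # non-ASCII characters (e.g. Greek) take more than one byte.
            offset = start + len(text[:match.start()].encode("utf-8"))
            concordance.setdefault(form, []).append((object_id, offset))
    return bytes(data), objects, word_forms, concordance


def fetch_object(data, objects, object_id):
    """Play the role of the object manager: slice an object out by byte offsets."""
    obj = objects[object_id]
    return data[obj.start:obj.end].decode("utf-8")
```

With this sketch, a search for a word form would look it up in `concordance`, and each hit's object id could then be passed to `fetch_object` to retrieve the surrounding text. A real system would keep each component in its own on-disk store rather than in Python dictionaries.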
