A statistical summary of the activities of the Vanderbilt Television News Archive is available here: Dec 2006, Jan 2007.
A cumulative statistical summary of the Archive's activities for the 2006 calendar year is available here.
In January 2007 the Archive set an all-time record in the number of videotape loan requests fulfilled and in the income received for loan service fees.
In support of the project funded by the Kress Foundation to digitize the photographs from the Contini-Volterra collection, Marshall created a new version of the image management system. This version of the interface differs significantly from previous versions in that it uses MySQL as the relational database instead of DB/TextWorks. This will be the first library project to use this version of our infrastructure for digital projects.
Over the last year, we have seen a significant increase in activity on the servers that support the TV News Web interface and database. Increasingly, we are seeing the underlying database engine, based on DB/TextWorks, unable to keep up with the load. Database transactions that involve updating records at times fail to respond quickly enough, which can result in the queuing of multiple transactions. We see some episodes where a given transaction holds the indexes open improperly, causing other transactions to stack up. Overall, we have become a victim of our own success and need to migrate the system to a faster database engine.
We have long planned to replace DB/TextWorks with MySQL, an open source relational database management system that is well proven for this type of application. The TV News applications, along with the others that use the Perl-based digital library software, were designed from the beginning to be as independent of any given database as possible. The software separates the database from the application by using the Open Database Connectivity (ODBC) specification and by passing all database transactions using standard SQL syntax.
All database transactions are defined in a single file of Perl routines. This approach compartmentalizes the database-oriented functions into one file that can be adjusted as necessary to accommodate a different back-end database.
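The compartmentalization described above can be sketched roughly as follows. This is only an illustration, written in Python rather than the Perl used by the actual software, with the standard-library `sqlite3` module standing in for an ODBC connection. The `loans` table and all function names are hypothetical and are not taken from the TV News code.

```python
import sqlite3

# Hypothetical stand-in for the single file of database routines.
# Application code calls these functions and never writes SQL itself,
# so accommodating a different back end means editing this module only.

def connect(target=":memory:"):
    """Open a connection (sqlite3 here, purely as a stand-in back end)."""
    return sqlite3.connect(target)

def create_schema(conn):
    """Create the illustrative loans table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loans ("
        "id INTEGER PRIMARY KEY, requester TEXT, status TEXT)"
    )
    conn.commit()

def add_loan(conn, requester):
    """Insert a new open loan request and return its id."""
    cur = conn.execute(
        "INSERT INTO loans (requester, status) VALUES (?, ?)",
        (requester, "open"),
    )
    conn.commit()
    return cur.lastrowid

def loans_by_status(conn, status):
    """Return (id, requester) pairs for loans with the given status."""
    cur = conn.execute(
        "SELECT id, requester FROM loans WHERE status = ? ORDER BY id",
        (status,),
    )
    return cur.fetchall()
```

Because the rest of the application would see only `add_loan` and `loans_by_status`, differences between engines, such as placeholder syntax or connection setup, stay confined to this one file.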
The transition from DB/TextWorks to MySQL involved several steps:
The conversion of each of these databases involved several steps:
We had to decide whether to use the InnoDB or the MyISAM storage engine for each of the tables. InnoDB, a faster engine for rapid transactions, was implemented for all the tables except tvn. That table uses MyISAM, because only MyISAM supports the FULLTEXT indexing essential for searching the abstracts.
DB/TextWorks created indexes automatically. In MySQL, indexes had to be defined explicitly and queries optimized to take advantage of them. A simple IP lookup, for example, took 6 minutes to return against the unindexed tvnip table. Once the table was indexed, results returned instantaneously. FULLTEXT indexes were created within tvn for the title and abstract columns as well as for Reporters, with BTREE indexes produced for Date, RecordHeader, HeaderLink, and other fixed-length fields.
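The effect of an explicit index on a lookup like the one against tvnip can be illustrated with a small sketch. It uses Python's standard-library `sqlite3` module as a stand-in for MySQL, so the query planner output differs from what MySQL's `EXPLAIN` would show. The column names `ip` and `institution`, the index name, and the sample rows are all assumptions made for the example.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical layout for an IP lookup table; not the real tvnip schema.
conn.execute("CREATE TABLE tvnip (ip TEXT, institution TEXT)")
conn.executemany(
    "INSERT INTO tvnip VALUES (?, ?)",
    [(f"10.0.{i // 256}.{i % 256}", f"site-{i}") for i in range(5000)],
)

lookup = "SELECT institution FROM tvnip WHERE ip = ?"

# Without an index, the engine must scan every row of the table.
plan_before = query_plan(conn, lookup, ("10.0.1.7",))

# With an explicit index on the lookup column, it can seek directly.
conn.execute("CREATE INDEX idx_tvnip_ip ON tvnip (ip)")
plan_after = query_plan(conn, lookup, ("10.0.1.7",))

print(plan_before)
print(plan_after)
```

The first plan reports a full table scan and the second a search using the new index, which is the same kind of change that turns a lookup of several minutes into an immediate one on a large table.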
Over 323 separate Perl programs underlie the TV News Web site. These programs provide the user interface for staff and public users of the site and handle management of the digitization workflow, maintenance of the databases, and financial transactions related to loan requests and subscriptions. Almost all of these programs were reviewed and modified in the process of migrating the Web site.
One of the significant differences between DB/TextWorks and MySQL involves dates. All dates in the original databases had to be converted into the more standard form required by MySQL. MySQL performs more rigorous data typing and will not allow malformed data to be loaded. Several of the databases required significant data clean-up as a prerequisite to conversion.
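A date clean-up pass of this kind might look like the following sketch, written in Python for illustration. MySQL's DATE type expects the form YYYY-MM-DD. The input formats listed in `CANDIDATE_FORMATS` are assumptions, since the report does not say how dates were stored in DB/TextWorks, and both function names are hypothetical.

```python
from datetime import datetime

# Assumed legacy formats; the real DB/TextWorks formats are not documented here.
# Month-name parsing (%b, %B) depends on an English or C locale.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%b %d, %Y", "%d %B %Y", "%Y-%m-%d")

def to_mysql_date(raw):
    """Return the date as 'YYYY-MM-DD', or None if it cannot be parsed."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None

def split_clean_and_rejected(values):
    """Separate convertible dates from values that need manual clean-up."""
    clean, rejected = [], []
    for value in values:
        converted = to_mysql_date(value)
        if converted is None:
            rejected.append(value)
        else:
            clean.append(converted)
    return clean, rejected
```

Collecting the rejected values rather than guessing at them matches the clean-up step described above: malformed entries are set aside for correction before loading instead of being passed to a database that would refuse them.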
Once all the databases were created and loaded with the converted data, it was necessary to adjust almost all of the Perl scripts that make up the application. Only some of these changes were necessary to perform a lateral conversion, but we also wanted to take the opportunity to make a number of enhancements to the application. These enhancements included:
Other enhancements made during the transition included:
The migration to the new database is taking place in two phases. On Friday, Feb 9, 2007, all staff activity switched to the new system. We anticipate conversion of the public side by Feb 16, 2007. In the meantime, changes made on the staff side will not show up on the public server.
Marshall had previously met with Shanmuga Sundaram, who is the person in ITS responsible for developing storage strategies and infrastructure. As a result of these conversations, ITS has agreed to provide TV News with 6TB of storage to help us with our short-term storage needs. How longer-term needs will be handled is still under discussion.
Marshall worked on some revisions to the Perl script that extracts records from the TV News database in Dublin Core XML format. One of the problems that turned up involved ensuring that the records all passed XML validation as they were harvested into Primo. It took several tries, but Marshall was able to develop a Perl script that performed the necessary transformations to ensure the production of valid XML documents.
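The report does not say which transformations were needed. As one plausible illustration only, the sketch below (in Python, not the Perl of the actual script) removes control characters that XML 1.0 forbids and lets the XML library escape markup characters such as `&` and `<`. The record layout, the `record` wrapper element, and the function names are assumptions. The check it supports is well-formedness, not validation against a schema.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)

# Characters not permitted in XML 1.0 text (control codes other than
# tab, line feed, and carriage return, plus two non-characters).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

def clean_text(value):
    """Drop characters that would make the document ill-formed."""
    return _ILLEGAL_XML_CHARS.sub("", value)

def record_to_xml(record):
    """Serialize a dict of Dublin Core fields to an XML string.

    The 'record' wrapper element is a hypothetical choice for this sketch.
    """
    root = ET.Element("record")
    for field, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
        child.text = clean_text(value)  # ElementTree escapes &, <, >
    return ET.tostring(root, encoding="unicode")
```

Parsing the output again with `ET.fromstring` is a cheap way to confirm that each record is well-formed before it is handed to a harvester.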
Marshall participated in the site meetings with Ex Libris staff on Jan 15-16.
For part of December and January, Marshall's office was under renovation to remove a wall that encased a 4ft X 10ft space, originally created for plumbing, that is no longer used. Removal of this wall allowed the creation of a new door opening into the OUL suite, which makes the adjacent office more usable. Marshall worked from an unoccupied office on the 9th level during the renovation.
Marshall participated in regular committee meetings, including the Strategy and Planning Council, the Metadata Committee, and TV News Staff.
Marshall attended the American Library Association Midwinter Meeting in Seattle, WA. Conference activities included participating on the LITA Top Technology Trends panel, convening the SirsiDynix Large Sites Users Meeting, and holding many individual meetings with executives from library automation companies.
Marshall's regular column appeared in Computers in Libraries and he wrote several short articles for the October and November issues of ALA TechSource Smart Libraries Newsletter.