July 2003
| 525 | new customers registered on the website |
| 54,264 | total entries in activity log |
| 11,305 | views of the home page |
| 3,389 | views of the search page |
| 5,146 | searches executed |
| 2,324 | calendars viewed |
| 13,566 | individual records viewed |
| 6,215 | program listings viewed |
View the Cumulative Table of Statistics for the Archive's activities.
Throughout July, training and testing of the new digital off-air recording facility took place in preparation for the final cutover planned for August 5th. The videotape-based system continued to operate through July, using the video signals and time stamp generated by the old off-air system. During this period we digitized simultaneously as we recorded onto videotape, which gave the Archive staff the opportunity to become more familiar with the digital equipment in advance of the final switchover. This month a total of 281 hours of video were digitized, bringing the total digitized to date to 775.5 hours.
One of the components of the grant from the National Science Foundation involved acquiring storage equipment for dealing with digital video files. We ordered a set of four Dell PowerEdge 2650 servers and a rack-mount cabinet to house them. Together the servers will provide cumulative storage of about 4 TB (500GB each). We will install this equipment in the server facility on the 4th level of the General Library Building. These servers will be used to store the RealMedia files that we create, and will also be equipped with the Helix Universal Server software for streaming video files. These four Dell PowerEdge systems will form the initial part of the cluster of servers that make up the video delivery system for the Archive. No content, of course, will be made available until we have legal agreements in place.
Transcoding Application. Marshall created a server-based distributed system for producing lightweight streaming video files in RealMedia format from our MPEG-2 masters. The design of the digital workstations of the new off-air recording facility includes the ability to perform this transcoding. However, each transcoding operation takes about 1.5 times the length of the program and uses a high percentage of the computer's processing power, so the transcoding step proved to be something of a bottleneck in the digitizing process.
To relieve this bottleneck, Marshall designed and implemented a server-based scheme for transcoding MPEG-2 files into RealMedia. This involved the creation of a database that provides a queue for files needing to be transcoded. An interface was created that allows the digitizing operator to simply select a file and submit it to the transcoding system. A record is created in the queue with a status of "Waiting." One or more servers participate in a distributed application for servicing the queue. Each of these servers is equipped with the Helix Producer Plus software that performs the transcoding. A set of Perl scripts constantly monitors the queue and submits a command to Producer if there is a file ready to be processed. Based on the data in the queue record, the script connects to the computer that submitted the request, retrieves the MPEG-2 file, and processes it with Producer to create the RealMedia file, which is deposited on the streaming media server. The application uses a set of status flags, stored in the database record, to control the flow of files and to indicate any errors. At any point a staff member can use the Web interface to monitor the jobs that are currently transcoding and to see the status of those that are waiting, finished, or that generated errors. There are currently two servers participating in the transcoding application. More will be added as needed.
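The queue-and-status-flag flow described above can be sketched roughly as follows. This is only an illustration in Python using an in-memory SQLite table, not the Archive's Perl scripts or its actual database. The table name, column names, and function names are hypothetical, and apart from "Waiting" the status values ("Transcoding", "Finished", "Error") are assumed labels. The step that fetches the MPEG-2 file and runs Producer is left out.

```python
import sqlite3

def create_queue(conn):
    # Hypothetical queue table: one row per file submitted for transcoding.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcode_queue ("
        " id INTEGER PRIMARY KEY,"
        " source_host TEXT NOT NULL,"
        " mpeg2_path TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'Waiting',"
        " worker TEXT)"
    )

def submit(conn, source_host, mpeg2_path):
    # The operator's submission creates a record with status 'Waiting'.
    cur = conn.execute(
        "INSERT INTO transcode_queue (source_host, mpeg2_path) VALUES (?, ?)",
        (source_host, mpeg2_path),
    )
    conn.commit()
    return cur.lastrowid

def claim_next(conn, worker):
    # Pick the oldest waiting job, if any.
    row = conn.execute(
        "SELECT id, source_host, mpeg2_path FROM transcode_queue "
        "WHERE status = 'Waiting' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The status check in the WHERE clause keeps two servers from
    # claiming the same record.
    cur = conn.execute(
        "UPDATE transcode_queue SET status = 'Transcoding', worker = ? "
        "WHERE id = ? AND status = 'Waiting'",
        (worker, row[0]),
    )
    conn.commit()
    return row if cur.rowcount == 1 else None

def finish(conn, job_id, ok):
    # Record the outcome so the monitoring page can report it.
    conn.execute(
        "UPDATE transcode_queue SET status = ? WHERE id = ?",
        ("Finished" if ok else "Error", job_id),
    )
    conn.commit()

def status_counts(conn):
    # Summary of the kind a monitoring interface might display.
    return dict(
        conn.execute(
            "SELECT status, COUNT(*) FROM transcode_queue GROUP BY status"
        )
    )
```

Because each server only acts on records it has successfully moved out of "Waiting", servers can be added to the pool without changing the submission side.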
Restructuring of TV News database. We implemented a significant change in the TV-NewsSearch database in the way that the records representing a whole broadcast are organized. Previously, the multiple segment records that make up a broadcast were simply gathered based on having the same record type, date, and network. While this scheme has worked well overall, it poses problems for the few cases where the collection includes multiple regular evening news broadcasts by the same network on the same day.
In order to accommodate these exceptions and provide a better organizational structure overall, a new record linking schema was introduced. For each broadcast, the initial record has always been defined as a header record, and carries information that pertains to the entire program. We created a HeaderLink field which, in every segment record in the broadcast other than the header record, is populated with the record key of the header record. This structure provides an unambiguous linking of all the records in a program and avoids the issues we encountered by gathering the records by descriptive fields.
Creating the change in the database schema was relatively simple. Populating the entire database with the linking data was more complex, since it involved changing almost all of the 725,000 records in the database. The data for the linking fields was created by a Perl script that Marshall developed to pass through the entire database, year by year, and write out the appropriate header record key for the HeaderLink field of every evening news segment record in the database. This script was run against a copy of the database on another server so as not to degrade performance of the production server. Once the files of HeaderLinks were produced, they were imported on the production server, using the batch import facility of DB/TextWorks. The HeaderLinks files were loaded one year at a time; each loaded in about 20 seconds, producing only a small impact on performance of the production server.
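As a rough illustration of the linking pass, the sketch below walks an ordered list of records and pairs each segment record with the key of the header record that precedes it. It is written in Python rather than Perl, and it assumes that records arrive in broadcast order and that each carries a flag identifying header records. The function name and the "key" and "is_header" names are hypothetical and are not the actual TV-NewsSearch fields.

```python
def build_header_links(records):
    """Return (segment_key, header_key) pairs for an ordered record list.

    Assumes the header record of each broadcast comes first, followed by
    that broadcast's segment records.
    """
    links = []
    current_header = None
    for rec in records:
        if rec["is_header"]:
            # A new broadcast starts here; remember its header key.
            current_header = rec["key"]
        elif current_header is not None:
            # Segment record: its HeaderLink is the current header's key.
            links.append((rec["key"], current_header))
    return links


# Two broadcasts by the same network on the same day stay separate,
# because linking follows the header key rather than date and network.
sample = [
    {"key": 1001, "is_header": True},
    {"key": 1002, "is_header": False},
    {"key": 1003, "is_header": False},
    {"key": 1004, "is_header": True},
    {"key": 1005, "is_header": False},
]
header_links = build_header_links(sample)
```

The resulting pairs correspond to the kind of file that could then be batch-imported one year at a time.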
Once all the HeaderLink fields were populated, all the programs involved in displaying records were adjusted to use the new schema. It was also necessary to enhance the script that loads the abstracts into the database so that the records it creates conform to this new record structure. This script was also enhanced to calculate the program duration, to automatically set the Digitized flag to on, since all new programs are recorded digitally, and to set the DateDigitized field to the program date.
Another change in the database involved creating a field that indicates the length of each regular news broadcast. This piece of data did not exist when the database was initially created. Some of the display and electronic ordering programs calculate the length of the program on the fly by subtracting the BeginTime of the first record from the EndTime of the last record. The absence of a stored duration field made it very difficult to calculate the total number of hours of regular evening news programs in the collection, since we have a mix of half-hour and full-hour broadcasts. Since we had to make a systematic pass through the database to gather the HeaderLink data, it was a good opportunity to also add a ProgramDuration field to each regular news header record. As the program processed each set of records that make up a broadcast to gather the HeaderLink data, it also used the BeginTime and EndTime data to calculate the number of seconds for each ProgramDuration field. This data was then imported into the production database for the 36,000 header records for evening news broadcasts.
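The duration arithmetic can be sketched as follows. This Python example assumes that BeginTime and EndTime are stored as "HH:MM:SS" strings, which the report does not state. The function names are hypothetical, and the sample times are invented for illustration only.

```python
def to_seconds(hms):
    # Convert an assumed "HH:MM:SS" string to seconds since midnight.
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def program_duration(segments):
    # Duration in seconds: EndTime of the last segment minus BeginTime
    # of the first. The modulo keeps the result positive if a broadcast
    # runs past midnight (an added assumption).
    begin = to_seconds(segments[0]["BeginTime"])
    end = to_seconds(segments[-1]["EndTime"])
    return (end - begin) % 86400

def total_hours(durations):
    # Sum stored ProgramDuration values (seconds) and convert to hours.
    return sum(durations) / 3600


# Illustrative broadcast with two segments.
broadcast = [
    {"BeginTime": "17:30:00", "EndTime": "17:42:10"},
    {"BeginTime": "17:42:10", "EndTime": "17:58:40"},
]
duration = program_duration(broadcast)
```

With the duration stored once per header record, a total for the collection is a simple sum over header records, regardless of the mix of half-hour and full-hour broadcasts.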
Populating the ProgramDuration data in the TV-NewsSearch database made it possible to then create reports that quantify the amount of material we have digitized. According to these reports, we now have 893 hours of programs converted to digital format.
Marshall met with Paul and Doug Knight regarding the ongoing work of ETANA. This year NSF funded a project to develop an application for gathering and aggregating archaeological data. The grant was awarded to Case Western Reserve University, with much of the work taking place at Virginia Tech. While Vanderbilt will not be extensively involved in the development phase, we will host the application once it is completed, about two years from now.
Marshall taught a session of the Retirement Learning workshops on "Searching the Web" and attended meetings of the Web Development Task Force, the Library Management Council, and the University Librarian's Advisory Group.
Marshall was on vacation from July 4 to 13.