Hebrew is at the Forefront of Non-Latin-Based Digitization Efforts

National Library of Israel and University of Haifa Library partner with JSTOR, a popular online library for academic journals, to establish a process for displaying content written in languages that use non-Latin character sets.

The Judaica and Oriental reading room at the National Library of Israel; courtesy.

by Jeffrey Barken
Jewish news – JNS.org

Preserving scholarship, digitizing Hebrew text, and dramatically increasing access to archived scholarly materials written in Hebrew. Those are all likely outcomes of a recent contract between the National Library of Israel (NLI), the University of Haifa Library (UHL), and JSTOR (short for Journal Storage), the popular online library for academic journals.

For JSTOR, however, the project signals an additional opportunity.

Establishing a process for displaying content written in languages that use non-Latin character sets will facilitate JSTOR’s mission to disseminate high-quality scholarship produced worldwide. The organization has enlisted a digitization service provider, Apex CoVantage, which has assembled an international team tasked with developing digitization software that will meet JSTOR’s standards and support its collaboration with the Israeli libraries (dubbed the Hebrew Journals Project).

Digitization is the process by which print documents are converted to digital page images, which are then run through optical character recognition (OCR) to produce full-text files that search engines can sift through to register a document’s core contents. Since text formats are ideal for researchers seeking keyword-specific articles, libraries around the world recognize the need for smart technologies that will facilitate fast and precise digitization of content in spite of language barriers. Israel is leading the way in this technological realm.
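To illustrate why full-text output matters for keyword search, here is a minimal Python sketch. It assumes OCR has already produced one text string per page, and it is not a description of the software JSTOR or Apex CoVantage uses. The function names `strip_marks` and `build_index` and the sample pages are hypothetical. The sketch removes Hebrew vowel points before indexing, so a word matches whether or not the printed page included them.

```python
import unicodedata
from collections import defaultdict


def strip_marks(text):
    """Remove combining marks (such as Hebrew vowel points) from text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_index(pages):
    """Map each normalized word to the set of page numbers containing it.

    `pages` is a hypothetical dict of {page_number: ocr_text}.
    """
    index = defaultdict(set)
    for page_number, ocr_text in pages.items():
        for word in strip_marks(ocr_text).split():
            index[word].add(page_number)
    return index


# The word "shalom" written with vowel points (page 1) and without (page 2).
POINTED = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
PLAIN = "\u05e9\u05dc\u05d5\u05dd"

index = build_index({1: POINTED, 2: PLAIN})
print(sorted(index[PLAIN]))  # both pages match the unpointed query
```

A real pipeline would also have to handle right-to-left layout, punctuation and OCR errors, all of which this sketch ignores.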

The Hebrew Journals Project is funded primarily by the planning and budgeting committee of the Council for Higher Education in Israel. The cost is estimated at $2.2 million. NLI and UHL, the Israeli libraries, first contacted JSTOR in 2008.

“We sought an experienced international partner that would provide a sustainable future for the Hebrew journals project and a wide distribution network to make the journals available for millions of users,” Oren Weinberg, director of the NLI, tells JNS.org.

Since JSTOR was founded in 1995, the organization has added more than 1,600 journals and over 1 million images, letters and primary sources from nearly 900 publishers and other institutions. The shared digital library has helped academic institutions lower storage costs and improve access to scholarly resources. In 2009, JSTOR merged with and became a service of ITHAKA, a non-profit that shares JSTOR’s original mission. Today, more than 7,600 institutions – including academic and public libraries, secondary schools and other groups based in 166 different countries – participate in JSTOR.

“In 2011 alone,” Sarah Glasser, associate director of marketing communications at ITHAKA, tells JNS.org, “JSTOR counted over 560 million significant accesses to content listed on the platform.”

In 2008, librarians from both Israeli institutions selected four Hebrew journals to be digitized for the pilot stage of the project. That fall, representatives from NLI and UHL visited JSTOR’s offices in New York and Ann Arbor, Mich., to prepare for the pilot.

“One of the core things that we have accomplished is working together to define a set of digitization guidelines that are in line with JSTOR’s existing specifications, only specified further for Hebrew and for this project,” John Kiplinger, director of production at JSTOR, tells JNS.org. By early 2009, the pilot was underway.

To satisfy JSTOR’s specifications, and to develop a functional end product, it was necessary for JSTOR affiliates to be in regular contact with technical staff at UHL and NLI. Via conference call, technicians began considering and identifying the technical challenges posed by working with the Hebrew calendar as well as nuances in the Hebrew language that would complicate digitization. A librarian from UHL was sent to Ann Arbor in advance of the pilot to learn about JSTOR’s processes and to provide expertise on working with Hebrew.

Following this stage of development, JSTOR sought a vendor capable of processing Hebrew-language content and matching its professional digitization standards. Having worked with JSTOR in the past, Apex CoVantage is well versed in JSTOR’s procedures and was therefore the logical choice. The digitization service is now using its staff and offices in Hyderabad, India, to adapt software – originally developed for the digitization of texts using Latin characters – to read Hebrew. Additionally, Apex has sent representatives to Israel to assess the pilot documents and to recruit a team of Israeli consultants who will oversee nearly half of the production.

This fall, Apex will produce several thousand pages of digitized content. In the interim, the principal objective is to choreograph communications and isolate recurring errors in the digitization process. JSTOR will conduct a final analysis of all digitized materials to ensure that each data batch meets its stated requirements.

“The Hebrew Journals Project is not going to change the publishing industry in Israel,” Weinberg acknowledges. However, he is encouraged by the project’s potential for improving educational systems around the world. When asked how adaptable the new technology and digitization process will be when applied to another non-Latin-based language, Kiplinger is optimistic.

“We hope that the experience afforded to us through this project will make working with other character systems easier and more scalable for JSTOR,” he says. “Each character system will present new challenges that will have to be identified, analyzed and addressed. Consequently, we wouldn’t be ready to jump from Hebrew directly into Chinese or Arabic, but we also wouldn’t have to start from scratch.”