
Describes the frequency of updates and documents material excluded from the Endeca catalog.

Types of updates:

There are two types of extracts from Aleph to Endeca, and a live-status update (which replaced the previous half-hourly extract):

* Delta (overnight) extracts: To save time on the processing and re-indexing that Endeca runs for all TRLN libraries after midnight, the updated records are sent out in two extracts per day. The selection logic checks for the following types of changes:

  • Bibliographic records updated or deleted that day
  • Holdings records updated or deleted that day
  • Item records updated or deleted that day
  • Items with a change in circulation status (see below for details)
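The selection logic above can be illustrated with a minimal sketch. The `Change` record, its field names, and the record-type labels are assumptions made for illustration only; they do not reflect the actual Aleph tables or the real extract script.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical change-log entry; the real Aleph data looks different.
@dataclass(frozen=True)
class Change:
    bib_id: str
    record_type: str  # assumed labels: "bib", "holdings", "item", "circ_status"
    changed_on: date


# Kinds of changes that qualify a bib record for the delta extract
# (updates and deletions are treated alike in this sketch).
DELTA_RECORD_TYPES = {"bib", "holdings", "item", "circ_status"}


def select_delta_bib_ids(changes, extract_day):
    """Return the sorted bib IDs with at least one qualifying change on extract_day."""
    return sorted({
        c.bib_id
        for c in changes
        if c.record_type in DELTA_RECORD_TYPES and c.changed_on == extract_day
    })
```

A bib record is selected once, no matter how many of its holdings or items changed that day.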

* Note, 20140624: Full extracts are generally not needed anymore.

Full extracts: these are currently done whenever a change to the extract logic affects a large number of records. They are initiated by ILS staff and applied manually to the TRLN database.

 

* Note, 20140624: The partial extracts were replaced by a live item status update, which checks for the same kinds of changes.

Partial (half-hourly) extracts: these run every half hour, on the hour and half hour. They only check for changes in circulation status, specifically:

  • Loans
  • Returns from loan
  • Renewals
  • Recalls and recall cancellations
  • Loan declared lost (i.e., long overdue and billed)
  • Item "claimed returned"
  • Loan deleted
  • Due date changed
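As an illustration, the circulation events listed above could be modeled as an enumeration. The event names and string values here are hypothetical labels, not Aleph's own codes.

```python
from enum import Enum


class CircEvent(Enum):
    """Circulation status changes checked by the half-hourly extract (labels are assumed)."""
    LOAN = "loan"
    RETURN = "return"
    RENEWAL = "renewal"
    RECALL = "recall"
    RECALL_CANCELLED = "recall_cancelled"
    DECLARED_LOST = "declared_lost"        # long overdue and billed
    CLAIMED_RETURNED = "claimed_returned"
    LOAN_DELETED = "loan_deleted"
    DUE_DATE_CHANGED = "due_date_changed"


def is_circ_status_change(event_name):
    """Return True if event_name is one of the tracked circulation events."""
    try:
        CircEvent(event_name)
    except ValueError:
        return False
    return True
```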

 

Timing of Updates

Partial (half-hourly) updates: Replaced by live status updates, where an item’s actual current status is read from Aleph through an X-server call whenever the catalog display is accessed. The description below documents the earlier half-hourly process.

* These run on our Aleph server on the hour and half hour, then are shipped to the TRLN server, where they are applied at 10 minutes and 40 minutes after the hour. In other words, a change made at 11:29 would be picked up in the 11:30 run and appear in Endeca around 11:40. A change made at 11:01 would also be picked up in the 11:30 run and appear in Endeca around 11:40.

* Exception for overnight processing: half-hourly files generated between midnight and the end of the overnight processing are not applied until the overnight processing is complete. Currently, overnight processing finishes around 10:00 a.m. and the pending circulation status changes should show up around 10:10 a.m.
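The timing rules for the half-hourly updates can be worked through with a short sketch. The function names are hypothetical, the 10:00 a.m. finish time is only the approximate figure given above, and the treatment of a change made exactly on the hour or half hour is an assumption.

```python
from datetime import datetime, time, timedelta

# Files are applied on the TRLN server 10 minutes after each run.
APPLY_DELAY = timedelta(minutes=10)


def next_half_hour_run(changed_at):
    """Return the first :00 or :30 run at or after changed_at (boundary handling assumed)."""
    floor = changed_at.replace(
        minute=(changed_at.minute // 30) * 30, second=0, microsecond=0
    )
    if floor == changed_at:
        return floor
    return floor + timedelta(minutes=30)


def visible_in_endeca(changed_at, overnight_done=time(10, 0)):
    """Estimate when a circulation status change would appear in Endeca."""
    run = next_half_hour_run(changed_at)
    overnight_end = datetime.combine(run.date(), overnight_done)
    # Files generated between midnight and the end of overnight processing
    # are held until that processing completes.
    if run < overnight_end:
        return overnight_end + APPLY_DELAY
    return run + APPLY_DELAY
```

For example, `visible_in_endeca(datetime(2012, 8, 30, 11, 29))` and `visible_in_endeca(datetime(2012, 8, 30, 11, 1))` both give 11:40 on the same day, while a change at 3:05 a.m. is held until 10:10 a.m.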

Delta (overnight) updates: This starts on our Aleph server at 12:01 a.m. Currently, two delta extracts are run per day. The first run contains the bulk of the day's changes and, on days with a lot of editing activity, may take up to 2 hours; the second, shorter run takes about 10 minutes. Each time, the file is shipped to the TRLN server, which begins processing all the TRLN updates at 2:15 a.m. Endeca also completely re-indexes the entire TRLN database. This process is finished and the new version made available around 10:00 a.m.

* 7 PM extract: Aleph extracts the bulk of the daily record changes, processes them and sends the resulting output to Endeca. Endeca starts processing the records through TRLN’s DUKE pipeline at 11:06 PM.

* 12 AM extract: Aleph extracts the remaining record changes from the earlier run, processes them, and sends the resulting output to Endeca. Endeca starts processing the records through TRLN’s DUKE pipeline at 12:20 AM.

* Note: The Endeca “daily extract” runs at 2:03 AM. It combines all the partner institutions’ daily record changes, processes them through the main TRLN pipeline, and re-indexes.

 

Exclusions from Updates (deliberate)

Note, 20140624: The exclusions lists below are from 8/30/2012 and need to be updated.

Entire bibliographic records are excluded in the following circumstances:

* The bib record has been deleted

* The bib record is SUPPRESSED or from the CRASH period

* The bib record has an item with one of these item statuses:

  • 24 (Laptop Computer)
  • 29 (Digital Video Camera)

* The bib record has an item with one of these material types:

  • DVCAM (Digital Video Camera)
  • TRIPD (Tripod)
  • UMBRL (Umbrella)
  • LAPTO (Laptop Computer)
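The bib-level exclusion rules above can be expressed as a single predicate. This is a sketch only: the dictionary keys (`deleted`, `suppressed`, `crash_period`, `items`, `item_status`, `material_type`) are hypothetical stand-ins for however the extract script actually represents a record.

```python
# Item statuses and material types that exclude the whole bib record.
BIB_EXCLUDING_ITEM_STATUSES = {"24", "29"}  # Laptop Computer, Digital Video Camera
BIB_EXCLUDING_MATERIAL_TYPES = {"DVCAM", "TRIPD", "UMBRL", "LAPTO"}


def bib_is_excluded(bib):
    """Return True if the entire bib record should be left out of the extract."""
    if bib.get("deleted") or bib.get("suppressed") or bib.get("crash_period"):
        return True
    # A single matching item is enough to exclude the whole bib record.
    return any(
        item.get("item_status") in BIB_EXCLUDING_ITEM_STATUSES
        or item.get("material_type") in BIB_EXCLUDING_MATERIAL_TYPES
        for item in bib.get("items", [])
    )
```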

 

Individual item records are excluded in the following circumstances:

* The item processing status does not display in the Classic Aleph OPAC, i.e.:

  • CA (cancelled)
  • EX (expected serial issue)
  • LO (lost)
  • NA (expected)
  • NP (not published)
  • OP (out of print)
  • OS (out of stock)
  • SU (suppressed)
  • WI (withdrawn)

* The item status is:

  • 22 (Passwords - Ford)
  • 30 (Preservation Copy)

* The material type is PASS (passwords - Ford)
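The item-level rules differ from the bib-level ones in that only the matching item is dropped and the rest of the record is still extracted. A minimal sketch, assuming each item is a dictionary with hypothetical keys `processing_status`, `item_status`, and `material_type`:

```python
# Item processing statuses that do not display in the Classic Aleph OPAC.
HIDDEN_PROCESSING_STATUSES = {"CA", "EX", "LO", "NA", "NP", "OP", "OS", "SU", "WI"}
ITEM_EXCLUDING_STATUSES = {"22", "30"}  # Passwords - Ford, Preservation Copy
ITEM_EXCLUDING_MATERIAL_TYPES = {"PASS"}


def item_is_excluded(item):
    """Return True if this single item should be left out of the extract."""
    return (
        item.get("processing_status") in HIDDEN_PROCESSING_STATUSES
        or item.get("item_status") in ITEM_EXCLUDING_STATUSES
        or item.get("material_type") in ITEM_EXCLUDING_MATERIAL_TYPES
    )


def extractable_items(items):
    """Keep only the items that pass the item-level exclusion rules."""
    return [item for item in items if not item_is_excluded(item)]
```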

 

* If you need to re-create an extract of specific records, there is a script in the /home/aleph/endeca/test directory called print3_ids.pl that will pull the bib numbers from a file so you can pull a fresh extract of those records.
