All the records, all the time

Morgan Strong's blog | Created 1 decade ago

One admirable decision that the WA Museum recently made was to publish all of the Museum’s records as a free resource on the website.

I had the personal responsibility of publishing 110 years' worth of records onto the site (but fortunately not the responsibility of scanning more than 12,000 individual pages). Although it was a lot of work, it was very satisfying knowing that we were the first museum in the country to release all records, free of charge for download on the web (correct me if I am wrong, but I believe that at most locations records still need to be requested).

The next step is to introduce a clever search engine that will search references, keywords and topics within the records.

All of the records have been published with OCR, so it would be fairly simple to index all the files with a search engine and then let it go. However, that wouldn't be very meaningful. We want to be able to weight authors, rank relevant keywords in terms of their context, rank the relevancy of publish dates, weight keywords in titles and headings versus in body content, and so on.
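
To make the idea of weighting concrete, here is a minimal sketch in Python. It is not the search engine we are building, just an illustration under some loud assumptions: the `Record` fields, the `FIELD_WEIGHTS` values, the `newest_year` default and the recency bonus are all hypothetical, and a real engine would also need stemming, phrase matching and context-aware ranking.

```python
import re
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration: a keyword in a title
# or author name counts for more than the same keyword in body text.
FIELD_WEIGHTS = {"title": 3.0, "authors": 2.5, "headings": 2.0, "body": 1.0}


@dataclass
class Record:
    # Hypothetical shape of one OCR'd record; not the Museum's actual schema.
    title: str
    authors: str
    headings: str
    body: str
    year: int


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record, query, newest_year=2010, recency_weight=0.5):
    """Sum weighted keyword matches per field, plus a small recency bonus."""
    terms = tokenize(query)
    total = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(getattr(record, field_name))
        matches = sum(tokens.count(term) for term in terms)
        total += weight * matches
    if total == 0.0:
        return 0.0  # no keyword matched, so recency alone should not rank it
    age = max(newest_year - record.year, 0)
    return total + recency_weight / (1 + age)


def search(records, query):
    """Return matching records, best score first."""
    scored = [(score(record, query), record) for record in records]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in hits]
```

With these made-up weights, a record with "spiders" in its title outranks a newer record that only mentions spiders in its body text, which is the kind of behaviour a plain full-text index would not give us.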

A meaningful search engine will significantly empower this new free resource, turning it from a document repository into a powerful scientific database in its own right.

We have already started developing a search candidate and plan to get it running in beta version within the next few months. Stay tuned.