[{"_path":"/blog/collections-as-data","_dir":"blog","_draft":false,"_partial":false,"_locale":"","_empty":false,"title":"Using Digital Collections as Data","description":"Collections as Data is a phrase bandied about the internet, and one that has recently grabbed our attention here at Wayne State. As far as I understand it, it means providing access to online digital library collections in a way that facilitates programmatic research. Here at Wayne State, we have been busy redesigning our digital collections platform, both its front-end display as well as the back-end data that powers the website (its API). In doing so, we couldn’t help but be a bit inspired by the collections as data movement and see if we could work toward tilting our collections to harness some of those concepts.","date":"May 5, 2017","minutes":"3 minute read","layout":"post","url":"collections-as-data","summary":"It's all about reducing barriers to access","body":{"type":"root","children":[{"type":"element","tag":"h1","props":{"id":"using-digital-collections-as-data"},"children":[{"type":"text","value":"Using Digital Collections as Data"}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"Collections as Data is a phrase bandied about the internet, and one that has recently grabbed our attention here at Wayne State. As far as I understand it, it means providing access to online digital library collections in a way that facilitates programmatic research. Here at Wayne State, we have been busy redesigning our digital collections platform, both its front-end display as well as the back-end data that powers the website (its API). In doing so, we couldn’t help but be a bit inspired by the collections as data movement and see if we could work toward tilting our collections to harness some of those concepts."}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"A bit of info on how our digital collections work. The front-end display (aka website) show alls the public-available image and digital resources from our repository. This data (metadata, rights statements, links to images, etc) all comes from our API. When a page loads, it queries the API for the associated data and after a bit of arranging, loads the page. These are two separate systems, each with unique ways of working. Users are free to use the front-end display to browse normally or to use the API to see all of our metadata ins our repository; the problem is that we had different ways of interacting with each. You might type in "},{"type":"element","tag":"a","props":{"href":"http://OUR_URL/item/ITEM_ID","rel":["nofollow"]},"children":[{"type":"text","value":"http://OUR_URL/item/ITEM_ID"}]},{"type":"text","value":" to see the web page for an item record and "},{"type":"element","tag":"a","props":{"href":"http://OUR_URL/WSUAPI/?functions%5B%5D=singleObjectPackage&PID=ITEM_ID","rel":["nofollow"]},"children":[{"type":"text","value":"http://OUR_URL/WSUAPI/?functions"},{"type":"element","tag":"span","props":{},"children":[]},{"type":"text","value":"=singleObjectPackage&PID=ITEM_ID"}]},{"type":"text","value":" to see the API data for that same item. For two systems that are tied together, this is not very intuitive for anyone. And, if you are into the idea of Collections as Data, you have added a confusing layer for anyone who wants to use the larger metadata store underneath the digital collections. 
We had the idea that people who were interested in the API (and doing some programmatic research with our collections) would first navigate the website and get familiar with its content, then switch to the API. Unfortunately, with the current setup, they would have to learn a completely new way of navigating with the API. This is where content negotiation comes in."}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"As part of the HTTP standard ("},{"type":"element","tag":"a","props":{"href":"https://en.wikipedia.org/wiki/Content_negotiation","rel":["nofollow"]},"children":[{"type":"text","value":"https://en.wikipedia.org/wiki/Content_negotiation"}]},{"type":"text","value":"), content negotiation is the ability to ask for different representations of the same information by specifying its content type. This could be something as simple as asking for HTML (text/html), PDF (application/pdf), or even structured data like JSON (application/json). Remember that I said earlier that we were rebuilding our API and our front-end display. We had determined that, regardless of the system used--the API or the website--we were basically showing people the same information, just in different display formats. Therefore, it made a lot of sense to make the systems mirror each other. Now, to be clear, the ultimate use of content negotiation would be to merge the two. First, you'd type in the URL, for example, "},{"type":"element","tag":"a","props":{"href":"http://OUR_URL/item/ITEM_ID","rel":["nofollow"]},"children":[{"type":"text","value":"http://OUR_URL/item/ITEM_ID"}]},{"type":"text","value":". Next, if you wanted structured data (normally available from an API), you'd ask for JSON; if you wanted a web page with arranged data, you'd ask for HTML, using that same URL. We're unfortunately not there yet. We need two different systems because they accomplish more than just what I've described (in other words, there are complications). But a nice middle ground, and one that gets us closer to providing researchers access to our collections as data, is to make sure the API and the front-end do the same thing with the same URL patterns."}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"Therefore, we have ended up with this: our API and our front-end are still two distinct systems, but you can get the same information from them by using the same URL pattern. If you want the API, you'll include the word api in the URL, and if you want the HTML web page, you'll remove it: "},{"type":"element","tag":"a","props":{"href":"http://OUR_URL/api/item/ITEM_ID","rel":["nofollow"]},"children":[{"type":"text","value":"http://OUR_URL/api/item/ITEM_ID"}]},{"type":"text","value":" vs. "},{"type":"element","tag":"a","props":{"href":"http://OUR_URL/item/ITEM_ID","rel":["nofollow"]},"children":[{"type":"text","value":"http://OUR_URL/item/ITEM_ID"}]},{"type":"text","value":"."}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"We're currently working to put this redesign into production. We hope that this will get us that much closer to making our digital collections more transparent and usable to end users looking for the data that powers it all. 
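"}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"To make the parallel concrete, here is a rough sketch of how the two representations could be requested once the redesign is live. OUR_URL and ITEM_ID are placeholders, and the final Accept-header request is only an illustration of where full content negotiation could eventually take us, not something the current setup supports."}]},{"type":"element","tag":"code","props":{"code":"# Same item, same URL pattern, two representations:\ncurl http://OUR_URL/item/ITEM_ID       # the HTML web page\ncurl http://OUR_URL/api/item/ITEM_ID   # the structured data from the API\n\n# If the two systems were ever fully merged, true content negotiation would mean\n# asking for a representation of a single URL via the Accept header:\ncurl -H \"Accept: application/json\" http://OUR_URL/item/ITEM_ID\n","meta":null},"children":[{"type":"element","tag":"pre","props":{},"children":[{"type":"element","tag":"code","props":{"__ignoreMap":""},"children":[{"type":"text","value":"# Same item, same URL pattern, two representations:\ncurl http://OUR_URL/item/ITEM_ID       # the HTML web page\ncurl http://OUR_URL/api/item/ITEM_ID   # the structured data from the API\n\n# If the two systems were ever fully merged, true content negotiation would mean\n# asking for a representation of a single URL via the Accept header:\ncurl -H \"Accept: application/json\" http://OUR_URL/item/ITEM_ID\n"}]}]}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"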
Look for a new blog post about this new system after it debuts!"}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"More resources:\n"},{"type":"element","tag":"a","props":{"href":"http://digitalpreservation.gov/meetings/dcs16/AsDataExecutiveSummary_final.pdf","rel":["nofollow"]},"children":[{"type":"text","value":"http://digitalpreservation.gov/meetings/dcs16/AsDataExecutiveSummary_final.pdf"}]}]},{"type":"element","tag":"p","props":{},"children":[{"type":"element","tag":"a","props":{"href":"http://digitalpreservation.gov/meetings/dcs16/tpadilla_OnaCollectionsasDataImperative_final.pdf","rel":["nofollow"]},"children":[{"type":"text","value":"http://digitalpreservation.gov/meetings/dcs16/tpadilla_OnaCollectionsasDataImperative_final.pdf"}]}]}],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}},"_type":"markdown","_id":"content:blog:collections-as-data.md","_source":"content","_file":"blog/collections-as-data.md","_extension":"md"},{"_path":"/blog/ubuntu-16-vagrant","_dir":"blog","_draft":false,"_partial":false,"_locale":"","_empty":false,"title":"Developing with Vagrant and Ubuntu 16.04","description":"When upgrading my Vagrant-run development environment to Ubuntu 16.04 (xenial) from 14.04 (trusty), I encountered a few vexing issues. Here's what I did to fix them.","date":"March 3, 2017","minutes":"1 minute read","categories":["vagrant","virtualbox","ubuntu","16.04"],"layout":"post","url":"ubuntu-16-vagrant","summary":"Changing over to Ubuntu 16.04 in Vagrant isn't so easy","body":{"type":"root","children":[{"type":"element","tag":"h1","props":{"id":"developing-with-vagrant-and-ubuntu-1604"},"children":[{"type":"text","value":"Developing with Vagrant and Ubuntu 16.04"}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"When upgrading my Vagrant-run development environment to Ubuntu 16.04 (xenial) from 14.04 (trusty), I encountered a few vexing issues. Here's what I did to fix them."}]},{"type":"element","tag":"ol","props":{},"children":[{"type":"element","tag":"li","props":{},"children":[{"type":"text","value":"Make sure your versions of Vagrant and VirtualBox (if this is your provider) are up to date."}]},{"type":"element","tag":"li","props":{},"children":[{"type":"text","value":"The normal Ubuntu Vagrant box does not seem to work. In other words, changing your box to config.vm.box = \"ubuntu/xenial64\" is going to open a world of hurt. See "},{"type":"element","tag":"a","props":{"href":"https://bugs.launchpad.net/cloud-images/+bug/1569237","rel":["nofollow"]},"children":[{"type":"text","value":"here"}]},{"type":"text","value":" and "},{"type":"element","tag":"a","props":{"href":"https://github.com/mitchellh/vagrant/issues/7155#issuecomment-228568200","rel":["nofollow"]},"children":[{"type":"text","value":"here"}]},{"type":"text","value":" for more. I've found success with the boxes made by Bento.\nSolution:\nChange your Vagrant box to"}]}]},{"type":"element","tag":"code","props":{"code":"config.vm.box = \"bento/ubuntu-16.04\"\n","meta":null},"children":[{"type":"element","tag":"pre","props":{},"children":[{"type":"element","tag":"code","props":{"__ignoreMap":""},"children":[{"type":"text","value":"config.vm.box = \"bento/ubuntu-16.04\"\n"}]}]}]},{"type":"element","tag":"ol","props":{"start":3},"children":[{"type":"element","tag":"li","props":{},"children":[{"type":"text","value":"With Ubuntu 16.04, there seems to be a recurring issue with running upgrades. 
With the version I am using at the time of this post (bento/ubuntu-16.04 v2.3.1), it can trigger an upgrade screen for Grub. This breaks the ability to run a provisioning script non-interactively. The issue seems to pop up occasionally; see "},{"type":"element","tag":"a","props":{"href":"https://github.com/chef/bento/issues/661","rel":["nofollow"]},"children":[{"type":"text","value":"here"}]},{"type":"text","value":"."}]}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"Solution:\n"},{"type":"element","tag":"a","props":{"href":"http://stackoverflow.com/questions/40748363/virtual-machine-apt-get-grub-issue/40751712","rel":["nofollow"]},"children":[{"type":"text","value":"Tell the system to handle any prompts non-interactively"}]}]},{"type":"element","tag":"code","props":{"code":"export DEBIAN_FRONTEND=noninteractive\napt-get -y -o DPkg::options::=\"--force-confdef\" -o DPkg::options::=\"--force-confold\" upgrade\n","meta":null},"children":[{"type":"element","tag":"pre","props":{},"children":[{"type":"element","tag":"code","props":{"__ignoreMap":""},"children":[{"type":"text","value":"export DEBIAN_FRONTEND=noninteractive\napt-get -y -o DPkg::options::=\"--force-confdef\" -o DPkg::options::=\"--force-confold\" upgrade\n"}]}]}]}],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}},"_type":"markdown","_id":"content:blog:ubuntu-16-vagrant.md","_source":"content","_file":"blog/ubuntu-16-vagrant.md","_extension":"md"},{"_path":"/blog/varnish-cache-midnight","_dir":"blog","_draft":false,"_partial":false,"_locale":"","_empty":false,"title":"Burning the Midnight Oil (with Web Caching)","description":"I recently delved into the world of web caches. In case you're not familiar, caching is the ability to save part of a website for quick and easy retrieval. Often developers will cache parts of sites that are slow and have fairly static content. Our digital collections site had for many years been quick and nimble; however, over the past year, it had started to show its age and become increasingly pedestrian. In order to remedy this, we first delved into our site's code to fix any inefficiencies that might be holding things up. But, after doing all that work, the site, though much improved, needed a bit more pep. We turned to web caching as a solution. Luckily, we weren't completely unfamiliar with the concept, as we had used it in part of our infrastructure already; however, now, we had proposed to apply it to our entire website--something we had never done before.","date":"March 3, 2017","minutes":"4 minute read","categories":["Varnish","midnight","cache","expiration"],"layout":"post","url":"varnish-cache-midnight","summary":"Setting a Varnish cache to expire at midnight","body":{"type":"root","children":[{"type":"element","tag":"h1","props":{"id":"varnish-set-cache-to-expire-at-midnight"},"children":[{"type":"text","value":"Varnish - Set Cache to Expire at Midnight"}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"I recently delved into the world of web caches. In case you're not familiar, caching is the ability to save part of a website for quick and easy retrieval. Often developers will cache parts of sites that are slow and have fairly static content. Our digital collections site had for many years been quick and nimble; however, over the past year, it had started to show its age and become increasingly pedestrian. In order to remedy this, we first delved into our site's code to fix any inefficiencies that might be holding things up. 
But, after doing all that work, the site, though much improved, needed a bit more pep. We turned to web caching as a solution. Luckily, we weren't completely unfamiliar with the concept, as we had used it in part of our infrastructure already; however, now, we had proposed to apply it to our entire website--something we had never done before."}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"We chose Varnish, a modern and popular piece of web caching software. This software stands in front of our website, kind of like a guard in front of a palace. If someone asks for something (a web page, an image, a piece of metadata), it checks to see if it has a copy already available. If it does, then it gives it back to the user immediately; if not, it asks the repository for the content--which is where the site can be slow, waiting for the content to be retrieved from the repository. As I mentioned previously, many people use web caching when their site's content is mostly static; ours, however, is a bit more dynamic. New content is added or updated fairly regularly. Therefore, if we wanted to realize any of the speed gains from web caching, we had to find a smart balance between serving content quickly and serving the most up-to-date results. For the most part, this is not a difficult thing to accomplish, but the tricky part comes with our search results page. This page is the most heavily used part of the site, and one that needs the most current data. Our solution, therefore, was to have the cache reset at midnight each night. The difficult part is in the implementation. Every time someone visits a new search results page, we have to set a value (known as a header) that lists the number of seconds remaining until midnight. This header is what will ensure the cache works properly. What follows is the code (and a fairly in-the-weeds explanation) of how to accomplish this. My hope is that anyone else faced with this challenge will find the information below useful."}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"Configuring Varnish can be a tricky thing. Much of the learning curve here came from having to learn the Varnish Configuration Language (VCL). The "},{"type":"element","tag":"a","props":{"href":"https://varnish-cache.org/docs/trunk/reference/vmod_std.generated.html","rel":["nofollow"]},"children":[{"type":"text","value":"Varnish Standard Module documentation"}]},{"type":"text","value":" came in handy here, as I had to use built-in functions from std to do a bit of type juggling while I made my calculations."}]},{"type":"element","tag":"p","props":{},"children":[{"type":"text","value":"You'll note that I used HTTP headers as a form of variable in this case, and that because neither standard VCL nor the vmod_std module supports the use of the modulus operator, my calculations are a bit long. In case you need to perform a similar calculation, I've included the code below."}]},{"type":"element","tag":"code","props":{"code":"#Make sure to import the std module at the top of your .vcl file. 
Like so:\nimport std;\n\n\nsub vcl_backend_response {\n\n # Set expiration date to expire at midnight\n # First, calculate the amount of seconds that have occurred today; there are 86400s in a day\n # Normally, this is the amount of seconds since Linux epoch % number of seconds in a day;\n # however, vcl doesn't support the modulus operator (which would give us the remainder), so\n # here's the long-hand version\n\n set beresp.http.exp = std.integer(std.time2integer(now, 0) / 86400, 0);\n set beresp.http.exp = std.integer(std.integer(beresp.http.exp, 0) * 86400, 0);\n set beresp.http.exp = std.integer(std.time2integer(now, 0) - std.integer(beresp.http.exp, 0), 0);\n\n # Now we need to calculate the amount of seconds we have left before midnight\n # Subtract the amount of seconds in a day from the amount that we've already gone through in order to get the amount of seconds remaining\n # Also, make that final number into a string because that's how ttl will need it\n\n set beresp.http.exp = 86400 - std.integer(beresp.http.exp, 0) + \"s\";\n set beresp.ttl = std.duration(beresp.http.exp, 1d);\n\n # We're going to let Varnish respond with its expiration date (aka Expires Header)\n unset beresp.http.expires;\n\n # Now let's reset the Expires header based upon our current ttl\n set beresp.http.Expires = \"\" + (now + beresp.ttl);\n}\n","meta":null},"children":[{"type":"element","tag":"pre","props":{},"children":[{"type":"element","tag":"code","props":{"__ignoreMap":""},"children":[{"type":"text","value":"#Make sure to import the std module at the top of your .vcl file. Like so:\nimport std;\n\n\nsub vcl_backend_response {\n\n # Set expiration date to expire at midnight\n # First, calculate the amount of seconds that have occurred today; there are 86400s in a day\n # Normally, this is the amount of seconds since Linux epoch % number of seconds in a day;\n # however, vcl doesn't support the modulus operator (which would give us the remainder), so\n # here's the long-hand version\n\n set beresp.http.exp = std.integer(std.time2integer(now, 0) / 86400, 0);\n set beresp.http.exp = std.integer(std.integer(beresp.http.exp, 0) * 86400, 0);\n set beresp.http.exp = std.integer(std.time2integer(now, 0) - std.integer(beresp.http.exp, 0), 0);\n\n # Now we need to calculate the amount of seconds we have left before midnight\n # Subtract the amount of seconds in a day from the amount that we've already gone through in order to get the amount of seconds remaining\n # Also, make that final number into a string because that's how ttl will need it\n\n set beresp.http.exp = 86400 - std.integer(beresp.http.exp, 0) + \"s\";\n set beresp.ttl = std.duration(beresp.http.exp, 1d);\n\n # We're going to let Varnish respond with its expiration date (aka Expires Header)\n unset beresp.http.expires;\n\n # Now let's reset the Expires header based upon our current ttl\n set beresp.http.Expires = \"\" + (now + beresp.ttl);\n}\n"}]}]}]}],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}},"_type":"markdown","_id":"content:blog:varnish-cache-midnight.md","_source":"content","_file":"blog/varnish-cache-midnight.md","_extension":"md"}]