Hands-on: Understanding Applications in MarkLogic Server: Part 2
In this tutorial we will walk you through the creation of a simple web-based MarkLogic application. This tutorial builds on the foundation we laid in Part 1. If you haven't completed that tutorial yet, now would be a good time, because we are going to pick up where it left off and build our application on the setup we performed there. This tutorial isn't designed to turn you into an AJAX-wielding web ninja, nor is it designed to make you an XQuery guru. What it will do is show you how to create a simple web-based application that leverages the power of MarkLogic, and along the way we'll pick up some best practices for building our applications. Enough already; on to the actual tutorial!
Loading the Data
In order to make any kind of interesting application we are going to need some data. We currently have only one document loaded into the server, so let's load the rest of the plays to give us a good amount of data to work with. We'll continue to work with the Shakespeare data that we've already downloaded, using the same process we used to load data last time. Go to the admin interface for MarkLogic Server, find the Shakespeare database, and then go to the "Load" tab. Once again, enter in the directory field the absolute path to where we unzipped the Shakespeare plays. This time, instead of entering a file name into the filter field, enter *.xml. This tells the server to load all of the XML files in the directory. Don't worry that we already loaded a_and_c.xml during the previous tutorial; it will simply be overwritten with the copy loaded this time.
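If you would rather script the load than click through the Load tab, something along these lines could be run from Query Console against the Shakespeare database. This is a sketch, not what the admin interface does internally: the directory path is a placeholder you must replace, and the /shakespeare/ URI prefix is a hypothetical choice. If those URIs differ from the ones the Load tab created, a_and_c.xml would end up stored twice rather than overwritten.

```xquery
xquery version "1.0-ml";
(: Sketch of a scripted bulk load. Assumptions: run against the
   Shakespeare database, $dir points at your unzipped plays, and the
   /shakespeare/ URI prefix is a hypothetical naming choice. :)
declare namespace dir = "http://marklogic.com/xdmp/directory";

let $dir := "/path/to/shakespeare/"  (: placeholder: use your own path :)
for $entry in xdmp:filesystem-directory($dir)/dir:entry
where $entry/dir:type eq "file"
  and fn:ends-with($entry/dir:filename, ".xml")
return
  (: Loading to a URI that already exists replaces that document. :)
  xdmp:document-load(
    $entry/dir:pathname,
    <options xmlns="xdmp:document-load">
      <uri>{ fn:concat("/shakespeare/", $entry/dir:filename) }</uri>
    </options>)
```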
Where to Put Our Code?
Now that we've loaded all of our data, we're pretty much ready to start creating our web application. As I'm sure you remember, MarkLogic Server is also an HTTP server, and during the Getting Started tutorial we configured an HTTP server instance (named Shakespeare) that is tied to our Shakespeare database. As part of that configuration we chose a directory, to which the server has read/write access, as the root directory for the HTTP server. I don't know what you chose, but I chose /Users/clarkrichey/Documents/MarkLogic/Education/Tutorials/Tutorial 2/xquery/. For the purpose of our discussion here, I don't care about the full path, only the name of the root directory, which in my case is xquery. The actual name doesn't matter much, but as a matter of style xquery is probably a good choice because it clearly indicates that this is where we are going to store the XQuery code we write. I suppose code or src would do as well, but I like that it's clear what kind of code will be found here: XQuery. This also gives us flexibility in larger projects where we may be working with Java or .NET code alongside the XQuery code.
Hello World!
At this point your xquery directory (or whatever you called it if you're doing your own thing here) should be empty. No files, no subdirectories. Nothing at all. So, let's try something: go to http://localhost:8010 and see what happens. You should be prompted for your login credentials and then receive a 404 Not Found error. This seems logical, as we just verified that the root directory for our HTTP server is empty. So, let's fix that. Create a file called default.xqy in your xquery directory and paste the following code into that file:
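The original listing is not reproduced here. Based on the description later in this tutorial (a call to xdmp:set-response-content-type, a comma, then a parenthesized DOCTYPE string and html element), default.xqy might look roughly like this; the exact markup in the original may differ.

```xquery
xquery version "1.0-ml";
(: default.xqy: a reconstruction of the Hello World main module.
   The first item sets the Content-Type header and returns the empty
   sequence; the parenthesized group after the comma supplies the page. :)
xdmp:set-response-content-type("text/html"),
(
  '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>Hello World</title>
    </head>
    <body>
      <p>Hello World!</p>
    </body>
  </html>
)
```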
Now, let's try this again. Go to http://localhost:8010. Aha! Now we see our requisite Hello World web page. Now that things are working, we should take a moment to talk about what exactly is happening. First off, when the HTTP server is presented with a request that does not specify a page (as in the request we just made), it automatically looks for a file named default.xqy in the HTTP server's root directory. Because we have now created just such a page, the server processed it and returned the results to our browser. We would have seen exactly the same results if we had instead gone to http://localhost:8010/default.xqy.
Creating Web Content
OK, that's all well and good, but what was all that weird code we put into the file? Well, I'm glad you asked. The MarkLogic HTTP server lets us create web pages dynamically, much as you can with JSP or PHP pages. However, because MarkLogic is an XQuery platform and because we gave our file the .xqy extension, the server expects this file to contain valid XQuery code. More specifically, the server expects the file to contain a main module. A main module is simply code that can be executed directly as an XQuery program. At a minimum it must include a query body consisting of an XQuery expression (which in turn can contain other XQuery expressions, and so on). Our main module's query body is a single comma-separated sequence expression whose first member is the call to xdmp:set-response-content-type. This function sets the MIME content type of the HTTP response. We used it to set a content type of text/html so that the browser knows to render the result as HTML; a browser has no built-in way of knowing what to do with content from a URL ending in .xqy.
However, as you will note from the documentation, xdmp:set-response-content-type returns an empty sequence. Clearly, an empty sequence is not what we want in order to create a valid web page. To get our HTML returned to the browser, we have to include it in the sequence that the query returns. We did that by making the call to xdmp:set-response-content-type merely the first member of a larger sequence returned to the browser; because an empty sequence contributes no items, only the content that follows it ends up in the response. The ',' that we placed after xdmp:set-response-content-type("text/html") indicates that what follows is the next member of the sequence. We then enclosed the content we wanted returned to the browser within parentheses so that it would be grouped together as a single expression. That group contained the DOCTYPE declaration as a string, followed by the html element that we wanted sent to the browser. I realize that all of this talk of sequences appended to sequences that contain yet more sequences sounds a bit daunting at first, but I assure you that with just a little practice it becomes second nature. Additional information on sequences as return types from XQuery expressions can be found in section 4.1 of the XQuery Reference Guide.
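To see that flattening behavior in isolation, the following illustrative query (not part of the tutorial application) can be run in Query Console. Nested and empty sequences collapse into one flat sequence, which is exactly why the empty result of xdmp:set-response-content-type leaves no trace in the page.

```xquery
xquery version "1.0-ml";
(: Sequences never nest: inner sequences are flattened into the outer
   one, and empty sequences contribute no items at all. :)
let $nested := ((), "a", ("b", ("c")), ())
return (
  fn:count($nested),            (: 3 :)
  fn:string-join($nested, ",")  (: "a,b,c" :)
)
```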
Dynamic Content
That was a good start, but returning static HTML really isn't very useful for building applications. To do something useful and interesting, we need to return dynamic content. As I alluded to earlier, we can include code that is evaluated dynamically, much as you can with JSP or PHP pages. The main difference is that instead of embedding Java or PHP code in our pages, we embed XQuery to provide our dynamic functionality. Adding that functionality couldn't be simpler: all we need to do is take the XQuery code that we want evaluated and enclose it within curly braces, { }, inside our HTML. So, let's try that out by adding a very simple XQuery expression to our default.xqy page.
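Continuing the reconstructed default.xqy sketch, the page might add a paragraph containing an enclosed expression; the wording of that paragraph is illustrative. Only single braces start an enclosed expression. A doubled brace ({{ or }}) inside element content produces a literal brace character instead, so writing {{xdmp:version()}} would print the text rather than the version.

```xquery
xquery version "1.0-ml";
(: default.xqy with dynamic content. Inside an element constructor,
   { ... } marks an enclosed expression: the server evaluates
   xdmp:version() and inserts its result into the page. :)
xdmp:set-response-content-type("text/html"),
(
  '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>Hello World</title>
    </head>
    <body>
      <p>Hello World!</p>
      <p>This page was generated by MarkLogic Server version { xdmp:version() }.</p>
    </body>
  </html>
)
```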
Now when we view this page in our browser, we see the version of MarkLogic Server dynamically displayed as part of the HTML. This is because the server evaluated the XQuery expression xdmp:version() that it encountered within the { } and returned the result as part of the HTML response. This ability to embed XQuery directly within our HTML will serve as the foundation for building much more complex web applications. Let's continue our exploration of this capability with a small application that actually uses the Shakespeare content we went to so much effort to load into the server. Rather than having you copy and paste a lot of code, why don't you just download the simple application that I wrote so that we can discuss it in more depth?
Let's Get Modular with It
Go ahead and expand the zip file you just downloaded into the same directory where you placed the default.xqy file that we worked with earlier. What I want to do now is take a little time to talk about how this very simple application is structured. Before we do, though, let me provide a disclaimer: there is no single correct way to structure your XQuery application. However, there are some good fundamental practices and concepts that will help you create a good structure for your projects, and those are what we will be looking at now. After unzipping the application, you should notice two new files, search.xqy and results.xqy, as well as a new directory named modules. Let's ignore the modules directory for a moment and focus on those two new XQuery files. search.xqy is a very simple bit of code that just creates a form allowing users to enter the name of the speaker they are searching for, and then submits that form to the results.xqy page.
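The downloaded search.xqy is not reproduced here, but a form of the kind described could be sketched as follows. The field name speaker and the GET method are assumptions; whatever name the form uses, results.xqy has to read the same one.

```xquery
xquery version "1.0-ml";
(: search.xqy: a sketch of the search form. The field name "speaker"
   is a hypothetical choice that results.xqy must match. :)
xdmp:set-response-content-type("text/html"),
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Find a Speaker</title>
  </head>
  <body>
    <form action="results.xqy" method="get">
      <label for="speaker">Speaker:</label>
      <input type="text" id="speaker" name="speaker"/>
      <input type="submit" value="Search"/>
    </form>
  </body>
</html>
```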
OK, clearly the results.xqy page is where a lot of the work must be happening. Let's dive in and see what's going on. A quick peek at this page shows that, starting on line 10, we loop through a sequence of SPEECH elements and display the LINE elements contained in each speech. Where did these search results come from? Let's look more closely at line 10. Here we're calling a function called find-speech in the search-lib namespace. That sounds promising, but what is that function and where did it come from? If we look at the code on line 1, we see that we are importing a module in the search-lib namespace and that we expect to find the file containing that module at the relative path modules/search-lib.xqy. Hmm, that's interesting. Remember how we talked about main modules earlier? There is another type of module, called a library module, and that is what we are importing. Library modules, unlike main modules, are not directly executable by the server. Instead, they house reusable bits of code, typically functions, that we can access from elsewhere in our application, as we did here. Think of library modules like JAR files in Java or DLLs in .NET. It's not exactly the same thing, but the idea is close enough. So, according to the import statement on line 1, we should find this library module in a file called search-lib.xqy within the modules directory. Let's pop that file open and see what we find!
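A results page along these lines could be sketched as shown below, assuming the hypothetical speaker field from the form sketch; the line numbers mentioned above refer to the downloaded file and will not match this sketch. It deliberately leaves the XHTML namespace off the html element: a namespace declaration on a direct element constructor also applies to path expressions inside its enclosed expressions, so $speech/LINE would otherwise look for LINE elements in the XHTML namespace and find nothing.

```xquery
(: results.xqy: a sketch of the results page. :)
import module namespace search-lib = "http://www.marklogic.com/tutorial2/search-lib"
  at "modules/search-lib.xqy";

(: Read the hypothetical "speaker" field; [1] guards against repeated
   fields, and normalize-space turns a missing field into "". :)
let $term := fn:normalize-space(xdmp:get-request-field("speaker")[1])
return (
  xdmp:set-response-content-type("text/html"),
  (: No xmlns on <html>: it would change the default element namespace
     used by the SPEAKER and LINE path steps below. :)
  <html>
    <head><title>Speeches by { $term }</title></head>
    <body>{
      (: Skip the search entirely when no term was submitted. :)
      for $speech in (if ($term eq "") then () else search-lib:find-speech($term))
      return
        <div>
          <h3>{ fn:data($speech/SPEAKER) }</h3>
          { for $line in $speech/LINE return <p>{ fn:data($line) }</p> }
        </div>
    }</body>
  </html>
)
```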
The first interesting thing in this short, simple file appears on line 2, where we declare the namespace associated with this module (module namespace search-lib = "http://www.marklogic.com/tutorial2/search-lib";). Note that this is the same namespace we used when we imported the module into our results.xqy file. With that bit of module housekeeping taken care of, we jump right into declaring the functions this module provides. In this case we define only one function, search-lib:find-speech. This is the function that we called from our results.xqy page to find lines spoken by a particular speaker. As you can see, this function takes a single string as the search parameter and returns a sequence of zero or more SPEECH elements. The query itself is a single line of XQuery: a case-insensitive search (which also allows wildcards) for all SPEECH elements with a child element, SPEAKER, that matches our search term. Why, then, did we go to the trouble of putting this very simple query into a module in a completely separate file that we then had to import in order to use? Surely it wasn't just an arbitrary example to demonstrate the use of modules, was it?
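The library module might look something like this. The module namespace and function name come from the tutorial; the query body is one plausible way to express a case-insensitive, wildcard-enabled match on SPEAKER using cts:element-word-query, and it may differ from the downloaded file. With the wildcarded option, a term such as Ham* should match speeches by HAMLET.

```xquery
xquery version "1.0-ml";
module namespace search-lib = "http://www.marklogic.com/tutorial2/search-lib";

(: Sketch: return every SPEECH whose SPEAKER child contains a word
   matching $term, ignoring case and honoring * and ? wildcards. :)
declare function search-lib:find-speech($term as xs:string) as element(SPEECH)*
{
  cts:search(
    fn:collection()//SPEECH,
    cts:element-word-query(xs:QName("SPEAKER"), $term,
                           ("case-insensitive", "wildcarded")))
};
```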
Of course, the answer is no. There is a much more important reason for separating out that search function, and it has to do with the fundamental concepts of code modularity and reuse. Simply put, we are separating the implementation of our search (contained in the search-lib module) from the use of its results, which in this case is to display some very simple XHTML in our results.xqy page. This simple technique lets us reuse our search code from other parts of the application. Additionally, if we need to modify the way search works, perhaps by making it case-sensitive, we have a single place to make that change instead of having to track down every place we pasted the search code in order to keep search behavior consistent. The concept behind this technique is probably not new to you if you have been programming for any length of time in another language. The really important point here was to demonstrate how to implement that technique in XQuery.
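To illustrate the reuse argument, here is a hypothetical second page (speech-count.xqy, not part of the download) that imports the same module and reports how many speeches match. Because both pages call search-lib:find-speech, changing the search options in one place, for example swapping "case-insensitive" for "case-sensitive" in the module, would change the behavior of both pages at once.

```xquery
(: speech-count.xqy: a hypothetical page that reuses the same search
   function as results.xqy instead of duplicating the query. :)
import module namespace search-lib = "http://www.marklogic.com/tutorial2/search-lib"
  at "modules/search-lib.xqy";

let $term := fn:normalize-space(xdmp:get-request-field("speaker")[1])
return (
  xdmp:set-response-content-type("text/plain"),
  if ($term eq "")
  then "Please supply a speaker."
  else fn:concat(fn:count(search-lib:find-speech($term)), " speeches match ", $term)
)
```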
Summary
Hopefully you learned a few things during this tutorial. We covered the basic techniques for getting an XQuery-based web application up and running. Along the way we talked about some best practices for laying out our project, and we looked at how modules let us reuse portions of our code within our application while also making our code more maintainable. We will build on these concepts in an upcoming tutorial. In the meantime, get out there and start building your own web-based applications on MarkLogic!
