File search in Node.js
Since the German blog post Volltextsuche is the most-clicked post on this blog, I’ll explain how I implemented the full-text search of this website with the help of Node.js.
Prerequisites
You need to be on a Linux/Unix machine, since the search uses native command-line tools like find and grep, and you need to have Node.js (v0.4.0+) installed. There may be a similar way to do it on Windows, too: I think you could install Git for Windows and add its /bin directory to your system PATH to use the same commands there. If somebody has a better solution (maybe one that works out of the box), feel free to leave a comment.
Finding files on the command line
If you are in a Unix shell, you don’t have a nifty search box that lets you search for files in your current working directory. Instead, you have the command find, which lets you search for files by specifying various arguments.
For example, with find . -name '*.txt' you can search for all files (or directories) whose names end in .txt in the current working directory and in any subdirectories.
But how would you search for the contents of a file?
This is quite a bit more difficult. There’s another command, grep, that allows you to filter lines from stdin or from the contents of a given file.
Usually, Unix shells allow you to use the pipe character | to combine commands. This way, you can first use find to list all files of a specific type and then pipe that list through xargs to grep, which keeps only the files that contain a specific search string.
The command xargs makes sure that the results of find are used as arguments for grep, so that grep reads the contents of each file. As a result, the command returns a line-separated list of HTML files that contain the string “example”.
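A pipeline matching this description would presumably be find . -name '*.html' | xargs grep -l 'example' (reconstructed from the text and from the server code further down, so treat the exact flags as an assumption). To make the division of labour concrete, here is a rough sketch of the same three steps in plain Node.js. It is not what this site runs, and the function names listFiles and filesContaining are made up for this illustration.

```javascript
var fs = require('fs');
var path = require('path');

// Step 1, the "find" part: collect all regular files below dir
// whose name ends in the given suffix (e.g. '.html').
// Symbolic links are skipped to keep the sketch simple.
function listFiles(dir, suffix) {
  var found = [];
  fs.readdirSync(dir).forEach(function (name) {
    var full = path.join(dir, name);
    var stat = fs.lstatSync(full);
    if (stat.isDirectory()) {
      // descend into subdirectories, like find does
      found = found.concat(listFiles(full, suffix));
    } else if (stat.isFile() && name.slice(-suffix.length) === suffix) {
      found.push(full);
    }
  });
  return found;
}

// Steps 2 and 3, the "xargs grep -l" part: read each file that was
// found and keep only the names of files containing the search string.
function filesContaining(dir, suffix, needle) {
  return listFiles(dir, suffix).filter(function (file) {
    return fs.readFileSync(file, 'utf8').indexOf(needle) !== -1;
  });
}
```

Calling filesContaining(process.cwd(), '.html', 'example') would then return an array with one path per matching file. Since it needs neither find nor grep, a variation of this might also serve on Windows, although it will likely be slower than the native tools on large directory trees.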
To improve the results and prevent errors, the command can be refined further, for example with find's -iname and grep's -i and -s options, which also appear in the server code below.
This way, it doesn’t matter whether file names are upper- or lowercase, and the text search is case-insensitive as well. Binary content is ignored and warnings are suppressed.
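The server code further down pipes find . -iname '*html' through xargs into grep with the flags -isl, which covers most of what is described here. As a rough annotated sketch, the following hypothetical helper (buildCommand is not part of the original code) assembles such a pipeline for a fixed term. The -I flag for skipping binary files is an assumption, since it does not appear in the server code.

```javascript
// Hypothetical helper, for illustration only: builds the refined
// pipeline for a search term that contains no single quotes.
function buildCommand(term) {
  // -iname: match file names regardless of upper-/lowercase
  var find = "find . -iname '*html'";
  var grep = [
    'grep',
    '-i', // case-insensitive text search
    '-s', // suppress messages about unreadable or missing files
    '-l', // print only the names of matching files
    '-I', // assumption: skip binary files (not in the server code)
    '-e', "'" + term + "'" // -e marks the next argument as the pattern
  ].join(' ');
  // xargs turns the output of find into file arguments for grep
  return find + ' | xargs ' + grep;
}
```

For the term example, this yields the string find . -iname '*html' | xargs grep -i -s -l -I -e 'example', which can be pasted into a shell to try it.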
The basic code
Let’s have a look at the required JavaScript code.
What you need to write is a simple HTTP server that parses the query part of the URL from an HTTP GET request. The search module only takes one parameter, s, as in http://vorb.de/search.html?s=example. This website (vorb.de) uses a small REST framework of mine, api, which lets me register modules for different URLs, but I’ll show how to do it without any framework. This way you can adapt the solution to your framework of choice (express, flatiron, etc.).
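As a small illustration of the query parsing, here is a sketch using Node's url module. The helper name getSearchTerm is made up for this example.

```javascript
var url = require('url');

// Hypothetical helper: returns the value of the "s" parameter of a
// request URL, or an empty string if it is missing.
function getSearchTerm(requestUrl) {
  // passing true makes url.parse turn the query string into an object
  var query = url.parse(requestUrl, true).query;
  // a repeated parameter (?s=a&s=b) is parsed into an array;
  // this sketch simply ignores that case
  return typeof query.s === 'string' ? query.s : '';
}
```

With this, getSearchTerm('/search.html?s=example') gives the string example. In current Node.js releases, new URL(requestUrl, 'http://localhost').searchParams.get('s') is an alternative to the legacy url.parse.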
You need to load several packages during start-up:
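Judging by the names the server code below relies on (http, url and exec), these are presumably Node's built-in http, url and child_process modules. A minimal sketch under that assumption:

```javascript
// built-in modules, no installation required
var http = require('http');                // HTTP server
var url = require('url');                  // URL and query string parsing
var exec = require('child_process').exec;  // runs a command in a shell
```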
After that, define some other values:
// timeout in ms for a single search
var timeout = 10000;
// specify the root directory, where the search will begin
var root = process.cwd();
Now you can write the server code:
var server = http.createServer(function (req, resp) {
  // parse the request url
  // the query is everything after the '?'
  // the second param says that the query string shall be parsed into an object
  var query = url.parse(req.url, true).query;
  // ensure the search term is a non-empty string
  if (query && typeof query.s === 'string' && query.s) {
    // replace all single quotes with double quotes, so the term
    // cannot end the single-quoted shell argument early
    var term = query.s.replace(/'/g, '"');
    // run the search; -e keeps a term starting with '-' from being read as an option
    exec("find . -iname '*html' | xargs grep -isl -e '" + term + "'", {
      timeout: timeout,
      cwd: root
    }, function (err, stdout, stderr) {
      resp.writeHead(200, { 'Content-Type': 'text/html' });
      if (err) {
        // a search without hits also ends up here, because grep exits
        // with a non-zero status when nothing matches
        console.log(err);
        // stop here: the response must not be written to after end()
        return resp.end('<p>Error on search</p>');
      }
      // split the results
      var results = stdout.split('\n');
      // remove last element (it’s an empty line)
      results.pop();
      resp.write('<h1>Search results</h1>\n<ul>');
      for (var i = 0; i < results.length; i++) {
        resp.write('<li>');
        resp.write(results[i]);
        resp.write('</li>');
      }
      resp.end('</ul>');
    });
  } else {
    resp.writeHead(200, { 'Content-Type': 'text/html' });
    return resp.end('<p>No results</p>');
  }
});
// listen on port 8080
server.listen(8080, function () {
  console.log('Server running at http://localhost:8080/');
});
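One detail the listing glosses over is that file names are written into the HTML unescaped. As a possible refinement (not part of the original code), the output handling could be moved into a small helper. The names escapeHtml and renderResults are made up for this sketch.

```javascript
// Replaces the characters that have a special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turns the stdout of the search pipeline (one path per line)
// into the HTML fragment sent to the browser.
function renderResults(stdout) {
  var results = stdout.split('\n').filter(function (line) {
    return line !== ''; // drops the empty entry after the last newline
  });
  if (results.length === 0) {
    return '<p>No results</p>';
  }
  var items = results.map(function (file) {
    return '<li>' + escapeHtml(file) + '</li>';
  });
  return '<h1>Search results</h1>\n<ul>' + items.join('') + '</ul>';
}
```

In the exec callback, a single resp.end(renderResults(stdout)) could then take the place of the loop.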
You may download the source file. You can start it on the command line with node file-search.js. Then go to localhost:8080 to try it. Of course, you need to create some HTML files first if you want to find something.
Here’s what the result looks like: