Setting up pdftotext and search_files on a shared host (Bluehost)

26 Jul 2008

Posted by CrashTest_

So this week I had to get the search_files module for Drupal 6 running on a shared host, Bluehost, for one of my customers. I had promised that we would be able to search his PDF files, but I didn't realize that both the search_files and search_attachments modules require a Linux command-line utility named pdftotext to be installed.

I did request that Bluehost install it on my box and was told that anything requiring root access wasn't going to happen.

Fine, maybe I can run it myself. After all, I did get SVN running on Bluehost, so how hard could it be?

Thanks to rasc on the Sphider PHP Search Engine forums for the solution, which I adapted to work with search_files.

Setting up Bluehost

  • First, go to foolabs and grab yourself a hot copy of XPDF
  • Un-archive it, and get rid of everything other than the pdftotext file
  • Rename this file to pdftotext.script
  • Create a shell script named pdftotext (flavor of your choosing) that calls this pdftotext.script file, for example:
#!/bin/sh
/home/YOURBLUEHOSTUSERNAME/bin/pdftotext/pdftotext.script "$1" -
  • Replace YOURBLUEHOSTUSERNAME with your actual Bluehost username.
  • Log in via FTP or SSH and, from your home directory (NOT from public_html, but one directory above it), create a bin directory and a pdftotext directory inside bin, i.e. ~/bin and ~/bin/pdftotext
  • Make ~/bin/pdftotext writable by all; this is where pdftotext will save the temporary files it creates
  • Upload both pdftotext and pdftotext.script to the ~/bin/pdftotext directory and make them executable (chmod 755 should work)
  • If you don't already have one, create a .bashrc file in your home directory (not public_html) and add the following so that the web server knows where your executable files are:
export PATH=$PATH:$HOME/bin:$HOME/bin/pdftotext:.
export pdftotext_path=/home/YOURBLUEHOSTUSERNAME/bin/pdftotext/pdftotext
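If you are comfortable with SSH, most of the Bluehost steps above could be scripted. This is a minimal sketch, not rasc's original method: setup_pdftotext is a hypothetical function name, and it assumes $HOME is your /home/YOURBLUEHOSTUSERNAME directory and that you pass it the path of the pdftotext binary extracted from the XPDF archive. It copies that binary in as pdftotext.script, writes the pdftotext wrapper, and sets the permissions; the .bashrc step is left to you.

```shell
#!/bin/sh
# setup_pdftotext (hypothetical name): sketch of the manual Bluehost steps.
# Assumes $HOME is /home/YOURBLUEHOSTUSERNAME and $1 is the XPDF pdftotext binary.
setup_pdftotext() {
  src="$1"
  bin_dir="$HOME/bin/pdftotext"

  # Create ~/bin and ~/bin/pdftotext in the home directory, not in public_html
  mkdir -p "$bin_dir" || return 1

  # Copy the real XPDF binary in under its new name, pdftotext.script
  cp "$src" "$bin_dir/pdftotext.script" || return 1

  # Write the wrapper named pdftotext: it runs the real binary on the first
  # argument and sends the extracted text to stdout ("-").
  # $bin_dir expands now (full path baked in); \$1 stays literal for the wrapper.
  cat > "$bin_dir/pdftotext" <<EOF
#!/bin/sh
"$bin_dir/pdftotext.script" "\$1" -
EOF

  # Both files executable; directory writable by all, as in the steps above
  chmod 755 "$bin_dir/pdftotext" "$bin_dir/pdftotext.script" || return 1
  chmod 777 "$bin_dir"
}
```

For example, from the directory where you unpacked XPDF, source the file (saved under any name you like) and run setup_pdftotext ./pdftotext. Because the wrapper is generated from $HOME, the full path is filled in for you instead of editing YOURBLUEHOSTUSERNAME by hand.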

Setting up search_files in Drupal 6

  • Go to the search_files project page and download the module
  • Upload the module
  • Go to Admin - Site Building - Modules and activate the module
  • If needed, adjust your permissions so that you can use the module
  • Go to the /admin/settings/search_files/helpers page and click "PDF"
  • In the Helper Path* box, put in:
/home/YOURBLUEHOSTUSERNAME/bin/pdftotext/pdftotext %file% -
  • Click Update. Note the - at the end: it's needed, because it tells pdftotext to write the extracted text to standard output instead of to a file. I have it in both the helper line and the script; it's probably not needed in both, but it DOES work this way
  • Now find the Directories page (/admin/settings/search_files/directories) and add the directories containing the PDF files you want indexed, such as /home/YOURBLUEHOSTUSERNAME/public_html/files. Make sure you use the full server path.
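Before leaving it to cron, it can be worth running the helper by hand over SSH to confirm it actually returns text. Here is a small sketch; check_pdf_helper is a hypothetical name, and it assumes the wrapper lives at $HOME/bin/pdftotext/pdftotext as described in the Bluehost steps.

```shell
#!/bin/sh
# check_pdf_helper (hypothetical name): runs the PDF helper the same way the
# search_files Helper Path line does and shows the first lines of extracted text.
check_pdf_helper() {
  helper="$HOME/bin/pdftotext/pdftotext"
  pdf="$1"
  # Trailing "-" mirrors the Helper Path setting: text goes to stdout
  text=$("$helper" "$pdf" -) || {
    echo "helper failed on $pdf" >&2
    return 1
  }
  # An empty result (e.g. a scanned PDF with no text layer) is reported too
  if [ -z "$text" ]; then
    echo "no text extracted from $pdf" >&2
    return 1
  fi
  printf '%s\n' "$text" | head -n 5
}
```

For example, check_pdf_helper /home/YOURBLUEHOSTUSERNAME/public_html/files/example.pdf, where example.pdf stands for any PDF you have uploaded. If it prints a few lines of text, search_files should be able to use the same command.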

Run it!

I used this as an opportunity to set up my cron job on Bluehost. Then I cleared my cache in Drupal, ran cron, and watched it find hundreds of files for me.
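For the cron job itself, one option is a tiny script that a cPanel cron job calls to request Drupal's cron.php. This is a sketch under assumptions: drupal_cron.sh and run_drupal_cron are hypothetical names, wget is assumed to be available on the host, and www.example.com stands in for your own domain.

```shell
#!/bin/sh
# drupal_cron.sh (hypothetical name): requests Drupal 6's cron.php once.
# Cron command example: sh /home/YOURBLUEHOSTUSERNAME/bin/drupal_cron.sh http://www.example.com

run_drupal_cron() {
  # -q: quiet, -t 1: try only once, -O /dev/null: throw away the returned page
  wget -q -t 1 -O /dev/null "$1/cron.php"
}

# Only make the request when a site URL is passed as the first argument
if [ -n "$1" ]; then
  run_drupal_cron "$1"
fi
```

Run without an argument, the script only defines the function and does nothing, which makes it safe to source while testing.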

You MAY need to create a custom php.ini file for your Drupal installation (Bluehost has a utility in cPanel to create a default one) and raise the limits, such as memory_limit and max_execution_time, so that you don't run into memory allocation or timeout errors.
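Once Bluehost's utility has created the php.ini, the two limits can also be raised from the shell. A minimal sketch: set_ini_value is a hypothetical helper, the php.ini location is whatever the cPanel utility used (check your account), and 128M and 300 seconds are example values only, not recommendations from Bluehost or Drupal.

```shell
#!/bin/sh
# set_ini_value (hypothetical name): sets "key = value" in a php.ini file,
# replacing an existing uncommented line for that key or appending one.
set_ini_value() {
  ini="$1"
  key="$2"
  value="$3"
  if grep -q "^[[:space:]]*$key[[:space:]]*=" "$ini"; then
    # Rewrite via a temporary file so this works with any sed, not just GNU sed -i
    sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$ini" > "$ini.tmp" &&
      mv "$ini.tmp" "$ini"
  else
    printf '%s = %s\n' "$key" "$value" >> "$ini"
  fi
}

# Example values only; the php.ini path is an assumption, adjust to your account
# set_ini_value "$HOME/public_html/php.ini" memory_limit 128M
# set_ini_value "$HOME/public_html/php.ini" max_execution_time 300
```

Commented-out lines (starting with ;) are left alone, so an existing active setting is replaced rather than duplicated.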

Hope this helps! If you have any questions, go ahead and leave a comment.
