
PHP Script For PDF to HTML Conversion

PDF to HTML Conversion in PHP

This article shows how to convert a PDF document to HTML format using PHP. The step-by-step tutorial below creates HTML files from a PDF with a PHP script.

You will need the PDF to HTML package (pdftohtml) to perform this task. The package is available at http://sourceforge.net/projects/pdftohtml/

After installing the package, use the following commands to run it. You can run them from an SSH session or from a PHP script; this article focuses on running them from PHP. Let us assume the installed files are in /usr/bin.
system('/usr/bin/pdftohtml /var/www/website/processed/example.pdf'); // Creates an HTML file in the processed folder.
system('/usr/bin/pdftohtml /var/www/website/processed/example.pdf -'); // Does not create an HTML file; prints the output to the screen instead.
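The commands above pass a fixed path to system(). When the path comes from a variable, one way to run the same conversion is sketched below. The function name convert_pdf_to_html() and its parameters are hypothetical and not part of the package; the sketch assumes the binary is in /usr/bin as in this article, and relies only on PHP's built-in escapeshellarg() and exec().

```php
<?php
// Hypothetical helper, not part of the pdftohtml package.
// Runs pdftohtml on one PDF and reports whether the command succeeded.
function convert_pdf_to_html($pdf_path, $binary = '/usr/bin/pdftohtml')
{
    // Refuse to build a command for a file that does not exist.
    if (!is_file($pdf_path)) {
        return false;
    }

    // escapeshellarg() quotes each part so spaces and shell metacharacters are passed literally.
    $command = escapeshellarg($binary) . ' ' . escapeshellarg($pdf_path) . ' 2>&1';

    // exec() fills $output_lines with the command output and $exit_code with its exit status.
    $output_lines = array();
    $exit_code = 0;
    exec($command, $output_lines, $exit_code);

    // An exit status of 0 conventionally means success.
    return $exit_code === 0;
}
```

A call such as convert_pdf_to_html('/var/www/website/processed/example.pdf') returns true or false, which makes a failed conversion easier to notice than a bare system() call.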
A Few Common Errors
  • BAD Color Error: This error usually appears if the package was not installed properly.
  • Execution Error: Another common problem is that PHP does not execute the command, so no output file is generated. To solve it, confirm the following.
    • Confirm PHP is not running in safe mode.
    • The user that runs the command (normally the Apache user) must have permission to execute pdftohtml and to write to the output folder, so you may also need to adjust the Apache security settings.
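The checks in the list above can be partly automated. The following is a minimal sketch, assuming the usual causes are a disabled function, a binary that is missing or not executable, or a folder the PHP user cannot write to. The helper names can_call() and find_execution_problems() are hypothetical; the PHP functions they call (ini_get(), function_exists(), is_executable(), is_writable()) are built in.

```php
<?php
// Hypothetical helper: reports whether PHP is allowed to call the given function.
function can_call($function_name)
{
    // disable_functions is a comma-separated list set in php.ini.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists($function_name) && !in_array($function_name, $disabled, true);
}

// Hypothetical helper: collects human-readable problems that would block the conversion.
function find_execution_problems($binary, $output_dir)
{
    $problems = array();
    if (!can_call('system')) {
        $problems[] = 'system() is disabled in this PHP configuration';
    }
    if (!is_executable($binary)) {
        $problems[] = $binary . ' is missing or not executable by the PHP user';
    }
    if (!is_writable($output_dir)) {
        $problems[] = $output_dir . ' is not writable by the PHP user';
    }
    return $problems;
}
```

Calling find_execution_problems('/usr/bin/pdftohtml', '/var/www/website/processed') returns an empty array when none of these three problems is present.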

PDF to TEXT Conversion in PHP

It is quite simple to count the characters in a PDF document. To accomplish this task, we will use the pdf2html library. Please download and install it.

PHP Code to execute PDF Conversion and Characters Calculation

Linux command that converts the PDF to text format, built as a PHP string:
$command = '/usr/bin/pdftotext ' . escapeshellarg($file_path); // $file_path must be the absolute server path.
Running the command from PHP:
shell_exec($command);
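If the text is only needed inside the script, pdftotext can write to standard output instead of a file when "-" is given as the output name. The sketch below illustrates that variant; the function name pdf_to_text_string() is hypothetical, and the binary location is the /usr/bin path assumed throughout this article.

```php
<?php
// Hypothetical helper: returns the text of a PDF as a string, or null on failure.
function pdf_to_text_string($pdf_path, $binary = '/usr/bin/pdftotext')
{
    // Nothing to convert if the PDF is not there.
    if (!is_file($pdf_path)) {
        return null;
    }

    // The trailing "-" asks pdftotext to write the text to standard output.
    $command = escapeshellarg($binary) . ' ' . escapeshellarg($pdf_path) . ' -';
    $text = shell_exec($command);

    // shell_exec() does not return a string when the command cannot run or prints nothing.
    return is_string($text) ? $text : null;
}
```

This avoids having to locate and read the generated .txt file afterwards.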
Complete code to upload a file to the processed folder in your document root, convert it to text, and count its characters:

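// $filen holds the name of the file input field in the upload form.
$file_name = basename($_FILES[$filen]['name']); // Drop any directory part from the client-supplied name.
$file_path = $_SERVER['DOCUMENT_ROOT'] . '/processed/' . $file_name; // Absolute server path of the PDF.
if (move_uploaded_file($_FILES[$filen]['tmp_name'], $file_path))
{
    // pdftotext writes the .txt file next to the PDF; shell_exec() returns only after it finishes.
    shell_exec('/usr/bin/pdftotext ' . escapeshellarg($file_path));
    $text_path = preg_replace('/\.pdf$/i', '.txt', $file_path);
    if (is_file($text_path))
    {
        $contents = file_get_contents($text_path);
        $file_count = strlen(str_replace(' ', '', $contents)); // Characters, not counting spaces.
    }
}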
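The counting step can also be isolated into a small function. The code above removes only space characters, while the text produced by pdftotext also contains line breaks and form feeds, so the sketch below strips all whitespace before counting. This is a variation on the article's approach, not the same calculation, and the function name count_visible_characters() is hypothetical.

```php
<?php
// Hypothetical helper: counts the bytes in $text after removing all whitespace.
function count_visible_characters($text)
{
    // \s matches spaces, tabs, line breaks and form feeds.
    $stripped = preg_replace('/\s+/', '', $text);

    // strlen() counts bytes, as the article's own code does.
    return strlen($stripped);
}
```

For example, count_visible_characters("Hello PDF\nworld\t!") returns 14.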

Troubleshooting

  1. The shell_exec function will not execute if you do not have permission to run shell commands or if PHP is running in safe mode.
  2. The script generates a text file with the same name and in the same directory as the PDF file. If the text file is not created in that directory but the program still runs, look for the file in the root directory. If you find it there, you need to correct your file path.
  3. The upload and the character count fail if the web server cannot write to the processed folder. Make the folder writable, for example by changing its permissions to 777.
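For item 2, it can help to compute where the text file is expected to appear and test for it directly. The following is a minimal sketch, assuming the PDF name ends in .pdf and pdftotext is called with only the PDF path; the function name expected_text_path() is hypothetical.

```php
<?php
// Hypothetical helper: the path where pdftotext is expected to write its output
// when it is given only the PDF path (same directory, same name, .txt extension).
function expected_text_path($pdf_path)
{
    $parts = pathinfo($pdf_path);
    return $parts['dirname'] . '/' . $parts['filename'] . '.txt';
}
```

After the conversion, is_file(expected_text_path($file_path)) tells you whether the text file landed where the script will try to read it.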
If you have further questions about this post, kindly post your comments.
Comments (4)
Comment by Karthick on Fri, Nov 23rd, 2012 at 7:10 PM
Hi, Firstly I have to thank u for this good solution to my problem. By using this, I can convert PDF to text. But when I am trying to convert PDF to html, it is not working for me. Please help me to do that.
Comment by John on Tue, Oct 9th, 2012 at 11:05 AM
I have extracted the package in my root and run the below code in PHP: shell_exec("D:/xampp/htdocs/pdftohtml/ D:/xampp/htdocs/pdftohtml/test.pdf"); but it displays a blank screen. Please help.
Comment by Navneet on Thu, Jul 19th, 2012 at 6:20 PM
I installed “poppler-utils” package which also convert pdf to text, html and used system() function but problem is that generated html file does not include CSS. I mean generated html file is not formatted as PDF while it have both images and text but not as it is. Also same problem with “xpdf” package. Please suggest any solution to get formatted html file with images.
Comment by sandra on Fri, Jun 11th, 2010 at 1:34 PM
Hi,this seems to be the best solution for my problem but does it need ghostscript to be installed on the machine?