Webmaster Blog

Members blog at WebmaisterPro. Covering topics related to online marketing, SEO, web development as well as software reviews.


PHP Useful Snippets To Convert PDF, TXT, HTML and Images

Posted in Web design & Development

I recently worked on a project that required a few different file type conversions. To be honest, I had to brush up on my PHP a bit, but thanks to the Internet I was able to find these useful snippets, which saved me tons of time.

Convert PDF to JPG images - PDF2JPG

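This requires ImageMagick to be installed (plus Ghostscript, which ImageMagick uses to read PDFs). The snippet calls the ImageMagick 'convert' command through exec(), so the command-line tool is what matters here, not the PHP extension. It is a simple and very useful snippet that converts .pdf files into .jpg images.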

    /**
     * PDF2JPG script
     * with ImageMagick
     */

    $pdf_file = './pdf/_folder/example.pdf';
    $save_to = './image_folder/example.jpg'; // make sure that Apache has permission to write to this folder! (common problem)

    // run the ImageMagick 'convert' command to turn the PDF into a JPG with the given settings;
    // escapeshellarg() quotes the paths safely and 2>&1 captures error messages in $output
    exec('convert '.escapeshellarg($pdf_file).' -colorspace RGB -resize 800 '.escapeshellarg($save_to).' 2>&1', $output, $return_var);

    if ($return_var === 0) { // exit status 0 means the PDF was converted successfully
        print "Conversion OK";
    } else {
        // $output is an array of output lines, so join it before printing
        print "Conversion failed.<br />".htmlspecialchars(implode("\n", $output));
    }
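
When the PDF has more than one page, 'convert' writes one image per page (example-0.jpg, example-1.jpg and so on). If only a single page is needed, ImageMagick accepts a zero-based page index in square brackets after the file name. Here is a minimal sketch of that idea. The function name build_pdf2jpg_command() and its parameters are my own, chosen for illustration, and the paths are placeholders.

```php
<?php
/**
 * Build an ImageMagick 'convert' command for a single page of a PDF.
 * The function name and parameters are illustrative, not part of the original snippet.
 */
function build_pdf2jpg_command($pdf_file, $save_to, $width = 800, $page = 0) {
    // ImageMagick selects one page with a zero-based index in square brackets: example.pdf[0]
    $source = $pdf_file.'['.(int)$page.']';
    return 'convert '.escapeshellarg($source)
        .' -colorspace RGB -resize '.(int)$width
        .' '.escapeshellarg($save_to);
}

// Example: first page of the PDF, 800 pixels wide (placeholder paths)
$command = build_pdf2jpg_command('./pdf/example.pdf', './image_folder/example.jpg');
// exec($command.' 2>&1', $output, $return_var); // uncomment to run the conversion
```

Building the command in a separate function keeps the quoting in one place and makes it easy to check the string before handing it to exec().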
        

html2ps - convert HTML to PDF

This is definitely a very useful snippet that converts HTML to a PDF file. I have used it in a few projects.

    /**
     * Convert the HTML page at $url to a PDF file saved at $path_to_pdf,
     * using the html2ps library (expected in ./html2ps/ next to this script).
     */
    function convert_to_pdf($url, $path_to_pdf) {
        require_once(dirname(__FILE__).'/html2ps/config.inc.php');
        require_once(HTML2PS_DIR.'pipeline.factory.class.php');
        //echo WRITER_TEMPDIR; // debug: show the temporary directory html2ps writes to
        //error_reporting(E_ALL);
        //ini_set("display_errors","1");
        @set_time_limit(10000);
        parse_config_file(HTML2PS_DIR.'html2ps.config');

        // Declare the class only once, so convert_to_pdf() can be called repeatedly
        if (!class_exists('MyDestinationFile')) {
            /**
             * Handles the saving of the generated PDF to a user-defined output file on the server
             */
            class MyDestinationFile extends Destination {
                /**
                 * @var String result file name / path
                 * @access private
                 */
                var $_dest_filename;

                // PHP 5+ constructor (PHP 4 style constructors named after the class no longer work in PHP 8)
                function __construct($dest_filename) {
                    $this->_dest_filename = $dest_filename;
                }

                function process($tmp_filename, $content_type) {
                    copy($tmp_filename, $this->_dest_filename);
                }
            }
        }

        $media = Media::predefined("A4");
        $media->set_landscape(false);
        $media->set_margins(array('left' => 5,
                                  'right' => 5,
                                  'top' => 10,
                                  'bottom' => 10));
        $media->set_pixels(800);

        $pipeline = PipelineFactory::create_default_pipeline("", // Auto-detect encoding
                                                             "");
        // Override HTML source
        $pipeline->fetchers[] = new FetcherURL;
        $pipeline->data_filters[] = new DataFilterHTML2XHTML;
        $pipeline->parser = new ParserXHTML;
        $pipeline->layout_engine = new LayoutEngineDefault;

        $pipeline->output_driver = new OutputDriverFPDF($media);
        //$filter = new PreTreeFilterHeaderFooter("HEADER", "FOOTER");
        //$pipeline->pre_tree_filters[] = $filter;

        // Override destination to local file
        $pipeline->destination = new MyDestinationFile($path_to_pdf);

        global $g_config;
        $g_config = array(
            'cssmedia' => 'screen',
            'scalepoints' => '1',
            'renderimages' => true,
            'renderlinks' => true,
            'renderfields' => true,
            'renderforms' => false,
            'mode' => 'html',
            'encoding' => '',
            'debugbox' => false,
            'pdfversion' => '1.4',
            'draw_page_border' => false
        );
        $pipeline->configure($g_config);
        //$pipeline->add_feature('toc', array('location' => 'before'));
        $pipeline->process($url, $media);
    }

   

Convert HTML to Text

If you are building a search engine simulator based on a text browser, I think this might come in handy too.

 

    <?php
    // strip javascript, styles, html tags, normalize entities and spaces
    // based on http://www.php.net/manual/en/function.strip-tags.php#68757
    function html2text($html){
        $text = $html;
        static $search = array(
            '@<script.+?</script>@usi', // Strip out javascript content
            '@<style.+?</style>@usi', // Strip style content
            '@<!--.+?-->@us', // Strip multi-line comments including CDATA
            '@</?[a-z].*?\>@usi', // Strip out HTML tags
        );
        $text = preg_replace($search, ' ', $text);
        // normalize common entities
        $text = normalizeEntities($text);
        // decode other entities
        $text = html_entity_decode($text, ENT_QUOTES, 'utf-8');
        // normalize possibly repeated newlines, tabs, spaces to spaces
        $text = preg_replace('/\s+/u', ' ', $text);
        $text = trim($text);
        // we must still run htmlentities on anything that comes out!
        // for instance:
        // <<a>script>alert('XSS')//<<a>/script>
        // will become
        // <script>alert('XSS')//</script>
        return $text;
    }

    // replace encoded and double encoded entities to equivalent unicode character
    // also see /app/bookmarkletPopup.js
    function normalizeEntities($text) {
        static $find = array();
        static $repl = array();
        if (!count($find)) {
            // build $find and $repl from map one time
            // each row: replacement first, then the characters and entities to replace;
            // typographic characters are written as UTF-8 byte escapes so they survive copy and paste
            $map = array(
                array('\'', 'apos', 39, 'x27'), // Apostrophe
                array('\'', "\xE2\x80\x98", 'lsquo', 8216, 'x2018'), // Open single quote
                array('\'', "\xE2\x80\x99", 'rsquo', 8217, 'x2019'), // Close single quote
                array('"', "\xE2\x80\x9C", 'ldquo', 8220, 'x201C'), // Open double quotes
                array('"', "\xE2\x80\x9D", 'rdquo', 8221, 'x201D'), // Close double quotes
                array('\'', "\xE2\x80\x9A", 'sbquo', 8218, 'x201A'), // Single low-9 quote
                array('"', "\xE2\x80\x9E", 'bdquo', 8222, 'x201E'), // Double low-9 quote
                array('\'', "\xE2\x80\xB2", 'prime', 8242, 'x2032'), // Prime/minutes/feet
                array('"', "\xE2\x80\xB3", 'Prime', 8243, 'x2033'), // Double prime/seconds/inches
                array(' ', 'nbsp', 160, 'xA0'), // Non-breaking space
                array('-', "\xE2\x80\x90", 8208, 'x2010'), // Hyphen
                array('-', "\xE2\x80\x93", 'ndash', 8211, 150, 'x2013'), // En dash
                array('--', "\xE2\x80\x94", 'mdash', 8212, 151, 'x2014'), // Em dash
                array(' ', "\xE2\x80\x82", 'ensp', 8194, 'x2002'), // En space
                array(' ', "\xE2\x80\x83", 'emsp', 8195, 'x2003'), // Em space
                array(' ', "\xE2\x80\x89", 'thinsp', 8201, 'x2009'), // Thin space
                array('*', "\xE2\x80\xA2", 'bull', 8226, 'x2022'), // Bullet
                array('*', "\xE2\x80\xA3", 8227, 'x2023'), // Triangular bullet
                array('...', "\xE2\x80\xA6", 'hellip', 8230, 'x2026'), // Horizontal ellipsis
                array('°', 'deg', 176, 'xB0'), // Degree
                array('EUR', 'euro', 8364, 'x20AC'), // Euro
                array('¥', 'yen', 165, 'xA5'), // Yen
                array('£', 'pound', 163, 'xA3'), // British Pound
                array('©', 'copy', 169, 'xA9'), // Copyright Sign
                array('®', 'reg', 174, 'xAE'), // Registered Sign
                array('(TM)', 'trade', 8482, 'x2122') // TM Sign
            );
            foreach ($map as $e) {
                for ($i = 1; $i < count($e); ++$i) {
                    $code = $e[$i];
                    if (is_int($code)) {
                        // numeric entity
                        $regex = "/&(amp;)?#0*$code;/";
                    }
                    elseif (preg_match('/^.$/u', $code)/* one unicode char*/) {
                        // single character
                        $regex = "/$code/u";
                    }
                    elseif (preg_match('/^x([0-9A-F]{2}){1,2}$/i', $code)) {
                        // hex entity
                        $regex = "/&(amp;)?#x0*" . substr($code, 1) . ";/i";
                    }
                    else {
                        // named entity
                        $regex = "/&(amp;)?$code;/";
                    }
                    $find[] = $regex;
                    $repl[] = $e[0];
                }
            }
        } // end first time build
        return preg_replace($find, $repl, $text);
    }
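
The comment inside html2text() warns that the returned text still has to be escaped before it is printed into a page. Here is a minimal sketch of that last step. The helper name escape_for_html() is hypothetical, and htmlspecialchars() is used in place of htmlentities() because escaping the HTML special characters is enough for UTF-8 output.

```php
<?php
/**
 * Escape plain text before echoing it into an HTML page.
 * escape_for_html() is a hypothetical helper name used for illustration.
 */
function escape_for_html($text) {
    // ENT_QUOTES escapes both single and double quotes
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Text that still looks like markup after tag stripping is rendered harmless
$text = "<script>alert('XSS')//</script>";
echo escape_for_html($text);
```

In practice this would be applied to the return value of html2text() at the point of output, not inside the function, so the plain text stays usable for indexing or storage.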

   

Convert a PDF to text with Perl

This is the only exception in the list: all the other snippets are PHP, but this short script is written in Perl and simply converts the PDF 'demo.pdf' to plain text.
   

    #!/perl/bin/perl -w
    use strict;
    use CAM::PDF;
    use CAM::PDF::PageText;

    my $filename = "demo.pdf";

    my $pdf = CAM::PDF->new($filename) or die "Could not open $filename\n";
    # render every page in turn so the whole document is converted
    for my $pagenum (1 .. $pdf->numPages()) {
        my $page_tree = $pdf->getPageContentTree($pagenum);
        print CAM::PDF::PageText->render($page_tree);
    }

    # Note: I had to install CAM::PDF::PageText by hand; it was not installed by CPAN when I installed CAM::PDF.

 

I hope you find the snippets above useful. I wanted to include a DOC to PDF converter too, but PHP isn't much help in this case. However, I found a few links to tools that can convert between different file formats.

Converting documents (ODT, DOC to PDF) on PHP with Unoconv

I found this by accident. Note that you need Unoconv installed; it is a Python tool that uses the LibreOffice libraries (pyuno).

Here is a link to a guide covering installation and usage:

http://tech.rgou.net/en/php/converting-documents-odt-doc-to-pdf-on-php-with-unoconv-libreoffice/
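
For a rough idea of what calling Unoconv from PHP looks like, here is a minimal sketch. It is not taken from the linked guide. The helper name build_unoconv_command() and the file paths are hypothetical, and it assumes the unoconv command is available on the server's PATH.

```php
<?php
/**
 * Build a unoconv command that converts a document (ODT, DOC, ...) to PDF.
 * build_unoconv_command() is a hypothetical helper; it assumes unoconv is on the PATH.
 */
function build_unoconv_command($input_file, $output_pdf) {
    // -f sets the output format, -o the output file
    return 'unoconv -f pdf -o '.escapeshellarg($output_pdf).' '.escapeshellarg($input_file);
}

// Placeholder paths, for illustration only
$command = build_unoconv_command('./docs/report.odt', './pdf/report.pdf');
// exec($command.' 2>&1', $output, $return_var); // uncomment to run; $return_var is 0 on success
```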

Using LiveDocx with PHP 4 and PHP 5 NuSOAP

Another option is to use LiveDocx without the Zend Framework, although it does require the SOAP library NuSOAP.

Here is a link to the guide: http://www.phplivedocx.org/articles/using-livedocx-with-nusoap/
