Full Circle Magazine FR

I was recently given a product key for Able2Extract 12, a PDF converter and editor. Previously, I had always done these sorts of tasks using various command-line tools. As extracting text from PDFs, or editing them in any way, is not something I do frequently, I cannot promise that I’ve tested everything the software has to offer. That being said, here are my experiences and thoughts.

Compatibility

While the software offers packages only for Ubuntu and Fedora, I was able to create a PKGBUILD that correctly installs and runs the Ubuntu .deb file under Arch Linux.

I did, however, run into an issue in Ubuntu 16.04, Ubuntu 17.10, and Arch Linux. Specifically, the application would crash with an error about the Qt fonts location. After I contacted the company, we were able to resolve the issue. Apparently, the application requires the variable $QT_QPA_FONTDIR to be set to the root path (/). Instead of defining this system-wide in /etc/environment or in my user’s .bashrc, I created a bash script that sets the variable and runs Able2Extract. The script is:

#!/bin/bash

# Set the Qt font directory for this process only
export QT_QPA_FONTDIR=/

# Launch Able2Extract
/opt/investintech/a2ep/bin/Able2ExtractPro

I went this route because the Able2Extract package does not seem to add its bin directory to the $PATH variable, meaning the application can be run only from its own folder or via the .desktop file. After dropping the script into a folder on my PATH, I am able to run it as normal. This has the added benefit of not interfering with any other applications, should they want the same variable.
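For readers who want to reproduce this setup, here is a minimal sketch of one way to put such a wrapper on the PATH. It is not the exact procedure used for this review: the command name able2extract and the target directory ~/.local/bin are illustrative assumptions, and the use of exec and "$@" (so that file arguments are passed through to the application) is an addition to the script shown above.

```shell
#!/bin/bash
# Sketch only: "able2extract" and the target directory are illustrative
# choices, not names provided by the Able2Extract package.

# Write the wrapper script into the given directory and make it executable.
install_wrapper() {
    local target_dir="$1"
    local wrapper="$target_dir/able2extract"

    mkdir -p "$target_dir"
    cat > "$wrapper" <<'EOF'
#!/bin/bash
# Set the font directory for this process only, then start the application.
export QT_QPA_FONTDIR=/
exec /opt/investintech/a2ep/bin/Able2ExtractPro "$@"
EOF
    chmod +x "$wrapper"
}

# Return success if the given directory is listed in PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (commented out so that sourcing this file has no side effects):
# install_wrapper "$HOME/.local/bin"
# on_path "$HOME/.local/bin" || echo "Add $HOME/.local/bin to your PATH"
```

Because the variable is exported inside the wrapper, it applies only to the Able2Extract process, which is what keeps other applications unaffected.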

Application Interface

The layout of the application itself is very familiar (after having used software such as Adobe Acrobat), and it offers some helpful (non-intrusive) tips when you start it for the first time.

Features

The application allows you to create, edit, and convert PDF files. Part of the conversion process uses OCR technology to convert PDFs to editable files such as documents (.odt) or presentation slides. It can also produce spreadsheets, CSV, HTML, images, and AutoCAD files.

I tested the Word, Excel, and HTML modes on a few recipe scans I have. Some of these files were taken with a smartphone camera, and others were scanned on an actual flatbed scanner. The OCR system worked well for most of the files I tried, although one particularly badly photographed image had a few gaps where light reflections obscured the text. That being said, I could have filled in the blanks using logic, or by adjusting the contrast of the image to make it more legible. I was most impressed by the HTML results, as the converter added plenty of styling to the text to make it look clean and legible. If you are planning on turning PDFs into unstyled HTML files to add to a website, you should have a plan in place to strip out the inline styles. I did not see any options for the HTML converter.
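As one possible plan for stripping the inline styles, the sketch below removes style attributes with sed. It is a rough, regex-based approach rather than a real HTML parser: it assumes each style attribute sits on a single line and is quoted, and the function name strip_inline_styles and the file names in the usage note are made up for illustration. I have not checked it against Able2Extract’s actual HTML output.

```shell
#!/bin/bash
# Sketch only: a regex-based filter, not a full HTML parser.
# Reads HTML on stdin and writes it to stdout with inline
# style="..." and style='...' attributes removed.
strip_inline_styles() {
    sed -E \
        -e 's/[[:space:]]+style="[^"]*"//g' \
        -e "s/[[:space:]]+style='[^']*'//g"
}
```

It could be used as strip_inline_styles < recipe.html > recipe-plain.html, where both file names are placeholders. Any style element or linked stylesheet in the output would need separate handling.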

The conversion options do allow you to handle things such as missing or unrecognized glyphs, or to set the file format for Word and PowerPoint conversions (on my system, it defaulted to OpenOffice). You can also adjust some document styling, such as margins.

The creation tool takes an image file and turns it into a PDF. I did not see an option to select text documents or Word documents (though you could create PDF files using a PDF printer or something like LaTeX).

The editing tools include adding stamps, highlights, text, comments, etc. They also cover redacting sections of files, deleting PDF pages, extracting specific pages, and adjusting text styles. The text style adjustment appears to work only on some PDFs - in my tests these options were grayed out. They probably work only on PDFs that were created from a text document, as opposed to image-based scans.

Results

As noted in the previous section, almost every attempt I made yielded a complete copy of the PDF. In some cases (low contrast, poor lighting, etc.), there were some gaps in the resulting file. These could be corrected or filled in relatively easily (especially if you have access to the original document). The worst result came from a recipe that was laid out in 3 columns - while the OCR system managed to separate the columns correctly (I’ve experienced some that treat 3 columns as 1 line), the character recognition of the actual text was not that impressive. The font in the PDF file was very small and quite faint, which could have contributed to the lack of accuracy. The resulting file would definitely have needed proofreading and correcting (though most OCR output should be checked before deeming it finished).

Overall, the results I’ve experienced using Able2Extract 12 rival those of any other OCR software I’ve ever used, and are much better than those of the other Linux-based alternatives I’ve tried so far. Is it always perfect? No, but in every test I ran, it yielded a file that would have reduced the effort required to copy the file by hand by at least 50-60%. In most cases it would have required only a few small corrections.

Conclusion

If you do a lot of PDF work (splitting documents, OCR scans, etc.), and don’t have a Linux application for it, I would highly recommend giving Able2Extract a shot. Even if you already have an application you use, you may not be happy with its OCR results - in that case, I would recommend you try Able2Extract as well.

It’s almost a perfect score - if the package worked out of the box, and if there were extra options for HTML conversions, I’d be happy to give it a 5.