Discover Top Posts Tagged with #gimagereader

Use gImageReader to Extract Text From Images and PDFs on Linux - It's FOSS

gImageReader is a front end for the Tesseract open-source OCR engine. Tesseract was originally developed at HP and was open-sourced in 2005.

An OCR (optical character recognition) engine lets you extract text from an image or a PDF file. Tesseract can recognize several languages by default and also supports Unicode text.

However, Tesseract by itself is a command-line tool without a GUI. gImageReader fills that gap, giving any user a graphical way to extract text from images and files.
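For comparison, here is a rough sketch of what working with Tesseract directly on the command line can look like. It only checks whether a given language is installed before running OCR. The helper name has_lang is made up for this illustration; `--list-langs` is a Tesseract option, but the exact output layout assumed here (a header line followed by one language code per line) may differ between versions.

```shell
#!/bin/sh
# has_lang is a hypothetical helper, not part of Tesseract or gImageReader.
# It succeeds if the given language code (e.g. "deu") is in the installed list.
has_lang() {
    # Older Tesseract versions print the list on stderr, so merge both streams.
    # grep -x matches whole lines only, so "deu" does not match "deu_frak".
    tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

A possible use, assuming a scanned image called scan.png: `has_lang deu && tesseract scan.png out -l deu` writes the recognized text to out.txt.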

See https://itsfoss.com/gimagereader-ocr/

#technology #opensource #PDF #OCR #gImageReader

How to let Linux read out your books

If you have an ebook and are too lazy to read it, let Linux read it aloud to you. If the PDF has selectable text, extract it with the program pdftotext. If the PDF consists of page images, use OCR (optical character recognition) with gimagereader or tesseract. To read the text aloud, use espeak and mbrola.
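The choice between pdftotext and OCR can be automated, at least roughly. The sketch below assumes that a PDF "has selectable text" if pdftotext returns anything other than whitespace. That is a heuristic, not a guarantee. The helper name pdf_has_text is made up for this example.

```shell
#!/bin/sh
# pdf_has_text is a hypothetical helper: it succeeds if pdftotext finds
# any non-whitespace text in the given PDF.
pdf_has_text() {
    # "-" as the output name makes pdftotext print to stdout
    text=$(pdftotext "$1" - 2>/dev/null) || return 1
    # drop spaces, newlines and form feeds; anything left is real text
    [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]
}
```

For example, `if pdf_has_text your.pdf; then pdftotext your.pdf; else echo "needs OCR"; fi` picks the cheap route when it is available.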

http://sourceforge.net/projects/gimagereader/files/0.9/

    apt-get install python-imaging-sane
    apt-get install tesseract-ocr-deu
    apt-get install espeak
    apt-get install mbrola

    # convert each PDF page to an image (-dBATCH makes gs exit when done):
    gs -dNOPAUSE -dBATCH -sDEVICE=jpeg -r300 -sOutputFile=p%03d.jpg your.pdf

    # let tesseract get the text (writes page177.txt)
    tesseract p177.jpg page177 -l deu

    # if the PDF contains text use (writes your.txt):
    pdftotext your.pdf

    # let linux read out
    espeak -vmb-de5 -p30 -s 180 -f page177.txt
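The commands above handle a single page (p177.jpg). To process a whole book, the OCR step could be wrapped in a loop, roughly as sketched here. The function name ocr_pages and the output file name book.txt are made up for this example. It assumes the page images were created with the p%03d.jpg pattern from the gs command, so that the shell lists them in page order.

```shell
#!/bin/sh
# ocr_pages is a hypothetical helper: it runs tesseract on every p*.jpg
# in a directory and joins the results into book.txt in that directory.
ocr_pages() {
    dir=$1            # directory containing p001.jpg, p002.jpg, ...
    lang=${2:-deu}    # Tesseract language code, "deu" as in the post
    out=$dir/book.txt
    : > "$out"        # start with an empty output file
    for img in "$dir"/p*.jpg; do
        [ -e "$img" ] || continue      # skip if no image matched
        base=${img%.jpg}
        # tesseract appends ".txt" to the output base name itself
        tesseract "$img" "$base" -l "$lang" || return 1
        cat "$base.txt" >> "$out"
    done
}
```

After that, something like `ocr_pages . deu && espeak -vmb-de5 -p30 -s 180 -f book.txt` would read the whole book in one go.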

#ocr #tesseract #gimagereader #espeak #mbrola
