Searchable PDFs/HP41 manual/Slightly Off Topic



#15

Hi All,
I've been experimenting with a web site (not free) that lets you upload a PDF of a scanned document and then runs OCR on the scanned page images so that you can actually search through the document. I'm sure there are a lot of tools that can do this. The particular site I'm using is www.evernote.com.

Unfortunately, the HP41 scanned document I have is 50M in size but the site only allows documents 25M or smaller to be uploaded. Is there an easy way to take a 50M PDF and break it into 3 equal size pieces - hopefully at a Chapter boundary?
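A minimal sketch of how the split points could be planned, assuming every scanned page takes roughly the same space (only an approximation) and that you know the page on which each chapter starts. The function name, the page count and the chapter pages below are made up for illustration; they are not taken from the actual HP41 manual.

```python
def plan_split(total_pages, chapter_starts, total_mb, limit_mb):
    """Group whole chapters into consecutive pieces that fit under limit_mb.

    Assumes a uniform size per page, so the result is only an estimate.
    Returns a list of (first_page, last_page) tuples, 1-based and inclusive.
    """
    def fits(pages):
        # Integer form of: pages * (total_mb / total_pages) <= limit_mb
        return pages * total_mb <= limit_mb * total_pages

    starts = sorted(set(chapter_starts) | {1})
    ends = [s - 1 for s in starts[1:]] + [total_pages]

    pieces = []
    piece_start = starts[0]
    for start, end in zip(starts, ends):
        if not fits(end - start + 1):
            raise ValueError(f"chapter starting at page {start} is too big on its own")
        # Close the current piece when adding this chapter would overflow it.
        if not fits(end - piece_start + 1):
            pieces.append((piece_start, start - 1))
            piece_start = start
    pieces.append((piece_start, total_pages))
    return pieces


if __name__ == "__main__":
    # Hypothetical 300-page, 50 MB scan with a chapter every 60 pages.
    print(plan_split(300, [1, 61, 121, 181, 241], 50, 25))
```

The page ranges it returns can then be fed to whatever tool does the actual extraction.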

I tried creating a zip of the file, but surprisingly it was not any smaller than the original. I guess PDFs of scanned documents may already employ some compression...
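The zip result can be illustrated with a small sketch. ZIP archives normally use DEFLATE, which Python's standard `zlib` module also implements. Random bytes are used here as a stand-in for image data that has already been compressed; that substitution is an assumption for the demo, not a measurement of the actual manual.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (about 1.0 means no gain)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive plain text shrinks dramatically.
text_like = b"HP-41 owner's manual " * 5000

# Random bytes behave like already-compressed scan data: nothing left to squeeze.
already_compressed_like = os.urandom(100_000)

if __name__ == "__main__":
    print(f"text-like data:          {compression_ratio(text_like):.3f}")
    print(f"already-compressed-like: {compression_ratio(already_compressed_like):.3f}")
```

The first ratio comes out close to zero, while the second stays at about 1, which matches what happened with the zipped PDF.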

Thanks,

Kevin


#16

http://www.pdfsam.org/

It is free and easy to use.


#17

Ah - that pdfsam program was just perfect - what a jewel - this will be very useful at work as well.

Kevin

#18

Try PrimoPDF. It's a free PDF converter that works by printing a document. So you can open your PDF, print the first n pages using PrimoPDF, and voila, you are good to go.

HTH

Cheers

Peter

PS - cool site! I scanned and OCRed the VASMs a little while ago to make them searchable. Would be great to hear how successful your conversion was. Maybe we can pool together - everyone scans & uploads one manual of general interest and then we all share. A few suggestions with regards to manuals

  1. HP41 user Manuals
  2. Keith Jarett - Extended Functions Made Easy
  3. Keith Jarett - Synthetic Programming Made Easy
  4. HP Advantage Module
  5. CCD Module
  6. PPC Module
  7. Hepax Manuals
  8. Ken Emery - MCODE for Beginners
  9. Wlodek's 'Red bible' - Extend Your HP-41
  10. Paul Dodin Inside the HP-41
  11. Article Index from the PPC Journal (this would be awesome! There are literally 1000s of pages of PPC Journal and it's very hard to find something on a particular subject. I once went through the exercise of organizing all PPC, HPCC, CHHU etc. articles into themes by printing them out, reading all of them and sorting them by theme. Better, but still very cumbersome. There are some good indices which cover a few years of articles, so if we have those OCRed one can search for articles on a particular subject)

Just a thought...


#19

On version 7 of the DVDs, which will be out in the fall, nearly all manuals are searchable. That includes many/most of the above. (all the documents I've managed to get permission for.)


#20

w00t!
Thanks Dave!

#21

OMG, that is absolutely awesome!!! and for next version I'll pass on the searchable VASM to you as well.

Cheers

Peter


#22

Quote:
OMG, that is absolutely awesome!!! and for next version I'll pass on the searchable VASM to you as well.

Even more awesome.

#23

Quote:
On version 7 of the DVDs, which will be out in the fall, nearly all manuals are searchable. That includes many/most of the above. (all the documents I've managed to get permission for.)

Gee, I wonder where Dave got the idea? ;)

#24

Thanks for the news! I was already starting to wonder about the fate of all our scans from this spring.

#25

Quote:
11. Article Index from the PPC Journal (this would be awesome! There are literally 1000s of pages of PPC Journal and it's very hard to find something on a particular subject. I once went through the exercise of organizing all PPC, HPCC, CHHU etc. articles into themes by printing them out, reading all of them and sorting them by theme. Better, but still very cumbersome. There are some good indices which cover a few years of articles, so if we have those OCRed one can search for articles on a particular subject)

For what it is worth, all of the PPC Journal, CHHU Chronicle, HPX Exchange and HPCC Datafile issues have already been scanned into pdf images, over several years. The PPC Journal, CHHU Chronicle and HPX Exchange scanned issues available at http://www.pahhc.org/ppccdrom.htm include all the article indexes which were in the issues, plus some additional ones which were not printed in their time frames. The HPCC Datafile issues are all indexed at their club web site, at http://www.hpcc.org/datafile/. Since the U.S.-newsletter material consists mostly of pasted-up articles typed on regular typewriters, the task of OCRing those indexes might be a challenge.

Jake Schwartz

#26

Quote:
Is there an easy way to take a 50M PDF and break it into 3 equal size pieces - hopefully at a Chapter boundary?

If you (or a friend) have Adobe Acrobat Professional (not the free reader which everybody has), you can extract arbitrary pages, or range of pages, from a pdf file. Then you save what you extracted (under a new name!), and go back and extract some more. Conversely, you can delete what you don't want.

Can you point on the web to the original pdf you have, or put it on a web site somewhere? If so, I would be glad to break it up into chunks (and put the pieces on my own site).

#27

Try this here:

http://portableapps.com/apps/office/pdftk_builder_portable
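
PDFTK Builder is, as far as I know, a graphical front end for the pdftk command-line tool, so the same split can also be scripted. Below is a minimal sketch that only builds the command lines and does not run them; it relies on pdftk's `cat ... output` operation. The file names and page ranges are hypothetical.

```python
import shlex


def pdftk_split_commands(source, page_ranges, prefix="part"):
    """Build one pdftk command line per (first_page, last_page) range.

    The commands are returned as strings for inspection; nothing is executed.
    """
    commands = []
    for number, (first, last) in enumerate(page_ranges, start=1):
        output = f"{prefix}{number}.pdf"
        args = ["pdftk", source, "cat", f"{first}-{last}", "output", output]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Hypothetical file name and page ranges.
    for line in pdftk_split_commands("hp41.pdf", [(1, 120), (121, 240), (241, 300)]):
        print(line)
```

Each printed line can be pasted into a terminal once pdftk is installed.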

#28

An excellent online version of the HP 41C manual is here

