Searchable PDFs/HP41 manual/Slightly Off Topic



Post: #2

Hi All,
I've been experimenting with a web site (not free) that lets you upload a PDF of a scanned document and then runs OCR on the scanned images so that you can actually search through the document. I'm sure there are a lot of tools that can do this. The particular site I'm using is www.evernote.com.

Unfortunately, the HP41 scanned document I have is 50M in size but the site only allows documents 25M or smaller to be uploaded. Is there an easy way to take a 50M PDF and break it into 3 equal size pieces - hopefully at a Chapter boundary?

I tried creating a zip of the file, but surprisingly it was not any smaller than the original. I guess PDFs of scanned documents may already employ some compression...
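The guess about existing compression can be checked with a small experiment. The sketch below uses Python's zlib module (DEFLATE, the method zip files normally use) on synthetic data, not on the actual HP41 scan; the word list and seed are made up for illustration. Compressing redundant text shrinks it considerably, while compressing the already-compressed result gains essentially nothing, which matches what happens when zipping a PDF full of compressed page images.

```python
import random
import zlib

# Synthetic stand-in for uncompressed page content (an assumption; this is
# not the real HP41 scan). A fixed seed keeps the run repeatable.
rng = random.Random(41)
words = ["XEQ", "STO", "RCL", "ALPHA", "PRGM", "USER", "LBL", "GTO"]
raw = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")

once = zlib.compress(raw, 9)    # first pass: redundant text shrinks a lot
twice = zlib.compress(once, 9)  # second pass: input is already compressed

print(f"raw:   {len(raw)} bytes")
print(f"once:  {len(once)} bytes")
print(f"twice: {len(twice)} bytes")
```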

Thanks,

Kevin


Post: #3

http://www.pdfsam.org/

It is free and easy to use.


Post: #4

Ah - that pdfsam program was just perfect - what a jewel - this will be very useful at work as well.

Kevin

Post: #5

Try PrimoPDF. It's a free PDF converter that works by printing a document. So you can open your PDF, print the first n pages using PrimoPDF and voila, you are good to go.

HTH

Cheers

Peter

PS - cool site! I scanned and OCRed the VASMs a little while ago to make them searchable. Would be great to hear how successful your conversion was. Maybe we can pool together - everyone scans & uploads one manual of general interest and then we all share. A few suggestions for manuals:

  1. HP41 user Manuals
  2. Keith Jarett - Extended Functions Made Easy
  3. Keith Jarett - Synthetic Programming Made Easy
  4. HP Advantage Module
  5. CCD Module
  6. PPC Module
  7. Hepax Manuals
  8. Ken Emery - MCODE for Beginners
  9. Wlodek's 'Red bible' - Extend Your HP-41
  10. Paul Dodin Inside the HP-41
  11. Article Index from the PPC journal (this would be awesome! There are literally 1000s of pages of PPC Journal and it's very hard to find something on a particular subject. I once went through the exercise of organizing all PPC, HPCC, CHHU etc. articles into themes by printing them out, reading all of them and sorting them by theme. Better, but still very cumbersome. There are some good indices which cover a few years of articles, so if we have those OCRed one can search for articles on a particular subject)

Just a thought...


Post: #6

On version 7 of the DVDs, which will be out in the fall, nearly all manuals are searchable. That includes many/most of the above. (all the documents I've managed to get permission for.)


Post: #7

w00t!
Thanks Dave!

Post: #8

OMG, that is absolutely awesome!!! And for the next version I'll pass on the searchable VASM to you as well.

Cheers

Peter


Post: #9

Quote:
OMG, that is absolutely awesome!!! and for next version I'll pass on the searchable VASM to you as well.

Even more awesome.

Post: #10

Quote:
On version 7 of the DVDs, which will be out in the fall, nearly all manuals are searchable. That includes many/most of the above. (all the documents I've managed to get permission for.)

Gee, I wonder where Dave got the idea? ;)

Post: #11

Thanks for the news! I was already starting to wonder about the fate of all our scans from this spring.

Post: #12

Quote:
11. Article Index from the PPC journal (this would be awesome! There are literally 1000s of pages of PPC Journal and it's very hard to find something on a particular subject. I once went through the exercise of organizing all PPC, HPCC, CHHU etc. articles into themes by printing them out, reading all of them and sorting them by theme. Better, but still very cumbersome. There are some good indices which cover a few years of articles, so if we have those OCRed one can search for articles on a particular subject)

For what it is worth, all of the PPC Journal, CHHU Chronicle, HPX Exchange and HPCC Datafile issues have already been scanned into pdf images, over several years. The PPC Journal, CHHU Chronicle and HPX Exchange scanned issues available at http://www.pahhc.org/ppccdrom.htm include all the article indexes which were in the issues, plus some additional ones which were not printed in their time frames. The HPCC Datafile issues are all indexed at their club web site, at http://www.hpcc.org/datafile/. Since the U.S.-newsletter material consists mostly of pasted-up articles typed on regular typewriters, the task of OCRing those indexes might be a challenge.

Jake Schwartz

Post: #13

Quote:
Is there an easy way to take a 50M PDF and break it into 3 equal size pieces - hopefully at a Chapter boundary?

If you (or a friend) have Adobe Acrobat Professional (not the free reader which everybody has), you can extract arbitrary pages, or ranges of pages, from a PDF file. Then you save what you extracted (under a new name!), and go back and extract some more. Alternatively, you can delete what you don't want.

Can you point on the web to the original pdf you have, or put it on a web site somewhere? If so, I would be glad to break it up into chunks (and put the pieces on my own site).

Post: #14

Try this here:

http://portableapps.com/apps/office/pdftk_builder_portable
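For anyone who prefers the command line, here is a rough sketch of how the split asked about at the top of the thread could be planned. It assumes the command-line pdftk (which PDFTK Builder is a front end for) is installed, and it only prints the commands rather than running them. The file name, page count and chapter start pages are made-up placeholders, and the helper names `split_points` and `pdftk_commands` are hypothetical. Each cut starts at the equal-size position and is then moved to the nearest chapter start, if any are given.

```python
import shlex


def split_points(total_pages, parts, chapter_starts=()):
    """Return 1-based inclusive (first, last) page ranges covering the document.

    Each cut is placed at the ideal equal-size position, then moved to the
    nearest chapter start page if any are supplied.
    """
    cuts = []
    for i in range(1, parts):
        ideal = round(i * total_pages / parts) + 1  # first page of the next part
        if chapter_starts:
            ideal = min(chapter_starts, key=lambda p: abs(p - ideal))
        if 1 < ideal <= total_pages and ideal not in cuts:
            cuts.append(ideal)
    cuts.sort()
    starts = [1] + cuts
    ends = [c - 1 for c in cuts] + [total_pages]
    return list(zip(starts, ends))


def pdftk_commands(src, ranges):
    """Build one 'pdftk <src> cat <first>-<last> output partN.pdf' line per range."""
    return [
        shlex.join(["pdftk", src, "cat", f"{first}-{last}", "output", f"part{n}.pdf"])
        for n, (first, last) in enumerate(ranges, start=1)
    ]


# Hypothetical numbers: a 300-page scan with chapters starting on these pages.
ranges = split_points(300, 3, chapter_starts=[1, 45, 110, 190, 260])
for line in pdftk_commands("hp41.pdf", ranges):
    print(line)
```

Equal page counts do not guarantee equal file sizes for a scanned document, so the resulting pieces may still need checking against the 25M upload limit.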

Post: #15

An excellent online version of the HP 41C manual is here

