Bandwidth
#1

I did a quick check on bandwidth before I decided on this. I have a 500Meg limit on disk space and a 20G bandwidth limit.

My software will automatically resize all photos (after this weekend) to about 500 x 375, which produces approximately a 50K file. I also create a much smaller thumbnail, which helps reduce bandwidth further.
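Roughly, the resize step will work something like this (just a sketch in Python using the Pillow library; the file names and JPEG quality setting are placeholders, not my final choices):

    from PIL import Image

    def make_web_copies(original_path):
        # Convert to RGB so PNGs etc. can be saved as JPEG.
        img = Image.open(original_path).convert("RGB")
        # Shrink to the display size, keeping the aspect ratio.
        img.thumbnail((500, 375))
        img.save("display.jpg", quality=75)  # roughly a 50K file at this size
        # Shrink again for the much smaller thumbnail.
        img.thumbnail((120, 90))
        img.save("thumb.jpg", quality=75)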

I would like to also mention that this is not going to be a photo hosting site for anyone to upload photos for any reason. This is meant for those that have actual collections. There would be no remote linking to other sites. Everyone that wants to post photos will have to go through the registration process and all uploads are logged.

Even if I had 5,000 50K files, that is only 250 Meg. I doubt I'd see 5,000 files uploaded, but depending on how many people want to do this, there may be a limit on how much space any one person can consume.

If I had 100,000 views of a 50K file, that is only 5G. And I hardly think I'm going to have that many views per month.

Anyway, it's a project in the works.

#2

Thanks, Mike. I wish your project much success. Collectors will likely start offering each other trades and such on the basis of how the items look online... I'd better start polishing. ;-)

Check if there is a way to hide your pages from web crawlers (these can bite you in the bandwidth).

Looks like a very good idea/service to me.

#3

I've been wondering how to deal with these crawlers. I have done some simple things, but I see that they are on the site quite a lot.

One idea I have is to block their view whenever I see them. I need to build this into the software. They are easy to spot by inspection, so I imagine I could do it in software.

Often I see things like XXX.YYY.ZZZ.TTT viewing such-and-such a forum. When it's a crawler, I see this 10 or more times with only TTT changed. I might add some software this weekend to check for multiple views from XXX.YYY.ZZZ and return a short "blank" page.

The neat thing about SQL database software is that all of this is logged and easy to query. Returning a blank page to multiple hits from nearly the same IP is easy.
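Something like this, for example (a rough sketch in Python against SQLite; the table and column names are just placeholders, and the window and threshold will need tuning):

    def looks_like_crawler(db, ip, minutes=10, threshold=10):
        # "XXX.YYY.ZZZ.TTT" -> "XXX.YYY.ZZZ"
        prefix = ip.rsplit(".", 1)[0]
        count = db.execute(
            "SELECT COUNT(*) FROM hits "
            "WHERE ip LIKE ? || '.%' "
            "AND hit_time > datetime('now', ?)",
            (prefix, "-%d minutes" % minutes),
        ).fetchone()[0]
        # If the same XXX.YYY.ZZZ block shows up too often,
        # serve the short blank page instead of the real one.
        return count >= threshold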

Thanks for jogging my memory that this is something I need to get on.

#4

I know that there are some HTML tags you can place on your page that "responsible" crawlers will honor and not creep all over your site. I don't know what the format is, but I'm sure somebody out there in HPLAND does...

#5

It is rather easy:

Robot Exclusion Standards
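For example, a minimal robots.txt in your server's root directory looks like this (the directory names are only examples):

    # Ask all robots to stay out of the photo directories
    User-agent: *
    Disallow: /photos/
    Disallow: /thumbs/

There is also a per-page variant, a meta tag in the HTML header, which answers the question above:

    <meta name="robots" content="noindex, nofollow">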

Hope this helps,
Jürgen

#6

The trouble is, it doesn't work with ill-behaved web crawlers. I do have this file set up, but there are many web crawlers that ignore it. I can see when crawlers are on the site; they simply go where they like, ignoring the list.

The robots.txt file works only on the honor system.

Since all my pages are created on the fly, I'll just send blank pages to crawlers. The only problem is sorting out the crawlers from the normal users, but I don't think that will be too difficult a challenge.

Thanks anyway,

#7

Because there are so many good people on this forum, I forgot that there are some bad people (or crawlers) who do not care about the rules :-(

Another idea: the User-Agent field of the HTTP request header identifies the client software. I don't know if this data is available when you create the pages on the fly, but if it is, it may help distinguish "normal" users from crawlers.
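For example (just a sketch in Python; the substrings are a few common examples, not a complete list, and how you get at the header depends on your server software):

    CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")

    def is_crawler(user_agent):
        # Case-insensitive match against known crawler name fragments.
        agent = (user_agent or "").lower()
        return any(hint in agent for hint in CRAWLER_HINTS)

Of course, a rogue crawler can put anything it likes in that field, so like robots.txt this only catches the honest ones.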

Good Luck!

#8

To block web crawlers, create a robots.txt file in the root directory of your web server. Google for the format - there are lots of sites that explain it.

Best,

--- Les Bell [http://www.lesbell.com.au]

#9

This has already been suggested, and of course I already have that. The robots.txt file does not stop a great many crawlers, which simply ignore it.

#10

Did I read the other replies? Only afterwards. Next time, I might not bother at all.

The only other way I can think of to put rogue crawlers off the track is link obfuscation in JavaScript. And the chances are that anything that would put a crawler off will upset legitimate browsers, too.

Best,

--- Les Bell [http://www.lesbell.com.au]


