Manga Search Engine


dummey
01-27-2009, 06:58 PM
So I am taking a class on search engines and data retrieval where most of the workload is centered around a project. For this project, we are to choose a corpus and design a method of parsing and searching through this data.

My desired corpus is, of course, the manga on One Manga. The goal of the project would be to let users search for strings of text and get back the chapters or pages of manga that are most relevant.

Now my question to all you fellow manga addicts is:
- Is this useful in any way? I don't recall many instances where I had a quote and was dying to find where it came from.
- If it is, what search features would be desired? E.g., allowing searches only on the top half of pages.
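To make the idea concrete, here is a minimal sketch of ranking pages by relevance with a plain TF-IDF score, including the "top half of the page only" filter. Everything in it is an assumption for illustration: the `PAGES` records stand in for OCR output, the series names and text are made up, and splitting each page into "top" and "bottom" regions is just one way the filter could work.

```python
import math
import re
from collections import Counter

# Hypothetical OCR output: (series, chapter, page, region, text).
PAGES = [
    ("Series A", 1, 3, "top", "I will become the king"),
    ("Series A", 1, 3, "bottom", "not if I stop you first"),
    ("Series A", 2, 7, "top", "the king is dead"),
    ("Series B", 5, 1, "bottom", "long live the king of the sea"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def search(query, pages, region=None):
    """Rank page regions by TF-IDF score, best first.

    If region is given ("top" or "bottom"), only that part of each
    page is searched.
    """
    docs = [p for p in pages if region is None or p[3] == region]
    tokenized = [tokenize(p[4]) for p in docs]
    n = len(docs)

    # Document frequency: in how many regions does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    results = []
    for doc, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        score = sum(
            (tf[t] / len(tokens)) * math.log(1 + n / df[t])
            for t in query_terms
            if t in tf
        )
        if score > 0:
            results.append((score, doc[:4]))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    for score, hit in search("king of the sea", PAGES):
        print(round(score, 3), hit)
```

Calling `search("king", PAGES, region="top")` restricts the ranking to text found in the top half of each page. A real project would probably want stemming, stop words, and phrase matching on top of this.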

Spade
01-27-2009, 08:17 PM
I don't think doing scientific research on an illegal website is a good idea. Ô.o

Another thing: how are you planning to enable searching text from the manga pages, which are basically images with no text data whatsoever inside them? It would be incredibly awesome to search long-running series like One Piece for a specific quote, but I don't think that's possible unless you put all the dialogue into the site's page code or something...

dummey
01-27-2009, 11:08 PM
Originally posted by Spade:
I don't think doing scientific research on an illegal website is a good idea. Ô.o

Another thing: how are you planning to enable searching text from the manga pages, which are basically images with no text data whatsoever inside them? It would be incredibly awesome to search long-running series like One Piece for a specific quote, but I don't think that's possible unless you put all the dialogue into the site's page code or something...

It's not so much scientific research as it is a programming assignment with a heavy emphasis on learning techniques for sorting and ordering information. And who says this site is illegal =P

I happen to have quite a bit of experience with OCR (optical character recognition), so parsing the pages isn't the biggest problem, though it's still one that I will have to get past.

Matruskan
01-28-2009, 01:59 AM
It would help those people who like to do this:
The Great Big Guide To One Piece!
Gotei 13 Captains Stats & Character Appearances
Naruto Databook Stats

But I'm not sure if I would use it...
However, I bet you would learn a lot about data retrieval and OCR since there are so many different kinds of manga!

Spade
01-28-2009, 07:32 AM
Originally posted by dummey:
And who says this site is illegal =P

But... but... it is. Ô.o
It's uploading copyrighted material and offering it for free; that's illegal in every way.

Lainemaa
01-28-2009, 10:44 AM
I'm with Spade, doing school projects on illegal or dubiously legal material is not a good idea.

There are loads and loads of totally free and legal webcomics out there, of varying quality. Some have been running for a really long time too, so there's lots of material. Maybe do your project on one of them, and then it could be applied to manga too.

It would be nice to find a chapter or a page based on a remembered quote. And it would be nice to be able to find the chapter where some bit-part character first appeared.
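Both lookups are easy to sketch once the page text exists. This is only an illustration under assumptions: the `PAGES` list is made-up sample data standing in for OCR output, and the function names are hypothetical.

```python
# Hypothetical OCR output: (chapter, page, text), in no particular order.
PAGES = [
    (12, 4, "Who is that man? They call him Captain Rook."),
    (3, 9, "A stranger named Rook paid for the meal."),
    (3, 2, "We set sail at dawn!"),
    (7, 1, "Rook, you came back!"),
]


def find_quote(quote, pages):
    """Return every (chapter, page) whose text contains the quote, in reading order."""
    needle = quote.lower()
    return sorted((ch, pg) for ch, pg, text in pages if needle in text.lower())


def first_appearance(name, pages):
    """Return the earliest (chapter, page) that mentions the name, or None."""
    hits = find_quote(name, pages)
    return hits[0] if hits else None


if __name__ == "__main__":
    print(find_quote("set sail at dawn", PAGES))
    print(first_appearance("Rook", PAGES))
```

This only catches a character's first mention in the dialogue, not the first time they are drawn, which would need image recognition rather than text search.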

dummey
01-28-2009, 11:02 PM
Okie, so I am changing my corpus to comics in general. I will personally be using onemanga as a basis for testing, but will not state it in my final report.

The reason I am not leaning towards the webcomic side is that I would have to visit roughly 250 of them and extract 40 images from each of these sites, a fairly daunting task. Not to mention, slightly more illegal.

Lainemaa
01-29-2009, 08:28 PM
Keenspot?

Zezinho
01-30-2009, 08:37 AM
I wrote a keenspot crawler around the time I wrote the onemanga one. It downloads all the strips, in this case from http://www.stripteasecomic.com

usage:

java keenspot http://www.stripteasecomic.com/d/20000930.html (this would start from that day onwards)

the program is quite simple, and since it is for a school project I don't mind sending you the source... if you don't mind it being in java and "hammer" coded (made to work, not be pretty - it was for my own personal use anyways ;) ). It can easily be altered to support most keenspot webcomics...
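For anyone curious what a "start at one day and walk forward" crawler looks like, here is a minimal sketch in Python. It is not the Java program above, and it rests on a loud assumption: that each page marks its next-day link with `rel="next"`. Real Keenspot pages may be laid out differently, so the parser would need adjusting. The class and function names are made up for this example, and it collects every image on a page rather than only the strip.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class StripPageParser(HTMLParser):
    """Collects image sources and the target of a link marked rel="next".

    Assumption: the site marks its next-day link with rel="next".
    """

    def __init__(self):
        super().__init__()
        self.images = []
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("rel") == "next" and attrs.get("href"):
            self.next_href = attrs["href"]


def fetch(url):
    """Download a page and return it as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Follow next-day links from start_url and return absolute image URLs."""
    url, seen, found = start_url, set(), []
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = StripPageParser()
        parser.feed(fetch_page(url))
        found.extend(urljoin(url, src) for src in parser.images)
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return found
```

The `fetch_page` parameter is there so the crawl logic can be tried against canned HTML without touching the network, and `max_pages` plus the `seen` set keep it from looping forever on a site whose links form a cycle.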

download: http://balance.no.sapo.pt/keenspot.rar