IMDB archive project - where is it?


Hi everyone,

This team of people has been working (as have the people who run this site) on a full archive of most of the IMDB boards' content. I've been checking the page, but it doesn't seem to explain WHERE this archive is or when it will appear...

http://archiveteam.org/index.php?title=The_Internet_Movie_Database

Anyone got any word or clarity on this?

Thanks!

reply

@9, the download itself contains the instructions in a text file called "Letter from Col Needham" (the uploader is facetiously going by the name of IMDb founder Col Needham). Here's what's written in the text file:
_______________________________________________
Alright, lads. I'm quitting Amazon since Jeff didn't give me the yachts he promised me for closing the IMDb message boards.

Now that IMDb is dead, I'm releasing a website dump for the IMDb message boards to the public for your offline perusal. This includes all discussion threads for all titles with at least 100 votes listed on IMDb, including movies, TV shows, videos and even games. All upcoming titles in 2017 are also included. Of those that didn't make the cut, almost none had anyone discussing it except the few ones titled "where can I find this?" So for all intents and purposes, this is a complete dump of the per-title boards. In total, this database contains 1,914,799 discussion threads for 86,581 titles. What it doesn't contain are the general boards and the celebrity boards. I'm afraid you have to find them elsewhere.

The threads are preserved in their original format, including stylesheets, emoji's, and signatures that are indistinguishable from the posts themselves, in all their early-2000s glory, minus the ads and brandings.

A Python script is provided to extract the discussion board for a title so you can read it in a browser. You can run it on all major operating systems. To use the script, first install the newest version of Python 2 (NOT Python 3) from https://www.python.org. Then, open a command window in the directory this database resides in (On Windows, Shift+Click in the directory to find this option) and run:

vintagedb-extract.py IMDbTitleID

where IMDbTitleID looks like ttXXXXXXX, which you can find in the URL of the title's page on imdb.com. For example, the ID of Carmencita (1894) is tt0000001.

If you remember the name but not the ID of a title, you can also search for it:

vintagedb-extract.py "game of thrones"

The database itself is built on sqlite3. Anyone interested is invited to take the database and do what you like with it. You can find the database schema within the script. Mine some data out of it. Build a website out of it. Maybe even write a better viewer for it. Or simply indulge in reading what online trolls had to say about your favorite movies.

Please seed.
Col Needham
2017-02-17
_______________________________________________
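For anyone who wants to take up the invitation to poke at the database directly, here is a minimal sketch that lists the tables in a SQLite file along with their CREATE statements. It is not part of the download and is not how the bundled script works. The function name `list_tables` is made up for illustration, and the database filename is an assumption you pass in yourself, since the letter doesn't name it. It relies only on Python's standard `sqlite3` module and SQLite's built-in `sqlite_master` table, and is written for Python 3, separately from the bundled script, which needs Python 2.

```python
import sqlite3
import sys


def list_tables(db_path):
    """Return (table name, CREATE statement) pairs for a SQLite file.

    db_path is whatever the database file in the download is called;
    the letter does not give the name, so it is left as a parameter.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's own catalog of tables, indexes, etc.
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical filename): python list_tables.py path/to/database
    for name, sql in list_tables(sys.argv[1]):
        print(name)
        print(sql)
        print()
```

This only reads the schema, so it should be safe to run against the dump. The real table and column names are whatever the schema inside the script says.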

reply

The threads from that torrent really should be added to this website.

reply

Yeah, has there been any movement on that? (This thread is 2 months old now.)

reply

I believe they all have been added to the site since then.

reply

-

reply

I'm sorry, but I've never seen this pastebin before. How do I get the torrent link from the link provided? http://pastebin.com/p2Tw4esX

reply


@ lorelei

I haven't seen it before either, so I improvised. I copied all the text in "raw data" and pasted it into the "add torrent from url" box in the top left part of BitTorrent. And it worked.

reply

@ Apis, thanks! I'm gonna try this

reply

Chiming in to say everything from that torrent not already here certainly needs to be added, perhaps after first making some suggested format changes. When it has all been added, please let everyone know. I want to check the final result against the PDF files I made of threads in which I posted.

reply