In order to index the boards...


...you first have to build a list of them. For the topical boards this is fairly trivial: you just visit /boards/. (Some people have already done this to construct clones of the topical board root.) For the boards of titles (works) and names (persons), it's another story. While /interfaces could be used, the files might not contain URLs or information relevant to your purposes, so here is a set of search indexes, with the titles split by decade:

Titles:
/search/title?count=250&release_date=2010,2019&sort=release_date,desc&view=simple
/search/title?count=250&release_date=2000,2009&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1990,1999&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1980,1989&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1970,1979&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1960,1969&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1950,1959&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1940,1949&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1930,1939&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1920,1929&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1910,1919&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1900,1909&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1890,1899&sort=release_date,desc&view=simple
/search/title?count=250&release_date=1880,1889&sort=release_date,desc&view=simple

Names:
/search/name?count=250&gender=male&view=simple
/search/name?count=250&gender=female&view=simple
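
Since the title indexes differ only in the decade, they can be generated rather than typed out. Here is a minimal Python sketch of that; the function name and its default arguments are my own, and it only builds the path strings listed above without fetching anything:

```python
# Template copied from the title index paths listed above.
TITLE_TEMPLATE = (
    "/search/title?count=250&release_date={start},{end}"
    "&sort=release_date,desc&view=simple"
)
NAME_TEMPLATE = "/search/name?count=250&gender={gender}&view=simple"


def decade_title_paths(first_decade=1880, last_decade=2010):
    """Return one title search path per decade, newest decade first."""
    return [
        TITLE_TEMPLATE.format(start=year, end=year + 9)
        for year in range(last_decade, first_decade - 1, -10)
    ]


def name_paths():
    """Return the two name search paths (male, then female)."""
    return [NAME_TEMPLATE.format(gender=g) for g in ("male", "female")]


if __name__ == "__main__":
    for path in decade_title_paths() + name_paths():
        print(path)
```

Run as a script, it prints the same 14 title paths and 2 name paths as the lists above, in the same order.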

You'll notice that the first URL in the list reports 1,524,459 titles, at 250 title references per page of search results. That makes 6,098 pages (the last one holding 209 results) for that one search query alone. It's going to take a while to download all those pages, and problems will arise if too many people try to do it at once and too quickly. So the real tricks lie in parsing and in applying things like regular expressions. Each "/title/tt???????/" path must be isolated and converted to "/title/tt???????/board/threads/".
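
To make the page arithmetic and the path conversion concrete, here is a rough Python sketch. It makes a few assumptions of my own: that the seven "?" placeholders are always digits, that the title paths appear verbatim in the downloaded page text, and that duplicates on a page should be collapsed. The function names and the sample markup in the usage lines are made up for illustration.

```python
import re

# Assumption: the seven "?" characters in /title/tt???????/ are digits.
TITLE_PATH = re.compile(r"/title/(tt\d{7})/")


def page_count(total_results, per_page=250):
    """Return (number of pages, number of results on the last page)."""
    full_pages, remainder = divmod(total_results, per_page)
    if remainder:
        return full_pages + 1, remainder
    # An exact multiple: the last page is a full one (or there are no pages).
    return full_pages, per_page if full_pages else 0


def board_paths(page_text):
    """Find title paths in page_text and convert them to board thread paths.

    Duplicates are dropped while keeping first-seen order.
    """
    title_ids = dict.fromkeys(m.group(1) for m in TITLE_PATH.finditer(page_text))
    return [f"/title/{title_id}/board/threads/" for title_id in title_ids]


if __name__ == "__main__":
    print(page_count(1524459))
    sample = '<a href="/title/tt0000001/">A</a> <a href="/title/tt0000002/">B</a>'
    print(board_paths(sample))
```

With the figure quoted above, page_count(1524459) gives (6098, 209), which matches the 6,098 pages and 209 results on the last one. If you wrap this in an actual downloader, putting a pause between requests (for example with time.sleep) is one simple way to avoid the "too many, too quickly" problem.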

The original thread: /board/bd0000001/view/265787442.

Here are some other topics for the time being:
* https://en.wikipedia.org/wiki/Internet_forum#Discussion
* https://en.wikipedia.org/wiki/Comparison_of_Internet_forum_software
* https://en.wikipedia.org/wiki/Internet_hosting_service
* https://en.wikipedia.org/wiki/Social_software
* https://en.wikipedia.org/wiki/MySQL

I know, right? I just want to fix the black screen problem I have with YouTube.

Whatever you are, be a good one.
