In the beginning there were low-bit-rate telephone modems running at up to 2400 baud, and dial-in city bulletin boards that were commonly accessed from Commodore 64s and IBM-compatible machines. Eventually a network called the Internet was built, and it connected local networks together worldwide. A method of finding content on these connected machines became necessary. Yahoo!, AOL, and MSN each had a different way of indexing content on remote machines using user-submitted directories, much in the way Kickstarter shows you projects.
Eventually two Stanford students, Sergey Brin and Lawrence Page, developed PageRank, a ranking system based on Rankdex that hinged on the HTML anchor tags in web pages. It recursively sorted the web pages indexed by the Googlebot crawler into a logical hierarchy, assigning each page a score based on which pages linked to it and which pages it linked to. The higher the score, the more prominent the position in search results.
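To make the scoring idea concrete, here is a minimal sketch of the simplified textbook form of PageRank, computed by repeated passes over a tiny link graph. It is not Google's production algorithm; the function name, the 0.85 damping factor, and the four-page graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from a dict mapping each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with equal scores that sum to 1.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "c" is linked to by three pages, "d" by none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
best_page = max(scores, key=scores.get)
```

In this toy graph the most linked-to page, "c", ends up with the highest score, and "a" benefits from being the only page that "c" links to.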
WHY CHANGE WEB SEARCH, WHY NOW?
PageRank and Rankdex have a problem: a link from a high-ranking website is not necessarily a positive reference to the linked page or document. For instance, your high-scoring website can link to a page while saying "I really hate x, y, z," and the linked page will still inherit a higher rank, even though that was not the intention. Another problem is UGC, or user-generated content: a website may have a high ranking score based on its popularity, but its authors are random people who arbitrarily join the website, and their content instantly inherits the site's ranking even though they have little or no history behind that score. A third stinging problem with the PageRank system is that false or misleading information can rise to the top of search results because the sorting of World Wide Web content is purely automated.
HOW CAN IT BE DONE?
CDNPAL search aims to provide a better mechanism for sorting content on the World Wide Web. We have a mechanism which ensures that pages and documents on the web are given their intended weight. Our system does not limit voting rights to content creators, but gives everyone an opportunity to weigh in on which content is more important, more prominent, and more relevant as a search result. We do this through plugins for multiple browsers and through iOS and Android applications, where users can opt in to sending their data to the system. That data is used to calculate the importance of web pages.
The CDNPAL engine is also different and special in that it's not simply a visualization engine which just shows users end search results. It's like a Lego set: all the data collected is available via a REST API, and not just as web search results. As a user, you can, for instance, pull the entire WWW structure of the websites in a city as JSON and pull down the indexed hierarchy of those websites or online documents. You can do this because, while it indexes web content, our crawler collects not only information about web pages and their hierarchies but also their network properties and geo-locational properties.
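As a rough sketch of what working with such JSON could look like, the example below walks a page hierarchy from a made-up payload. The payload shape and every field name in it ("city", "sites", "host", "geo", "pages", "path", "children") are assumptions for illustration only; the real API response format is not described here.

```python
import json

# Hypothetical response body for one city; not the actual CDNPAL format.
SAMPLE_RESPONSE = """
{
  "city": "Los Angeles",
  "sites": [
    {
      "host": "example.com",
      "geo": {"lat": 34.05, "lon": -118.24},
      "pages": [
        {"path": "/", "children": [
          {"path": "/about", "children": []},
          {"path": "/menu", "children": []}
        ]}
      ]
    }
  ]
}
"""


def iter_paths(node, depth=0):
    """Yield (depth, path) for a page node and all pages beneath it."""
    yield depth, node["path"]
    for child in node.get("children", []):
        yield from iter_paths(child, depth + 1)


def list_city_pages(payload):
    """Flatten every site's page hierarchy into (host, depth, path) rows."""
    data = json.loads(payload)
    rows = []
    for site in data["sites"]:
        for root in site["pages"]:
            for depth, path in iter_paths(root):
                rows.append((site["host"], depth, path))
    return rows


rows = list_city_pages(SAMPLE_RESPONSE)
```

Because the data arrives as plain JSON rather than as a results page, the same rows could feed a map, a report, or another application.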
Lastly, CDNPAL is different in that it re-indexes web pages as Open Graph objects, which you can use in social graphs in conjunction with your own social information, or in any other way, from presentations to applications.
SO WHAT'S ALREADY THERE?
The project already has some important resources, such as 3-year Amazon AWS EC2 reserved instances, including high-CPU instances, in both the Virginia and Northern California regions. Some of the base modules required for a search engine are also partially complete.
A modern search engine consists of 3 parts: the crawler, the link map, and the index. You can check out some alpha source code for a reduced-functionality, single-process version of the crawler. It is not meant to be run in a production environment, and some basic functionality like robots.txt processing is not included in this download version.
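For readers wondering what robots.txt processing involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not taken from the project's crawler; the crawler name "ExampleCrawler", the sample rules, and the URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a crawler might download from a site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a policy object the crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)

# Ask before fetching: the first URL is permitted, the second is disallowed.
allowed = policy.can_fetch("ExampleCrawler", "http://example.com/index.html")
blocked = policy.can_fetch("ExampleCrawler", "http://example.com/private/data.html")
```

A production crawler would run a check like this before every request and skip any URL the site owner has disallowed.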
The basic components have already been coded, but we need to wire them together and finish a working build of the entire search engine and of the API clients for mobile devices and the web. We also need to polish the code and make sure it works properly at a much larger scale than the skeleton framework we have working now.
WHAT WILL THE MONEY BE USED FOR?
The money will be used to pay for writing the remaining project application code and for the Amazon Web Services costs associated with indexing web documents. A small amount will be used towards creating schwag for rewards like hats, posters and picture books.
This project aims to deliver the source code and schwag for the search engine to project contributors and to bridge the divide between the online resources the public uses, and those the public has transparent access to.
I STILL DON'T UNDERSTAND THE CONCEPT :(
What if TV were guided not by Nielsen ratings, but by TV shows mentioning other TV shows? That's Google. We want to introduce Nielsen-rating-style sorting of normalized web documents, with OpenGraph as the normalization factor and Hadoop as our sorting mechanism.
1. We make various documents on the web normal, by formatting their characteristics into OpenGraph objects.
2. We store those objects in a big, huge scalable database.
3. We poll users and sort references to those objects based on what users want.
4. We show the most popular for any given type or category to users that are looking for something.
5. The web is a hit TV show.
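The first four steps can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not the project's implementation: the `normalize` function, the `ObjectStore` class, and the sample URLs are hypothetical, and a plain Python sort stands in for the Hadoop sorting mentioned above.

```python
def normalize(url, title, og_type):
    """Step 1: reduce a document to a small Open Graph style record."""
    return {"og:url": url, "og:title": title, "og:type": og_type}


class ObjectStore:
    """Step 2: an in-memory stand-in for the scalable database."""

    def __init__(self):
        self.objects = {}
        self.votes = {}

    def put(self, obj):
        self.objects[obj["og:url"]] = obj

    def vote(self, url, weight=1):
        """Step 3: record one user's signal for a stored object."""
        if url in self.objects:
            self.votes[url] = self.votes.get(url, 0) + weight

    def top(self, og_type, limit=10):
        """Step 4: most popular objects of one type, ties broken by URL."""
        matches = [o for o in self.objects.values() if o["og:type"] == og_type]
        matches.sort(key=lambda o: (-self.votes.get(o["og:url"], 0), o["og:url"]))
        return matches[:limit]


store = ObjectStore()
store.put(normalize("http://example.com/a", "Story A", "article"))
store.put(normalize("http://example.com/b", "Story B", "article"))
store.put(normalize("http://example.com/v", "Clip V", "video.other"))

# Simulated user signals: two for Story B, one each for the others.
for voted_url in ["http://example.com/b", "http://example.com/b",
                  "http://example.com/a", "http://example.com/v"]:
    store.vote(voted_url)

top_articles = [obj["og:title"] for obj in store.top("article")]
```

Here the ordering comes entirely from what users signalled, not from which pages link to which.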
We noticed that Kickstarter is mainly a community of gamers, so we have added a new project reward at $1. We are avid users of Hype, the HTML5 animation creator for Mac. As an extra reward we will create an animation game in the same theme as the Hype HTML5 movie on cdnpal.com, where you get to shoot our mascot rabbit Duck Hunt style. This is not a huge challenge for us, so we will have this ready for you to play by June 1, 2012.
So if you don't like search engines, social graphs, or whatnot, this is a really good reason to contribute to the project.
At this point, we have modified the crawler to grab only Open Graph information, or to create it from document data, for later compilation. The crawler also records network properties such as the location of the remote website server, along with contact and geo-location information for the OG content, taken from a business address or other location hints. By focusing only on what we want to achieve and leaving traditional search behind, we have a greater chance of giving users something brand new.
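As an illustration of "grab Open Graph information, or create it from document data," the sketch below reads `og:` meta tags from a page with Python's standard `html.parser` and falls back to the page title and URL when they are missing. It is not the project's crawler code; the class and function names and the sample pages are assumptions.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* meta tags and the <title> text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content") is not None:
                # Keep the first value seen for each property.
                self.og.setdefault(prop, attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_open_graph(html, url):
    """Return og:* properties, creating missing basics from document data."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    og = dict(parser.og)
    og.setdefault("og:title", parser.title.strip())
    og.setdefault("og:url", url)
    # The Open Graph protocol treats unmarked pages as type "website".
    og.setdefault("og:type", "website")
    return og


tagged_page = (
    '<html><head><title>Plain Title</title>'
    '<meta property="og:title" content="Rich Title">'
    '<meta property="og:type" content="article"></head></html>'
)
plain_page = "<html><head><title> Joe's Diner </title></head></html>"

tagged = extract_open_graph(tagged_page, "http://example.com/story")
created = extract_open_graph(plain_page, "http://example.com/diner")
```

The first page already carries Open Graph tags, so they are used as-is. The second has none, so a minimal object is created from its title and URL.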
Here is a simplified flow chart:
WE NEED DEVELOPERS RIGHT NOW
Don't want to help us financially, for whatever reason?
There's another way you can help. Please contact us to help us write the code.
One of our large problems is the high cost of educated or experienced Java programming labor in Southern California, along with the legal overhead and paperwork of having employees. So we work with some programmers outside the country, but ultimately we need people here that we can do status meetings with every day. We are also Java programmers and need to make our team bigger.
Show up to our office in sunny Los Angeles, California, and have a seat at a lovely workstation. We will provide you with dual monitors so you can help us finish our working prototypes. There are no set work hours, and you can do whatever you think is helpful to the project. This would be a good opportunity for a college student on summer vacation.
Your resume will shine with the word "impressive" after you finish interning on this project!
As a plus, you will also be in close proximity to Las Vegas and Burning Man and we can help you find accommodations.