We'd like to find someone who can either give us proven pseudo-code or, better yet, send us working code in C# that does the following:
1. The code gets a document (preferably HTML, but not a must).
2. The code finds the best-matching documents in a list of pre-defined documents (possibly ones the calculation has already been run on).
3. It returns a list of N documents ordered by best match first.
For example -
If I read this article on Engadget - http://www.engadget.com/2011/12/27/toshiba-thrive-7-review/
which is about the Toshiba Thrive Android tablet, then the "best match" documents would be the ones that cover the same topic. After them, in order, would come documents about other or generic Android tablets, then possibly documents about tablets in general (iPad?) or Android in general, and so on.
Possible places to look at are OpenCalais, TF-IDF, etc.
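To make the TF-IDF option concrete, here is a minimal sketch of the three steps above (take a document, score it against a pre-defined list, return the top N). It is only an illustration under several assumptions: it is written in Python as executable pseudo-code and would need porting to C#; all function names (tokenize, vectorize, build_index, rank) are hypothetical; the input is assumed to be plain text, so HTML stripping is left out; and the smoothed IDF formula is one common variant, not the only choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no stemming, no stop-word list)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens, idf):
    """Build a unit-length TF-IDF vector; terms unknown to the corpus are ignored."""
    counts = Counter(tokens)
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(corpus):
    """Pre-compute IDF weights and document vectors once for the pre-defined list."""
    token_lists = [tokenize(doc) for doc in corpus]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    n = len(corpus)
    # Assumption: smoothed IDF, so a term found in every document keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [vectorize(tokens, idf) for tokens in token_lists]
    return idf, vectors


def rank(query_text, idf, vectors, top_n):
    """Return (document index, cosine similarity) pairs, best match first."""
    q = vectorize(tokenize(query_text), idf)
    scores = []
    for i, v in enumerate(vectors):
        # Both vectors are unit length, so the dot product is the cosine similarity.
        scores.append((i, sum(w * v.get(t, 0.0) for t, w in q.items())))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Made-up sample documents, for illustration only.
corpus = [
    "Toshiba Thrive 7 inch Android tablet review",
    "Samsung Galaxy Tab Android tablet hands-on",
    "Apple iPad tablet sales figures",
    "Chocolate cake recipe with butter and sugar",
]
idf, vectors = build_index(corpus)
query = "Review of the Toshiba Thrive Android tablet"
for index, score in rank(query, idf, vectors, top_n=3):
    print(round(score, 3), corpus[index])
```

On this toy list the Toshiba document comes first, followed by the other Android tablet and then the iPad document, which is the kind of ordering described in the example. A plain bag-of-words model like this matches only on shared terms, so it may not be enough by itself for the "tablets in general" level of similarity.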
There are many algorithms out there, so we need something that will do the job well.
The test of the algorithm should be to take ANY document and find its best matches in a pre-defined list of other documents.
Once this is done and we have an algorithm we can try, we will open another bid to implement the algorithm on an existing C#-based system (like SharePoint) holding many medical documents (but the algorithm must not rely on the fact that the data is medical, at least not at first).