Always open for the right person…
(all disciplines: dev, test, pm)
Data Mining is a hot, new area, and we want a talented, highly motivated individual to join our growing Data Mining group. With all the data from the World Wide Web, we have endless potential to uncover patterns to help us improve our Search Engine, delight our customers, and confound our competition. This is an opportunity to use all kinds of leading-edge technologies, including machine learning (Neural Networks, Support Vector Machines, Hidden Markov Models, etc.), natural language software, parallel processing, and very large databases.
You must be highly customer focused and have several of the following qualifications: proven experience with C++; proven experience with object-oriented design; very solid coding/debugging skills; solid algorithmic skills; knowledge of SVMs, Neural Networks, HMMs, Decision Trees, etc., with in-depth knowledge of and practical experience with at least one or two of these; demonstrated success at dealing with ambiguous problems; and the ability to make solid progress when the solution is not well defined. Actual experience doing data mining is desirable, but not required. Basic knowledge of SQL is required. Master's degree in CS or equivalent.
Spam is one of the top killers of relevance for any search engine. Try the queries valentine, big island, or cialis to see this firsthand. Low-quality, low-relevance sites use several techniques to spam their way into search results. Spam adversely affects all areas of search -- crawl, data extraction, link analysis, and results ranking.
We need someone to kill spam dead. As in the data mining area, this is an opportunity to use all kinds of leading-edge technologies, including machine learning (SVMs, etc.), parallel processing, and graph theory, to tame this problem. The ideal candidate will combine strong software engineering skills with a solid background in one or more of these disciplines. Enthusiasm for reviewing the latest research, inventing new techniques, and doing rigorous experimental validation is required.
The Ranking team develops the components that predict, in a fraction of a second, which of our 5 billion web documents will best answer a user's query. It is one of the highest-impact and most technically challenging projects you will find anywhere in our industry. In collaboration with Microsoft Research, we explore cutting-edge techniques from statistics, information retrieval, machine learning, and computational linguistics to attack this problem. The ideal candidate will combine strong software engineering skills with a solid background in one or more of these disciplines. Enthusiasm for reviewing the latest research, inventing new techniques, and doing rigorous experimental validation is required. Foreign language skills are also a plus.
Do you want a search engine that can answer your questions instead of returning a list of documents? This is one of our most technically ambitious projects, and we need a few exceptional SDEs who can make it a success. In collaboration with Microsoft Research, we will take a promising prototype and add innovations that dramatically improve its accuracy, coverage, and language portability. In addition to strong software engineering skills, the ideal candidate will have strengths in statistics, machine learning, and/or computational linguistics. Foreign language skills are also a plus.
Web Structure Analysis
The web has a complex structure that gives us valuable information about the popularity and authoritativeness of documents in our search index. Success in web search depends on harvesting as much information as we can from this massive source of data. This area is filled with fascinating technical challenges from distributed graph algorithms to pattern recognition, and because people are constantly trying new ways to manipulate search engine rankings, there are always new challenges. Candidates for this position should have a strong software engineering and computer science background that includes graph theory, distributed computing, performance optimization, probability and statistics, and machine learning.
Enabling Engineering Excellence
We are a growing team, with a growing v1 codebase. We are looking for someone to help us build tools to ensure that this is the best engineering team at Microsoft. You will be responsible for managing all aspects of our engineering excellence work -- laying out our future source management strategy, our build infrastructure, our branching methodology, the whole works. You will take pride in raising development efficiency across the team, and in being the enabler of great search technology.
You must have prior experience working in a world-class build environment. You are a Perl and sd gearhead. You take pride in your scripts, and in your ability to tame complex dependency problems.
Grepping the web
The index serve team is chartered with doing the 'search' in web search: doing it faster than a grep on a local file, for thousands of queries a second, over billions of documents spread across thousands of servers. Help us create, refine, innovate, and deploy the software that lets us give users answers quickly and reliably. We are responsible for the infrastructure that makes it possible to reliably and efficiently manage and process hundreds of terabytes of information. Along with query serving, this team also provides the platform to support relevance and data mining.
You have at least a BS in CS or equivalent, several years of software development experience, and a solid background in software development on multithreaded, high-scale server systems. You should be comfortable working on a first-generation, ambitious project with rapid development iterations and high reliability and performance standards.
Running the super computer
The autopilot team builds an infrastructure for MSN Search and other distributed applications. The main challenge is turning unreliable hardware and software into a reliable cluster with 99.9% uptime, only 9-to-5 operations support, and fewer than one operations person for every 1000+ machines.
Here are some of the problems you would help to solve: early detection of hardware and software failures; performance monitoring and analysis for large numbers of computers; scheduling and load balancing for distributed applications; and messaging and data transfer protocols. Bottom line: we want to build a system that lets 10,000 commodity PCs work as a supercomputer.
You have at least a BSCS or equivalent (MS/PhD preferred); 3 years of software development experience using C, C++ or C#; a deep understanding of object-oriented design; and practical experience dealing with ambiguous problems.
Hand crafted results
When all else fails, and the ranking algorithms do not pass the confidence threshold, we fall back to delivering handcrafted results. Working on a team of approximately 132 other handcrafters in 26 worldwide markets, you will receive a user query, use all the available search engines to quickly scour the web for results, pick the top 10 results for this query, and send them on to the user. Successful handcrafters can typically find the top 10 results for a user’s real-time query in less than 3.8 seconds. This is an opportunity to truly connect with customers, because the queries that get routed to you are precisely the ones that the engine cannot answer well. We will have adequate staffing to allow generous coffee and bathroom breaks.
If you are an expert at using at least 3 different search engines, well versed in American English/colloquial usage, and can type at > 149 words/minute as measured by the Simia-Lico method – come join us and delight users in real time!
Large-scale, web-based multimedia search (images, video, audio) has been stagnant for five years, so there is an opportunity to leapfrog the competition. In this area, ranking and relevance are very challenging because:
web page static rank analysis (e.g. PageRank, MSNRank) doesn’t help much
user input is still a text search box, but target content is not text
UI experience is difficult because of copyright limits on direct linking from result pages to multimedia content
text descriptions of multimedia content on the web are notoriously bad
no good understanding of what users really want
the list goes on…
If you are capable of working on an ambiguous pre-v1 project that touches all parts of the search engine, and want to enable a killer multimedia search that is very cheap to operate, then this is the job for you!
© 2005 Microsoft Corporation. All rights reserved.