Research Articles

NLP Disambiguation - Lesk Algorithm

Thursday, February 5th, 2009

One of the problems associated with user-generated content is word sense disambiguation. A user will often type “resume” when they mean “résumé” or, as it is commonly spelled in the United States, “resumé”. This is a problem because the dictionary definition of “resume” is “to continue” or “to start something again”, not a document that job seekers submit when applying for a new position.

The purpose of the Lesk algorithm is to determine the meaning of a word based on the other words nearby. It does this by comparing the words around the ambiguous word with the dictionary definition of each candidate sense and picking the sense whose definition shares the most words with the context. What I hope is that the Lesk algorithm will be able to distinguish the two uses of the word “resume” if I give it a sentence such as “I am graduating from college next semester and will need to write a resume for the first time” versus another sentence such as “I apologize for interrupting what you were doing, please resume.”
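To make the idea concrete, here is a minimal sketch of a simplified Lesk-style overlap count in Python. It is not a full implementation. The sense inventory (`SENSES`), the stopword list, and the function names are all hypothetical, and the two glosses are hand-written for illustration rather than taken from a real dictionary. Matching is on exact words with no stemming, so "interrupting" and "interruption" do not count as an overlap.

```python
import re

# Hypothetical, hand-written glosses for illustration only. A real system
# would pull definitions from a dictionary resource instead.
SENSES = {
    "resume": {
        "continue": "to continue or start doing something again after an interruption",
        "document": "a document that job seekers write and submit when applying "
                    "for a new position, often after college",
    }
}

# Small, assumed stopword list; common words carry little sense information.
STOPWORDS = {"a", "an", "and", "am", "for", "from", "i", "of", "or",
             "the", "to", "were", "what", "when", "will", "you"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def simplified_lesk(word, sentence):
    """Return the sense of `word` whose gloss shares the most words with the sentence."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk(
    "resume",
    "I am graduating from college next semester and will need to write "
    "a resume for the first time"))  # document
print(simplified_lesk(
    "resume",
    "I apologize for interrupting what you were doing, please resume."))  # continue
```

With these toy glosses the first sentence matches the "document" sense on "college" and "write", while the second matches the "continue" sense only on "doing". That thin margin shows how much the result depends on the wording of the definitions.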

As the amount of user-generated content I have available continues to grow, I should be able to employ the Lesk algorithm to help decide which portions of text to look at when determining the meaning of keywords, as well as when finding semantically related keywords.