Homework 4
Given a pair of words (e.g., king and male), your task is to find the most similar pair (e.g., queen and female) using word vectors and their cosine similarities.
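To make the scoring concrete: a pair's relation can be represented by the difference of its two word vectors, and two relations compared by the cosine similarity of those differences. This is a minimal sketch; the 3-dimensional vectors are invented toy values (not taken from w2v.bin) and cosine is a hypothetical helper name.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented toy vectors for illustration only (NOT values from w2v.bin).
vec = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "male":   np.array([0.1, 0.9, 0.0]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "female": np.array([0.1, 0.2, 0.7]),
}

d1 = vec["king"] - vec["male"]     # relation king : male
d2 = vec["queen"] - vec["female"]  # relation queen : female

score_same = cosine(d1, d2)                              # close to 1: same relation
score_reversed = cosine(d1, vec["male"] - vec["king"])   # close to -1: order flips the sign
print(round(score_same, 3), round(score_reversed, 3))
```

Because the difference is directional, king : male and male : king are different relations, which is why both orders of a pair get their own diff vector.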
Log in to your Azure account.
Install numpy: sudo apt-get install python-numpy
Download the following word vectors:
wget http://www.mathcs.emory.edu/~choi/courses/cs329/dat/w2v.bin
Download the following vocabulary list:
wget https://raw.githubusercontent.com/emory-courses/cs329/master/src/distributional_semantics/vocab_100_verbs.txt
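If you want to load the two downloads without going through w2v.py, the sketch below is one possibility. It ASSUMES that w2v.bin uses the original word2vec C binary layout (a "count dim" header line, then each word followed by a space and dim little-endian 32-bit floats) and that the vocabulary file holds one word per line. Neither assumption is stated in this assignment, so check how w2v.py reads the file first. The helpers load_word2vec_bin and load_vocab are hypothetical names.

```python
import numpy as np

def load_word2vec_bin(path):
    """Read vectors in the word2vec C binary layout (assumed format of w2v.bin)."""
    vectors = {}
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())  # header: "<count> <dim>"
        for _ in range(vocab_size):
            chars = []
            while True:
                ch = f.read(1)
                if ch in (b" ", b""):   # a space ends the word; b"" means end of file
                    break
                if ch != b"\n":         # skip the newline written after each vector
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", errors="replace")
            # dim little-endian float32 values follow the word; widen to float64.
            vectors[word] = np.frombuffer(f.read(4 * dim), dtype="<f4").astype(np.float64)
    return vectors

def load_vocab(path, vectors):
    """Read one word per line (assumed layout) and keep only words that have a vector."""
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    return [w for w in words if w in vectors]
```

Usage would then be vectors = load_word2vec_bin("w2v.bin") followed by words = load_vocab("vocab_100_verbs.txt", vectors). Dropping out-of-vocabulary words up front avoids a KeyError later when building diff vectors.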
Create hw4.py by modifying w2v.py such that:
Construct a diff vector for each pair of words (e.g., v = v1 - v2). Do not create diff vectors from the same word (e.g., v = v1 - v1).
For each diff vector, find the top-k most similar diff vectors, where k = 5. The four words involved must all be different (e.g., in w1 : w2 = w3 : w4, no two of w1, w2, w3, and w4 may be the same).
Save your results to hw4.txt in the following format:
word1 : word2 = word3 : word4
...
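The three requirements above (diff vectors, top-k with four distinct words, one analogy per output line) can be sketched as a straightforward double loop. This is one possible approach, not the official solution; the function names are hypothetical, and the demo uses invented 2-dimensional toy vectors instead of w2v.bin.

```python
from itertools import permutations
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogies(words, vectors, k=5):
    """Return 'w1 : w2 = w3 : w4' lines, up to k per ordered pair (w1, w2)."""
    # One diff vector v1 - v2 per ordered pair of different words.
    diffs = {(w1, w2): vectors[w1] - vectors[w2] for w1, w2 in permutations(words, 2)}
    lines = []
    for (w1, w2), d in diffs.items():
        # Keep only candidate pairs that make all four words different.
        scored = [(cosine(d, d2), w3, w4) for (w3, w4), d2 in diffs.items()
                  if len({w1, w2, w3, w4}) == 4]
        scored.sort(key=lambda t: t[0], reverse=True)  # most similar first
        lines += [f"{w1} : {w2} = {w3} : {w4}" for _, w3, w4 in scored[:k]]
    return lines

def save(lines, path="hw4.txt"):
    """Write one analogy per line, as the assignment's output format requires."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")

# Invented toy vectors for a quick check (NOT values from w2v.bin).
toy = {"king": np.array([2.0, 1.0]), "male": np.array([2.0, 0.0]),
       "queen": np.array([0.0, 1.0]), "female": np.array([0.0, 0.0])}
lines = analogies(list(toy), toy)
print(lines[0])
```

With only four toy words, each pair has just two valid candidates, so fewer than k lines are produced per pair; with the real vocabulary there are far more than five.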
There are about 10,000 word pairs, which means your output file should contain about 10,000 * 5 = 50,000 lines. You need fewer than 20 lines of code to complete this homework, although it will take a while to run. Plan ahead so that you finish on time; no extension is allowed for this homework.
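Comparing about 10,000 diff vectors against each other in pure Python means on the order of 100 million cosine computations, which explains the long running time. One common speed-up, sketched here as an optional technique rather than something the assignment requires, is to scale every diff vector to unit length once, so that a single matrix product yields all cosines; rows are processed in chunks to bound memory. The function name and the chunk parameter are hypothetical.

```python
from itertools import permutations
import numpy as np

def top_k_vectorized(words, vectors, k=5, chunk=1000):
    """Sketch: top-k analogies from unit-length diff vectors and matrix products."""
    pairs = list(permutations(words, 2))                      # ordered pairs, w1 != w2
    index = {w: i for i, w in enumerate(words)}
    ids = np.array([(index[a], index[b]) for a, b in pairs])  # shape (P, 2)
    emb = np.stack([vectors[w] for w in words]).astype(np.float64)
    diffs = emb[ids[:, 0]] - emb[ids[:, 1]]                   # one diff vector per pair
    # Unit length makes a dot product equal the cosine. Assumes no two words
    # share an identical vector (that would give a zero norm).
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)

    lines = []
    for start in range(0, len(pairs), chunk):
        sims = diffs[start:start + chunk] @ diffs.T           # cosines for this block of rows
        for row in range(sims.shape[0]):
            a, b = ids[start + row]
            # Reject candidates sharing a word with (a, b); a candidate never repeats
            # a word internally because permutations() excludes (w, w).
            shared = (ids == a).any(axis=1) | (ids == b).any(axis=1)
            scores = np.where(shared, -np.inf, sims[row])
            for q in np.argsort(-scores, kind="stable")[:k]:
                if np.isinf(scores[q]):
                    break                                     # fewer than k valid candidates
                c, d = ids[q]
                lines.append(f"{words[a]} : {words[b]} = {words[c]} : {words[d]}")
    return lines

# Invented toy vectors for a quick check (NOT values from w2v.bin).
toy = {"king": np.array([2.0, 1.0]), "male": np.array([2.0, 0.0]),
       "queen": np.array([0.0, 1.0]), "female": np.array([0.0, 0.0])}
lines = top_k_vectorized(list(toy), toy)
print(lines[0])  # king : male = queen : female
```

If the vocabulary file holds 100 words, as its name suggests, there are 100 × 99 = 9,900 ordered pairs, each with 98 × 97 = 9,506 valid candidates, so the output has 9,900 × 5 = 49,500 lines, in line with the estimate above.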
Create the cs329/hw4 directory and submit hw4.py, hw4.txt, and a report showing the top-20 most interesting analogy pairs.
Copyright © 2016 Emory University - All Rights Reserved.
