Set-up
Install
Clone this repository from GitHub:
$ git clone https://github.uio.no/arthurd/in5550-exam
$ cd in5550-exam
The dataset is part of the repository; however, you will need to provide the word embeddings yourself.
You can either download the Norwegian-Bokmål CoNLL17 embeddings (a.k.a. the 58.zip file) from the NLPL website,
or use the copy available on the Saga server.
Make sure that you decode this file with encoding='latin1'.
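As a minimal sketch, here is one way to load the embeddings with gensim under the latin1 encoding. The paths, the archive member name model.txt, and the word2vec text format are assumptions; adjust them to match your copy of the archive.

import zipfile
from gensim.models import KeyedVectors

# Assumed paths and member name; adjust to your download location.
with zipfile.ZipFile("58.zip") as archive:
    archive.extract("model.txt", path="embeddings/")

embeddings = KeyedVectors.load_word2vec_format(
    "embeddings/model.txt",
    binary=False,
    encoding="latin1",          # per the note above
    unicode_errors="replace",   # tolerate stray bytes
)
print(embeddings.vector_size)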
Baseline
To run the baseline.py script, use device=torch.device('cpu'), as a CUDA version was not implemented.
$ python baseline.py
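For reference, a minimal sketch of pinning computation to the CPU; the nn.LSTM below is only a hypothetical stand-in for the BiLSTM built in baseline.py, and its sizes are placeholders.

import torch
import torch.nn as nn

# Force CPU execution; no CUDA path is implemented.
device = torch.device("cpu")

# Hypothetical stand-in for the model in baseline.py.
model = nn.LSTM(input_size=100, hidden_size=100, bidirectional=True).to(device)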
Additional arguments can be provided, such as the following (see the example invocation after the list):
--NUM_LAYERS: number of hidden layers for the BiLSTM
--HIDDEN_DIM: dimensionality of the LSTM layers
--BATCH_SIZE: number of examples to include in a batch
--DROPOUT: dropout to be applied after the embedding layer
--EMBEDDING_DIM: dimensionality of the embeddings
--EMBEDDINGS: location of the pretrained embeddings
--TRAIN_EMBEDDINGS: whether to train the embeddings or leave them fixed
--LEARNING_RATE: learning rate for the Adam optimizer
--EPOCHS: number of epochs to train the model
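For example, an invocation might look like the following; the flag values here are illustrative only, not recommended settings:

$ python baseline.py --NUM_LAYERS 2 --HIDDEN_DIM 100 --BATCH_SIZE 32 \
      --DROPOUT 0.5 --EMBEDDING_DIM 100 --EMBEDDINGS 58.zip \
      --LEARNING_RATE 0.001 --EPOCHS 10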