We hypothesize that this is beneficial because it prevents the model from forgetting priors, since no parameters are dropped or randomly initialized for different datasets.
pre-finetuning (Aghajanyan et al., 2021) and finetuning.
During training, both encoders are updated at every iteration in both stages, which helps align the token embedding space with the label embedding space.
learned label encoder is only required to produce label representations once.
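The point about producing label representations once can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy lookup-table encoders stand in for the learned token and label encoders, the label names and sentence are invented, and dot-product scoring is one plausible way to compare the two embedding spaces rather than a confirmed detail of the method.

```python
import random

random.seed(0)
DIM = 4  # hypothetical embedding size


def make_encoder(vocab):
    """Toy stand-in for a learned encoder: one fixed random vector per string."""
    table = {w: [random.gauss(0.0, 1.0) for _ in range(DIM)] for w in vocab}
    return lambda text: table[text]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


labels = ["person", "location", "other"]   # hypothetical label names
tokens = ["Alice", "visited", "Paris"]     # hypothetical input sentence

label_encoder = make_encoder(labels)
token_encoder = make_encoder(tokens)

# The label encoder runs once; its outputs are cached and reused
# for every token of every sentence seen afterwards.
label_vectors = [label_encoder(name) for name in labels]


def predict(sentence):
    """Assign each token the label whose cached vector scores highest."""
    predictions = []
    for tok in sentence:
        h = token_encoder(tok)  # only the token encoder runs per input
        scores = [dot(h, lv) for lv in label_vectors]
        predictions.append(labels[scores.index(max(scores))])
    return predictions


predicted = predict(tokens)
```

Because `label_vectors` is computed outside `predict`, the per-sentence cost at inference involves only the token encoder plus one similarity computation per token-label pair.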