

NAACL papers I'm excited for

Based on their titles alone...

I just found out my company is sending me to NAACL! Thank you, Networked Insights 😁. The list of accepted NAACL papers came out about a month ago and more and more papers are showing up on arXiv every day.

Here are 10 interesting-sounding papers and what I could find out about them.

A corpus of non-native written English annotated for metaphor

Beata Beigman Klebanov, Chee Wee (Ben) Leong and Michael Flor

Interesting to note that Beata Beigman Klebanov (what an iconic name, by the way) works at ETS. It makes sense to me that the people who make those tests would be into this kind of annotation. It’s also cool that it’s non-native English: you don’t see a lot of datasets that are explicitly that, and I don’t know what percentage of English speakers are non-native, but I’m sure it’s a lot*.

* Holy #*@%! L2 English speakers outnumber L1 speakers by a ratio of 3 to 1!

Dear Sir or Madam, May I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer

Sudha Rao and Joel Tetreault

Joel Tetreault works at Grammarly, which makes software for what they call writing enhancement. That surely includes spelling and grammar, but perhaps also some tooling around formality?

Sudha Rao is an early-stage PhD student at UMD advised by Hal Daumé, which makes me incredibly jealous. She also interned at Grammarly last summer, which is surely where this work comes from. Also, her thesis project includes grounding, which is a favorite topic of mine.

ATTR2VEC: Jointly learning word and contextual attribute embeddings with factorization machines

Fabio Petroni, Vassilis Plachouras, Timothy Nugent and Jochen L. Leidner

I’m always interested in work that tries to capture contextual information, especially when it’s giving structure to word embeddings, which it sounds like this might. The authors have released their code on GitHub, but the readme is all business and I don’t know enough about the topic to tell from the code what is going on, so I’ll wait.

Deep contextualized word representations

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer

I was a little skeptical of this work at first. I mean, hasn’t this been done already? But I’m not sure any of those embeddings proved to be as useful across such a broad variety of tasks as this approach claims to be. Also, they’ve put code up in both TensorFlow and PyTorch, with tutorials and everything. Way to go, AllenNLP 😊

Diachronic usage relatedness (DUREL): A framework for the annotation of lexical semantic change

Dominik Schlechtweg, Sabine Schulte im Walde and Stefanie Eckmann

These folks take a different approach from the diachronic semantics papers I wrote about a few weeks ago. Instead of comparing embeddings, they compare usage directly. How do they do that?

  1. Take a bunch of usages of a word (with context) from one time period
  2. Pair them up with usages from another time period
  3. Make some measure of comparison on usages
  4. Look at the average of this measure across all pairs
  5. Profit???

The authors come up with two different measures to do this dance with. And they claim one of them can tell the difference between innovative and reductive semantic change.
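To make that recipe concrete, here’s a toy sketch of steps 2 through 4. It assumes you already have some relatedness score for a pair of usages (the function names and the dummy overlap scorer are mine, not the authors’; in the paper the relatedness judgments come from human annotators, which is the hard part):

    import random
    from statistics import mean

    def mean_usage_relatedness(usages_t1, usages_t2, score, n_pairs=100):
        # Sample cross-period usage pairs and average a relatedness score
        # over them. `score` stands in for whatever comparison you have
        # for a pair of usages.
        pairs = [(random.choice(usages_t1), random.choice(usages_t2))
                 for _ in range(n_pairs)]
        return mean(score(u1, u2) for u1, u2 in pairs)

    # Dummy scorer for illustration: word overlap between the two contexts.
    def overlap(u1, u2):
        w1, w2 = set(u1.lower().split()), set(u2.lower().split())
        return len(w1 & w2) / len(w1 | w2)

    old_usages = ["a gay and festive crowd", "sang a gay little tune"]
    new_usages = ["the gay rights movement", "a gay couple next door"]
    print(mean_usage_relatedness(old_usages, new_usages, overlap))

A low average relatedness across periods (relative to pairs drawn from within a single period) would suggest the word’s meaning has moved.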

Attentive interaction model: Modeling changes in view in argumentation

Yohan Jo, Shivani Poddar, Byungsoo Jeon, Qinlan Shen, Carolyn Rose and Graham Neubig

When I read this title I knew immediately that they used /r/changemyview. An earlier paper by some of the Cornell computational sociolinguistics folks comes with a really nice dataset for it (with annotations)! They are so good about releasing data that’s well documented, easy to use, and not 404’d. (arXiv)

Author commitment and social power: Automatic belief tagging to infer the social context of interactions

Vinodkumar Prabhakaran and Owen Rambow

I took a look at the author’s thesis. One of the main ideas is that you can predict participants’ social power based on the level of belief they express in their utterances, “i.e., whether the participants are committed to the beliefs they express, non-committed to them, or express beliefs attributed to someone else”. Sounds pretty interesting, and I think that’s what this paper is going to be about.

Deconfounded lexicon induction for interpretable social science

Reid Pryzant, Kelly Shen, Dan Jurafsky and Stefan Wagner

I’m very interested in this enigmatically titled paper. What is this lexicon and what does it mean for it to be deconfounded? In what way is it being induced? And we’re doing… what with it? Interpreting social science!?! Fascinating. Unfortunately, all I could find is this broken link.

There is also this GitHub project, but my mother told me not to poke around in code repositories I know nothing about.

Colorless green recurrent networks dream hierarchically

Kristina Gulordava, Marco Baroni, Tal Linzen, Piotr Bojanowski and Edouard Grave

Dad-joke-level title, but actually pretty descriptive. The authors want to see if RNNs learn abstract hierarchical syntactic structure. In other words, do they understand the ways in which words build into phrases and phrases build into sentences, and how they’re connected once they do?

To test this, they check whether an RNN language model can get syntactic number agreement right in meaningless (but syntactically correct) sentences like “The colorless green ideas I ate with the chair sleep furiously”. In this example, if the RNN predicts sleep, which agrees with ideas, rather than sleeps, then that’s evidence the RNN understands the syntactic structure.
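Here’s a minimal sketch of that kind of agreement check, with a stand-in language-model scorer (anything that gives you a log-probability for a sentence would do; `my_lm_logprob` is hypothetical and this isn’t the authors’ evaluation code):

    def prefers_correct_agreement(lm_logprob, prefix, correct_verb, wrong_verb):
        # True if the language model assigns a higher score to the grammatically
        # correct verb form, given the same (meaningless) prefix.
        return lm_logprob(f"{prefix} {correct_verb}") > lm_logprob(f"{prefix} {wrong_verb}")

    # A nonsense prefix where only syntax can guide the choice of verb form:
    prefix = "The colorless green ideas I ate with the chair"
    # prefers_correct_agreement(my_lm_logprob, prefix, "sleep", "sleeps")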

Deep dungeons and dragons: Learning character-action interactions from role-playing game transcripts

Annie Louis and Charles Sutton

Well, this is clearly awesome. I’m not sure exactly what it means for a character to interact with an action, but I’m excited to find out. Couldn’t find so much as an abstract online, and the authors’ past work doesn’t have any big hints, so we’ll have to wait to find out.

That’s all for now! But there are many, many more papers that sound interesting. I’ll follow up in June with another blog post about what stood out at NAACL 😊

Measuring semantic shift with word embeddings

A short summary of two recent papers.

I just read two interesting papers by William L. Hamilton, Jure Leskovec, and Dan Jurafsky about using “diachronic word embeddings” to measure semantic shift.

A word embedding (at least the kind they’re talking about here) is a way of associating each word with a vector (a list of numbers) that captures something about the word’s meaning in relation to other words. Here, “diachronic” means they’re calculating the vector representations with respect to at least two different time periods.
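As a toy illustration of the “vectors that capture meaning” idea (the numbers here are made up and have nothing to do with the embeddings in the papers):

    import numpy as np

    def cosine(u, v):
        # Cosine similarity: the standard way of comparing word vectors.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    # Tiny made-up 3-dimensional "embeddings".
    vec = {
        "cat": np.array([0.9, 0.1, 0.0]),
        "dog": np.array([0.8, 0.2, 0.1]),
        "tax": np.array([0.0, 0.1, 0.9]),
    }

    print(cosine(vec["cat"], vec["dog"]))  # high: related meanings
    print(cosine(vec["cat"], vec["tax"]))  # low: unrelated meanings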

First paper

Diachronic word embeddings reveal statistical laws of semantic change arXiv

In this paper the authors are trying to find a way to compare embeddings across time. The idea is to use a word’s vector representation to see how (or at least how much) it’s changed.

The tricky part is you can’t come right out and compare vectors from separate word embeddings, since the vector spaces might be totally misaligned. So they have two ideas for how to get around this.

  1. Measure change in similarity between pairs of words. E.g., for two different time periods you measure the semantic similarity between ‘cat’ and ‘dog’. Then you can see if that relationship has changed or stayed the same without having to directly compare across time periods.
  2. Align the vector spaces and compare the word’s two vector representations directly. This is the more intuitive solution but requires some tricky math.

What's a Procrustes?

I’ll just trust this works… 🙄
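For the curious: as far as I can tell, the tricky math is orthogonal Procrustes alignment, i.e., find the rotation that best maps one embedding space onto the other, then compare vectors. A minimal sketch with SciPy (my own toy version, not the authors’ code):

    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    def align(emb_old, emb_new):
        # Rows are word vectors for a shared vocabulary, in the same order.
        # Find the rotation R that best maps the old space onto the new one.
        R, _ = orthogonal_procrustes(emb_old, emb_new)
        return emb_old @ R

    # After aligning, you can compare a word's two vectors directly,
    # e.g. cosine distance between align(emb_old, emb_new)[i] and emb_new[i].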

Actually, the authors were good enough to test their method out empirically on some well-documented semantic shifts, so we can see roughly that it works without decrypting the math 😬

Take that, Neil deGrasse Tyson!

They use these methods to make two observations about semantic change.

  1. Words that are more polysemous (have multiple meanings) tend to change more and adopt more new meanings over time, and
  2. words that are used more frequently tend to change less over time.

I think these conclusions make intuitive sense, and they’re pretty well aligned with old-school linguistic theories of semantic shift, so that’s cool.
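One way to picture both observations at once (this is my own sketch of the shape of the claim, not an equation lifted from the paper): if Δ(w) is a word’s measured rate of semantic change, f(w) its frequency, and d(w) a polysemy score, you could imagine fitting something like

    \log \Delta(w) = \beta_0 + \beta_f \log f(w) + \beta_d \log d(w) + \varepsilon,
    \qquad \text{with } \beta_f < 0 \text{ and } \beta_d > 0,

where the negative frequency coefficient captures “frequent words change less” and the positive polysemy coefficient captures “polysemous words change more”.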

Next paper!

Cultural shift or linguistic drift? Comparing two measures of semantic change arXiv

Here they define two measures of semantic change that roughly correspond to the two ideas of how to compare vectors across time from the previous paper.

  1. The “local measure”: Measure the distance between the word and a smallish set of closely related words, and see how much those relationships have changed.
  2. The “global measure”: Measure the distance between the word and its past self directly.

Just like before, the first measure is agnostic to the incidentals of the two vector spaces (since you only need to directly compare word vectors from the same time period). The second measure requires vector space alignment between the time periods.
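Here’s a rough sketch of the two measures as I read them (toy code with my own naming; it assumes the old vectors have already been aligned where the global measure needs it):

    import numpy as np

    def cosine_dist(u, v):
        return 1 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def global_measure(vec_old_aligned, vec_new):
        # Distance between the word's (aligned) old vector and its new vector.
        return cosine_dist(vec_old_aligned, vec_new)

    def local_measure(word, emb_old, emb_new, neighbors):
        # How much the word's similarities to a fixed set of nearby words
        # changed between the two periods. No cross-period alignment needed,
        # since each similarity is computed within a single time period.
        sims_old = np.array([1 - cosine_dist(emb_old[word], emb_old[n]) for n in neighbors])
        sims_new = np.array([1 - cosine_dist(emb_new[word], emb_new[n]) for n in neighbors])
        return cosine_dist(sims_old, sims_new)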

This time though they’re after something a little more subtle… It turns out the global measure is more sensitive to changes in verbs, whereas the local measure reacts more to changes in noun semantics. Why could that be…?

Well, combine this with an established doctrine that says nouns tend to change more for real-world cultural reasons and verbs tend to change more by predictable linguistic processes, and you get… possibly a way of detecting the difference between these two kinds of change!

One thing I’m a little skeptical about is whether there really is a clear-cut distinction between these two kinds of semantic change. To me, cultural shift seems more like a “reason” for semantic change, while the “regular processes” seem more like, well, processes. I wonder if verb semantics might change more predictably, but still in reaction to cultural shift, for example.

Anyway, this was some really interesting work! And the authors have been kind enough to make their code and pre-trained vectors available on the project website. I haven’t played with it much yet, but it looks pretty coherent, which is nice.