28 Sep 2018
This is the first blog post I’m making as an official PhD student!
I’m still figuring things out,
but so far I know my topic has something to do with semantic change—how
word meanings shift over time.
Semantic change is part of a broader category of phenomena I’ll call semantic variation.
When the same word means two different things,
that’s an example of semantic variation.
Naturally, change is just variation over time.
When you think about the same word meaning two different things,
polysemy (or homonymy) is probably the first thing that comes to mind.
Originally I included polysemy in this list,
but I actually think it’s different from
the kind of thing I’m talking about in an important way.
Polysemy is when the same word can be used to mean two different things.
Take the word bank, which has several possible meanings.
If I say to you “let’s have lunch down by the bank”,
I could be referring to a river bank,
or I could be talking about a
bank-bank—somewhere you keep money.
But I’m interested in variation in the semantics of a word as a whole.
The distinction I’m making here is related to Hermann Paul’s
occasional meaning vs. usual meaning,
or perhaps more closely related to the more recent notion of
situated meaning vs. meaning potential.
Polysemy represents variation in situated meaning,
whereas I’m talking about differences in meaning potential.
But I’m not too familiar with these concepts yet…
blog post for another day, perhaps.
Anyway, here’s my list of (the?) four kinds of semantic variation:
- Diversity – differences between speech communities
- Style – differences between individuals
- Change – changes in speech communities
- Adaptation – changes made during a dialogue
One - Semantic Diversity
Here’s where I say something about common ground,
an idea developed by the psycholinguist Herbert H. Clark
and others in the ’90s.
Common ground is a body of knowledge shared among some group of people.
It’s related to the concept of common knowledge,
where everyone knows a thing,
and also everyone knows that everyone knows it,
and everyone knows that everyone knows that everyone knows it
(and so on).
Common ground needs to be based on something—there
needs to be a reason for everyone to believe we’re all on the same page.
For example, if you and I were to watch a tennis match together,
certain facts about it,
such as who won the match,
are grounded (i.e. part of the common ground) between us.
The event of watching the match serves as a basis
for this common ground.
This is what Clark calls a perceptual basis.
There’s another kind of basis called a communal basis,
which grounds knowledge between individuals
based on their belonging to the same community.
For example, a nurse may assume that a doctor knows the difference
between a humerus and a femur
based on their joint membership
in a community of medical professionals.
Communal bases are what we rely on
to ground linguistic knowledge,
such as (surprise!) the meaning of words.
In this framework,
the jargon shared by communities of practice,
the unique set of inside jokes shared with your immediate family
or a close circle of friends—all
these things are the same kind of thing
as far as common ground is concerned.
So, communities come first
(conceptually, at least)
and languages are just the linguistic common ground
of the speech community.
And each of these communities
(sometimes nested, sometimes overlapping)
may have different or similar understandings
(or no understanding at all)
of the meaning of any given word.
Two - Semantic Style
Usually when people talk about linguistic style
it has to do with word choice,
or how an individual tends to construct sentences syntactically—the
sorts of things that contribute to tone and register.
But I think there’s good reason to believe
that style is also involved in the meaning of words.
Here’s an example from Barbara Johnstone’s The Linguistic Individual:
I once stayed with my sister during a week when, just having moved to
a new town, she made a number of phone calls inquiring about goods
and services. During one of these conversations, I heard her begin a
response with “Aaahh,” a drawn-out /a/ made lower and further back in
her mouth than the /a/ she used in words such as father or hot and
uttered at a low and very slowly falling pitch. I at once thought of our
father, who makes exactly the same noise in the same conversational
slot in phone conversations of the same sort. Aaahh is a small but
unmistakable feature of his individual way of sounding. It means “I think
I understand what you’ve said, and, if I’ve understood you correctly,
I’m disappointed.” Aaahh is the beginning of a rejoinder to a statement
like “We do carry folding directors’ chairs, and they’re $175 each” or
“We’ll be able to collect your bulky trash in two weeks or so.”
Now, Johnstone understood her sister’s Aaahh immediately
because she belongs to a speech community where
its meaning is common ground (i.e., her immediate family),
but how did the person on the other end of the phone understand it?
A naive view of common ground
might lead you to think it wouldn’t be understood at all.
For one thing, Johnstone describes how Aaahh always comes
with a translation when her father uses it—a
“synchronic repetition, in a more conventional form”.
So, while there may be no common ground about Aaahh itself,
listeners understand what he means by it
because there is common ground around
the meaning of other expressions that accompany it.
There’s an interesting question here about whether semantic style
is truly variation in meaning potential,
or if it takes advantage of meaning potential
to produce a different situated meaning.
My feeling is that stylistic variation
can have its source in meaning potential,
but something to think about more later…
Three - (Historic) semantic change
New words are coined, and the meanings of existing words change over time,
even within the same speech community.
There’s a lot to say about how and why this happens—we
know, for example,
that various social and communicative pressures
contribute to the evolution of word meaning,
and that there are certain patterns of semantic shift
that words tend to follow.
In a previous post
I described some recent work that measures how words change over time.
I’m particularly interested
in how other kinds of semantic variation
influence historic change.
For example, it’s common for a new interpretation of a word
to start in a smaller speech community
and gain more widespread usage over time.
Likewise, facets of an individual’s linguistic style
may be picked up by others
(like how Johnstone’s sister uses her dad’s Aaahh)
and in that way contribute to historic change
within a community.
Common ground can be useful
for thinking about historic semantic change, too.
What it means for historic semantic change to have taken place
(in a given community)
is that the new meaning is available to speakers
without having to use the dialogue itself
to create it.
This brings us to the final kind of semantic variation.
Four - Semantic adaptation
Semantic adaptation is change that occurs over the course of a dialogue.
Every dialogue starts off with some shared semantic understanding—both
people speak the same language, say.
But the meanings available in their communal common ground
may be rather imprecise.
Or perhaps there’s no built-in way of
referring to some concept that’s important
in the current discussion.
As they converse,
participants collaborate to refine what words mean
in service of the present dialogue.
Here’s an example from a psychology-style experiment
in which participants were asked
to take part in a collaborative challenge
where they had to refer to some pictures of objects.
A: A docksider.
B: A what?
B: Is that a kind of dog?
A: No, it’s a kind of um leather shoe, kinda preppy pennyloafer.
B: Okay okay got it
In the remainder of the experiment,
both A and B referred to this card as the pennyloafer.
A pennyloafer is (I think) a different shoe from a docksider.
Perhaps A even knows this,
but because of the collaborative nature of communication,
the two agree on a new semantics for pennyloafer which,
for the purposes of this dialogue,
refers to the photo in question.
These kinds of agreements, explicit and implicit, let
us adjust the meaning of words
in big ways and small.
Sometimes those adjustments persist
after the current dialogue.
And they may be again introduced
in future discussions
with some of the same people,
and perhaps some different people,
and in this way semantic adaptation
can become historic change.
08 Aug 2018
I’m moving to Gothenburg, Sweden at the end of the month to take a PhD position at the University of Gothenburg’s Centre for Linguistic Theory and Studies in Probability (CLASP).
It’s a big move, so I thought I’d take some time to write about what I’ll be doing there before I leave.
Basically, I’ll be taking classes, doing research, and probably teaching (or TAing) some, too.
By the end, I’ll have written a dissertation based on my research and (if all goes well) I’ll get a degree.
All this will take four or five years.
What am I researching?
My research will focus on semantic change.
Semantics is the part of linguistics concerned with meaning.
This includes lexical semantics (what do words mean?) and compositional semantics (how do words combine to make meaningful sentences?).
I think my work will mostly focus on lexical semantics.
I’m particularly interested in looking at the social context of language change.
Sociolinguists have this concept of a speech community – a group of people who share a set of linguistic conventions, such as the meaning of certain words.
Speech communities come in all sizes, from as large as speakers of English to as small as a pair of close friends.
Speech communities can overlap or even “eavesdrop” on each other, resulting in the spread of novel linguistic conventions.
Changes in a larger community affect the language spoken in sub-communities.
Conversely, changes made to the language in a sub-community are sometimes adopted by the larger community.
Linguistic variation (differences between communities) and linguistic change (differences within a community over time) are two sides of the same coin.
Both are necessary for understanding questions like where does semantic change come from? and why are some changes adopted more broadly, while others aren’t?
Why does it matter?
So, I obviously think this is all really interesting. I also think it’s really important.
At the risk of going all now more than ever on you, I think it’s clear to anyone who’s visited The Internet in the last 5 or 10 years that we’re far from consistent when it comes to maintaining a productive discourse.
And while I wouldn’t try to deny that there are significant—at times insurmountable—differences of opinion and values out there, I think it’s often the case that we can’t even get at those differences because we’re totally talking past each other.
Vox’s Ezra Klein had a nice example of the kind of thing I’m talking about in his piece about the recent Twitter controversy surrounding some old tweets by journalist Sarah Jeong.
A few years ago, it became popular on feminist Twitter to tweet about the awful effects of patriarchal culture and attach the line #KillAllMen. This became popular enough that a bunch of people I know and hang out with and even love began using it in casual conversation.
And you know what? I didn’t like it. It made me feel defensive. It still makes me feel defensive. I’m a man, and I recoil hearing people I care about say all men should be killed.
But I also knew that wasn’t what they were saying. They didn’t want me put to death. They didn’t want any men put to death. They didn’t hate me, and they didn’t hate men. “#KillAllMen” was another way of saying “it would be nice if the world sucked less for women.” It was an expression of frustration with pervasive sexism. I didn’t enjoy the way they said it, but that didn’t mean I had to pretend I couldn’t figure out what they meant. And if I had any questions, I could, you know, ask, and actually listen to the answer.
Here’s the other thing, though: all that was happening inside my community, which both inclined me towards generosity, and gave me more context for what was going on. If I had been on the outside of it, perhaps my ultimate reaction would’ve been different, perhaps I would’ve let my initial offense drive my interpretation.
The internet lets people from across the world connect, find shared experiences, and build communities.
It also enables interaction between people who have very little in common and may sometimes feel at opposition with each other.
I think that access can be a great force for good, and is something worth preserving, but it can also contribute to discord with serious real-world implications.
When we’re interpreting language, context is key.
Social context is especially key.
We often don’t get that social context on the internet.
It’s foreign, or faded from memory, or (willfully or otherwise) stripped away.
When a new lexical item like #KillAllMen pops up, it’s in response to a communicative need: in this case, the need for a tongue-in-cheek way to point out problems caused by the patriarchy.
People who don’t possess that need (e.g. men with no interest in pointing out the patriarchy) may not intuitively interpret it that way.
With a better understanding of how semantic change spreads (particularly on the internet), I think we can make better informed choices about how to organize discussion and build communities in a way that makes diverse voices accessible to everyone.
For example, we can find ways to make the social context necessary for generous interpretation available, while exposing bad-faith provocateurs who feign outrage (i.e. trolls) for what they are.
I don’t want to make it sound like “oh if only we understood each other all our problems would go away”.
There is clearly no shortage of non-linguistic social division in the world today.
But if we can better understand where semantic change comes from, and what accounts for the differences between communities, I think we can give people better tools for contextualizing each other’s speech and at least start discussing those divisions across linguistic communities.
So what does this “research” actually consist of?
I’m planning to take an interdisciplinary approach, grounding my research in cognitive and sociolinguistic theory, and applying tools from computational linguistics and social network analysis.
A lot of my research will involve using computer models to find and probe phenomena such as semantic change in large-scale corpora of social media and other internet discourse.
I think it will be a good practice to update this blog regularly as my studies and research progress, so if you’re interested, stay tuned here or follow me on Twitter.
10 May 2018
On Tuesday, Google demoed an update to its virtual assistant
that calls up businesses on your behalf to book appointments and make reservations.
In the demo, Duplex makes two calls, one to make a hair appointment, and another to reserve a table at a restaurant.
The reservation call is especially impressive because when it turns out the restaurant
will only take reservations for five or more people, Duplex politely takes no as an answer and knows to ask
if there’s likely to be a wait.
In neither conversation did Duplex identify itself as non-human. It even goes so far as to introduce speech disfluencies (ummms and aahs and at least one sassy mmm-hmmmmm)
which Google acknowledges are mostly about sounding more human.
I wouldn’t mind that kind of thing if I knew I was talking to a computer, but it feels like an especially harsh betrayal
to be manipulated into believing an AI is human in that way.
Why did Google, with their armies of PR people, think it was ok to demo a product that misrepresents itself as human
to unwitting service workers?
They knew that this demo would cause a backlash, they just didn’t know how big.
And now they’re getting to find out. So far we have:
- TechCrunch - Duplex shows Google failing ethical and creative AI design
- PCMag - Google Duplex is classist: Here’s how to fix it
- The Verge - The selfishness of Google Duplex
- Slate - Am I speaking to a human?
As of a few hours ago CNET is reporting that Google has already started “clarifying”:
We are designing this feature with disclosure built-in, and we’ll make sure the system is appropriately identified. What we showed at I/O was an early technology demo, and we look forward to incorporating feedback as we develop this into a product.
But that doesn’t really answer any of the specific questions I want to have answered:
- does the assistant identify itself as non-human up front, or only when asked?
- can businesses opt out of receiving calls from Duplex?
- what action does Duplex take when the conversation fails?
Instead, it looks like Google is stalling for time. They’re “incorporating feedback”.
In other words, they’re waiting to see what we decide is okay and then they’ll see what they can get away with.
This kind of tactic can be used to shift the Overton window on what behaviors we’re ok with from our technology.
Maybe that’s already what’s happening here, even.
One thing is for sure, there are plenty of other questions like “Can computers misrepresent themselves as human?” that AI companies want answered.
By presenting their answer at a product demo, they get to set the terms of the discussion.
The tone of that demo (here it is by the way) was clearly projecting this is fine.
If that’s where the conversation starts, maybe we end up closer to that position than we would have otherwise.
That’s clearly what Google’s hoping, anyway.
But it doesn’t have to be that way.
In ethics classes, in technology journalism, in science fiction, in conversations with friends, on social media!
We can be talking about these questions. We can decide what we’re ok with and what we’re not.
And we don’t have to accept that just because it seems like some thing already is some way that it always has to be that way.
28 Apr 2018
I just found out my company is sending me to NAACL! Thank you, Networked Insights 😁.
The list of accepted NAACL papers came out about a month ago and more and more papers are showing up on arXiv every day.
Here are 10 interesting-sounding papers and what I could find out about them.
A corpus of non-native written English annotated for metaphor
Beata Beigman Klebanov, Chee Wee (Ben) Leong and Michael Flor
Interesting to note that Beata Beigman Klebanov (what an iconic name by the way) works at ETS. It makes sense to me that the people who make those tests would be into this kind of annotation. It’s also cool that it’s non-native English, because you don’t see a lot of datasets that are explicitly that. I don’t know what percentage of English speakers are non-native, but I’m sure it’s a lot*.
* Holy #*@%! L2 English speakers outnumber L1 speakers by a ratio of 3 to 1!
Dear Sir or Madam, May I introduce the YAFC corpus: Corpus, benchmarks and metrics for formality style transfer
Sudha Rao and Joel Tetreault
Joel Tetreault works at Grammarly, which makes software that does what they call writing enhancement, which surely includes spelling and grammar, but perhaps also some tooling around formality?
Sudha Rao is an early PhD at UMD advised by Hal Daumé, which makes me incredibly jealous.
She also interned at Grammarly last summer which is surely where this work comes from. Also, her thesis project includes grounding which is a favorite topic of mine.
ATTR2VEC: Jointly learning word and contextual attribute embeddings with factorization machines
Fabio Petroni, Vassilis Plachouras, Timothy Nugent and Jochen L. Leidner
I’m always interested in work that tries to capture contextual information, especially when it’s giving structure to word embeddings, which it sounds like this might. The authors have released their code on GitHub, but the README is all business and I don’t know enough about the topic to tell from the code what is going on, so I’ll wait.
Deep contextualized word representations
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer
I was a little skeptical of this work at first. I mean, hasn’t this been done already? But I’m not sure any of those embeddings proved to be so useful for a broad variety of tasks as this approach claims to be. Also they’ve put code up in both TensorFlow and PyTorch with tutorials and everything. Way to go AllenNLP 😊
Diachronic usage relatedness (DURel): A framework for the annotation of lexical semantic change
Dominik Schlechtweg, Sabine Schulte im Walde and Stefanie Eckmann
These folks take a different approach from the diachronic semantics papers I wrote about a few weeks ago. Instead of comparing embeddings, they compare usage directly. How do they do that?
- Take a bunch of usages of a word (with context) from one time period
- Pair them up with usages from another time period
- Make some measure of comparison on usages
- Look at the average of this measure across all pairs
The authors come up with two different measures to do this dance with. And they claim one of them can tell the difference between innovative and reductive semantic change.
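To make the four steps above a bit more concrete, here is a minimal sketch of the pair-and-average recipe. Everything in it is an assumption for illustration: the example sentences are made up, `context_overlap` is a toy stand-in for whatever comparison is really applied to a pair of usages, and I pair every earlier usage with every later one, which may not be how the authors pair them.

```python
from itertools import product
from statistics import mean


def context_overlap(usage_a: str, usage_b: str) -> float:
    """Toy relatedness score: Jaccard overlap of the words in two usages.

    Hypothetical stand-in for the real comparison made on a usage pair.
    """
    words_a = set(usage_a.lower().split())
    words_b = set(usage_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pair_relatedness(earlier, later, relatedness=context_overlap):
    """Average a relatedness score over all cross-period usage pairs."""
    return mean(relatedness(a, b) for a, b in product(earlier, later))


# Made-up usages of "mouse" from two time periods.
earlier = ["the cat chased the mouse", "a mouse ate the cheese"]
later = ["click the mouse button", "the mouse ate the cheese"]

# Average relatedness across periods, and within the earlier period
# as a rough point of comparison.
across = mean_pair_relatedness(earlier, later)
within = mean_pair_relatedness(earlier, earlier)
```

A lower average across periods than within a period is the kind of signal you would read as change, though a word-overlap score like this one is far too crude to trust on real data.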
Attentive interaction model: Modeling changes in view in argumentation
Yohan Jo, Shivani Poddar, Byungsoo Jeon, Qinlan Shen, Carolyn Rose and Graham Neubig
When I read this title I knew immediately that they used /r/changemyview. An earlier paper by some of the Cornell computational sociolinguistics folks comes with a really nice dataset for it (with annotations)! They are so good about releasing data that’s well documented, easy to use and not 404’d. (arXiv)
Author commitment and social power: Automatic belief tagging to infer the social context of interactions
Vinodkumar Prabhakaran and Owen Rambow
I took a look at the author’s thesis. One of the main ideas is that you can predict participants’ social power based on their level of belief in their utterances (“i.e., whether the participants are committed to the beliefs they express, non-committed to them, or express beliefs attributed to someone else”). Sounds pretty interesting and I think that’s what this paper is going to be about.
Deconfounded lexicon induction for interpretable social science
Reid Pryzant, Kelly Shen, Dan Jurafsky and Stefan Wagner
I’m very interested in this enigmatically titled paper. What is this lexicon and what does it mean for it to be deconfounded? In what way is it being induced? And we’re doing… what with it? Interpreting social science!?! Fascinating. Unfortunately, all I could find is this broken link.
There is also this GitHub project, but my mother told me never to poke around in a code repository I know nothing about.
Colorless green recurrent networks dream hierarchically
Kristina Gulordava, Marco Baroni, Tal Linzen, Piotr Bojanowski and Edouard Grave
Dad joke level title, but actually pretty descriptive. The authors want to see if RNNs learn abstract hierarchical syntactic structure. In other words, do they understand the ways in which words build into phrases and phrases build into sentences, and how they’re connected once they do?
To test this, they check whether the RNN language model can get syntactic number agreement right in meaningless (but syntactically correct) sentences like “The colorless green ideas I ate with the chair sleep furiously”. In this example, if the RNN predicts sleep (which agrees with ideas) rather than sleeps, then that’s evidence the RNN understands the syntactic structure.
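Here is a rough sketch of how I imagine scoring a model on this kind of test. The `score(prefix, word)` interface is hypothetical (any function that says how much the model likes `word` as the next word would do), the second test item is one I made up, and the "model" below is a deliberately dumb baseline that just agrees with the nearest noun.

```python
def agreement_accuracy(score, items):
    """Fraction of items where the scorer prefers the correctly agreeing verb.

    `score(prefix, word)` is a hypothetical interface: it returns the model's
    score (e.g. a log-probability) for `word` as the continuation of `prefix`.
    """
    correct = sum(
        score(prefix, right) > score(prefix, wrong)
        for prefix, right, wrong in items
    )
    return correct / len(items)


# Each item: (sentence prefix, verb with correct number, verb with wrong number).
items = [
    ("The colorless green ideas I ate with the chair", "sleep", "sleeps"),
    ("The colorless green idea I ate with the chairs", "sleeps", "sleep"),
]


def nearest_noun_baseline(prefix, word):
    """Toy scorer that agrees with the last word of the prefix, ignoring structure."""
    last_word_is_plural = prefix.split()[-1].endswith("s")
    verb_is_plural = not word.endswith("s")
    return 1.0 if last_word_is_plural == verb_is_plural else 0.0


accuracy = agreement_accuracy(nearest_noun_baseline, items)
```

The baseline gets both items wrong, because the noun closest to the verb has the opposite number from the actual subject. That is why sentences like these can separate a model that tracks structure from one that only looks at nearby words.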
Deep dungeons and dragons: Learning character-action interactions from role-playing game transcripts
Annie Louis and Charles Sutton
Well this is clearly awesome. I’m not sure exactly what it means for a character to interact with an action, but I’m excited to find out. Couldn’t find so much as an abstract online, and the authors’ past work doesn’t have any big hints so we’ll have to wait to find out.
That’s all for now! But there are many many more papers that sound interesting. I’ll follow up in June with another blog post about what stood out at NAACL 😊
18 Mar 2018
I just read two interesting papers by William L. Hamilton, Jure Leskovec, and Dan Jurafsky about using “diachronic word embeddings” to measure semantic shift.
A word embedding (at least the kind they’re talking about here) is a way of associating each word with a vector (a list of numbers) that captures something about the word’s meaning in relation to other words. Here, “diachronic” means they’re calculating the vector representation with respect to at least two different time periods.
Diachronic word embeddings reveal statistical laws of semantic change (arXiv)
In this paper the authors are trying to find a way to compare embeddings across time. The idea is to use a word’s vector representation to see how (or at least how much) it’s changed.
The tricky part is you can’t come right out and compare vectors from separate word embeddings since the vector spaces might be totally misaligned. So they have two ideas for how to get around this.
- Measure change in similarity between pairs of words. E.g., for two different time periods you measure the semantic similarity between ‘cat’ and ‘dog’. Then you can see if that relationship has changed or stayed the same without having to directly compare across time periods.
- Align the vector spaces and compare the word’s two vector representations directly. This is the more intuitive solution but requires some tricky math.
I’ll just trust this works… 🙄
Actually the authors were good enough to test their method out empirically on some well-documented semantic shifts so we can see roughly that it works without decrypting the math 😬
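For the curious, the alignment step is less scary than it looks. The paper uses orthogonal Procrustes, which boils down to one SVD. Below is a small sketch with NumPy on random toy vectors (the vocabulary, dimensions and data are all made up, and the "earlier period" is just the later one rotated, so nothing has really changed). It shows both ideas: the pairwise similarity survives the rotation untouched, while comparing a word with its past self only makes sense after alignment.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def procrustes_align(source, target):
    """Rotate `source` onto `target` with the best orthogonal map.

    Rows are word vectors, in the same word order in both matrices.
    Solves min_R ||source @ R - target|| subject to R being orthogonal.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "road", "tree", "house"]
period_2 = rng.normal(size=(len(vocab), 4))

# Fake an earlier period: same geometry, but in an arbitrarily rotated space.
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
period_1 = period_2 @ rotation

cat, dog = vocab.index("cat"), vocab.index("dog")

# Idea 1: compare a within-period similarity across the two periods.
similarity_change = cosine(period_2[cat], period_2[dog]) - cosine(
    period_1[cat], period_1[dog]
)

# Idea 2: align the spaces, then compare a word with its past self.
aligned_1 = procrustes_align(period_1, period_2)
raw_self_similarity = cosine(period_1[cat], period_2[cat])
aligned_self_similarity = cosine(aligned_1[cat], period_2[cat])
```

Because the toy data contains no real change, `similarity_change` comes out at essentially zero and `aligned_self_similarity` at essentially one, whereas `raw_self_similarity` is whatever the random rotation happened to produce.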
They use these methods to make two observations about semantic change.
- Words that are more polysemous (have multiple meanings) tend to change more and adopt more new meanings over time, and
- words that are used more frequently tend to change less over time.
I think these conclusions make intuitive sense and they’re pretty well aligned with oldschool linguistic theories of semantic shift, so that’s cool.
Cultural shift or linguistic drift? Comparing two measures of semantic change (arXiv)
Here they define two measures of semantic change that roughly correspond to the two ideas of how to compare vectors across time from the previous paper.
- The “local measure”: Measure the distance between the word and a smallish set of closely related words and see how much those relationships have changed.
- The “global measure”: Measure the distance between the word and its past self directly.
Just like before the first measure is agnostic to the incidentals of the two vector spaces (since you only need to directly compare word vectors from the same time period). The second measure requires vector space alignment between the time periods.
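Here is how I picture the two measures in code. This is a sketch based on my reading, not the authors’ implementation: the 2-d vectors are invented, the neighbour set is hand-picked rather than found by nearest-neighbour search, and for the global measure I simply assume the two toy spaces are already aligned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def global_change(vec_then, vec_now):
    """Cosine distance between a word and its past self (needs aligned spaces)."""
    return 1 - cosine(vec_then, vec_now)


def local_change(word, neighbours, space_then, space_now):
    """Cosine distance between the word's similarity profiles in each period.

    Each profile lists the word's similarity to every neighbour, computed
    entirely inside one period's space, so no alignment is required.
    """
    profile_then = [cosine(space_then[word], space_then[n]) for n in neighbours]
    profile_now = [cosine(space_now[word], space_now[n]) for n in neighbours]
    return 1 - cosine(profile_then, profile_now)


# Made-up vectors: "broadcast" drifts from near "seed" to near "radio".
space_then = {"broadcast": [1.0, 0.1], "seed": [0.9, 0.2], "radio": [0.1, 1.0]}
space_now = {"broadcast": [0.2, 1.0], "seed": [1.0, 0.1], "radio": [0.1, 0.9]}

local = local_change("broadcast", ["seed", "radio"], space_then, space_now)
overall = global_change(space_then["broadcast"], space_now["broadcast"])
```

In this toy case both numbers are clearly above zero, since the word has moved and its neighbours have swapped roles. The interesting cases in the paper are the ones where the two measures disagree.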
This time though they’re after something a little more subtle… It turns out the global measure is more sensitive to changes in verbs, whereas the local measure reacts more to changes in noun semantics. Why could that be…?
Well, combine this with an established doctrine that says nouns tend to change more for real-world cultural reasons and verbs tend to change more by predictable linguistic processes and you get… possibly a way of detecting the difference between these two kinds of change!
One thing I’m a little skeptical about is that there is a clear-cut distinction between these two kinds of semantic change. To me, cultural shift seems more like a “reason” for semantic change while the “regular processes” seem more like, well, processes. I wonder if verb semantics might change more predictably, but still in reaction to cultural shift, for example.
Anyway, this was some really interesting work! And the authors have been kind enough to make their code and pre-trained vectors available on the project website. I haven’t played with it much yet, but it looks pretty coherent, which is nice.