“Please call Stella. Ask her to bring these things with her from the store: six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need a small plastic snake and a big toy frog for the kids. She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.”
You might be wondering where this paragraph came from. A theatre play? A novel? A manual for English learners? Not quite. While the paragraph may not be famous, it has actually been read and recorded more than 3,000 times. All you have to do is visit the Speech Accent Archive to listen to every one of those clips, recorded by individuals from 175 different countries, representing 381 languages. The purpose? To compare accents.
Maria Inês Teixeira speaks with the creator and administrator of the Speech Accent Archive, Professor Steven H. Weinberger of George Mason University in Virginia. The Archive has been around for 20 years now and attracts linguists, actors, researchers, language enthusiasts and phonology students who cannot wait to learn more about how different accents compare.
We were wondering if you could start by introducing yourself, telling us about what you do and a little bit about your work.
Sure! My name is Steven Weinberger and I’m a linguist. Mostly I do phonology: I study sound systems of the world’s languages. I have been a professor at George Mason University in Virginia for 30 years now. I teach graduate school. I was the director of the program for the last 15 years. I’m now no longer director, but I’m still teaching, and I teach classes to our students who are very interested in teaching ESL (English as a Second Language). We have a very theoretical linguistics program, so we developed the phonetics class about 24 years ago, to teach teachers how to instruct non-native speakers about the sounds of English.
I do work in foreign accents—I think that foreign accents are amazingly interesting! I tell my students that, to me, and linguists like me, a good accent is really a bad accent. It tells you a lot about the speakers’ native language. So… it’s interesting that people have accents. And there’s probably a reason for why everyone has an accent. It’s about identity, it’s probably about something called the critical period, about how people can’t possibly become a native speaker after a certain age, but they get very good and they can communicate. But there’s this flavor about someone’s speech that’s remarkable and linguistically interesting.
You believe accents can tell us a lot about someone.
Yes, yes. Particularly linguistically. Particularly about what language they speak natively. Anyway, everyone has an accent, so… there’s no perfect accent, even if we’re native speakers. I do work on that, and I also do work on weird kinds of languages. I like to work on alien languages from science fiction. I look at sound systems, how people develop alien languages by looking at films, and reading books. It turns out that there’s no perfect alien sound system either. (laughs) So that’s all about the work I do. That’s about me…
About the Speech Accent Archive, how would you describe it to those who don’t know it?
This year it’s the 20th anniversary of the Speech Accent Archive. It’s a place you can go to online. It’s been up and running on the internet for 20 consecutive years, which is sort of an amazing thing in itself, and it’s a place for people to listen to speech accents in English.
Everyone is reading the same exact paragraph. The paragraph is short, it’s only 69 words long, but it has virtually all the sounds of standard American English. Not all of them, but virtually all of them. So if you want to hear what a Kiswahili speaker sounds like speaking English, you can do that. Or what a Romanian speaker sounds like. You can listen to what a Portuguese speaker from Portugal sounds like, what a Portuguese speaker from Brazil sounds like… you know, almost 3,000 samples from more than… currently, we have more than 381 languages represented by 175 different countries. So, almost all around the world people are speaking English and they all sound different.
We ask them a set of nine questions about their background: where they were born, what their native language is, how old they are, what their gender is… and so we have a list of attributes that each speaker possesses, you can listen to them, and for most of them you can see a phonetic transcription of their speech. It is very useful if you can understand that kind of system. You can search for things, it has a search facility, and anyone can send a sample to us.
There’s a place to send a sample. We only accept really good quality samples, so people should know that—there’s instructions. And we’ve been doing that for 20 years now. And it’s slowly growing. Our own graduate students and undergraduate students at George Mason contribute to the archive. The transcriptions are done carefully, very narrow transcription. So it’s a great resource for anyone doing research, it’s been used for more than 150 research projects, honest research projects, over the last 10 years. People write their dissertations with the data, people work on speech recognition… because you know, everyone talks differently and machine understanding needs to understand our native and non-native speaker sounds. But most of the people are ESL teachers, and people who just want to play and listen to accents!
How did you first get the idea for this platform?
It started out as an assignment in 1999 for my phonetics class. I had students record a non-native speaker and bring in the tape. We had tapes back then. (laughs) They brought in all different kinds of tapes, little tapes, big tapes, CDs… but then they were difficult to manage and the quality was variable. So we systematized it, we made it a uniform system and we put it on the internet when it was brand new!
This paragraph is so small because the bandwidth in 1999 was difficult, everyone used dial-up modems… and we’ve kept the sample paragraph since then! And students just love it, they find a speaker, they solicit his or her participation, they record them adequately reading the paragraph, they analyze speech, they compare speech to a presumably native speaker of English, and they find the issues they wanted to attend to. So it’s usually a semester-long project for each student or group of students.
People use the platform for research and sometimes for fun. Are there other specific goals for the Archive?
We want to keep it running, gather languages we don’t have. We only have about 400 languages or so, but there are 6,000 languages in the world. There are a lot more to do.
Who runs the Archive?
(Smiles and raises hand) My students and I run it. We run it on a shoestring. It’s about due for a remake. A facelift. So we want to make it a little more computational, we want to make an app for the smartphone so people can just use their smartphone and send us their samples immediately. But this takes a large bit of funding, so we are searching for funding to remake the archive.
We’ve already developed some very good tools that are now available. We’ve developed a computational tool that will compare two transcriptions. So you can take a Romanian sample and, let’s say, an English sample from London, and this computational machine will essentially overlay one transcription on top of the other, and you can find the differences automatically. So you can make some predictions, it’s good for ESL or language assessment, and even forensic linguistics… perhaps. But we’re not quite on that level yet.
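To make the idea of overlaying one transcription on another concrete, here is a minimal sketch in Python. It is not the Archive's tool, whose internals are not described in this interview. It assumes each transcription is simply a list of phonetic symbols and uses the standard library's generic sequence matcher to report where two lists differ. The function name `diff_transcriptions` and the two short symbol lists are hypothetical, made up for illustration and not taken from Archive data.

```python
from difflib import SequenceMatcher


def diff_transcriptions(reference, sample):
    """Align two lists of phonetic symbols and return only the differences.

    Each difference is a tuple (operation, reference_segment, sample_segment),
    where operation is "replace", "delete" or "insert".
    """
    # autojunk=False keeps the matcher from discarding frequent symbols
    # when the sequences are long.
    matcher = SequenceMatcher(a=reference, b=sample, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, reference[i1:i2], sample[j1:j2]))
    return differences


# Hypothetical transcriptions of the word "please" (illustrative only).
reference = ["p", "l", "i", "z"]
sample = ["p", "l", "i", "s"]

print(diff_transcriptions(reference, sample))
```

With these two made-up lists, the only reported difference is the final consonant. A real comparison tool would presumably weigh phonetic similarity rather than treat every mismatch equally, which this sketch does not attempt.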
It sounds like hard work!
Yes! And the hardest work is transcribing. You know, getting it right. You need three people, three different individuals to transcribe. So we’ve had to develop a second tool to crowdsource the transcriptions. We send them out to transcribers all over the world. It’s very collaborative! That’s one of the biggest things about it. Very collaborative.
Already in 2011 the Speech Accent Archive got over 1 million hits in a month…
Yes! In a month! Yeah, it was doing very well back then. I think the number of visits is somewhat lower now, but people are still checking in a lot. We get thousands of hits a month still. And when we’re doing maintenance and it goes offline for a few minutes, we get a slew of emails asking “Why isn’t it on? Why isn’t it online? When is it going back online?”— so people are paying attention.
What do you think attracts people to accents and this type of project?
Well, you know, one of the basic human abilities is to listen to people. Listen to our colleagues, listen to our friends, listen to people we meet. We’re a hearing and speaking species, for the most part, so that’s the first thing we pay attention to: how someone says something. We always have an idea, we have a bias, we make a judgement about someone as soon as they open their mouth. As soon as you open your mouth I know you’re not from my hometown, right? So I ask you, “Where are you from?” It’s an interest that’s built into us. So we want to make it less mystical. Accents have a reason. You can look at them scientifically. The judgements that people make, the biases that people have about someone’s speech, should melt away.
Can you give us specific examples of unusual ways in which people have used the Archive?
A few years ago we had a fellow from Ireland who got an Irish government grant to write some music to go along with the Speech Accent Archive. He wrote an entire saxophone suite (saxophone!) to go along with people speaking their scripts on the archive. It’s quite beautiful! People have done art shows in Washington state, at a university in Washington state… I think it was called “Please Call Stella”, an art exhibition. They had speakers in the area and some video, and they had people listening to archive data as they walked through the art exhibit. We can’t control what people do with the archive! (laughs) It’s open to anybody!
How can people contribute to the Archive?
You visit the Speech Accent Archive website, go to the “How to” page, click on “Submit a sample”, read the instructions, then use your smartphone [to record your speech]… we only accept CD-quality recordings, and we answer all questions if people have questions about if they are doing it right. Eventually, they have to confirm that they sent it and we send them a “thank you” note. The speaker is anonymous—nobody knows who’s doing the speaking. It’s a university… human subjects sanction, so it’s a real research project. You have to be 18 years old or older to participate.
Are there entries from native speakers of endangered languages, for example?
Yes, we’ve recently gotten some from Alaskan Yupik, which has a small population. People have said we could use many more Native American languages and contributions from places in Brazil, where we don’t have very much. We also have American Sign Language, so you can hear the accent of a deaf individual.
What do you think your accent says about you?
(laughs) Well, I was born in Pittsburgh, and I don’t think I sound like a Pittsburgher anymore, but if I go and stay in Pittsburgh for a few months I suppose my accent would come back. Yeah I definitely have an accent, and we’ve transcribed my accent as well. It’s somewhere around the Archive. Just like anybody else!
Anything else you’d like to add?
The work is never done. It grows slowly. Every week we get more submissions… we’re gonna be putting out another call to phonetics instructors and ESL (English as a Second Language) instructors to help us with the crowdsourcing part of this. We send out small pieces of the paragraph for students to transcribe, and it’s a fun little project. They learn how to transcribe. By learning to listen to people’s speech, I think we can become more understanding of different varieties of language.
Any aspects of research regarding accents that you think can still be developed and that you are interested in?
We’re trying to figure out what makes German speakers sound German. Or what makes French speakers sound French. What listeners listen out for. What makes a listener of a speech variety feel that one accent is different from another accent? What are the characteristics? How is someone making their vowels? How is someone making their consonants? And we’re measuring those things. We’re figuring out: what makes a Mexican Spanish accent sound the way it does?