Video Deep Dive: Google’s utilization of crowdsourced data
Mike Blumenthal


This is our Deep Dive Into Local from September 11, 2017. In our Deep Dive series, we take a closer look at one thing in local that caught our attention and deserves a longer discussion.

If you have a special topic you would like us to discuss for the Deep Dive in Local, please reach out to us. If you would like to be on one or the other of our segments, reach out and send us the topic and your availability.

If you are interested in sponsoring this weekly show, please let us know as well.

Our weekly discussions are now also available as a podcast. SUBSCRIBE TO THE PODCAST HERE.


Mike: Hi, welcome to Deep Dive in Local. This week we have Joel Headley, PatientPop. I think your title is Head of Local Search or something. I don’t know what your title is.

Joel: I don’t care.

Mike: Previously, he ran support teams in Google Local, he was with Google in their local effort for many, many years. He’s also worked on their Knowledge Graph product and has a certain sense of the history of Google. So we’re going to talk about Google’s utilization of crowdsourced information as a data point or more than a data point in local, and how it came to be the way it is, and what we’re seeing, some of the recent developments.

Joel: Yeah. So this grew out of a conversation about Local Guides, and how Google is really trying to leverage Local Guides today to get more data into Google so that it can ultimately create new product lines or feature lines in their local search products. And in terms of…

Mike: Feature lines like accessibility?

Joel: Like accessibility. Even simple things like filters, going beyond just “open now,” like does this place accept pets on their veranda? Whatever it might be, being able to filter out and find more information about a business quickly. And it’s hard to remember, but Google’s first foray into user-generated content was the Google Local Business Center back in 2005, and that happened shortly after the maps launch at Google.

Mike: Which launched in 2004 I think, right?

Joel: Well, 2005. February 2005.

Mike: Oh okay. And there was a Google local listing…

Joel: There was local in 2004. And then they combined them, yeah. And what happened there, and I think this is fairly true, that businesses weren’t engaged with the Local Business Center at all…

Mike: And for those that did engage it was set and forget it?

Joel: It was so painful. Yeah, it was difficult…

Mike: Because you’d go in once, you’d never go back.

Joel: Yeah. And it took at least six weeks for something to update, right? If something went wrong, you’d have to restart the process and it would take 12 weeks, maybe, or even longer. So, the fact that it was just kind of a poor data experience made it really hard to suggest that… and they certainly didn’t have the usage in local like they do today. But even half a decade ago they didn’t have that usage. And I think they started developing community edits as a way to say, “Hey, we do have users. Let’s get…we need to get better data about our businesses somehow, about the businesses on Google. Let’s just ask those users, because the business owners aren’t engaging.” And so that’s where you had the first, and I think they called them “Community Edits.” Mike’s infamous edit on Community Edits was something…is that…

Mike: I turned Microsoft into perhaps an escort service…

Joel: Escort service. Yeah, that was great. That made me feel good when I was working at Google, I was like I don’t…

Mike: That was when I was young and sort of impetuous.

Joel: So, Community Edits was this first way of trying to get data from Google users. At the same time, you had a team in India that realized there was no map data provider for India that Google could go and buy maps from. That’s traditionally how Google got their map data. They went to Navteq and TeleAtlas, which I don’t think those brands exist anymore, but…

Mike: Navteq became Nokia and TeleAtlas became TomTom.

Joel: And they would buy the map data, and that wasn’t available in India, plus India was changing so quickly. Streets would change, move around, go all over the place. So they wanted to create road mapping data for India very quickly, and they thought they could crowdsource it.

Mike: Also in Western Europe and in United States there was government geography data as well that was readily available.

Joel: Yeah, limited but available. Absolutely. And so those efforts went underway, and as people realized, you know, you could draw roads on top of satellite imagery, Google actually developed its own internal tool, and you might have seen something called Atlas. I think they did a few interviews about this tool, this kind of massive tool that Google used to develop its own in-house mapping data. And as those tools began to develop, they realized moderation is a problem, we need… And of course one of the earliest ways to get user-generated content was reviews. We forgot to mention that. And again, reviews…

Mike: So the reviews was 2007, I think or 2008…

Joel: 2006.

Mike: 2006. It was after maps, after GMB, after local. I thought it was 2008, but it was sometime in that timeframe, 2007.

Joel: Might have been. It was…Jess Lee was the product manager. So, I thought it was fairly early. She went to that Polyvore site. And this is when the schism happened between Google and Yelp because they saw it as a competitive product, but really Google was trying to create — Yelp wasn’t available in all the markets Google wanted to be in. And so when we wanted reviews for Indonesia, for restaurants in Indonesia, Google realized they had to create their own platform for people to give reviews because Yelp was not scaling to those markets quickly enough. And it wasn’t really about creating a competitive product, and they tried to make concessions at the time — obviously didn’t work out with Yelp, but…. And those, again, reviews didn’t really stick either.

Mike: I’ve noticed that the guy who runs Yelp isn’t the easiest guy in the world to get along with. Just saying.

Joel: Search my…yeah. And so you have these tools that now Google is collecting data and realizing, “Oh we have a problem here because we don’t know what’s right or wrong.” So this tool, Atlas, became now a tool that they could start moderating things with. Reviews had a different moderation pipeline, but they began to look at existence data and “ground truth” as they called it.

Mike: So, did Atlas also look at Community Edits that were occurring separate from MapMaker?

Joel: Yes. Not the original Community Edits product, but eventually they did work in some of the community…what later became Community Edits into Atlas…

Mike: What I liked about the original Community Edits product was that it was real time, and I could make an edit, no matter how crazy, and get it approved in real time. There was no moderation, it was real time, and I had a blast.

Joel: That’s an experiment, right? Google calls it a beta, an experiment. But ultimately they saw all this data coming in, and a lot of it was actually very, very good. There were lots of corrections about businesses, lots of updates. And so they started moderation teams, and then you also had spam. So they developed moderation teams to…whenever there were edits, the edit was the signal that something needed to be looked at sometimes. They also used other…they also looked at listings that weren’t being edited. But they did use at least that as a version of a signal of what needed to be looked at. So those Community Edits, not just through MapMaker but also Maps Front End, started getting funneled into a single moderation tool. MapMaker always kind of stood separately — they could be edited, they could kind of be moderated in both tools, but it still was separate. And when you had someone…and they realized we need to develop community programs, and we saw something…we saw two things come out of that. You saw the MapMaker Guru program, and then…

Mike: Three things actually, MapMaker Guru…

Joel: And then Hotpot.

Mike: Hotpot, and then also top contributors in the forums, right?

Joel: Yeah, yeah.

Mike: I mean, that was essentially similar idea that you would take those people that were really active and try to embrace them at some level.

Joel: Yeah, I think the top contributors one happened first. I wonder…

Mike: Probably did, I don’t remember exactly when, but…

Joel: Well, certainly there was a version of top contributors even before it became a formal program. I had to…when Mike was making escort listings, I had to argue it’s still better to have them in the fold. I remember that.

Mike: Keep your enemies close…keep your friends close, but your enemies closer.

Joel: There you go. So as these…

Mike: It was a great pleasure to me like publishing an article one day, and seeing the problem disappear the next day. You don’t know how much pleasure that gave me. I once wrote an article on big boobs bounce to the top of Google search, and the next day the search listing was gone. Like, “Yes!”

Joel: It was so easy to remove spam, and spam was so much more straightforward back then.

So you see Google now realizing that community…if they’re going to ask users for data, they need to build community around it. And again, they tried to do this a little bit in Hotpot, they tried to develop a community program, they had meet-ups and stuff, but it just didn’t really stick without that gamification, that aspect that they brought into Local Guides, and as they…

Mike: They had to place these guides sort of in between…like post Hotpot. That was where to place these guides or something where they tried to create communities. But it didn’t scale very well either.

Joel: That was the same program, yeah, the same people.

Mike: Yeah, but it didn’t scale very well. I mean, it was like…and it was very labor intensive from my point of view.

Joel: Yeah, they didn’t have any tools to connect the editing activity to this kind of community program. It was like separate groups doing this thing. And local guides really said, “Hey, we can have some engineering effort, some product management, plus some community all under one house, and Google…oh, wow, we can have operations plus a product management, an engineering team all together trying to do one thing.”

Mike: Right. One of the ironies of it though is just that it provides this free resource to Google in terms of massive data editing suggestions, and obviously there’s still a need for moderation because some percentage X is still done fairly nefariously. Although, I’d be curious to know like…I read somewhere it was 10,000 edits an hour or something were coming in through Community Edits. Some big, huge number. So I guess the question is, like you said, is it just a signal that this needs to be looked at, or is it a signal for actual data information?

Joel: And depending on who you are or what the edit is, it can be both. So, I think Google is really trying…needs to take the right combination of machine moderation to human moderation and figure out that balance. And they did pretty well on some of the map data stuff. I think when you’re getting into more of this opinion-based stuff, certainly on reviews they have a long way to go. On the Q and A, they haven’t even started. But on the factual information…and from Google’s perspective I think you saw…you saw them giving a tool that people wanted. I mean, people would say, “This phone number is wrong. How can I tell you that?” Right? It wasn’t like Google was like, “Oh, we’re going to take advantage of all these users,” in my view. Right? They just said, “Hey, these people are trying to tell us something. Let’s give them a tool to let them tell us what they’re trying to say.”

Mike: So, do you think that the Local Guides program precipitated the demise of MapMaker in some way, or was just that MapMaker was old technology and its day had passed?

Joel: It was old technology and they did have an issue of scaling that moderation platform in MapMaker. I think ultimately though when you had a failure of your automatic moderation system, showing the Android peeing on the Apple, that was kind of the nail in the coffin, right? They just said, look, we need to do it a different way, and that’s why they used maps program because they realized that type of edit can be made today. And so they’ve taken away…like you said, they’ve taken away those road editing abilities and you can’t do boundaries and stuff, and through…this is part of the reason. Because it’s…that stuff’s hard. I mean, you have a whole team of people…

Mike: It is hard, but there still should be a reporting mechanism that’s easier, right? I mean the bulk of people, I think, I don’t know what the percentage is, it would be interesting to know the percentage of nefarious edits in the Local Guides program. But I mean, local guides have access to edit a lot, right? Business name, business address, phone number, they can report photos, they can add photos, they can make suggestions as to attributes, they can suggest roads. But the road editing is a current failure in my mind because roads change and Google doesn’t stay up on all of them, and certainly the bulk of the edits I assume… Obviously, I hang out in the trenches, so I see the bad stuff in the trenches. But I assume that that’s a very small percentage of the total. I have no idea what that amounts to.

Joel: When they think of totals, they don’t think of the number of roads, how many are bad — they think of how much is being used, and I think you mentioned this, right? Do you get 1 report out of Olean or 300 reports out of New York City? I mean, there’s some good logic behind that. If we’re showing something to a million people, we care a lot more than something we show to three people. Right? And they prioritize…

Mike: Prioritize, right. Got it.

Joel: Prioritize. Thanks, sorry about that. The things that are being viewed most often, which makes a lot of sense, right?
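The view-based prioritization Joel describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Report` class, its fields, and the numbers are hypothetical and do not reflect Google’s actual moderation system.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """A hypothetical problem report on a map listing."""
    listing: str
    views: int         # assumed: how many people are shown this listing
    report_count: int  # assumed: how many users reported a problem


def prioritize(reports):
    """Order reports so the most-viewed listings are reviewed first."""
    return sorted(reports, key=lambda r: r.views, reverse=True)


queue = prioritize([
    Report("Florist, Olean", views=3, report_count=1),
    Report("Restaurant, New York City", views=1_000_000, report_count=300),
])

for report in queue:
    print(report.listing, report.views)
```

Sorting purely on views is exactly what pushes the small-town report to the back of the queue, whatever its urgency to the people affected.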

Mike: It does, when you have to pick and choose. But we’ve talked about this also, that the algo, though, with its predilection for volume, doesn’t deny the humanity or the need of the smaller group, right?

Joel: Yeah.

Mike: And the need may be even more burning, it’s just not as many people. So there’s no way to make that sort of intelligent distinction, right?

Joel: Yeah.

Mike: So, two or three businesses may be really suffering as a result of Google losing a town, right? I mean, I don’t know if you remember when they lost towns in Florida in 2000-whatever, I was…

Joel: I’ve still seen it, actually. I’ve seen it recently.

Mike: Yeah. So there was one where I actually was acting as a media consultant to the mayor and the local florist in a town that was lost, and I said, “Look, to get their attention, you’ve got to do X.” That was the time when the mayor went on NBC, where he called Google, and he tried to get to support, and all he got was a phone number. I was like behind them saying, “Here’s what you might try.” It got fixed really quickly.

Joel: That always pops, yeah.

Mike: But, I know that there’s a volume thing, but also…

Joel: But that’s just gaming the algorithm, right?

Mike: What’s that?

Joel: More people saw it, that’s gaming the algorithm.

Mike: Well, that’s true. Although there seemed to be an algorithm that measured public sentiment about a problem. Is that the case? I mean, was there an algorithm that measured public sentiment in some way?

Joel: No, there was no algorithm there. I mean, someone saw it and went, “Oh crap, we should fix it.” I’m just saying that if it’s based on how much visibility there is, something that gets PR is one of those things that gets an immediate fix, if it gets negative PR, or can get an immediate fix, depending on the nature of the problem. PR is just one of those things.

But I think ultimately Google, and from an insider’s perspective, it’s not saying we have data and we want more data, they’re saying, “Hey, users are telling us stuff, let’s make it easier and do it in a way that we can scale it effectively.” And they’ve had a lot of misses and they’ve tried a lot of different ways to do this, and I think for better or for worse, the Local Guides program seems to be that program that they’re going to put their weight behind. I hope that they can really, like you said, put a more human effort into it so that ultimately we can get not just better data, but a better interaction with the system so that people’s voices can really be heard.

Mike: Yeah, it’s sort of an odd mix, right? I mean, there’s some cheating going on in the Local Guides program, where people are swapping their “local guide power” for nefarious purposes, there’s that. There’s also this thing where Google is doing wrong incentives. Like there’s a lottery to get a trip to California. That’s a bad incentive because it incents people to participate in this lottery in weird ways. And so there’s some of that, too, that they haven’t… and it’s not clear to me either, once a local guide achieves a certain stature, that if they are in fact abusing the program, that there’s enough checks and balances in place to put that abuse in its place. So, there’s a danger of rewarding volume over quality, I think is part of the nature of that gamification.

But it’s interesting to me that it’s essentially…I mean, you look at what local guides can do, and I came out of the MapMaker world where I was a regional lead, but I also was early in the Local Guides program because I always liked reviews. So I early on sort of worked towards being a recognized local guide, and that has afforded me a certain privilege as it were, so that when I ran across really bad listings, I can fix them a little easier than the next person. Right? Which is nice. I mean, it’s nice for me, although again it’s one of these things that potentially can be abused, and it’s not clear that the abuse checks are in place fully on this program.

So where do you see… so there’s two big areas where there’s user contributions, one is in the top contributor program in the forums. Many forums are not even staffed with professionals, with Google staff, right? I mean, local is unique in that we’ve got a couple…

Joel: Two.

Mike: Two working on that, which is…and not only that, local is unique in that there’s been a formal pathway created for escalation so the top contributors, if a problem has been vetted by a top contributor, there’s a reasonable chance that through this escalation we can actually have an engineer look at it. So that’s unusual as well, right? So that these problems…and in some ways that’s a recognition of the humanity of the problem, right?

Joel: Yeah. And it’s also a recognition that they know that these are problems that really impact people, right?

Mike: Right.

Joel: This isn’t just like, I don’t like how my name’s appearing in the search results, this is like…

Mike: Maybe that’s because you added 743 extra characters to it.

So here’s a question finally, maybe you can answer this. Of all the thousands of users, of all the thousands of good data they get or thousands of entries, first question is, how much of it is repetitive? How much of it occurs in the say top 20%? In other words, 2,000 edits on one thing. I mean, do 80% of the edits occur on 20% of the entities? That’s the first question. Sounds like that or something along those lines.

Joel: I don’t know… I don’t remember the answer. I’m pretty sure I’ve seen that answer, but I don’t remember it. But I know that the highest-viewed listings, there are listings that constantly get edited. So you’ll go in one day and there will be edits on that day, or the next day there’ll be edits again. And sometimes there’s just misunderstanding of…and this is part of…goes back to businesses…users misunderstand what the business is trying to do, whether that might be a URL that doesn’t quite look like what a user would expect the URL to be. And you can think of these hotel chains that might be doing something funky with a branded URL versus like this local URL when they have these strange relationships.

Mike: So, all of these edits we see that your listing’s been revised or whatever the phrasing is in the GMB? These are all coming from user edits, or are they coming from third party trusted sources?

Joel: I don’t think they’re coming from third parties either, they’re coming either from Google’s moderation teams or through users, but always through a Google moderation team.

Mike: Well, there must be…like I had a problem where I was using Moz for a listing, and then in Google GMB I was adding a campaign URL to it, and it would regularly get overwritten by the Moz data. So…

Joel: Did Moz have the specific URL for Moz?

Mike: Well, it didn’t have campaign URL, it just had the base URL for the business, and that would continually overwrite my campaign URL. So there appeared to be some data providers…

Joel: I would love to see you put a tracking parameter on the Moz URL to see if it’s actually Moz data. I would be surprised if it’s actually the Moz data that’s pushing it there.

Mike: Well, I stopped doing Moz and it stopped happening.

Joel: Really?

Mike: And I asked Moz, and they said that they thought it could be the problem. The reason I didn’t want to put the tracking on is because I wanted to track just Google, and if I put it at Moz, it goes everywhere. And so then…

Joel: Well, you put the Moz URL at Moz, and see where that ends up, and then overwrite to Google. But yeah, I understand.
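Joel’s suggested test, giving each data source its own tagged URL and seeing which one ends up on the listing, can be sketched with Python’s standard library. The `add_tracking` helper, the `example.com` URL, and the use of a `utm_source` parameter are assumptions for illustration; neither speaker names a specific parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_tracking(url, source):
    """Return url with a utm_source parameter appended (hypothetical helper)."""
    parts = urlparse(url)
    # Keep any parameters already on the URL, then add the source tag.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("utm_source", source))
    return urlunparse(parts._replace(query=urlencode(query)))


base_url = "https://example.com/"  # placeholder business URL
moz_url = add_tracking(base_url, "moz")        # submitted to Moz
google_url = add_tracking(base_url, "google")  # entered directly in GMB

print(moz_url)
print(google_url)
```

If the listing later shows the URL tagged for Moz, the overwrite came from that feed. As Mike points out, the tagged URL would also spread to every site the feed reaches.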

Mike: So, it appeared to me that there is some automation. So you’re saying now it goes through…even that, you’re saying it triggers moderation? Even an automated input of data triggers some moderation? But it’s machine moderated, it’s not human moderated.

Joel: Yeah, if you look at moderation, that’s…yeah, some moderation is machine moderation. Yeah.

Mike: Well, one thing I’d ask for, I think I’ve asked for this before, I would ask for some of this incredible intelligence to be applied to the review corpus.

Joel: I think I’ve heard that before.

Mike: I assume you have, I’ve said it a few times, I’ll say it again. Bears repeating. Although it would be interesting to open up review quality to crowdsourcing.

Joel: Yeah. And they don’t… The current flagging mechanisms, that’s not how those systems work today.

Mike: Could be because there’s…

Joel: And they are outside the realm of geo, the reviews system…

Mike: What is geo?

Joel: Like the Google Maps team. It’s outside that team because the review system is also used for, like, the Play Store, all those reviews. All that is the same system.

Mike: And which happens to be separate from the HSA reviews — Home Service Ad reviews?

Joel: Yeah, that’s weird.

Mike: That is weird, why would they use a separate review system?

Joel: Yeah, I don’t know how those reviews work, actually.

Mike: Okay. All right. Well, thank you Joel for joining us, thank you for discussing Google’s community moderation and how it works, and how it’s evolved over time. Thanks for joining us.

Joel: Thank you.
