This is our Deep Dive Into Local from October 30th, 2017. In our Deep Dive series, we take a closer look at one thing in local that caught our attention and deserves a longer discussion.
If you have a special topic you would like us to discuss for the Deep Dive in Local, please reach out to us. If you would like to be on one or the other of our segments, reach out and send us the topic and your availability.
If you are interested in sponsoring this weekly show also please let us know.
Mike: Hi, welcome to “Deep Dive in Local.” This week, besides Mary and myself, we have Darren Shaw and Nyagoslav Zhekov from Whitespark. We’re hoping to go into depth about their Local Ecosystem chart that they recently produced. And I wanted to welcome both of you to the show.
Darren is going to be speaking at Local U Advanced in Santa Monica November 16th, and whatever we don’t cover today, he’ll go into in great detail there. So with that, let me just start out and ask both of you … I’m just curious if you’d go over the origin story of the Local Ecosystem Graph for those people who haven’t been in the industry for 15 years.
Darren: Sure, Nyagoslav, do you want to talk about the history of it? You wrote that quite well in the post.
Nyagoslav: Okay, so when the first version of the Local Search Ecosystem was released back in 2009, I wasn’t really a part of the Local Search community. So I’m not perfectly familiar with the backstory, with how it came to be in the end. But the first version dates back to 2009, and it was relatively simplistic. It just featured data that likely came from the distribution lists of the three main data aggregators of that time: Acxiom, Localeze, and Infogroup. The creator of the first version was, of course, David Mihm. I’m guessing that once he moved to Moz and became Head of Local there, they decided to cooperate and create an updated version. So the updated version came out around 2012, and it featured a lot more data. The most interesting change in the new version — apart from, of course, the additional data that was added to the infographic — was the addition of Yext, who were getting quite popular at that time. That was pretty much around the time when Yext launched their PowerListings service. I have written an article recently on Whitespark’s blog about what I think about adding agencies of any kind to the infographic. I don’t necessarily agree with this practice. So you might notice that in the current version of the Local Search Ecosystem, there are no agencies or any kind of alt-meta tools. Basically, in my opinion, there is pretty much no difference between alt-meta tools such as Yext, Moz Local, Neptune, or Rio SEO (there are a bunch of others) and agencies that have some kind of special relationship with some or all of the data aggregators, of which there are literally tens. Of course, some of them have more special relationships, but we decided to exclude them from the current version of the infographic. And then…
Mike: Howard Lerman’s going to haunt you in your sleep, Nyagoslav. So let me ask you this question. Obviously the layout and design changed a lot. How did you develop the new one, and more critically to me, how did you confirm or develop your hypotheses of the various relationships between them? Between these entities.
Darren: So I’ll talk about the design and layout of the new one, and then, Nyagoslav, it would be great if you talked about some of the empirical research behind the new ecosystem. The design went through a number of iterations. Taking the opportunity to redo it, we really wanted to see if we could make sense of the spaghetti chart that is the Local Search Ecosystem. Because it is a complicated thing, and that’s almost the most striking thing about it. It was always like, “Here’s the Local Search Ecosystem. It’s this crazy thing.” And everyone was like, “Well, that looks like a Tokyo subway map. What is that? I don’t understand that.” So…
Mike: I used to show it at Local U just to convince people who attended Local U that they needed a consultant.
Darren: Exactly. Basically, we wanted to try and make some sense of it and come up with a way of conveying everything it should convey without it being so complicated. And so the first version was bad. That didn’t work at all. We tried another version. But David is very particular, and every version we suggested, he would shoot down. And he always had very good points about why he was shooting it down. For example, version two didn’t properly demonstrate the importance of the search engines. Google was just a little icon like all the others, so you didn’t really get a sense of which sites were important.
So eventually he suggested the donut chart. He had an example: “Well, what if we did something like this?” And that, I thought, was brilliant. It works great, because at the top of the infographic you’ve got the primary data aggregators, and the size of each segment on the donut chart indicates how important the site is. Data then flows down from the aggregators to the bottom, where we have Google, the biggest one, with Bing and Apple flanking it. Next to the aggregators you’ve got some other sites that are still very important: things like Yelp, Facebook, and DNB. These are sites that are also quite prominent and important in the ecosystem. And then you’ve got all the smaller sites. So the donut works really well to convey the importance of each site and also to show how the data flows. Then I came up with the idea of making it dynamic, so you can just mouse over the pieces, and I think it does a pretty good job. Now, if you want to see just the sites that Infogroup feeds, you can click that and you only see the Infogroup connections. So it’s really nice to be able to see that data. And that’s how the new design came about.

As for the data behind it, I’ll let Nyagoslav speak to that. There was the traditional way that David had always collected the data: going out to all of the sites and getting their distribution lists. A lot of the time, sites will mention “Data provided by Acxiom” or “Data provided by Infogroup” in their footer, on their Terms page, or on their About page. So that’s one way to understand where sites are getting their data from. And then we also have some empirical research that Nyagoslav certainly will talk about.
Mike: Just a note about David: he and I and several others would often discuss it, and we would include knowledge we had garnered through experience. So he did have some empirical data as well as resources. But go ahead, Nyagoslav.
Darren: In certain things, yes.
Nyagoslav: All right. Sure. So basically, as Darren mentioned, we complemented the previously available data that David and, as mentioned, yourself and probably others had collected. It included the distribution lists from the three main data aggregators. It included different types of attributions, which were available not just on Legal pages or Terms of Service pages, but in many cases on the listing pages themselves. Although all of this was great, it seemed like something was missing. And I think what was missing was the practical side. You have the theoretical part: how the ecosystem is supposed to work, because you have the data aggregators’ distribution lists and you have all the websites’ attributions. But we wanted to discover whether, in practice, it really works like this. And it turned out, after we did all the empirical research that Darren mentioned, that what happens within the ecosystem in practice is quite different from what should happen in theory.
So we completed a couple of empirical research projects, and we did something additional: we directly contacted each of the business directories included in the Local Search Ecosystem infographic. Unfortunately, that didn’t go too well, because we received replies from, I think, fewer than 10 of them, and just a few of those actually gave any useful information. Some of them just mentioned that they cannot share their partners with us. I have no idea how the legal side of this works, but we didn’t have a lot of success with this approach.
The empirical research included the data aggregator study that was completed in 2016. What Darren and I did was set up fake businesses on all of the data aggregators, and then we waited for the data to get distributed across the ecosystem. Based on this distribution, we were able to ascertain that particular relationships actually did exist and were pretty strong. Darren actually had an interesting conversation with Localeze after we released the infographic. They mentioned one caveat in our research, our findings, and our observations, and I agree with it, by the way. What Localeze pointed out was that all of the sites within Localeze’s distribution list (and I believe this applies to all the other data aggregators) do purchase information from Localeze. However, it’s up to the sites, the platforms or business directories, whether they are going to use this information or not. So whether they actually generate a listing, or just keep the data in a backend database and potentially use it at a later point or never use it, is up to the platforms themselves.
So it’s pretty much confirmed that Google buys data from likely all of the major data aggregators — Acxiom, Localeze, and Infogroup. I don’t think they have a relationship with Factual. So I believe they do buy the data, but if the only place in the whole web universe where this particular business information is found is, let’s say, Localeze, Google might choose not to generate a listing. That doesn’t mean Localeze doesn’t have a relationship with Google. It just means that Google needs stronger proof that the business actually exists.
When we were analyzing the data and putting together the final infographic, we decided to use three types of relationships between the data aggregators and the business directories, to show how strong or how likely a particular relationship was. That’s why you can see that some of the lines are dashed, some of them are dotted, and some of them are solid. These represent different levels of relationship between the data aggregators and the directories. The other research we did came from our citation clean-up service: during this service we collect these details for all our clients, so we had quite a number of citation audits. We analyzed them in detail and tried to figure out where there were overlaps between the data on a site that might potentially be sending the information and a site that might be receiving it. This was basically purely field research; it was not done in laboratory conditions. So it’s possible that a lot of the data might have been tainted, and that’s why we analyzed quite a large number of audits, in order to be sure the data is as accurate as possible. But just a small part of this data actually made it into the final version of the ecosystem.
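The three line styles Nyagoslav describes could be thought of as buckets on observed evidence strength. As a rough illustration (the thresholds, the mapping of styles to strength, and the function names here are my own assumptions, not Whitespark’s actual methodology), a relationship could be classified by how often seeded test listings actually surfaced on a downstream directory:

```python
# Hypothetical sketch: bucketing aggregator-to-directory relationships into
# the three line styles used in the infographic, based on how often seeded
# test listings appeared downstream. Thresholds are illustrative assumptions.

def relationship_style(appearances: int, seeded: int) -> str:
    """Classify a relationship by the share of seeded listings that appeared."""
    rate = appearances / seeded if seeded else 0.0
    if rate >= 0.75:
        return "solid"   # confirmed, strong relationship
    if rate >= 0.25:
        return "dashed"  # likely relationship
    return "dotted"      # weak or unconfirmed relationship
```

For example, a directory that published 9 of 10 seeded listings would get a solid line, while one that published only 1 of 10 would get a dotted line under these assumed cutoffs.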
Mike: Got it.
Mary: We got a little bit of a peek at this at MozCon Local in Darren’s presentation. I’m wondering, how long have you guys been working on this? I know it’s been at least a year, but has it been even longer than that?
Darren: The research goes back to 2015, I think. That’s when we started it. And then I did present about the data aggregator stuff then, at MozCon Local 2016. And then MozCon Local 2017 is when I presented the Local Search Ecosystem. So it’s a few years running, yes. Two, two and a half, I guess. We kinda started this stuff in October 2015 or something, yes.
Mary: Well, thank you.
Mike: So I have one comment. I think we’re going to give Darren a new nickname, “Darren the Spammer Shaw,” based on this research methodology. The question I have is: have you seen a shift in the relative importance of these primary data distributors over the years, in terms of who is buying their data or using them directly, and the potential influence they might have?
Darren: I think so. Think about what Nyagoslav said Localeze commented on: a site might accept a feed from one of these data aggregators, but if they’re not getting multiple sources confirming that data, then it might not go live. Whereas, I think, if you go back five years, the data was trusted as is. You got a listing on Localeze, it was going to show up on these sites, for sure.
And another thing I thought was interesting is that Localeze also passes the data along with what’s called a “Trust Score.” It’s a scale from 1 to 10 of how trustworthy that data is. So when a listing shows up on Localeze and the only place they ever saw that business data appear was our submission, it has a low Trust Score. That Trust Score is passed to all of their partners, and the partners can say, “Well, we’re only going to display listings on our site that have a Trust Score of X or higher, because Localeze has already done the work for us. They’ve already determined that this business doesn’t have a lot of other sources backing it up.” And I assume all of the other aggregators do something very similar. It makes sense: if I’m buying data from Infogroup or Localeze, I want to know how trustworthy that data is. And so I think that’s actually a flaw in our empirical research. We had this concept that, okay, we’re going to create a fake business, stick it in Infogroup, and see how it distributes. Well, we don’t really get to see the full picture. You don’t get to see whether it’s going to hit all those sites, because a lot of those sites might just discard the data. It doesn’t have a high enough Trust Score for them to say, “We’re going to put that in our database.”
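The partner-side filtering Darren describes amounts to a simple threshold check on the score supplied with each record. Here is a minimal sketch of that idea; the field names, the threshold value, and the sample data are assumptions for illustration, not Localeze’s actual feed format:

```python
# Hypothetical sketch of how a directory partner might filter
# aggregator-supplied listings by Trust Score. The "trust_score" field,
# the 1-10 scale, and the threshold of 6 are assumptions, not a real API.

MIN_TRUST_SCORE = 6  # partner-chosen cutoff (assumed)

def accept_listing(listing: dict, min_score: int = MIN_TRUST_SCORE) -> bool:
    """Return True if the listing is trusted enough to publish."""
    return listing.get("trust_score", 0) >= min_score

feed = [
    {"name": "Joe's Plumbing", "trust_score": 9},  # corroborated by many sources
    {"name": "Fake Test Biz", "trust_score": 2},   # seen only in one submission
]

published = [listing for listing in feed if accept_listing(listing)]
```

Under this sketch, a seeded test business with a low score never surfaces on the partner site at all, which is exactly why the fake-business experiment could undercount real relationships.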
Mike: A note on Localeze: they get to see all the phone switches. They do a lot of…
Darren: For sure.
Mike: And so they know if a given number has traffic, for example, which is one of their offline attribution methods. Which raises this other question about offline sources, right, like the IRS or government databases. Have you done any research into those in the United States, Nyagoslav? I know it’s different in different countries. In Saudi Arabia, for example, the government is literally the source for all business data, so there it’s a singular source. In the United States, it’s much more confused. But have you done any research into the typical or most important sources that are offline?
Nyagoslav: Sure. So basically, in the United States it’s not just a matter of different states; it’s also a matter of different industries. In some industries there are very important industry-specific “arc” sources of data, if I could call them that. For example, in the medical industry (doctors, dentists) you have the National Provider Identifier, or NPI, which comes from NPPES, the National Plan and Provider Enumeration System. In this system, each medical provider is assigned a unique number, and a lot of sites, including federal ones, source data for medical practitioners based on this NPI number. So in the medical industry, this is by far the most important industry-specific source of information. It’s not really a citation, because it’s technically a government source. In the legal industry, you have the state bar associations. Most of the legal business directories, like Justia, Avvo, and Lawyers.com, source information from the state bar associations. So this is another government type of entity (if you could call it “government”) that provides business data to a large number of business directories. We haven’t looked in particular at whether, in the United States, there are municipal chambers of commerce or business merchant associations or something like this that Google trusts the most when it comes to data.
Mike: I would suggest that I know the answer to that, and it’s the Better Business Bureau in terms of these types of directories, particularly in service-area businesses, they put a great deal of trust in the Better Business Bureau having that business.
Nyagoslav: Sure. If…
Darren: Write that one down, people. Better Business Bureau, that’s a good one.
Mary: Also, in the U.S., you have to register your business. You have to get a business license from the state, sometimes from the county, sometimes from the city. So I have always assumed…because I’ve seen it have very negative impacts when the Secretary of State business license information was incorrect. I think that has a really big impact.
Mike: Yep, and the route for that, typically, is Acxiom. They got their start actually scanning court documents and digitizing them; that was sort of their core business model. As opposed to Infogroup, which scanned phone books, right, and Localeze, which scanned phone numbers. So they each had a different origin story, and they each originally brought different source data to the mix. I think these days they all actually trade information among themselves at some level as well.
So it’s 21 minutes in. Why don’t we take a break and we’ll call this the look at the top-level data sources. And we’ll come back together for the next segment. How’s that sound for you guys?
Mary: Sounds good.
Mike: Allrighty. So thank you very much for joining us today, and we’ll see you in a few minutes for the next segment.