CommercialPOV

Joe Brown, Director of the Advanced Information Management Solutions, SRA

What is data mining? How would you define it?

Data mining is turning data into information. It’s information that you can use to make informed decisions with, or “actionable decisions.”

The best thing about data mining is that your model typically comes from historical decisions. The data drives decisions. It’s not something that somebody dreams up.

Would you say that it’s easier to do today than a decade ago?

I really see data mining as the convergence of three different technologies. One is the progression of machine learning and pattern learning. The second is the maturing of relational databases, like Oracle. And the third is high performance computing. The cost of computing has dropped so much over the last ten years that, quite frankly, to do the modeling we do today in our labs, we would once have had to own a Cray computer. We crunch a lot of data.

How long do companies typically store old consumer data?

People that are in the business and make decisions based on data really don’t like to get rid of the data, although sometimes they have to. Sometimes with the sheer volume, it’s really not economically feasible to keep it all. There’s typically a three year window.

Does this have the potential to discriminate unfairly against certain groups?

If a lot of your bogus returns come from a certain area, is it really unfair? Should they be screening old ladies and children at the airports, or Middle Easterners who fit the profile of terrorists? If that’s where all the terrorist entities come from, maybe that’s where we should be doing the profiling.

Have you ever heard of anyone trying to do psychographic profiling to determine a state of mind based on transactional history?

No…uh-uh.

So from the way you describe SRA, you guys are trying to create filter systems that increase some measurable number, or decrease a loss.

Right. Almost everything that we do is pointed towards either making money or saving money for somebody or some entity.

It sounds like SRA’s technology is focused on looking at an aggregate group rather than an individual.

That’s right. We try to put a person into one of three or five or twenty groups and then try to profile that group. It’s very much a statistics of groups kind of thing. Rarely is it individualized.

So you’re trying to segment people into categories. So—for example—you might be lumped into a category of 50,000 people who Visa feels are unlikely to pay their bill.

I think that’s a potential problem, and certainly that’s going to go on in certain levels. Anything that you can dream up I can build a profile on. But profiles are really only going to hold averages. There will always be people on the fringe of that who are distanced from the norm.

What could be done with the technology if you started feeding it all these new forms of data (TiVo viewing habits, grocery store card data, etc.)? Just curious what could be done…

I think the data access problems right now are probably insurmountable. If you look at something as simple as two companies merging their data—look how difficult that can be. These concerns are probably not going to happen for a long time.

Presumably a lot of companies are overwhelmed by the sheer volume of data. How much data is typically in a Fortune 500 company’s database?

I couldn’t even begin to guess. Certainly in the multiple terabyte range. We typically work with a very focused group on data from only two or three different sources. What we do is go in and take the part of the data that matters to the specific group we’re working with. We’ll oftentimes end up building a data warehouse for a very specific application, and it’ll be on the order of half a terabyte. That’s what we’ll use as the historical data, and that will typically cover one to three years of the kinds of transactions that group is used to seeing.

Where do you feel like the data mining industry is headed?

I think trend-wise, we’re kind of getting away from what I would call “point solutions.” In other words, somebody has a problem, you drag the neural net or the decision tree out of the tool box, you solve that problem, and you throw your hands up and declare success. I think we’re getting away from that notion and beginning to think ‘well, I’ve got object-oriented programming to draw from,’ or ‘I’ve got some cool visualization stuff to draw from.’ It’s basically become more of a commodity at this point, and you can basically pull this stuff out as yet another tool to use to solve a problem. And the fact that it’s data mining or intelligence or adaptive—or whatever adjective you want to put with that—it’s getting to the point where it’s not nearly as important. People don’t make such a big deal out of it. Really these data mining technologies are getting much more mainstream.

Are there any hurdles that the data mining industry is pushing to solve?

I think that the hurdle that has always existed is still in the data. Typically you’re still dealing with a legacy system, you’re still dealing with ugly data, and you’re dealing with trying to figure out ways to pull data out of one solution and load it into your data warehouse.

When we go talk to customers, I tell them that we’re only going to spend about 30% of the time actually building the data mining model, and we’re going to spend 70% of our time worrying about your data and trying to ingest your data. So I still think the data problem is the part that hasn’t been solved very well. It’s not easy or routine to take somebody’s data, pull it in, do all the pre-processing one needs to do to plug it into one of these tools, and off you go. So the data isn’t the glamorous part, but it’s certainly the hard and expensive part of the problem right now.
