Commercial POV
Joe Brown, Director of the Advanced Information Management Solutions, SRA
What is data mining? How would you define it?
Data mining is turning data into information. It's information that you can use to make informed decisions with, or actionable decisions.
The best thing about data mining is that your model typically comes from historical decisions. The data drives decisions. It's not something that somebody dreams up.
Would you say that it's easier to do today than a decade ago?
I really see data mining as the convergence of three different technologies. One is the progression of machine learning and pattern learning. The second is the maturing of relational databases like Oracle. And the third is high-performance computing. The cost of computing has dropped so much over the last ten years that, quite frankly, to do the things we do today in our labs, we would have had to own a Cray computer. We crunch a lot of data.
How long do companies typically store old consumer data?
People who are in the business and make decisions based on data really don't like to get rid of the data, although sometimes they have to. Sometimes, with the sheer volume, it's really not economically feasible to keep it all. There's typically a three-year window.
Does this have the potential to discriminate unfairly against certain groups?
If a lot of your bogus returns come from a certain area, is it really unfair? Should they be screening old ladies and children at the airports, or Middle Easterners who fit the profile of terrorists? If that's where all the terrorist entities come from, maybe that's where we should be doing the profiling.
Have you ever heard of anyone trying to do psychographic profiling to determine a state of mind based on transactional history?
No, uh-uh.
So from the way you describe SRA, you guys are trying to create filter systems that increase some measurable number, or decrease a loss.
Right. Almost everything that we do is pointed towards either making money or saving money for somebody or some entity.
It sounds like SRA's technology is focused on looking at an aggregate group rather than an individual.
That's right. We try to put a person into one of three or five or twenty groups and then try to profile that group. It's very much a statistics-of-groups kind of thing. Rarely is it individualized.
So you're trying to segment people into categories. So, for example, you might be lumped into a category of 50,000 people who Visa feels are unlikely to pay their bill.
I think that's a potential problem, and certainly that's going to go on at certain levels. Anything that you can dream up, I can build a profile on. But profiles are really only going to hold averages. There will always be people on the fringe who are distanced from the norm.
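Editor's illustration: the idea of placing people into a few groups, profiling each group by its averages, and noticing individuals who sit far from their group's norm can be sketched roughly as follows. This is not SRA's method. The customer records, the `assign_group` rule, and the `tolerance` threshold are all hypothetical and chosen only to make the point concrete.

```python
from statistics import mean

# Hypothetical customers: (name, monthly_spend, late_payments_last_year).
customers = [
    ("A", 120.0, 0),
    ("B", 150.0, 1),
    ("C", 900.0, 0),
    ("D", 80.0, 5),
    ("E", 95.0, 4),
    ("F", 110.0, 12),
]


def assign_group(late_payments):
    """Toy segmentation rule (an assumption): bucket by late-payment count."""
    return "low_risk" if late_payments <= 1 else "high_risk"


def build_profiles(rows):
    """Place each customer in a group, then summarize each group by averages."""
    groups = {}
    for name, spend, late in rows:
        groups.setdefault(assign_group(late), []).append((name, spend, late))
    profiles = {
        label: {
            "size": len(members),
            "avg_spend": mean(m[1] for m in members),
            "avg_late": mean(m[2] for m in members),
        }
        for label, members in groups.items()
    }
    return groups, profiles


def fringe_members(groups, profiles, tolerance=3.0):
    """Names whose late-payment count is more than `tolerance` from the group average."""
    fringe = []
    for label, members in groups.items():
        avg_late = profiles[label]["avg_late"]
        for name, _spend, late in members:
            if abs(late - avg_late) > tolerance:
                fringe.append(name)
    return fringe


groups, profiles = build_profiles(customers)
fringe = fringe_members(groups, profiles)

for label, profile in profiles.items():
    print(label, profile)
print("fringe:", fringe)
```

The profile stores only group-level averages, so a customer far from the average is described poorly by the group's profile even though they belong to it.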
What could be done with the technology if you started feeding it all these new forms of data (TiVo viewing habits, grocery store card data, etc.)? Just curious what could be done.
I think the data access problems right now are probably insurmountable. If you look at something as simple as two companies merging their data, look how difficult that can be. These concerns are probably not going to materialize for a long time.
Presumably a lot of companies are overwhelmed by the sheer volume of data. How much data is typically in a Fortune 500 company's database?
I couldn't even begin to guess. Certainly in the multiple-terabyte range. We typically work with a very focused group on data from only two or three different sources. What we do is go in and take a part of the data for the specific group that we're working with. We'll oftentimes end up building a data warehouse for a very specific application, and it'll be on the order of half a terabyte. That's what we'll use as the historical data, and that will typically cover one to three years of the kinds of transactions that group is used to seeing.
Where do you feel like the data mining industry is headed?
I think trend-wise, we're kind of getting away from what I would call point solutions. In other words, somebody has a problem, you drag the neural net or the decision tree out of the toolbox, you solve that problem, and you throw your hands up and declare success. I think we're getting away from that notion and beginning to think, "Well, I've got object-oriented programming to draw from, or I've got some cool visualization stuff to draw from." It's basically become more of a commodity at this point, and you can pull this stuff out as yet another tool to use to solve a problem. And the fact that it's data mining or intelligence or adaptive, or whatever adjective you want to put with that, is getting to the point where it's not nearly as important. People don't make such a big deal out of it. Really, these data mining technologies are getting much more mainstream.
Are there any hurdles that the data mining industry is pushing to solve?
I think that the hurdle that has always existed is still in the data. Typically you're still dealing with a legacy system, you're still dealing with ugly data, and you're dealing with trying to figure out ways to pull data out of one solution and load it into your data warehouse.
When we go talk to customers, I tell them that we're only going to spend about 30% of the time actually building the data mining model, and we're going to spend 70% of our time worrying about your data and trying to ingest your data. So I still think the data problem is the part that hasn't been solved very well. It's very easy and routine to go take somebody's data, to pull it in and do all the pre-processing one needs to do to plug it into one of these tools, and off you go. So the data isn't the very glamorous part, but it's certainly the hard and expensive part of the problem right now.