Like many others I've been watching the whole PRISM issue unfurl with an increasing measure of amusement and amazement, mainly that people are surprised and shocked. There is so much BS being spouted in every direction, I thought it may help to remind everyone of the 10 Rules Of Social Data Mining:
1. "Data wants to be Free", and most people give it away as if it was. It may want to be free, but it is very valuable.
2. Everyone who can collect it therefore, will collect it.
3. Those that collect, will share or sell - the more data you have, the more valuable it is - there is a Metcalfe's Law of data.
4. That means Governments too, not just Google. Why were you surprised? Did you note that was Governments, plural? Not just yours. Especially if all your data is on servers in another country.
5. They won't just collect it, they will store it. All of them. For a long time. A very long time.
6. They will all tell you it is all anonymised, and can't be used to find YOU. They are lying.
7. They will tell you they will only use it for specific things, not generic datamining, no need to worry. Worry. Remind yourself of (1) above.
8. They will tell you that if you have nothing to hide, you have nothing to fear. Fear. Yesterday's amusing peccadillo is tomorrow's thought crime.
9. They will finally tell you it is "necessary" to take it all, and store it forever - for cheaper services, for better efficiency, to win the War On Terror*. It isn't.
10. I will tell you I can see into your soul for the price of letting you put a few photos up and telling other people what you had for lunch. You better believe it.
If, while reading all the hoo-ha, you keep all that in mind, you may not lose your head.
And (again) be careful what you put online. Datamining algorithms can tell a hell of a lot even from 30 days' worth of your "what I had for lunch" tweet data once it's cross correlated with all the others in your network and the other data out there that can be cross referenced to you.
* They who can give up essential liberty to obtain a little temporary safety will get neither liberty nor safety - Benjamin Franklin
Science Fiction writer (Culture series) and "Normal" fiction novelist Iain Banks (Wasp Factory etc) has died, aged 59.
Consider Phlebas, who was once handsome and tall as you. May he rest in peace.
I'm liking Microsoft's Kate Crawford, she too is a Big Data sceptic - her "6 Myths of Big Data" in the NYT is exactly the sort of thing we would write, so we've copied it (expurgated) here, with a few comments [in brackets]. In essence she thinks that Big Data boosters (aka Fundamentalists) are labouring under the misapprehension that more data = more facts = more accuracy, and she has pointed out 6 myths around this:
Myth 1: Big Data is New
In 1997, there was a paper that discussed the difficulty of visualizing Big Data, and in 1999, a paper that discussed the problems of gaining insight from the numbers in Big Data. That indicates that two prominent issues today in Big Data, display and insight, had been around for a while. “But now it’s reaching us in new ways,” because of the scale and prevalence of Big Data, Ms. Crawford said. That also means it is a widespread social phenomenon, like mobile phones were in the 1990s, that “generates a lot of comment, and then disappears into the background, as something that’s just part of life.” [Never mind 1997, Big Datasets have been datamined by Telcos - the first large social network owners - large retailers, airlines and bottom feeding credit card companies for several decades]
Myth 2: Big Data Is Objective
Over 20 million Twitter messages about Hurricane Sandy were posted last year. That may seem sufficient for a picture of whom the storm affected. However, the 16 percent of Americans on Twitter tend to be younger, more urban and more affluent than the norm. “Very few tweets came out of Breezy Point, or the Rockaways,” Ms. Crawford said. “These were very privileged urban stories.” And some people, privileged or otherwise, put information like their home addresses on Twitter in an effort to seek aid. That sensitive information is still out there, even though the threat is gone. That means that most data sets, particularly where people are concerned, need references to the context in which they were created. [And of course, Twitter is not the only source of large reams of data, the reason so many people focus on it is, like the drunk looking for his key under the street-light, it is far easier to look there]
Myth 3: Big Data Doesn’t Discriminate
“Big Data is neither color blind nor gender blind,” Ms. Crawford said. “We can see how it is used in marketing to segment people.” Facebook timelines, stripped of data like names, can still be used to determine a person’s ethnicity with 95 percent accuracy, she said. Information like sexual orientation among males is also relatively easy to identify. (Women are tougher to pinpoint.) That information can be used to determine what kind of advertisements, for example, that people receive. It’s important to remember that whenever people start creating data sets, these become fallible human tools. “Data is something we create, but it’s also something we imagine,” Ms. Crawford said. [And they are prey to all the same biases that small dataset work is prone to - being big doesn't make it better - all data models are wrong, by definition]
Myth 4: Big Data Makes Cities Smart
“It’s only as good as the people using it,” Ms. Crawford said. Many of the sensors that track people as they manage their urban lives come from high-end smartphones, or cars with the latest GPS systems. “Devices are becoming the proxies for public needs,” she said, “but there won’t be a moment where everyone has access to the same technology.” In addition, moving cities toward digital initiatives like predictive policing, or creating systems where people are seen, whether they like it or not, can promote lots of tension between individuals and their governments. Sorry, IBM. Take that, Cisco. That goes for you, too, Microsoft, Ms. Crawford’s employer. All these big technology companies have Smart Cities initiatives. [Quite - as we have pointed out ad nauseam on this blog, freeing your data can be just another way of putting you in chains. But then, if you had nothing to hide, why wouldn't you want us to have your data, I hear you say.....]
Myth 5: Big Data Is Anonymous
A study published in Nature last March looked at 1.5 million phone records that had personally identifying information removed. It found that just four data points of when and where a call was made could identify 95 percent of individuals. “With just two, you can identify 50 percent of them,” Ms. Crawford said. “With a fingerprint, you need 12 data points to identify somebody.” Likewise, smart grids can spot when your friends come over. Search engine queries can yield health data that would be protected if it came up in a doctor’s office. [This is well known by anyone working with databases, but the Big Data fans - and open Data apostles - are strangely quiet about this issue]
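The "few data points" effect is easy to reproduce on toy data. What follows is only a minimal sketch, not the study's method or data: it assumes a hypothetical set of 500 users, each with 30 uniformly random (cell, hour) points, and the names `make_traces` and `unicity` are made up for illustration. Real call records are far more clustered than this, so the exact fractions will differ, but the shape of the result (each extra point narrows the crowd very quickly) is the thing to look at.

```python
import random

def make_traces(n_users, n_points, n_cells, n_hours, rng):
    """Synthetic 'anonymised' traces: user id -> set of (cell, hour) points.
    Uniformly random points are an assumption made purely for illustration."""
    traces = {}
    for user in range(n_users):
        points = set()
        while len(points) < n_points:
            points.add((rng.randrange(n_cells), rng.randrange(n_hours)))
        traces[user] = points
    return traces

def unicity(traces, k, rng):
    """Fraction of users for whom k of their own points match nobody else."""
    unique = 0
    for user, points in traces.items():
        known = set(rng.sample(sorted(points), k))  # what an observer knows
        matches = [u for u, p in traces.items() if known <= p]
        if matches == [user]:
            unique += 1
    return unique / len(traces)

rng = random.Random(42)
traces = make_traces(n_users=500, n_points=30, n_cells=20, n_hours=24, rng=rng)

results = {k: unicity(traces, k, rng) for k in (1, 2, 4)}
for k, fraction in results.items():
    print(f"{k} known point(s): {fraction:.0%} of users uniquely identified")
```

Note that no name, number or address appears anywhere in the data; the pattern of points is the identifier.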
Myth 6: You Can Opt Out
Last December, Instagram, the photo-sharing site, changed its terms of service to allow it to share customer’s photos more broadly, even use images in ads. What it didn’t have was a paid option, in which a person could, for a fee, not be part of that. Even if that option existed, Ms. Crawford said, this would imply a two-tier system — people who could afford to control their data and those who could not. [As we've argued before, the online world shows many worrying signs of devolving into a "feudal 2.0" system, with the digital serfs trading their data for access to crappy services relying on Ad funding, and you will only get truth by paying for it]
In other words, Big Data behaves in much the same way as Not-So-Big Data really.
We'd also add a Coda - Statistics, and to an extent Operations Research (or Decision Maths or whatever the latest in-word is), is the science (or art, too often) of estimating what large data sets will contain from much smaller datasets, and once those sample datasets are above a certain size they are fairly indistinguishable from the overall dataset, so long as they are properly randomly sampled. A lot of the "insights" from Big Data - the "80/20" in my experience - are usually quite easy to glean from small datasets and "Big Maths". In fact, if I may be so bold, I do think a lot of the Big Data hoo-ha is from people whose main grasp of maths is spreadsheets with $ sign denominations.
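To make the sampling point concrete, here is a minimal sketch under stated assumptions: the "big" dataset is a hypothetical half a million log-normally distributed values (think skewed purchase amounts), invented purely for illustration, and the estimate is nothing fancier than the mean of a simple random sample with its standard error.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical "Big Data": 500,000 skewed values (an assumption, not real data).
population = [rng.lognormvariate(3.0, 1.0) for _ in range(500_000)]
true_mean = statistics.fmean(population)

estimates = {}
for n in (100, 1_000, 10_000):
    sample = rng.sample(population, n)              # simple random sample
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5   # shrinks like 1/sqrt(n)
    estimates[n] = estimate
    print(f"n={n:>6}: estimate {estimate:7.2f} +/- {std_err:5.2f} "
          f"(full dataset: {true_mean:.2f})")
```

The standard error falls with the square root of the sample size, so going from ten thousand rows to the full half million buys comparatively little extra precision. The caveat in the text still applies: this only holds if the sample is properly random.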
We will definitely keep a closer eye on Ms Crawford's work. I suspect Microsoft may as well.
Hot on the heels of our look at McKinsey's Top 10 Technologies, we look at their "Disruptive Dozen" (They really have taken the "Number Of X" blogpost trope to heart).
The first thing that hits me, looking at the chart above, is that some of these are relatively tiny in impact, so it's unclear how they will be "disruptive" in any significant way. The other thing that hits me is that three of these tiny ones are new energy sources. This implies a huge discrepancy between the new energy source hype/expectations, and the likely reality. I am impressed that they have not been carried away with the hype around 3D printing, we believe it won't be that high impact either (see here)
Looking at the high impact technologies, the Cloud and Mobile internet are both already large, and their change vectors are well known. Existing systems have existed for quite some time, so why will these be disruptive? If you look at previous true disruptions, they usually come from the "first off" delta with high penetration, which has arguably already happened, not the later "build ons", in this case to the existing, globally available Server Farm/Web Hosting and Mobile Internet systems.
And as for "Knowledge Automation", this is supposed to remove the jobs of the 200 million Western knowledge workers. These, however, are the people who are most connected and have political power. Besides, we suspect - again if the past predicts the future - that offshoring to lower wage economies is far more likely. Whichever, selling Automation and Globalisation was one thing when it was for blue collar workers in the boomtime, it'll be another thing when it hits lawyers, doctors and bankers in the Great Recession. History implies this will be a bunfight....
The areas that they do flag that we agree are both large and disruptive are the Internet of Things, and Advanced Robotics (of which autonomous vehicles are really just a subset). More on our thinking about these here and here.
They seem not to have featured one thing they did have in their Top 10, ie the next 3 billion people in poor countries joining the internet via mobile systems - now history suggests that will be extremely disruptive across all vectors.
It's a bit Curate's Eggy in my view, no doubt to spur debate. Anyway, the report is well worth a read, and there is a livechat later today on the topic on #McKDisrupt on Twitter.
McKinsey's latest set of ICT trends. This is a follow on from their 2010 forecast (we reviewed that here). Here is the Broadstuff expurgated version, and probably one of the few where you are likely to get a bit of a qualified reality check [In Brackets]
1. Joining the social matrix
Social technologies are much more than a consumer phenomenon: they connect many organizations internally and increasingly reach outside their borders. The social matrix also extends beyond the cocreation of products and the organizational networks we examined in our 2010 article. Now it has become the environment in which more and more business is conducted. Many organizations rely on distributed problem solving, tapping the brain power of customers and experts from within and outside the company for breakthrough thinking.
[It's a transaction cost game - see our thoughts on this over here - and in very early days. It is also largely Just Another Channel, not a New Paradigm - that was so 2012]
2. Competing with ‘big data’ and advanced analytics
Three years ago, we described new opportunities to experiment with and segment consumer markets using big data. As with the social matrix, we now see data and analytics as part of a new foundation for competitiveness. Global data volumes—surging from social Web sites, sensors, smartphones, and more—are doubling faster than every two years. The power of analytics is rising while costs are falling. Data visualization, wireless communications, and cloud infrastructure are extending the power and reach of information.
[The problem with Big Data (or data analysis, as it used to be known in the Olde Days, c 2010) is that the low hanging fruit is soon plucked (if it hasn't been already - people used quaint old ideas like Statistics and Operations Research to estimate these larger datasets and optimise things in them thar Old Days). The other issue is Beautiful Mind Syndrome* where people see patterns in the data where none exist (or worse, you anchor on non patterns you want to see) and chase after will o' the wisps. Big Data is a lot like SEO - typically once you've done the fairly obvious things (that you may well have already done - see Statistics, use of, above) you can shave small incremental benefits off things by analysing them to the nines, but a crap business will still be a crap business and you're better off changing that than crunching yet more data]
3. Deploying the Internet of All Things
Tiny sensors and actuators, proliferating at astounding rates, are expected to explode in number over the next decade, potentially linking over 50 billion physical entities as costs plummet and networks become more pervasive. What we described as nascent three years ago is fast becoming ubiquitous, which gives managers unimagined possibilities to fine-tune processes and manage operations.
[We have been active in this area for 20 odd years, and the hype still outpaces the reality unfortunately. Unit costs per sensor and the cost of integration, lack of standards and reliability of these systems at any scale are still huge barriers to widespread deployment. But it is coming, one day...see our take on it over here]
4. Offering anything as a service
The buying and selling of services derived from physical products is a business-model shift that’s gaining steam. An attraction for buyers is the opportunity to replace big blocks of capital investment with more flexible and granular operating expenditures. A prominent example of this shift is the embrace of cloud-based IT services. Cosmetics maker Revlon, for example, now operates more than 500 of its IT applications in a private cloud managed by an external provider. It saved $70 million over two years, and when one data center in Venezuela was hit by a fire, the company was able to shift operations to New Jersey in two hours. Moves like this, which suggest that cloud-delivered IT can be reliable and resilient, create new possibilities for the provision of mission-critical IT through external assets and suppliers.
[The best X As A Service operators are making thin margins at best, and increasingly having to grow by acquisition (Salesforce.com), and most are still heavily subsidised by their parent companies or running on investors money. The hassle factor of running key services remotely, with the same level of reliability and flexibility as in house, is still not a slam dunk for anything but the very basic commodity services. Exits (and valuations) are a function of hype over reality still]
5. Automating knowledge work
Physical labor and transactional tasks have been widely automated over the last three decades. Now advances in data analytics, low-cost computer power, machine learning, and interfaces that “understand” humans are moving the automation frontier rapidly toward the world’s more than 200 million knowledge workers.
[This we will watch with interest, as those 200 million are the bulk of the educated elite in developed countries, they are connected and (still) wealthy, and are not going to take this development lying down. Business leaders were able to push Globalisation when it was boomtime and only low end blue collar jobs were being hit, let's see if it still works with high end white collar ones, in Great Recession hit democratic countries. Also, the automation of knowledge work has been a holy grail through at least the last 3 main tech cycles - remember Knowledge Engineering, Artificial intelligence etc - and so far has still not got very far in practice]
6. Engaging the next three billion digital citizens
As incomes rise in developing nations, their citizens are becoming wired, connected by mobile computing devices, particularly smartphones that will only increase in power and versatility. Although several emerging markets have experienced double-digit growth in Internet adoption, enormous growth potential remains: India’s digital penetration is only 10 percent and China’s is around 40 percent. Rising levels of connectivity will stimulate financial inclusion, local entrepreneurship, and enormous opportunities for business.
[The next 3 billion citizens are very uninteresting by Western advertising economics standards, but Western price Ads (or selling data to advertisers) is required to fund the Western Dotcom industry at its current cost levels. It will thus be interesting to see the business models used for the next 3 billion, because most of the current Ad based ones won't work with Western based infrastructures]
7. Charting experiences where digital meets physical
The borders of the digital and physical world have been blurring for many years as consumers learned to shop in virtual stores and to meet in virtual spaces. In those cases, the online world mirrors experiences of the physical world. Increasingly, we’re seeing an inversion as real-life activities, from shopping to factory work, become rich with digital information and as the mobile Internet and advances in natural user interfaces give the physical world digital characteristics.
[Remember 2nd Life? All this was promised, back then. It tanked, and nothing has changed that reduces the likelihood of another failure just yet]
8. ‘Freeing’ your business model through Internet-inspired personalization and simplification
After nearly two decades of shopping, reading, watching, seeking information, and interacting on the Internet, customers expect services to be free, personalized, and easy to use without instructions. This ethos presents a challenge for business, since customers expect instant results, as well as superb and transparent customer service, for all interactions—from Web sites to brick-and-mortar stores. Fail to deliver, and competitors’ offerings are only an app download away.
[Today, most of your competitors are still no better than you, finding the right App is a nightmare, it's not a given it'll work on your device, and there is a lot of stuff we still won't buy online - see below. But this is probably one of the areas that is tightening up faster than some of the others]
9. Buying and selling as digital commerce leaps ahead
The rise of the mobile Internet and the evolution of core technologies that cut costs and vastly simplify the process of completing transactions online are reducing barriers to entry across a wide swath of economic activity. Amped-up technology platforms are enabling peer-to-peer commerce to replace activities traditionally carried out by companies and giving birth to new kinds of payment systems and monetization models.
[Except that there is a large swathe of things we don't want to buy without physically examining them first, and a larger swathe we won't buy if we don't trust the seller, service, payment process etc. We are happy in the main with buying low cost, low risk commodities online. For everything else, we still like face to face communication. It's changing, but slowly, and in fact we are starting to see online retailers look at bricks and mortar stores to increase their penetration. Very few payment systems are "new", most have been thought about for 20-odd years (just look at all the dotcom initiatives), typically the blockers have been technology costs, vested interests or lack of user takeup]
10. Transforming government, health care, and education
The private sector has a big stake in the successful transformation of government, health care, and education, which together account for a third of global GDP. They have lagged behind in productivity growth at least in part because they have been slow to adopt Web-based platforms, big-data analytics, and other IT innovations. Technology-enabled productivity growth could help reduce the cost burden while improving the quality of services and outcomes, as well as boosting long-term global-growth prospects.
[I've been doing Internet strategy for nigh on 20 years, this area is always just about to take off in a big way. However data privacy, the problem of getting the last 20% to use new services to realise the financial benefits and related worries about democratic representation, the huge discrepancy between who pays and who benefits from public money, and expensive past failures are all big barriers to major adoption. We see no compelling short term drivers for major changes. In the medium term the cost of providing for the elderly, and public services, will force changes.]
We are not saying these things won't come to pass, but we are saying it is going to happen far more slowly than this paper implies. This will mainly be a slow evolution, not revolution. To help you work out what is real, what is coming, and what is hot air we present the Broadstuff Modified Gartner Hype Curve, complete with the Perpetual Hype Re-Cycle for those topics that go up and down the curve time after time but never seem to create a new service.
*I've just invented it, but I'll bet it exists by 2015 as anyone who has worked with big datasets or simulation models has seen this phenomenon. The behaviour of anchoring - seeing things you want to see that aren't there, or stopping when you see something you like and not testing for anti-patterns - is also already well documented.
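For anyone who hasn't seen the phenomenon first hand, a minimal sketch of how it arises: generate a pile of series that are pure noise by construction, then go looking for the one that best "explains" a target series. Everything here (200 series, 20 observations each, the `pearson` helper) is an assumption chosen for illustration, not taken from any real dataset.

```python
import random
import statistics

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)
n_series, n_obs = 200, 20

# Every series is independent Gaussian noise: there is no real pattern to find.
series = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_series)]
target, candidates = series[0], series[1:]

best = max(abs(pearson(target, s)) for s in candidates)
print(f"strongest 'pattern' among {len(candidates)} noise series: |r| = {best:.2f}")
```

Search enough columns and a respectable looking correlation will turn up, which is why stopping at the first pattern you like, without testing it on fresh data, is so dangerous.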