What is big data, part 3


In the first part we learned about data and how it can be used to extract metadata, or some other kind of value.


The second part explained the term Big Data and showed how it became an industry, driven by its economic impact. This third part should be a logical continuation of the previous two, and the whole thing should now make sense: sad, sometimes ironic, and sometimes frightening. You will see for yourself how the technological, business, and even social contracts of the future are already being rewritten by big data in ways we are only now beginning to understand. And perhaps they will never be brought under control.


Whether the analysis is carried out by a supercomputer or compiled by hand, as in the 1665 tables drawn from lists of the dead, some aspects of big data have existed far longer than we might imagine.


The dark side of big data. Historically, the role of big data has not always been spotless. The idea of crunching numbers to produce a quantitative rationalization for something we wanted to do anyway has been around for as long as we have had extra money.


Remember the corporate raiders of the 80s and their newfangled weapon, the spreadsheet? The spreadsheet, a rudimentary database, allowed a 27-year-old graduate with a PC and three scraps of questionable data to talk his bosses into looting the company pension fund and buying out the shares with borrowed money. Without personal computers and spreadsheets there would have been no Michael Milkens and no Ivan Boeskys. It was simply the 80s version of big data. Pedants will say that it was not big data, but culturally it had the same effect as the industry we call big data today. For its time, it was a lot of data.


Do you remember Reaganomics? The economist Arthur Laffer argued that government revenue could be raised by lowering taxes on the rich. Some people still believe it, but they are wrong.


Program trading, the buying and selling of stocks by computer algorithm, crashed Wall Street in 1987 because firms adopted it without fully understanding how to use it. No individual computer realized that other computers were acting at the same time, perhaps in response to the same rules, which turned what was supposed to be an orderly retreat into panic selling.


Long-Term Capital Management in the 90s took derivatives where they had never been before, only to fail spectacularly because no one really understood derivatives. Had the government not intervened in time, Wall Street would have collapsed again.


Enron in the early 2000s used giant computers to play the energy markets, or so it seemed, until the company imploded. The Enron story, you will remember, was that giant computers had made the company smarter, when in reality they were used to mask deception and manipulate the market. "Pay no attention to the man behind the curtain!"


The global banking crisis of 2007 was driven in part by big computers generating supposedly perfect financial products that the market, in the end, could not absorb. But was it caused by deregulation (reducing state influence on the economy)? Deregulation may have encouraged financiers to be reckless, but a more important role was played by Moore's law: the cost of computing had fallen to a level where it became technically possible to outrun regulation. Technology created the temptation to chase bigger prey.


Failures like these were usually built on lies or undermined by lies. How else can you build top-rated mortgage-backed bonds out of nothing but junk mortgages? You lie.


Building conclusions on a flawed method or flawed data is a big problem. Some experts would argue that it is easily fixed with even bigger big data. Maybe it is, but the track record of such efforts is poor.


There is an irony here: when we believe incorrect big data, it is usually connected with money or some manifestation of power, yet we have an equal tendency not to believe true big data when it touches on politics or religion. So big scientific data often has to fight for acceptance against skeptics, such as those who deny climate change or support the teaching of creationism.


Big data and insurance. Big data has already changed our world in many ways. Take health insurance, for example. There was a time when insurance company actuaries studied morbidity and mortality statistics to set insurance rates. This involved metadata, data about data, mainly because actuaries could not dig deep enough to get past broad groups of policyholders to individuals. In that system, insurers' profits grew linearly with the number of customers. The profit from most customers was small, so health insurers needed to increase the number of people insured: the more the better.


Then in the 90s something happened: the cost of computing reached a level at which it became cost-effective to calculate likely health outcomes on an individual basis. This shifted the health insurance business from setting rates to denying coverage. In the U.S., the health insurance business model moved from covering as many people as possible to covering as few as possible, selling insurance only to healthy people, those who have little need of health care.


Insurance company profits soared, but we were left with millions of uninsured families.


Given that society needs healthy people, selling insurance only to the healthy obviously could not go on for long. It was another economic bubble waiting to burst, and so came Obamacare (the Patient Protection and Affordable Care Act). A whole book could be written on this subject, but the point is that something had to happen for the insurance system to change in order to achieve social goals. Trusting smart electronic systems to somehow find a way to cover a larger share of the population while continuing to expand profit margins is insane behavior.


Big scary Google. The scam that too often underlies big data permeates the entire economy and reaches those we have come to regard as the epitome of big data, or even its creators. Google, for example, wants us to believe it knows what it is doing. I am not saying that Google is not a fantastic creation or that it is unimportant, but it is built on an inaccessible algorithm that the company will not explain, just as Bernie Madoff would not explain his investment methods. Big scary Google: it has attained universal wisdom.


Maybe, but who knows for sure?


The truth is that they make all the money!


The truth about advertising. Google bills advertisers, and its entire income comes from advertisers paying those bills. Trite as it may sound, advertisers often do not want to know how well their advertising is actually working. If it were transparent just how ridiculously low the return on most advertising is, advertising agencies would go out of business. And because agencies not only produce ads but also place them for their clients, from the advertising industry's point of view it is sometimes better not to know.


It turns out that the Internet is being gamed. The news aggregator Huffington Post told its writers to use search engine optimization terms in their posts, which is supposed to increase readership. Does it work? It is not very clear, although studies suggest that search engine rankings do go up if you insert gibberish into posts, which makes no sense.


In the end, what happens is that we give in and accept a lower standard of performance. Can Match.com or eHarmony help you find the perfect partner? No. But it is fun to think they can, so what the hell...


Again, the strange thing is that we are cynical about data when it comes to science (climate change, creationism, etc.) but almost entirely without cynicism when it comes to business data.


There is an interesting fact to note here. Google's data, raw and unanalyzed, is for the most part available to other companies, so why does Google have no effective competitor in search? Microsoft's Bing certainly has access to the same information Google does, yet it has one-sixth of the users. It all comes down to market perception: Bing is not perceived as a viable alternative to Google, even though it is one.


That is the big data game.


The other, unlit side of the story. Apple's data center in North Carolina, estimated to have cost $1 billion, was built before the death of Steve Jobs. I spent a day parked outside the gate of the facility and counted exactly one car entering or leaving. I worked out how much server capacity would be required if Apple kept multiple copies of all the data existing on Earth in this building, and it came to eight percent of the space available at the time. The building is capable of holding two million servers.


Later I met the sales agent who sold Apple every server in that building: all 20,000 of them, he said.


Twenty thousand servers is a lot for iTunes, but they occupy about one percent of the floor space of the whole building. What is going on there? It is the big data bluff: by spending $1 billion on construction, Apple looks to Wall Street (and to Apple's competitors) like a player in Google's game.


This is not to say that big data is not real, because it is. At Amazon.com and other huge retailers, including Walmart, big data is very real, because these companies need real data to succeed on razor-thin profit margins. Walmart's success has always been built on information technology. In e-commerce, where real things are bought and sold, the customer is always the customer.


For Google and Facebook, the customer is the product. Google and Facebook trade in us.


All this time, Moore's law has kept working, conjuring up ever cheaper and more powerful computing. As we said in the first part, every decade the computing power available at the same price increases by a factor of 100, thanks to Moore's law alone. The cost of the computer transaction needed to sell a ticket through SABRE in 1955 has dropped a billionfold to date. What was a reasonable expense of $10 per ticket in 1955 is today a tiny fraction of a penny, not worth counting. In SABRE's terms, computing is virtually free. That completely changes what we can do with computers.
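The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above (a 100x gain per decade, a billionfold drop, $10 per ticket in 1955); the function names are made up for illustration, and the model is plain compound growth, nothing more.

```python
import math

# Figures quoted in the text.
GAIN_PER_DECADE = 100          # computing per dollar grows 100x each decade
TOTAL_DROP = 1_000_000_000     # the cost has dropped "a billionfold"
COST_1955_DOLLARS = 10.0       # computing cost per ticket in 1955


def decades_needed(total_drop: float, gain_per_decade: float) -> float:
    """Decades of compound growth needed to reach a given total cost drop."""
    return math.log(total_drop) / math.log(gain_per_decade)


def cost_after_drop(cost_then: float, total_drop: float) -> float:
    """Cost of the same transaction after the total drop is applied."""
    return cost_then / total_drop


decades = decades_needed(TOTAL_DROP, GAIN_PER_DECADE)
cents_today = cost_after_drop(COST_1955_DOLLARS, TOTAL_DROP) * 100

print(f"decades of 100x growth behind a billionfold drop: {decades:.1f}")
print(f"cost per ticket today: {cents_today:.6f} cents")
```

At 100x per decade, a billionfold drop takes four and a half decades, and the $10 transaction ends up costing about a millionth of a cent.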



Your personal intelligence service. Computing has become so inexpensive, and personal data so pervasive, that some cloud apps now turn your smartphone into the equivalent of J. Edgar Hoover's FBI or today's NSA. One such tool was called Refresh and is shown in the picture. Refresh was later acquired by LinkedIn, which was in turn acquired by Microsoft, but the example still holds. Type someone's name into your phone and hundreds of computers, literally hundreds, search social media and the web, compiling an operational dossier on the person you are about to meet for business or simply want to join at the bar. You do not just see everything about this person's life, work, family, and education; the system also tracks how your life intersects with theirs and anticipates questions you might ask or conversation topics you might want to pursue. All of this within one second. And for free.


The failure of artificial intelligence. Back in the 80s there was a popular field called artificial intelligence, the basic idea of which was to work out how experts do what they do, reduce those tasks to a set of rules, then program computers with those rules and effectively replace the experts. The goal was to teach computers to diagnose diseases, translate languages, even to work out what we want but cannot figure out for ourselves.


It didn't work.


Artificial intelligence, or AI, sucked up hundreds of millions of venture capital dollars in Silicon Valley before it was declared a failure. Although it was not clear at the time, the problem with artificial intelligence was that we simply did not have enough computing power at the right price to achieve those ambitious goals. But thanks to MapReduce and cloud infrastructure, today we have more than enough processing power to create artificial intelligence.


Speed bump. It is ironic that the key idea of artificial intelligence was to give language to computers, when what actually happened is that a significant part of Google's success came from effectively taking language, human language, away from computers. The XML and SQL data standards that underlie almost all web content are not used at Google, because the company realized that data structures tailored to a human reader make no sense when computers are communicating with each other. Once a human was no longer required in the loop of computer communication, significant progress followed in machine learning. This is very important, so please read it again.


Understand: in the modern version of artificial intelligence we do not need to teach computers to perform human tasks; they teach themselves.


Google Translate, for example, can be used online for free by anyone to translate text between more than 70 languages in any combination. This statistical translator uses billions of word sequences that appear in two or more languages. This in English means that in French. There are no parts of speech, no subjects or verbs, no grammar at all. The system simply figures it out. So theory is not needed. It clearly works, but we cannot say exactly how, because the whole process is data-driven. Over time Google Translate will keep improving, translating on the basis of so-called correlative algorithms: rules that never leave the machine and are too complicated for people even to understand.
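To make the idea of translation without grammar concrete, here is a deliberately tiny sketch. It is not Google's method, whose details are not described here; it is a toy that pairs words purely by how often they turn up in matching sentences, scored with the Dice coefficient. The four-sentence corpus and all function names are invented for illustration.

```python
from collections import Counter

# Hypothetical toy parallel corpus (English, French), invented for illustration.
PARALLEL = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
    ("a dog", "un chien"),
]


def train(pairs):
    """Count how often each word appears, and how often word pairs co-occur."""
    src_count, tgt_count, both = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        both.update((s, t) for s in src_words for t in tgt_words)
    return src_count, tgt_count, both


def translate_word(word, model):
    """Pick the target word most strongly associated with the source word."""
    src_count, tgt_count, both = model
    if word not in src_count:
        return word  # unseen words pass through unchanged

    def dice(target):
        # Dice coefficient: 2 * co-occurrences / (count of each word summed)
        return 2 * both[(word, target)] / (src_count[word] + tgt_count[target])

    return max(tgt_count, key=dice)


def translate(sentence, model):
    """Word-by-word translation; no grammar, no parts of speech."""
    return " ".join(translate_word(w, model) for w in sentence.split())


model = train(PARALLEL)
print(translate("a dog", model))  # prints "un chien"
```

Nothing in the code knows what a noun or an article is; "dog" maps to "chien" only because the two always appear together. A real system works with far more data and with word sequences rather than single words, but the absence of theory is the same.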


Google Brain. Google has something called Google Vision, which recently ran on 16,000 microprocessors, the equivalent of about one-tenth of the visual cortex of the human brain. It specializes in computer vision and was trained in the same way as Google Translate, on a huge number of samples, in this case still images (a billion of them) taken from YouTube videos. Google Vision looked at the images for 72 hours and, in effect, taught itself to recognize them twice as well as any other computer on Earth. Give it an image and it will find another like it. Tell it that the image is of a cat, and it will be able to recognize cats. Remember, this took three days. How long does it take a newborn baby to recognize a cat?



This is exactly how IBM's Watson won Jeopardy: simply by working through the questions from past episodes. There was no underlying theory.


Let's go a couple of steps further. There has been data-driven research based on magnetic resonance imaging (MRI) of the living brains of convicted criminals. This system is no different from the Google Vision example, except that here we are examining a different question: recidivism, the likelihood that an offender will break the law again and return to prison after release. Again without any underlying theory, Google Vision appears able to distinguish MRI images of criminals who are likely to reoffend from those who are not. Google's success rate in predicting reoffending, based solely on a single brain scan, is 90+ percent. Should MRI become the tool for deciding which prisoners to release on parole? It sounds a bit like the Tom Cruise movie Minority Report. Such a scheme holds a huge estimated economic benefit for society as a whole, but it carries the frightening aspect of having no underlying theory: it works because it works.


After that, Google scientists looked at MRIs of normal people while they watched billions of YouTube frames. Having processed a large enough set of these images and the resulting MRIs, the computer can predict what the subject is watching.


This is called mind reading... and, again, we don't know how it works.


Advancing science by eliminating the scientists.


What do scientists do? They theorize. Big data in some cases makes theory unnecessary or simply impossible. The 2013 Nobel Prize in Chemistry was awarded to three biologists whose research was built entirely on conclusions drawn by computer algorithms to explain the chemistry of enzymes. No enzymes were harmed in the winning of this prize.


Today's algorithms are improving twice as fast as Moore's law.


What is changing is the emergence of a new information technology workflow, moving from the traditional:

  1. New hardware enables new software.
  2. New software is written for the new areas of activity that the new hardware makes possible.
  3. Moore's Law reduces the cost of hardware over time, and the new software reaches consumers.
  4. Rinse, repeat.

To the next generation:

  1. Massive parallelism allows new algorithms to be derived organically.
  2. New algorithms run on consumer hardware.
  3. Moore's Law is effectively accelerated, albeit with some risks (we do not understand our own algorithms).
  4. Rinse, repeat.

What's the point? This new style of development goes beyond what has always been required for a significant technological leap: a new computing platform. What comes after mobile phones, people ask? This is what comes after mobile phones. What will it look like? Nobody knows, and perhaps it will not matter.


In 10 years Moore's law will increase processor capacity 128-fold. By throwing more cores at problems and exploiting the fast pace of algorithm development, we should multiply that by another 128, for a total of 16,384. Remember that Google Vision is currently equivalent to 0.1 of the visual cortex. Now multiply that by 16,384 and you get 1,638 visual cortex equivalents. That is where this leads.
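The multiplication is easy to verify. The sketch below uses only the figures quoted in the paragraph; the variable names are illustrative, and the note that 128 equals seven doublings is simple arithmetic rather than a claim from the text.

```python
# Figures quoted in the text.
MOORE_GAIN = 2 ** 7        # 128x in ten years, i.e. seven doublings
ALGORITHM_GAIN = 128       # further gain from more cores and better algorithms
CORTEX_TODAY = 0.1         # Google Vision today, in visual cortex equivalents

combined_gain = MOORE_GAIN * ALGORITHM_GAIN
cortex_in_ten_years = CORTEX_TODAY * combined_gain

print(f"combined gain: {combined_gain}")
print(f"visual cortex equivalents in ten years: {cortex_in_ten_years:.1f}")
```

The combined gain is 16,384, and a tenth of a visual cortex scaled by that factor gives roughly 1,638 of them.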


Ten years from now, computer vision will be able to see things we do not understand, just as dogs can smell cancer.


We are hitting a wall in our ability to generate appropriate theory, while at the same time finding in big data workarounds that keep improving results by whatever means. The only problem is that we do not understand how it works. How much time is left before we lose control?



Around 2029, according to Ray Kurzweil, we will reach the technological singularity.


In that year, says the famous futurist (and Googler), $1,000 will buy computing power equivalent to 10,000 human minds. For the price of a PC, says Ray, we will be able to use more computing power than we can understand or even describe. A true supercomputer in every garage.


Combined with equally fast networks, this could mean that your computer, or whatever device you have, can search in real time through every word ever written to answer literally any question. No stone left unturned.


Nowhere to hide. Apply this to a world where every electrical device is a sensor feeding the network, and we will have not only incredibly effective fire alarms; we are also likely to lose all privacy.



Nobody knows.


If you have read this whole series and happen to be a Google employee, you may feel attacked, because much of what I describe may seem to threaten our current way of life, and the name "Google" comes up often in the text. But that is not so. More precisely, it is not quite so. Google is a convenient target, but the same work is being done right now by companies like Amazon, Facebook, and Microsoft, and by a hundred or more startups. Google is not alone. And regulating Google (which the Europeans are trying to do) or trying to put it out of business probably will not change anything. The future is coming no matter what. Five of those hundred startups will be hugely successful, and that will be enough to change the world forever.


And so we come to the self-driving car. Companies such as Google and its competitors thrive on ever faster and cheaper computing, because it makes them the likely suppliers of data-driven products and services in the future. This is the future of the industry.


Today, if you add up the cost of the parts of a modern car, the wiring harness that connects all the electrical bits and controls the whole mechanism costs more than the engine and transmission! That shows where the priority lies: command and communication, not motion. But these costs are falling sharply while functional capabilities rise at the same rate. The $10,000 that goes into making a Google car self-driving will fall to zero within a decade, after which all new cars will be self-driving.


Make all new cars self-driving and the nature of car culture will change completely. Vehicles will be everywhere, traveling at the legal speed limit with only one meter between them. This will increase the capacity of roads tenfold.
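A rough back-of-the-envelope check suggests the tenfold figure is plausible. Only the one-meter gap comes from the text; the 100 km/h speed limit, the 4.5-meter car length, and the two-second following distance for human drivers are assumptions chosen for illustration, and the model ignores merging, braking, and everything else that happens on real roads.

```python
def lane_flow_per_hour(speed_m_s: float, car_length_m: float, gap_m: float) -> float:
    """Vehicles passing a point in one lane per hour at steady speed."""
    return 3600 * speed_m_s / (car_length_m + gap_m)


# Assumptions for illustration only; not figures from the text.
SPEED_M_S = 100 / 3.6        # assumed legal limit of 100 km/h
CAR_LENGTH_M = 4.5           # assumed typical car length
HUMAN_HEADWAY_S = 2.0        # assumed "two-second rule" for human drivers

# Figure from the text: self-driving cars one meter apart.
ROBOT_GAP_M = 1.0

human_gap_m = HUMAN_HEADWAY_S * SPEED_M_S
human_flow = lane_flow_per_hour(SPEED_M_S, CAR_LENGTH_M, human_gap_m)
robot_flow = lane_flow_per_hour(SPEED_M_S, CAR_LENGTH_M, ROBOT_GAP_M)
ratio = robot_flow / human_flow

print(f"human-driven lane: {human_flow:.0f} vehicles/hour")
print(f"self-driving lane: {robot_flow:.0f} vehicles/hour")
print(f"capacity ratio:    {ratio:.1f}x")
```

Under these assumptions the ratio comes out at roughly eleven, in the same range as the tenfold claim. Different assumed speeds or following distances would move the number, so treat it as an order-of-magnitude check only.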


The same effect could reach air travel. Self-flying aircraft could lead to large numbers of small planes that, like flocks of birds, fly straight to their destinations.


Or maybe we will stop traveling altogether. Greater computing power and faster networks already make telepresence possible: life-size video conferencing, wherever the consumer needs it.


Perhaps the only times we meet people from outside our own village in person will be when we need to physically touch them.


All this and much more is likely. Bioinformatics, the application of massive computing power to medicine, combined with correlative algorithms and machine learning, will find answers to questions we have not asked and never would ask.


Maybe we will defeat both disease and aging, which would mean that we will die only at the hands of criminals, by suicide, or in tragic accidents.


Big data companies are rushing headlong to seize key positions as the providers of the future. Moore's law long ago passed the point where this became inevitable. We have reached the point of no return.


And the world will change completely, and we can only guess what it will look like and who, or what, will be in control of it.


(Translated by Natalia bass)

Article based on information from habrahabr.ru
