Tag Archives: Big Data

How to make your life easier with Google

Ok, maybe the title is a little too optimistic, but today I want to talk about one of the many Google products that make our daily lives better. Everyone uses Google, either for personal matters or for business, but how many have heard of the Google Prediction API?

Like most of their projects, it is meant to help us: it uses learning algorithms to analyze your historical data and predict likely future outcomes. It can be very helpful, especially when large amounts of data have to be handled. You could also say that Big Data is no longer the future; it is here now, and you have to know how to take advantage of it.

Among the uses of the Prediction API we can mention sorting messages by the language they are written in, so that each one gets a specific answer, or spam detection based on comparison with a list of messages already marked as spam. But maybe the most important use case we can think of is purchase prediction: the ability to understand a customer’s behavior and to decide whether or not they are going to make a purchase from your e-commerce business.

In the past, this would have been done by building a regression model, which is very time consuming and quite hard. This is why I believe that the Google Prediction API is one of the tools that will make your life easier and increase the profit of your internet business.
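
To give an idea of what the older, hand-built approach looks like, here is a minimal sketch of a logistic regression for purchase prediction in plain Python. It is not the Google Prediction API and it does not use any Google service; the feature names (pages viewed, items in cart), the toy visits and the learning settings are all assumptions made up for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_purchase(weights, bias, features):
    # Estimated probability that a visit ends in a purchase.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_logistic(rows, labels, lr=0.1, epochs=5000):
    # Plain stochastic gradient descent on the log-loss.
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = predict_purchase(weights, bias, features) - label
            bias -= lr * error
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights, bias


# Hypothetical toy data: [pages viewed, items in cart] per visit,
# and whether that visit ended in a purchase (1) or not (0).
visits = [[1, 0], [2, 0], [3, 1], [8, 2], [10, 3], [12, 2]]
bought = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(visits, bought)
print(round(predict_purchase(weights, bias, [1, 0]), 3))
print(round(predict_purchase(weights, bias, [12, 3]), 3))
```

Even in this tiny form you have to choose the features, the learning rate and the number of passes yourself, which is exactly the kind of work a hosted prediction service is meant to take off your hands.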

Web Scraping’s 2013 Review – part 2

As promised, we are back with the second part of this year’s web scraping review. Today we will focus not only on the events of 2013 that concerned web scraping but also on Big Data and what this year meant for the concept.

First of all, we could not talk about the conferences in which data mining was involved without mentioning the TED conferences. This year the speakers focused on the power of data analysis to help medicine and to prevent possible crises in third world countries. Regarding data mining, everyone agreed that it is one of the best ways to obtain virtual data.

Also, a study by MeriTalk, a government IT networking group, commissioned by NetApp, showed this year that many organizations are not prepared for the information revolution. The survey found that state and local IT pros are struggling to keep up with data demands. Just 59% of state and local agencies are analyzing the data they collect, and less than half are using it to make strategic decisions. State and local agencies estimate that they have just 46% of the data storage and access, 42% of the computing power, and 35% of the personnel they need to successfully leverage large data sets.

Some economists argue that it is often difficult to estimate the true value of new technologies, and that Big Data may already be delivering benefits that are uncounted in official economic statistics. Cat videos and television programs on Hulu, for example, produce pleasure for Web surfers — so shouldn’t economists find a way to value such intangible activity, whether or not it moves the needle of the gross domestic product?

We will end this article with some numbers about the tremendous growth of the data available on the internet. There were 30 billion gigabytes of video, e-mails, Web transactions and business-to-business analytics in 2005. The total is expected to reach more than 20 times that figure in 2013, with off-the-charts increases to follow in the years ahead, according to research conducted by Cisco. So, as you can see, we have good reason to believe that 2014 will be at least as good as 2013.


Hello Big Data

If you are interested in the scraping business, you have probably heard by now of a concept called Big Data. This is, as the name says, a collection of data that is so big and complex that it is very hard to process. Nowadays it is estimated that a typical Big Data collection would be in the range of tens of exabytes (one exabyte being 10 to the power of 18 bytes), and it is estimated that by 2020 more than 18,000 exabytes of data will have been created.
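
To put these sizes into perspective, here is a small sketch of the arithmetic in Python. It assumes decimal (SI) units, where each step up is a factor of 1000; the helper names are made up for this example.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10 ** 9
TERABYTE = 10 ** 12
EXABYTE = 10 ** 18
ZETTABYTE = 10 ** 21


def exabytes_to_bytes(exabytes):
    # Convert a size given in exabytes to bytes.
    return exabytes * EXABYTE


def to_unit(size_in_bytes, unit):
    # Express a size in bytes in the given unit.
    return size_in_bytes / unit


# "Tens of exabytes": ten exabytes is already 10 to the power of 19 bytes.
ten_exabytes = exabytes_to_bytes(10)

# The 18,000 exabytes quoted for 2020, in bytes and in zettabytes.
projected_2020_bytes = exabytes_to_bytes(18_000)
projected_2020_zettabytes = to_unit(projected_2020_bytes, ZETTABYTE)

print(ten_exabytes)
print(projected_2020_bytes)
print(projected_2020_zettabytes)
```

Seen this way, the 2020 figure is 18 zettabytes, and a single exabyte alone is a million terabytes.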

Opinions on Big Data vary widely: while some organisations wouldn’t know what to do with a collection of data bigger than a few dozen terabytes, others wouldn’t consider analyzing data smaller than that. One of the major cons attributed to Big Data is that with such a large amount of data, correct sampling is very hard to do, so major errors can disrupt the analysis. On the other hand, Big Data has brought about a revolution in science and, more generally, in the economy. It is enough to think that in Geneva alone, the Large Hadron Collider has more than 150 million sensors delivering data 40 million times per second, with about 600 million collisions per second. As for the business sector, the one we are interested in, Amazon handles queries from more than half a million third-party sellers every day, dealing with millions of back-end operations daily. Another example is Facebook, which has to handle more than 50 billion photos from its users.

Generally, there are four main characteristics of Big Data. The first and most obvious one is volume, which I have already discussed and which is growing at an exponential rate. The second is the speed, or velocity, of Big Data. This also grows in direct connection with volume, because processing units are expected to get faster as the world evolves. The third is the variety of data. Only 20 percent of all data is structured, and only this can be analyzed with a traditional approach. Structured data is directly connected to the fourth characteristic, veracity, which is essential for the whole process to produce accurate results.

To end with, I would say that even if not many have heard of it, Big Data is already a part of our lives and has been influencing the world we live in for many years. This influence can only grow in the coming decades, until everybody has heard of it and of how decisions are made through Big Data.